Type: Bug | Resolution: Unresolved | Affects Version: 4.18.z | Impact: Quality / Stability / Reliability
Description of problem:
When the network configuration of a node changes (e.g., we update a switch port configuration to place the agent on a different network), the agent's network information does not update. Logging in as the `core` user on the node, we can confirm that it is still running the discovery environment and that it appears to send regular inventory reports containing the correct network information. Here is an example log from `journalctl`:
Apr 01 17:42:11 host-10-233-1-70 next_step_runne[594509]: time="01-04-2025 17:42:11" level=info msg="Sending step <inventory-b74e056a> reply output <{\"bmc_address\":\"10.2.14.165\",\"bmc_v6address\":\"::/0\",\"boot\":{\"command_line\":\"BOOT_IMAGE=/images/pxeboot/vmlinuz initrd=/images/pxeboot/initrd.img,/images/ignition.img,/images/assisted_installer_custom.img,/images/nmstate.img rw ignition.firstboot ignition.platform.id=metal coreos.live.rootfs_url=https://assisted-image-service-multicluster-engine.apps.hypershift1.nerc.mghpcc.org/boot-artifacts/rootfs?arch=x86_64\\u0026version=4.18\\n\",\"current_boot_mode\":\"bios\"},\"cpu\":{\"architecture\":\"x86_64\",\"count\":40,\"flags\":[\"fpu\",\"vme\",\"de\",\"pse\",\"tsc\",\"msr\",\"pae\",\"mce\",\"cx8\",\"apic\",\"sep\",\"mtrr\",\"pge\",\"mca\",\"cmov\",\"pat\",\"pse36\",\"clflush\",\"dts\",\"acpi\",\"mmx\",\"fxsr\",\"sse\",\"sse2\",\"ss\",\"ht\",\"tm\",\"pbe\",\"syscall\",\"nx\",\"pdpe1gb\",\"rdtscp\",\"lm\",\"constant_tsc\",\"arch_perfmon\",\"pebs\",\"bts\",\"rep_good\",\"nopl\",\"xtopology\",\"nonstop_tsc\",\"cpuid\",\"aperfmperf\",\"pni\",\"pclmulqdq\",\"dtes64\",\"monitor\",\"ds_cpl\",\"vmx\",\"smx\",\"est\",\"tm2\",\"ssse3\",\"sdbg\",\"fma\",\"cx16\",\"xtpr\",\"pdcm\",\"pcid\",\"dca\",\"sse4_1\",\"sse4_2\",\"x2apic\",\"movbe\",\"popcnt\",\"tsc_deadline_timer\",\"aes\",\"xsave\",\"avx\",\"f16c\",\"rdrand\",\"lahf_lm\",\"abm\",\"3dnowprefetch\",\"cpuid_fault\",\"epb\",\"cat_l3\",\"cdp_l3\",\"pti\",\"ssbd\",\"ibrs\",\"ibpb\",\"stibp\",\"tpr_shadow\",\"flexpriority\",\"ept\",\"vpid\",\"ept_ad\",\"fsgsbase\",\"tsc_adjust\",\"bmi1\",\"hle\",\"avx2\",\"smep\",\"bmi2\",\"erms\",\"invpcid\",\"rtm\",\"cqm\",\"rdt_a\",\"rdseed\",\"adx\",\"smap\",\"intel_pt\",\"xsaveopt\",\"cqm_llc\",\"cqm_occup_llc\",\"cqm_mbm_total\",\"cqm_mbm_local\",\"dtherm\",\"ida\",\"arat\",\"pln\",\"pts\",\"vnmi\",\"md_clear\",\"flush_l1d\"],\"frequency\":3400,\"model_name\":\"Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz\"},\"disks\":[{\"by_id\":\"/dev/disk/by-id/wwn-0x500080d910edcd06\",\"by_path\":\"/dev/disk/by-path/pci-0000:00:11.4-ata-1.0\",\"drive_type\":\"SSD\",\"has_uuid\":true,\"hctl\":\"0:0:0:0\",\"id\":\"/dev/disk/by-id/wwn-0x500080d910edcd06\",\"installation_eligibility\":{\"eligible\":true,\"not_eligible_reasons\":null},\"model\":\"THNSF8200CAME\",\"name\":\"sda\",\"path\":\"/dev/sda\",\"serial\":\"184S107ITCRT\",\"size_bytes\":200049647616,\"vendor\":\"ATA\",\"wwn\":\"0x500080d910edcd06\"},{\"by_id\":\"/dev/disk/by-id/wwn-0x500080d910edcdaf\",\"by_path\":\"/dev/disk/by-path/pci-0000:00:11.4-ata-2.0\",\"drive_type\":\"SSD\",\"has_uuid\":true,\"hctl\":\"1:0:0:0\",\"id\":\"/dev/disk/by-id/wwn-0x500080d910edcdaf\",\"installation_eligibility\":{\"eligible\":true,\"not_eligible_reasons\":null},\"model\":\"THNSF8200CAME\",\"name\":\"sdb\",\"path\":\"/dev/sdb\",\"serial\":\"184S105OTCRT\",\"size_bytes\":200049647616,\"vendor\":\"ATA\",\"wwn\":\"0x500080d910edcdaf\"}],\"gpus\":[{\"address\":\"0000:0a:00.0\",\"device_id\":\"0534\",\"name\":\"G200eR2\",\"vendor\":\"Matrox Electronics Systems 
Ltd.\",\"vendor_id\":\"102b\"}],\"hostname\":\"host-10-233-1-70\",\"interfaces\":[{\"flags\":[\"up\",\"loopback\",\"running\"],\"has_carrier\":true,\"ipv4_addresses\":[\"127.0.0.1/8\"],\"ipv6_addresses\":[\"::1/128\"],\"mtu\":65536,\"name\":\"lo\",\"type\":\"device\"},{\"biosdevname\":\"em1\",\"flags\":[\"up\",\"broadcast\",\"multicast\",\"running\"],\"has_carrier\":true,\"ipv4_addresses\":[\"10.233.1.70/20\"],\"ipv6_addresses\":[],\"mac_address\":\"18:db:f2:a4:8d:1b\",\"mtu\":1500,\"name\":\"eno1\",\"product\":\"0x168e\",\"speed_mbps\":10000,\"type\":\"physical\",\"vendor\":\"0x14e4\"},{\"biosdevname\":\"em2\",\"flags\":[\"up\",\"broadcast\",\"multicast\"],\"ipv4_addresses\":[],\"ipv6_addresses\":[],\"mac_address\":\"18:db:f2:a4:8d:1e\",\"mtu\":1500,\"name\":\"eno2\",\"product\":\"0x168e\",\"speed_mbps\":-1,\"type\":\"physical\",\"vendor\":\"0x14e4\"}],\"memory\":{\"physical_bytes\":137438953472,\"physical_bytes_method\":\"dmidecode\",\"usable_bytes\":135066259456},\"routes\":[{\"destination\":\"0.0.0.0\",\"family\":2,\"gateway\":\"10.233.0.1\",\"interface\":\"eno1\",\"metric\":100},{\"destination\":\"10.233.0.0\",\"family\":2,\"interface\":\"eno1\",\"metric\":100},{\"destination\":\"169.254.169.254\",\"family\":2,\"gateway\":\"10.233.0.52\",\"interface\":\"eno1\",\"metric\":100},{\"destination\":\"::1\",\"family\":10,\"interface\":\"lo\",\"metric\":256},{\"destination\":\"fe80::\",\"family\":10,\"interface\":\"eno1\",\"metric\":1024}],\"system_vendor\":{\"manufacturer\":\"Dell Inc.\",\"product_name\":\"PowerEdge FC430\",\"serial_number\":\"H9R11Q2\"},\"tpm_version\":\"none\"}> error <> exit-code <0>" file="step_processor.go:76" request_id=30fcf824-1d5a-499d-b8cf-f546c7466af7
The `10.233.1.70` address shown here is correct, but the associated agent resource still reports the previous (now incorrect) address:
$ kubectl -n hardware-inventory get agent a12466ac-cfe9-345e-43b4-83bbd32dbce5 -o jsonpath='{.status.inventory.interfaces[0].ipV4Addresses}'; echo
["10.117.0.179/24"]
Worse, this behavior appears to be intermittent: sometimes the agent network information does update successfully.
Version-Release number of selected component (if applicable):
We are running ACM with Hosted Control Planes on OpenShift 4.18. We are using:
$ kubectl -n open-cluster-management get csv
NAME                                    DISPLAY                                      VERSION               REPLACES                                           PHASE
aap-operator.v2.5.0-0.1742434756        Ansible Automation Platform                  2.5.0+0.1742434756    aap-operator.v2.5.0-0.1741369251                   Succeeded
advanced-cluster-management.v2.12.2     Advanced Cluster Management for Kubernetes   2.12.2                advanced-cluster-management.v2.12.1                Succeeded
cert-manager.v1.16.1                    cert-manager                                 1.16.1                cert-manager.v1.15.2                               Succeeded
elasticsearch-operator.v5.8.18          OpenShift Elasticsearch Operator             5.8.18                elasticsearch-operator.v5.8.17                     Succeeded
external-secrets-operator.v0.11.0       External Secrets Operator                    0.11.0                external-secrets-operator.v0.10.7                  Succeeded
metallb-operator.v4.18.0-202503181802   MetalLB Operator                             4.18.0-202503181802   metallb-operator.v4.18.0-202503110933              Succeeded
openshift-gitops-operator.v1.15.1       Red Hat OpenShift GitOps                     1.15.1                openshift-gitops-operator.v1.15.0-0.1738074324.p   Succeeded
Additional info:
We can manually refresh the agent by restarting agent.service on the node:
# systemctl restart agent.service
However, we would like the agent's inventory to update automatically and predictably.
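As a stopgap until the root cause is fixed, a periodic restart loop like the following could force refreshes (purely a hypothetical workaround sketch, not a fix; the 600-second interval is arbitrary):
# while true; do systemctl restart agent.service; sleep 600; done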