OpenShift Bugs / OCPBUGS-54484

agent-installer agents do not update network information when network configuration changes on host

      Description of problem:

      When the network configuration on a node changes (e.g., we update a switch port configuration to place the agent on a different network), the network information for the agent does not update. Logging in as the `core` user on the node, we can see that it is still running the discovery environment and that it appears to send regular reports containing the correct network information. Here is an example log from `journalctl`:

      Apr 01 17:42:11 host-10-233-1-70 next_step_runne[594509]: time="01-04-2025 17:42:11" level=info msg="Sending step <inventory-b74e056a> reply output <{\"bmc_address\":\"10.2.14.165\",\"bmc_v6address\":\"::/0\",\"boot\":{\"command_line\":\"BOOT_IMAGE=/images/pxeboot/vmlinuz initrd=/images/pxeboot/initrd.img,/images/ignition.img,/images/assisted_installer_custom.img,/images/nmstate.img rw ignition.firstboot ignition.platform.id=metal coreos.live.rootfs_url=https://assisted-image-service-multicluster-engine.apps.hypershift1.nerc.mghpcc.org/boot-artifacts/rootfs?arch=x86_64\\u0026version=4.18\\n\",\"current_boot_mode\":\"bios\"},\"cpu\":{\"architecture\":\"x86_64\",\"count\":40,\"flags\":[\"fpu\",\"vme\",\"de\",\"pse\",\"tsc\",\"msr\",\"pae\",\"mce\",\"cx8\",\"apic\",\"sep\",\"mtrr\",\"pge\",\"mca\",\"cmov\",\"pat\",\"pse36\",\"clflush\",\"dts\",\"acpi\",\"mmx\",\"fxsr\",\"sse\",\"sse2\",\"ss\",\"ht\",\"tm\",\"pbe\",\"syscall\",\"nx\",\"pdpe1gb\",\"rdtscp\",\"lm\",\"constant_tsc\",\"arch_perfmon\",\"pebs\",\"bts\",\"rep_good\",\"nopl\",\"xtopology\",\"nonstop_tsc\",\"cpuid\",\"aperfmperf\",\"pni\",\"pclmulqdq\",\"dtes64\",\"monitor\",\"ds_cpl\",\"vmx\",\"smx\",\"est\",\"tm2\",\"ssse3\",\"sdbg\",\"fma\",\"cx16\",\"xtpr\",\"pdcm\",\"pcid\",\"dca\",\"sse4_1\",\"sse4_2\",\"x2apic\",\"movbe\",\"popcnt\",\"tsc_deadline_timer\",\"aes\",\"xsave\",\"avx\",\"f16c\",\"rdrand\",\"lahf_lm\",\"abm\",\"3dnowprefetch\",\"cpuid_fault\",\"epb\",\"cat_l3\",\"cdp_l3\",\"pti\",\"ssbd\",\"ibrs\",\"ibpb\",\"stibp\",\"tpr_shadow\",\"flexpriority\",\"ept\",\"vpid\",\"ept_ad\",\"fsgsbase\",\"tsc_adjust\",\"bmi1\",\"hle\",\"avx2\",\"smep\",\"bmi2\",\"erms\",\"invpcid\",\"rtm\",\"cqm\",\"rdt_a\",\"rdseed\",\"adx\",\"smap\",\"intel_pt\",\"xsaveopt\",\"cqm_llc\",\"cqm_occup_llc\",\"cqm_mbm_total\",\"cqm_mbm_local\",\"dtherm\",\"ida\",\"arat\",\"pln\",\"pts\",\"vnmi\",\"md_clear\",\"flush_l1d\"],\"frequency\":3400,\"model_name\":\"Intel(R) Xeon(R) CPU E5-2640 v4 @ 
2.40GHz\"},\"disks\":[{\"by_id\":\"/dev/disk/by-id/wwn-0x500080d910edcd06\",\"by_path\":\"/dev/disk/by-path/pci-0000:00:11.4-ata-1.0\",\"drive_type\":\"SSD\",\"has_uuid\":true,\"hctl\":\"0:0:0:0\",\"id\":\"/dev/disk/by-id/wwn-0x500080d910edcd06\",\"installation_eligibility\":{\"eligible\":true,\"not_eligible_reasons\":null},\"model\":\"THNSF8200CAME\",\"name\":\"sda\",\"path\":\"/dev/sda\",\"serial\":\"184S107ITCRT\",\"size_bytes\":200049647616,\"vendor\":\"ATA\",\"wwn\":\"0x500080d910edcd06\"},{\"by_id\":\"/dev/disk/by-id/wwn-0x500080d910edcdaf\",\"by_path\":\"/dev/disk/by-path/pci-0000:00:11.4-ata-2.0\",\"drive_type\":\"SSD\",\"has_uuid\":true,\"hctl\":\"1:0:0:0\",\"id\":\"/dev/disk/by-id/wwn-0x500080d910edcdaf\",\"installation_eligibility\":{\"eligible\":true,\"not_eligible_reasons\":null},\"model\":\"THNSF8200CAME\",\"name\":\"sdb\",\"path\":\"/dev/sdb\",\"serial\":\"184S105OTCRT\",\"size_bytes\":200049647616,\"vendor\":\"ATA\",\"wwn\":\"0x500080d910edcdaf\"}],\"gpus\":[{\"address\":\"0000:0a:00.0\",\"device_id\":\"0534\",\"name\":\"G200eR2\",\"vendor\":\"Matrox Electronics Systems 
Ltd.\",\"vendor_id\":\"102b\"}],\"hostname\":\"host-10-233-1-70\",\"interfaces\":[{\"flags\":[\"up\",\"loopback\",\"running\"],\"has_carrier\":true,\"ipv4_addresses\":[\"127.0.0.1/8\"],\"ipv6_addresses\":[\"::1/128\"],\"mtu\":65536,\"name\":\"lo\",\"type\":\"device\"},{\"biosdevname\":\"em1\",\"flags\":[\"up\",\"broadcast\",\"multicast\",\"running\"],\"has_carrier\":true,\"ipv4_addresses\":[\"10.233.1.70/20\"],\"ipv6_addresses\":[],\"mac_address\":\"18:db:f2:a4:8d:1b\",\"mtu\":1500,\"name\":\"eno1\",\"product\":\"0x168e\",\"speed_mbps\":10000,\"type\":\"physical\",\"vendor\":\"0x14e4\"},{\"biosdevname\":\"em2\",\"flags\":[\"up\",\"broadcast\",\"multicast\"],\"ipv4_addresses\":[],\"ipv6_addresses\":[],\"mac_address\":\"18:db:f2:a4:8d:1e\",\"mtu\":1500,\"name\":\"eno2\",\"product\":\"0x168e\",\"speed_mbps\":-1,\"type\":\"physical\",\"vendor\":\"0x14e4\"}],\"memory\":{\"physical_bytes\":137438953472,\"physical_bytes_method\":\"dmidecode\",\"usable_bytes\":135066259456},\"routes\":[{\"destination\":\"0.0.0.0\",\"family\":2,\"gateway\":\"10.233.0.1\",\"interface\":\"eno1\",\"metric\":100},{\"destination\":\"10.233.0.0\",\"family\":2,\"interface\":\"eno1\",\"metric\":100},{\"destination\":\"169.254.169.254\",\"family\":2,\"gateway\":\"10.233.0.52\",\"interface\":\"eno1\",\"metric\":100},{\"destination\":\"::1\",\"family\":10,\"interface\":\"lo\",\"metric\":256},{\"destination\":\"fe80::\",\"family\":10,\"interface\":\"eno1\",\"metric\":1024}],\"system_vendor\":{\"manufacturer\":\"Dell Inc.\",\"product_name\":\"PowerEdge FC430\",\"serial_number\":\"H9R11Q2\"},\"tpm_version\":\"none\"}> error <> exit-code <0>" file="step_processor.go:76" request_id=30fcf824-1d5a-499d-b8cf-f546c7466af7
      

      The `10.233.1.70` address shown here is correct, but looking at the associated agent, we see the incorrect (previous) address:

      $ kubectl -n hardware-inventory get agent a12466ac-cfe9-345e-43b4-83bbd32dbce5 -o jsonpath='{.status.inventory.interfaces[0].ipV4Addresses}'; echo
      ["10.117.0.179/24"]
      

      Worse, this behavior appears to be intermittent: sometimes the agent network information does update successfully.
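Because the drift is intermittent, a simple polling loop can help catch it as it happens. The following is a diagnostic sketch, not part of the reproduction above: the namespace and agent name are taken from the example, while the SSH target and the interface index are assumptions that would need adjusting for a given node.

```shell
# Diagnostic sketch: periodically compare the address on the Agent CR with
# the address the node itself reports. NS and AGENT come from the example
# above; NODE and the use of interfaces[0]/eno1 are assumptions.
NS=hardware-inventory
AGENT=a12466ac-cfe9-345e-43b4-83bbd32dbce5
NODE=core@10.233.1.70

while sleep 60; do
    # Address currently stored on the Agent CR (first interface, first IPv4).
    cr_ip=$(kubectl -n "$NS" get agent "$AGENT" \
        -o jsonpath='{.status.inventory.interfaces[0].ipV4Addresses[0]}')
    # Address actually configured on the node right now.
    node_ip=$(ssh "$NODE" ip -4 -br addr show eno1 | awk '{print $3}')
    [ "$cr_ip" = "$node_ip" ] || echo "$(date -Is) stale: CR=$cr_ip node=$node_ip"
done
```

Logging a timestamp on each mismatch makes it easier to correlate the stale window with the `next_step_runner` reports in `journalctl`.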

      Version-Release number of selected component (if applicable):

      We are running ACM with Hosted Control Planes on OpenShift 4.18. We are using:

      $ kubectl -n open-cluster-management get csv
      NAME                                    DISPLAY                                      VERSION               REPLACES                                           PHASE
      aap-operator.v2.5.0-0.1742434756        Ansible Automation Platform                  2.5.0+0.1742434756    aap-operator.v2.5.0-0.1741369251                   Succeeded
      advanced-cluster-management.v2.12.2     Advanced Cluster Management for Kubernetes   2.12.2                advanced-cluster-management.v2.12.1                Succeeded
      cert-manager.v1.16.1                    cert-manager                                 1.16.1                cert-manager.v1.15.2                               Succeeded
      elasticsearch-operator.v5.8.18          OpenShift Elasticsearch Operator             5.8.18                elasticsearch-operator.v5.8.17                     Succeeded
      external-secrets-operator.v0.11.0       External Secrets Operator                    0.11.0                external-secrets-operator.v0.10.7                  Succeeded
      metallb-operator.v4.18.0-202503181802   MetalLB Operator                             4.18.0-202503181802   metallb-operator.v4.18.0-202503110933              Succeeded
      openshift-gitops-operator.v1.15.1       Red Hat OpenShift GitOps                     1.15.1                openshift-gitops-operator.v1.15.0-0.1738074324.p   Succeeded
      

      Additional info:

      We can manually refresh the agent by restarting agent.service on the node:

      # systemctl restart agent.service
      

      But we would like this to update automatically and predictably.
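Until the agent refreshes its inventory automatically, one possible stopgap is to hook the manual restart into NetworkManager. The script below is a hypothetical workaround sketch, not a fix for the underlying bug: it assumes the discovery environment uses NetworkManager dispatcher scripts, and the file name is our own choice.

```shell
#!/bin/sh
# Hypothetical stopgap: a NetworkManager dispatcher hook that restarts
# agent.service whenever an interface comes up or its DHCP lease changes,
# forcing the discovery agent to re-send its inventory.
# Install as /etc/NetworkManager/dispatcher.d/90-restart-agent (mode 0755).
interface="$1"
action="$2"

case "$action" in
    up|dhcp4-change|dhcp6-change)
        logger -t restart-agent "restarting agent.service after $action on $interface"
        systemctl restart agent.service
        ;;
esac
```

This only papers over the symptom on each node; the agent itself should still detect the change and update the CR without outside help.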

              Crystal Chun (cchun@redhat.com)
              Lars Kellogg-Stedman (lkellogg@redhat.com)
              Michael Burman