  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-10799

ironic-neutron-agent fails to exit with error if backend service connectivity is lost


    • 3
    • False
    • None
    • False
    • ?
    • python-networking-baremetal-6.1.1-18.0.20250403134826.bfcd09e.el9ost
    • python-networking-baremetal-6.1.1-18.0.20250403134826.bfcd09e.el9ost
    • Impediment
    • rhos-ops-day1day2-hardprov
    • None
    • .Improved logging and error handling for cross-controller packet loss

      Before this update, cross-controller packet loss could impact request handling by the python-networking-baremetal agent and prevent physical network mapping updates from occurring in the Networking service (neutron) for bare-metal nodes. With this update, additional logging and error handling ensure that the service provided by python-networking-baremetal exits, so that the container can restart automatically, if packet loss occurs. Physical network mappings for bare-metal nodes continue to be updated if network interruptions occur for Controller nodes.
    • Bug Fix
    • Done
    • This issue is difficult to reproduce because it requires externally breaking network connectivity for one or more controller nodes in a sporadic fashion. The customer was operating controllers on the same physical segment, which was spanned across multiple switches using a tunnel. When the issue occurred, connectivity between nodes was sporadically interrupted, which eventually hung the thread responsible for managing updates because there was no error handling around the resulting errors.

    • HardProv Sprint 2, HardProv Sprint 3, HardProv Sprint 4, HardProv Sprint 6, HardProv Sprint 7, HardProv Sprint 8
    • 6
    • Moderate

      Note: Observed in 17.1.x, and should be backported actively.

      A customer issue was observed with a stretched control plane where the ironic-neutron-agent service apparently continued to run but was no longer connected to services such as the database or message bus. In reality, I believe the service only operates with the message bus, but other services had been observed in a similar state on the same host.

      In any event, no logs had been recorded recently and the service was not functioning, so we believe it had halted. As a result, spine/leaf based deployments can fail due to missing spine/leaf provider network mapping data.
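To illustrate the behavior being asked for (log the failure, then exit non-zero so the container runtime can restart the service), here is a minimal sketch. It is not the networking-baremetal code or the upstream fix: `monitor_backend`, `BackendUnreachable`, `main`, and the `check` callable are hypothetical names, and the failure threshold and interval are assumed values.

```python
import logging
import time
from typing import Callable, Optional

LOG = logging.getLogger(__name__)


class BackendUnreachable(Exception):
    """Raised when the backend check keeps failing (hypothetical helper)."""


def monitor_backend(check: Callable[[], None],
                    max_failures: int = 3,
                    interval: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep,
                    max_checks: Optional[int] = None) -> None:
    """Call check() periodically and raise after repeated failures.

    check is assumed to raise an exception when the backend (for example
    the message bus) cannot be reached. max_checks bounds the loop for
    testing; None means run until the failure threshold is hit.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        try:
            check()
            failures = 0  # a successful check resets the counter
        except Exception as exc:
            failures += 1
            # Always log, so a stuck service never goes silent.
            LOG.error("Backend check failed (%d/%d): %s",
                      failures, max_failures, exc)
            if failures >= max_failures:
                raise BackendUnreachable(
                    "backend unreachable after %d consecutive failures"
                    % failures) from exc
        sleep(interval)


def main(check: Callable[[], None], **kwargs) -> int:
    """Return a non-zero exit code instead of hanging on lost connectivity."""
    try:
        monitor_backend(check, **kwargs)
    except BackendUnreachable as exc:
        LOG.critical("Exiting so the container can be restarted: %s", exc)
        return 1
    return 0
```

A real entry point would pass the return value of `main()` to `sys.exit()`. The important property is that lost connectivity ends in a logged error and a process exit rather than a silently hung thread.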

      I've opened this upstream as launchpad bug ID 2084912.

      Filed against the openstack-ironic component since the issue relates to its use; however, the networking-baremetal package, where this issue resides, lacks a specific component in Jira.

              jasonparoly Jason Paroly
              jkreger@redhat.com Julia Kreger
              rhos-dfg-hardprov