OCPBUGS-61050 (OpenShift Bugs)

nmstate-handler pod crashing due to a port binding conflict


    • Quality / Stability / Reliability

      Description of problem:

      The nmstate-handler pod on the master2 node is in CrashLoopBackOff because of a port binding conflict. The pod cannot start its metrics server because port 8089 is already in use. The error message is: "failed to start metrics server: failed to create listener: listen tcp :8089: bind: address already in use"
      The logs also contain multiple warnings from the netlink-packet-route crate about attribute data it does not recognize (most likely added by the newer kernel), and netlink_proto errors about packets that failed to decode.
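The "address already in use" failure can be reproduced in isolation. The following is a minimal sketch, not the handler's code (the handler is not written in Python); `try_bind` and `port_conflict` are hypothetical helper names, and the sketch binds to 127.0.0.1 rather than to all interfaces as `:8089` does.

```python
import errno
import socket


def try_bind(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind and listen on host:port. Raises OSError if the bind fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def port_conflict(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding to host:port fails with EADDRINUSE."""
    try:
        sock = try_bind(port, host)
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # some other failure, e.g. permission denied
    sock.close()
    return False
```

Holding one listener open with `try_bind(8089)` and then calling `port_conflict(8089)` shows the same class of error the metrics server reports: the second bind is rejected for as long as the first listener exists.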
      

      Version-Release number of selected component (if applicable):

      OpenShift Version: 4.20.0-ec.6
      Kubernetes Version: v1.33.3
      nmstate-handler image: registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel9@sha256:8d7f064859fbb4890a87008d80ccb56b2b560dc105290265cf00b684d96a390b
      Nmstate version: 2.2.48
      Kernel Version: 5.14.0-570.35.1.el9_6.x86_64
      

      How reproducible:

      Observed so far only on the master2 node, where the pod has restarted 12 times. The failure appears to be consistent: the pod stays in CrashLoopBackOff.
      

      Steps to Reproduce:

      1. Deploy an OpenShift 4.20.0-ec.6 cluster
      2. Install the nmstate operator
      3. Observe the nmstate-handler pod on the master2 node
      4. The pod fails to start with the port binding error
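The observation step above can be scripted against the pod's JSON. This is a rough sketch that assumes the pod object was exported with `oc get pod <pod-name> -n <namespace> -o json` (the namespace is not stated in this report) and parsed into a dict; `crashlooping_containers` is a hypothetical helper name.

```python
def crashlooping_containers(pod: dict) -> list:
    """Return (container name, restartCount) pairs for containers that are
    currently waiting with reason CrashLoopBackOff."""
    result = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # "waiting" is only present while the container is not running
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            result.append((cs["name"], cs.get("restartCount", 0)))
    return result
```

An empty result for the pods on the other nodes and a non-empty result for the pod on master2 would match what is reported here.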
      

      Actual results:

      nmstate-handler-s8spq pod on master2 is in CrashLoopBackOff state
      Pod has restarted 12 times
      Error log shows: "failed to start metrics server: failed to create listener: listen tcp :8089: bind: address already in use"
      Multiple netlink warnings and errors in the logs:
        "WARN netlink_packet_route::link::buffer_tool] Specified IFLA_INET6_ICMP6STATS NLA attribute holds more(most likely new kernel) data which is unknown to netlink-packet-route crate, expecting 48, got 56"
        "ERROR netlink_proto::codecs] failed to decode packet" for eno1, eno2, eno3, and eno4 interfaces
      

      Expected results:

      All nmstate-handler pods should run successfully without port conflicts
      nmstate-handler should be able to bind to port 8089 for its metrics server
      No netlink packet decoding errors should occur
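The warning quoted under "Actual results" says the parser expected 48 bytes and received 56. The sketch below illustrates the forward-compatible handling that the warning text suggests: read the fields that are known and report the extra bytes instead of failing. It is not the netlink-packet-route implementation. It assumes, for illustration only, that the payload is a sequence of native-endian 64-bit counters (so 48 bytes is 6 counters), and `parse_counters` is a hypothetical name. Whether the separate "failed to decode packet" errors have the same cause is not established by this report.

```python
import struct

KNOWN_COUNTERS = 6  # assumption: 48 bytes / 8 bytes per 64-bit counter


def parse_counters(payload: bytes, known: int = KNOWN_COUNTERS):
    """Parse the first `known` 64-bit counters from payload.

    Returns (values, extra_bytes). Extra trailing bytes, such as counters
    added by a newer kernel, are counted but not treated as an error.
    """
    need = known * 8
    if len(payload) < need:
        raise ValueError(
            f"payload too short: expecting {need}, got {len(payload)}"
        )
    values = struct.unpack_from(f"={known}Q", payload, 0)
    extra = len(payload) - need
    return list(values), extra
```

With a 56-byte payload this returns the six known values and `extra == 8`, which is the situation the warning describes.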
      

      Additional info:

      Cluster: bm12-ibm.rhos-psi.cnv-qe.rhood.us
      Only the nmstate-handler pod on master2 is affected; pods on master1 and master3 are running fine
      The port conflict suggests one of the following:
        Another process on the node is using port 8089
        A previous instance of nmstate-handler didn't release the port properly
        There might be a hostNetwork/hostPort configuration issue
      The netlink errors suggest compatibility issues between the nmstate/nispor libraries and the newer kernel version
      must-gather collected at: must-gather.local.3886589935363134981/
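To check the first hypothesis above (another process on the node holding port 8089), the listener can be looked up from the kernel's socket tables. This is a minimal sketch under stated assumptions: it must run on the affected node itself, in the host network namespace, with enough privileges to read other processes' `/proc/<pid>/fd` entries. `listening_inodes` and `pids_holding` are hypothetical helper names.

```python
import os

LISTEN = "0A"  # hex state code for a listening socket in /proc/net/tcp


def listening_inodes(proc_net_text: str, port: int) -> list:
    """Return socket inodes that are listening on `port`, given the text of
    /proc/net/tcp or /proc/net/tcp6."""
    inodes = []
    for line in proc_net_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 10:
            continue
        # local_address is HEXADDR:HEXPORT; the port follows the last colon
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if fields[3] == LISTEN and local_port == port:
            inodes.append(int(fields[9]))
    return inodes


def pids_holding(inode: int) -> list:
    """Return PIDs that have an open file descriptor for the socket inode."""
    target = f"socket:[{inode}]"
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            for fd in os.listdir(fd_dir):
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    pids.append(int(entry))
                    break
        except OSError:
            continue  # process exited, or permission denied
    return pids
```

Feeding the contents of `/proc/net/tcp` and `/proc/net/tcp6` to `listening_inodes(text, 8089)` and passing each inode to `pids_holding` identifies the owning process. A PID that is not an nmstate-handler process would point to the first hypothesis, and a leftover handler process to the second.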
      

              Benjamin Nemec (bnemec@redhat.com)
              Lukas Bednar (lbednar@redhat.com)
              Yossi Segev
              Guohua Ouyang