Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-74085

NNCE Enters "Failing" State After Node Reboot due to API Server Probe Timeout

    • None
    • False
    • Hide

      None

      Show
      None
    • None
    • Important
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Description of problem:

           NNCE Enters "Failing" State After Node Reboot due to API Server Probe Timeout

      Version-Release number of selected component (if applicable):

          4.16.54

      How reproducible:

      When a node with an active NodeNetworkConfigurationPolicy (NNCP) is rebooted, the nmstate-handler attempts to reconcile the state immediately upon startup. The reconciliation fails with a "failed running probe 'api-server'" error, causing the NodeNetworkConfigurationEnactment (NNCE) to enter a Failing state.Restarting the nmstate-handler daemonset (oc rollout restart ds/nmstate-handler) resolves the issue immediately.

      Steps to Reproduce:

      1. Apply a valid NodeNetworkConfigurationPolicy for static routes)
         apiVersion: nmstate.io/v1
      
      kind: NodeNetworkConfigurationPolicy
      metadata:
        labels:
          app.kubernetes.io/instance: openshift-nmstate-config
        name: static-route-hnas-workers
      spec:
        desiredState:
          routes:
            config:
            - destination: x.x.x.x/32
              next-hop-address: x.x.x.x
              next-hop-interface: br-ex
              table-id: 254
        nodeSelector:
          node-role.kubernetes.io/worker: ""
      
      
      2. Verify the NNCE status is Available and SuccessfullyConfigured.
      
      oc get nnce
      Thu Jan 15 04:42:47 PM CET 2026
      NAME                                                         STATUS      STATUS AGE   REASON
      1-workers   Available   2d5h         SuccessfullyConfigured
      2-workers   Available   2d5h         SuccessfullyConfigured
      3-workers   Available   2d5h         SuccessfullyConfigured
      4-workers   Available   2d5h         SuccessfullyConfigured
      
      3. Reboot the worker node (systemctl reboot).
      
      4. Wait for the node to return to Ready state. 
      
      5. Check the NNCE status for that node.
      
      NAME                                                         STATUS      STATUS AGE   REASON
      1-workers   Available   2d5h         SuccessfullyConfigured
      2-workers   Available   2d5h         SuccessfullyConfigured
      3-workers   Available   2d5h         SuccessfullyConfigured
      4-workers   Failing     55s          FailedToConfigure
      
         message: |-
        error reconciling NodeNetworkConfigurationPolicy on node <node-name> desired state apply: "",
        rolling back desired state configuration: failed runnig probes after network changes: failed runnig probe 'api-server' with after network reconfiguration -> currentState: hostname:
        running: node-name
      
      Resolve:
      
      Need to restart the handler pods. 
          

      Actual results:

       The NNCE goes into Failing state after reboot. 

      Expected results:

         The NNCE after node reboot automatically reconcile to available state.  

      Additional info:

          

              mkowalsk@redhat.com Mat Kowalski
              rhn-support-shishind Shivam shinde
              Ross Brattain Ross Brattain
              None
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated:
                Resolved: