OpenShift Bugs / OCPBUGS-49722

NMState NNCP remains in Progressing state


    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

          The NMState NNCP remains in the Progressing/ConfigurationProgressing state even though the NNCE for the node is Available/SuccessfullyConfigured (the policy uses a single-node selector). As a result, all CI/CD pipelines fail because the NNCP never becomes Available.
      
      NNCP:
      $ oc get nncp | grep -i 3003-worker-3
      vlan-infra-10g-3003-worker-3-policy     Progressing   ConfigurationProgressing
      
      NNCE:
      $ oc get nnce | grep -i 3003-worker-3
      worker-3.vlan-infra-10g-3003-worker-3-policy     Available   2025-01-28T13:19:48Z   SuccessfullyConfigured
      
      nmstate-handler pods are showing the following errors:
      ---
      2025-01-28T13:19:48.615967539Z {"level":"info","ts":"2025-01-28T13:19:48.615Z","logger":"controllers.NodeNetworkConfigurationPolicy.forceNNSRefresh","msg":"forcing NodeNetworkState refresh after NNCP applied","node":"worker-3"}
      
      2025-01-28T13:19:48.929845273Z {"level":"error","ts":"2025-01-28T13:19:48.929Z","logger":"controllers.NodeNetworkConfigurationPolicy","msg":"error decrementing unavailableNodeCount with non-cached client","error":"Internal error occurred: failed calling webhook \"nodenetworkconfigurationpolicies-status-mutate.nmstate.io\": failed to call webhook: Post \"https://nmstate-webhook.openshift-nmstate.svc:443/nodenetworkconfigurationpolicies-status-mutate?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"nmstate\")","stacktrace":"github.com/nmstate/kubernetes-nmstate/controllers/handler.
      ...}
      2025-01-28T13:19:48.979492103Z {"level":"info","ts":"2025-01-28T13:19:48.979Z","logger":"policyconditions","msg":"numberOfNmstateMatchingNodes: 1, enactments count: {failed: {true: 0, false: 1, unknown: 0}, progressing: {true: 0, false: 1, unknown: 0}, pending: {true: 0, false: 1, unknown: 0}, available: {true: 1, false: 0, unknown: 0}, aborted: {true: 0, false: 1, unknown: 0}}","policy":"vlan-infra-10g-3003-worker-3-policy"}
      2025-01-28T13:19:48.979492103Z {"level":"info","ts":"2025-01-28T13:19:48.979Z","logger":"policyconditions","msg":"SetPolicySuccess"}
      ---
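
      To make the webhook failure easier to spot in a long handler log, the error entries can be filtered out with standard text tools. This is only a sketch: the function names `webhook_errors` and `webhook_names` are hypothetical, and it assumes the raw, unwrapped log format shown above (one JSON entry per line, with escaped quotes around the webhook name).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of nmstate or oc.

# webhook_errors: read nmstate-handler log lines on stdin and print only the
# error-level entries that mention a failed webhook call.
webhook_errors() {
    grep -F '"level":"error"' | grep -F 'failed calling webhook'
}

# webhook_names: print the distinct webhook names found in those entries.
# The name appears in the log as \"<name>\" (backslash-escaped quotes).
webhook_names() {
    webhook_errors \
        | sed -n 's/.*failed calling webhook \\"\([^\\]*\)\\".*/\1/p' \
        | sort -u
}

# Example against a live cluster (pod name is a placeholder):
#   oc logs -n openshift-nmstate <nmstate-handler-pod> | webhook_names
```

      For the log excerpt above, this would report the `nodenetworkconfigurationpolicies-status-mutate.nmstate.io` webhook, which is the call that fails certificate verification.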
      
      The partner is aware of solution 7101331 (deleting the nmstate pods), but that is not a viable workaround for CI/CD.
      
      Is there anything that can be done to make the NNCP succeed?

      Version-Release number of selected component (if applicable):

          4.16.25

      How reproducible:

          True

      Steps to Reproduce:

          1. Apply the NNCP.
          2. Check that the NNCE is Available with reason SuccessfullyConfigured.
          3. Check the NNCP again and see that it is still Progressing with reason ConfigurationProgressing.
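
          The check in steps 2 and 3 could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and it relies on the column layout seen in the `oc get nncp` and `oc get nnce` output above (name in the first column, status in the second) and on the `<node>.<policy>` naming of the enactment.

```shell
#!/bin/sh
# Sketch only: flags the case where the enactment status and the policy
# status disagree. Function names are hypothetical.

# status_of NAME: read `oc get` table output on stdin and print the second
# column (status) of the row whose first column equals NAME.
status_of() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# compare_policy_and_enactment POLICY NODE NNCP_OUTPUT NNCE_OUTPUT
# Prints "consistent: <status>" and returns 0 when both statuses match,
# otherwise prints "mismatch: ..." and returns 1.
compare_policy_and_enactment() {
    policy=$1
    node=$2
    nncp_status=$(printf '%s\n' "$3" | status_of "$policy")
    nnce_status=$(printf '%s\n' "$4" | status_of "$node.$policy")
    if [ "$nncp_status" = "$nnce_status" ]; then
        echo "consistent: $nncp_status"
    else
        echo "mismatch: NNCP=$nncp_status NNCE=$nnce_status"
        return 1
    fi
}

# Example against a live cluster (requires oc and cluster access):
#   compare_policy_and_enactment vlan-infra-10g-3003-worker-3-policy worker-3 \
#       "$(oc get nncp)" "$(oc get nnce)"
```

          With the output quoted in the description, this would report a mismatch between Progressing (NNCP) and Available (NNCE).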

      Actual results:

          The NNCP remains stuck in Progressing status.

      Expected results:

          NNCP policy with status Available
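
          A pipeline step waiting for the expected state might look like the bounded polling loop below. It is a minimal sketch, not the partner's actual CI/CD code: `wait_for_available` is a hypothetical helper, and the status command is passed in as a string so the loop itself does not depend on a cluster.

```shell
#!/bin/sh
# Sketch only: bounded polling for an "Available" status.

# wait_for_available CMD ATTEMPTS DELAY
# Runs CMD (a shell command string that prints the current status) up to
# ATTEMPTS times, sleeping DELAY seconds between tries. Returns 0 as soon
# as the status is "Available", otherwise 1 after the last attempt.
wait_for_available() {
    cmd=$1
    attempts=$2
    delay=$3
    status=
    i=1
    while [ "$i" -le "$attempts" ]; do
        status=$(eval "$cmd")
        if [ "$status" = "Available" ]; then
            echo "Available after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out; last status: $status"
    return 1
}

# Example against a live cluster (30 tries, 10 seconds apart):
#   wait_for_available \
#       "oc get nncp vlan-infra-10g-3003-worker-3-policy --no-headers | awk '{print \$2}'" \
#       30 10
```

          In the situation described here, such a loop would time out with a last status of Progressing, which matches the reported pipeline failures.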

      Additional info:

          

              mkowalsk@redhat.com Mat Kowalski
              rhn-support-igarcia Ivan Garcia
              Ross Brattain Ross Brattain