OpenShift Bugs / OCPBUGS-38662

clusteroperator/kube-controller-manager blips Degraded=True during upgrade test


    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.18
    • Quality / Stability / Reliability

      Description of problem:

          In an effort to ensure that HA components are not degraded by design during normal e2e test runs or upgrades, we are collecting all operators that blip Degraded=True during any payload job run.
      
      This card captures the kube-controller-manager operator, which blips Degraded=True during upgrade runs.
      
      Example Job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade/1843275894844559360
      
      Reasons associated with the blip: NodeController_MasterNodesReady and NodeController_MasterNodesReady::StaticPods_Error 
      
      For now, we have put an exception in the test, but teams are expected to take action to fix these blips and remove the exceptions after the fixes go in.
      
      See the linked issue for more explanation of the effort.
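
      For reference, the condition that blips can be read straight off the ClusterOperator resource. The following is a minimal sketch (not part of the test suite) that prints the current Degraded condition and its reason for kube-controller-manager; it assumes the openshift/client-go config clientset and a KUBECONFIG environment variable pointing at the cluster:

      package main

      import (
          "context"
          "fmt"
          "os"

          configv1 "github.com/openshift/api/config/v1"
          configclient "github.com/openshift/client-go/config/clientset/versioned"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/tools/clientcmd"
      )

      func main() {
          // Build a REST config from the kubeconfig named by $KUBECONFIG (assumed for illustration).
          cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
          if err != nil {
              panic(err)
          }
          client := configclient.NewForConfigOrDie(cfg)

          co, err := client.ConfigV1().ClusterOperators().Get(context.TODO(), "kube-controller-manager", metav1.GetOptions{})
          if err != nil {
              panic(err)
          }

          // A "blip" is this condition briefly reporting True with a reason such as
          // NodeController_MasterNodesReady or StaticPods_Error.
          for _, cond := range co.Status.Conditions {
              if cond.Type == configv1.OperatorDegraded {
                  fmt.Printf("Degraded=%s reason=%s since=%s\n%s\n",
                      cond.Status, cond.Reason, cond.LastTransitionTime.Time, cond.Message)
              }
          }
      }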

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

      Found a new reason, job example:

      [Monitor:legacy-cvo-invariants][bz-kube-controller-manager] clusteroperator/kube-controller-manager should not change condition/Degraded (2h17m38s)
      {  2 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:
      
      Oct 20 16:02:29.174 E clusteroperator/kube-controller-manager condition/Degraded reason/StaticPods_Error status/True StaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "cluster-policy-controller" is waiting: ContainerCreating: \nStaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "kube-controller-manager" is waiting: ContainerCreating: \nStaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "kube-controller-manager-cert-syncer" is waiting: ContainerCreating: \nStaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "kube-controller-manager-recovery-controller" is waiting: ContainerCreating:
      Oct 20 16:02:29.174 - 93s   E clusteroperator/kube-controller-manager condition/Degraded reason/StaticPods_Error status/True StaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "cluster-policy-controller" is waiting: ContainerCreating: \nStaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "kube-controller-manager" is waiting: ContainerCreating: \nStaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "kube-controller-manager-cert-syncer" is waiting: ContainerCreating: \nStaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "kube-controller-manager-recovery-controller" is waiting: ContainerCreating:
      
      10 unwelcome but acceptable clusteroperator state transitions during e2e test run.  These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:
      
      Oct 20 16:04:02.860 W clusteroperator/kube-controller-manager condition/Degraded reason/AsExpected status/False StaticPodsDegraded: pod/kube-controller-manager-ip-10-0-106-117.us-east-2.compute.internal container "kube-controller-manager" started at 2025-10-20 15:59:24 +0000 UTC is still not ready\nNodeControllerDegraded: All master nodes are ready (exception: Degraded=False is the happy case)
      Oct 20 16:40:25.962 E clusteroperator/kube-controller-manager condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-106-117.us-east-2.compute.internal" not ready since 2025-10-20 16:40:04 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38662)
      Oct 20 16:40:25.962 - 4s    E clusteroperator/kube-controller-manager condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-106-117.us-east-2.compute.internal" not ready since 2025-10-20 16:40:04 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38662)
      Oct 20 16:40:30.645 W clusteroperator/kube-controller-manager condition/Degraded reason/AsExpected status/False NodeControllerDegraded: All master nodes are ready (exception: Degraded=False is the happy case)
      Oct 20 16:47:52.366 E clusteroperator/kube-controller-manager condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-81-174.us-east-2.compute.internal" not ready since 2025-10-20 16:47:31 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38662)
      Oct 20 16:47:52.366 - 5s    E clusteroperator/kube-controller-manager condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-81-174.us-east-2.compute.internal" not ready since 2025-10-20 16:47:31 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38662)
      Oct 20 16:47:57.539 W clusteroperator/kube-controller-manager condition/Degraded reason/AsExpected status/False NodeControllerDegraded: All master nodes are ready (exception: Degraded=False is the happy case)
      Oct 20 17:11:11.914 E clusteroperator/kube-controller-manager condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-36-166.us-east-2.compute.internal" not ready since 2025-10-20 17:10:56 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38662)
      Oct 20 17:11:11.914 - 10s   E clusteroperator/kube-controller-manager condition/Degraded reason/NodeController_MasterNodesReady status/True NodeControllerDegraded: The master nodes not ready: node "ip-10-0-36-166.us-east-2.compute.internal" not ready since 2025-10-20 17:10:56 +0000 UTC because KubeletNotReady (container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?) (exception: https://issues.redhat.com/browse/OCPBUGS-38662)
      Oct 20 17:11:22.264 W clusteroperator/kube-controller-manager condition/Degraded reason/AsExpected status/False NodeControllerDegraded: All master nodes are ready (exception: Degraded=False is the happy case)
      }
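
      The exception referenced in the entries above is keyed to this bug and to the Degraded reasons listed in the description. The following is only a hypothetical illustration of what that exception amounts to, not the actual matcher in the openshift/origin monitor tests:

      package exceptions

      // conditionBlip is a simplified view of one clusteroperator state transition
      // as reported by the monitor (the names here are illustrative, not the real types).
      type conditionBlip struct {
          Operator  string // e.g. "kube-controller-manager"
          Condition string // e.g. "Degraded"
          Status    string // e.g. "True"
          Reason    string // e.g. "NodeController_MasterNodesReady"
      }

      // allowedUntilFixed reports whether a blip is tolerated under the
      // OCPBUGS-38662 exception instead of failing the test-case.
      func allowedUntilFixed(b conditionBlip) bool {
          if b.Operator != "kube-controller-manager" || b.Condition != "Degraded" || b.Status != "True" {
              return false
          }
          switch b.Reason {
          case "NodeController_MasterNodesReady",
              "NodeController_MasterNodesReady::StaticPods_Error":
              return true
          }
          return false
      }

      The StaticPods_Error transitions quoted at the top of this section fall outside that reason set, which is why they are reported as unexpected rather than covered by the exception.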

              aos-workloads-staff Workloads Team Bot Account
              kenzhang@redhat.com Ken Zhang
              Votes: 0
              Watchers: 6
