OpenShift Bugs / OCPBUGS-12797

4.13 MNO: RC4 Cluster operator kube-controller-manager is degraded: GuardControllerDegraded: Missing operand on node master-0...

      Description of problem:

      4.13 MNO: RC4 Cluster operator kube-controller-manager is degraded: GuardControllerDegraded: Missing operand on node master-0...

      Version-Release number of selected component (if applicable):

      4.13.0-rc.4

      How reproducible:

      Tried twice. The first attempt hit this failure; the second installed successfully.

      Steps to Reproduce:

      1. Install 4.13.0-rc.4 on an MNO (3-node compact cluster) bare-metal environment using the Agent-based Installer.
      
      2. Create the ISO image from the following directory:
      
      tree kni-qe-31-dualstack.bak/
      kni-qe-31-dualstack.bak/
      ├── agent-config.yaml
      ├── install-config.yaml
      └── openshift
          ├── 00-clean-spare-disk.yaml
          ├── 00-disable-operatorhub.yaml
          ├── 00-kni-lso-catsrc.yaml
          ├── 00-kni-sriov-catsrc.yaml
          ├── 00-redhat-operators-catsrc.yaml
          ├── 98-master-etc-block-connectivity-service.yaml
          ├── 98-master-etc-chrony-conf.yaml
          ├── 98-worker-etc-chrony-conf.yaml
          ├── 99-masters-disable-crio-wipe.yaml
          ├── 99-workers-disable-crio-wipe.yaml
          ├── admin-user-oauth.yaml
          ├── admin-user-secret.yaml
          ├── elasticsearch-namespace.yaml
          ├── elasticsearch-operatorgroup.yaml
          ├── elasticsearch-subscription.yaml
          ├── load-kernel-modules-master.yaml
          ├── load-kernel-modules-worker.yaml
          ├── localstorage-namespace.yaml
          ├── localstorage-operatorgroup.yaml
          ├── localstorage-subscription.yaml
          ├── logging-namespace.yaml
          ├── logging-operatorgroup.yaml
          ├── logging-subscription.yaml
          ├── odf-namespace.yaml
          ├── odf-operatorgroup.yaml
          ├── odf-subscription.yaml
          ├── sriov-namespace.yaml
          ├── sriov-operatorgroup.yaml
          └── sriov-subscription.yaml
      
      INFO cluster bootstrap is complete
      DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
      DEBUG * Cluster operators authentication, image-registry, ingress, insights, kube-apiserver, machine-api, monitoring, openshift-apiserver, openshift-samples, operator-lifecycle-manager-packageserver are not available
      DEBUG * Could not update imagestream "openshift/driver-toolkit" (582 of 841): the server is down or not responding
      DEBUG * Could not update oauthclient "console" (525 of 841): the server does not recognize this resource, check extension API servers
      DEBUG * Could not update role "openshift-console-operator/prometheus-k8s" (758 of 841): resource may have been deleted
      DEBUG * Could not update role "openshift-console/prometheus-k8s" (761 of 841): resource may have been deleted
      DEBUG Still waiting for the cluster to initialize: Working towards 4.13.0-rc.4
      DEBUG Still waiting for the cluster to initialize: Working towards 4.13.0-rc.4: 576 of 841 done (68% complete)
      DEBUG Route found in openshift-console namespace: console
      DEBUG OpenShift console route is admitted
      DEBUG Still waiting for the cluster to initialize: Cluster operators authentication, console, monitoring are not available
      DEBUG Still waiting for the cluster to initialize: Cluster operators authentication, monitoring are not available
      DEBUG Still waiting for the cluster to initialize: Cluster operator authentication is not available
      DEBUG Still waiting for the cluster to initialize: Cluster operator kube-controller-manager is degraded
      DEBUG Still waiting for the cluster to initialize: Cluster operator kube-controller-manager is degraded 
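For triage, the installer log above can be reduced to the last operator state that blocked initialization. The following is a minimal sketch, assuming the log lines keep the "Cluster operator(s) ... is/are not available|degraded" wording shown above; the function name `last_blockers` and the `SAMPLE_LOG` excerpt are illustrative and not part of any installer tooling.

```python
import re
from typing import List, Optional, Tuple

# Matches the "Cluster operator(s) <names> is/are <state>" phrasing seen in the log.
BLOCKER_RE = re.compile(
    r"Cluster operators? (.+?) (?:is|are) (not available|degraded)\s*$"
)

def last_blockers(log_text: str) -> Optional[Tuple[List[str], str]]:
    """Return (operator names, state) from the last matching log line, if any."""
    result = None
    for line in log_text.splitlines():
        match = BLOCKER_RE.search(line)
        if match:
            names = [name.strip() for name in match.group(1).split(",")]
            result = (names, match.group(2))
    return result

# Excerpt copied from the installer output in this report.
SAMPLE_LOG = """\
DEBUG Still waiting for the cluster to initialize: Cluster operators authentication, monitoring are not available
DEBUG Still waiting for the cluster to initialize: Cluster operator authentication is not available
DEBUG Still waiting for the cluster to initialize: Cluster operator kube-controller-manager is degraded
"""

if __name__ == "__main__":
    print(last_blockers(SAMPLE_LOG))
```

On the excerpt, the last matching line is the kube-controller-manager one, which agrees with the failure reported here.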

      Actual results:

      Must-Gather
      http://10.1.101.1/4.13/must-gather/kube-controller-manager-degraded.tar.gz
      
      
      
      Reprinting Cluster State:
      When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
      ClusterID: 99c1a3fd-fc05-49fc-9d98-630f028c79ba
      ClusterVersion: Installing "4.13.0-rc.4" for About an hour: Error while reconciling 4.13.0-rc.4: the cluster operator kube-controller-manager is degraded
      ClusterOperators:
              clusteroperator/kube-controller-manager is degraded because GuardControllerDegraded: Missing operand on node master-0
      MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-controller-manager" in namespace: "openshift-kube-controller-manager" for revision: 7 on node: "master-1" didn't show up, waited: 3m0s
      StaticPodsDegraded: pod/kube-controller-manager-master-1 container "cluster-policy-controller" is terminated: Completed:
      StaticPodsDegraded: pod/kube-controller-manager-master-1 container "kube-controller-manager" is terminated: Completed:
      StaticPodsDegraded: pod/kube-controller-manager-master-1 container "kube-controller-manager-cert-syncer" is terminated: Error: st *v1.ConfigMap: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-controller-manager/configmaps?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
      StaticPodsDegraded: W0426 17:42:12.106089       1 reflector.go:424] k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: failed to list *v1.Secret: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-controller-manager/secrets?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
      StaticPodsDegraded: E0426 17:42:12.106113       1 reflector.go:140] k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-controller-manager/secrets?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
      StaticPodsDegraded: W0426 17:42:44.012955       1 reflector.go:424] k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-controller-manager/configmaps?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
      StaticPodsDegraded: E0426 17:42:44.012985       1 reflector.go:140] k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-controller-manager/configmaps?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
      StaticPodsDegraded: W0426 17:43:09.168250       1 reflector.go:424] k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: failed to list *v1.Secret: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-controller-manager/secrets?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
      StaticPodsDegraded: E0426 17:43:09.168282       1 reflector.go:140] k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://localhost:6443/api/v1/namespaces/openshift-kube-controller-manager/secrets?limit=500&resourceVersion=0": x509: certificate signed by unknown authority
      StaticPodsDegraded:
      StaticPodsDegraded: pod/kube-controller-manager-master-1 container "kube-controller-manager-recovery-controller" is terminated: Completed:
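The degraded message above mixes several condition types with repeated reflector errors. A rough way to see what dominates is to count lines per "...Degraded:" condition and count the x509 failures. This is only a sketch: `summarize_degraded` is a hypothetical helper, and `SAMPLE_MESSAGE` is an abridged copy of the output above (the `...` marks text cut for brevity).

```python
import re
from collections import Counter
from typing import Tuple

CONDITION_RE = re.compile(r"(\w+Degraded):")
X509_ERROR = "x509: certificate signed by unknown authority"

def summarize_degraded(message: str) -> Tuple[Counter, int]:
    """Count lines per Degraded condition and lines containing the x509 error."""
    conditions = Counter()
    x509_lines = 0
    for line in message.splitlines():
        match = CONDITION_RE.search(line)
        if match:
            conditions[match.group(1)] += 1
        if X509_ERROR in line:
            x509_lines += 1
    return conditions, x509_lines

# Abridged from the cluster state summary in this report.
SAMPLE_MESSAGE = """\
clusteroperator/kube-controller-manager is degraded because GuardControllerDegraded: Missing operand on node master-0
MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-controller-manager" in namespace: "openshift-kube-controller-manager" for revision: 7 on node: "master-1" didn't show up, waited: 3m0s
StaticPodsDegraded: pod/kube-controller-manager-master-1 container "kube-controller-manager" is terminated: Completed:
StaticPodsDegraded: W0426 17:42:12.106089       1 reflector.go:424] ... x509: certificate signed by unknown authority
StaticPodsDegraded:
"""

if __name__ == "__main__":
    counts, x509 = summarize_degraded(SAMPLE_MESSAGE)
    for name, count in counts.most_common():
        print(name, count)
    print("x509 error lines:", x509)
```

Run against the full message, a summary like this makes it easier to see that the cert-syncer container on master-1 is repeatedly failing TLS verification against https://localhost:6443.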

      Expected results:

      Successful install

      Additional info:

      oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       False         18m     Error while reconciling 4.13.0-rc.4: the cluster operator kube-controller-manager is degraded
      [kni@registry.kni-qe-31 ocp-edge-qe-venv]$ oc get co
      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-rc.4   True        False         False      23m
      baremetal                                  4.13.0-rc.4   True        False         False      43m
      cloud-controller-manager                   4.13.0-rc.4   True        False         False      58m
      cloud-credential                           4.13.0-rc.4   True        False         False      68m
      cluster-autoscaler                         4.13.0-rc.4   True        False         False      43m
      config-operator                            4.13.0-rc.4   True        False         False      44m
      console                                    4.13.0-rc.4   True        False         False      33m
      control-plane-machine-set                  4.13.0-rc.4   True        False         False      44m
      csi-snapshot-controller                    4.13.0-rc.4   True        False         False      44m
      dns                                        4.13.0-rc.4   True        False         False      43m
      etcd                                       4.13.0-rc.4   True        False         False      42m
      image-registry                             4.13.0-rc.4   True        False         False      34m
      ingress                                    4.13.0-rc.4   True        False         False      37m
      insights                                   4.13.0-rc.4   True        False         False      37m
      kube-apiserver                             4.13.0-rc.4   True        False         False      39m
      kube-controller-manager                    4.13.0-rc.4   True        True          True       41m     GuardControllerDegraded: Missing operand on node master-0...
      kube-scheduler                             4.13.0-rc.4   True        False         False      40m
      kube-storage-version-migrator              4.13.0-rc.4   True        False         False      44m
      machine-api                                4.13.0-rc.4   True        False         False      38m
      machine-approver                           4.13.0-rc.4   True        False         False      43m
      machine-config                             4.13.0-rc.4   True        False         False      43m
      marketplace                                4.13.0-rc.4   True        False         False      43m
      monitoring                                 4.13.0-rc.4   True        False         False      32m
      network                                    4.13.0-rc.4   True        False         False      43m
      node-tuning                                4.13.0-rc.4   True        False         False      43m
      openshift-apiserver                        4.13.0-rc.4   True        False         False      37m
      openshift-controller-manager               4.13.0-rc.4   True        False         False      40m
      openshift-samples                          4.13.0-rc.4   True        False         False      36m
      operator-lifecycle-manager                 4.13.0-rc.4   True        False         False      43m
      operator-lifecycle-manager-catalog         4.13.0-rc.4   True        False         False      43m
      operator-lifecycle-manager-packageserver   4.13.0-rc.4   True        False         False      36m
      service-ca                                 4.13.0-rc.4   True        False         False      44m
      storage                                    4.13.0-rc.4   True        False         False      44m
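With this many operators, the one unhealthy row is easy to miss. The sketch below filters saved `oc get co` text output for operators that are not Available=True, Progressing=False, Degraded=False. It assumes every row has a non-empty VERSION column, as in the table above; `unhealthy_operators` and `SAMPLE_TABLE` are illustrative names, and the sample rows are copied from this report.

```python
from typing import List, Tuple

def unhealthy_operators(table: str) -> List[Tuple[str, str]]:
    """Return (name, message) for rows whose status is not True/False/False."""
    rows = [line for line in table.splitlines() if line.strip()]
    results = []
    for line in rows[1:]:  # skip the header row
        # NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE [MESSAGE...]
        fields = line.strip().split(None, 6)
        if len(fields) < 6:
            continue  # not a well-formed operator row
        name, _version, available, progressing, degraded = fields[:5]
        message = fields[6] if len(fields) > 6 else ""
        if (available, progressing, degraded) != ("True", "False", "False"):
            results.append((name, message))
    return results

SAMPLE_TABLE = """\
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver                             4.13.0-rc.4   True        False         False      39m
kube-controller-manager                    4.13.0-rc.4   True        True          True       41m     GuardControllerDegraded: Missing operand on node master-0...
kube-scheduler                             4.13.0-rc.4   True        False         False      40m
"""

if __name__ == "__main__":
    for name, message in unhealthy_operators(SAMPLE_TABLE):
        print(name, "->", message)
```

Whitespace splitting is used instead of fixed column offsets because column widths vary with the longest operator name; the trade-off is that a row with an empty VERSION (as in the `oc get clusterversion` output) would be misparsed.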

              rphillip@redhat.com Ryan Phillips
              mlammon@redhat.com Mike Lammon
              ying zhou ying zhou