OpenShift Bugs / OCPBUGS-36836

7 of 3645 SNOs fail to install with multiple operator failures


    • Moderate
    • None
    • False



      Description of problem:

      We are deploying 3500+ SNOs for the ACM 2.11 ZTP Perf/Scale test. When deploying 4.16.2, we observed that 7 SNOs failed to install, each with multiple operators failing; the output below from one of them is an example. This appears to be new in 4.16.2: with 4.16.1 we mainly observed an API Down issue, for which we opened a separate issue. The AI install log and must-gather have been collected and can be accessed here: https://drive.google.com/drive/u/0/folders/1t1CTgipWq3yeyIi2Tl_iP-KvMQFCrEL_?ths=true
      Red Hat employees should have viewer permission for the folder.
       # oc --kubeconfig /root/hv-vm/kc/vm00962/kubeconfig get clusteroperators
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.16.2    False       False         True       10h     OAuthServerDeploymentAvailable: no oauth-openshift.openshift-authentication pods available on any node....
      config-operator                            4.16.2    True        False         False      10h     
      dns                                        4.16.2    True        False         False      10h     
      etcd                                       4.16.2    True        False         False      10h     
      ingress                                    4.16.2    True        False         False      10h     
      kube-apiserver                             4.16.2    True        False         False      10h     
      kube-controller-manager                    4.16.2    True        False         True       10h     GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on [fd02::a]:53: server misbehaving
      kube-scheduler                             4.16.2    True        False         False      10h     
      kube-storage-version-migrator              4.16.2    True        False         False      10h     
      machine-approver                           4.16.2    True        False         False      10h     
      machine-config                             4.16.2    True        False         False      10h     
      monitoring                                           False       True          True       61s     UpdatingPrometheusOperator: reconciling Prometheus Operator Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator: context deadline exceeded: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
      network                                    4.16.2    True        False         False      10h     
      node-tuning                                4.16.2    True        False         False      10h     
      openshift-apiserver                        4.16.2    True        False         False      10h     
      openshift-controller-manager               4.16.2    True        False         False      10h     
      operator-lifecycle-manager                 4.16.2    True        False         False      10h     
      operator-lifecycle-manager-catalog         4.16.2    True        False         False      10h     
      operator-lifecycle-manager-packageserver   4.16.2    True        False         False      10h     
      service-ca                                 4.16.2    True        False         False      10h 
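
With this many clusters, it may help to pick out the unhealthy operators programmatically rather than reading the table by eye. The following is a minimal sketch, not part of the original report: it assumes the JSON form of the same command (`oc get clusteroperators -o json`) and the standard ClusterOperator `status.conditions` layout. The function names `unhealthy_operators` and `fetch_cluster_operators` are hypothetical helpers introduced for illustration.

```python
import json
import subprocess


def unhealthy_operators(cluster_operators):
    """Return [(operator_name, [(condition_type, message), ...]), ...]
    for operators that are not Available or are Degraded.

    `cluster_operators` is the parsed JSON of
    `oc get clusteroperators -o json` (a List with an "items" array).
    """
    results = []
    for item in cluster_operators.get("items", []):
        # Index the conditions by type, e.g. "Available", "Degraded".
        conditions = {
            c["type"]: c for c in item.get("status", {}).get("conditions", [])
        }
        problems = []
        available = conditions.get("Available", {})
        # A missing Available condition is treated as unhealthy.
        if available.get("status") != "True":
            problems.append(("Available", available.get("message", "")))
        degraded = conditions.get("Degraded", {})
        if degraded.get("status") == "True":
            problems.append(("Degraded", degraded.get("message", "")))
        if problems:
            results.append((item["metadata"]["name"], problems))
    return results


def fetch_cluster_operators(kubeconfig):
    """Hypothetical helper: run `oc` against one SNO and parse its JSON output."""
    completed = subprocess.run(
        ["oc", "--kubeconfig", kubeconfig, "get", "clusteroperators", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

For the cluster shown above, one would call `unhealthy_operators(fetch_cluster_operators("/root/hv-vm/kc/vm00962/kubeconfig"))`; looping over the kubeconfig of each failed SNO would show whether the same operators fail on all 7.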



      Version-Release number of selected component (if applicable):


      How reproducible:


      Steps to Reproduce:


      Actual results:


      Expected results:


      Additional info:


            oourfali Oved Ourfali
            rhn-support-txue Ting Xue
            Michael Burman Michael Burman
