OpenShift Bugs / OCPBUGS-3367

Multiple cluster operators are degraded when installing on ARM Baremetal m6g.metal instance types using IPI



      Description of problem:

      Multiple cluster operators report a degraded state during installation on bare metal, specifically when m6g.metal instance types are used for the worker nodes.

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-arm64-2022-11-06-054834

      How reproducible:

      Create a cluster with m6gd.metal instances for the master nodes and m6g.metal instances for the worker nodes. The installation fails, and the installer reports the errors listed under Additional info.
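m6gd.metal and m6g.metal are AWS EC2 bare-metal instance types, and the machine names in the logs (us-east-2b, us-east-2c) point to an AWS IPI install. The sketch below builds only the machine-pool portion of an `install-config.yaml` for this topology and prints it as JSON, which YAML parsers generally accept. It assumes the installer's AWS machine-pool fields (`name`, `architecture`, `replicas`, `platform.aws.type`); the replica counts (3 masters, 3 workers) are inferred from the logs (three tainted master nodes, one running worker plus two pending worker machines), and `machine_pools` is a hypothetical helper. Required fields such as `baseDomain`, `metadata.name`, `platform.aws.region` and `pullSecret` are intentionally omitted.

```python
import json


def machine_pools(master_type="m6gd.metal", worker_type="m6g.metal",
                  masters=3, workers=3, arch="arm64"):
    """Hypothetical helper: return only the machine-pool stanzas of an install-config.

    Replica counts default to values inferred from the installer logs (assumption).
    """
    return {
        "controlPlane": {
            "name": "master",
            "architecture": arch,
            "replicas": masters,
            "platform": {"aws": {"type": master_type}},
        },
        "compute": [
            {
                "name": "worker",
                "architecture": arch,
                "replicas": workers,
                "platform": {"aws": {"type": worker_type}},
            }
        ],
    }


if __name__ == "__main__":
    # Merge this output into a complete install-config.yaml before running the installer.
    print(json.dumps(machine_pools(), indent=2))
```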


      Actual results:

      Installation fails.

      Expected results:

      Installation succeeds.

      Additional info:

      11-07 17:27:37.103  level=error msg=Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-774d47b77f-6pvqn" cannot be scheduled: 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 Preemption is not helpful for scheduling. Make sure you have sufficient worker nodes.)
      ...
      11-07 17:27:37.103  level=info msg=Cluster operator insights UploadDegraded is True with NotAuthorized: Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: UHC services authentication failed
      11-07 17:27:37.103  level=info
      11-07 17:27:37.103  level=error msg=Cluster operator kube-controller-manager Degraded is True with GarbageCollector_Error: GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
      11-07 17:27:37.104  level=info msg=Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.12.0-0.nightly-arm64-2022-11-06-054834
      11-07 17:27:37.104  level=error msg=Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.12.0-0.nightly-arm64-2022-11-06-054834 because minimum worker replica count (2) not yet met: current running replicas 1, waiting for [sv-m6g-bm-trial2-ccz68-worker-us-east-2b-jn89g sv-m6g-bm-trial2-ccz68-worker-us-east-2c-c5vsb]
      11-07 17:27:37.104  level=error msg=Cluster operator machine-api Available is False with Initializing: Operator is initializing
      11-07 17:27:37.104  level=error msg=Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 1 unavailable replicas
      11-07 17:27:37.104  level=error msg=Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 1 unavailable replicas
      11-07 17:27:37.104  level=info msg=Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
      11-07 17:27:37.104  level=info msg=Cluster operator network ManagementStateDegraded is False with : 
      11-07 17:27:37.104  level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
      11-07 17:27:37.104  level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
      11-07 17:27:37.104  level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
      11-07 17:27:37.104  level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
      11-07 17:27:37.104  level=error msg=failed to initialize the cluster: Cluster operators machine-api, monitoring are not available
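The router and machine-api errors are consistent with each other: machine-api reports only 1 running worker against a minimum of 2, with two worker machines still pending, so the cluster has four nodes (three masters, one worker). The simplified model below reproduces the scheduler's "0/4 nodes are available" result for the second `router-default` replica. It assumes the router pod selects worker nodes, tolerates no taints, and has required anti-affinity against other router pods, one of which already runs on the only worker. This is an illustration, not the real kube-scheduler logic, and the node names are hypothetical.

```python
from dataclasses import dataclass, field

MASTER_ROLE = "node-role.kubernetes.io/master"
WORKER_ROLE = "node-role.kubernetes.io/worker"


@dataclass
class Node:
    name: str
    labels: set
    taints: set = field(default_factory=set)
    router_pods: int = 0  # router replicas already running on this node


def filter_nodes(nodes):
    """Return (feasible node names, per-reason failure counts) for one new router pod.

    Simplified model (assumption): worker node selector, no tolerations,
    required anti-affinity against other router pods.
    """
    reasons = {"selector": 0, "taint": 0, "anti_affinity": 0}
    feasible = []
    for n in nodes:
        ok = True
        if WORKER_ROLE not in n.labels:
            reasons["selector"] += 1
            ok = False
        if n.taints:
            reasons["taint"] += 1
            ok = False
        if n.router_pods > 0:
            reasons["anti_affinity"] += 1
            ok = False
        if ok:
            feasible.append(n.name)
    return feasible, reasons


# Topology inferred from the logs: 3 tainted masters, 1 running worker
# that already hosts the first router replica.
nodes = [Node(f"master-{i}", {MASTER_ROLE}, {MASTER_ROLE}) for i in range(3)]
nodes.append(Node("worker-0", {WORKER_ROLE}, router_pods=1))

feasible, reasons = filter_nodes(nodes)
print(f"{len(feasible)}/{len(nodes)} nodes are available: {reasons}")
# 0/4 nodes are available: {'selector': 3, 'taint': 3, 'anti_affinity': 1}

# If one of the two pending worker machines joined, the pod would fit.
nodes.append(Node("worker-1", {WORKER_ROLE}))
print(filter_nodes(nodes)[0])  # ['worker-1']
```

The per-reason counts match the scheduler message (1 anti-affinity, 3 selector, 3 taint), which suggests the ingress failure is a downstream effect of the worker machines not joining rather than a separate router problem.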
      
      
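Once the pending worker machines are running, the installer output suggests resuming with the `wait-for install-complete` subcommand. The Python wrapper below is a sketch that assumes `openshift-install` is on `PATH` and that the original installation directory is reused; `--dir` and `--log-level` are standard installer flags, while the helper names and the directory path are hypothetical.

```python
import shlex
import subprocess


def wait_for_install_complete_cmd(install_dir, log_level="debug"):
    """Build the argv for resuming a failed IPI install (hypothetical helper)."""
    return ["openshift-install", "wait-for", "install-complete",
            "--dir", install_dir, "--log-level", log_level]


def resume_install(install_dir, log_level="debug"):
    """Run the installer; check=True raises CalledProcessError if it still fails."""
    cmd = wait_for_install_complete_cmd(install_dir, log_level)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)


# Example (hypothetical directory):
# resume_install("./install-dir")
```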

              rdossant Rafael Fonseca dos Santos
              svetsa@redhat.com Sharada Vetsa
              Alessandro Di Stefano
              Votes: 0
              Watchers: 7
