-
Bug
-
Resolution: Won't Do
-
Minor
-
None
-
4.12
-
None
-
Quality / Stability / Reliability
-
False
Description of problem:
Multiple cluster operators are reported as degraded during installation on bare metal, specifically when m6g.metal instance types are used for the worker nodes.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-arm64-2022-11-06-054834
How reproducible:
Create a cluster using m6gd.metal for master nodes and m6g.metal for worker nodes. The installation fails and reports the errors listed under Additional info.
Steps to Reproduce:
1. 2. 3.
Actual results:
Installation fails
Expected results:
Installation succeeds
Additional info:
11-07 17:27:37.103 level=error msg=Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-774d47b77f-6pvqn" cannot be scheduled: 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 Preemption is not helpful for scheduling. Make sure you have sufficient worker nodes.)
...
11-07 17:27:37.103 level=info msg=Cluster operator insights UploadDegraded is True with NotAuthorized: Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: UHC services authentication failed
11-07 17:27:37.103 level=info
11-07 17:27:37.103 level=error msg=Cluster operator kube-controller-manager Degraded is True with GarbageCollector_Error: GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
11-07 17:27:37.104 level=info msg=Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.12.0-0.nightly-arm64-2022-11-06-054834
11-07 17:27:37.104 level=error msg=Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.12.0-0.nightly-arm64-2022-11-06-054834 because minimum worker replica count (2) not yet met: current running replicas 1, waiting for [sv-m6g-bm-trial2-ccz68-worker-us-east-2b-jn89g sv-m6g-bm-trial2-ccz68-worker-us-east-2c-c5vsb]
11-07 17:27:37.104 level=error msg=Cluster operator machine-api Available is False with Initializing: Operator is initializing
11-07 17:27:37.104 level=error msg=Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 1 unavailable replicas
11-07 17:27:37.104 level=error msg=Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 1 unavailable replicas
11-07 17:27:37.104 level=info msg=Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
11-07 17:27:37.104 level=info msg=Cluster operator network ManagementStateDegraded is False with :
11-07 17:27:37.104 level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
11-07 17:27:37.104 level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
11-07 17:27:37.104 level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
11-07 17:27:37.104 level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
11-07 17:27:37.104 level=error msg=failed to initialize the cluster: Cluster operators machine-api, monitoring are not available
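For triage, the error-level operator conditions in the installer log above can be grouped per operator. The following is only a rough sketch, not part of the installer: the function name summarize_errors and the regular expression are assumptions based on the line format seen in this log ("Cluster operator <name> <condition> is <status> with <reason>: <message>"), and the sample messages are abridged from the log.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the log in this report:
#   <timestamp> level=<level> msg=Cluster operator <name> <condition> is <True|False> with <reason>: <message>
PATTERN = re.compile(
    r"level=(?P<level>\w+) msg=Cluster operator (?P<operator>\S+) "
    r"(?P<condition>\S+) is (?P<status>True|False) with (?P<reason>[^:]*):"
)


def summarize_errors(log_lines):
    """Group error-level operator conditions by operator name.

    Returns a dict mapping operator -> list of (condition, status, reason).
    Lines that do not match the assumed shape, or are not level=error, are skipped.
    """
    summary = defaultdict(list)
    for line in log_lines:
        match = PATTERN.search(line)
        if match and match.group("level") == "error":
            summary[match.group("operator")].append(
                (match.group("condition"), match.group("status"), match.group("reason"))
            )
    return dict(summary)


# Sample lines taken from the log above (long messages abridged).
SAMPLE = [
    "11-07 17:27:37.104 level=error msg=Cluster operator machine-api Available is False with Initializing: Operator is initializing",
    "11-07 17:27:37.104 level=error msg=Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Admission Webhook Deployment failed",
    "11-07 17:27:37.104 level=info msg=Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.",
    "11-07 17:27:37.104 level=info msg=Cluster operator network ManagementStateDegraded is False with :",
]

if __name__ == "__main__":
    for operator, conditions in summarize_errors(SAMPLE).items():
        print(operator, conditions)
```

Run against the full log, this kind of summary makes it easier to see that machine-api and monitoring are the operators blocking completion, while the info-level entries are skipped.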