Bug
Resolution: Duplicate
Critical
None
4.12
None
Proposed
False
Description of problem:
IPI clusters with OVNKubernetes on AWS using m6g[d]?.metal machines fail to install. In particular:
- The console operator is not available due to "missing replicas": its pods are in CrashLoopBackOff because they are unable to reach the oauth routes.
- The canary-routes pods are up, but they are unreachable via their route.
- The router-default pods seem to be up.
- The Classic LB on AWS is bound only to the master instances, and these instances are reported as unhealthy. The HTTP health check fails in the AWS console; it does not fail when the check is changed from HTTP to TCP (SYN/ACK).
- However, curl requests to the health check URI from any cluster node to any other node report:
  curl 10.0.215.132:32407/healthz
  { "service": { "namespace": "openshift-ingress", "name": "router-default" }, "localEndpoints": 0|1 (based on the node) }
- In a test, deleting the ingress controller and letting it be re-created allowed the reconciliation to complete successfully and the installation to finish:
  oc -n openshift-ingress-operator delete ingresscontroller/default
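To make the health check observation easier to reason about, here is a minimal Python sketch that parses the `/healthz` JSON body shown in the curl output and derives the status a load balancer HTTP check would be expected to see. It assumes the conventional behaviour of Kubernetes service health check node ports for `externalTrafficPolicy: Local` services (HTTP 200 when the node has at least one local endpoint, 503 otherwise); that mapping and the helper names `parse_healthz`, `node_is_healthy` and `expected_http_status` are hypothetical illustrations, not taken from OVN-Kubernetes or the router code.

```python
import json
from dataclasses import dataclass


@dataclass
class HealthzReport:
    """Parsed body of a <node-ip>:<healthCheckNodePort>/healthz response."""
    namespace: str
    name: str
    local_endpoints: int


def parse_healthz(body: str) -> HealthzReport:
    """Parse the JSON shape seen in the curl output of this report."""
    data = json.loads(body)
    svc = data["service"]
    return HealthzReport(
        namespace=svc["namespace"],
        name=svc["name"],
        local_endpoints=int(data["localEndpoints"]),
    )


def node_is_healthy(report: HealthzReport) -> bool:
    # Assumption: with externalTrafficPolicy: Local, a node should only
    # receive LB traffic if it hosts at least one endpoint of the service.
    return report.local_endpoints > 0


def expected_http_status(report: HealthzReport) -> int:
    # Assumption: 200 when local endpoints exist, 503 otherwise.
    return 200 if node_is_healthy(report) else 503


if __name__ == "__main__":
    # Body shaped like the curl output, for a node hosting a router-default pod.
    sample = ('{"service": {"namespace": "openshift-ingress", '
              '"name": "router-default"}, "localEndpoints": 1}')
    report = parse_healthz(sample)
    print(report.name, expected_http_status(report))  # router-default 200
```

Under this assumption, a node answering with `localEndpoints: 1` would be expected to pass an HTTP health check and a node answering `0` would be expected to fail it.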
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-arm64-2022-10-18-153953
How reproducible:
always
Steps to Reproduce:
1. Install an OVNKubernetes IPI cluster on AWS on m6g[d].metal nodes.
Actual results:
The installation fails:
oc get co
NAME                                       VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-arm64-2022-10-18-153953   False       False         True       9h      OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.adistefa-1020f.qe.devcluster.openshift.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
baremetal                                  4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
cloud-controller-manager                   4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
cloud-credential                           4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
cluster-autoscaler                         4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
config-operator                            4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
console                                    4.12.0-0.nightly-arm64-2022-10-18-153953   False       True          False      9h      DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
csi-snapshot-controller                    4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
dns                                        4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
etcd                                       4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
image-registry                             4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      8h
ingress                                    4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         True       8h      The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
insights                                   4.12.0-0.nightly-arm64-2022-10-18-153953   False       False         True       56m     Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: UHC services authentication failed...
kube-apiserver                             4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
kube-controller-manager                    4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
kube-scheduler                             4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
kube-storage-version-migrator              4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
machine-api                                4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      8h
machine-approver                           4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
machine-config                             4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
marketplace                                4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
monitoring                                 4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      8h
network                                    4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
node-tuning                                4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
openshift-apiserver                        4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
openshift-controller-manager               4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
openshift-samples                          4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
operator-lifecycle-manager                 4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
operator-lifecycle-manager-catalog         4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
operator-lifecycle-manager-packageserver   4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
service-ca                                 4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
storage                                    4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
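When comparing runs, it can help to reduce `oc get co` output like the table above to the operators that actually need attention. The Python sketch below is one way to do that, assuming the first six columns (NAME through SINCE) never contain spaces so that everything after them can be treated as the optional MESSAGE; the names `parse_oc_get_co` and `unhealthy` are hypothetical helpers, not part of `oc`.

```python
from typing import NamedTuple


class OperatorStatus(NamedTuple):
    name: str
    version: str
    available: bool
    progressing: bool
    degraded: bool
    since: str
    message: str


def parse_oc_get_co(output: str) -> list[OperatorStatus]:
    """Parse the whitespace-aligned table printed by `oc get co`."""
    rows = []
    for line in output.splitlines():
        if not line.strip() or line.startswith("NAME"):
            continue  # skip blank lines and the header row
        # Split off the six fixed columns; the rest (if any) is MESSAGE.
        parts = line.split(maxsplit=6)
        name, version, avail, prog, degr, since = parts[:6]
        message = parts[6] if len(parts) > 6 else ""
        rows.append(OperatorStatus(name, version, avail == "True",
                                   prog == "True", degr == "True",
                                   since, message))
    return rows


def unhealthy(rows: list[OperatorStatus]) -> list[str]:
    """Names of operators that are not Available or are Degraded."""
    return [r.name for r in rows if not r.available or r.degraded]


if __name__ == "__main__":
    sample = "\n".join([
        "NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE",
        "baremetal 4.12.0-0.nightly-arm64-2022-10-18-153953 True False False 9h",
        "console 4.12.0-0.nightly-arm64-2022-10-18-153953 False True False 9h "
        "DeploymentAvailable: 0 replicas available for console deployment",
        "ingress 4.12.0-0.nightly-arm64-2022-10-18-153953 True False True 8h "
        "CanaryChecksSucceeding=False",
    ])
    print(unhealthy(parse_oc_get_co(sample)))  # ['console', 'ingress']
```

Applied to the full table above, this would flag authentication, console, ingress and insights.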
Expected results:
The installation succeeds.
Additional info:
- It works when the VM type is not a metal machine.
- It works when using OpenShiftSDN.
- It works when using m5.metal machines and x86 nightlies (4.12.0-0.nightly-2022-10-18-192348).
- must-gather: https://drive.google.com/file/d/1S-q6PPomreWAzfFZY8r5brWuq6Ea-wJn/view?usp=sharing
- install-config.yaml:

---
apiVersion: v1
controlPlane:
  architecture: arm64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m6gd.metal
  replicas: 3
compute:
- architecture: arm64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m6gd.metal
  replicas: 3
metadata:
  name: adistefa-1020f
platform:
  aws:
    region: us-east-2
pullSecret: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
publish: External
baseDomain: qe.devcluster.openshift.com
sshKey: -
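The working and failing combinations listed above can be summarised as a small predicate over an install-config. The Python sketch below only encodes what this report observed (OVNKubernetes plus an m6g.metal or m6gd.metal pool fails; OpenShiftSDN, non-metal types and m5.metal work) and says nothing about the root cause. It takes the config as an already-parsed dict to stay within the standard library; `machine_types`, `matches_reported_failure` and `AFFECTED_TYPE` are hypothetical names.

```python
import re
from typing import Any

# Instance types reported as failing in this bug: m6g.metal and m6gd.metal.
AFFECTED_TYPE = re.compile(r"^m6gd?\.metal$")


def machine_types(cfg: dict[str, Any]) -> list[str]:
    """Collect the AWS instance types of the controlPlane and compute pools."""
    pools = [cfg.get("controlPlane", {})] + list(cfg.get("compute", []))
    types = []
    for pool in pools:
        t = pool.get("platform", {}).get("aws", {}).get("type")
        if t:
            types.append(t)
    return types


def matches_reported_failure(cfg: dict[str, Any]) -> bool:
    """True if the config combines OVNKubernetes with an m6g[d].metal pool."""
    network = cfg.get("networking", {}).get("networkType")
    return network == "OVNKubernetes" and any(
        AFFECTED_TYPE.match(t) for t in machine_types(cfg)
    )


if __name__ == "__main__":
    # Subset of the install-config.yaml above, as a parsed dict.
    cfg = {
        "controlPlane": {"platform": {"aws": {"type": "m6gd.metal"}}},
        "compute": [{"platform": {"aws": {"type": "m6gd.metal"}}}],
        "networking": {"networkType": "OVNKubernetes"},
    }
    print(matches_reported_failure(cfg))  # True
```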