-
Bug
-
Resolution: Duplicate
-
Critical
-
4.12
-
Quality / Stability / Reliability
-
False
-
Proposed
Description of problem:
IPI clusters with OVNKubernetes on AWS with m6g[d]?.metal machines fail to install.
In particular:
- the console operator is not available due to "missing replicas" (its pods are in CrashLoopBackOff because they are unable to reach the OAuth routes);
- the canary-routes pods are up, but they are unreachable via their route;
- the router-default pods seem up;
- the Classic LB on AWS is only bound to the master instances, and the instances are reported as unhealthy. The HTTP health check is shown as failing in the AWS console; it does not fail when switched from HTTP to a TCP SYN/ACK check;
- however, curl requests to the health check URI from any cluster node to any other node return:
curl 10.0.215.132:32407/healthz
{ "service": { "namespace": "openshift-ingress", "name": "router-default" }, "localEndpoints": 0|1 (based on the node) }
- in a test, deleting the ingress controller and letting it be re-created allowed the reconciliation to complete successfully and the installation to finish:
oc -n openshift-ingress-operator delete ingresscontroller/default
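The per-node health check described above could be repeated across nodes with a small helper like the one below. This is a minimal sketch, not the procedure actually used in this report: the function names local_endpoints and probe_node are hypothetical, port 32407 is taken from the curl example above, and the commented loop assumes cluster-admin access through oc.

```shell
# Extract the localEndpoints count from a healthz response body read on stdin.
# (Hypothetical helper; assumes the JSON shape shown in the curl output above.)
local_endpoints() {
  tr -d ' \n' | sed -n 's/.*"localEndpoints":\([0-9][0-9]*\).*/\1/p'
}

# Probe one node's health check port; prints "<ip> <count>" or "<ip> unreachable".
probe_node() {
  ip="$1"
  port="$2"
  if body=$(curl -s --max-time 5 "http://${ip}:${port}/healthz"); then
    echo "${ip} $(printf '%s' "$body" | local_endpoints)"
  else
    echo "${ip} unreachable"
  fi
}

# Usage (requires access to the cluster and a shell on a node or debug pod):
# for ip in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
#   probe_node "$ip" 32407
# done
```

Nodes that print 1 host a router-default pod; nodes that print 0 do not. A node that prints "unreachable" did not answer the probe at all.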
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-arm64-2022-10-18-153953
How reproducible:
always
Steps to Reproduce:
1. Install an OVNKubernetes IPI cluster on AWS on m6g[d].metal nodes
Actual results:
The installation fails:
oc get co
NAME                                      VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                            4.12.0-0.nightly-arm64-2022-10-18-153953   False       False         True       9h      OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.adistefa-1020f.qe.devcluster.openshift.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
baremetal                                 4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
cloud-controller-manager                  4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
cloud-credential                          4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
cluster-autoscaler                        4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
config-operator                           4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
console                                   4.12.0-0.nightly-arm64-2022-10-18-153953   False       True          False      9h      DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                 4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
csi-snapshot-controller                   4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
dns                                       4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
etcd                                      4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
image-registry                            4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      8h
ingress                                   4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         True       8h      The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
insights                                  4.12.0-0.nightly-arm64-2022-10-18-153953   False       False         True       56m     Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: UHC services authentication failed...
kube-apiserver                            4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
kube-controller-manager                   4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
kube-scheduler                            4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
kube-storage-version-migrator             4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
machine-api                               4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      8h
machine-approver                          4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
machine-config                            4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
marketplace                               4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
monitoring                                4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      8h
network                                   4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
node-tuning                               4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
openshift-apiserver                       4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
openshift-controller-manager              4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
openshift-samples                         4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
operator-lifecycle-manager                4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
operator-lifecycle-manager-catalog        4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
operator-lifecycle-manager-packageserver  4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
service-ca                                4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
storage                                   4.12.0-0.nightly-arm64-2022-10-18-153953   True        False         False      9h
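To pick the failing operators out of a long listing like this one, a small awk filter over the standard column layout of oc get co can help. This is a sketch under assumptions: the function name unhealthy_operators is hypothetical, and it expects the usual one-row-per-operator output with AVAILABLE in the third column and DEGRADED in the fifth.

```shell
# Print the names of cluster operators that are unavailable or degraded.
# Reads `oc get co` output on stdin; skips the header row.
# (Hypothetical helper; assumes columns NAME VERSION AVAILABLE PROGRESSING DEGRADED ...)
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $5 != "False") { print $1 }'
}

# Usage (requires access to the cluster):
# oc get co | unhealthy_operators
```

Applied to the results reported here, the filter would list authentication, console, ingress, and insights.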
Expected results:
The installation succeeds.
Additional info:
- It works when the instance type is not a metal machine
- It works when using OpenShiftSDN
- It works when using m5.metal machines and x86 nightlies (4.12.0-0.nightly-2022-10-18-192348)
- must-gather: https://drive.google.com/file/d/1S-q6PPomreWAzfFZY8r5brWuq6Ea-wJn/view?usp=sharing
install-config.yaml:
---
apiVersion: v1
controlPlane:
  architecture: arm64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m6gd.metal
  replicas: 3
compute:
- architecture: arm64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m6gd.metal
  replicas: 3
metadata:
  name: adistefa-1020f
platform:
  aws:
    region: us-east-2
pullSecret: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
publish: External
baseDomain: qe.devcluster.openshift.com
sshKey: -