Bug
Resolution: Done-Errata
Critical
4.15.0
Quality / Stability / Reliability
Approved
CLOUD Sprint 244, CLOUD Sprint 245
Release Note Not Required
Description of problem:
Installing a private cluster on GCP, with either IPI or UPI, always fails. The ingress cluster operator reports LoadBalancerPending and CanaryChecksRepetitiveFailures.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-07-233748
How reproducible:
Always
Steps to Reproduce:
1. Create a private cluster on GCP, using either IPI or UPI.
Actual results:
The installation fails, with the ingress operator degraded.
Expected results:
The installation succeeds.
Additional info:
Some Prow CI test runs:
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-arm64-nightly-gcp-ipi-private-f28-longduration-cloud/1722352860160593920 (Must-gather https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-arm64-nightly-gcp-ipi-private-f28-longduration-cloud/1722352860160593920/artifacts/gcp-ipi-private-f28-longduration-cloud/gather-must-gather/artifacts/must-gather.tar)
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-gcp-ipi-xpn-private-f28/1722176483704705024
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-gcp-ipi-private-fips-f6-disasterrecovery/1722066338567950336
FYI QE Flexy-install jobs: IPI Flexy-install/245364/, UPI Flexy-install/245524/
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          14h     Unable to apply 4.15.0-0.nightly-2023-11-07-233748: some cluster operators are not available
$ oc get nodes
NAME                                                           STATUS   ROLES                  AGE   VERSION
jiwei-1108-priv-kx7b4-master-0.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-master-1.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-master-2.c.openshift-qe.internal         Ready    control-plane,master   14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-worker-a-l28pl.c.openshift-qe.internal   Ready    worker                 14h   v1.28.3+4cbdd29
jiwei-1108-priv-kx7b4-worker-b-84bx5.c.openshift-qe.internal   Ready    worker                 14h   v1.28.3+4cbdd29
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.nightly-2023-11-07-233748   False       False         True       14h     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1108-priv.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1108-priv.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
baremetal                                  4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
cloud-controller-manager                   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
cloud-credential                           4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
cluster-autoscaler                         4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
config-operator                            4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
console                                    4.15.0-0.nightly-2023-11-07-233748   False       True          False      14h     DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
csi-snapshot-controller                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
dns                                        4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
etcd                                       4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
image-registry                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
ingress                                                                         False       True          True       7h37m   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
insights                                   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
kube-apiserver                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
kube-controller-manager                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
kube-scheduler                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
kube-storage-version-migrator              4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
machine-api                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
machine-approver                           4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
machine-config                             4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
marketplace                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
monitoring                                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
network                                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
node-tuning                                4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
openshift-apiserver                        4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
openshift-controller-manager               4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
openshift-samples                          4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
operator-lifecycle-manager                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
operator-lifecycle-manager-catalog         4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
operator-lifecycle-manager-packageserver   4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
service-ca                                 4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
storage                                    4.15.0-0.nightly-2023-11-07-233748   True        False         False      14h
$ oc describe co ingress
Name:         ingress
Namespace:
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2023-11-08T10:38:15Z
  Generation:          1
  Owner References:
    API Version:     config.openshift.io/v1
    Controller:      true
    Kind:            ClusterVersion
    Name:            version
    UID:             dbaae892-1b6d-480d-a201-0549d0a3149d
  Resource Version:  172514
  UID:               3922a9fe-584f-458f-ac4f-b62b4842758e
Spec:
Status:
  Conditions:
    Last Transition Time:  2023-11-08T17:49:01Z
    Message:               The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
    Reason:                IngressUnavailable
    Status:                False
    Type:                  Available
    Last Transition Time:  2023-11-08T11:02:27Z
    Message:               Not all ingress controllers are available.
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2023-11-08T17:51:01Z
    Message:               The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
    Reason:                IngressDegraded
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2023-11-08T10:52:36Z
    Reason:                IngressControllersUpgradeable
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2023-11-08T10:52:36Z
    Reason:                AsExpected
    Status:                False
    Type:                  EvaluationConditionsDetected
  Extension:  <nil>
  Related Objects:
    Group:
    Name:       openshift-ingress-operator
    Resource:   namespaces
    Group:      operator.openshift.io
    Name:
    Namespace:  openshift-ingress-operator
    Resource:   ingresscontrollers
    Group:      ingress.operator.openshift.io
    Name:
    Namespace:  openshift-ingress-operator
    Resource:   dnsrecords
    Group:
    Name:       openshift-ingress
    Resource:   namespaces
    Group:
    Name:       openshift-ingress-canary
    Resource:   namespaces
Events:  <none>
$ oc get pods -n openshift-ingress-operator -o wide
NAME                                READY   STATUS    RESTARTS      AGE   IP            NODE                                                     NOMINATED NODE   READINESS GATES
ingress-operator-57c555c75b-gqbk6   2/2     Running   2 (14h ago)   14h   10.129.0.36   jiwei-1108-priv-kx7b4-master-1.c.openshift-qe.internal   <none>           <none>
$ oc -n openshift-ingress-operator logs ingress-operator-57c555c75b-gqbk6
...output omitted...
2023-11-08T10:56:53.715Z ERROR operator.ingress_controller controller/controller.go:118 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1: Some pods are not scheduled: Pod \"router-default-7c86c4f4b5-jsljz\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.. Pod \"router-default-7c86c4f4b5-pltz4\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.. Make sure you have sufficient worker nodes.), LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: INSTANCE_IN_MULTIPLE_LOAD_BALANCED_IGS - Validation failed for instance 'projects/openshift-qe/zones/us-central1-a/instances/jiwei-1108-priv-kx7b4-master-0': instance may belong to at most one load-balanced instance group.\nThe kube-controller-manager logs may contain more details.)"}
...output omitted...
2023-11-08T15:13:41.323Z ERROR operator.ingress_controller controller/controller.go:118 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-b/instances/jiwei-1108-priv-kx7b4-worker-b-84bx5' is expected to be in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/jiwei-1108-priv-master-subnet' but is in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/jiwei-1108-priv-worker-subnet'., wrongSubnetwork\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
...output omitted...
$
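For triage, the two GCP load balancer failures quoted in the log above (INSTANCE_IN_MULTIPLE_LOAD_BALANCED_IGS and wrongSubnetwork) can be counted with a small filter. This is only a sketch: the function name count_lb_failure_reasons is hypothetical, and it matches just the two reason strings seen in this report, so other failure reasons would need to be added to the pattern.

```shell
# Hypothetical helper (not part of oc): count known GCP load balancer
# failure reasons in log text read from stdin.
# The two reason strings are the ones that appear in this report.
count_lb_failure_reasons() {
  grep -oE 'INSTANCE_IN_MULTIPLE_LOAD_BALANCED_IGS|wrongSubnetwork' | sort | uniq -c
}

# Usage, reusing the log command from this report:
#   oc -n openshift-ingress-operator logs ingress-operator-57c555c75b-gqbk6 | count_lb_failure_reasons
```

Each output line is a count followed by the reason string. No output means neither reason was found in the log text.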
Must-gather https://drive.google.com/file/d/1zwhJ4ga0-tQuRorha4XnUGUKbSTx1fx4/view?usp=drive_link