- Bug
- Resolution: Done-Errata
- Major
- 4.14.0
- Critical
- No
- Approved
- False
Description of problem:
Installation cannot succeed when userLabels and userTags are configured in install-config.yaml.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-02-132842
How reproducible:
Always
Steps to Reproduce:
1. "create install-config" 2. insert userLabels & userTags setting into install-config.yaml (see below) 3. make sure your GCP credential has Tag User role in the project level and organizational level 4. "create cluster"
Actual results:
The installation failed; the cluster operators authentication, console, image-registry, ingress, monitoring, olm, platform-operators-aggregated, and storage are not available.
Expected results:
The installation succeeds.
Additional info:
FYI: the installation succeeded with 4.14.0-0.nightly-2023-08-28-154013.

$ openshift-install version
openshift-install 4.14.0-0.nightly-2023-09-02-132842
built from commit 43cffbbdbba4e3bbc6dcbb141518b3728f401e51
release image registry.ci.openshift.org/ocp/release@sha256:87077b3b95eba15e96758d04d0b69fb0b2b1eb78a3c2269c0db9cd0df2223a12
release architecture amd64

$ yq-3.3.0 r test-lt/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  userLabels:
  - key: createdby
    value: installer-qe
  - key: environment
    value: test
  userTags:
  - parentID: openshift-qe
    key: department
    value: engineering
  - parentID: 54643501348
    key: ocp_tag_dev
    value: foo
  - parentID: openshift-qe
    key: team
    value: 'installer qe'

$ yq-3.3.0 r test-lt/install-config.yaml credentialsMode
Passthrough

$ yq-3.3.0 r test-lt/install-config.yaml featureSet
TechPreviewNoUpgrade

$ gcloud config get account
ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com

$ gcloud config get project
openshift-qe

$ openshift-install create cluster --dir test-lt
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Consuming Install Config from target directory
WARNING FeatureSet "TechPreviewNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster.
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 6:03PM CST) for the Kubernetes API at https://api.jiwei-0905l.qe.gcp.devcluster.openshift.com:6443...
INFO API v1.27.4+2c83a9f up
INFO Waiting up to 30m0s (until 6:15PM CST) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 6:38PM CST) for the cluster at https://api.jiwei-0905l.qe.gcp.devcluster.openshift.com:6443 to initialize...
...output omitted...
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
ERROR failed to initialize the cluster: Cluster operators authentication, console, image-registry, ingress, monitoring, olm, platform-operators-aggregated, storage are not available

$ export KUBECONFIG=test-lt/auth/kubeconfig
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          16h     Unable to apply 4.14.0-0.nightly-2023-09-02-132842: some cluster operators are not available

$ oc get nodes
NAME                                                       STATUS   ROLES                  AGE   VERSION
jiwei-0905l-9s6f6-master-0.c.openshift-qe.internal         Ready    control-plane,master   16h   v1.27.4+2c83a9f
jiwei-0905l-9s6f6-master-1.c.openshift-qe.internal         Ready    control-plane,master   16h   v1.27.4+2c83a9f
jiwei-0905l-9s6f6-master-2.c.openshift-qe.internal         Ready    control-plane,master   16h   v1.27.4+2c83a9f
jiwei-0905l-9s6f6-worker-a-jf2kg.c.openshift-qe.internal   Ready    worker                 16h   v1.27.4+2c83a9f
jiwei-0905l-9s6f6-worker-b-ff8gc.c.openshift-qe.internal   Ready    worker                 16h   v1.27.4+2c83a9f

$ oc get co | grep -v 'True False False'
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.14.0-0.nightly-2023-09-02-132842 False False True 16h OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found...
console 4.14.0-0.nightly-2023-09-02-132842 False False True 16h RouteHealthAvailable: console route is not admitted
image-registry  False True True 16h Available: The deployment does not have available replicas...
ingress  False True True 16h The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
kube-controller-manager 4.14.0-0.nightly-2023-09-02-132842 True False True 16h GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
monitoring  False True True 16h reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
network 4.14.0-0.nightly-2023-09-02-132842 True True False 16h Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
olm 4.14.0-0.nightly-2023-09-02-132842 False True False 16h CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment...
platform-operators-aggregated
storage 4.14.0-0.nightly-2023-09-02-132842 False False False 16h SHARESCSIDriverOperatorCRAvailable: SharedResourcesDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service

$ oc describe node jiwei-0905l-9s6f6-worker-a-jf2kg.c.openshift-qe.internal
Name:               jiwei-0905l-9s6f6-worker-a-jf2kg.c.openshift-qe.internal
Roles:              worker
...output omitted...
CreationTimestamp:  Tue, 05 Sep 2023 17:59:45 +0800
Taints:             node.kubernetes.io/network-unavailable:NoSchedule
                    UpdateInProgress:PreferNoSchedule
...output omitted...
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   True    Tue, 05 Sep 2023 17:59:47 +0800   Tue, 05 Sep 2023 17:59:47 +0800   NoRouteCreated               Node created without a route
  MemoryPressure       False   Wed, 06 Sep 2023 10:28:09 +0800   Tue, 05 Sep 2023 17:59:45 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Wed, 06 Sep 2023 10:28:09 +0800   Tue, 05 Sep 2023 17:59:45 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Wed, 06 Sep 2023 10:28:09 +0800   Tue, 05 Sep 2023 17:59:45 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Wed, 06 Sep 2023 10:28:09 +0800   Tue, 05 Sep 2023 18:00:25 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.0.128.2
  Hostname:    jiwei-0905l-9s6f6-worker-a-jf2kg.c.openshift-qe.internal
...output omitted...

$ oc debug node/jiwei-0905l-9s6f6-worker-a-jf2kg.c.openshift-qe.internal
Starting pod/jiwei-0905l-9s6f6-worker-a-jf2kgcopenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# ip route show
default via 10.0.128.1 dev br-ex proto dhcp src 10.0.128.2 metric 48
10.0.128.1 dev br-ex proto dhcp scope link src 10.0.128.2 metric 48
10.128.0.0/14 via 10.128.2.1 dev ovn-k8s-mp0
10.128.2.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.2.2
169.254.169.0/29 dev br-ex proto kernel scope link src 169.254.169.2
169.254.169.1 dev br-ex src 10.0.128.2
169.254.169.3 via 10.128.2.1 dev ovn-k8s-mp0
172.30.0.0/16 via 169.254.169.4 dev br-ex mtu 1360
sh-5.1# exit
exit
sh-4.4# exit
exit
Removing debug pod ...

$ oc describe co ingress | grep network
Message: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1: Some pods are not scheduled: Pod "router-default-8588454847-h5t2n" cannot be scheduled: 0/5 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/network-unavailable: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.. Pod "router-default-8588454847-fnghk" cannot be scheduled: 0/5 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/network-unavailable: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.. Make sure you have sufficient worker nodes.), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 0/2 of replicas are available), CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
$
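
To confirm which nodes carry the offending taint, one possible check (a sketch using standard oc output options):

$ oc get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'

If scheduling needs to be unblocked for troubleshooting, the taint could in principle be removed manually; note this would only mask the underlying NoRouteCreated condition, not fix it:

$ oc adm taint nodes jiwei-0905l-9s6f6-worker-a-jf2kg.c.openshift-qe.internal node.kubernetes.io/network-unavailable:NoSchedule-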
- blocks: OCPBUGS-19568 [gcp] installation with "featureSet: TechPreviewNoUpgrade" failed, possibly due to nodes getting taint - "node.kubernetes.io/network-unavailable" (Closed)
- is cloned by: OCPBUGS-19568 [gcp] installation with "featureSet: TechPreviewNoUpgrade" failed, possibly due to nodes getting taint - "node.kubernetes.io/network-unavailable" (Closed)
- is related to: OCPBUGS-19081 internal-registry-pull-secret.json updates and causes nodes to be stuck (Closed)
- links to: RHEA-2023:7198 rpm