- Bug
- Resolution: Done-Errata
- Undefined
- 4.16
- Critical
- Yes
- Proposed
- False
Description of problem:
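Cluster installation fails on IBMCloud. The nodes stay tainted with node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule and remain NotReady, while the ibm-cloud-controller-manager pods keep crashing (CrashLoopBackOff / Error).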
Version-Release number of selected component (if applicable):
Failing since 4.16.0-0.nightly-2023-12-22-210021; last passing version: 4.16.0-0.nightly-2023-12-20-061023
How reproducible:
Always
Steps to Reproduce:
1. Install a cluster on IBMCloud, we use auto flexy template: aos-4_16/ipi-on-ibmcloud/versioned-installer

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          92m     Unable to apply 4.16.0-0.nightly-2023-12-25-200355: an unknown error has occurred: MultipleErrors

liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication
baremetal
cloud-controller-manager                   4.16.0-0.nightly-2023-12-25-200355   True        False         False      89m
cloud-credential
cluster-autoscaler
config-operator
console
control-plane-machine-set
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage

liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                        STATUS     ROLES                  AGE   VERSION
huliu-ibma-qbg48-master-0   NotReady   control-plane,master   89m   v1.29.0+b0d609f
huliu-ibma-qbg48-master-1   NotReady   control-plane,master   89m   v1.29.0+b0d609f
huliu-ibma-qbg48-master-2   NotReady   control-plane,master   89m   v1.29.0+b0d609f

liuhuali@Lius-MacBook-Pro huali-test % oc describe node huliu-ibma-qbg48-master-0
Name:               huliu-ibma-qbg48-master-0
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=huliu-ibma-qbg48-master-0
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node.openshift.io/os_id=rhcos
Annotations:        volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 27 Dec 2023 18:02:21 +0800
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  huliu-ibma-qbg48-master-0
  AcquireTime:     <unset>
  RenewTime:       Wed, 27 Dec 2023 19:32:24 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Addresses:
Capacity:
  cpu:                4
  ephemeral-storage:  104266732Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16391716Ki
  pods:               250
Allocatable:
  cpu:                3500m
  ephemeral-storage:  95018478229
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             15240740Ki
  pods:               250
System Info:
  Machine ID:                 0ae21a012be844f18c5871f6eaefb85b
  System UUID:                0ae21a01-2be8-44f1-8c58-71f6eaefb85b
  Boot ID:                    fbe619e2-8ff5-4cdb-b6a4-cd6830ccc568
  Kernel Version:             5.14.0-284.45.1.el9_2.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 416.92.202312250319-0 (Plow)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.28.2-9.rhaos4.15.git6d902a3.el9
  Kubelet Version:            v1.29.0+b0d609f
  Kube-Proxy Version:         v1.29.0+b0d609f
Non-terminated Pods:          (0 in total)
  Namespace                   Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)
  hugepages-1Gi      0 (0%)    0 (0%)
  hugepages-2Mi      0 (0%)    0 (0%)
Events:
  Type    Reason                   Age                From             Message
  ----    ------                   ----               ----             -------
  Normal  NodeHasNoDiskPressure    90m (x7 over 90m)  kubelet          Node huliu-ibma-qbg48-master-0 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     90m (x7 over 90m)  kubelet          Node huliu-ibma-qbg48-master-0 status is now: NodeHasSufficientPID
  Normal  NodeHasSufficientMemory  90m (x7 over 90m)  kubelet          Node huliu-ibma-qbg48-master-0 status is now: NodeHasSufficientMemory
  Normal  RegisteredNode           90m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
  Normal  RegisteredNode           73m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
  Normal  RegisteredNode           53m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
  Normal  RegisteredNode           32m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
  Normal  RegisteredNode           12m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller

liuhuali@Lius-MacBook-Pro huali-test % oc get pod -n openshift-cloud-controller-manager
NAME                                            READY   STATUS             RESTARTS         AGE
ibm-cloud-controller-manager-787645668b-djqnr   0/1     CrashLoopBackOff   22 (2m29s ago)   90m
ibm-cloud-controller-manager-787645668b-pgkh2   0/1     Error              15 (5m8s ago)    52m

liuhuali@Lius-MacBook-Pro huali-test % oc describe pod ibm-cloud-controller-manager-787645668b-pgkh2 -n openshift-cloud-controller-manager
Name:                 ibm-cloud-controller-manager-787645668b-pgkh2
Namespace:            openshift-cloud-controller-manager
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 huliu-ibma-qbg48-master-2/
Start Time:           Wed, 27 Dec 2023 18:41:23 +0800
Labels:               infrastructure.openshift.io/cloud-controller-manager=IBMCloud
                      k8s-app=ibm-cloud-controller-manager
                      pod-template-hash=787645668b
Annotations:          operator.openshift.io/config-hash: 82a75c6ff86a490b0dac9c8c9b91f1987da0e646a42d72c33c54cbde3c29395b
Status:               Running
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/ibm-cloud-controller-manager-787645668b
Containers:
  cloud-controller-manager:
    Container ID:  cri-o://c56e246f64c770146c30b7a894f6a4d974159551dbb9d1ea31c238e516a0f854
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218
    Image ID:      e494d0d4b28e31170a4a2792bb90701c7f1e81c78c03e3686c5f0e601801937e
    Port:          10258/TCP
    Host Port:     10258/TCP
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -o allexport
      if [[ -f /etc/kubernetes/apiserver-url.env ]]; then
        source /etc/kubernetes/apiserver-url.env
      fi
      exec /bin/ibm-cloud-controller-manager \
      --bind-address=$(POD_IP_ADDRESS) \
      --use-service-account-credentials=true \
      --configure-cloud-routes=false \
      --cloud-provider=ibm \
      --cloud-config=/etc/ibm/cloud.conf \
      --profiling=false \
      --leader-elect=true \
      --leader-elect-lease-duration=137s \
      --leader-elect-renew-deadline=107s \
      --leader-elect-retry-period=26s \
      --leader-elect-resource-namespace=openshift-cloud-controller-manager \
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_AES_128_GCM_SHA256,TLS_CHACHA20_POLY1305_SHA256,TLS_AES_256_GCM_SHA384 \
      --v=2
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 27 Dec 2023 19:33:23 +0800
      Finished:     Wed, 27 Dec 2023 19:33:23 +0800
    Ready:          False
    Restart Count:  15
    Requests:
      cpu:     75m
      memory:  60Mi
    Liveness:  http-get https://:10258/healthz delay=300s timeout=160s period=10s #success=1 #failure=3
    Environment:
      POD_IP_ADDRESS:           (v1:status.podIP)
      VPCCTL_CLOUD_CONFIG:     /etc/ibm/cloud.conf
      VPCCTL_PUBLIC_ENDPOINT:  false
    Mounts:
      /etc/ibm from cloud-conf (rw)
      /etc/kubernetes from host-etc-kube (ro)
      /etc/pki/ca-trust/extracted/pem from trusted-ca (ro)
      /etc/vpc from ibm-cloud-credentials (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cbd4b (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ccm-trusted-ca
    Optional:  false
  host-etc-kube:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes
    HostPathType:  Directory
  cloud-conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cloud-conf
    Optional:  false
  ibm-cloud-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ibm-cloud-credentials
    Optional:    false
  kube-api-access-cbd4b:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/not-ready:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  52m                    default-scheduler  Successfully assigned openshift-cloud-controller-manager/ibm-cloud-controller-manager-787645668b-pgkh2 to huliu-ibma-qbg48-master-2
  Normal   Pulling    52m                    kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218"
  Normal   Pulled     52m                    kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218" in 3.431s (3.431s including waiting)
  Normal   Created    50m (x5 over 52m)      kubelet            Created container cloud-controller-manager
  Normal   Started    50m (x5 over 52m)      kubelet            Started container cloud-controller-manager
  Normal   Pulled     50m (x4 over 52m)      kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218" already present on machine
  Warning  BackOff    2m19s (x240 over 52m)  kubelet            Back-off restarting failed container cloud-controller-manager in pod ibm-cloud-controller-manager-787645668b-pgkh2_openshift-cloud-controller-manager(d7f93ecf-cd14-450e-a986-028559a775b3)
liuhuali@Lius-MacBook-Pro huali-test %
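The symptom in the node output above (the node.cloudprovider.kubernetes.io/uninitialized taint never being removed) could be checked across all nodes with a small script. The following is only a sketch: it assumes a node list in the JSON shape produced by `oc get nodes -o json`, and the function name `nodes_with_taint`, the inline sample data, and the node `example-untainted-node` are hypothetical illustrations, not part of this report or of any OpenShift tooling.

```python
import json
import sys

# Taint key reported on the control-plane nodes in this bug.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def nodes_with_taint(node_list, taint_key=UNINITIALIZED_TAINT):
    """Return the names of nodes whose spec.taints contains taint_key.

    node_list is a dict shaped like the output of `oc get nodes -o json`
    (a List object with an "items" array of Node objects).
    """
    tainted = []
    for node in node_list.get("items", []):
        # spec.taints may be missing or null on an untainted node.
        taints = node.get("spec", {}).get("taints") or []
        if any(t.get("key") == taint_key for t in taints):
            tainted.append(node["metadata"]["name"])
    return tainted


# Hypothetical sample: one node mirroring the taints shown above,
# plus one made-up node without taints.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "huliu-ibma-qbg48-master-0"},
            "spec": {
                "taints": [
                    {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"},
                    {"key": UNINITIALIZED_TAINT, "value": "true", "effect": "NoSchedule"},
                    {"key": "node.kubernetes.io/not-ready", "effect": "NoSchedule"},
                ]
            },
        },
        {"metadata": {"name": "example-untainted-node"}, "spec": {}},
    ]
}

if __name__ == "__main__":
    # Read real data from stdin when piped, otherwise fall back to the sample.
    data = SAMPLE if sys.stdin.isatty() else json.load(sys.stdin)
    for name in nodes_with_taint(data):
        print(name)
```

Saved under a name of your choice (for example `check_taint.py`, also hypothetical), it could be fed live data with `oc get nodes -o json | python3 check_taint.py`. Any node it prints is still waiting for the cloud controller manager to initialize it.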
Actual results:
Cluster installation fails on IBMCloud.
Expected results:
Cluster installation succeeds on IBMCloud.
Additional info:
- links to
- RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update