Bug
Resolution: Done-Errata
4.16
Quality / Stability / Reliability
Critical
Description of problem:
Cluster installation fails on IBMCloud. The nodes stay NotReady and remain tainted with node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule, and the ibm-cloud-controller-manager pods keep crashing.
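One way to confirm this symptom is to check the node list for the taint. The following is a minimal sketch, not something taken from the reported environment: it assumes the output of `oc get nodes -o json` has already been loaded into a Python dict, and the helper name `nodes_with_taint`, the sample data, and the second node are hypothetical.

```python
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def nodes_with_taint(node_list, taint_key=UNINITIALIZED_TAINT):
    """Return the names of nodes whose spec.taints contain taint_key."""
    tainted = []
    for node in node_list.get("items", []):
        # spec.taints is absent on untainted nodes, so fall back to an empty list.
        taints = node.get("spec", {}).get("taints") or []
        if any(t.get("key") == taint_key for t in taints):
            tainted.append(node["metadata"]["name"])
    return tainted


# Trimmed sample shaped like `oc get nodes -o json` output.
# The first node mirrors the taints in this report; the second is hypothetical.
SAMPLE_NODES = {
    "items": [
        {
            "metadata": {"name": "huliu-ibma-qbg48-master-0"},
            "spec": {
                "taints": [
                    {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"},
                    {"key": UNINITIALIZED_TAINT, "value": "true", "effect": "NoSchedule"},
                    {"key": "node.kubernetes.io/not-ready", "effect": "NoSchedule"},
                ]
            },
        },
        {
            "metadata": {"name": "example-initialized-node"},  # hypothetical
            "spec": {
                "taints": [
                    {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"},
                ]
            },
        },
    ]
}

if __name__ == "__main__":
    print(nodes_with_taint(SAMPLE_NODES))
```

On a cluster where the cloud controller manager has initialized every node, the helper would be expected to return an empty list.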
Version-Release number of selected component (if applicable):
Fails starting from 4.16.0-0.nightly-2023-12-22-210021. Last passing version: 4.16.0-0.nightly-2023-12-20-061023
How reproducible:
Always
Steps to Reproduce:
1. Install a cluster on IBMCloud. We use the auto flexy template aos-4_16/ipi-on-ibmcloud/versioned-installer.
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          92m     Unable to apply 4.16.0-0.nightly-2023-12-25-200355: an unknown error has occurred: MultipleErrors
liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication
baremetal
cloud-controller-manager                   4.16.0-0.nightly-2023-12-25-200355   True        False         False      89m
cloud-credential
cluster-autoscaler
config-operator
console
control-plane-machine-set
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                        STATUS     ROLES                  AGE   VERSION
huliu-ibma-qbg48-master-0   NotReady   control-plane,master   89m   v1.29.0+b0d609f
huliu-ibma-qbg48-master-1   NotReady   control-plane,master   89m   v1.29.0+b0d609f
huliu-ibma-qbg48-master-2   NotReady   control-plane,master   89m   v1.29.0+b0d609f
liuhuali@Lius-MacBook-Pro huali-test % oc describe node huliu-ibma-qbg48-master-0
Name: huliu-ibma-qbg48-master-0
Roles: control-plane,master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=huliu-ibma-qbg48-master-0
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node-role.kubernetes.io/master=
node.openshift.io/os_id=rhcos
Annotations: volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 27 Dec 2023 18:02:21 +0800
Taints: node-role.kubernetes.io/master:NoSchedule
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: huliu-ibma-qbg48-master-0
AcquireTime: <unset>
RenewTime: Wed, 27 Dec 2023 19:32:24 +0800
Conditions:
Type             Status   LastHeartbeatTime                 LastTransitionTime                Reason                       Message
----             ------   -----------------                 ------------------                ------                       -------
MemoryPressure   False    Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
DiskPressure     False    Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
PIDPressure      False    Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
Ready            False    Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Addresses:
Capacity:
cpu: 4
ephemeral-storage: 104266732Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16391716Ki
pods: 250
Allocatable:
cpu: 3500m
ephemeral-storage: 95018478229
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 15240740Ki
pods: 250
System Info:
Machine ID: 0ae21a012be844f18c5871f6eaefb85b
System UUID: 0ae21a01-2be8-44f1-8c58-71f6eaefb85b
Boot ID: fbe619e2-8ff5-4cdb-b6a4-cd6830ccc568
Kernel Version: 5.14.0-284.45.1.el9_2.x86_64
OS Image: Red Hat Enterprise Linux CoreOS 416.92.202312250319-0 (Plow)
Operating System: linux
Architecture: amd64
Container Runtime Version: cri-o://1.28.2-9.rhaos4.15.git6d902a3.el9
Kubelet Version: v1.29.0+b0d609f
Kube-Proxy Version: v1.29.0+b0d609f
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasNoDiskPressure 90m (x7 over 90m) kubelet Node huliu-ibma-qbg48-master-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 90m (x7 over 90m) kubelet Node huliu-ibma-qbg48-master-0 status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory 90m (x7 over 90m) kubelet Node huliu-ibma-qbg48-master-0 status is now: NodeHasSufficientMemory
Normal RegisteredNode 90m node-controller Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
Normal RegisteredNode 73m node-controller Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
Normal RegisteredNode 53m node-controller Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
Normal RegisteredNode 32m node-controller Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
Normal RegisteredNode 12m node-controller Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
liuhuali@Lius-MacBook-Pro huali-test % oc get pod -n openshift-cloud-controller-manager
NAME                                            READY   STATUS             RESTARTS         AGE
ibm-cloud-controller-manager-787645668b-djqnr   0/1     CrashLoopBackOff   22 (2m29s ago)   90m
ibm-cloud-controller-manager-787645668b-pgkh2   0/1     Error              15 (5m8s ago)    52m
liuhuali@Lius-MacBook-Pro huali-test % oc describe pod ibm-cloud-controller-manager-787645668b-pgkh2 -n openshift-cloud-controller-manager
Name: ibm-cloud-controller-manager-787645668b-pgkh2
Namespace: openshift-cloud-controller-manager
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: huliu-ibma-qbg48-master-2/
Start Time: Wed, 27 Dec 2023 18:41:23 +0800
Labels: infrastructure.openshift.io/cloud-controller-manager=IBMCloud
k8s-app=ibm-cloud-controller-manager
pod-template-hash=787645668b
Annotations: operator.openshift.io/config-hash: 82a75c6ff86a490b0dac9c8c9b91f1987da0e646a42d72c33c54cbde3c29395b
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/ibm-cloud-controller-manager-787645668b
Containers:
cloud-controller-manager:
Container ID: cri-o://c56e246f64c770146c30b7a894f6a4d974159551dbb9d1ea31c238e516a0f854
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218
Image ID: e494d0d4b28e31170a4a2792bb90701c7f1e81c78c03e3686c5f0e601801937e
Port: 10258/TCP
Host Port: 10258/TCP
Command:
/bin/bash
-c
#!/bin/bash
set -o allexport
if [[ -f /etc/kubernetes/apiserver-url.env ]]; then
source /etc/kubernetes/apiserver-url.env
fi
exec /bin/ibm-cloud-controller-manager \
--bind-address=$(POD_IP_ADDRESS) \
--use-service-account-credentials=true \
--configure-cloud-routes=false \
--cloud-provider=ibm \
--cloud-config=/etc/ibm/cloud.conf \
--profiling=false \
--leader-elect=true \
--leader-elect-lease-duration=137s \
--leader-elect-renew-deadline=107s \
--leader-elect-retry-period=26s \
--leader-elect-resource-namespace=openshift-cloud-controller-manager \
--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_AES_128_GCM_SHA256,TLS_CHACHA20_POLY1305_SHA256,TLS_AES_256_GCM_SHA384 \
--v=2
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 27 Dec 2023 19:33:23 +0800
Finished: Wed, 27 Dec 2023 19:33:23 +0800
Ready: False
Restart Count: 15
Requests:
cpu: 75m
memory: 60Mi
Liveness: http-get https://:10258/healthz delay=300s timeout=160s period=10s #success=1 #failure=3
Environment:
POD_IP_ADDRESS: (v1:status.podIP)
VPCCTL_CLOUD_CONFIG: /etc/ibm/cloud.conf
VPCCTL_PUBLIC_ENDPOINT: false
Mounts:
/etc/ibm from cloud-conf (rw)
/etc/kubernetes from host-etc-kube (ro)
/etc/pki/ca-trust/extracted/pem from trusted-ca (ro)
/etc/vpc from ibm-cloud-credentials (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cbd4b (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
trusted-ca:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: ccm-trusted-ca
Optional: false
host-etc-kube:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes
HostPathType: Directory
cloud-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cloud-conf
Optional: false
ibm-cloud-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: ibm-cloud-credentials
Optional: false
kube-api-access-cbd4b:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/master=
Tolerations: node-role.kubernetes.io/master:NoSchedule op=Exists
node.cloudprovider.kubernetes.io/uninitialized:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
node.kubernetes.io/not-ready:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 52m default-scheduler Successfully assigned openshift-cloud-controller-manager/ibm-cloud-controller-manager-787645668b-pgkh2 to huliu-ibma-qbg48-master-2
Normal Pulling 52m kubelet Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218"
Normal Pulled 52m kubelet Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218" in 3.431s (3.431s including waiting)
Normal Created 50m (x5 over 52m) kubelet Created container cloud-controller-manager
Normal Started 50m (x5 over 52m) kubelet Started container cloud-controller-manager
Normal Pulled 50m (x4 over 52m) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218" already present on machine
Warning BackOff 2m19s (x240 over 52m) kubelet Back-off restarting failed container cloud-controller-manager in pod ibm-cloud-controller-manager-787645668b-pgkh2_openshift-cloud-controller-manager(d7f93ecf-cd14-450e-a986-028559a775b3)
liuhuali@Lius-MacBook-Pro huali-test %
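The pod inspection above can also be done programmatically. The sketch below is illustrative only: it assumes the output of `oc get pod -n openshift-cloud-controller-manager -o json` has been loaded into a Python dict, and the helper name `crash_looping_containers` and the sample data are hypothetical. It reports containers whose current state is waiting with reason CrashLoopBackOff, along with their restart counts.

```python
def crash_looping_containers(pod_list):
    """Return (pod, container, restart_count) tuples for containers
    currently waiting with reason CrashLoopBackOff."""
    found = []
    for pod in pod_list.get("items", []):
        # containerStatuses may be missing for pods that have not started yet.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append(
                    (pod["metadata"]["name"], cs["name"], cs.get("restartCount", 0))
                )
    return found


# Trimmed sample shaped like `oc get pod -o json` output.
# The first pod mirrors the describe output in this report; the second is hypothetical.
SAMPLE_PODS = {
    "items": [
        {
            "metadata": {"name": "ibm-cloud-controller-manager-787645668b-pgkh2"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "cloud-controller-manager",
                        "restartCount": 15,
                        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                    }
                ]
            },
        },
        {
            "metadata": {"name": "example-healthy-pod"},  # hypothetical
            "status": {
                "containerStatuses": [
                    {
                        "name": "example-container",
                        "restartCount": 0,
                        "state": {"running": {}},
                    }
                ]
            },
        },
    ]
}

if __name__ == "__main__":
    for pod, container, restarts in crash_looping_containers(SAMPLE_PODS):
        print(f"{pod}/{container}: {restarts} restarts")
```

Note that a container between restarts may briefly show a different state (the pod list above shows one pod in Error), so a single snapshot can miss a crash-looping container.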
Actual results:
Cluster installation fails on IBMCloud.
Expected results:
Cluster installation succeeds on IBMCloud.
Additional info:
- links to RHEA-2024:0041 (OpenShift Container Platform 4.16.z bug fix update)