- Bug
- Resolution: Done-Errata
- Critical
- 4.16
- Important
- No
- Approved
- False
- Release Note Not Required
- In Progress
This is a clone of issue OCPBUGS-30860. The following is the description of the original issue:
—
Description of problem:
Installation failed on a 4.16 nightly build while waiting for install-complete. The API is unavailable.
level=info msg=Waiting up to 20m0s (until 5:00AM UTC) for the Kubernetes API at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443...
level=info msg=API v1.29.2+a0beecc up
level=info msg=Waiting up to 30m0s (until 5:11AM UTC) for bootstrapping to complete...
api available waiting for bootstrap to complete
level=info msg=Waiting up to 20m0s (until 5:01AM UTC) for the Kubernetes API at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443...
level=info msg=API v1.29.2+a0beecc up
level=info msg=Waiting up to 30m0s (until 5:11AM UTC) for bootstrapping to complete...
level=info msg=It is now safe to remove the bootstrap resources
level=info msg=Time elapsed: 15m54s
Copying kubeconfig to shared dir as kubeconfig-minimal
level=info msg=Destroying the bootstrap resources...
level=info msg=Waiting up to 40m0s (until 5:39AM UTC) for the cluster at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443 to initialize...
W0313 04:59:34.272442 229 reflector.go:539] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ClusterVersion: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout
I0313 04:59:34.272658 229 trace.go:236] Trace[533197684]: "Reflector ListAndWatch" name:k8s.io/client-go/tools/watch/informerwatcher.go:146 (13-Mar-2024 04:59:04.271) (total time: 30000ms):
Trace[533197684]: ---"Objects listed" error:Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout 30000ms (04:59:34.272)
...
E0313 05:38:18.669780 229 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout
level=error msg=Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 172.212.184.131:6443: i/o timeout
level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
level=error msg=failed to initialize the cluster: timed out waiting for the condition
On the master nodes, the kube-apiserver container does not appear to be running:
[root@ci-op-4sgxj8jx-8482f-hppxj-master-0 ~]# crictl ps | grep apiserver
e4b6cc9622b01 ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90 7 minutes ago Running kube-apiserver-cert-syncer 22 3ff4af6614409 kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
1249824fe5788 ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90 4 hours ago Running kube-apiserver-insecure-readyz 0 3ff4af6614409 kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
ca774b07284f0 ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90 4 hours ago Running kube-apiserver-cert-regeneration-controller 0 3ff4af6614409 kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
2931b9a2bbabd ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90 4 hours ago Running openshift-apiserver-check-endpoints 0 4136bf2183de1 apiserver-7df5bb879-xx74p
0c9534aec3b6b 8c9042f97c89d8c8519d6e6235bef5a5346f08e6d7d9864ef0f228b318b4c3de 4 hours ago Running openshift-apiserver 0 4136bf2183de1 apiserver-7df5bb879-xx74p
db21a2dd1df33 ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90 4 hours ago Running guard 0 199e1f4e665b9 kube-apiserver-guard-ci-op-4sgxj8jx-8482f-hppxj-master-0
429110f9ea5a3 6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5 4 hours ago Running apiserver-watcher 0 7664f480df29d apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-0
[root@ci-op-4sgxj8jx-8482f-hppxj-master-1 ~]# crictl ps | grep apiserver
c64187e7adcc6 ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90 4 hours ago Running openshift-apiserver-check-endpoints 0 1a4a5b247c28a apiserver-7df5bb879-f6v5x
ff98c52402288 8c9042f97c89d8c8519d6e6235bef5a5346f08e6d7d9864ef0f228b318b4c3de 4 hours ago Running openshift-apiserver 0 1a4a5b247c28a apiserver-7df5bb879-f6v5x
2f8a97f959409 faa1b95089d101cdc907d7affe310bbff5a9aa8f92c725dc6466afc37e731927 4 hours ago Running oauth-apiserver 0 ffa2c316a0cca apiserver-97fbc599c-2ftl7
72897e30e0df0 6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5 4 hours ago Running apiserver-watcher 0 3b6c3849ce91f apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-1
[root@ci-op-4sgxj8jx-8482f-hppxj-master-2 ~]# crictl ps | grep apiserver
04c426f07573d faa1b95089d101cdc907d7affe310bbff5a9aa8f92c725dc6466afc37e731927 4 hours ago Running oauth-apiserver 0 2172a64fb1a38 apiserver-654dcb4cc6-tq8fj
4dcca5c0e9b99 6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5 4 hours ago Running apiserver-watcher 0 1cd99ec327199 apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-2
The following error appears in the kubelet log:
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: E0313 06:10:15.004656
23961 kuberuntime_manager.go:1262] container &Container{Name:kube-apiserver,Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:789f242b8bc721b697e265c6f9d025f45e56e990bfd32e331c633fe0b9f076bc,Command:[/bin/bash -ec],Args:[LOCK=/var/log/kube-apiserver/.lock
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: # We should be able to acquire the lock immediatelly. If not, it means the init container has not released it yet and kubelet or CRI-O started container prematurely.
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exec {LOCK_FD}>${LOCK} && flock --verbose -w 30 "${LOCK_FD}" || {
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: echo "Failed to acquire lock for kube-apiserver. Please check setup container for details. This is likely kubelet or CRI-O bug."
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exit 1
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: }
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: echo "Copying system trust bundle ..."
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: fi
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exec watch-termination --termination-touch-file=/var/log/kube-apiserver/.terminating --termination-log-file=/var/log/kube-apiserver/termination.log --graceful-termination-duration=135s --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig -- hyperkube kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=${HOST_IP} -v=2 --permit-address-sharing
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: ],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:,HostPort:6443,ContainerPort:6443,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:POD_NAME,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.name,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:POD_NAMESPACE,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:STATIC_POD_VERSION,Value:4,ValueFrom:nil,},EnvVar{Name:HOST_IP,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.hostIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:GOGC,Value:100,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{cpu: {{265 -3} {<nil>} 265m DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:resource-dir,ReadOnly:false,MountPath:/etc/kubernetes/static-pod-resources,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:cert-dir,ReadOnly:false,MountPath:/etc/kubernetes/static-pod-certs,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:audit-dir,ReadOnly:false,MountPath:/var/log/kube-apiserver,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:livez,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,},ReadinessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:readyz,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:1,TerminationGracePeriodSeconds:nil,},Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:FallbackToLogsOnError,VolumeDevices:[]VolumeDevice{},StartupProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:healthz,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:30,TerminationGracePeriodSeconds:nil,},ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0_openshift-kube-apiserver(196e0956694ff43707b03f4585f3b6cd): CreateContainerConfigError: host IP unknown; known addresses: []
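The `crictl ps | grep apiserver` check in the description matches every container whose name or pod contains "apiserver", so the missing `kube-apiserver` container is easy to overlook. The sketch below is one possible way to make that check exact. It assumes the plain-text row layout shown in this report (ID, image, created, state, name, attempt, pod ID, pod); the function names and the set of state strings are hypothetical and not part of any OpenShift tooling.

```python
# Container states assumed to appear in the STATE column of `crictl ps`.
STATES = {"Created", "Running", "Exited", "Unknown"}


def running_container_names(crictl_ps_output: str) -> set:
    """Return the names of containers whose STATE column is Running."""
    names = set()
    for line in crictl_ps_output.splitlines():
        tokens = line.split()
        # The CREATED column ("4 hours ago") spans several tokens, so locate
        # the STATE token and read the NAME column that follows it.
        for i, token in enumerate(tokens[:-1]):
            if token in STATES:
                if token == "Running":
                    names.add(tokens[i + 1])
                break
    return names


def has_container(crictl_ps_output: str, name: str = "kube-apiserver") -> bool:
    """True only if a container with exactly this name is running."""
    return name in running_container_names(crictl_ps_output)


# Two rows copied from the master-0 output in this report.
MASTER_0_SAMPLE = """\
e4b6cc9622b01 ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90 7 minutes ago Running kube-apiserver-cert-syncer 22 3ff4af6614409 kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
429110f9ea5a3 6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5 4 hours ago Running apiserver-watcher 0 7664f480df29d apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-0
"""

if __name__ == "__main__":
    print(sorted(running_container_names(MASTER_0_SAMPLE)))
    print(has_container(MASTER_0_SAMPLE))
```

Matching on the exact NAME column rather than a substring separates the sidecars (`kube-apiserver-cert-syncer`, `apiserver-watcher`, and so on) from the main `kube-apiserver` container, which is the one that failed to start here.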
Version-Release number of selected component (if applicable):
4.16 latest nightly build
How reproducible:
frequently
Steps to Reproduce:
1. Install a cluster on a 4.16 nightly build.
Actual results:
Installation failed.
Expected results:
Installation is successful.
Additional info:
A search of CI jobs found many that failed with the same error, most of them on the Azure platform. https://search.dptools.openshift.org/?search=failed+to+initialize+the+cluster%3A+timed+out+waiting+for+the+condition&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
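When triaging downloaded job logs by hand, a small filter can help separate this failure from other install timeouts. The following is a minimal sketch: the two signature strings are taken from the installer log in this report, while the function name and the choice to require both signatures are assumptions made for illustration. It is narrower than the CI search above, which matches on the initialization timeout message alone.

```python
import re

# Signature strings taken from the installer log in this report.
INIT_TIMEOUT = "failed to initialize the cluster: timed out waiting for the condition"
# The API address varies per cluster, so only the port and error are fixed.
DIAL_TIMEOUT = re.compile(r"dial tcp \S+:6443: i/o timeout")


def matches_failure_signature(log_text: str) -> bool:
    """True if the log shows both the init timeout and an API dial timeout."""
    return INIT_TIMEOUT in log_text and DIAL_TIMEOUT.search(log_text) is not None


if __name__ == "__main__":
    sample = (
        "W0313 04:59:34.272442 229 reflector.go:539] failed to list "
        "*v1.ClusterVersion: dial tcp 172.212.184.131:6443: i/o timeout\n"
        "level=error msg=failed to initialize the cluster: "
        "timed out waiting for the condition\n"
    )
    print(matches_failure_signature(sample))
```

A log that times out during initialization while the API is still reachable would not match, which keeps operator-level failures out of the result.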
- is blocked by
- OCPBUGS-35073 SNO Fix for API is unavailable after bootstrap server is destroyed
- Closed
- links to
- RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update