OCPBUGS-35073 (OpenShift Bugs)

SNO Fix for API is unavailable after bootstrap server is destroyed


    • Important
    • No
    • Approved
    • False
    • N/A
    • Release Note Not Required
    • Done

      Description of problem:

      Installation failed on a 4.16 nightly build while waiting for install-complete. The API became unreachable after the bootstrap resources were destroyed.
      
      level=info msg=Waiting up to 20m0s (until 5:00AM UTC) for the Kubernetes API at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443...
      level=info msg=API v1.29.2+a0beecc up
      level=info msg=Waiting up to 30m0s (until 5:11AM UTC) for bootstrapping to complete...
      api available
      waiting for bootstrap to complete
      level=info msg=Waiting up to 20m0s (until 5:01AM UTC) for the Kubernetes API at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443...
      level=info msg=API v1.29.2+a0beecc up
      level=info msg=Waiting up to 30m0s (until 5:11AM UTC) for bootstrapping to complete...
      level=info msg=It is now safe to remove the bootstrap resources
      level=info msg=Time elapsed: 15m54s
      Copying kubeconfig to shared dir as kubeconfig-minimal
      level=info msg=Destroying the bootstrap resources... 
      level=info msg=Waiting up to 40m0s (until 5:39AM UTC) for the cluster at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443 to initialize...
      W0313 04:59:34.272442     229 reflector.go:539] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ClusterVersion: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout
      I0313 04:59:34.272658     229 trace.go:236] Trace[533197684]: "Reflector ListAndWatch" name:k8s.io/client-go/tools/watch/informerwatcher.go:146 (13-Mar-2024 04:59:04.271) (total time: 30000ms):
      Trace[533197684]: ---"Objects listed" error:Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout 30000ms (04:59:34.272)
      ...
      E0313 05:38:18.669780     229 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout
      level=error msg=Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 172.212.184.131:6443: i/o timeout
      level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
      level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
      level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
      level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
      level=error msg=failed to initialize the cluster: timed out waiting for the condition 
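To see at a glance which endpoint the installer could not reach, the timeout lines in a log like the one above can be tallied by address. This is only a sketch: the function name dial_timeouts is hypothetical, and it assumes the log is piped in on stdin and that grep supports -o.

```shell
# dial_timeouts: read installer log text on stdin and print, for each
# address that hit a "dial tcp ...: i/o timeout" error, the number of
# occurrences followed by the address.
dial_timeouts() {
  grep -o 'dial tcp [0-9.]*:[0-9]*: i/o timeout' |
    awk '{ sub(/:$/, "", $3); n[$3]++ } END { for (a in n) print n[a], a }'
}
```

For example, piping the installer log file through dial_timeouts would show whether every failure points at the same API load balancer address.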
      
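      On the master nodes, kube-apiserver does not appear to be running. No container named kube-apiserver shows up in the crictl ps output on any of the three masters: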
      [root@ci-op-4sgxj8jx-8482f-hppxj-master-0 ~]# crictl ps | grep apiserver
      e4b6cc9622b01       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         7 minutes ago        Running             kube-apiserver-cert-syncer                    22                  3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
      1249824fe5788       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             kube-apiserver-insecure-readyz                0                   3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
      ca774b07284f0       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             kube-apiserver-cert-regeneration-controller   0                   3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
      2931b9a2bbabd       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             openshift-apiserver-check-endpoints           0                   4136bf2183de1       apiserver-7df5bb879-xx74p
      0c9534aec3b6b       8c9042f97c89d8c8519d6e6235bef5a5346f08e6d7d9864ef0f228b318b4c3de                                                         4 hours ago          Running             openshift-apiserver                           0                   4136bf2183de1       apiserver-7df5bb879-xx74p
      db21a2dd1df33       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             guard                                         0                   199e1f4e665b9       kube-apiserver-guard-ci-op-4sgxj8jx-8482f-hppxj-master-0
      429110f9ea5a3       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago          Running             apiserver-watcher                             0                   7664f480df29d       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-0
      
      [root@ci-op-4sgxj8jx-8482f-hppxj-master-1 ~]# crictl ps | grep apiserver
      c64187e7adcc6       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago         Running             openshift-apiserver-check-endpoints           0                   1a4a5b247c28a       apiserver-7df5bb879-f6v5x
      ff98c52402288       8c9042f97c89d8c8519d6e6235bef5a5346f08e6d7d9864ef0f228b318b4c3de                                                         4 hours ago         Running             openshift-apiserver                           0                   1a4a5b247c28a       apiserver-7df5bb879-f6v5x
      2f8a97f959409       faa1b95089d101cdc907d7affe310bbff5a9aa8f92c725dc6466afc37e731927                                                         4 hours ago         Running             oauth-apiserver                               0                   ffa2c316a0cca       apiserver-97fbc599c-2ftl7
      72897e30e0df0       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago         Running             apiserver-watcher                             0                   3b6c3849ce91f       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-1
      
      [root@ci-op-4sgxj8jx-8482f-hppxj-master-2 ~]# crictl ps | grep apiserver
      04c426f07573d       faa1b95089d101cdc907d7affe310bbff5a9aa8f92c725dc6466afc37e731927                                                         4 hours ago         Running             oauth-apiserver                      0                   2172a64fb1a38       apiserver-654dcb4cc6-tq8fj
      4dcca5c0e9b99       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago         Running             apiserver-watcher                    0                   1cd99ec327199       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-2
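Because grep apiserver also matches sidecars such as kube-apiserver-cert-syncer, it is easy to misread the listings above. A stricter check on the container NAME column could look like the following sketch. The helper name has_container is hypothetical, and it assumes the crictl ps column layout shown above, where the name directly follows the Running state.

```shell
# has_container NAME: read `crictl ps` output on stdin and succeed only if
# a Running container whose name is exactly NAME is listed.
has_container() {
  awk -v want="$1" '
    # The CREATED column has a variable word count ("4 hours ago"),
    # so locate the name as the field right after the state.
    { for (i = 1; i < NF; i++) if ($i == "Running" && $(i + 1) == want) found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

On a node it might be used as: crictl ps | has_container kube-apiserver || echo "kube-apiserver is not running"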
      
      
      The kubelet log on master-0 shows the following error:
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: E0313 06:10:15.004656   23961 kuberuntime_manager.go:1262] container &Container{Name:kube-apiserver,Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:789f242b8bc721b697e265c6f9d025f45e56e990bfd32e331c633fe0b9f076bc,Command:[/bin/bash -ec],Args:[LOCK=/var/log/kube-apiserver/.lock
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: # We should be able to acquire the lock immediatelly. If not, it means the init container has not released it yet and kubelet or CRI-O started container prematurely.
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exec {LOCK_FD}>${LOCK} && flock --verbose -w 30 "${LOCK_FD}" || {
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   echo "Failed to acquire lock for kube-apiserver. Please check setup container for details. This is likely kubelet or CRI-O bug."
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   exit 1
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: }
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   echo "Copying system trust bundle ..."
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: fi
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exec watch-termination --termination-touch-file=/var/log/kube-apiserver/.terminating --termination-log-file=/var/log/kube-apiserver/termination.log --graceful-termination-duration=135s --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig -- hyperkube kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=${HOST_IP}  -v=2 --permit-address-sharing
      Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: ],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:,HostPort:6443,ContainerPort:6443,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:POD_NAME,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.name,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:POD_NAMESPACE,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:STATIC_POD_VERSION,Value:4,ValueFrom:nil,},EnvVar{Name:HOST_IP,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.hostIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:GOGC,Value:100,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{cpu: {{265 -3} {<nil>} 265m DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:resource-dir,ReadOnly:false,MountPath:/etc/kubernetes/static-pod-resources,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:cert-dir,ReadOnly:false,MountPath:/etc/kubernetes/static-pod-certs,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:audit-dir,ReadOnly:false,MountPath:/var/log/kube-apiserver,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:livez,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,},ReadinessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:readyz,Port:{0 6443 
},Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:1,TerminationGracePeriodSeconds:nil,},Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:FallbackToLogsOnError,VolumeDevices:[]VolumeDevice{},StartupProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:healthz,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:30,TerminationGracePeriodSeconds:nil,},ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0_openshift-kube-apiserver(196e0956694ff43707b03f4585f3b6cd): CreateContainerConfigError: host IP unknown; known addresses: []
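The relevant part of the entry above is its tail: the pod name and the CreateContainerConfigError reason. As a rough sketch, that tail can be pulled out of a kubelet log with sed. The function name kubelet_start_failures is hypothetical, and the pattern assumes the "start failed in pod NAME(UID): REASON" wording seen in this log.

```shell
# kubelet_start_failures: read kubelet log text on stdin and print
# "POD: REASON" for every container start failure it finds.
kubelet_start_failures() {
  sed -n 's/.*start failed in pod \([^ (]*\)([^)]*): \(.*\)$/\1: \2/p'
}
```

Assuming the kubelet runs as the kubelet systemd unit, it could be fed with: journalctl -u kubelet | kubelet_start_failures | sort | uniq -c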
      
      

      Version-Release number of selected component (if applicable):

          4.16 latest nightly build

      How reproducible:

          frequently

      Steps to Reproduce:

          1. Install a cluster on a 4.16 nightly build.
          

      Actual results:

          Installation failed.

      Expected results:

          Installation is successful.

      Additional info:

      A search of CI jobs found many failures with the same error, most of them on the Azure platform.
      https://search.dptools.openshift.org/?search=failed+to+initialize+the+cluster%3A+timed+out+waiting+for+the+condition&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

              Unassigned
              jinyunma (Jinyun Ma)
              Ge Liu