OpenShift Bugs / OCPBUGS-626

[IBMCloud] Worker nodes stuck in Provisioning during cluster creation - fail to join cluster

Details

    • Important
    • Proposed
    • False

    Description

      During IPI cluster creation on IBM Cloud, the worker machines are created but never register with the cluster as nodes, which blocks cluster creation.
      https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/6255/pull-ci-openshift-installer-master-e2e-ibmcloud/1562931167492050944

      NAME                                      VERSION                                                   AVAILABLE  PROGRESSING  DEGRADED  SINCE  MESSAGE
      authentication                            4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  False      False        True      64m    OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.143.137:443/healthz": dial tcp 172.30.143.137:443: connect: connection refused...
      baremetal                                 4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     62m
      cloud-controller-manager                  4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     66m
      cloud-credential                          4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     62m
      cluster-autoscaler                                                                                  True       False        True      61m    machine-api not ready
      config-operator                           4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     64m
      console                                   4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  False      False        True      41m    RouteHealthAvailable: console route is not admitted
      control-plane-machine-set                 4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     62m
      csi-snapshot-controller                   4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     63m
      dns                                       4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     62m
      etcd                                      4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     54m
      image-registry                                                                                      False      True         True      50m    Available: The deployment does not have available replicas...
      ingress                                                                                             False      True         True      61m    The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
      insights                                  4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     57m
      kube-apiserver                            4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     53m
      kube-controller-manager                   4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        True      54m    GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
      kube-scheduler                            4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     54m
      kube-storage-version-migrator             4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     63m
      machine-api                                                                                         False      True         True      62m    Operator is initializing
      machine-approver                          4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     63m
      machine-config                            4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     59m
      marketplace                               4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     62m
      monitoring                                                                                          False      True         True      47m    Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
      network                                   4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       True         False     63m    Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
      node-tuning                               4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     61m
      openshift-apiserver                       4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     50m
      openshift-controller-manager              4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     54m
      openshift-samples                         4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     50m
      operator-lifecycle-manager                4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     63m
      operator-lifecycle-manager-catalog        4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     63m
      operator-lifecycle-manager-packageserver  4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     50m
      service-ca                                4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  True       False        False     63m
      storage                                   4.12.0-0.ci.test-2022-08-25-223706-ci-op-6ckhb9q1-latest  False      True         False     63m    IBMVPCBlockCSIDriverOperatorCRAvailable: IBMBlockDriverControllerServiceControllerAvailable: Waiting for Deployment

      NAMESPACE              NAME                                       PHASE        TYPE      REGION  ZONE     AGE  NODE                                 PROVIDERID                                                                                                     STATE
      openshift-machine-api  ci-op-6ckhb9q1-74899-74qj8-master-0        Running      bx2-4x16  eu-gb   eu-gb-1  69m  ci-op-6ckhb9q1-74899-74qj8-master-0  ibm://b9db97fa92ce47ae8dbf40ec2318b9f4///ci-op-6ckhb9q1-74899-74qj8/0787_90175449-7395-4f41-885e-348d68af0fc7  running
      openshift-machine-api  ci-op-6ckhb9q1-74899-74qj8-master-1        Running      bx2-4x16  eu-gb   eu-gb-2  69m  ci-op-6ckhb9q1-74899-74qj8-master-1  ibm://b9db97fa92ce47ae8dbf40ec2318b9f4///ci-op-6ckhb9q1-74899-74qj8/0797_18653039-5510-44bf-a7d6-4706bfec04d4  running
      openshift-machine-api  ci-op-6ckhb9q1-74899-74qj8-master-2        Running      bx2-4x16  eu-gb   eu-gb-3  69m  ci-op-6ckhb9q1-74899-74qj8-master-2  ibm://b9db97fa92ce47ae8dbf40ec2318b9f4///ci-op-6ckhb9q1-74899-74qj8/07a7_410a543a-f17d-4897-847e-e8a05d464935  running
      openshift-machine-api  ci-op-6ckhb9q1-74899-74qj8-worker-1-9k6xn  Provisioned  bx2-4x16  eu-gb   eu-gb-1  62m                                       ibm://b9db97fa92ce47ae8dbf40ec2318b9f4///ci-op-6ckhb9q1-74899-74qj8/0787_c2632870-b88a-406b-a003-3c6dfd612753  running
      openshift-machine-api  ci-op-6ckhb9q1-74899-74qj8-worker-2-xdjnf  Provisioned  bx2-4x16  eu-gb   eu-gb-2  62m                                       ibm://b9db97fa92ce47ae8dbf40ec2318b9f4///ci-op-6ckhb9q1-74899-74qj8/0797_e14c7e2e-9358-4b19-9f0a-c2400d698cf8  running
      openshift-machine-api  ci-op-6ckhb9q1-74899-74qj8-worker-3-4rnqn  Provisioned  bx2-4x16  eu-gb   eu-gb-3  62m                                       ibm://b9db97fa92ce47ae8dbf40ec2318b9f4///ci-op-6ckhb9q1-74899-74qj8/07a7_cdf11bf7-db22-424c-805a-96c09ff91061  running

      NAME                                 STATUS  ROLES                 AGE  VERSION          INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
      ci-op-6ckhb9q1-74899-74qj8-master-0  Ready   control-plane,master  67m  v1.24.0+d6a97e1  10.242.1.7    10.242.1.7    Red Hat Enterprise Linux CoreOS 412.86.202208241934-0 (Ootpa)  4.18.0-372.19.1.el8_6.x86_64  cri-o://1.25.0-42.rhaos4.12.gitaaf6efe.el8
      ci-op-6ckhb9q1-74899-74qj8-master-1  Ready   control-plane,master  66m  v1.24.0+d6a97e1  10.242.65.6   10.242.65.6   Red Hat Enterprise Linux CoreOS 412.86.202208241934-0 (Ootpa)  4.18.0-372.19.1.el8_6.x86_64  cri-o://1.25.0-42.rhaos4.12.gitaaf6efe.el8
      ci-op-6ckhb9q1-74899-74qj8-master-2  Ready   control-plane,master  67m  v1.24.0+d6a97e1  10.242.128.4  10.242.128.4  Red Hat Enterprise Linux CoreOS 412.86.202208241934-0 (Ootpa)  4.18.0-372.19.1.el8_6.x86_64  cri-o://1.25.0-42.rhaos4.12.gitaaf6efe.el8
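The two listings above show three worker machines in the Provisioned phase with an empty NODE column, and only the three masters registered as nodes. A minimal sketch of how that comparison could be automated is given here. It assumes the input is the JSON form of a MachineList (for example from `oc get machines -n openshift-machine-api -o json`) and that a machine linked to a node carries a populated `status.nodeRef`. The function name and the trimmed sample data are illustrative, not part of the original report.

```python
from typing import Dict, List, Tuple


def machines_without_nodes(machine_list: Dict) -> List[Tuple[str, str]]:
    """Return (name, phase) for each Machine that has no status.nodeRef."""
    stuck = []
    for item in machine_list.get("items", []):
        status = item.get("status", {})
        # Assumption: a Machine is linked to a Node once status.nodeRef is set.
        if not status.get("nodeRef"):
            stuck.append((item["metadata"]["name"], status.get("phase", "Unknown")))
    return stuck


# Hand-written sample shaped like a MachineList. Names and phases come from
# the listing above; every other field is trimmed for brevity.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "ci-op-6ckhb9q1-74899-74qj8-master-0"},
            "status": {
                "phase": "Running",
                "nodeRef": {"name": "ci-op-6ckhb9q1-74899-74qj8-master-0"},
            },
        },
        {
            "metadata": {"name": "ci-op-6ckhb9q1-74899-74qj8-worker-1-9k6xn"},
            "status": {"phase": "Provisioned"},
        },
    ]
}

for name, phase in machines_without_nodes(SAMPLE):
    print(f"{name}: phase={phase}, no node registered")
```

Run against the full machine list from this cluster, such a check would be expected to report the three worker machines and none of the masters.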

      However, the VSIs have been created and are reported as running.

      # ibmcloud is instances

      0787_90175449-7395-4f41-885e-348d68af0fc7  ci-op-6ckhb9q1-74899-74qj8-master-0        running  10.242.1.7    -  bx2-4x16  ci-op-6ckhb9q1-74899-74qj8-rhcos  ci-op-6ckhb9q1-74899-74qj8-vpc  eu-gb-1  ci-op-6ckhb9q1-74899-74qj8
      0787_c2632870-b88a-406b-a003-3c6dfd612753  ci-op-6ckhb9q1-74899-74qj8-worker-1-9k6xn  running  10.242.0.4    -  bx2-4x16  ci-op-6ckhb9q1-74899-74qj8-rhcos  ci-op-6ckhb9q1-74899-74qj8-vpc  eu-gb-1  ci-op-6ckhb9q1-74899-74qj8
      0797_18653039-5510-44bf-a7d6-4706bfec04d4  ci-op-6ckhb9q1-74899-74qj8-master-1        running  10.242.65.6   -  bx2-4x16  ci-op-6ckhb9q1-74899-74qj8-rhcos  ci-op-6ckhb9q1-74899-74qj8-vpc  eu-gb-2  ci-op-6ckhb9q1-74899-74qj8
      0797_e14c7e2e-9358-4b19-9f0a-c2400d698cf8  ci-op-6ckhb9q1-74899-74qj8-worker-2-xdjnf  running  10.242.64.4   -  bx2-4x16  ci-op-6ckhb9q1-74899-74qj8-rhcos  ci-op-6ckhb9q1-74899-74qj8-vpc  eu-gb-2  ci-op-6ckhb9q1-74899-74qj8
      07a7_410a543a-f17d-4897-847e-e8a05d464935  ci-op-6ckhb9q1-74899-74qj8-master-2        running  10.242.128.4  -  bx2-4x16  ci-op-6ckhb9q1-74899-74qj8-rhcos  ci-op-6ckhb9q1-74899-74qj8-vpc  eu-gb-3  ci-op-6ckhb9q1-74899-74qj8
      07a7_cdf11bf7-db22-424c-805a-96c09ff91061  ci-op-6ckhb9q1-74899-74qj8-worker-3-4rnqn  running  10.242.129.4  -  bx2-4x16  ci-op-6ckhb9q1-74899-74qj8-rhcos  ci-op-6ckhb9q1-74899-74qj8-vpc  eu-gb-3  ci-op-6ckhb9q1-74899-74qj8
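The instance IDs in this listing match the trailing segment of each machine's PROVIDERID, which is what indicates that every machine has a backing VSI. A small sketch of that cross-check follows. It assumes the provider ID always ends in `/<instance-id>`, as it does in the output above. Both helper names are hypothetical.

```python
from typing import Iterable, List


def instance_id_from_provider_id(provider_id: str) -> str:
    """Extract the trailing VSI ID from an 'ibm://.../<instance-id>' string."""
    if not provider_id.startswith("ibm://"):
        raise ValueError(f"not an IBM Cloud provider ID: {provider_id!r}")
    # Assumption: the instance ID is the last '/'-separated segment.
    instance_id = provider_id.rsplit("/", 1)[-1]
    if not instance_id:
        raise ValueError(f"provider ID has no instance segment: {provider_id!r}")
    return instance_id


def machines_missing_vsi(provider_ids: Iterable[str], vsi_ids: Iterable[str]) -> List[str]:
    """Return the provider IDs whose instance ID is absent from vsi_ids."""
    known = set(vsi_ids)
    return [p for p in provider_ids if instance_id_from_provider_id(p) not in known]


# Worker values copied from the machine and instance listings above.
PREFIX = "ibm://b9db97fa92ce47ae8dbf40ec2318b9f4///ci-op-6ckhb9q1-74899-74qj8/"
WORKER_PROVIDER_IDS = [
    PREFIX + "0787_c2632870-b88a-406b-a003-3c6dfd612753",
    PREFIX + "0797_e14c7e2e-9358-4b19-9f0a-c2400d698cf8",
    PREFIX + "07a7_cdf11bf7-db22-424c-805a-96c09ff91061",
]
LISTED_VSI_IDS = [
    "0787_c2632870-b88a-406b-a003-3c6dfd612753",
    "0797_e14c7e2e-9358-4b19-9f0a-c2400d698cf8",
    "07a7_cdf11bf7-db22-424c-805a-96c09ff91061",
]

print(machines_missing_vsi(WORKER_PROVIDER_IDS, LISTED_VSI_IDS))  # prints []
```

An empty result is consistent with the observation that the VSIs exist. It says nothing about whether they booted correctly or can be reached over the network.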

      My assumption is that the IBM Cloud resources are not fully or properly configured or deployed: perhaps the VSIs are not actually running properly, their network devices are not attached, or there is an issue with the networking to these worker nodes.

      I have already reported a similar issue to IBM Cloud VPC Network support, in which the three worker nodes cannot be reached by any network means, even with the most lenient network traffic restrictions. That is likely the case here as well.
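One way to gather evidence for the unreachable-worker theory is a plain TCP connection probe against the worker addresses. The sketch below is only illustrative. It assumes it runs from a host inside the same VPC, and the ports chosen (22 for SSH, 10250 for the kubelet) are assumptions, not values from this report. The function names are hypothetical.

```python
import socket
from typing import Dict, Iterable, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def probe_all(
    hosts: Iterable[str], ports: Iterable[int], timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Probe every host/port pair and map (host, port) to reachability."""
    port_list = list(ports)
    return {(h, p): tcp_reachable(h, p, timeout) for h in hosts for p in port_list}


# Worker addresses taken from the instance listing above.
WORKER_IPS = ["10.242.0.4", "10.242.64.4", "10.242.129.4"]
# Assumed ports: 22 (SSH) and 10250 (kubelet). Adjust to the environment.
PORTS = [22, 10250]
```

Calling `probe_all(WORKER_IPS, PORTS)` returns one boolean per address and port. A result of `False` everywhere would match the behavior described above, although it cannot distinguish between a VSI that never booted and a network path that drops the traffic.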

            People

              jeffbnowickirh Jeff Nowicki
              cjschaef@us.ibm.com Christopher Schaefer (Inactive)
              Zhaohua Sun Zhaohua Sun