OpenShift Bugs / OCPBUGS-539

[IBMCloud] Storage node labeler fails to label node for csi-driver

       Description of problem:
      The IBM Storage driver fails to run on nodes that do not have the proper labels.
      https://github.com/openshift/ibm-vpc-block-csi-driver

      The IBM VPC Node label updater is responsible for adding one such label, and that label is missing on the affected nodes.
      https://github.com/openshift/ibm-vpc-node-label-updater

      As a result, the storage service does not function and cluster creation fails.

      Version-Release number of selected component (if applicable):
      4.12

      How reproducible:
      Infrequent (exact rate unknown at this time)

      Steps to Reproduce:
      1. Create an IPI cluster on IBM Cloud

      Actual results:
      Failed cluster creation, waiting for storage operator to report healthy

      level=error msg=Cluster operator storage Available is False with IBMVPCBlockCSIDriverOperatorCR_IBMBlockDriverControllerServiceController_Deploying: IBMVPCBlockCSIDriverOperatorCRAvailable: IBMBlockDriverControllerServiceControllerAvailable: Waiting for Deployment

      Expected results:
      Successful cluster creation

      Additional info:
      I have seen this occur twice recently, but only have details for one of the failures, which happened during CI testing. Those details can be found in the Prow build:
      https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/6056/pull-ci-openshift-installer-master-e2e-ibmcloud/1562148413049409536

      Primarily, the controller pod (vpc-block-driver container) was failing with the fatal error shown after this log link:
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/6056/pull-ci-openshift-installer-master-e2e-ibmcloud/1562148413049409536/artifacts/e2e-ibmcloud/gather-extra/artifacts/pods/openshift-cluster-csi-drivers_ibm-vpc-block-csi-controller-6dc5f55d87-47vxh_iks-vpc-block-driver.log

      {"level":"info","timestamp":"2022-08-23T21:34:07.951Z","caller":"ibmcsidriver/ibm_csi_driver.go:109","msg":"Successfully setup IBM CSI driver","name":"ibm-vpc-block-csi-driver","CSIDriverName":"IBM VPC block driver"}

      {"level":"fatal","timestamp":"2022-08-23T21:34:07.969Z","caller":"cmd/main.go:110","msg":"Failed to initialize driver...","name":"ibm-vpc-block-csi-driver","CSIDriverName":"IBM VPC block driver","error":"Controller_Helper: Failed to initialize node metadata: error: One or few required node label(s) is/are missing [ibm-cloud.kubernetes.io/worker-id, failure-domain.beta.kubernetes.io/region, failure-domain.beta.kubernetes.io/zone]. Node Labels Found = [#map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/instance-type:bx2-4x16 beta.kubernetes.io/os:linux failure-domain.beta.kubernetes.io/region:eu-gb failure-domain.beta.kubernetes.io/zone:eu-gb-3 kubernetes.io/arch:amd64 kubernetes.io/hostname:ci-op-d2gzpmty-74899-zg8t6-worker-3-ktlcd kubernetes.io/os:linux node-role.kubernetes.io/worker: node.kubernetes.io/instance-type:bx2-4x16 node.openshift.io/os_id:rhcos topology.kubernetes.io/region:eu-gb topology.kubernetes.io/zone:eu-gb-3]]"}

      Of the three required labels, only "ibm-cloud.kubernetes.io/worker-id" was actually missing from the node. That label is added by the ibm-vpc-node-label-updater:
      https://github.com/openshift/ibm-vpc-node-label-updater/blob/64c1820764f8a7065b03b08a70673b8c125876c1/pkg/nodeupdater/node_label.go#L49
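      The driver's error lists three required labels, but two of them do appear in the "Node Labels Found" map. A minimal sketch of that comparison, using the label keys and values copied from the log entry above (the function name missing_labels is made up for illustration and is not taken from the driver's code):

```python
# Required label keys, as listed in the driver's fatal log message.
REQUIRED_LABELS = [
    "ibm-cloud.kubernetes.io/worker-id",
    "failure-domain.beta.kubernetes.io/region",
    "failure-domain.beta.kubernetes.io/zone",
]


def missing_labels(node_labels, required=REQUIRED_LABELS):
    """Return the required label keys that are absent from node_labels."""
    return [key for key in required if key not in node_labels]


# Labels reported under "Node Labels Found" in the same log message.
found = {
    "beta.kubernetes.io/arch": "amd64",
    "beta.kubernetes.io/instance-type": "bx2-4x16",
    "beta.kubernetes.io/os": "linux",
    "failure-domain.beta.kubernetes.io/region": "eu-gb",
    "failure-domain.beta.kubernetes.io/zone": "eu-gb-3",
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/hostname": "ci-op-d2gzpmty-74899-zg8t6-worker-3-ktlcd",
    "kubernetes.io/os": "linux",
    "node-role.kubernetes.io/worker": "",
    "node.kubernetes.io/instance-type": "bx2-4x16",
    "node.openshift.io/os_id": "rhcos",
    "topology.kubernetes.io/region": "eu-gb",
    "topology.kubernetes.io/zone": "eu-gb-3",
}

print(missing_labels(found))  # ['ibm-cloud.kubernetes.io/worker-id']
```

      Only the worker-id label is reported, which is consistent with the node label updater being the component that failed.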

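      The label updater was failing because it could not obtain credentials: its request for an IAM access token timed out on the DNS lookup of iam.cloud.ibm.com.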
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/6056/pull-ci-openshift-installer-master-e2e-ibmcloud/1562148413049409536/artifacts/e2e-ibmcloud/gather-extra/artifacts/pods/openshift-cluster-csi-drivers_ibm-vpc-block-csi-node-cg5bw_vpc-node-label-updater.log

      {"level":"info","timestamp":"2022-08-23T21:33:59.095Z","caller":"nodeupdater/utils.go:158","msg":"parsing conf file","watcher-name":"vpc-node-label-updater","confpath":"/etc/storage_ibmc/slclient.toml"}

      {"level":"error","timestamp":"2022-08-23T21:34:29.096Z","caller":"nodeupdater/utils.go:96","msg":"Failed to Get IAM access token","watcher-name":"vpc-node-label-updater","error":"Post \"https://iam.cloud.ibm.com/oidc/token\": dial tcp: lookup iam.cloud.ibm.com: i/o timeout"}

      {"level":"fatal","timestamp":"2022-08-23T21:34:29.096Z","caller":"cmd/main.go:140","msg":"Failed to read secret configuration from storage secret present in the cluster ","watcher-name":"vpc-node-label-updater","error":"Post \"https://iam.cloud.ibm.com/oidc/token\": dial tcp: lookup iam.cloud.ibm.com: i/o timeout"}
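      The updater log is a stream of JSON objects, and several objects can end up on the same line. A rough sketch for pulling out only the error and fatal entries from such a log is shown here. The helper names are hypothetical, and the sample entries are abbreviated copies of the log lines above (the watcher-name field is dropped for brevity):

```python
import json


def iter_json_objects(text):
    """Yield each JSON object in text, even when several share one line."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace itself
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def failures(text):
    """Return (level, caller, msg) for every error or fatal entry."""
    return [
        (entry["level"], entry["caller"], entry["msg"])
        for entry in iter_json_objects(text)
        if entry.get("level") in ("error", "fatal")
    ]


# Abbreviated entries from the vpc-node-label-updater log.
sample = (
    r'{"level":"info","timestamp":"2022-08-23T21:33:59.095Z",'
    r'"caller":"nodeupdater/utils.go:158","msg":"parsing conf file"}'
    "\n"
    r'{"level":"error","timestamp":"2022-08-23T21:34:29.096Z",'
    r'"caller":"nodeupdater/utils.go:96","msg":"Failed to Get IAM access token",'
    r'"error":"Post \"https://iam.cloud.ibm.com/oidc/token\": dial tcp: '
    r'lookup iam.cloud.ibm.com: i/o timeout"} '
    r'{"level":"fatal","timestamp":"2022-08-23T21:34:29.096Z",'
    r'"caller":"cmd/main.go:140","msg":"Failed to read secret configuration '
    r'from storage secret present in the cluster "}'
)

for level, caller, msg in failures(sample):
    print(level, caller, msg)
```

      The 30-second gap between the info entry (21:33:59) and the error entry (21:34:29) suggests the lookup simply timed out rather than being rejected.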

      The storage secret (storage-secret-store) appears to have been created multiple times by the storage operator, which could be the issue:
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/6056/pull-ci-openshift-installer-master-e2e-ibmcloud/1562148413049409536/artifacts/e2e-ibmcloud/gather-extra/artifacts/pods/openshift-cluster-csi-drivers_ibm-vpc-block-csi-driver-operator-97ccb4f8c-9jsqs_ibm-vpc-block-csi-driver-operator.log
      I0823 20:41:24.863653 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:41:25.711470 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:41:26.799574 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:41:27.722871 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:42:39.697423 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:42:58.094318 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:42:58.921656 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:42:59.793962 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:43:45.413511 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:43:46.469002 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:51:39.789506 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:53:44.446652 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 20:53:45.449387 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 21:03:44.473388 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 21:03:45.360558 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 21:13:44.428195 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 21:13:45.388330 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 21:23:44.369963 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 21:23:45.321491 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 21:33:44.301093 1 secretsync.go:125] storage-secret-store secret created successfully
      I0823 21:33:45.276650 1 secretsync.go:125] storage-secret-store secret created successfully
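      To see how often the operator reports creating the secret, the klog lines above can be bucketed by minute. The following is only an illustrative sketch: the regular expression is an assumption about the line layout seen in this log, the function name is made up, and the sample uses the first five lines from the excerpt.

```python
import re
from collections import Counter

# Assumed layout: <severity><MMDD> <HH>:<MM>:<SS>.<micros> <pid> <file:line>] <msg>
KLOG_LINE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d+)\s+\d+\s+(\S+)\] (.*)$"
)


def creations_per_minute(lines, needle="secret created successfully"):
    """Count matching log lines per HH:MM bucket."""
    counts = Counter()
    for line in lines:
        match = KLOG_LINE.match(line.strip())
        if match and needle in match.group(7):
            counts[f"{match.group(2)}:{match.group(3)}"] += 1
    return counts


sample = [
    "I0823 20:41:24.863653 1 secretsync.go:125] storage-secret-store secret created successfully",
    "I0823 20:41:25.711470 1 secretsync.go:125] storage-secret-store secret created successfully",
    "I0823 20:41:26.799574 1 secretsync.go:125] storage-secret-store secret created successfully",
    "I0823 20:41:27.722871 1 secretsync.go:125] storage-secret-store secret created successfully",
    "I0823 20:42:39.697423 1 secretsync.go:125] storage-secret-store secret created successfully",
]

for minute, count in sorted(creations_per_minute(sample).items()):
    print(minute, count)
# 20:41 4
# 20:42 1
```

      In the full excerpt, the later entries arrive in pairs roughly every ten minutes (20:53, 21:03, 21:13, 21:23, 21:33), which looks like a periodic resync rather than a one-time creation.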

              arahamad Arashad Ahamad (Inactive)
              cjschaef@us.ibm.com Christopher Schaefer (Inactive)
              Chao Yang Chao Yang