OpenShift Bugs / OCPBUGS-4123

[Alibaba 4.11.0-0.nightly] cluster storage component in Progressing state

    • Bug
    • Resolution: Done
    • 4.10.z
    • Storage / Operators

      This bug is a backport clone of [Bugzilla Bug 2069075](https://bugzilla.redhat.com/show_bug.cgi?id=2069075). The following is the description of the original bug:

      Description of problem:
      On an Alibaba cluster, the storage component is stuck in the Progressing state with the error:
      "AlibabaDiskCSIDriverOperatorCRAvailable: AlibabaCloudDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service"

      Version-Release number of selected component (if applicable):
      4.11.0-0.nightly-2022-03-27-140854

      How reproducible:
      Always

      Steps to Reproduce:
      1. Install the cluster via a Flexy job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/89082/console
      2. Check the cluster version and the status of the cluster components.

      Actual results:
      rohitpatil@ropatil-mac Downloads % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.11.0-0.nightly-2022-03-27-140854   True        False         3h17m   Error while reconciling 4.11.0-0.nightly-2022-03-27-140854: the cluster operator storage has not yet successfully rolled out

      rohitpatil@ropatil-mac Downloads % oc get co storage
      NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      storage   4.11.0-0.nightly-2022-03-27-140854   False       True          False      8m17s   AlibabaDiskCSIDriverOperatorCRAvailable: AlibabaCloudDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service

      The CSI driver node pods are in CrashLoopBackOff status:
        NAME                                                 READY   STATUS             RESTARTS          AGE
        alibaba-disk-csi-driver-controller-8db6d86ff-bk8bv   10/10   Running            0                 3h42m
        alibaba-disk-csi-driver-controller-8db6d86ff-zfmhr   10/10   Running            0                 3h41m
        alibaba-disk-csi-driver-node-glmb4                   2/3     CrashLoopBackOff   46 (5m4s ago)     3h36m
        alibaba-disk-csi-driver-node-hq57z                   2/3     CrashLoopBackOff   48 (69s ago)      3h41m
        alibaba-disk-csi-driver-node-ncgrf                   2/3     CrashLoopBackOff   48 (84s ago)      3h41m
        alibaba-disk-csi-driver-node-rg7q8                   2/3     CrashLoopBackOff   47 (52s ago)      3h37m
        alibaba-disk-csi-driver-node-t47c8                   2/3     CrashLoopBackOff   48 (36s ago)      3h41m
        alibaba-disk-csi-driver-node-xbcs2                   2/3     CrashLoopBackOff   46 (2m17s ago)    3h33m
        alibaba-disk-csi-driver-operator-67d49bd48c-tlr94    1/1     Running            0                 3h42m
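
When triaging a listing like the one above, it can help to filter out the healthy pods mechanically. The following is a minimal sketch, not part of the original report: it assumes the pod listing has been saved as plain text, and the function name `unready_pods` and the `SAMPLE` excerpt are illustrative only. Only the first three whitespace-separated fields are read, because the RESTARTS column can itself contain spaces (for example `46 (5m4s ago)`).

```python
def unready_pods(listing: str) -> list[tuple[str, str, str]]:
    """Return (name, ready, status) for pods that are not fully ready or not Running."""
    rows = []
    # Skip the header line (NAME READY STATUS ...).
    for line in listing.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        name, ready, status = fields[:3]
        # READY has the form "<ready containers>/<total containers>".
        have, _, want = ready.partition("/")
        if have != want or status != "Running":
            rows.append((name, ready, status))
    return rows


# Excerpt of the listing from this report (hypothetical variable name).
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
alibaba-disk-csi-driver-controller-8db6d86ff-bk8bv 10/10 Running 0 3h42m
alibaba-disk-csi-driver-node-glmb4 2/3 CrashLoopBackOff 46 (5m4s ago) 3h36m
alibaba-disk-csi-driver-operator-67d49bd48c-tlr94 1/1 Running 0 3h42m
"""

if __name__ == "__main__":
    for name, ready, status in unready_pods(SAMPLE):
        print(name, ready, status)
```

Applied to the full listing, this would report only the six `alibaba-disk-csi-driver-node-*` pods, each with one of three containers not ready.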

      CSI driver logs:
      W0328 08:12:03.133236 301365 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
      time="2022-03-28T08:12:03Z" level=info msg="Not found configmap named as csi-plugin under kube-system, with: configmaps \"csi-plugin\" is forbidden: User \"system:serviceaccount:openshift-cluster-csi-drivers:alibaba-disk-csi-driver-node-sa\" cannot get resource \"configmaps\" in API group \"\" in the namespace \"kube-system\""
      time="2022-03-28T08:12:03Z" level=info msg="AD-Controller is enabled by Env(true), CSI Disk Plugin running in AD Controller mode."
      time="2022-03-28T08:12:03Z" level=info msg="AD-Controller is enabled, CSI Disk Plugin running in AD Controller mode."
      time="2022-03-28T08:12:03Z" level=error msg="Describe node wduan-0328a-al-pd7bp-master-1 with error: nodes \"wduan-0328a-al-pd7bp-master-1\" is forbidden: User \"system:serviceaccount:openshift-cluster-csi-drivers:alibaba-disk-csi-driver-node-sa\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"
      time="2022-03-28T08:12:03Z" level=info msg="Starting with GlobalConfigVar: region(us-east-1), NodeID(i-0xihk9cwhslvbcw1qi2o), ADControllerEnable(true), DiskTagEnable(false), DiskBdfEnable(false), MetricEnable(true), RunTimeClass(runc), DetachDisabled(false), DetachBeforeDelete(true), ClusterID()"
      time="2022-03-28T08:12:03Z" level=info msg="NewNodeServer: MAX_VOLUMES_PERNODE is set to(not default): 15"
      W0328 08:12:03.151204 301365 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
      time="2022-03-28T08:12:03Z" level=fatal msg="[IsVFNode] lspci -D: cmd: lspci, stdout: , stderr: , err: exec: \"lspci\": executable file not found in $PATH"
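
The log above mixes two kinds of problems: RBAC "forbidden" messages logged at info and error level, and a single fatal-level entry about a missing `lspci` executable. The sketch below is one way to separate them when reading a longer log; it is an illustration under stated assumptions, not tooling from the original report. The function name `summarize`, the regular expressions, and the `SAMPLE` variable are hypothetical, and the patterns are written only against the line shapes visible in this excerpt.

```python
import re

# Patterns match both escaped (\") and plain (") quotes, as seen in the log.
LEVEL = re.compile(r'level=(\w+)')
FORBIDDEN = re.compile(r'cannot get resource \\?"(\w+)\\?"')
MISSING_BIN = re.compile(r'exec: \\?"([^"\\]+)\\?": executable file not found')


def summarize(log: str) -> dict:
    """Collect forbidden resources, missing executables, and fatal lines."""
    summary = {"forbidden": [], "missing_binaries": [], "fatal": []}
    for line in log.splitlines():
        match = LEVEL.search(line)
        level = match.group(1) if match else None  # klog lines carry no level=
        for resource in FORBIDDEN.findall(line):
            if resource not in summary["forbidden"]:
                summary["forbidden"].append(resource)
        for binary in MISSING_BIN.findall(line):
            if binary not in summary["missing_binaries"]:
                summary["missing_binaries"].append(binary)
        if level == "fatal":
            summary["fatal"].append(line.strip())
    return summary


# Three lines copied verbatim from the log in this report.
SAMPLE = r'''
time="2022-03-28T08:12:03Z" level=info msg="Not found configmap named as csi-plugin under kube-system, with: configmaps \"csi-plugin\" is forbidden: User \"system:serviceaccount:openshift-cluster-csi-drivers:alibaba-disk-csi-driver-node-sa\" cannot get resource \"configmaps\" in API group \"\" in the namespace \"kube-system\""
time="2022-03-28T08:12:03Z" level=error msg="Describe node wduan-0328a-al-pd7bp-master-1 with error: nodes \"wduan-0328a-al-pd7bp-master-1\" is forbidden: User \"system:serviceaccount:openshift-cluster-csi-drivers:alibaba-disk-csi-driver-node-sa\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"
time="2022-03-28T08:12:03Z" level=fatal msg="[IsVFNode] lspci -D: cmd: lspci, stdout: , stderr: , err: exec: \"lspci\": executable file not found in $PATH"
'''

if __name__ == "__main__":
    result = summarize(SAMPLE)
    print("forbidden resources:", result["forbidden"])
    print("missing executables:", result["missing_binaries"])
    print("fatal lines:", len(result["fatal"]))
```

In this excerpt the only fatal-level entry is the `lspci` lookup failure, which suggests it is the message most directly tied to the container exiting; the forbidden `configmaps` and `nodes` requests are logged at lower severity.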

      Expected results:
      Storage component should not be in Progressing state.

            rhn-engineering-jsafrane Jan Safranek
            openshift-crt-jira-prow OpenShift Prow Bot
            Rohit Patil Rohit Patil