  1. OpenShift Bugs
  2. OCPBUGS-23120

[IBM ROKS] cluster-storage-operator does not set upgradeable=True


    • Critical
    • Yes
    • False
    • NA
    • Release Note Not Required
    • In Progress

      Description of problem:

      On IBM ROKS (managed service) clusters running 4.14+, cluster-storage-operator never sets the upgradeable=True condition, so the storage ClusterOperator reports it as Unknown:
      
        - lastTransitionTime: "2023-11-08T19:07:01Z"
          reason: NoData
          status: Unknown
          type: Upgradeable
      
      This is a regression from 4.13.
      
      In 4.13, pkg/operator/snapshotcrd/controller.go was the controller that set `upgradeable: True`:
      
          upgradeable := operatorapi.OperatorCondition{
              Type:   conditionsPrefix + operatorapi.OperatorStatusTypeUpgradeable,
              Status: operatorapi.ConditionTrue,
          }
      
      In the 4.13 bundle from IBM ROKS, these two conditions are set in cluster-scoped-resources/operator.openshift.io/storages/cluster.yaml
      
        - lastTransitionTime: "2023-11-08T14:22:21Z"
          status: "True"
          type: SnapshotCRDControllerUpgradeable
        - lastTransitionTime: "2023-11-08T14:22:21Z"
          reason: AsExpected
          status: "False"
          type: SnapshotCRDControllerDegraded
      
      So the SnapshotCRDController is running and sets `upgradeable: True` on 4.13.
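      The relationship between the prefixed operator conditions (such as SnapshotCRDControllerUpgradeable) and the ClusterOperator status shown at the top can be modelled roughly as below. This is a minimal sketch, not the operator's code: the `condition` type, the `unionUpgradeable` function, and the `SubConditionNotTrue` reason are hypothetical names, and the rule "match by type suffix, report Unknown/NoData when nothing matches" is an assumption inferred from the observed output.

```go
package main

import (
	"fmt"
	"strings"
)

// condition is a hypothetical, reduced stand-in for an operator condition.
type condition struct {
	Type   string
	Status string
	Reason string
}

// unionUpgradeable folds every condition whose type ends in "Upgradeable"
// into one summary condition. Assumption: with no such condition present,
// the summary is Unknown with reason NoData, as seen on 4.14 ROKS.
func unionUpgradeable(conds []condition) condition {
	matched, allTrue := false, true
	for _, c := range conds {
		if !strings.HasSuffix(c.Type, "Upgradeable") {
			continue
		}
		matched = true
		if c.Status != "True" {
			allTrue = false
		}
	}
	switch {
	case !matched:
		return condition{Type: "Upgradeable", Status: "Unknown", Reason: "NoData"}
	case allTrue:
		return condition{Type: "Upgradeable", Status: "True", Reason: "AsExpected"}
	default:
		// Hypothetical reason string, for illustration only.
		return condition{Type: "Upgradeable", Status: "False", Reason: "SubConditionNotTrue"}
	}
}

func main() {
	// 4.13: SnapshotCRDController contributes an Upgradeable condition.
	v413 := []condition{
		{Type: "SnapshotCRDControllerUpgradeable", Status: "True"},
		{Type: "SnapshotCRDControllerDegraded", Status: "False", Reason: "AsExpected"},
	}
	// 4.14 on ROKS: no controller contributes an Upgradeable condition.
	v414 := []condition{
		{Type: "DefaultStorageClassControllerAvailable", Status: "True", Reason: "AsExpected"},
	}
	fmt.Printf("4.13: %+v\n", unionUpgradeable(v413))
	fmt.Printf("4.14: %+v\n", unionUpgradeable(v414))
}
```

      Under this model, removing the only controller that contributes an Upgradeable condition is enough to move the summary from True to Unknown.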
      
      But in the 4.14 bundle, SnapshotCRDController no longer exists.
      
      https://github.com/openshift/cluster-storage-operator/pull/385/commits/fa9af3aad65b9d0e9c618453825e4defeaad59ac
      
      So in 4.14+ it's pkg/operator/defaultstorageclass/controller.go that should set the condition
      
      https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/defaultstorageclass/controller.go#L97-L100
      
      But that only happens if `syncErr == unsupportedPlatformError`,
      and not if `syncErr == supportedByCSIError`, which is the case with the IBM VPC driver:
      
        - lastTransitionTime: "2023-11-08T14:22:23Z"
          message: 'DefaultStorageClassControllerAvailable: StorageClass provided by supplied
            CSI Driver instead of the cluster-storage-operator'
          reason: AsExpected
          status: "True"
          type: Available
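      The branch behaviour described above can be sketched as follows. This is a simplified model based only on what this report states, not a copy of pkg/operator/defaultstorageclass/controller.go: the `condition` type and the `conditionsFor` and `hasUpgradeable` helpers are hypothetical, and the sentinel errors are local stand-ins for the ones named in the report.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Local stand-ins for the sentinel errors named in the report.
var (
	unsupportedPlatformError = errors.New("unsupported platform")
	supportedByCSIError      = errors.New("storage class provided by CSI driver")
)

// condition is a hypothetical, reduced stand-in for an operator condition.
type condition struct {
	Type   string
	Status string
}

const prefix = "DefaultStorageClassController"

// conditionsFor models which conditions each sync outcome produces,
// according to the report: only the unsupported-platform branch adds
// an Upgradeable condition.
func conditionsFor(syncErr error) []condition {
	conds := []condition{{Type: prefix + "Available", Status: "True"}}
	if syncErr == unsupportedPlatformError {
		conds = append(conds, condition{Type: prefix + "Upgradeable", Status: "True"})
	}
	// The supportedByCSIError branch (IBM VPC) reports Available only.
	return conds
}

// hasUpgradeable reports whether any condition type ends in "Upgradeable".
func hasUpgradeable(conds []condition) bool {
	for _, c := range conds {
		if strings.HasSuffix(c.Type, "Upgradeable") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("unsupported platform:", hasUpgradeable(conditionsFor(unsupportedPlatformError)))
	fmt.Println("supported by CSI:    ", hasUpgradeable(conditionsFor(supportedByCSIError)))
}
```

      One possible direction, offered only as an assumption and not as the agreed fix, would be for the `supportedByCSIError` branch to add the same Upgradeable condition as the unsupported-platform branch.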
      
      So what controller will set `upgradeable: True` for IBM VPC?
      IBM VPC uses this StatusFilter function for ROKS:
      
      https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/csidriveroperator/csioperatorclient/ibm-vpc-block.go#L17-L27
      
      ROKS and AzureStack are the only deployments that use a StatusFilter function.
      On ROKS, shouldRunController therefore returns false here:
      
      https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/csidriveroperator/driver_starter.go#L347-L349
      
      This means that no controller is left to set `upgradeable: True`.
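      The effect of the StatusFilter on ROKS can be illustrated with a small model. This is a sketch under stated assumptions: `infraStatus`, its `IsROKS` field, `csiOperatorConfig`, and `ibmVPCBlockConfig` are hypothetical reductions, the real StatusFilter signature and ROKS detection are not reproduced here, and the real shouldRunController performs additional checks that are omitted.

```go
package main

import "fmt"

// infraStatus is a hypothetical, reduced view of the cluster infrastructure.
// How ROKS is actually detected is collapsed into a single boolean.
type infraStatus struct {
	IsROKS bool
}

// csiOperatorConfig is a hypothetical, reduced driver operator config.
// A nil StatusFilter means the controller is never filtered out.
type csiOperatorConfig struct {
	Name         string
	StatusFilter func(infraStatus) bool
}

// shouldRunController models only the StatusFilter check discussed above.
func shouldRunController(cfg csiOperatorConfig, status infraStatus) bool {
	if cfg.StatusFilter != nil && !cfg.StatusFilter(status) {
		return false
	}
	return true
}

// ibmVPCBlockConfig models a driver whose filter rejects ROKS clusters.
func ibmVPCBlockConfig() csiOperatorConfig {
	return csiOperatorConfig{
		Name:         "ibm-vpc-block",
		StatusFilter: func(s infraStatus) bool { return !s.IsROKS },
	}
}

func main() {
	cfg := ibmVPCBlockConfig()
	fmt.Println("ROKS:    ", shouldRunController(cfg, infraStatus{IsROKS: true}))
	fmt.Println("non-ROKS:", shouldRunController(cfg, infraStatus{IsROKS: false}))
}
```

      In this model the driver controller is skipped on ROKS, so any Upgradeable condition it would have contributed is never written.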

      Version-Release number of selected component (if applicable):

      4.14.0+

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install 4.14 via IBM ROKS
      2. Check status conditions in cluster-scoped-resources/config.openshift.io/clusteroperators/storage.yaml
      

      Actual results:

      upgradeable=Unknown

      Expected results:

      upgradeable=True

      Additional info:

      4.13 IBM ROKS must-gather:
      https://github.com/Joseph-Goergen/ibm-roks-toolkit/releases/download/test/must-gather-4.13.tar.gz
      
      4.14 IBM ROKS must-gather: 
      https://github.com/Joseph-Goergen/ibm-roks-toolkit/releases/download/test/must-gather.tar.gz

            jdobson@redhat.com Jonathan Dobson
            jdobson@redhat.com Jonathan Dobson
            Chao Yang Chao Yang