OpenShift Bugs / OCPBUGS-34667

Azure HostedClusters failing to complete due to cluster-storage-operator


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.17.0
    • 4.17.0
    • HyperShift
    • Important
    • No
    • Hypershift Sprint 255
    • 1
    • Proposed
    • False

      Description of problem:

      Azure HostedClusters fail to complete rollout on OCP 4.17 because the storage cluster operator, managed by the cluster-storage-operator, never becomes available.
      - lastTransitionTime: "2024-05-29T19:58:39Z"
        message: 'Unable to apply 4.17.0-0.nightly-multi-2024-05-29-121923: the cluster operator storage is not available'
        observedGeneration: 2
        reason: ClusterOperatorNotAvailable
        status: "True"
        type: ClusterVersionProgressing

      I0529 20:05:21.547544       1 status_controller.go:218] clusteroperator/storage diff {"status":{"conditions":[{"lastTransitionTime":"2024-05-29T20:02:00Z","message":"AzureDiskCSIDriverOperatorCRDegraded: AzureDiskDriverGuestStaticResourcesControllerDegraded: \"node_service.yaml\" (string): namespaces \"clusters-test-case4\" not found\nAzureDiskCSIDriverOperatorCRDegraded: AzureDiskDriverGuestStaticResourcesControllerDegraded: ","reason":"AzureDiskCSIDriverOperatorCR_AzureDiskDriverGuestStaticResourcesController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2024-05-29T20:04:15Z","message":"AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods","reason":"AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying","status":"True","type":"Progressing"},{"lastTransitionTime":"2024-05-29T19:59:00Z","message":"AzureDiskCSIDriverOperatorCRAvailable: AzureDiskDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service","reason":"AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying","status":"False","type":"Available"},{"lastTransitionTime":"2024-05-29T19:59:00Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"},{"lastTransitionTime":"2024-05-29T19:59:00Z","reason":"NoData","status":"Unknown","type":"EvaluationConditionsDetected"}]}}

      I0529 20:05:21.566215       1 event.go:364] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-storage-operator", Name:"azure-cloud-controller-manager", UID:"205a4307-67e4-481e-9fee-975b2c5c40fb", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/storage changed: Progressing message changed from "AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods\nAzureFileCSIDriverOperatorCRProgressing: AzureFileDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods" to "AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
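For triage, the conditions in the logged status diff can be extracted and filtered for unhealthy states. The following is a minimal Python sketch and not part of the original report: the helper name `unhealthy_conditions` is hypothetical, the sample payload is an abbreviated copy of the log (messages and timestamps omitted), and the "healthy" mapping assumes the usual ClusterOperator convention of Available=True, Degraded=False, Progressing=False.

```python
import json

# Assumed healthy status per ClusterOperator condition type (usual convention).
HEALTHY = {"Available": "True", "Degraded": "False", "Progressing": "False"}


def unhealthy_conditions(diff_json: str) -> list[tuple[str, str, str]]:
    """Return (type, status, reason) for each condition not in its healthy state."""
    conditions = json.loads(diff_json)["status"]["conditions"]
    return [
        (c["type"], c["status"], c.get("reason", ""))
        for c in conditions
        if c["type"] in HEALTHY and c["status"] != HEALTHY[c["type"]]
    ]


# Abbreviated sample: types, statuses, and reasons copied from the logged diff.
SAMPLE = json.dumps({"status": {"conditions": [
    {"type": "Degraded", "status": "True",
     "reason": "AzureDiskCSIDriverOperatorCR_AzureDiskDriverGuestStaticResourcesController_SyncError"},
    {"type": "Progressing", "status": "True",
     "reason": "AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying"},
    {"type": "Available", "status": "False",
     "reason": "AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying"},
    {"type": "Upgradeable", "status": "True", "reason": "AsExpected"},
    {"type": "EvaluationConditionsDetected", "status": "Unknown", "reason": "NoData"},
]}})

if __name__ == "__main__":
    for ctype, status, reason in unhealthy_conditions(SAMPLE):
        print(f"{ctype}={status} ({reason})")
```

Condition types outside the assumed mapping (here `Upgradeable` and `EvaluationConditionsDetected`) are ignored rather than guessed at.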
      

       

      On the HostedCluster itself, the CSI driver pods fail to come up with the following errors:

      % k describe pod/azure-disk-csi-driver-node-5hb24 -n openshift-cluster-csi-drivers | grep fail
          Liveness:  http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
          Liveness:     http-get http://:rhealthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
        Warning  FailedMount  2m (x28 over 42m)  kubelet            MountVolume.SetUp failed for volume "metrics-serving-cert" : secret "azure-disk-csi-driver-node-metrics-serving-cert" not found  
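The FailedMount event above names both the volume and the secret that kubelet could not find. As a rough illustration (not from the original report), the Python sketch below pulls those names out of `describe pod` output; the helper `missing_secrets` and the regular expression are assumptions based only on the message format shown in this issue.

```python
import re

# Pattern modelled on the kubelet FailedMount message quoted in this issue.
MISSING_SECRET = re.compile(
    r'MountVolume\.SetUp failed for volume "([^"]+)" : secret "([^"]+)" not found'
)


def missing_secrets(describe_output: str) -> list[tuple[str, str]]:
    """Return (volume, secret) pairs for every missing-secret mount failure."""
    return MISSING_SECRET.findall(describe_output)


# Event line copied from the pod description above.
EVENT = (
    'Warning  FailedMount  2m (x28 over 42m)  kubelet            '
    'MountVolume.SetUp failed for volume "metrics-serving-cert" : '
    'secret "azure-disk-csi-driver-node-metrics-serving-cert" not found'
)

if __name__ == "__main__":
    for volume, secret in missing_secrets(EVENT):
        print(f"volume {volume} is waiting on secret {secret}")
```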

      The storage ClusterOperator (CO) also reports an error (columns: NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE):

      storage                                    4.17.0-0.nightly-multi-2024-05-29-121923   False       True          True       49m     AzureDiskCSIDriverOperatorCRAvailable: AzureDiskDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service  
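The ClusterOperator row above is whitespace-separated, with a free-text message in the last column. A small Python sketch, offered only as an illustration, splits such a row into named fields; the function name `parse_co_row` and the field names are hypothetical and assume the seven-column layout of the row shown here.

```python
# Field names assumed from the seven columns visible in the row above.
FIELDS = ("name", "version", "available", "progressing", "degraded", "since", "message")


def parse_co_row(row: str) -> dict[str, str]:
    """Split one ClusterOperator row into named fields, keeping the message intact."""
    # Split on runs of whitespace at most six times so the message stays whole.
    parts = row.strip().split(None, len(FIELDS) - 1)
    return dict(zip(FIELDS, parts))


# Row copied from the report.
ROW = (
    "storage                                    4.17.0-0.nightly-multi-2024-05-29-121923   "
    "False       True          True       49m     "
    "AzureDiskCSIDriverOperatorCRAvailable: AzureDiskDriverNodeServiceControllerAvailable: "
    "Waiting for the DaemonSet to deploy the CSI Node Service  "
)

if __name__ == "__main__":
    co = parse_co_row(ROW)
    print(co["name"], co["available"], co["progressing"], co["degraded"])
```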

       

      Version-Release number of selected component (if applicable):

          4.17

      How reproducible:

          Every time

      Steps to Reproduce:

          1. Create an Azure HostedCluster (HC) with a 4.17 nightly release
          

      Actual results:

          The Azure HC never completes, although nodes do join the NodePool.

      Expected results:

          The Azure HC should complete.

      Additional info:

          

            fxierh Feilian Xie
            rh-ee-brcox Bryan Cox
            Feilian Xie Feilian Xie
            Hemant Kumar, Jie Zhao, Rohit Patil