OCPBUGS-45121 (OpenShift Bugs)

[ROSA HCP] Ceph monitors fail to start; storage deployment fails

    • Important
    • Hypershift Sprint 264

      Description of problem:

      Root-cause analysis by the ODF team suggests that the issue originated in the ROSA OCP deployment stage.
      Triggered by another issue: https://issues.redhat.com/browse/DFBUGS-946
      Must-gather logs are linked below.
      
      Comment 1
      
      The Rook mon pods are stuck because their PVCs are not being provisioned by `ebs.csi.aws.com`.
      From the logs (event on the rook-ceph-mon-a PVC):
      ```
      (combined from similar events): failed to provision volume with StorageClass "gp3-csi": rpc error: code = Internal desc = Could not create volume "pvc-b7ba63c5-62a8-4316-aa4b-d55f6348fa78": could not create volume in EC2: operation error EC2: CreateVolume, https response error StatusCode: 401, RequestID: 1e7c3f92-8d46-4b82-8787-c7bdb667f18e, api error AuthFailure: AWS was not able to validate the provided access credentials
      ```
      It seems additional authorization is required for 4.17 ROSA HCP clusters.
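To triage events like the one above, a minimal Python sketch can pull the HTTP status code and the EC2 API error code out of a provisioning-failure message. The regular expressions and the `classify_provision_event` / `ProvisionError` names are hypothetical and assume the message format of this single quoted event; other EBS CSI driver versions may word it differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: patterns are derived only from the one event quoted in this bug.
_STATUS_RE = re.compile(r"StatusCode: (\d+)")
_API_ERROR_RE = re.compile(r"api error (\w+):")


@dataclass
class ProvisionError:
    status_code: Optional[int]
    api_error: Optional[str]

    @property
    def is_credential_problem(self) -> bool:
        # HTTP 401 or EC2 "AuthFailure" means AWS rejected the caller's
        # credentials, as opposed to e.g. a quota or capacity error.
        return self.status_code == 401 or self.api_error == "AuthFailure"


def classify_provision_event(message: str) -> ProvisionError:
    """Extract the HTTP status code and EC2 API error code from an event message."""
    status = _STATUS_RE.search(message)
    api_err = _API_ERROR_RE.search(message)
    return ProvisionError(
        status_code=int(status.group(1)) if status else None,
        api_error=api_err.group(1) if api_err else None,
    )


EVENT = (
    'failed to provision volume with StorageClass "gp3-csi": rpc error: code = Internal '
    'desc = Could not create volume "pvc-b7ba63c5-62a8-4316-aa4b-d55f6348fa78": could not '
    "create volume in EC2: operation error EC2: CreateVolume, https response error "
    "StatusCode: 401, RequestID: 1e7c3f92-8d46-4b82-8787-c7bdb667f18e, api error "
    "AuthFailure: AWS was not able to validate the provided access credentials"
)

err = classify_provision_event(EVENT)
print(err.status_code, err.api_error, err.is_credential_problem)  # prints: 401 AuthFailure True
```

A 401 with `AuthFailure` suggests the problem lies in the credentials the driver presents to AWS rather than in EBS capacity or quota.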
      
      Comment 2
      Since this is an HCP cluster, the `aws-ebs-csi-driver-controller-` pod is not present in the `openshift-cluster-csi-drivers` namespace. It would be present on the hub cluster, where more logs should be available.
      Digging a bit further, I found the secret to be somewhat incomplete. Also, Daniel Osypenko (keep me honest here), you were not able to locate the role ARN below either.
      The `ebs-cloud-credentials` secret from the `openshift-cluster-csi-drivers` namespace:
      ```
      [default]
      role_arn = arn:aws:iam::861790564636:role/oproleshcp-dosypenk-2711n-openshift-cluster-csi-drivers-ebs-clou
      web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
      sts_regional_endpoints = regional
      ```
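The secret above is an AWS shared-config style profile. A minimal Python sketch, assuming the secret content has already been decoded to plain text, can parse it with `configparser` and flag obvious gaps; the helper names are hypothetical. One detail worth noting: the role name in the quoted ARN is exactly 64 characters, which is the IAM maximum for role names. The sketch treats a name at that limit as possibly truncated; this is a heuristic, not proof.

```python
import configparser

# AWS IAM allows role names of at most 64 characters.
IAM_ROLE_NAME_MAX = 64


def parse_role_arn(arn: str) -> tuple[str, str]:
    """Split 'arn:<partition>:iam::<account>:role/<path/>name' into (account_id, role_name)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {arn!r}")
    resource = parts[5]
    if not resource.startswith("role/"):
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    # A role may have a path (role/path/name); the role name is the last segment.
    return parts[4], resource.rsplit("/", 1)[-1]


def check_credentials(ini_text: str, profile_name: str = "default") -> list[str]:
    """Return warnings about a web-identity profile such as ebs-cloud-credentials."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    if not cfg.has_section(profile_name):
        return [f"profile [{profile_name}] is missing"]
    profile = cfg[profile_name]

    arn = profile.get("role_arn")
    if not arn:
        return ["role_arn is missing"]

    warnings = []
    _, role_name = parse_role_arn(arn)
    if len(role_name) >= IAM_ROLE_NAME_MAX:
        # Heuristic only: a 64-character name is valid, but hitting the
        # limit exactly is consistent with a name that was cut off.
        warnings.append(
            f"role name is {len(role_name)} chars (IAM max {IAM_ROLE_NAME_MAX}); "
            "it may have been truncated"
        )
    if not profile.get("web_identity_token_file"):
        warnings.append("web_identity_token_file is missing")
    return warnings


SECRET = """\
[default]
role_arn = arn:aws:iam::861790564636:role/oproleshcp-dosypenk-2711n-openshift-cluster-csi-drivers-ebs-clou
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
sts_regional_endpoints = regional
"""

for warning in check_credentials(SECRET):
    print(warning)
```

Run against the secret quoted above, this prints a single warning: `role name is 64 chars (IAM max 64); it may have been truncated`. Whether the role exists under that exact name still has to be confirmed in the AWS account itself.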

      Version-Release number of selected component (if applicable):

          ODF: full_version: 4.16.3-2
          OCP: 4.17.4

      How reproducible:

          Try to deploy an OCP 4.17.4 cluster on ROSA HCP.

      Steps to Reproduce:

          1. 
          2.
          3.
          

      Actual results:

          Deployment failed

      Expected results:

          Deployment succeeded

      Additional info:

      OCP must-gather: https://url.corp.redhat.com/f1393d1
      OCS must-gather: https://url.corp.redhat.com/dc2190f

              Unassigned
              Daniel Osypenko (rh-ee-dosypenk)
              Jie Zhao