Data Foundation Bugs / DFBUGS-729

[2319351] ServiceAccount ocs-operator cannot patch resource "nodes" in API group "" at the cluster and failed to find parent StorageSystem's name in StorageCluster "ocs-storagecluster"


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Component: ocs-operator

      Description of problem (please be as detailed as possible and provide log snippets):

      When deploying ODF, the StorageCluster remains in phase Error with the following message: "system:serviceaccount:openshift-storage:ocs-operator" cannot patch resource "nodes" in API group "" at the cluster scope.
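      For reference (not taken from the original report), the missing permission can be confirmed by impersonating the operator's service account:

      $ oc auth can-i patch nodes --as=system:serviceaccount:openshift-storage:ocs-operator
      no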

      We can then apply a workaround, but the next error appears: failed to create or update odf-info config: failed to get odf-info config map data: failed to find parent StorageSystem's name in StorageCluster "ocs-storagecluster" ownerreferences, []

      Version of all relevant components (if applicable):

      4.18.0-34.stable
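      The build can also be cross-checked against the CSVs installed in the namespace (listing command shown for reference, output omitted):

      $ oc -n openshift-storage get csv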

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      Not really; we can install older versions of ODF, but we would like 4.18 to be fixed before OCP 4.18 goes GA.

      Is there any workaround available to the best of your knowledge?

      Sort of: there is a workaround for the first error, but it leads to a new error (see Additional info below).

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      2

      Is this issue reproducible?

      Yes

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Deploy OCP 4.18
      2. Install ODF 4.18.0-34.stable operator
      3. Deploy a Local Storage volume and apply a StorageCluster config that uses the volumes (a minimal sketch follows these steps).
      4. Check the status of the StorageCluster resource; it remains in phase Error, and the error details can be seen by describing the resource.
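      For illustration, a StorageCluster backed by Local Storage volumes typically looks like the sketch below; the device set sizing and the localblock storage class name are assumptions, not copied from the config actually used:

      apiVersion: ocs.openshift.io/v1
      kind: StorageCluster
      metadata:
        name: ocs-storagecluster
        namespace: openshift-storage
      spec:
        storageDeviceSets:
        - name: ocs-deviceset
          count: 1
          replica: 3
          dataPVCTemplate:
            spec:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: "1"
              storageClassName: localblock   # assumed name of the LSO-provided storage class
              volumeMode: Block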

      Actual results:

      The StorageCluster resource remains in phase Error because of missing permissions in the ClusterRole. After adding the missing permissions, it fails with a different error.

      Expected results:

      The StorageCluster resource reaches a completed phase and the ODF cluster installs successfully.

      Additional info:

      • This is how we check the status and obtain the error:

      $ oc -n openshift-storage get StorageCluster
      NAME                 AGE    PHASE   EXTERNAL   CREATED AT             VERSION
      ocs-storagecluster   150m   Error              2024-10-15T12:03:53Z   4.18.0

      $ oc -n openshift-storage describe StorageCluster | grep Error
      Message: Error while reconciling: nodes "dciokd-worker-0" is forbidden: User "system:serviceaccount:openshift-storage:ocs-operator" cannot patch resource "nodes" in API group "" at the cluster scope
      Phase: Error

      • The workaround is to add the patch verb to the ClusterRole:

      $ oc get ClusterRole | grep ocs-operator
      ocs-operator.v4.18.0-34.-85lsWOBeXBfgQwa3aD28kdSyasORTuGvVYGZ5D 2024-10-17T03:02:46Z
      ocs-operator.v4.18.0-34.s-cHx751WkoT61jzym2HldUozhwxgdH74w7cI8U 2024-10-17T03:02:45Z

      $ oc edit ClusterRole ocs-operator.v4.18.0-34.s-cHx751WkoT61jzym2HldUozhwxgdH74w7cI8U
      ...

      - apiGroups:
        - ""
        resources:
        - configmaps
        - endpoints
        - events
        - nodes
        - persistentvolumeclaims
        - persistentvolumes
        - pods
        - secrets
        - serviceaccounts
        - services
        verbs:
        - create
        - delete
        - get
        - list
        - update
        - watch
        - patch
      ...
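      The same change can be applied non-interactively with a JSON patch; the rule index 0 below is an assumption and must point at whichever rule in the ClusterRole holds the core-API resources shown above:

      $ oc patch ClusterRole ocs-operator.v4.18.0-34.s-cHx751WkoT61jzym2HldUozhwxgdH74w7cI8U \
          --type=json -p '[{"op": "add", "path": "/rules/0/verbs/-", "value": "patch"}]'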
      • Delete the ocs-operator pod

      $ oc -n openshift-storage get pods
      NAME READY STATUS RESTARTS AGE
      ceph-csi-controller-manager-6c59467c77-9t7p9 2/2 Running 0 7h3m
      csi-addons-controller-manager-cbccccb58-4r2lg 2/2 Running 0 7h3m
      ocs-operator-5b649cf54b-mstwv 1/1 Running 0 7h3m
      ocs-provider-server-6665fddf55-fps86 1/1 Running 0 6h44m
      odf-console-7b8bbbd5b9-dp9ck 1/1 Running 0 7h3m
      odf-operator-controller-manager-76848c847f-w7p8d 2/2 Running 0 7h3m
      rook-ceph-operator-86f8f989b4-s5wjj 1/1 Running 0 7h3m
      ux-backend-server-5ffb66c5b6-p6ql9 2/2 Running 0 7h3m

      $ oc -n openshift-storage delete pod ocs-operator-5b649cf54b-mstwv
      pod "ocs-operator-5b649cf54b-mstwv" deleted

      • Check the pod status and the StorageCluster again; more pods are now running, but the cluster still fails to install:

      $ oc -n openshift-storage get pods
      NAME READY STATUS RESTARTS AGE
      ceph-csi-controller-manager-6c59467c77-9t7p9 2/2 Running 0 7h6m
      csi-addons-controller-manager-cbccccb58-4r2lg 2/2 Running 0 7h7m
      csi-cephfsplugin-82tkv 3/3 Running 0 2m18s
      csi-cephfsplugin-g7lc4 3/3 Running 1 (104s ago) 2m18s
      csi-cephfsplugin-mz7qs 3/3 Running 1 (104s ago) 2m18s
      csi-cephfsplugin-provisioner-8f65fc496-5r8g6 7/7 Running 4 (87s ago) 2m18s
      csi-cephfsplugin-provisioner-8f65fc496-5vmrj 7/7 Running 0 2m18s
      csi-rbdplugin-fzsgd 4/4 Running 0 2m18s
      csi-rbdplugin-jz467 4/4 Running 1 (104s ago) 2m18s
      csi-rbdplugin-nvlrd 4/4 Running 1 (105s ago) 2m18s
      csi-rbdplugin-provisioner-966dd5865-mlgs5 7/7 Running 5 (64s ago) 2m18s
      csi-rbdplugin-provisioner-966dd5865-vq92q 7/7 Running 0 2m18s
      ocs-operator-5b649cf54b-xqmll 1/1 Running 0 2m47s
      ocs-provider-server-6665fddf55-fps86 1/1 Running 0 6h47m
      odf-console-7b8bbbd5b9-dp9ck 1/1 Running 0 7h6m
      odf-operator-controller-manager-76848c847f-w7p8d 2/2 Running 0 7h6m
      rook-ceph-crashcollector-dciokd-worker-0-6f88b56676-7nfgg 1/1 Running 0 36s
      rook-ceph-crashcollector-dciokd-worker-1-7c4bdf9df-vcdch 1/1 Running 0 12s
      rook-ceph-crashcollector-dciokd-worker-2-64957dfd46-6qt7q 1/1 Running 0 10s
      rook-ceph-exporter-dciokd-worker-0-6f6ff48d56-57qj2 1/1 Running 0 36s
      rook-ceph-exporter-dciokd-worker-1-c4c874cb5-dpkpg 1/1 Running 0 10s
      rook-ceph-exporter-dciokd-worker-2-6d688b5ddb-9r4hw 1/1 Running 0 7s
      rook-ceph-mgr-a-5f948dcdc4-p8xkn 3/3 Running 0 45s
      rook-ceph-mgr-b-777d88599-txg4c 3/3 Running 0 44s
      rook-ceph-mon-a-795dfd44fd-dpwtf 2/2 Running 0 2m6s
      rook-ceph-mon-b-7b779ddb97-zv477 2/2 Running 0 102s
      rook-ceph-mon-c-8f59ccd55-2tdzc 2/2 Running 0 56s
      rook-ceph-operator-86f8f989b4-s5wjj 1/1 Running 0 7h6m
      rook-ceph-osd-0-76889c6f4f-ld9d7 1/2 Running 0 13s
      rook-ceph-osd-1-5fc5bf648b-5l5vn 1/2 Running 0 12s
      rook-ceph-osd-2-5dbcd75fb6-n9srq 1/2 Running 0 11s
      rook-ceph-osd-prepare-ocs-deviceset-0-data-0k7zxm-x87vr 0/1 Completed 0 22s
      rook-ceph-osd-prepare-ocs-deviceset-1-data-0lxqrj-8ktlr 0/1 Completed 0 22s
      rook-ceph-osd-prepare-ocs-deviceset-2-data-0gcbk7-pgdxp 0/1 Completed 0 22s
      ux-backend-server-5ffb66c5b6-p6ql9 2/2 Running 0 7h6m

      $ oc -n openshift-storage get StorageCluster
      NAME                 AGE     PHASE   EXTERNAL   CREATED AT             VERSION
      ocs-storagecluster   6h49m   Error              2024-10-17T03:22:08Z   4.18.0

      $ oc -n openshift-storage describe StorageCluster | grep -i error
      Message: Error while reconciling: failed to create or update odf-info config: failed to get odf-info config map data: failed to find parent StorageSystem's name in StorageCluster "ocs-storagecluster" ownerreferences, []
      Phase: Error
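      The second error suggests the StorageCluster carries no StorageSystem owner reference; this can be checked directly (jsonpath query shown as a sketch, an empty result would match the [] in the error message):

      $ oc -n openshift-storage get StorageCluster ocs-storagecluster -o jsonpath='{.metadata.ownerReferences}{"\n"}'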

      • Extra logs

      [dci@provisionhost ~]$ oc -n openshift-storage logs ocs-operator-5b649cf54b-xqmll | tail -2
      {"level":"error","ts":"2024-10-17T10:11:56Z","logger":"controllers.StorageCluster","msg":"failed to create or update odf-info config map","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","ConfigMap":

      {"name":"odf-info","namespace":"openshift-storage"}

      ,"error":"failed to get odf-info config map data: failed to find parent StorageSystem's name in StorageCluster \"ocs-storagecluster\" ownerreferences, []","stacktrace":"github.com/red-hat-storage/ocs-operator/v4/controllers/storagecluster.(*odfInfoConfig).ensureCreated\n\t/remote-source/app/controllers/storagecluster/odfinfoconfig.go:77\ngithub.com/red-hat-storage/ocs-operator/v4/controllers/storagecluster.(*StorageClusterReconciler).reconcilePhases\n\t/remote-source/app/controllers/storagecluster/reconcile.go:455\ngithub.com/red-hat-storage/ocs-operator/v4/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:190\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}
      {"level":"error","ts":"2024-10-17T10:11:56Z","msg":"Reconciler error","controller":"storagecluster","controllerGroup":"ocs.openshift.io","controllerKind":"StorageCluster","StorageCluster":

      {"name":"ocs-storagecluster","namespace":"openshift-storage"}

      ,"namespace":"openshift-storage","name":"ocs-storagecluster","reconcileID":"cdab1e52-fc19-4f92-bdf0-075dc90c66e2","error":"failed to create or update odf-info config: failed to get odf-info config map data: failed to find parent StorageSystem's name in StorageCluster \"ocs-storagecluster\" ownerreferences, []","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}

              Assignee: Marulasiddaiah Rudraiah (rh-ee-mrudraia)
              Reporter: Manuel Rodriguez (rhn-gps-manrodri)