Data Foundation Bugs / DFBUGS-63

[2281536] [Provider mode] StorageClient connection fails. Failed to create a new provider client: failed to dial.


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • odf-4.18
    • odf-4.16
    • ocs-client-operator
    • Proposed

      Description of problem:

      On 2 of 3 hosted client clusters, the StorageClient connection fails from both the UI and the CLI. The third cluster in the same setup connected successfully.

      The logs of the ocs-client-operator-controller-manager pods repeat the same error:

      2024-05-19T17:26:31Z ERROR Reconciler error {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "868d2702-b916-41fe-bf51-c48238cfc9c4", "error": "failed to create a new provider client: failed to dial: context deadline exceeded"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227
      2024-05-19T17:27:53Z INFO Reconciling StorageClient {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "31d79aec-939a-40fa-b77f-ae03c0ab533d", "StorageClient": {"name":"storage-client-bm1-c"}}
      2024-05-19T17:28:03Z ERROR Reconciler error {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "31d79aec-939a-40fa-b77f-ae03c0ab533d", "error": "failed to create a new provider client: failed to dial: context deadline exceeded"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227
      2024-05-19T17:30:47Z INFO Reconciling StorageClient {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "cb424122-537d-4145-b972-2abe7d1502f2", "StorageClient": {"name":"storage-client-bm1-c"}}
      2024-05-19T17:30:57Z ERROR Reconciler error {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "cb424122-537d-4145-b972-2abe7d1502f2", "error": "failed to create a new provider client: failed to dial: context deadline exceeded"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
      /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227
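
In the excerpt above, each error is logged 10 seconds after the matching "Reconciling StorageClient" entry (17:27:53 to 17:28:03, and 17:30:47 to 17:30:57), which would be consistent with a fixed dial timeout rather than an immediate refusal. A minimal sketch for extracting those intervals from a saved pod log, assuming each entry sits on a single line in the layout shown above. The function name `attempt_durations` and the abbreviated sample lines are illustrative only and are not part of the operator:

```python
import json
import re
from datetime import datetime

# Matches "<timestamp>Z <LEVEL> <message> {json fields}" as in the excerpt above.
LINE_RE = re.compile(r"^\s*(\S+Z)\s+(INFO|ERROR)\s+(.*?)\s*(\{.*\})\s*$")


def attempt_durations(lines):
    """Return {reconcileID: seconds from the INFO entry to its ERROR entry}."""
    started = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # stack-trace lines and blank lines are skipped
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%SZ")
        level = m.group(2)
        fields = json.loads(m.group(4))
        rid = fields.get("reconcileID")
        if level == "INFO":
            started[rid] = ts
        elif level == "ERROR" and rid in started:
            durations[rid] = (ts - started[rid]).total_seconds()
    return durations


# Two entries from the excerpt, with some JSON fields dropped for brevity.
sample = [
    '2024-05-19T17:27:53Z INFO Reconciling StorageClient '
    '{"controller": "storageclient", "reconcileID": "31d79aec-939a-40fa-b77f-ae03c0ab533d"}',
    '2024-05-19T17:28:03Z ERROR Reconciler error '
    '{"controller": "storageclient", "reconcileID": "31d79aec-939a-40fa-b77f-ae03c0ab533d", '
    '"error": "failed to create a new provider client: failed to dial: context deadline exceeded"}',
]
print(attempt_durations(sample))
```

The first ERROR entry in the excerpt (17:26:31) has no preceding INFO entry in the captured log, so the sketch skips it.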

      Version-Release number of selected component (if applicable):
      ODF 4.16.0-99
      Provider OCP 4.16.0-ec.6
      Client OCP 4.15.13

      How reproducible:
      Create a StorageClient from the CLI or the management console

      StorageCluster:

      $ oc get storagecluster ocs-storagecluster -n openshift-storage -o yaml
      apiVersion: ocs.openshift.io/v1
      kind: StorageCluster
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"ocs.openshift.io/v1","kind":"StorageCluster","metadata":{"annotations":{},"name":"ocs-storagecluster","namespace":"openshift-storage"},"spec":{"allowRemoteStorageConsumers":true,"arbiter":{},"encryption":{"clusterWide":true,"enable":true,"kms":{}},"externalStorage":{},"flexibleScaling":true,"hostNetwork":true,"managedResources":{"cephBlockPools":{"disableSnapshotClass":true,"disableStorageClass":true},"cephCluster":{},"cephConfig":{},"cephDashboard":{},"cephFilesystems":{"disableSnapshotClass":true,"disableStorageClass":true},"cephNonResilientPools":{},"cephObjectStoreUsers":{},"cephObjectStores":{"hostNetwork":false},"cephRBDMirror":{},"cephToolbox":{}},"mirroring":{},"monDataDirHostPath":"/var/lib/rook","nodeTopologies":{},"providerAPIServerServiceType":"NodePort","storageDeviceSets":[{"config":{},"count":2,"dataPVCTemplate":{"metadata":{},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"256Gi"}},"storageClassName":"localblock","volumeMode":"Block"},"status":{}},"deviceClass":"ssd","name":"local-storage-deviceset","placement":{},"preparePlacement":{},"replica":6,"resources":{}}]}}
          uninstall.ocs.openshift.io/cleanup-policy: delete
          uninstall.ocs.openshift.io/mode: graceful
        creationTimestamp: "2024-05-16T13:45:06Z"
        finalizers:
        - storagecluster.ocs.openshift.io
        generation: 3
        name: ocs-storagecluster
        namespace: openshift-storage
        ownerReferences:
        - apiVersion: odf.openshift.io/v1alpha1
          kind: StorageSystem
          name: ocs-storagecluster-storagesystem
          uid: cc31befb-e1b9-45a7-966b-5b84164bc28a
        resourceVersion: "8351843"
        uid: ec35b326-b40a-416f-a567-883b2a96873b
      spec:
        allowRemoteStorageConsumers: true
        arbiter: {}
        enableCephTools: true
        encryption:
          clusterWide: true
          enable: true
          keyRotation:
            schedule: '@weekly'
          kms: {}
        externalStorage: {}
        flexibleScaling: true
        hostNetwork: true
        managedResources:
          cephBlockPools:
            disableSnapshotClass: true
            disableStorageClass: true
          cephCluster: {}
          cephConfig: {}
          cephDashboard: {}
          cephFilesystems:
            dataPoolSpec:
              application: ""
              erasureCoded:
                codingChunks: 0
                dataChunks: 0
              mirroring: {}
              quotas: {}
              replicated:
                size: 0
              statusCheck:
                mirror: {}
            disableSnapshotClass: true
            disableStorageClass: true
          cephNonResilientPools:
            count: 1
            resources: {}
            volumeClaimTemplate:
              metadata: {}
              spec:
                resources: {}
              status: {}
          cephObjectStoreUsers: {}
          cephObjectStores:
            hostNetwork: false
          cephRBDMirror:
            daemonCount: 1
          cephToolbox: {}
        mirroring: {}
        monDataDirHostPath: /var/lib/rook
        nodeTopologies: {}
        providerAPIServerServiceType: NodePort
        storageDeviceSets:
        - config: {}
          count: 2
          dataPVCTemplate:
            metadata: {}
            spec:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 256Gi
              storageClassName: localblock
              volumeMode: Block
            status: {}
          deviceClass: ssd
          name: local-storage-deviceset
          placement: {}
          preparePlacement: {}
          replica: 6
          resources: {}
      status:
        conditions:
        - lastHeartbeatTime: "2024-05-16T13:45:06Z"
          lastTransitionTime: "2024-05-16T13:45:06Z"
          message: Version check successful
          reason: VersionMatched
          status: "False"
          type: VersionMismatch
        - lastHeartbeatTime: "2024-05-19T18:03:30Z"
          lastTransitionTime: "2024-05-19T08:58:30Z"
          message: Reconcile completed successfully
          reason: ReconcileCompleted
          status: "True"
          type: ReconcileComplete
        - lastHeartbeatTime: "2024-05-19T18:03:30Z"
          lastTransitionTime: "2024-05-16T13:49:09Z"
          message: Reconcile completed successfully
          reason: ReconcileCompleted
          status: "True"
          type: Available
        - lastHeartbeatTime: "2024-05-19T18:03:30Z"
          lastTransitionTime: "2024-05-19T00:02:04Z"
          message: Reconcile completed successfully
          reason: ReconcileCompleted
          status: "False"
          type: Progressing
        - lastHeartbeatTime: "2024-05-19T18:03:30Z"
          lastTransitionTime: "2024-05-16T13:45:06Z"
          message: Reconcile completed successfully
          reason: ReconcileCompleted
          status: "False"
          type: Degraded
        - lastHeartbeatTime: "2024-05-19T18:03:30Z"
          lastTransitionTime: "2024-05-19T00:02:04Z"
          message: Reconcile completed successfully
          reason: ReconcileCompleted
          status: "True"
          type: Upgradeable
        currentMonCount: 3
        failureDomain: host
        failureDomainKey: kubernetes.io/hostname
        failureDomainValues:
        - baremetal1-01
        - baremetal1-02
        - baremetal1-03
        - baremetal1-04
        - baremetal1-05
        - baremetal1-06
        images:
          ceph:
            actualImage: registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
            desiredImage: registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
          noobaaCore:
            actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:313739c72ce84a82b9ed6064747c4cbb2d069bd0f0479750293b0260eace675b
            desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:313739c72ce84a82b9ed6064747c4cbb2d069bd0f0479750293b0260eace675b
          noobaaDB:
            actualImage: registry.redhat.io/rhel9/postgresql-15@sha256:1aeac23901c0147e4c6e9a1b8bb5f41dd6f95532b0d96adac55d609d0eed32fe
            desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:1aeac23901c0147e4c6e9a1b8bb5f41dd6f95532b0d96adac55d609d0eed32fe
        kmsServerConnection: {}
        nodeTopologies:
          labels:
            kubernetes.io/hostname:
            - baremetal1-01
            - baremetal1-02
            - baremetal1-03
            - baremetal1-04
            - baremetal1-05
            - baremetal1-06
        phase: Ready
        relatedObjects:
        - apiVersion: ceph.rook.io/v1
          kind: CephCluster
          name: ocs-storagecluster-cephcluster
          namespace: openshift-storage
          resourceVersion: "8350231"
          uid: 563b5e96-7af4-40de-8e86-f165d5412ee1
        - apiVersion: noobaa.io/v1alpha1
          kind: NooBaa
          name: noobaa
          namespace: openshift-storage
          resourceVersion: "8351822"
          uid: 0f35c052-49d5-4d4f-b3b2-5a7f64d69470
        storageProviderEndpoint: 52.118.6.131:31659
        version: 4.16.0

      The latest attempt to create the StorageClient used this YAML:
      apiVersion: ocs.openshift.io/v1alpha1
      kind: StorageClient
      metadata:
        name: storage-client-bm1-c
        namespace: openshift-storage-client
      spec:
        onboardingTicket: eyJpZCI6ImUzNjMxMTUyLTQ2YjgtNDE4YS05ZTA3LTU3NjI2MGE0MWE4MSIsImV4cGlyYXRpb25EYXRlIjoiMTcxNjMxMjE1NiJ9.MBcOeKrA0hufeWv/uwpMWspJfTIHVV8xGyoRahtP4dKWBqZ9sN0kM3CH5eFeVsUxuerqhTZE7/vXN/XCZcMPzHY7WRx8uTtTAuywuAUr/TdNCptrhro/0+u/6tnLvpF1FZAYplxMsPSj33QVkhMD4kko1FDkz7XROOhQEyptKesWQT6S9HFAmeA5tuOazdGc6V8KHxVf1j6uSIo+BRMNuWOGk1derLxcOdozlklnMcZZ+Ok1xZGEfW+pYHiAKBhqRyQara+TyG3tFRiijZ4vAFGGV9UTdNX27auj4x68mGrsB7hpq3gZf0dIDnZ2cO0ryBqqNjccFd84dCCbI2k1fW4XA72INBYMB9WTezYjQj95TA8PLBzlMa3jk3YKNbWBUiZlfn1sLbnH4tX3cgc8/G4NT26f2DVLtMBKvavz3SGobuUR3CBUfAWs5CHa5z9Jo35KJXsKrEfxoHHAbALuHnTsIJlTuJ57Hn3RPUwy1RKPFodtARKm3X3s3d8eN1OENhWpOIBfVlYHfwHYjpnPRTGTuW2urQSpPjpw55zZTUWx3fZOResGjrkOSPCHCs1nxw/yepaV73GmO3Vjxy+RC+iVfSK0/iSOjnkQF4P9zXQBgPcdNY8QdkrZlIoWivw1fx9jtirlGWEDnbb91lQACz4TqDPSANGONurhI55dGeo=
        storageProviderEndpoint: 52.118.6.131:31659
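
An expired onboarding ticket is one thing worth ruling out before looking at the network. The segment of the ticket before the first "." appears to be base64-encoded JSON carrying an `id` and an `expirationDate`. A minimal sketch for decoding it, assuming `expirationDate` is a Unix timestamp in seconds; the helper names `decode_ticket_payload` and `expiry_utc` are hypothetical, and the signature segment is not verified here:

```python
import base64
import json
from datetime import datetime, timezone


def decode_ticket_payload(ticket):
    """Decode the segment before the first '.' as base64-encoded JSON."""
    payload_b64 = ticket.split(".", 1)[0]
    # Restore '=' padding in case it was stripped from the segment.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.b64decode(payload_b64))


def expiry_utc(payload):
    """Assumption: 'expirationDate' holds Unix seconds; return it as a UTC datetime."""
    return datetime.fromtimestamp(int(payload["expirationDate"]), tz=timezone.utc)


# Payload segment copied from the ticket above; the signature is replaced by a placeholder.
ticket = (
    "eyJpZCI6ImUzNjMxMTUyLTQ2YjgtNDE4YS05ZTA3LTU3NjI2MGE0MWE4MSIs"
    "ImV4cGlyYXRpb25EYXRlIjoiMTcxNjMxMjE1NiJ9.<signature>"
)
payload = decode_ticket_payload(ticket)
print(payload["id"], expiry_utc(payload).isoformat())
```

Under that assumption, the ticket in this report expires at 2024-05-21T17:22:36 UTC, which is after the errors logged on 2024-05-19, so ticket expiry would not explain the dial timeout.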

      Steps to Reproduce:
      1. Install ODF in Provider mode and create a hosted KubeVirt cluster
      2. Install the ODF Client operator on the hosted KubeVirt cluster
      3. Create a StorageClient with YAML similar to the one above

      Actual results:
      The StorageClient never reaches the Connected state

      Expected results:
      The StorageClient connects to the Provider within a reasonable time. Storage Claims are created automatically, along with their corresponding Storage Classes.

      Additional info:
      The cluster is available for debugging; details will be provided on request.
      Screen recording of StorageClient creation via UI - https://drive.google.com/file/d/1LkjqQeFHfS_VflScfely0QObG-4w2gfQ/view?usp=sharing
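
Because the operator reports "failed to dial: context deadline exceeded", one possible debugging step is to check plain TCP reachability of the provider endpoint from each client cluster, and compare the two failing clusters with the one that connected. A minimal sketch, assuming Python is available wherever the check runs; the function name `can_dial` and the 10-second default are illustrative choices, not values taken from the operator:

```python
import socket


def can_dial(endpoint, timeout=10.0):
    """Return True if a TCP connection to 'host:port' opens within the timeout."""
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

For this report the call would be `can_dial("52.118.6.131:31659")`, using the value of `storageProviderEndpoint`. A True result only shows that the NodePort accepts TCP connections; it says nothing about the TLS or gRPC layers above it.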

              rhn-support-lgangava Leela Gangavarapu
              rh-ee-dosypenk Daniel Osypenko
              Neha Berry