- Bug
- Resolution: Done
- Critical
- odf-4.16
- None
Description of problem:
On 2 of 3 hosted client clusters the StorageClient connection fails, from both the UI and the CLI. One cluster in the same setup did connect successfully.
The logs on the ocs-client-operator-controller-manager pods repeatedly show:
2024-05-19T17:26:31Z ERROR Reconciler error {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "868d2702-b916-41fe-bf51-c48238cfc9c4", "error": "failed to create a new provider client: failed to dial: context deadline exceeded"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227
2024-05-19T17:27:53Z INFO Reconciling StorageClient {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "31d79aec-939a-40fa-b77f-ae03c0ab533d", "StorageClient": {"name":"storage-client-bm1-c"}}
2024-05-19T17:28:03Z ERROR Reconciler error {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "31d79aec-939a-40fa-b77f-ae03c0ab533d", "error": "failed to create a new provider client: failed to dial: context deadline exceeded"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227
2024-05-19T17:30:47Z INFO Reconciling StorageClient {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "cb424122-537d-4145-b972-2abe7d1502f2", "StorageClient": {"name":"storage-client-bm1-c"}}
2024-05-19T17:30:57Z ERROR Reconciler error {"controller": "storageclient", "controllerGroup": "ocs.openshift.io", "controllerKind": "StorageClient", "StorageClient": {"name":"storage-client-bm1-c"}, "namespace": "", "name": "storage-client-bm1-c", "reconcileID": "cb424122-537d-4145-b972-2abe7d1502f2", "error": "failed to create a new provider client: failed to dial: context deadline exceeded"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227
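The repeated "failed to dial: context deadline exceeded" error means the operator's gRPC dial to the provider endpoint (52.118.6.131:31659 in this setup) times out before a connection is established. As a first triage step from the client side, raw TCP reachability of the endpoint can be checked; a minimal sketch (the helper name is illustrative, and a plain socket connect is used instead of the operator's actual gRPC/TLS handshake):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Note: this only verifies L4 reachability; a successful connect does not
    prove that the gRPC/TLS handshake the ocs-client-operator performs would
    also succeed (certificates, auth, etc. are not exercised here).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (run from a pod on the client cluster):
#   tcp_reachable("52.118.6.131", 31659)
# False would point at a network/NodePort problem rather than the operator.
```

If the endpoint is reachable at the TCP level but the dial still times out, the problem more likely sits in the provider API server itself or in the TLS/token exchange.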
Version-Release number of selected component (if applicable):
ODF 4.16.0-99
Provider OCP 4.16.0-ec.6
Client OCP 4.15.13
How reproducible:
Consistently on 2 of 3 hosted clusters: create a StorageClient from the CLI or the Management console.
StorageCluster:
$ oc get storagecluster ocs-storagecluster -n openshift-storage -o yaml
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ocs.openshift.io/v1","kind":"StorageCluster","metadata":{"annotations":{},"name":"ocs-storagecluster","namespace":"openshift-storage"},"spec":{"allowRemoteStorageConsumers":true,"arbiter":{},"encryption":{"clusterWide":true,"enable":true,"kms":{}},"externalStorage":{},"flexibleScaling":true,"hostNetwork":true,"managedResources":{"cephBlockPools":{"disableSnapshotClass":true,"disableStorageClass":true},"cephCluster":{},"cephConfig":{},"cephDashboard":{},"cephFilesystems":{"disableSnapshotClass":true,"disableStorageClass":true},"cephNonResilientPools":{},"cephObjectStoreUsers":{},"cephObjectStores":{"hostNetwork":false},"cephRBDMirror":{},"cephToolbox":{}},"mirroring":{},"monDataDirHostPath":"/var/lib/rook","nodeTopologies":{},"providerAPIServerServiceType":"NodePort","storageDeviceSets":[{"config":{},"count":2,"dataPVCTemplate":{"metadata":{},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"256Gi"}},"storageClassName":"localblock","volumeMode":"Block"},"status":{}},"deviceClass":"ssd","name":"local-storage-deviceset","placement":{},"preparePlacement":{},"replica":6,"resources":{}}]}}
    uninstall.ocs.openshift.io/cleanup-policy: delete
    uninstall.ocs.openshift.io/mode: graceful
  creationTimestamp: "2024-05-16T13:45:06Z"
  finalizers:
  - storagecluster.ocs.openshift.io
  generation: 3
  name: ocs-storagecluster
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: odf.openshift.io/v1alpha1
    kind: StorageSystem
    name: ocs-storagecluster-storagesystem
    uid: cc31befb-e1b9-45a7-966b-5b84164bc28a
  resourceVersion: "8351843"
  uid: ec35b326-b40a-416f-a567-883b2a96873b
spec:
  allowRemoteStorageConsumers: true
  arbiter: {}
  enableCephTools: true
  encryption:
    clusterWide: true
    enable: true
    keyRotation:
      schedule: '@weekly'
    kms: {}
  externalStorage: {}
  flexibleScaling: true
  hostNetwork: true
  managedResources:
    cephBlockPools:
      disableSnapshotClass: true
      disableStorageClass: true
    cephCluster: {}
    cephConfig: {}
    cephDashboard: {}
    cephFilesystems:
      dataPoolSpec:
        application: ""
        erasureCoded:
          codingChunks: 0
          dataChunks: 0
        mirroring: {}
        quotas: {}
        replicated:
          size: 0
        statusCheck:
          mirror: {}
      disableSnapshotClass: true
      disableStorageClass: true
    cephNonResilientPools:
      count: 1
      resources: {}
      volumeClaimTemplate:
        metadata: {}
        spec:
          resources: {}
        status: {}
    cephObjectStoreUsers: {}
    cephObjectStores:
      hostNetwork: false
    cephRBDMirror:
      daemonCount: 1
    cephToolbox: {}
  mirroring: {}
  monDataDirHostPath: /var/lib/rook
  nodeTopologies: {}
  providerAPIServerServiceType: NodePort
  storageDeviceSets:
  - config: {}
    count: 2
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 256Gi
        storageClassName: localblock
        volumeMode: Block
      status: {}
    deviceClass: ssd
    name: local-storage-deviceset
    placement: {}
    preparePlacement: {}
    replica: 6
    resources: {}
status:
  conditions:
  - lastHeartbeatTime: "2024-05-16T13:45:06Z"
    lastTransitionTime: "2024-05-16T13:45:06Z"
    message: Version check successful
    reason: VersionMatched
    status: "False"
    type: VersionMismatch
  - lastHeartbeatTime: "2024-05-19T18:03:30Z"
    lastTransitionTime: "2024-05-19T08:58:30Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: ReconcileComplete
  - lastHeartbeatTime: "2024-05-19T18:03:30Z"
    lastTransitionTime: "2024-05-16T13:49:09Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: Available
  - lastHeartbeatTime: "2024-05-19T18:03:30Z"
    lastTransitionTime: "2024-05-19T00:02:04Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2024-05-19T18:03:30Z"
    lastTransitionTime: "2024-05-16T13:45:06Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2024-05-19T18:03:30Z"
    lastTransitionTime: "2024-05-19T00:02:04Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: Upgradeable
  currentMonCount: 3
  failureDomain: host
  failureDomainKey: kubernetes.io/hostname
  failureDomainValues:
  - baremetal1-01
  - baremetal1-02
  - baremetal1-03
  - baremetal1-04
  - baremetal1-05
  - baremetal1-06
  images:
    ceph:
      actualImage: registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
      desiredImage: registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:5b80a7edfdaadca7af1d7c18b5b1a1265569b43dd538f7f3758533f77d12ecd2
    noobaaCore:
      actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:313739c72ce84a82b9ed6064747c4cbb2d069bd0f0479750293b0260eace675b
      desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:313739c72ce84a82b9ed6064747c4cbb2d069bd0f0479750293b0260eace675b
    noobaaDB:
      actualImage: registry.redhat.io/rhel9/postgresql-15@sha256:1aeac23901c0147e4c6e9a1b8bb5f41dd6f95532b0d96adac55d609d0eed32fe
      desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:1aeac23901c0147e4c6e9a1b8bb5f41dd6f95532b0d96adac55d609d0eed32fe
  kmsServerConnection: {}
  nodeTopologies:
    labels:
      kubernetes.io/hostname:
      - baremetal1-01
      - baremetal1-02
      - baremetal1-03
      - baremetal1-04
      - baremetal1-05
      - baremetal1-06
  phase: Ready
  relatedObjects:
  - apiVersion: ceph.rook.io/v1
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    namespace: openshift-storage
    resourceVersion: "8350231"
    uid: 563b5e96-7af4-40de-8e86-f165d5412ee1
  - apiVersion: noobaa.io/v1alpha1
    kind: NooBaa
    name: noobaa
    namespace: openshift-storage
    resourceVersion: "8351822"
    uid: 0f35c052-49d5-4d4f-b3b2-5a7f64d69470
  storageProviderEndpoint: 52.118.6.131:31659
  version: 4.16.0
The latest attempt to create the StorageClient used the following YAML:
apiVersion: ocs.openshift.io/v1alpha1
kind: StorageClient
metadata:
  name: storage-client-bm1-c
  namespace: openshift-storage-client
spec:
  onboardingTicket: eyJpZCI6ImUzNjMxMTUyLTQ2YjgtNDE4YS05ZTA3LTU3NjI2MGE0MWE4MSIsImV4cGlyYXRpb25EYXRlIjoiMTcxNjMxMjE1NiJ9.MBcOeKrA0hufeWv/uwpMWspJfTIHVV8xGyoRahtP4dKWBqZ9sN0kM3CH5eFeVsUxuerqhTZE7/vXN/XCZcMPzHY7WRx8uTtTAuywuAUr/TdNCptrhro/0+u/6tnLvpF1FZAYplxMsPSj33QVkhMD4kko1FDkz7XROOhQEyptKesWQT6S9HFAmeA5tuOazdGc6V8KHxVf1j6uSIo+BRMNuWOGk1derLxcOdozlklnMcZZ+Ok1xZGEfW+pYHiAKBhqRyQara+TyG3tFRiijZ4vAFGGV9UTdNX27auj4x68mGrsB7hpq3gZf0dIDnZ2cO0ryBqqNjccFd84dCCbI2k1fW4XA72INBYMB9WTezYjQj95TA8PLBzlMa3jk3YKNbWBUiZlfn1sLbnH4tX3cgc8/G4NT26f2DVLtMBKvavz3SGobuUR3CBUfAWs5CHa5z9Jo35KJXsKrEfxoHHAbALuHnTsIJlTuJ57Hn3RPUwy1RKPFodtARKm3X3s3d8eN1OENhWpOIBfVlYHfwHYjpnPRTGTuW2urQSpPjpw55zZTUWx3fZOResGjrkOSPCHCs1nxw/yepaV73GmO3Vjxy+RC+iVfSK0/iSOjnkQF4P9zXQBgPcdNY8QdkrZlIoWivw1fx9jtirlGWEDnbb91lQACz4TqDPSANGONurhI55dGeo=
  storageProviderEndpoint: 52.118.6.131:31659
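For triage it can help to rule out an expired onboarding ticket. The ticket above appears to be a dot-separated pair of a base64-encoded JSON payload and a signature; a minimal sketch that decodes the payload (an assumption based on the visible value: the signature is not verified, and the field names are simply those observed in the decoded payload):

```python
import base64
import json
from datetime import datetime, timezone

def decode_onboarding_ticket(ticket: str) -> dict:
    """Decode the JSON payload segment of an onboarding ticket (no signature check)."""
    payload_b64 = ticket.split(".", 1)[0]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding if stripped
    return json.loads(base64.b64decode(payload_b64))

# Payload segment of the onboardingTicket above (signature segment omitted here).
PAYLOAD = ("eyJpZCI6ImUzNjMxMTUyLTQ2YjgtNDE4YS05ZTA3LTU3NjI2MGE0MWE4MSIs"
           "ImV4cGlyYXRpb25EYXRlIjoiMTcxNjMxMjE1NiJ9")
info = decode_onboarding_ticket(PAYLOAD)
expires = datetime.fromtimestamp(int(info["expirationDate"]), tz=timezone.utc)
print(info["id"], expires.isoformat())  # expires 2024-05-21T17:22:36+00:00
```

The decoded expirationDate is 1716312156 (2024-05-21T17:22:36Z), so the ticket was still valid when the 2024-05-19 dial errors above were logged; ticket expiry can likely be ruled out as the cause.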
Steps to Reproduce:
1. Install ODF with Provider mode and create hosted kubevirt cluster
2. Install ODF Client operator on hosted kubevirt cluster
3. Create StorageClient with yaml similar to above
Actual results:
The StorageClient does not reach the Connected state.
Expected results:
The StorageClient connects to the Provider within a reasonable time, StorageClaims are created automatically, and the corresponding StorageClasses are created.
Additional info:
The cluster is available for debugging; details will be provided on request.
Screen recording of StorageClient creation via UI - https://drive.google.com/file/d/1LkjqQeFHfS_VflScfely0QObG-4w2gfQ/view?usp=sharing