- Bug
- Resolution: Unresolved
- Critical
- odf-4.16
- None
Description of problem:
----------
On a fresh ODF deployment installed in the 'odf-storage' namespace, the nodes are labeled with a key derived from the install namespace (cluster.ocs.openshift.io/odf-storage) rather than the expected cluster.ocs.openshift.io/openshift-storage:
oc get nodes -l cluster.ocs.openshift.io/odf-storage=""
NAME STATUS ROLES AGE VERSION
ip-10-0-0-144.us-west-2.compute.internal Ready worker 18h v1.29.6+aba1e8d
ip-10-0-0-181.us-west-2.compute.internal Ready worker 18h v1.29.6+aba1e8d
ip-10-0-0-45.us-west-2.compute.internal Ready worker 21h v1.29.6+aba1e8d
ip-10-0-0-70.us-west-2.compute.internal Ready worker 18h v1.29.6+aba1e8d
ip-10-0-0-78.us-west-2.compute.internal Ready worker 18h v1.29.6+aba1e8d
ip-10-0-0-95.us-west-2.compute.internal Ready worker 21h v1.29.6+aba1e8d
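For comparison, querying for the label key the operator actually selects on returns no nodes, which matches the "Expected 3, found 0" condition below (a hedged check, inferred from the workaround at the end of this report):
oc get nodes -l cluster.ocs.openshift.io/openshift-storage=""
No resources found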
This triggers a "Not enough nodes found" error on the StorageCluster (screenshot attached), leaving it in the Error state:
oc get storagecluster -A
NAMESPACE NAME AGE PHASE EXTERNAL CREATED AT VERSION
odf-storage ocs-storagecluster 7m25s Error 2024-07-31T15:53:39Z 4.16.0
[jenkins@temp-jagent-dosypenk-r217 terraform-vpc-example]$ oc describe storagecluster ocs-storagecluster -nodf-storage
Name:         ocs-storagecluster
Namespace:    odf-storage
Labels:       <none>
Annotations:  uninstall.ocs.openshift.io/cleanup-policy: delete
              uninstall.ocs.openshift.io/mode: graceful
API Version:  ocs.openshift.io/v1
Kind:         StorageCluster
Metadata:
  Creation Timestamp:  2024-07-31T15:53:39Z
  Finalizers:
    storagecluster.ocs.openshift.io
  Generation:  2
  Owner References:
    API Version:  odf.openshift.io/v1alpha1
    Kind:         StorageSystem
    Name:         ocs-storagecluster-storagesystem
    UID:          2dee21a8-8039-4640-8fd1-9e7a669356b6
  Resource Version:  101564
  UID:               1d04f184-50c7-4f6f-9777-0f197a2fc1d1
Spec:
  Arbiter:
  Encryption:
    Key Rotation:
      Schedule:  @weekly
    Kms:
  External Storage:
  Managed Resources:
    Ceph Block Pools:
    Ceph Cluster:
    Ceph Config:
    Ceph Dashboard:
    Ceph Filesystems:
      Data Pool Spec:
        Application:
        Erasure Coded:
          Coding Chunks:  0
          Data Chunks:    0
        Mirroring:
        Quotas:
        Replicated:
          Size:  0
        Status Check:
          Mirror:
    Ceph Non Resilient Pools:
      Count:  1
      Resources:
      Volume Claim Template:
        Metadata:
        Spec:
          Resources:
        Status:
    Ceph Object Store Users:
    Ceph Object Stores:
    Ceph RBD Mirror:
      Daemon Count:  1
    Ceph Toolbox:
  Mirroring:
  Network:
    Connections:
      Encryption:
    Multi Cluster Service:
  Node Topologies:
  Resource Profile:  lean
  Storage Device Sets:
    Config:
    Count:  1
    Data PVC Template:
      Metadata:
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  2Ti
        Storage Class Name:  gp3-csi
        Volume Mode:         Block
      Status:
    Name:  ocs-deviceset-gp3-csi
    Placement:
    Portable:  true
    Prepare Placement:
    Replica:  3
    Resources:
Status:
  Conditions:
    Last Heartbeat Time:   2024-07-31T15:53:40Z
    Last Transition Time:  2024-07-31T15:53:40Z
    Message:               Version check successful
    Reason:                VersionMatched
    Status:                False
    Type:                  VersionMismatch
    Last Heartbeat Time:   2024-07-31T15:59:08Z
    Last Transition Time:  2024-07-31T15:53:40Z
    Message:               Error while reconciling: Not enough nodes found: Expected 3, found 0
    Reason:                ReconcileFailed
    Status:                False
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2024-07-31T15:53:40Z
    Last Transition Time:  2024-07-31T15:53:40Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2024-07-31T15:53:40Z
    Last Transition Time:  2024-07-31T15:53:40Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2024-07-31T15:53:40Z
    Last Transition Time:  2024-07-31T15:53:40Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2024-07-31T15:53:40Z
    Last Transition Time:  2024-07-31T15:53:40Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                Unknown
    Type:                  Upgradeable
  Images:
    Ceph:
      Desired Image:  registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:579e5358418e176194812eeab523289a0c65e366250688be3f465f1a633b026d
    Noobaa Core:
      Desired Image:  registry.redhat.io/odf4/mcg-core-rhel9@sha256:5f56419be1582bf7a0ee0b9d99efae7523fbf781a88f8fe603182757a315e871
    Noobaa DB:
      Desired Image:  registry.redhat.io/rhel9/postgresql-15@sha256:5c4cad6de1b8e2537c845ef43b588a11347a3297bfab5ea611c032f866a1cb4e
  Kms Server Connection:
  Phase:    Error
  Version:  4.16.0
Events:  <none>
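If only the failing condition is needed, it can be pulled directly from the resource instead of reading the full describe output (a sketch using standard jsonpath filtering, not part of the original triage):
oc get storagecluster ocs-storagecluster -n odf-storage -o jsonpath='{.status.conditions[?(@.type=="ReconcileComplete")].message}'
# -> Error while reconciling: Not enough nodes found: Expected 3, found 0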
[jenkins@temp-jagent-dosypenk-r217 terraform-vpc-example]$ oc get nodes -w
NAME STATUS ROLES AGE VERSION
ip-10-0-0-144.us-west-2.compute.internal Ready worker 39m v1.29.6+aba1e8d
ip-10-0-0-181.us-west-2.compute.internal Ready worker 39m v1.29.6+aba1e8d
ip-10-0-0-45.us-west-2.compute.internal Ready worker 3h30m v1.29.6+aba1e8d
ip-10-0-0-70.us-west-2.compute.internal Ready worker 41m v1.29.6+aba1e8d
ip-10-0-0-78.us-west-2.compute.internal Ready worker 43m v1.29.6+aba1e8d
ip-10-0-0-95.us-west-2.compute.internal Ready worker 3h37m v1.29.6+aba1e8d
---------
Workaround:
Manually apply the label key the operator expects to all worker nodes:
oc label node -l node-role.kubernetes.io/worker cluster.ocs.openshift.io/openshift-storage=""
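After labeling, the selector matches the workers again and the StorageCluster can finish reconciling. A quick verification (an assumed follow-up, not captured in the original output):
oc get nodes -l cluster.ocs.openshift.io/openshift-storage=""   # should now list all worker nodes
oc get storagecluster -n odf-storage -w                         # watch Phase move from Error toward Ready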
---------
Version-Release number of selected component (if applicable):
ODF full_version: 4.16.0-137
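The installed build can be cross-checked against the operator CSVs in the install namespace (a hedged check; assumes the default CSV naming):
oc get csv -n odf-storage
# odf-operator.v4.16.0-* and ocs-operator.v4.16.0-* should report the same build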
---------
How reproducible:
Install ODF on a ROSA HCP OCP 4.16 cluster.
Steps to Reproduce:
1. Install ODF 4.16 on a ROSA HCP OCP 4.16 cluster
---------
Actual results:
The StorageCluster reports a "Not enough nodes found" error; the ODF installation stalls, and no CephFS or RBD storage classes become available.
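The missing storage classes can be confirmed directly (a hedged check; ocs-storagecluster-ceph-rbd and ocs-storagecluster-cephfs are the default ODF storage class names):
oc get storageclass
# neither ocs-storagecluster-ceph-rbd nor ocs-storagecluster-cephfs is listed while the StorageCluster is in Error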
Expected results:
No errors; ODF becomes available, the same as an ODF deployment on a regular AWS cluster.
---------
Additional info:
ODF installation screen recording - https://drive.google.com/file/d/1y84dNkaj68rov9nbJDAlhcnXwc3cJHs_/view?usp=drive_link
Storage System installation screen recording - https://drive.google.com/file/d/12KUnujZmTAAC1H0YqnhXsWjD2PtjRblW/view?usp=sharing