Data Foundation Bugs / DFBUGS-137

[2302235] [UI deployment][ODF on ROSA HCP] Incorrect labels on the worker nodes


    • Bug
    • Resolution: Unresolved
    • Critical
    • odf-4.17.1
    • odf-4.16
    • management-console
    • Committed
    • 4.17.0-77
      Cause: ODF is installed in a namespace other than "openshift-storage" (ROSA use case).

      Consequence:
      The UI labels the nodes during StorageSystem deployment, adding a dynamic label "cluster.ocs.openshift.io/<CLUSTER_NAMESPACE>: ''" (where "CLUSTER_NAMESPACE" is the namespace in which the StorageSystem is being created).

      The ODF/OCS operators, on the other hand, still expect the label to be static and always equal to "cluster.ocs.openshift.io/openshift-storage: ''", irrespective of where ODF is installed or where the StorageSystem is deployed.

      Fix:
      The UI will now always add the static label "cluster.ocs.openshift.io/openshift-storage: ''" to the nodes.

      Result:
      Install should proceed as expected now.

      Workaround:
      Manually label the nodes on which the StorageSystem-related workloads should be deployed.
      Example:
      To label all the worker nodes: `oc label node -l node-role.kubernetes.io/worker cluster.ocs.openshift.io/openshift-storage=""`.

      To label specific node(s): `oc label node <NODE_NAME> cluster.ocs.openshift.io/openshift-storage=""`
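
      To confirm the labels afterwards, a minimal check (assuming the static label key above) is to list the nodes that the operators will select: `oc get nodes -l cluster.ocs.openshift.io/openshift-storage="" --show-labels`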
    • Bug Fix
    • Approved

      Description of problem:

      ----------

      On a fresh ODF deployment installed in the 'odf-storage' namespace, the nodes carry the following label:

      oc get nodes -l cluster.ocs.openshift.io/odf-storage=""
      NAME STATUS ROLES AGE VERSION
      ip-10-0-0-144.us-west-2.compute.internal Ready worker 18h v1.29.6+aba1e8d
      ip-10-0-0-181.us-west-2.compute.internal Ready worker 18h v1.29.6+aba1e8d
      ip-10-0-0-45.us-west-2.compute.internal Ready worker 21h v1.29.6+aba1e8d
      ip-10-0-0-70.us-west-2.compute.internal Ready worker 18h v1.29.6+aba1e8d
      ip-10-0-0-78.us-west-2.compute.internal Ready worker 18h v1.29.6+aba1e8d
      ip-10-0-0-95.us-west-2.compute.internal Ready worker 21h v1.29.6+aba1e8d

      That triggers a "Not enough nodes found" error on the StorageCluster (screenshot attached).
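
      For comparison, the static label that the operators select on matches no nodes here, which is consistent with the "Expected 3, found 0" condition below (the empty output shown is an assumption based on that condition):

      oc get nodes -l cluster.ocs.openshift.io/openshift-storage=""
      No resources found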

      StorageCluster in error state
      oc get storagecluster -A
      NAMESPACE NAME AGE PHASE EXTERNAL CREATED AT VERSION
      odf-storage ocs-storagecluster 7m25s Error 2024-07-31T15:53:39Z 4.16.0
      [jenkins@temp-jagent-dosypenk-r217 terraform-vpc-example]$ oc describe storagecluster ocs-storagecluster -nodf-storage
      Name: ocs-storagecluster
      Namespace: odf-storage
      Labels: <none>
      Annotations: uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
      API Version: ocs.openshift.io/v1
      Kind: StorageCluster
      Metadata:
      Creation Timestamp: 2024-07-31T15:53:39Z
      Finalizers:
      storagecluster.ocs.openshift.io
      Generation: 2
      Owner References:
      API Version: odf.openshift.io/v1alpha1
      Kind: StorageSystem
      Name: ocs-storagecluster-storagesystem
      UID: 2dee21a8-8039-4640-8fd1-9e7a669356b6
      Resource Version: 101564
      UID: 1d04f184-50c7-4f6f-9777-0f197a2fc1d1
      Spec:
      Arbiter:
      Encryption:
      Key Rotation:
      Schedule: @weekly
      Kms:
      External Storage:
      Managed Resources:
      Ceph Block Pools:
      Ceph Cluster:
      Ceph Config:
      Ceph Dashboard:
      Ceph Filesystems:
      Data Pool Spec:
      Application:
      Erasure Coded:
      Coding Chunks: 0
      Data Chunks: 0
      Mirroring:
      Quotas:
      Replicated:
      Size: 0
      Status Check:
      Mirror:
      Ceph Non Resilient Pools:
      Count: 1
      Resources:
      Volume Claim Template:
      Metadata:
      Spec:
      Resources:
      Status:
      Ceph Object Store Users:
      Ceph Object Stores:
      Ceph RBD Mirror:
      Daemon Count: 1
      Ceph Toolbox:
      Mirroring:
      Network:
      Connections:
      Encryption:
      Multi Cluster Service:
      Node Topologies:
      Resource Profile: lean
      Storage Device Sets:
      Config:
      Count: 1
      Data PVC Template:
      Metadata:
      Spec:
      Access Modes:
      ReadWriteOnce
      Resources:
      Requests:
      Storage: 2Ti
      Storage Class Name: gp3-csi
      Volume Mode: Block
      Status:
      Name: ocs-deviceset-gp3-csi
      Placement:
      Portable: true
      Prepare Placement:
      Replica: 3
      Resources:
      Status:
      Conditions:
      Last Heartbeat Time: 2024-07-31T15:53:40Z
      Last Transition Time: 2024-07-31T15:53:40Z
      Message: Version check successful
      Reason: VersionMatched
      Status: False
      Type: VersionMismatch
      Last Heartbeat Time: 2024-07-31T15:59:08Z
      Last Transition Time: 2024-07-31T15:53:40Z
      Message: Error while reconciling: Not enough nodes found: Expected 3, found 0
      Reason: ReconcileFailed
      Status: False
      Type: ReconcileComplete
      Last Heartbeat Time: 2024-07-31T15:53:40Z
      Last Transition Time: 2024-07-31T15:53:40Z
      Message: Initializing StorageCluster
      Reason: Init
      Status: False
      Type: Available
      Last Heartbeat Time: 2024-07-31T15:53:40Z
      Last Transition Time: 2024-07-31T15:53:40Z
      Message: Initializing StorageCluster
      Reason: Init
      Status: True
      Type: Progressing
      Last Heartbeat Time: 2024-07-31T15:53:40Z
      Last Transition Time: 2024-07-31T15:53:40Z
      Message: Initializing StorageCluster
      Reason: Init
      Status: False
      Type: Degraded
      Last Heartbeat Time: 2024-07-31T15:53:40Z
      Last Transition Time: 2024-07-31T15:53:40Z
      Message: Initializing StorageCluster
      Reason: Init
      Status: Unknown
      Type: Upgradeable
      Images:
      Ceph:
      Desired Image: registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:579e5358418e176194812eeab523289a0c65e366250688be3f465f1a633b026d
      Noobaa Core:
      Desired Image: registry.redhat.io/odf4/mcg-core-rhel9@sha256:5f56419be1582bf7a0ee0b9d99efae7523fbf781a88f8fe603182757a315e871
      Noobaa DB:
      Desired Image: registry.redhat.io/rhel9/postgresql-15@sha256:5c4cad6de1b8e2537c845ef43b588a11347a3297bfab5ea611c032f866a1cb4e
      Kms Server Connection:
      Phase: Error
      Version: 4.16.0
      Events: <none>
      [jenkins@temp-jagent-dosypenk-r217 terraform-vpc-example]$ oc get nodes -w
      NAME STATUS ROLES AGE VERSION
      ip-10-0-0-144.us-west-2.compute.internal Ready worker 39m v1.29.6+aba1e8d
      ip-10-0-0-181.us-west-2.compute.internal Ready worker 39m v1.29.6+aba1e8d
      ip-10-0-0-45.us-west-2.compute.internal Ready worker 3h30m v1.29.6+aba1e8d
      ip-10-0-0-70.us-west-2.compute.internal Ready worker 41m v1.29.6+aba1e8d
      ip-10-0-0-78.us-west-2.compute.internal Ready worker 43m v1.29.6+aba1e8d
      ip-10-0-0-95.us-west-2.compute.internal Ready worker 3h37m v1.29.6+aba1e8d

      ---------

      Workaround:
      oc label node -l node-role.kubernetes.io/worker cluster.ocs.openshift.io/openshift-storage=""
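
      After labelling, the StorageCluster should reconcile on its own; one way to watch it recover (assuming the 'odf-storage' namespace used above):

      oc get storagecluster -n odf-storage -w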

      ---------

      Version-Release number of selected component (if applicable):
      ODF full_version: 4.16.0-137

      ---------

      How reproducible:
      Install ODF on a ROSA HCP OCP 4.16 cluster

      Steps to Reproduce:
      1. Install ODF 4.16 on a ROSA HCP OCP 4.16 cluster

      ---------

      Actual results:
      The StorageCluster reports a "Not enough nodes found" error. ODF installation stalls; no CephFS or RBD storage classes are available.

      Expected results:
      No errors; ODF becomes available, the same as ODF on a regular AWS cluster.

      ---------

      Additional info:

      ODF installation screen recording - https://drive.google.com/file/d/1y84dNkaj68rov9nbJDAlhcnXwc3cJHs_/view?usp=drive_link

      Storage System installation screen recording - https://drive.google.com/file/d/12KUnujZmTAAC1H0YqnhXsWjD2PtjRblW/view?usp=sharing

              skatiyar@redhat.com Sanjal Katiyar
              rh-ee-dosypenk Daniel Osypenko