FlightPath / FLPATH-2188

[odf node recovery] Node recovery stuck at "WaitForOSDPodsStabilize" after node replacement



      Description of the problem:
      On an assisted installer cluster, I replaced one of the OSD worker nodes (reusing the same hostname) and created the ODF node recovery resource after the new node joined the cluster. The rook-ceph OSD pod goes into the expected CrashLoopBackOff, but node recovery stays stuck at WaitForOSDPodsStabilize.
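      The pod and node state can be checked with standard oc commands (a minimal sketch; openshift-storage is assumed as the storage namespace, and app=rook-ceph-osd is the label rook applies to OSD pods):

      # Confirm the replacement worker has rejoined and is Ready
      oc get nodes

      # Watch the OSD pods in the storage namespace
      oc get pods -n openshift-storage -l app=rook-ceph-osd -w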

       

      Node recovery status:

       

      Status:
        Conditions:
          Last Probe Time:       2025-03-20T20:55:27Z
          Last Transition Time:  2025-03-20T20:55:27Z
          Status:                False
          Type:                  EnableCephToolsPod
          Last Probe Time:       2025-03-20T20:55:37Z
          Last Transition Time:  2025-03-20T20:55:27Z
          Status:                False
          Type:                  WaitForCephToolsPodRunning
          Last Probe Time:       2025-03-20T20:59:28Z
          Last Transition Time:  2025-03-20T20:55:37Z
          Message:               OSD pods still in initializing status: pod rook-ceph-osd-1-7dfdb846f5-78xlg: container expand-bluefs waiting in PodInitializing:
      pod rook-ceph-osd-1-7dfdb846f5-78xlg: container chown-container-data-dir waiting in PodInitializing:
      pod rook-ceph-osd-1-7dfdb846f5-78xlg: container log-collector waiting in PodInitializing:
      pod rook-ceph-osd-1-7dfdb846f5-78xlg: container osd waiting in PodInitializing:
          Reason:    WaitingForPodsToInitialize
          Status:    True
          Type:      WaitForOSDPodsStabilize
        Phase:       Running
        Start Time:  2025-03-20T20:55:27Z
      Events:        <none>
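      The status above can be re-dumped at any time (sketch; "noderecovery" as a resource name is an assumption; check oc api-resources for the exact name registered by the operator):

      # Full recovery CR, including .status.conditions
      oc get noderecovery -o yaml

      # Or just the conditions in a readable form
      oc describe noderecovery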
       

      The OSD pod for the replaced node is in Init:CrashLoopBackOff:

      rook-ceph-osd-0-856fb46576-n6sjc                                  2/2     Running                 2               29h
      rook-ceph-osd-1-7dfdb846f5-78xlg                                  0/2     Init:CrashLoopBackOff   6 (2m30s ago)   26m
      rook-ceph-osd-2-d5dd6fd8-pddgl                                    2/2     Running                 2               29h
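      Which init container is crashing (and why) can be seen with the usual pod inspection (sketch; --previous assumes the container has already crashed at least once):

      # Init container states and recent events for the CLBO pod
      oc -n openshift-storage describe pod rook-ceph-osd-1-7dfdb846f5-78xlg

      # Logs from the last failed run of a given init container, e.g. expand-bluefs
      oc -n openshift-storage logs rook-ceph-osd-1-7dfdb846f5-78xlg -c expand-bluefs --previous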
      How reproducible: 100% so far

      Steps to reproduce:

      1. Deploy a 3-master, 3-worker assisted installer cluster with the ODF operator and one extra 100G disk on each worker

      2. Destroy one of the OSD workers

      3. Create a new OSD worker via day-2 worker addition in the assisted installer, reusing the same hostname

      4. Run node recovery once the worker joins the cluster (see the sketch below)
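      For steps 3-4, a rough sketch of the checks between adding the day-2 worker and creating the recovery resource (whether CSR approval is needed here is an assumption; the assisted installer flow may handle it automatically):

      # Day-2 nodes can surface pending CSRs that need approval
      oc get csr
      oc adm certificate approve <csr-name>

      # Wait for the replacement worker to report Ready, then create the recovery resource
      oc get nodes -w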

      Actual results:

      Node recovery is stuck at WaitForOSDPodsStabilize

       

      Expected results:

      Node recovery recovers the new node

       

