Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: 4.22.0
Affects Version/s: 4.20, 4.21, 4.22
Component/s: Two Node Fencing
Labels:
- qe-core

Activity Type:
None
Blocked:
False
Blocked Reason:
None
Story Points:
0
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:
4.22.0
Release Blocker:
None
Sprint:
OCPEDGE Sprint 281, OCPEDGE Sprint 282, OCPEDGE Sprint 283
sprint_count:
3

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

During TNF (Two Nodes with Fencing) installation via the MCE Assisted Installer, the Pacemaker etcd resource fails to start on one node because the /var/lib/etcd directory does not exist there yet. This is caused by a race condition: the TNF setup job creates the Pacemaker etcd resource before the CEO (cluster-etcd-operator) installer pod has created the etcd data directory on the other node (the former bootstrap node).
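The ordering constraint behind this race can be illustrated with a minimal sketch, assuming the resource-creation step could simply wait for the data directory to exist before proceeding. `wait_for_path`, `create_resource_when_ready` and the `create_resource` callback are hypothetical names; this is not the change in openshift/cluster-etcd-operator#1524 and it calls no real Pacemaker or CEO API. The check is shown locally for simplicity; in the cluster it would have to target the node whose /var/lib/etcd is missing.

```python
import os
import time
from typing import Callable


def wait_for_path(path: str, timeout_s: float, interval_s: float = 0.5) -> bool:
    """Poll until `path` exists; return False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def create_resource_when_ready(
    data_dir: str,
    create_resource: Callable[[], None],
    timeout_s: float = 300.0,
    interval_s: float = 0.5,
) -> bool:
    """Hypothetical guard: run `create_resource` only after `data_dir` exists.

    `create_resource` is a stand-in for whatever step creates the Pacemaker
    etcd resource; it is NOT a real Pacemaker or CEO API.
    """
    if not wait_for_path(data_dir, timeout_s, interval_s):
        # The data directory never appeared: report failure instead of
        # creating a resource that would fail to start.
        return False
    create_resource()
    return True
```

Polling with a timeout keeps such a step from blocking forever if the directory is never created; whether the actual fix uses polling, an explicit ordering dependency, or another mechanism is not stated in this report.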

A more detailed analysis can be found here:
Manifests: see the attached manifests.

Version-Release number of selected component (if applicable):

4.20+

How reproducible:

95% of installation attempts

Steps to Reproduce:

    1. Run an MCE Assisted Installer installation with the attached manifests (which include the fix for ClusterLabs/resource-agents).
    2. Wait for the bootstrap node to restart.
    3. Observe that Pacemaker takes over the etcd cluster with only one node.
    4. Observe that the installation never finishes.

Actual results:

    The installation gets stuck because CEO never sees the etcd cluster fully stable with 2 nodes.

Expected results:

   The installation finishes successfully.

Additional info:

    This bug can only be reproduced when the fix for OCPBUGS-64765 is applied. That fix is included in the attached manifests.

depends on

OCPBUGS-64765 TNF assisted-service installation stuck due to address/port conflict

ASSIGNED

links to

openshift/cluster-etcd-operator#1524: OCPBUGS-68371: fix bootstrap race condition

Assignee: Pablo Fontanilla

Reporter: Pablo Fontanilla

QA Contact: Francisco Javier Moreno

Need Info From: None

Votes: 0

Watchers: 5

Created: 2025/12/15 5:06 PM

Updated: 2026/01/25 6:41 AM
