-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
4.22
-
None
Description of problem:
Currently, if a revision update is in progress, a node replacement operation blocks on revision stabilization instead of scheduling the pacemaker-setup job immediately. This is fine during initial startup, where we want the installer pods to complete before we start the transition, but it is not desirable once the cluster is running external-etcd, because we want to restore full etcd membership as soon as possible.
Version-Release number of selected component (if applicable):
openshift-4.22 nightlies as of 3/6/26
How reproducible:
I saw this happen once in the DualStack lanes. I'm not sure why DualStack hits this when other node-replacement lanes have not.
Steps to Reproduce:
1. Stage a cluster so that a node fails in the middle of a revision update.
2. Run the node replacement procedure up to the point where the node is re-added.
3. Observe that the step to start the new node is blocked until the revision has stabilized.
Actual results:
The test timed out. Outside of a test context, you probably just have to wait. However, if the node revision starts waiting for the installer to pass on the new node, you will never be able to proceed.
Expected results:
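When the cluster is already running external-etcd, the pacemaker-setup job is scheduled immediately, without waiting for the revision to stabilize.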
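A rough sketch of the decision this report is asking for, written in Go. This is not the operator's actual code: `ClusterState`, its fields, `Decision`, and `decide` are all hypothetical names invented for illustration. It only encodes the rule described above: wait for revision stabilization during initial startup, but never wait once external-etcd is running.

```go
package main

// ClusterState is a hypothetical summary of the inputs to the decision.
// None of these field names come from the real operator.
type ClusterState struct {
	SetupJobNeeded     bool // a node was (re-)added and needs the pacemaker-setup job
	ExternalEtcd       bool // the cluster has already transitioned to external-etcd
	RevisionInProgress bool // a revision update has not stabilized yet
}

// Decision is the (hypothetical) outcome of one reconcile pass.
type Decision int

const (
	NoAction Decision = iota
	WaitForRevision
	ScheduleSetupJob
)

// decide returns what to do about the setup job for the given state.
func decide(s ClusterState) Decision {
	if !s.SetupJobNeeded {
		return NoAction
	}
	// Once on external-etcd, restoring full membership takes priority,
	// so an in-progress revision must not block the job.
	if s.ExternalEtcd {
		return ScheduleSetupJob
	}
	// Initial startup: let the installer pods finish before transitioning.
	if s.RevisionInProgress {
		return WaitForRevision
	}
	return ScheduleSetupJob
}
```

With this rule, the scenario from the reproduction steps (external-etcd running, revision in progress, node re-added) yields `ScheduleSetupJob`, whereas the current behavior corresponds to `WaitForRevision`.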
Additional info: