OpenShift Bugs: OCPBUGS-8246

OVN-Kubernetes control plane redeployment is necessary after deleting the nodes in the etcd restore process


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Affects Version: 4.11
    • Fix Version: 4.11.0
    • Component: Documentation / etcd
    • Sprint: OSDOCS Sprint 233, OSDOCS Sprint 234

      Description of problem:

      Between step 12 and 13 of https://docs.openshift.com/container-platform/4.11/backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html , it is necessary to specify that the user must wait up to several minutes (sometimes more than five) for the CNO to redeploy the OVN-Kubernetes control plane once the node objects associated with the wrong master nodes have been deleted.
      
      If this wait is not added, step 13 may cause the OVN databases to be bootstrapped again as clustered and to stay that way.
      
      The best way to check is to confirm that the ovnkube-master daemon set no longer contains any reference to the wrong master IPs, for example with this command:
      
      oc -n openshift-ovn-kubernetes get ds/ovnkube-master -o yaml | grep -E "${WRONG_MASTER_IP_1}|${WRONG_MASTER_IP_2}"
      
      The user should wait until the command above returns an empty result (or piping it to wc -l shows 0).
      

      Version-Release number of selected component (if applicable):

      4.11.13
      

      How reproducible:

      Often, during a cluster restore when OVN-Kubernetes is in use and the nodes have not been deleted automatically but are also not ready (because there is no cloud provider or the broken masters are powered off).
      

      Steps to Reproduce:

      1. Follow the documented restore steps and be unlucky.
      

      Actual results:

      OVN-Kubernetes tries to start clustered databases.
      

      Expected results:

      OVN-Kubernetes works after step 13, so that the machine-api can work during step 14 (if required in the environment) and, in general, so that the procedure can safely continue. A quick health check is sketched below.
      

      Additional info:

      Not relevant for writing the documentation fix, but just for the record, the reasons why it takes several minutes for the CNO to re-bootstrap OVN-Kubernetes after the node deletions likely are (or include) the following; see the sketch after this list for one way to observe the leader election:
      - If the network-operator pod was running on one of the dead masters, it can take a long time until the new pod spawned on the surviving master acquires the leader lease and becomes active.
      - The CNO introduces an intentional delay when reconciling OVN-Kubernetes if the number of masters is smaller than 3, because it assumes it may need to wait for other masters to be installed (which is not the case in this procedure).
      

              Tami Love (rhn-support-tlove)
              Pablo Alonso Rodriguez (rhn-support-palonsor)
              Ge Liu