
Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Assisted Installer SaaS
Labels:
- ContentX

Story Points:
5
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
ContentX for Assisted Installer
Intelligence Requested:
Market:

Original story points:
5
Sprint:
HCIDOCS 2024#11, HCIDOCS 2024#12
sprint_count:
2

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Installing a primary control plane node on an unhealthy cluster

The title, use case, and prerequisites for this procedure are confusing. The procedure is based on 411-unhealthy.md, which describes replacing a control plane node in a cluster that has two healthy control plane nodes and one unhealthy control plane node. (The prerequisite about having a day 2 control plane node makes no sense, because this procedure creates a day 2 control plane node.)

Task scope:

New title: "Replacing a control plane node in an unhealthy cluster". Remove "primary" because it does not make sense.
Use case: "You can replace an unhealthy control plane node in a cluster with 3 control plane nodes."
Prerequisites:
- You have installed OpenShift Container Platform 4.11 or later, with the required etcd-operator version.
- You have added a host to the cluster by using the UI or the API.
- You have added the annotation role:master to the host to create a new control plane node.
The 'oc rsh' commands will be fixed in HCIDOCS-518.
"Confirm initial state of the cluster:" > "Check the node status to verify that a control plane node is not available:"
"Confirm the etcd-operator detects the cluster as unhealthy:" > "Check the etcd-operator log to verify that a control plane node is not available:"
"Confirm the etcdctl members: " > "Open a remote shell connection to etcd-worker-3"
Add "List the etcdctl members:" # etcdctl member list -w table
"Confirm that etcdctl reports an unhealthy member of the cluster: " > "Check the etcdctl endpoint health:". Fix the prompt: # etcdctl endpoint health
"Remove the unhealthy control plane by deleting the Machine Custom Resource:" > "Remove the unhealthy control plane node by deleting the Machine custom resource (CR): ". Move note to next step.
"Confirm that etcd-operator has not removed the unhealthy machine: " > "Check the etcd-operator log to verify that the machine CR was deleted:"
"Note: The Machine and Node Custom Resources (CRs) will not be deleted if the unhealthy cluster cannot run successfully." > "Note: The Machine and Node objects might not be deleted because they are protected by finalizers. If this occurs, you must delete the Machine CR manually."
"Remove the unhealthy etcdctl member manually: " > "Open a remote shell connection to ..."
Add "Get a list of the etcdctl members:" - # etcdctl member list -w table
"Confirm that etcdctl reports an unhealthy member of the cluster: " > "Check the etcdctl endpoint health:"
"Remove the unhealthy cluster by deleting the etcdctl member Custom Resource: " > "Remove the unhealthy etcdctl member from the cluster:"
"Confirm members of etcdctl by running the following command: " > "Verify that the unhealthy etcdctl member was removed by running the following command:"
"Confirm ready status of the control plane node: " > "Check the node status to verify that all control plane nodes are available:"
"Validate the Machine, Node and BareMetalHost Custom Resources. " - HOW? This step has no commands. Is it in the wrong place?
"Create Machine Custom Resource linked with BareMetalHost and Node. " Is this sentence intended as a lead-in for a sub-procedure (add the BMH, add the Machine, and link the BMH, Machine, and Node)?
"Add BareMetalHost Custom Resource: " > "Create a BareMetalHost CR for the new control plane node:"
"Add Machine Custom Resource: " > "Create a Machine CR for the new control plane node: "
"Link BareMetalHost, Machine, and Node by running the link-machine-and-node.sh script: " > "Save the link-machine-and-node.sh script on your local machine:"
Add "Make the link-machine-and-node.sh script executable by running the following command:
$ chmod +x link-machine-and-node.sh"
Add "Link the BareMetalHost CR, the Machine CR, and the control plane node by running the link-machine-and-node.sh script: " + command
"Remove the unhealthy etcdctl member manually: " > "Open a remote shell connection to ..."
Add "Get a list of the etcdctl members:" - # etcdctl member list -w table
"Confirm the etcd-operator configuration applies to all nodes:" > "Monitor the etcd-operator configuration process:". The configuration can take a long time, which is why the user needs this command.
"Confirm health of etcdctl: " > "Open a remote shell connection to etcd-worker-3"
Add "Check the etcdctl endpoint health:" # etcdctl endpoint health
"Confirm the health of the ClusterOperators: " > "Verify that the Operators are available:"
"Confirm the ClusterVersion: " > "Verify that the cluster version is correct:"
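The diagnostic and removal steps in the task scope above could map to commands like the following. This is a minimal sketch, not the published procedure: the function names are hypothetical, the pod name `etcd-worker-3` comes from this ticket, and the Machine CR name and etcd member ID are placeholders that you must take from your own cluster (the member ID comes from the `member list` output). Each function runs one command non-interactively instead of opening an interactive `oc rsh` session.

```shell
# Hypothetical helper functions (names are illustrative, not from the docs).
# Source this file, then call the functions in order.
# ETCD_POD defaults to etcd-worker-3, the healthy etcd pod named in the ticket.
ETCD_POD="${ETCD_POD:-etcd-worker-3}"

# Check the node status to verify that a control plane node is not available.
check_control_plane_nodes() {
  oc get nodes -l node-role.kubernetes.io/master
}

# Check the etcd-operator log for messages about the unavailable member.
show_etcd_operator_log() {
  oc logs -n openshift-etcd-operator deployment/etcd-operator
}

# List the etcd members from inside a healthy etcd pod.
list_etcd_members() {
  oc rsh -n openshift-etcd "$ETCD_POD" etcdctl member list -w table
}

# Check the health of the etcd endpoints.
check_etcd_endpoint_health() {
  oc rsh -n openshift-etcd "$ETCD_POD" etcdctl endpoint health
}

# Remove the unhealthy control plane node by deleting its Machine CR.
# $1: name of the Machine CR (placeholder).
delete_unhealthy_machine() {
  oc delete machine -n openshift-machine-api "$1"
}

# Remove the unhealthy etcd member from the cluster.
# $1: member ID taken from the list_etcd_members output (placeholder).
remove_etcd_member() {
  oc rsh -n openshift-etcd "$ETCD_POD" etcdctl member remove "$1"
}
```

After `remove_etcd_member`, running `list_etcd_members` again verifies that the unhealthy member is gone.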
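The proposed steps for adding the replacement node (make the linking script executable, run it, monitor the etcd Operator, and verify the result) could be sketched as follows. This assumes that `link-machine-and-node.sh` is saved in the current directory and that it takes the Machine CR name and the node name as its two arguments; the ticket does not state the script's parameters, so check the script before you run it. The function names are hypothetical.

```shell
# Hypothetical helpers for adding the replacement control plane node.
# Function names and script arguments are assumptions, not from the docs.
ETCD_POD="${ETCD_POD:-etcd-worker-3}"

# Make the saved script executable, then run it to link the BareMetalHost CR,
# the Machine CR, and the node. $1: Machine CR name, $2: node name (assumed).
link_new_control_plane_node() {
  link_script=./link-machine-and-node.sh
  chmod +x "$link_script" || return 1
  "$link_script" "$1" "$2"
}

# Monitor the etcd cluster Operator while it reconfigures the members.
# This can take a long time; stop watching with Ctrl+C.
watch_etcd_operator() {
  oc get clusteroperator etcd --watch
}

# Final checks, stopping at the first failure: etcd endpoint health,
# then Operator availability, then the cluster version.
verify_replacement() {
  oc rsh -n openshift-etcd "$ETCD_POD" etcdctl endpoint health &&
    oc get clusteroperators &&
    oc get clusterversion
}
```

Chaining the final checks with `&&` is a design choice for the sketch: if etcd is still unhealthy, there is little point in reading the Operator and version output yet.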

clones

HCIDOCS-522 Fix "Installing CP node on healthy cluster" procedure

Closed

is related to

HCIDOCS-412 Fix node names

Closed

relates to

HCIDOCS-555 The node names mentioned on 'Installing a primary control plane node on a healthy cluster' documentation looks incorrect.

Closed

mentioned on

Merge request - Resolve HCIDOCS-412-new: Fix node names new
