• Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 8
    • False
    • True
    • 5
    • HCIDOCS 2024#9, HCIDOCS 2024#10, HCIDOCS 2024#11
    • 3

      Task scope: Change the node names to "node-0..n", because the role-based node names (master/worker-0..n) cause confusion.

      Section to fix: 12.6. Installing a primary control plane node on an unhealthy cluster [2].

      The unhealthy control plane node (master-2, renamed node-1 below) is replaced by a healthy control plane node (node-6).

      Step 1:

      NAME       STATUS       ROLES    AGE   VERSION
      worker-1   Ready        worker   20h   v1.24.0+3882f8f
      master-2   NotReady     master   20h   v1.24.0+3882f8f
      master-3   Ready        master   20h   v1.24.0+3882f8f
      worker-4   Ready        worker   20h   v1.24.0+3882f8f
      master-5   Ready        worker   15h   v1.24.0+3882f8f

      Rename the nodes so that the original control plane nodes are numbered 0-2, and list them first to make the output easier to read:

      NAME       STATUS       ROLES    AGE   VERSION
      node-0     Ready        master   20h   v1.24.0+3882f8f
      node-1     NotReady     master   20h   v1.24.0+3882f8f
      node-2     Ready        master   20h   v1.24.0+3882f8f
      node-3     Ready        worker   20h   v1.24.0+3882f8f
      node-4     Ready        worker   15h   v1.24.0+3882f8f

      Update the node names in the commands in all subsequent steps.
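      The renaming is mechanical once the old-to-new mapping is fixed, so a small helper could apply it to sample output and commands alike. This is a minimal sketch, assuming GNU sed (its `\b` word-boundary escape is not POSIX); `rename_nodes` is a hypothetical name, and the only mapping used in the example, master-2 to node-1, follows from the NotReady node in the two tables above. The full mapping for each section still has to be decided.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite role-based node names to node-N names.
# Usage: rename_nodes OLD=NEW [OLD=NEW ...] < input > output
# Assumes GNU sed, where \b matches a word boundary.
rename_nodes() {
  local script="" pair old new
  for pair in "$@"; do
    old="${pair%%=*}"   # text before the first '='
    new="${pair#*=}"    # text after the first '='
    # Match OLD only as a whole word, so master-2 does not match master-23.
    script+="s/\\b${old}\\b/${new}/g;"
  done
  sed -e "$script"
}
```

      For example, `printf 'master-2   NotReady   master\n' | rename_nodes master-2=node-1` prints `node-1   NotReady   master`. Because node-N is two characters shorter than master-N or worker-N, column alignment in the sample tables still needs a manual touch-up.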

      "node-1" (NotReady) is removed. The new node will be "node-6":

      NAME       STATUS     ROLES    AGE   VERSION
      node-0     Ready      master   20h   v1.24.0+3882f8f
      node-2     Ready      master   20h   v1.24.0+3882f8f
      node-3     Ready      worker   20h   v1.24.0+3882f8f
      node-4     Ready      worker   15h   v1.24.0+3882f8f
      node-6     Ready      master   40m   v1.24.0+3882f8f

      Section to fix: 12.5. Installing a primary control plane node on a healthy cluster [1].

      The healthy control plane node (master-0, renamed node-0 below) is replaced by a healthy control plane node (node-6).

      Step 3:

      NAME       STATUS   ROLES    AGE     VERSION
      master-0   Ready    master   4h42m   v1.24.0+3882f8f
      worker-1   Ready    worker   4h29m   v1.24.0+3882f8f
      master-2   Ready    master   4h43m   v1.24.0+3882f8f
      master-3   Ready    master   4h27m   v1.24.0+3882f8f
      worker-4   Ready    worker   4h30m   v1.24.0+3882f8f
      master-5   Ready    master   105s    v1.24.0+3882f8f
      

      Rename the nodes, with control plane nodes first:

      NAME       STATUS   ROLES    AGE     VERSION
      node-0     Ready    master   4h42m   v1.24.0+3882f8f
      node-1     Ready    master   4h29m   v1.24.0+3882f8f 
      node-2     Ready    master   4h43m   v1.24.0+3882f8f 
      node-3     Ready    master   4h27m   v1.24.0+3882f8f
      node-4     Ready    worker   4h30m   v1.24.0+3882f8f
      node-5     Ready    worker   105s    v1.24.0+3882f8f
      

      Update the node names in the commands in all subsequent steps.

      A new node will be added, named "node-6".

      Step 4: `$ bash link-machine-and-node.sh custom-master3 worker-5`. I think `worker-5` is going to be `node-6`.

      Step 11.1: `$ oc delete bmh -n openshift-machine-api custom-master3`. Should this be `$ oc delete bmh -n openshift-machine-api node-0`?

      Step 11.iv: output after the node is deleted, as currently documented. `master-0` (node-0 after renaming) is gone:

      NAME       STATUS   ROLES    AGE   VERSION
      worker-1   Ready    worker   19h   v1.24.0+3882f8f
      master-2   Ready    master   20h   v1.24.0+3882f8f
      master-3   Ready    master   19h   v1.24.0+3882f8f
      worker-4   Ready    worker   19h   v1.24.0+3882f8f
      master-5   Ready    master   15h   v1.24.0+3882f8f

      With the new names, I think the output would look like this: node-0 (formerly master-0) is gone, and node-6 (master) is Ready.

      NAME       STATUS   ROLES    AGE     VERSION
      node-1     Ready    master   4h42m   v1.24.0+3882f8f
      node-2     Ready    master   4h29m   v1.24.0+3882f8f 
      node-3     Ready    master   4h43m   v1.24.0+3882f8f 
      node-4     Ready    worker   4h27m   v1.24.0+3882f8f
      node-5     Ready    worker   4h30m   v1.24.0+3882f8f
      node-6     Ready    master   105s    v1.24.0+3882f8f
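      For the QE verification suggested in the email below, the expected end state in both sections (the replaced node is gone and node-6 is a Ready master) could be checked mechanically. This is a minimal sketch, assuming the default `oc get nodes` column layout (NAME, STATUS, ROLES, ...); `check_replacement` is a hypothetical name and is not part of the documented procedure.

```shell
#!/usr/bin/env bash
# Hypothetical check: read `oc get nodes` output on stdin and confirm that
# REMOVED is absent and ADDED is Ready with a role list containing "master".
# Usage: oc get nodes | check_replacement REMOVED ADDED
check_replacement() {
  awk -v removed="$1" -v added="$2" '
    $1 == "NAME" { next }                  # skip the header row
    $1 == removed { still_listed = 1 }
    # ROLES may be a comma-separated list such as control-plane,master
    $1 == added && $2 == "Ready" && ("," $3 ",") ~ /,master,/ { ok = 1 }
    END {
      if (still_listed) { print removed " is still listed"; exit 1 }
      if (!ok)          { print added " is not a Ready master"; exit 1 }
      print "ok"
    }'
}
```

      In 12.5 this would be run as `oc get nodes | check_replacement node-0 node-6`, and in 12.6 as `oc get nodes | check_replacement node-1 node-6`. A STATUS such as `Ready,SchedulingDisabled` is treated as not Ready by this sketch.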

       


      Source: email from Oved Ourfali:

      The names of the nodes do cause major confusion there, abusing the master/worker names.

      I think that in order to make it more clear, we should probably:

      1. Have all node names as "node-X" (node-0/1/2/3/4/5/6).
      2. When a new node gets added, it gets the next number (in our case we're adding a worker node that then becomes a master node, iiuc, so it should be called node-6).
      3. Nodes 0/1/2 are the original master nodes. In 12.6 one of those gets replaced with node-6. In 12.5 one gets added (node-6).

      In addition, we should have QE verify the steps to make sure they are right, but can we first make it clearer?

      [1] https://docs.redhat.com/en/documentation/assisted_installer_for_openshift_container_platform/2024/html/installing_openshift_container_platform_with_the_assisted_installer/expanding-the-cluster#installing-primary-control-plane-node-healthy-cluster_expanding-the-cluster
      [2] https://docs.redhat.com/en/documentation/assisted_installer_for_openshift_container_platform/2024/html/installing_openshift_container_platform_with_the_assisted_installer/expanding-the-cluster#installing-primary-control-plane-node-unhealthy-cluster_expanding-the-cluster
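      Rule 2 in the email above (a new node gets the next number) can be expressed as a small helper. This is a minimal sketch; `next_node_name` is a hypothetical name, and it assumes "the next number" means one more than the highest existing node-N.

```shell
#!/usr/bin/env bash
# Hypothetical helper for naming rule 2: a new node gets the next number
# after the highest existing node-N. Names not of the form node-<digits>
# are ignored.
# Usage: next_node_name NAME [NAME ...]   (prints e.g. node-6)
next_node_name() {
  local name n max=-1
  for name in "$@"; do
    n="${name#node-}"
    [[ "$name" == node-* && "$n" =~ ^[0-9]+$ ]] || continue
    # 10# forces base 10, so a name like node-08 is not read as octal.
    (( 10#$n > max )) && max=$(( 10#$n ))
  done
  echo "node-$(( max + 1 ))"
}
```

      On a live cluster the input could come from `next_node_name $(oc get nodes --no-headers -o custom-columns=NAME:.metadata.name)`. Applied to the 12.5 example (node-0 to node-5), it returns node-6; applied to the 12.6 example (node-0 to node-4), the same rule returns node-5 rather than node-6.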

              rhn-support-tshwartz Talia Shwartzberg
              oourfali Oved Ourfali
              Crystal Chun
              Crystal Chun Crystal Chun
              Votes: 1
              Watchers: 5
