OpenShift Bugs / OCPBUGS-29114

Installer creates CPMS incorrectly for vSphere IPI when static IPs are configured

    • Important
    • No
    • Rejected
    • False
    • None
    • Release note:
      * Previously, when installing a cluster on vSphere with static IP addresses, the cluster could create control plane machines without static IP addresses due to a conflict with other Technology Preview features. With this update, the Control Plane Machine Set Operator correctly manages the static IP assignment for control plane machines. (link:https://issues.redhat.com/browse/OCPBUGS-29114[*OCPBUGS-29114*])
    • Bug Fix
    • Done

      Description of problem:

      When a new vSphere cluster is installed with static IPs, control plane machine sets (CPMS) are also enabled under the TechPreviewNoUpgrade feature set, and the installer applies an incorrect configuration to the CPMS. As a result, the control plane (master) machines are recreated.

      Version-Release number of selected component (if applicable):

      4.15

      How reproducible:

      always

      Steps to Reproduce:

      1. Create an install-config.yaml with static IPs, following the documentation.
      2. Run `openshift-install create cluster`.
      3. As the installation progresses, watch the machine definitions.
          

      Actual results:

      New master machines are created.

      Expected results:

      All machines remain the same as those created by the installer.

      Additional info:

          


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Critical: OpenShift Container Platform 4.16.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:0041

            Neil Girard added a comment -

            rhn-support-puplench / rhn-support-wyahia, if you review the other Jira issue, you can see that the changes have already been made and are being tested. The fix should be available in the near future; it just missed 4.15.0.

            Shang Gao added a comment -

            Thanks for the info. After pre-merge testing with https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/275, the master nodes are no longer reconciled by CPMS. Moving to Verified.

            Neil Girard added a comment (edited) -

            Let me take a look. This may be hitting another bug that is currently being addressed.

            It looks like the CPMSO is logging:

            I0219 07:46:32.076950       1 updates.go:478] "msg"="Machine requires an update" "controller"="controlplanemachineset" "diff"=["Workspace.ResourcePool: /DEVQEdatacenter/host/DEVQEcluster//Resources != /DEVQEdatacenter/host/DEVQEcluster/Resources"] "index"=2 "name"="sgao-devqe-vblw8-master-2" "namespace"="openshift-machine-api" "reconcileID"="5f47f5a5-0a90-4168-bfcc-dae0fad9b953" "updateStrategy"="RollingUpdate" 

            This bug is being addressed in https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/275

            The Jira is https://issues.redhat.com//browse/SPLAT-1440
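The log line above shows the CPMSO reporting "Machine requires an update" because two resource pool paths differ only by a doubled slash (`DEVQEcluster//Resources` vs `DEVQEcluster/Resources`). The following Go sketch illustrates how normalizing inventory paths with the standard library's `path.Clean` before comparing them makes such a diff disappear. It is an illustration under that assumption only: `resourcePoolsEqual` is a hypothetical helper, and this is not necessarily how PR 275 fixes the issue.

```go
package main

import (
	"fmt"
	"path"
)

// resourcePoolsEqual reports whether two vSphere inventory paths are equal
// once path.Clean has collapsed repeated slashes and dropped trailing ones.
// Hypothetical helper for illustration only.
func resourcePoolsEqual(a, b string) bool {
	return path.Clean(a) == path.Clean(b)
}

func main() {
	// The two values from the CPMSO diff in the log line above.
	a := "/DEVQEdatacenter/host/DEVQEcluster//Resources"
	b := "/DEVQEdatacenter/host/DEVQEcluster/Resources"

	// A raw string comparison reports a difference, as the log does.
	fmt.Println("raw equal:", a == b) // prints "raw equal: false"
	// After normalization the paths match, so no update would be needed.
	fmt.Println("cleaned equal:", resourcePoolsEqual(a, b)) // prints "cleaned equal: true"
}
```

Comparing normalized paths keeps purely cosmetic differences in how a path was written from being treated as a configuration change that requires replacing a control plane machine.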


            Shang Gao added a comment (edited) -

            Here is the CPMS log: https://privatebin.corp.redhat.com/?0fe4ef2af2f17636#CQbMBZWzCKmhTp3Ae7of1EvBS7C1vKbznJrJja8aLdAG

            I'm using static IPs defined as follows:

                hosts:
                - role: bootstrap
                  networkDevice:
                    ipAddrs:
                    - 192.168.221.120/24
                    gateway: 192.168.221.1
                    nameservers:
                    - 192.168.221.1
                - role: control-plane
                #  failureDomain: us-east-1a
                  networkDevice:
                    ipAddrs:
                    - 192.168.221.121/24
                    gateway: 192.168.221.1
                    nameservers:
                    - 192.168.221.1
                - role: control-plane
                #  failureDomain: us-east-1b
                  networkDevice:
                    ipAddrs:
                    - 192.168.221.122/24
                    gateway: 192.168.221.1
                    nameservers:
                    - 192.168.221.1
                - role: control-plane
                #  failureDomain: us-east-1c
                  networkDevice:
                    ipAddrs:
                    - 192.168.221.123/24
                    gateway: 192.168.221.1
                    nameservers:
                    - 192.168.221.1
                - role: compute
                  networkDevice:
                    ipAddrs:
                    - 192.168.221.124/24
                    gateway: 192.168.221.1
                    nameservers:
                    - 192.168.221.1
                - role: compute
                  networkDevice:
                    ipAddrs:
                    - 192.168.221.125/24
                    gateway: 192.168.221.1
                    nameservers:
                    - 192.168.221.1
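As a quick sanity check of a static IP layout like the one above, the following Go sketch verifies that every address parses as a CIDR, that no address is reused, that the gateway falls inside each host's subnet, and that there are exactly three control-plane hosts (assuming the usual three-node control plane). It is hypothetical and not part of the installer's own validation; the `host` type is an assumption rather than the installer's schema, and nameservers are omitted.

```go
package main

import (
	"fmt"
	"net/netip"
)

// host mirrors one entry of the hosts list above.
// The field names are assumptions for this sketch, not installer types.
type host struct {
	role    string
	ipAddrs []string
	gateway string
}

// checkHosts returns a list of problems found in a static IP layout.
func checkHosts(hosts []host) []string {
	var problems []string
	seen := map[netip.Addr]bool{}
	controlPlane := 0
	for i, h := range hosts {
		if h.role == "control-plane" {
			controlPlane++
		}
		gw, err := netip.ParseAddr(h.gateway)
		if err != nil {
			problems = append(problems, fmt.Sprintf("host %d: bad gateway %q", i, h.gateway))
			continue
		}
		for _, cidr := range h.ipAddrs {
			p, err := netip.ParsePrefix(cidr)
			if err != nil {
				problems = append(problems, fmt.Sprintf("host %d: bad address %q", i, cidr))
				continue
			}
			// Each address must be unique across all hosts.
			if seen[p.Addr()] {
				problems = append(problems, fmt.Sprintf("host %d: duplicate address %s", i, p.Addr()))
			}
			seen[p.Addr()] = true
			// The gateway must be reachable on the host's own subnet.
			if !p.Masked().Contains(gw) {
				problems = append(problems, fmt.Sprintf("host %d: gateway %s outside %s", i, gw, p.Masked()))
			}
		}
	}
	// Assumption: a standard three-node control plane.
	if controlPlane != 3 {
		problems = append(problems, fmt.Sprintf("expected 3 control-plane hosts, found %d", controlPlane))
	}
	return problems
}

func main() {
	const gw = "192.168.221.1"
	hosts := []host{
		{"bootstrap", []string{"192.168.221.120/24"}, gw},
		{"control-plane", []string{"192.168.221.121/24"}, gw},
		{"control-plane", []string{"192.168.221.122/24"}, gw},
		{"control-plane", []string{"192.168.221.123/24"}, gw},
		{"compute", []string{"192.168.221.124/24"}, gw},
		{"compute", []string{"192.168.221.125/24"}, gw},
	}
	fmt.Println("problems:", len(checkHosts(hosts))) // prints "problems: 0"
}
```

The layout in this comment passes all four checks, which is consistent with the problem lying in how the CPMS is reconciled rather than in the static IP values themselves.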
            


            Shang Gao added a comment -

            rhn-support-ngirard Hello, this bug still exists on OCP 4.16.0-0.nightly-2024-02-19-050601: the master nodes are reconciled and replaced by nodes with DHCP-assigned IPs. Could you help take a look? Thanks.

            [root@preserve-sgao-vsphere8 static-ip]# oc get nodes
            NAME                              STATUS                     ROLES                  AGE   VERSION
            sgao-devqe-vblw8-master-1         Ready,SchedulingDisabled   control-plane,master   46m   v1.29.1+edc2c12
            sgao-devqe-vblw8-master-2         Ready                      control-plane,master   46m   v1.29.1+edc2c12
            sgao-devqe-vblw8-master-5wqzx-0   Ready                      control-plane,master   35m   v1.29.1+edc2c12
            sgao-devqe-vblw8-master-m4mnt-1   Ready                      control-plane,master   15m   v1.29.1+edc2c12
            sgao-devqe-vblw8-worker-0         Ready                      worker                 35m   v1.29.1+edc2c12
            sgao-devqe-vblw8-worker-1         Ready                      worker                 35m   v1.29.1+edc2c12
            [root@preserve-sgao-vsphere8 static-ip]# oc get machine -n openshift-machine-api
            NAME                              PHASE      TYPE   REGION   ZONE   AGE
            sgao-devqe-vblw8-master-1         Deleting                          48m
            sgao-devqe-vblw8-master-2         Running                           48m
            sgao-devqe-vblw8-master-5wqzx-0   Running                           42m
            sgao-devqe-vblw8-master-m4mnt-1   Running                           18m
            sgao-devqe-vblw8-worker-0         Running                           47m
            sgao-devqe-vblw8-worker-1         Running                           48m
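In the output above, the installer-created masters (`sgao-devqe-vblw8-master-1`, `sgao-devqe-vblw8-master-2`) sit alongside machines whose names carry an extra five-character segment (`-5wqzx-`, `-m4mnt-`), which appear to be the CPMS replacements. The following Go sketch flags such names in `oc get machine` text output. The naming pattern is inferred from this output only, not from CPMS source code, and `replacedMachines` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// replacementName matches control plane machine names with an extra
// five-character segment before the index, e.g. "...-master-5wqzx-0".
// Assumption: pattern inferred from the output above.
var replacementName = regexp.MustCompile(`-master-[a-z0-9]{5}-\d+$`)

// replacedMachines scans `oc get machine` text output and returns the names
// of control plane machines that look like CPMS replacements.
func replacedMachines(output string) []string {
	var names []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines and the header row.
		if len(fields) == 0 || fields[0] == "NAME" {
			continue
		}
		if replacementName.MatchString(fields[0]) {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	// Excerpt of the `oc get machine -n openshift-machine-api` output above.
	out := `NAME                              PHASE      TYPE   REGION   ZONE   AGE
sgao-devqe-vblw8-master-1         Deleting                          48m
sgao-devqe-vblw8-master-2         Running                           48m
sgao-devqe-vblw8-master-5wqzx-0   Running                           42m
sgao-devqe-vblw8-master-m4mnt-1   Running                           18m
sgao-devqe-vblw8-worker-0         Running                           47m
`
	fmt.Println(replacedMachines(out))
	// prints "[sgao-devqe-vblw8-master-5wqzx-0 sgao-devqe-vblw8-master-m4mnt-1]"
}
```

For anything beyond a quick check during step 3 of the reproduction, parsing `oc get machine -o json` would be more robust than splitting the table columns.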
            


            OpenShift Jira Bot added a comment -

            Hi rhn-support-ngirard,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the bug to Verified.

            Neil Girard added a comment (edited) -

            Changes have been made and PRs are open. Waiting on review and merging of the PRs for 4.16; the fix will then be backported to 4.15.

            Vikas Laad added a comment (edited) -

            periodic-ci-openshift-release-master-nightly-X.X-e2e-vsphere-static-ovn is still failing.

            OpenShift Jira Bot added a comment -

            It looks like this bug is far enough along in the workflow that a code fix is ready. Customers and support need to know the backport plan. Please complete the "Target Backport Versions" field to indicate which version(s) will receive the fix.

              rhn-support-ngirard Neil Girard
              rhn-support-ngirard Neil Girard
              Shang Gao Shang Gao
              Votes:
              1
              Watchers:
              10
