OpenShift Bugs / OCPBUGS-27335

Installation fails with 1 master and 2 workers as the console deployment set the number of replicas based on the InfrastructureTopology rather than the ControlPlaneTopology

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • Fix Version: 4.16.0
    • Affects Version: 4.12.z
    • Component: Management Console
    • Severity: Moderate
    • Release Note Text: N/A
    • Release Note Type: Release Note Not Required

      Description of problem:

      The node selector for the console deployment requires it to be scheduled on the master nodes, while its replica count is derived from the infrastructureTopology, which primarily tracks the worker setup.

      When an OpenShift cluster is installed with a single master node and multiple workers, the console deployment therefore requests 2 replicas, because infrastructureTopology is set to HighlyAvailable even though controlPlaneTopology is set to SingleReplica as expected.
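
      A minimal sketch of the intended selection logic (illustrative only; the package layout and helper name are hypothetical, not the actual console-operator code):

        package main

        import (
            configv1 "github.com/openshift/api/config/v1"
        )

        // desiredReplicas follows controlPlaneTopology, because the console
        // deployment's node selector pins its pods to the control-plane nodes.
        func desiredReplicas(infra *configv1.Infrastructure) int32 {
            if infra.Status.ControlPlaneTopology == configv1.SingleReplicaTopologyMode {
                return 1
            }
            return 2
        }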
      
      

      Version-Release number of selected component (if applicable):

      4.16

      How reproducible:

      Always    

      Steps to Reproduce:

          1. Install an OpenShift cluster with 1 master and 2 workers (see the illustrative install-config fragment below)
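
      An illustrative install-config.yaml fragment for this topology (baseDomain, cluster name, and pull secret are placeholders, not values from this report):

        apiVersion: v1
        baseDomain: example.com
        metadata:
          name: test-cluster
        controlPlane:
          name: master
          replicas: 1    # single master -> controlPlaneTopology: SingleReplica
        compute:
        - name: worker
          replicas: 2    # two workers -> infrastructureTopology: HighlyAvailable
        platform:
          aws:
            region: us-east-2
        pullSecret: '...'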
      

      Actual results:

      The installation fails because the replica count for the console deployment is set to 2:
      
        apiVersion: config.openshift.io/v1
        kind: Infrastructure
        metadata:
          creationTimestamp: "2024-01-18T08:34:47Z"
          generation: 1
          name: cluster
          resourceVersion: "517"
          uid: d89e60b4-2d9c-4867-a2f8-6e80207dc6b8
        spec:
          cloudConfig:
            key: config
            name: cloud-provider-config
          platformSpec:
            aws: {}
            type: AWS
        status:
          apiServerInternalURI: https://api-int.adstefa-a12.qe.devcluster.openshift.com:6443
          apiServerURL: https://api.adstefa-a12.qe.devcluster.openshift.com:6443
          controlPlaneTopology: SingleReplica
          cpuPartitioning: None
          etcdDiscoveryDomain: ""
          infrastructureName: adstefa-a12-6wlvm
          infrastructureTopology: HighlyAvailable
          platform: AWS
          platformStatus:
            aws:
              region: us-east-2
            type: AWS
      
      
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        annotations:
         .... 
        creationTimestamp: "2024-01-18T08:54:23Z"
        generation: 3
        labels:
          app: console
          component: ui
        name: console
        namespace: openshift-console
      spec:
        progressDeadlineSeconds: 600
        replicas: 2
      
      
      

      Expected results:

      The replica count is set to 1, tracking the controlPlaneTopology value instead of the infrastructureTopology.
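
      One way to verify the expected behavior (illustrative commands, in the same style as those used in the comments below):

        oc get infrastructure cluster -ojsonpath='{.status.controlPlaneTopology}'
        # SingleReplica
        oc get deployment console -n openshift-console -ojsonpath='{.spec.replicas}'
        # 1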

      Additional info:

          

      Comments:

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Critical: OpenShift Container Platform 4.16.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:0041

            Jakub Hadvig added a comment -

            Setting the 'Affects Version' to 4.12.z since this issue affects previous versions as well.

            https://issues.redhat.com/browse/OCPBUGS-31502

            Yanping Zhang added a comment -

            Tested with payload 4.16.0-0.ci-2024-01-31-151542.
            1. Launch a normal cluster with 1 master node and 2 worker nodes. Check the console operator, deployment, and pods: the replica count is set to 1 and the console pod count is 1.
            2. Launch a HyperShift cluster and configure the hosted cluster with one worker node. Check the console operator, deployment, and pods: the replica count is set to 1 and the console pod count is 1.
            The bug is fixed.

            Alessandro Di Stefano added a comment - edited

            Hi yanpzhan1, can you send the result of the following in the case of HyperShift (in the guest/hosted cluster)?

            oc get infrastructure -o yaml
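
            For reference, a HyperShift hosted cluster typically reports its control plane as external; an illustrative status fragment (not captured from this cluster) would look like:

              status:
                controlPlaneTopology: External
                infrastructureTopology: SingleReplica   # with a single worker node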

            Yanping Zhang added a comment -

            Using payload 4.16.0-0.nightly-2024-01-21-154905 to launch a HyperShift cluster, with the hosted cluster configured with only one worker node. Checking on the hosted cluster, the console operator is abnormal, and the console deployment has replicas "2", but only one pod is in Running status.

            # oc get node
            NAME                          STATUS   ROLES    AGE   VERSION
            ip-10-0-142-37.ec2.internal   Ready    worker   61m   v1.29.0+f629574
            # oc get co console
            NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
            console   4.16.0-0.nightly-2024-01-21-154905   True        True          False      60m     SyncLoopRefreshProgressing: Working toward version 4.16.0-0.nightly-2024-01-21-154905, 1 replicas available
            # oc get node
            NAME                          STATUS   ROLES    AGE   VERSION
            ip-10-0-142-37.ec2.internal   Ready    worker   63m   v1.29.0+f629574
            # oc -n openshift-console get pod
            NAME                         READY   STATUS    RESTARTS   AGE
            console-544dc8c7d-klhc9      0/1     Pending   0          60m
            console-54bdd888cf-4wsc9     1/1     Running   0          61m
            console-5cdb687998-qg8p9     0/1     Pending   0          61m
            downloads-5bcd554dbf-nxvmx   1/1     Running   0          61m
            downloads-5bcd554dbf-wv4c4   0/1     Pending   0          61m
            # oc get deployment console -n openshift-console -ojsonpath='{.spec.replicas}'
            2
            # oc get deployment downloads -n openshift-console -ojsonpath='{.spec.replicas}'
            2
            # oc -n openshift-console get pod
            NAME                         READY   STATUS    RESTARTS   AGE
            console-544dc8c7d-klhc9      0/1     Pending   0          74m
            console-54bdd888cf-4wsc9     1/1     Running   0          75m
            console-5cdb687998-qg8p9     0/1     Pending   0          75m
            downloads-5bcd554dbf-nxvmx   1/1     Running   0          75m
            downloads-5bcd554dbf-wv4c4   0/1     Pending   0          75m
            [root@MiWiFi-RB03-srv ~]# oc get deploy -n openshift-console
            NAME        READY   UP-TO-DATE   AVAILABLE   AGE
            console     1/2     1            1           76m
            downloads   1/2     2            1           76m
            

            The fix in the PR is not suitable for hosted clusters, since console pods are deployed on worker nodes in this kind of cluster.
            rhn-support-adistefa could you help to consider the fix for the case when the cluster is a hosted cluster?
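
            A sketch of the refinement being asked for here (illustrative only, not the merged fix; the helper name is hypothetical and it reuses the configv1 import from the sketch in the description):

              // On HyperShift hosted clusters controlPlaneTopology is External and
              // console pods run on worker nodes, so fall back to infrastructureTopology.
              func desiredReplicasHosted(infra *configv1.Infrastructure) int32 {
                  topology := infra.Status.ControlPlaneTopology
                  if topology == configv1.ExternalTopologyMode {
                      topology = infra.Status.InfrastructureTopology
                  }
                  if topology == configv1.SingleReplicaTopologyMode {
                      return 1
                  }
                  return 2
              }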


            Yanping Zhang added a comment - edited

            Using payload 4.16.0-0.nightly-2024-01-21-154905 to launch a normal cluster with 1 master node and 2 worker nodes, the cluster is launched successfully. Checking the console operator, deployment, and pods: the replica count is set to 1 and the pod count is 1. They work as expected:
            $ oc get co console
            NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
            console   4.16.0-0.nightly-2024-01-21-154905   True        False         False      27m
            $ oc get deployment console -n openshift-console -ojsonpath='{.spec.replicas}'
            1
            $ oc get pod -n openshift-console
            NAME                        READY   STATUS    RESTARTS   AGE
            console-5ffbb8644-lpnt5     1/1     Running   0          25m
            downloads-7597c9f7c-62sxs   1/1     Running   0          33m

            Alessandro Di Stefano added a comment -

            Hi jhadvig@redhat.com, are you ok with backporting the fix for a few releases?

            OpenShift Jira Bot added a comment -

            Looks like this bug is far enough along in the workflow that a code fix is ready. Customers and support need to know the backport plan. Please complete the "Target Backport Versions" field to indicate which version(s) will receive the fix.

              Assignee: Alessandro Di Stefano (rhn-support-adistefa)
              Reporter: Alessandro Di Stefano (rhn-support-adistefa)
              QA Contact: Yanping Zhang
              Votes: 0
              Watchers: 8