OpenShift Bugs / OCPBUGS-10342

Installation fails if < 3 workers defined and number of compute replicas not set


Details

    • No
    • Agent Sprint 233, Sprint 235, Sprint 236, Sprint 238
    • 4
    • False
    • N/A
    • Release Note Not Required

    Description

      Description of problem:

      We may want to either add a validation for this or document it. It was initially found at a customer site, but I've also confirmed that it happens with a compact configuration (no workers).
      
      They created an agent-config.yaml with 2 worker nodes but did not set the number of compute replicas in install-config.yaml, i.e. they did not set:
      compute:
      - hyperthreading: Enabled
        name: worker
        replicas: {{ num_workers }} 
      
      This resulted in an install failure because 3 worker replicas are created by default when the value is not defined:
      https://github.com/openshift/installer/blob/master/pkg/types/defaults/machinepools.go#L11
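To illustrate how the host count ends up wrong, here is a minimal sketch of the defaulting arithmetic. It is not the installer's code: the function names are hypothetical, install-config.yaml is represented as an already-parsed dict, and the default of 3 replicas per pool is assumed from the machinepools.go link above.

```python
# Hypothetical sketch of replica defaulting; not the installer's actual code.
DEFAULT_REPLICAS = 3  # assumed default when replicas is not set


def replicas_or_default(pool):
    """Return the pool's replicas, or the assumed default if unset."""
    if pool is None or pool.get("replicas") is None:
        return DEFAULT_REPLICAS
    return pool["replicas"]


def expected_host_count(install_config):
    """Total hosts implied by an install-config (parsed into a dict)."""
    control_plane = replicas_or_default(install_config.get("controlPlane"))
    compute_pools = install_config.get("compute")
    if not compute_pools:
        # No compute section at all: one worker pool with default replicas.
        compute = DEFAULT_REPLICAS
    else:
        compute = sum(replicas_or_default(p) for p in compute_pools)
    return control_plane + compute


# Compact config with the compute section omitted.
compact = {"controlPlane": {"name": "master", "replicas": 3}}
print(expected_host_count(compact))  # 6

# Same config with compute replicas set explicitly to 0.
explicit = {
    "controlPlane": {"name": "master", "replicas": 3},
    "compute": [{"name": "worker", "replicas": 0}],
}
print(expected_host_count(explicit))  # 3
```

Under these assumptions, a compact config with no compute section implies 6 hosts, which matches the (3/6) count in the service output in this report.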
      
      See the attached console screenshot showing that the expected number of hosts doesn't match the actual number.
      
      I've also reproduced this with a compact config. The install fails because start-cluster-installation.sh is waiting for 6 hosts while only 3 are known and ready:
      
      [core@master-0 ~]$ sudo systemctl status start-cluster-installation.service
      ● start-cluster-installation.service - Service that starts cluster installation
         Loaded: loaded (/etc/systemd/system/start-cluster-installation.service; enabled; vendor preset: enabled)
         Active: activating (start) since Wed 2023-03-15 14:40:04 UTC; 3min 41s ago
       Main PID: 3365 (start-cluster-i)
          Tasks: 5 (limit: 101736)
         Memory: 1.7M
         CGroup: /system.slice/start-cluster-installation.service
                 ├─3365 /bin/bash /usr/local/bin/start-cluster-installation.sh
                 ├─5124 /bin/bash /usr/local/bin/start-cluster-installation.sh
                 ├─5132 /bin/bash /usr/local/bin/start-cluster-installation.sh
                 └─5138 diff /tmp/tmp.vIq1jH9Vf2 /etc/issue.d/90_start-install.issue
      
      Mar 15 14:42:54 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
      Mar 15 14:43:04 master-0 start-cluster-installation.sh[4746]: Hosts known and ready for cluster installation (3/6)
      Mar 15 14:43:04 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
      Mar 15 14:43:15 master-0 start-cluster-installation.sh[4980]: Hosts known and ready for cluster installation (3/6)
      Mar 15 14:43:15 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
      Mar 15 14:43:25 master-0 start-cluster-installation.sh[5026]: Hosts known and ready for cluster installation (3/6)
      Mar 15 14:43:25 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
      Mar 15 14:43:35 master-0 start-cluster-installation.sh[5079]: Hosts known and ready for cluster installation (3/6)
      Mar 15 14:43:35 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
      Mar 15 14:43:45 master-0 start-cluster-installation.sh[5124]: Hosts known and ready for cluster installation (3/6)
      
      Since the compute section in install-config.yaml is optional, we can't assume that it will be present:
      https://github.com/openshift/installer/blob/master/pkg/types/installconfig.go#L126
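If we go the validation route, the check could look roughly like the sketch below. This is only an illustration under assumptions: `total_replicas` and `validate_host_count` are hypothetical names, both files are represented as parsed dicts, agent-config.yaml is assumed to carry a `hosts` list, and the default of 3 replicas is taken from the defaults linked earlier.

```python
# Hypothetical validation sketch; not an existing installer function.
def total_replicas(install_config, default=3):
    """Sum control-plane and compute replicas, applying the assumed default."""
    def count(pool):
        if pool is None or pool.get("replicas") is None:
            return default
        return pool["replicas"]

    # A missing compute section is treated as one pool with default replicas.
    compute = install_config.get("compute") or [None]
    return count(install_config.get("controlPlane")) + sum(count(p) for p in compute)


def validate_host_count(install_config, agent_config):
    """Return a list of error strings; an empty list means the counts agree."""
    expected = total_replicas(install_config)
    actual = len(agent_config.get("hosts", []))
    if actual != expected:
        return [
            f"install-config.yaml implies {expected} hosts but "
            f"agent-config.yaml defines {actual}; set compute replicas explicitly"
        ]
    return []


# Scenario like the customer case, assuming 3 masters: 2 workers, replicas unset.
install_config = {"controlPlane": {"name": "master", "replicas": 3}}
agent_config = {"hosts": [{"role": "master"}] * 3 + [{"role": "worker"}] * 2}
for error in validate_host_count(install_config, agent_config):
    print(error)
```

Comparing only total counts keeps the sketch short; a real check would probably compare per role as well.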

      Version-Release number of selected component (if applicable):

      4.12

      How reproducible:

       

      Steps to Reproduce:

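      1. Remove the compute section from install-config.yaml
      2. Run an agent-based install with a compact config (no workers)
      3. Observe that the install fails: start-cluster-installation.sh reports "Hosts known and ready for cluster installation (3/6)"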
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

            People

              bfournie@redhat.com Robert Fournier
              Biagio Manzari Biagio Manzari