OCPBUGS-2151

machine-api-operator degraded during 3+1 deployment because the minimum worker replica count is 2

    Description

      Description of problem:

      Agent-based installation fails during a 3+1 deployment (three control plane nodes plus one worker). The machine-api-operator becomes degraded because it enforces a minimum worker replica count of 2, whereas a 3+1 deployment defines only one worker node.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create agent.iso (openshift-install agent create image) using install-config.yaml and agent-config.yaml (sample files attached).
      2. Deploy a 3+1 cluster using agent.iso.
      3. Run "openshift-install agent wait-for install-complete" and wait for the installation to complete.

      Actual results:

      The wait-for command reports the following errors:
      ERROR Cluster operator kube-controller-manager Degraded is True with GarbageCollector_Error: GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host 
      INFO Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.12.0-0.nightly-2022-10-05-053337 
      ERROR Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.12.0-0.nightly-2022-10-05-053337 because minimum worker replica count (2) not yet met: current running replicas 1, waiting for [] 
      INFO Cluster operator machine-api Available is False with Initializing: Operator is initializing 
      INFO Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error. 
      ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: Failed to rollout the stack. Error: updating prometheus operator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 1 unavailable replicas 
      INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack. 
      INFO Cluster operator network ManagementStateDegraded is False with :  
      ERROR Cluster initialization failed because one or more operators are not functioning properly. 
      ERROR 				The cluster should be accessible for troubleshooting as detailed in the documentation linked below, 
      ERROR 				https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html 
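
      To pick out the failing operators from output like this, the Degraded lines can be filtered programmatically. The following is a minimal sketch that assumes the line format shown above; findDegraded and degradedOperator are hypothetical names for illustration and are not part of openshift-install.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// degradedRe matches lines of the form seen in the installer output, e.g.
// "ERROR Cluster operator machine-api Degraded is True with SyncingFailed: ...".
// The line format is an assumption based on that output.
var degradedRe = regexp.MustCompile(`Cluster operator (\S+) Degraded is True with ([^:\s]+): (.*)`)

// degradedOperator is a hypothetical helper type holding one parsed line.
type degradedOperator struct {
	Name    string
	Reason  string
	Message string
}

// findDegraded scans the output line by line and returns every operator
// that reports Degraded=True, in the order the lines appear.
func findDegraded(output string) []degradedOperator {
	var found []degradedOperator
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		if m := degradedRe.FindStringSubmatch(sc.Text()); m != nil {
			found = append(found, degradedOperator{
				Name:    m[1],
				Reason:  m[2],
				Message: strings.TrimSpace(m[3]),
			})
		}
	}
	return found
}

func main() {
	// Two lines copied from the output above.
	output := `ERROR Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.12.0-0.nightly-2022-10-05-053337 because minimum worker replica count (2) not yet met: current running replicas 1, waiting for []
INFO Cluster operator network ManagementStateDegraded is False with :  `
	for _, op := range findDegraded(output) {
		fmt.Printf("%s (%s): %s\n", op.Name, op.Reason, op.Message)
	}
}
```

      Applied to the full output above, this sketch would list kube-controller-manager, machine-api, and monitoring; the machine-api entry is the one that carries the minimum worker replica message.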

      Expected results:

      The 3+1 deployment should succeed.

      Additional info:

      The machine-api-operator contains a check that requires at least 2 worker replicas, which prevents the 3+1 deployment from completing:
      https://github.com/openshift/machine-api-operator/blob/master/pkg/operator/sync.go#L322 
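
      The effect of such a check can be illustrated with a minimal Go sketch. This is not the operator's actual code: the function checkMinimumWorkers, its parameters, and the constant minimumWorkerReplicas are hypothetical, and only the error wording is taken from the output in this report. It shows why a cluster with a single worker cannot satisfy a fixed minimum of 2.

```go
package main

import "fmt"

// minimumWorkerReplicas mirrors the threshold of 2 reported in the error.
// The constant name is hypothetical, not taken from the operator source.
const minimumWorkerReplicas = 2

// checkMinimumWorkers is a hypothetical stand-in for the operator's check.
// It returns an error while fewer than `minimum` worker replicas are running.
func checkMinimumWorkers(running, minimum int, waitingFor []string) error {
	if running >= minimum {
		return nil
	}
	return fmt.Errorf(
		"minimum worker replica count (%d) not yet met: current running replicas %d, waiting for %v",
		minimum, running, waitingFor,
	)
}

func main() {
	// A 3+1 deployment has exactly one worker, so the check keeps failing.
	if err := checkMinimumWorkers(1, minimumWorkerReplicas, []string{}); err != nil {
		fmt.Println("Degraded:", err)
	}
	// With two workers the same check passes.
	fmt.Println("two workers ok:", checkMinimumWorkers(2, minimumWorkerReplicas, nil) == nil)
}
```

      Because the one worker is already running and nothing else is pending, the waiting list stays empty, which matches the "waiting for []" seen in the reported error.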

            People

              Zane Bitter (zabitter)
              Manoj Hans (rhn-support-mhans)