
Instance shouldn't be moved back from f to a


    • Low
    • No
    • False
      When you include more than three failure domains in the ControlPlaneMachineSet definition, the load balancing algorithm does not prioritize existing control plane machines. If you add a fourth failure domain that is alphabetically higher in precedence than the existing three failure domains to the definition, the fourth failure domain takes precedence over any existing failure domains. This behavior can apply rolling forward updates to a control plane machine. You can prevent this issue by setting existing in-use failure domains to a higher precedence than the new and unused failure domains. This action stabilizes each control plane machine during the course of adding more than three failure domains to the definition. (OCPBUGS-11968)
    • Known Issue
    • Done

      This is a clone of issue OCPBUGS-7921. The following is the description of the original issue:

      Description of problem:

      Tested on GCP with four failureDomains (a, b, c, f) in the CPMS. After failure domain a is removed, a new master is created in f. If a is then re-added to the CPMS, the instance is moved back from f to a.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always

      Steps to Reproduce:

      Before updating the CPMS:
            failureDomains:
              gcp:
              - zone: us-central1-a
              - zone: us-central1-b
              - zone: us-central1-c
              - zone: us-central1-f
      $ oc get machine                  
      NAME                              PHASE     TYPE            REGION        ZONE            AGE
      zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h4m
      zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   90m
      zhsungcp22-4glmq-master-plch8-1   Running   n2-standard-4   us-central1   us-central1-a   11m
      zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h
      zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h
      zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h
      
      1. Delete failureDomain "zone: us-central1-a" from the CPMS. A new machine is created and runs in zone f.
            failureDomains:
              gcp:
              - zone: us-central1-b
              - zone: us-central1-c
              - zone: us-central1-f 
      $ oc get machine              
      NAME                              PHASE     TYPE            REGION        ZONE            AGE
      zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h19m
      zhsungcp22-4glmq-master-b7pdl-1   Running   n2-standard-4   us-central1   us-central1-f   13m
      zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   106m
      zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h16m
      zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h16m
      zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h16m
      2. Add failureDomain "zone: us-central1-a" again. A new machine is created in zone a, and the machine in zone f is deleted.
            failureDomains:
              gcp:
              - zone: us-central1-a
              - zone: us-central1-f
              - zone: us-central1-c
              - zone: us-central1-b
      $ oc get machine                          
      NAME                              PHASE     TYPE            REGION        ZONE            AGE
      zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h35m
      zhsungcp22-4glmq-master-5kltp-1   Running   n2-standard-4   us-central1   us-central1-a   12m
      zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   121m
      zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h32m
      zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h32m
      zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h32m  
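
The placements observed in the steps above can be reproduced with a small toy model. This is only a sketch, assuming the balancer sorts the configured zones alphabetically and fills one zone per control plane machine. It is not the operator's actual code, and the function name `rebalance_alphabetical` is hypothetical. It also assumes there are at least as many zones as machines.

```python
def rebalance_alphabetical(failure_domains, machines):
    """Toy model (assumption, not operator code): pick zones in alphabetical order.

    failure_domains: list of zone names as configured in the CPMS.
    machines: dict mapping machine index -> current zone.
    Returns a dict mapping machine index -> zone after rebalancing.
    """
    # Alphabetical precedence: the first len(machines) zones are the targets.
    free = sorted(failure_domains)[:len(machines)]
    result = {}
    # Machines already sitting in a target zone stay where they are.
    for index, zone in machines.items():
        if zone in free:
            result[index] = zone
            free.remove(zone)
    # Every other machine is replaced into a remaining target zone.
    for index in machines:
        if index not in result:
            result[index] = free.pop(0)
    return result


# State before the update, taken from the first `oc get machine` output.
before = {0: "us-central1-b", 1: "us-central1-a", 2: "us-central1-c"}

# Step 1: zone a is removed from the CPMS.
step1 = rebalance_alphabetical(
    ["us-central1-b", "us-central1-c", "us-central1-f"], before)

# Step 2: zone a is added back (order as written in the report).
step2 = rebalance_alphabetical(
    ["us-central1-a", "us-central1-f", "us-central1-c", "us-central1-b"], step1)

print(step1)
print(step2)
```

Under this model, machine index 1 moves from a to f in step 1 and back from f to a in step 2, which matches the reported behavior.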

      Actual results:

      Instance is moved back from f to a

      Expected results:

      Instance shouldn't be moved back from f to a
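
One way to picture the expected behavior is an ordering that ranks zones already hosting a control plane machine ahead of unused ones. The sketch below is a hypothetical illustration under that assumption. The name `rebalance_prefer_in_use` is invented here, the code does not describe the actual fix, and it assumes there are at least as many zones as machines.

```python
def rebalance_prefer_in_use(failure_domains, machines):
    """Toy model (assumption): in-use zones take precedence over unused zones.

    failure_domains: list of zone names as configured in the CPMS.
    machines: dict mapping machine index -> current zone.
    Returns a dict mapping machine index -> zone after rebalancing.
    """
    in_use = set(machines.values())
    # Sort key: in-use zones first (False sorts before True), then alphabetical.
    ordered = sorted(failure_domains, key=lambda zone: (zone not in in_use, zone))
    free = ordered[:len(machines)]
    result = {}
    # Machines whose zone is still a target are left untouched.
    for index, zone in machines.items():
        if zone in free:
            result[index] = zone
            free.remove(zone)
    # Only machines whose zone is no longer configured are replaced.
    for index in machines:
        if index not in result:
            result[index] = free.pop(0)
    return result


# State after step 1 of the report: machine index 1 runs in zone f.
after_step1 = {0: "us-central1-b", 1: "us-central1-f", 2: "us-central1-c"}

# Zone a is added back to the CPMS.
after_readd = rebalance_prefer_in_use(
    ["us-central1-a", "us-central1-f", "us-central1-c", "us-central1-b"],
    after_step1)

print(after_readd)
```

With this ordering, re-adding zone a leaves all three machines in place, and a machine is replaced only when its own zone is removed from the definition.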

      Additional info:

      https://issues.redhat.com//browse/OCPBUGS-7366

              joelspeed Joel Speed
              openshift-crt-jira-prow OpenShift Prow Bot
              Zhaohua Sun Zhaohua Sun
              Darragh Fitzmaurice Darragh Fitzmaurice