Loading...

XML

Word

Printable

Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13
Component/s: Cloud Compute / Unknown
Labels:
- pre-merge-tested

Severity:
Low
Regression:
No
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Release Note Text:

Hide
When you include more than three failure domains in the ControlPlaneMachineSet definition, the load balancing algorithm does not prioritize existing control plane machines. If you add a fourth failure domain that is alphabetically higher in precedence than the existing three failure domains to the definition, the fourth failure domain takes precedence over any existing failure domains. This behavior can apply rolling forward updates to a control plane machine. You can prevent this issue by setting existing in-use failure domains to a higher precedence than the new and unused failure domains. This action stabilizes each control plane machine during the course of adding more than three failure domains to the definition. (~~OCPBUGS-11968~~)

Show
When you include more than three failure domains in the ControlPlaneMachineSet definition, the load balancing algorithm does not prioritize existing control plane machines. If you add a fourth failure domain that is alphabetically higher in precedence than the existing three failure domains to the definition, the fourth failure domain takes precedence over any existing failure domains. This behavior can apply rolling forward updates to a control plane machine. You can prevent this issue by setting existing in-use failure domains to a higher precedence than the new and unused failure domains. This action stabilizes each control plane machine during the course of adding more than three failure domains to the definition. ( OCPBUGS-11968 )
Release Note Type:
Known Issue
Release Note Status:
Done
Target Version:

4.13.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This is a clone of issue ~~OCPBUGS-7921~~. The following is the description of the original issue:
—
Description of problem:

Tested on gcp, there are 4 failureDomains a, b, c, f in CPMS, remove one a, a new master will be created in f. If readd f to CPMS, instance will be moved back from f to a

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

Before update cpms.
      failureDomains:
        gcp:
        - zone: us-central1-a
        - zone: us-central1-b
        - zone: us-central1-c
        - zone: us-central1-f
$ oc get machine                  
NAME                              PHASE     TYPE            REGION        ZONE            AGE
zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h4m
zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   90m
zhsungcp22-4glmq-master-plch8-1   Running   n2-standard-4   us-central1   us-central1-a   11m
zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h
zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h
zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h

1. Delete failureDomain "zone: us-central1-a" in cpms, new machine Running in zone f.
      failureDomains:
        gcp:
        - zone: us-central1-b
        - zone: us-central1-c
        - zone: us-central1-f 
$ oc get machine              
NAME                              PHASE     TYPE            REGION        ZONE            AGE
zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h19m
zhsungcp22-4glmq-master-b7pdl-1   Running   n2-standard-4   us-central1   us-central1-f   13m
zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   106m
zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h16m
zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h16m
zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h16m
2. Add failureDomain "zone: us-central1-a" again, new machine running in zone a, the machine in zone f will be deleted.
      failureDomains:
        gcp:
        - zone: us-central1-a
        - zone: us-central1-f
        - zone: us-central1-c
        - zone: us-central1-b
$ oc get machine                          
NAME                              PHASE     TYPE            REGION        ZONE            AGE
zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h35m
zhsungcp22-4glmq-master-5kltp-1   Running   n2-standard-4   us-central1   us-central1-a   12m
zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   121m
zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h32m
zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h32m
zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h32m

Actual results:

Instance is moved back from f to a

Expected results:

Instance shouldn't be moved back from f to a

Additional info:

https://issues.redhat.com//browse/OCPBUGS-7366

clones

OCPBUGS-7921 Instance shouldn't be moved back from f to a

Closed

is blocked by

OCPBUGS-7921 Instance shouldn't be moved back from f to a

Closed

is cloned by

OCPBUGS-12440 Instance shouldn't be moved back from f to a

Closed

is depended on by

OCPBUGS-12440 Instance shouldn't be moved back from f to a

Closed

links to

openshift/cluster-control-plane-machine-set-operator#198: [release-4.13] OCPBUGS-11968: Prioritise machine mapping over alphabetical mapping

Assignee:: Joel Speed

Reporter:: OpenShift Prow Bot

QA Contact:: Zhaohua Sun

Doc Contact:: Darragh Fitzmaurice

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Created:: 2023/04/19 7:18 AM

Updated:: 2023/05/17 10:44 PM

Resolved:: 2023/05/17 10:44 PM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates