Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13
Component/s: Cloud Compute / Unknown
Labels:
None

Severity:
Moderate
Regression:
None
Sprint:
CLOUD Sprint 228
sprint_count:
1
Blocked:
False
Blocked Reason:
None
Release Note Text:
N/A
Release Note Type:
Bug Fix
Release Note Status:
Done
Target Version:

4.13.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Deleting or adding a failureDomain in the ControlPlaneMachineSet (CPMS) to trigger an update does not work correctly on GCP.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2022-11-19-182111

How reproducible:

always

Steps to Reproduce:

1. Launch a 4.13 cluster on GCP.
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2022-11-19-182111   True        False         80m     Cluster version is 4.13.0-0.nightly-2022-11-19-182111
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-gcp13c2.qe.gcp.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                 PHASE     TYPE            REGION        ZONE            AGE
huliu-gcp13c2-6sh7k-master-0         Running   n2-standard-4   us-central1   us-central1-a   102m
huliu-gcp13c2-6sh7k-master-1         Running   n2-standard-4   us-central1   us-central1-b   102m
huliu-gcp13c2-6sh7k-master-2         Running   n2-standard-4   us-central1   us-central1-c   102m
huliu-gcp13c2-6sh7k-worker-a-8sftf   Running   n2-standard-4   us-central1   us-central1-a   99m
huliu-gcp13c2-6sh7k-worker-b-zb48r   Running   n2-standard-4   us-central1   us-central1-b   99m
huliu-gcp13c2-6sh7k-worker-c-tlrzl   Running   n2-standard-4   us-central1   us-central1-c   99m
liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
NAME                           DESIRED   CURRENT   READY   AVAILABLE   AGE
huliu-gcp13c2-6sh7k-worker-a   1         1         1       1           102m
huliu-gcp13c2-6sh7k-worker-b   1         1         1       1           102m
huliu-gcp13c2-6sh7k-worker-c   1         1         1       1           102m
huliu-gcp13c2-6sh7k-worker-f   0         0                             102m
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE      AGE
cluster   3         3         3       3                       Inactive   99m

2. Edit the CPMS and change its state to Active.
liuhuali@Lius-MacBook-Pro huali-test % oc edit controlplanemachineset cluster
controlplanemachineset.machine.openshift.io/cluster edited
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE    AGE
cluster   3         3         3       3                       Active   100m 
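
The same change can be made without an interactive editor. The following is a minimal sketch, not part of the original report: it assumes the CPMS state lives at `spec.state` and that `oc` is logged in to the cluster. The helper names `cpms_state_patch` and `activate_cpms` are hypothetical.

```shell
# Print a merge patch that sets the CPMS state.
# Assumption: the state field is spec.state (Active or Inactive).
cpms_state_patch() {
  printf '{"spec":{"state":"%s"}}' "$1"
}

# Apply the patch to the CPMS named "cluster".
# Requires a logged-in oc session; not called automatically.
activate_cpms() {
  oc patch controlplanemachineset cluster \
    -n openshift-machine-api \
    --type merge \
    -p "$(cpms_state_patch Active)"
}
```

Running `activate_cpms` should leave the CPMS in the same state as the `oc edit` shown above.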

3. Edit the CPMS. By default there are four failureDomains (us-central1-a, us-central1-b, us-central1-c, us-central1-f). Delete the first one (us-central1-a). The new machine gets stuck in Provisioning.
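
For a scripted reproduction, the failureDomain edits in steps 3 and 4 can be expressed as JSON patches instead of `oc edit`. This is a sketch under assumptions: the path in `FD_PATH` is assumed from the ControlPlaneMachineSet API and should be checked against `oc get controlplanemachineset cluster -o yaml`, and the helper names are hypothetical.

```shell
# Assumed location of the GCP failure domain list inside the CPMS spec.
FD_PATH="/spec/template/machines_v1beta1_machine_openshift_io/failureDomains/gcp"

# Print a JSON patch that removes the failure domain at the given index.
remove_fd_patch() {
  printf '[{"op":"remove","path":"%s/%s"}]' "$FD_PATH" "$1"
}

# Print a JSON patch that inserts a zone at the given index.
add_fd_patch() {
  printf '[{"op":"add","path":"%s/%s","value":{"zone":"%s"}}]' "$FD_PATH" "$1" "$2"
}

# Apply a JSON patch to the CPMS; requires a logged-in oc session.
patch_cpms() {
  oc patch controlplanemachineset cluster \
    -n openshift-machine-api --type json -p "$1"
}
```

With these helpers, `patch_cpms "$(remove_fd_patch 0)"` would correspond to deleting us-central1-a in step 3, and `patch_cpms "$(add_fd_patch 0 us-central1-a)"` to adding it back in step 4.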

liuhuali@Lius-MacBook-Pro huali-test % oc edit controlplanemachineset cluster
controlplanemachineset.machine.openshift.io/cluster edited
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                 PHASE          TYPE            REGION        ZONE            AGE
huliu-gcp13c2-6sh7k-master-0         Running        n2-standard-4   us-central1   us-central1-a   104m
huliu-gcp13c2-6sh7k-master-1         Running        n2-standard-4   us-central1   us-central1-b   104m
huliu-gcp13c2-6sh7k-master-2         Running        n2-standard-4   us-central1   us-central1-c   104m
huliu-gcp13c2-6sh7k-master-gb5b4-0   Provisioning                                                 3s
huliu-gcp13c2-6sh7k-worker-a-8sftf   Running        n2-standard-4   us-central1   us-central1-a   101m
huliu-gcp13c2-6sh7k-worker-b-zb48r   Running        n2-standard-4   us-central1   us-central1-b   101m
huliu-gcp13c2-6sh7k-worker-c-tlrzl   Running        n2-standard-4   us-central1   us-central1-c   101m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                 PHASE          TYPE            REGION        ZONE            AGE
huliu-gcp13c2-6sh7k-master-0         Running        n2-standard-4   us-central1   us-central1-a   131m
huliu-gcp13c2-6sh7k-master-1         Running        n2-standard-4   us-central1   us-central1-b   131m
huliu-gcp13c2-6sh7k-master-2         Running        n2-standard-4   us-central1   us-central1-c   131m
huliu-gcp13c2-6sh7k-master-gb5b4-0   Provisioning   n2-standard-4   us-central1   us-central1-f   26m
huliu-gcp13c2-6sh7k-worker-a-8sftf   Running        n2-standard-4   us-central1   us-central1-a   127m
huliu-gcp13c2-6sh7k-worker-b-zb48r   Running        n2-standard-4   us-central1   us-central1-b   127m
huliu-gcp13c2-6sh7k-worker-c-tlrzl   Running        n2-standard-4   us-central1   us-central1-c   127m
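
To spot machines that are not Running without reading the table by eye, the listing can be filtered. A minimal sketch: `non_running_machines` is a hypothetical helper, and it assumes the default column layout shown above (NAME first, PHASE second).

```shell
# Print NAME and PHASE for every machine whose phase is not Running.
# Reads `oc get machine` table output on stdin and skips the header row.
non_running_machines() {
  awk 'NR > 1 && $2 != "Running" { print $1, $2 }'
}
```

Typical usage would be `oc get machine -n openshift-machine-api | non_running_machines`, which for the listing above would report only the `master-gb5b4-0` machine.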

machine-controller log:
E1121 05:10:15.654929       1 actuator.go:53] huliu-gcp13c2-6sh7k-master-gb5b4-0 error: huliu-gcp13c2-6sh7k-master-gb5b4-0: reconciler failed to Update machine: failed to register instance to instance group: failed to fetch running instances in instance group huliu-gcp13c2-6sh7k-master-us-central1-f: instanceGroupsListInstances request failed: googleapi: Error 404: The resource 'projects/openshift-qe/zones/us-central1-f/instanceGroups/huliu-gcp13c2-6sh7k-master-us-central1-f' was not found, notFound
E1121 05:10:15.655015       1 controller.go:315] huliu-gcp13c2-6sh7k-master-gb5b4-0: error updating machine: huliu-gcp13c2-6sh7k-master-gb5b4-0: reconciler failed to Update machine: failed to register instance to instance group: failed to fetch running instances in instance group huliu-gcp13c2-6sh7k-master-us-central1-f: instanceGroupsListInstances request failed: googleapi: Error 404: The resource 'projects/openshift-qe/zones/us-central1-f/instanceGroups/huliu-gcp13c2-6sh7k-master-us-central1-f' was not found, notFound, retrying in 30s seconds
I1121 05:10:15.655829       1 recorder.go:103] events "msg"="huliu-gcp13c2-6sh7k-master-gb5b4-0: reconciler failed to Update machine: failed to register instance to instance group: failed to fetch running instances in instance group huliu-gcp13c2-6sh7k-master-us-central1-f: instanceGroupsListInstances request failed: googleapi: Error 404: The resource 'projects/openshift-qe/zones/us-central1-f/instanceGroups/huliu-gcp13c2-6sh7k-master-us-central1-f' was not found, notFound" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"huliu-gcp13c2-6sh7k-master-gb5b4-0","uid":"008cbb45-2b29-493e-8985-37f87fe6a98d","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"60780"} "reason"="FailedUpdate" "type"="Warning" 
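
The 404 in these log lines names the instance group the controller expected to find. The sketch below pulls the project, zone, and group name out of such log lines so the group can be checked directly in GCP. The helper names are hypothetical, and `check_instance_group` assumes that `gcloud` is installed and authenticated and that the control plane instance groups are unmanaged.

```shell
# Read machine-controller log lines on stdin and print one
# "project zone instance-group" triple per missing instance group.
missing_instance_groups() {
  sed -n "s#.*The resource 'projects/\([^/]*\)/zones/\([^/]*\)/instanceGroups/\([^']*\)' was not found.*#\1 \2 \3#p" | sort -u
}

# Ask GCP whether the instance group exists (arguments: project zone name).
# Assumption: the group is an unmanaged instance group.
check_instance_group() {
  gcloud compute instance-groups unmanaged describe "$3" \
    --zone "$2" --project "$1"
}
```

For example, piping `oc logs deployment/machine-api-controllers -c machine-controller -n openshift-machine-api` into `missing_instance_groups` should, for the log above, yield the single group `huliu-gcp13c2-6sh7k-master-us-central1-f` in zone us-central1-f.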

4. Edit the CPMS and add the failureDomain (us-central1-a) back. The new machine gets stuck in Deleting.

liuhuali@Lius-MacBook-Pro huali-test % oc edit controlplanemachineset cluster
controlplanemachineset.machine.openshift.io/cluster edited
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                 PHASE      TYPE            REGION        ZONE            AGE
huliu-gcp13c2-6sh7k-master-0         Running    n2-standard-4   us-central1   us-central1-a   3h37m
huliu-gcp13c2-6sh7k-master-1         Running    n2-standard-4   us-central1   us-central1-b   3h37m
huliu-gcp13c2-6sh7k-master-2         Running    n2-standard-4   us-central1   us-central1-c   3h37m
huliu-gcp13c2-6sh7k-master-gb5b4-0   Deleting   n2-standard-4   us-central1   us-central1-f   113m
huliu-gcp13c2-6sh7k-worker-a-8sftf   Running    n2-standard-4   us-central1   us-central1-a   3h34m
huliu-gcp13c2-6sh7k-worker-b-zb48r   Running    n2-standard-4   us-central1   us-central1-b   3h34m
huliu-gcp13c2-6sh7k-worker-c-tlrzl   Running    n2-standard-4   us-central1   us-central1-c   3h34m

Actual results:

When a failureDomain is deleted, the new machine gets stuck in Provisioning. When the failureDomain is added back, the new machine gets stuck in Deleting.

Expected results:

When a failureDomain is deleted, the new machine should reach Running. When the failureDomain is added back, the new machine should be deleted successfully.
Alternatively, if the machine cannot be created in the failureDomain, the new machine should go to Failed when the failureDomain is deleted, and it should be deleted successfully when the failureDomain is added back.

Additional info:

Must-gather: 
https://drive.google.com/file/d/1AxnVwToQ15g6M4Mc5S7rh62FygM44B6f/view?usp=sharing

A worker machine was created successfully in this failureDomain (us-central1-f):
huliu-gcp13c2-6sh7k-worker-f-g5h77   Running    n2-standard-4   us-central1   us-central1-f   8m36s

links to

openshift/machine-api-provider-gcp#22: OCPBUGS-3904: Register unknown instance groups

Assignee: Daniel Odvarka (Inactive)

Reporter: Huali Liu

QA Contact: Huali Liu

Doc Contact: Jeana Routh

Votes: 0

Watchers: 4

Created: 2022/11/21 7:59 AM

Updated: 2023/05/17 10:41 PM

Resolved: 2023/05/17 10:41 PM
