OCPBUGS-7366 (OpenShift Bugs)

[gcp] New machine stuck in Provisioning when one zone is deleted from the CPMS on GCP with a customer VPC


    • None
    • CLOUD Sprint 232
    • 1
    • False
    • N/A
    • Bug Fix
    • Done

      Description of problem:

      A new machine gets stuck in Provisioning when one zone is deleted from the CPMS on GCP; the machine controller reports "The resource 'projects/openshift-qe/global/networks/zhsun-gcp-wn984-network' was not found".

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-02-12-024338

      How reproducible:

      Always

      Steps to Reproduce:

      1. Set up a GCP private cluster. By default, the CPMS contains four failureDomains (a, b, c, f), and the 3 masters are in a, b, c:
            failureDomains:
              gcp:
              - zone: us-central1-a
              - zone: us-central1-b
              - zone: us-central1-c
              - zone: us-central1-f
      $ oc get machine       
      NAME                             PHASE          TYPE            REGION        ZONE            AGE
      zhsun-gcp-wn984-master-0         Running        n2-standard-4   us-central1   us-central1-a   33m
      zhsun-gcp-wn984-master-1         Running        n2-standard-4   us-central1   us-central1-b   33m
      zhsun-gcp-wn984-master-2         Running        n2-standard-4   us-central1   us-central1-c   33m
      zhsun-gcp-wn984-worker-a-hlcmd   Running        n2-standard-4   us-central1   us-central1-a   27m
      zhsun-gcp-wn984-worker-b-4249t   Running        n2-standard-4   us-central1   us-central1-b   27m
      zhsun-gcp-wn984-worker-c-8qcjq   Running        n2-standard-4   us-central1   us-central1-c   27m
      2. Delete failureDomain a; the failureDomains now look like this:
            failureDomains:
              gcp:
              - zone: us-central1-b
              - zone: us-central1-c
              - zone: us-central1-f
      3. Check the machines.
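The expected outcome of steps 1 to 3 can be modelled as a small sketch: the master in the removed zone (master-0 in us-central1-a) has to be replaced, and the replacement lands in the failure domain that currently holds the fewest masters, which is us-central1-f here, matching the zone of the new machine in the results below. This is a simplified, hypothetical model written for illustration; the function names are invented and it is not the control plane machine set operator's actual algorithm.

```python
from collections import Counter


def machines_to_replace(machine_zones, failure_domains):
    """Return names of control plane machines whose zone is no longer a failure domain.

    Hypothetical helper: machine_zones maps machine name -> zone.
    """
    allowed = set(failure_domains)
    return [name for name, zone in machine_zones.items() if zone not in allowed]


def pick_replacement_zone(machine_zones, failure_domains, leaving):
    """Pick the failure domain holding the fewest remaining machines.

    Machines listed in `leaving` are being replaced, so they are not counted.
    Ties are broken by the order of failure_domains (min() keeps the first minimum).
    """
    counts = Counter(zone for name, zone in machine_zones.items() if name not in leaving)
    return min(failure_domains, key=lambda zone: counts[zone])
```

With the masters from step 1 and the failureDomains from step 2, `machines_to_replace` returns only `zhsun-gcp-wn984-master-0`, and `pick_replacement_zone` returns `us-central1-f`, the only failure domain with no master.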
      

      Actual results:

      The new master is stuck in the Provisioning phase.
      $ oc get machine            
      NAME                             PHASE          TYPE            REGION        ZONE            AGE
      zhsun-gcp-wn984-master-0         Running        n2-standard-4   us-central1   us-central1-a   85m
      zhsun-gcp-wn984-master-1         Running        n2-standard-4   us-central1   us-central1-b   85m
      zhsun-gcp-wn984-master-2         Running        n2-standard-4   us-central1   us-central1-c   85m
      zhsun-gcp-wn984-master-mb7rw-0   Provisioning   n2-standard-4   us-central1   us-central1-f   52m
      zhsun-gcp-wn984-worker-a-hlcmd   Running        n2-standard-4   us-central1   us-central1-a   79m
      zhsun-gcp-wn984-worker-b-4249t   Running        n2-standard-4   us-central1   us-central1-b   79m
      zhsun-gcp-wn984-worker-c-8qcjq   Running        n2-standard-4   us-central1   us-central1-c   79m
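To spot machines like master-mb7rw-0 without reading the whole table, the `oc get machine` output above can be filtered for rows whose PHASE is not Running. This is a minimal sketch that assumes the default column layout shown here (NAME PHASE TYPE REGION ZONE AGE) with every column populated; the function name is hypothetical.

```python
def non_running_machines(oc_output, expected_phase="Running"):
    """Return (NAME, PHASE, ZONE) tuples for rows whose PHASE differs from expected_phase.

    Assumes the default `oc get machine` columns shown in this report. Rows with a
    different number of fields (for example an empty PHASE column) are skipped,
    because whitespace splitting would shift the remaining columns.
    """
    lines = [line for line in oc_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    stuck = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # cannot map fields to columns reliably; skip the row
        if fields[col["PHASE"]] != expected_phase:
            stuck.append((fields[col["NAME"]], fields[col["PHASE"]], fields[col["ZONE"]]))
    return stuck
```

Applied to the listing above, it returns a single entry for `zhsun-gcp-wn984-master-mb7rw-0` in phase Provisioning in `us-central1-f`.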
       $ oc logs -f machine-api-controllers-6678fc6587-hdl5k -c machine-controller
      E0213 09:08:00.059876       1 actuator.go:54] zhsun-gcp-wn984-master-mb7rw-0 error: zhsun-gcp-wn984-master-mb7rw-0: reconciler failed to Update machine: failed to register instance to instance group: failed to ensure that instance group zhsun-gcp-wn984-master-us-central1-f is a proper instance group: failed to register the new instance group named zhsun-gcp-wn984-master-us-central1-f: instanceGroupInsert request failed: googleapi: Error 404: The resource 'projects/openshift-qe/global/networks/zhsun-gcp-wn984-network' was not found, notFound
      E0213 09:08:00.059929       1 controller.go:315] zhsun-gcp-wn984-master-mb7rw-0: error updating machine: zhsun-gcp-wn984-master-mb7rw-0: reconciler failed to Update machine: failed to register instance to instance group: failed to ensure that instance group zhsun-gcp-wn984-master-us-central1-f is a proper instance group: failed to register the new instance group named zhsun-gcp-wn984-master-us-central1-f: instanceGroupInsert request failed: googleapi: Error 404: The resource 'projects/openshift-qe/global/networks/zhsun-gcp-wn984-network' was not found, notFound, retrying in 30s seconds
      I0213 09:08:00.060001       1 recorder.go:103] events "msg"="zhsun-gcp-wn984-master-mb7rw-0: reconciler failed to Update machine: failed to register instance to instance group: failed to ensure that instance group zhsun-gcp-wn984-master-us-central1-f is a proper instance group: failed to register the new instance group named zhsun-gcp-wn984-master-us-central1-f: instanceGroupInsert request failed: googleapi: Error 404: The resource 'projects/openshift-qe/global/networks/zhsun-gcp-wn984-network' was not found, notFound" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun-gcp-wn984-master-mb7rw-0","uid":"b973d674-dd26-477d-a68d-6bcedc5f1011","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"36164"} "reason"="FailedUpdate" "type"="Warning"
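The 404 in the log refers to `projects/openshift-qe/global/networks/zhsun-gcp-wn984-network`, which has the shape `<infraID>-network` in the cluster's project. One reading, inferred from the log only and not a confirmed root cause, is that the instanceGroupInsert request derives the network name from the infrastructure ID, while a customer VPC has a user-chosen name and possibly a different project. The sketch below contrasts the two ways of building the network path; the parameter names `configured_network` and `network_project` are hypothetical, and the code does not call any GCP API.

```python
def default_network_path(project, infra_id):
    """Network path derived from the infrastructure ID, matching the shape in the 404 above."""
    return f"projects/{project}/global/networks/{infra_id}-network"


def network_path_for_instance_group(project, infra_id, configured_network=None, network_project=None):
    """Prefer a network configured for the machine (hypothetical parameters).

    Falls back to the infraID-derived default only when no network is configured,
    which is the case where that default name actually exists.
    """
    if configured_network:
        return f"projects/{network_project or project}/global/networks/{configured_network}"
    return default_network_path(project, infra_id)
```

For this cluster, `default_network_path("openshift-qe", "zhsun-gcp-wn984")` reproduces the path from the error message, which would not exist when the cluster was installed into a customer VPC.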

      Expected results:

      The new master should reach the Running phase.

      Additional info:

       

            dodvarka@redhat.com Daniel Odvarka (Inactive)
            rhn-support-zhsun Zhaohua Sun
            Zhaohua Sun
            Jeana Routh
            Votes: 0
            Watchers: 6
