  1. OpenShift Bugs
  2. OCPBUGS-37791

[GCP CAPI] Ingress controller complains the worker is in the wrong subnetwork (wrongSubnetwork) in disconnected + private install


    • Important
    • None
    • Proposed
    • False

      Description of problem:

      A disconnected + private GCP CAPI cluster installation (example failed job: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.17-multi-nightly-gcp-ipi-disc-priv-capi-amd-mixarch-f28-destructive/1818856018684153856) failed with the following error:
      
      level=info msg=Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
      level=error msg=Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: Resource 'projects/XXXXXXXXXXXX/zones/us-central1-a/instances/ci-op-2033xdkq-530e8-qmljx-worker-a-qfcz9' is expected to be in the subnetwork 'projects/XXXXXXXXXXXX/regions/us-central1/subnetworks/ci-op-2033xdkq-530e8-master-subnet' but is in the subnetwork 'projects/XXXXXXXXXXXX/regions/us-central1/subnetworks/ci-op-2033xdkq-530e8-worker-subnet'., wrongSubnetwork
      level=error msg=The cloud-controller-manager logs may contain more details.)
      
      
      From the cloud-controller-manager pod log:
      https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.17-multi-nightly-gcp-ipi-disc-priv-capi-amd-mixarch-f28-destructive/1818856018684153856/artifacts/gcp-ipi-disc-priv-capi-amd-mixarch-f28-destructive/gather-extra/artifacts/pods/openshift-cloud-controller-manager_gcp-cloud-controller-manager-9fc7bffdc-f5w5w_cloud-controller-manager.log
      
      I0801 05:55:37.949118       1 gce_loadbalancer_internal.go:612] ensureInternalInstanceGroup(k8s-ig--544be3f7b5733dc5, us-central1-a): adding nodes: [ci-op-2033xdkq-530e8-qmljx-worker-a-qfcz9]
      E0801 05:55:38.165808       1 gce_loadbalancer.go:206] Failed to EnsureLoadBalancer(ci-op-2033xdkq-530e8-qmljx, openshift-ingress, router-default, aab35500360fd4a6ab2c840364bb35d8, us-central1), err: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-a/instances/ci-op-2033xdkq-530e8-qmljx-worker-a-qfcz9' is expected to be in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/ci-op-2033xdkq-530e8-master-subnet' but is in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/ci-op-2033xdkq-530e8-worker-subnet'., wrongSubnetwork
      E0801 05:55:38.165878       1 controller.go:298] error processing service openshift-ingress/router-default (retrying with exponential backoff): failed to ensure load balancer: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-a/instances/ci-op-2033xdkq-530e8-qmljx-worker-a-qfcz9' is expected to be in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/ci-op-2033xdkq-530e8-master-subnet' but is in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/ci-op-2033xdkq-530e8-worker-subnet'., wrongSubnetwork
      I0801 05:55:38.165986       1 event.go:389] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-a/instances/ci-op-2033xdkq-530e8-qmljx-worker-a-qfcz9' is expected to be in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/ci-op-2033xdkq-530e8-master-subnet' but is in the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/ci-op-2033xdkq-530e8-worker-subnet'., wrongSubnetwork"
      
      The pod log reports the same error: the worker instance is in the worker subnet, but the load balancer expects it to be in the master subnet.
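
      When comparing several failed jobs, the three resource names in the error can be pulled out mechanically. The following is a minimal Python sketch, assuming the message keeps the wording quoted in the log above. The function name `parse_wrong_subnetwork` and the regular expression are illustrative only and are not part of any OpenShift or GCP tooling.

```python
import re

# Matches the GCP "wrongSubnetwork" message as quoted in the logs above.
# Assumption: the wording of the message is stable across occurrences.
WRONG_SUBNET_RE = re.compile(
    r"Resource '(?P<instance>[^']+)' is expected to be in the subnetwork "
    r"'(?P<expected>[^']+)' but is in the subnetwork '(?P<actual>[^']+)'"
)


def parse_wrong_subnetwork(line):
    """Return the short instance/expected/actual names, or None if no match."""
    match = WRONG_SUBNET_RE.search(line)
    if match is None:
        return None
    # Keep only the last path segment of each fully qualified resource name.
    return {key: value.rsplit("/", 1)[-1] for key, value in match.groupdict().items()}


# Error text copied from the cloud-controller-manager log in this report.
sample = (
    "googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-a/"
    "instances/ci-op-2033xdkq-530e8-qmljx-worker-a-qfcz9' is expected to be in "
    "the subnetwork 'projects/openshift-qe/regions/us-central1/subnetworks/"
    "ci-op-2033xdkq-530e8-master-subnet' but is in the subnetwork "
    "'projects/openshift-qe/regions/us-central1/subnetworks/"
    "ci-op-2033xdkq-530e8-worker-subnet'., wrongSubnetwork"
)

print(parse_wrong_subnetwork(sample))
```

      For the log line in this report, the parsed result names the worker instance, with the master subnet as the expected subnetwork and the worker subnet as the actual one.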
      
      If the Prow job link above is not accessible, see https://drive.google.com/drive/folders/1ftukRDUR6hBYPvwpwBJRYNZDxoUIVZWR?usp=drive_link for the must-gather logs collected from another job that failed the same way: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/300573/
        

      Version-Release number of selected component (if applicable):

       4.17.0-0.nightly-multi-2024-07-31-212714

      How reproducible:

         Always with a CAPI install. The same cluster configuration installs successfully with Terraform.

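      Steps to Reproduce:

          1. Run a disconnected + private GCP cluster installation using CAPI.
          2. Check the ingress cluster operator status and the cloud-controller-manager pod log.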
          

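      Actual results:

          The installation fails. The ingress cluster operator is Degraded with SyncLoadBalancerFailed (wrongSubnetwork).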

      Expected results:

          The installation succeeds, as it does with Terraform and the same cluster configuration.

      Additional info:

          

            joelspeed Joel Speed
            rh-ee-gpei Gaoyun Pei
            Zhaohua Sun Zhaohua Sun