Red Hat Advanced Cluster Management / ACM-3875

[ACM 2.6.5] Submariner: Azure platform spawns incorrect number of gateway nodes

Details

    Description

      Description of problem:

      ACM 2.7 / Submariner 0.14.1

      During Submariner deployment on the Azure platform, an incorrect number of gateway nodes was created.

      The SubmarinerConfig requests 1 gateway node (spec.gatewayConfig.gateways: 1):

      apiVersion: submarineraddon.open-cluster-management.io/v1alpha1
      kind: SubmarinerConfig
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"submarineraddon.open-cluster-management.io/v1alpha1","kind":"SubmarinerConfig","metadata":{"annotations":{},"name":"submariner","namespace":"mbabushk-az"},"spec":{"IPSecNATTPort":4505,"cableDriver":"libreswan","credentialsSecret":{"name":"mbabushk-az-azure-creds"},"gatewayConfig":{"aws":{"instanceType":"c5d.large"},"gateways":1},"loadBalancerEnable":false,"subscriptionConfig":{"channel":"stable-0.14","source":"submariner-catalog","sourceNamespace":"openshift-marketplace","startingCSV":"submariner.v0.14.1"}}}
        creationTimestamp: "2022-12-20T13:57:54Z"
        finalizers:
        - submarineraddon.open-cluster-management.io/config-cleanup
        generation: 2
        name: submariner
        namespace: mbabushk-az
        resourceVersion: "535251"
        uid: 2da38a81-25e6-447f-90e3-673b2524f449
      spec:
        IPSecIKEPort: 500
        IPSecNATTPort: 4505
        NATTDiscoveryPort: 4900
        NATTEnable: true
        airGappedDeployment: false
        cableDriver: libreswan
        credentialsSecret:
          name: mbabushk-az-azure-creds
        gatewayConfig:
          aws:
            instanceType: c5d.large
          azure:
            instanceType: Standard_D4s_v3
          gateways: 1
          gcp:
            instanceType: n1-standard-4
          rhos:
            instanceType: PnTAE.CPU_16_Memory_32768_Disk_80
        imagePullSpecs: {}
        insecureBrokerConnection: false
        loadBalancerEnable: false
        subscriptionConfig:
          channel: stable-0.14
          source: submariner-catalog
          sourceNamespace: openshift-marketplace
          startingCSV: submariner.v0.14.1
      status:
        conditions:
        - lastTransitionTime: "2022-12-20T13:58:09Z"
          message: SubmarinerConfig was applied
          reason: SubmarinerConfigApplied
          status: "True"
          type: SubmarinerConfigApplied
        - lastTransitionTime: "2022-12-20T13:58:31Z"
          message: Submariner cluster environment was prepared
          reason: SubmarinerClusterEnvPrepared
          status: "True"
          type: SubmarinerClusterEnvironmentPrepared
        - lastTransitionTime: "2022-12-20T14:01:35Z"
          message: 1 node(s) ("mbabushk-az-pm564-subgw-centralus-3-z5fng") are labeled as
            gateways
          reason: Success
          status: "True"
          type: SubmarinerGatewaysLabeled
        managedClusterInfo:
          clusterName: mbabushk-az
          infraId: mbabushk-az-pm564
          networkType: OpenShiftSDN
          platform: Azure
          region: centralus
          vendor: OpenShift
          vendorVersion: 4.10.45

       

      But 3 gateway nodes have been created instead:

      $  KUBECONFIG=logs/mbabushk-az-kubeconfig.yaml oc get nodes -l submariner.io/gateway
      NAME                                        STATUS   ROLES    AGE     VERSION
      mbabushk-az-pm564-subgw-centralus-1-jk9x7   Ready    worker   5h28m   v1.23.12+8a6bfe4
      mbabushk-az-pm564-subgw-centralus-2-b9jmv   Ready    worker   5h28m   v1.23.12+8a6bfe4
      mbabushk-az-pm564-subgw-centralus-3-z5fng   Ready    worker   5h28m   v1.23.12+8a6bfe4 
      $  KUBECONFIG=logs/mbabushk-az-kubeconfig.yaml oc -n submariner-operator get pods
      NAME                                             READY   STATUS    RESTARTS   AGE
      submariner-addon-d6c94587-rzv4q                  1/1     Running   0          5h32m
      submariner-gateway-28sdq                         1/1     Running   0          5h28m
      submariner-gateway-jt76c                         1/1     Running   0          5h28m
      submariner-gateway-qpd9x                         1/1     Running   0          5h11m
      submariner-globalnet-2857n                       1/1     Running   0          5h11m
      submariner-globalnet-mr557                       1/1     Running   0          5h28m
      submariner-globalnet-rp6qn                       1/1     Running   0          5h28m
      submariner-lighthouse-agent-846b4f4f9d-6bvd4     1/1     Running   0          5h31m
      submariner-lighthouse-coredns-7c446bb7f4-fzq7n   1/1     Running   0          5h31m
      submariner-lighthouse-coredns-7c446bb7f4-zr8lw   1/1     Running   0          5h31m
      submariner-metrics-proxy-55j78                   2/2     Running   0          5h28m
      submariner-metrics-proxy-ckzhs                   2/2     Running   0          5h29m
      submariner-metrics-proxy-q9lnj                   2/2     Running   0          5h28m
      submariner-operator-6ccdd4f58-jjc4z              1/1     Running   0          5h32m
      submariner-routeagent-4gmjp                      1/1     Running   0          5h28m
      submariner-routeagent-5scph                      1/1     Running   0          5h28m
      submariner-routeagent-9jrtd                      1/1     Running   0          5h10m
      submariner-routeagent-bwj6f                      1/1     Running   0          5h31m
      submariner-routeagent-cxkr5                      1/1     Running   0          5h31m
      submariner-routeagent-fdt8d                      1/1     Running   0          5h31m
      submariner-routeagent-n9dkr                      1/1     Running   0          5h31m
      submariner-routeagent-x8vg4                      1/1     Running   0          5h31m
      submariner-routeagent-xtrjc                      1/1     Running   0          5h31m 
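
The mismatch between the requested and the actual gateway count can be checked mechanically. The following is a minimal Python sketch, not part of ACM or Submariner tooling: it parses the `oc get nodes -l submariner.io/gateway` table quoted above and compares the row count with the `gateways: 1` value from the SubmarinerConfig. The helper names `gateway_node_names` and `surplus_gateways` are hypothetical and exist only for this illustration.

```python
# Output of `oc get nodes -l submariner.io/gateway`, copied from the report.
NODES_OUTPUT = """\
NAME                                        STATUS   ROLES    AGE     VERSION
mbabushk-az-pm564-subgw-centralus-1-jk9x7   Ready    worker   5h28m   v1.23.12+8a6bfe4
mbabushk-az-pm564-subgw-centralus-2-b9jmv   Ready    worker   5h28m   v1.23.12+8a6bfe4
mbabushk-az-pm564-subgw-centralus-3-z5fng   Ready    worker   5h28m   v1.23.12+8a6bfe4
"""

# spec.gatewayConfig.gateways from the SubmarinerConfig in the report.
REQUESTED_GATEWAYS = 1


def gateway_node_names(table):
    """Return the NAME column of an `oc get nodes` table, skipping the header row."""
    rows = [line.split() for line in table.splitlines() if line.strip()]
    return [cols[0] for cols in rows[1:]]


def surplus_gateways(table, requested):
    """Return how many labeled gateway nodes exist beyond the requested count."""
    return max(0, len(gateway_node_names(table)) - requested)


names = gateway_node_names(NODES_OUTPUT)
surplus = surplus_gateways(NODES_OUTPUT, REQUESTED_GATEWAYS)
print(f"requested={REQUESTED_GATEWAYS} actual={len(names)} surplus={surplus}")
```

For the output quoted in this report, the sketch counts three labeled nodes against one requested, a surplus of two.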

      Two issues were found in the logs.
      First, the machineset-controller log shows the replica check firing for three separate MachineSets (subgw-centralus-1, -2 and -3), each creating one machine:

      I1220 13:58:27.967672       1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-3, need 1, creating 1
      I1220 13:58:27.967761       1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
      
      I1220 13:58:53.215748       1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-2, need 1, creating 1
      I1220 13:58:53.215770       1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
      
      I1220 13:59:10.273560       1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-1, need 1, creating 1
      I1220 13:59:10.273584       1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) ) 

      Full log attached.
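
The machineset-controller lines quoted above can be tallied per MachineSet to see where the extra machines come from. This is a minimal Python sketch under stated assumptions: the regular expression is written against the three "Too few replicas" lines in this report only (the first one shown unwrapped), and the helper name `machines_created_per_machineset` is hypothetical. It is not an official log parser for the Machine API.

```python
import re
from collections import Counter

# The three "Too few replicas" lines from the machineset-controller log above.
MACHINESET_LOG = """\
I1220 13:58:27.967672       1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-3, need 1, creating 1
I1220 13:58:53.215748       1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-2, need 1, creating 1
I1220 13:59:10.273560       1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-1, need 1, creating 1
"""

# Assumed pattern: "<namespace>/<machineset>, need N, creating M".
REPLICA_RE = re.compile(
    r"Kind=MachineSet (?P<ns>[^/\s]+)/(?P<name>[^,\s]+), "
    r"need (?P<need>\d+), creating (?P<creating>\d+)"
)


def machines_created_per_machineset(log):
    """Sum the 'creating N' counts for each MachineSet named in the log."""
    created = Counter()
    for match in REPLICA_RE.finditer(log):
        created[match.group("name")] += int(match.group("creating"))
    return created


created = machines_created_per_machineset(MACHINESET_LOG)
for name in sorted(created):
    print(f"{name}: {created[name]} machine(s)")
print(f"total machines created: {sum(created.values())}")
```

For the quoted excerpt this yields three MachineSets with one machine each, which matches the three gateway nodes observed rather than the single gateway requested.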

      Second, the machine-controller log reports a creation error for each gateway node, even though the nodes were actually created:

      I1220 13:58:34.960285       1 logr.go:252] events "msg"="Warning"  "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
      

      Full log attached.
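
To make the machine-controller events easier to compare with the node list, the event line can be reduced to a machine name and a reason. The following Python sketch is illustrative only: the regular expression, the `PENDING_MARKER` constant and the helper name `summarize_events` are assumptions based on the single event line quoted in this report, not on a documented log format. The "pending" flag only records that the message text says the asynchronous operation had not completed; it does not prove why the event was emitted.

```python
import json
import re

# The machine-controller event line from the report, with hard wraps removed.
EVENT_LINE = r'''I1220 13:58:34.960285       1 logr.go:252] events "msg"="Warning"  "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"'''

# Assumed layout: a flat JSON object after "object"=, followed by "reason"="...".
EVENT_RE = re.compile(r'"object"=(\{.*?\})\s+"reason"="([^"]+)"')
PENDING_MARKER = "asynchronous operation has not completed"


def summarize_events(log):
    """Return a set of (machine name, reason, pending) tuples, one per distinct event."""
    summary = set()
    for line in log.splitlines():
        match = EVENT_RE.search(line)
        if not match:
            continue
        obj = json.loads(match.group(1))
        summary.add((obj["name"], match.group(2), PENDING_MARKER in line))
    return summary


for machine, reason, pending in sorted(summarize_events(EVENT_LINE)):
    print(f"{machine}: reason={reason} pending={pending}")
```

Because the result is a set, identical repeated events collapse into a single entry, so the summary lists each affected machine once.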

       

      Subctl gather logs attached.

       

            People

              asuryana Aswin Suryanarayanan
              mbabushk@redhat.com Maxim Babushkin
              Maxim Babushkin Maxim Babushkin
              ACM QE Team