- Bug
- Resolution: Done
- Major
- ACM 2.7.0
Description of problem:
ACM 2.7 / Submariner 0.14.1
During Submariner deployment on the Azure platform, an incorrect number of gateway nodes was created.
The SubmarinerConfig was set to create 1 gateway node:
apiVersion: submarineraddon.open-cluster-management.io/v1alpha1
kind: SubmarinerConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"submarineraddon.open-cluster-management.io/v1alpha1","kind":"SubmarinerConfig","metadata":{"annotations":{},"name":"submariner","namespace":"mbabushk-az"},"spec":{"IPSecNATTPort":4505,"cableDriver":"libreswan","credentialsSecret":{"name":"mbabushk-az-azure-creds"},"gatewayConfig":{"aws":{"instanceType":"c5d.large"},"gateways":1},"loadBalancerEnable":false,"subscriptionConfig":{"channel":"stable-0.14","source":"submariner-catalog","sourceNamespace":"openshift-marketplace","startingCSV":"submariner.v0.14.1"}}}
  creationTimestamp: "2022-12-20T13:57:54Z"
  finalizers:
  - submarineraddon.open-cluster-management.io/config-cleanup
  generation: 2
  name: submariner
  namespace: mbabushk-az
  resourceVersion: "535251"
  uid: 2da38a81-25e6-447f-90e3-673b2524f449
spec:
  IPSecIKEPort: 500
  IPSecNATTPort: 4505
  NATTDiscoveryPort: 4900
  NATTEnable: true
  airGappedDeployment: false
  cableDriver: libreswan
  credentialsSecret:
    name: mbabushk-az-azure-creds
  gatewayConfig:
    aws:
      instanceType: c5d.large
    azure:
      instanceType: Standard_D4s_v3
    gateways: 1
    gcp:
      instanceType: n1-standard-4
    rhos:
      instanceType: PnTAE.CPU_16_Memory_32768_Disk_80
  imagePullSpecs: {}
  insecureBrokerConnection: false
  loadBalancerEnable: false
  subscriptionConfig:
    channel: stable-0.14
    source: submariner-catalog
    sourceNamespace: openshift-marketplace
    startingCSV: submariner.v0.14.1
status:
  conditions:
  - lastTransitionTime: "2022-12-20T13:58:09Z"
    message: SubmarinerConfig was applied
    reason: SubmarinerConfigApplied
    status: "True"
    type: SubmarinerConfigApplied
  - lastTransitionTime: "2022-12-20T13:58:31Z"
    message: Submariner cluster environment was prepared
    reason: SubmarinerClusterEnvPrepared
    status: "True"
    type: SubmarinerClusterEnvironmentPrepared
  - lastTransitionTime: "2022-12-20T14:01:35Z"
    message: 1 node(s) ("mbabushk-az-pm564-subgw-centralus-3-z5fng") are labeled as gateways
    reason: Success
    status: "True"
    type: SubmarinerGatewaysLabeled
  managedClusterInfo:
    clusterName: mbabushk-az
    infraId: mbabushk-az-pm564
    networkType: OpenShiftSDN
    platform: Azure
    region: centralus
    vendor: OpenShift
    vendorVersion: 4.10.45
But 3 gateway nodes have been created instead:
$ KUBECONFIG=logs/mbabushk-az-kubeconfig.yaml oc get nodes -l submariner.io/gateway
NAME                                        STATUS   ROLES    AGE     VERSION
mbabushk-az-pm564-subgw-centralus-1-jk9x7   Ready    worker   5h28m   v1.23.12+8a6bfe4
mbabushk-az-pm564-subgw-centralus-2-b9jmv   Ready    worker   5h28m   v1.23.12+8a6bfe4
mbabushk-az-pm564-subgw-centralus-3-z5fng   Ready    worker   5h28m   v1.23.12+8a6bfe4
$ KUBECONFIG=logs/mbabushk-az-kubeconfig.yaml oc -n submariner-operator get pods
NAME                                             READY   STATUS    RESTARTS   AGE
submariner-addon-d6c94587-rzv4q                  1/1     Running   0          5h32m
submariner-gateway-28sdq                         1/1     Running   0          5h28m
submariner-gateway-jt76c                         1/1     Running   0          5h28m
submariner-gateway-qpd9x                         1/1     Running   0          5h11m
submariner-globalnet-2857n                       1/1     Running   0          5h11m
submariner-globalnet-mr557                       1/1     Running   0          5h28m
submariner-globalnet-rp6qn                       1/1     Running   0          5h28m
submariner-lighthouse-agent-846b4f4f9d-6bvd4     1/1     Running   0          5h31m
submariner-lighthouse-coredns-7c446bb7f4-fzq7n   1/1     Running   0          5h31m
submariner-lighthouse-coredns-7c446bb7f4-zr8lw   1/1     Running   0          5h31m
submariner-metrics-proxy-55j78                   2/2     Running   0          5h28m
submariner-metrics-proxy-ckzhs                   2/2     Running   0          5h29m
submariner-metrics-proxy-q9lnj                   2/2     Running   0          5h28m
submariner-operator-6ccdd4f58-jjc4z              1/1     Running   0          5h32m
submariner-routeagent-4gmjp                      1/1     Running   0          5h28m
submariner-routeagent-5scph                      1/1     Running   0          5h28m
submariner-routeagent-9jrtd                      1/1     Running   0          5h10m
submariner-routeagent-bwj6f                      1/1     Running   0          5h31m
submariner-routeagent-cxkr5                      1/1     Running   0          5h31m
submariner-routeagent-fdt8d                      1/1     Running   0          5h31m
submariner-routeagent-n9dkr                      1/1     Running   0          5h31m
submariner-routeagent-x8vg4                      1/1     Running   0          5h31m
submariner-routeagent-xtrjc                      1/1     Running   0          5h31m
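The mismatch between the configured and the actual gateway count can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the SubmarinerConfig and the labeled node list have been exported as JSON (for example with `oc get ... -o json`) and parsed into dictionaries. The function names and the sample data are hypothetical; the sample mirrors the values shown above, and the label value "true" is an assumption, since the report only selects on the label key.

```python
def desired_gateways(submariner_config):
    """Return spec.gatewayConfig.gateways from a parsed SubmarinerConfig."""
    return submariner_config["spec"]["gatewayConfig"]["gateways"]


def labeled_gateway_nodes(node_list):
    """Return sorted names of nodes that carry the submariner.io/gateway label key."""
    return sorted(
        item["metadata"]["name"]
        for item in node_list.get("items", [])
        if "submariner.io/gateway" in item["metadata"].get("labels", {})
    )


def check_gateway_count(submariner_config, node_list):
    """Return (desired count, actual node names, whether the counts match)."""
    desired = desired_gateways(submariner_config)
    actual = labeled_gateway_nodes(node_list)
    return desired, actual, desired == len(actual)


# Hypothetical sample data mirroring the values in this report.
SAMPLE_CONFIG = {"spec": {"gatewayConfig": {"gateways": 1}}}
SAMPLE_NODES = {
    "items": [
        {"metadata": {"name": name, "labels": {"submariner.io/gateway": "true"}}}
        for name in (
            "mbabushk-az-pm564-subgw-centralus-1-jk9x7",
            "mbabushk-az-pm564-subgw-centralus-2-b9jmv",
            "mbabushk-az-pm564-subgw-centralus-3-z5fng",
        )
    ]
}

desired, actual, ok = check_gateway_count(SAMPLE_CONFIG, SAMPLE_NODES)
print("desired:", desired, "actual:", len(actual), "match:", ok)
```

With the values from this report the check reports a mismatch, since one gateway is requested and three nodes carry the label.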
Two issues were found in the logs:
In the cluster's machineset-controller log, the machine count check repeats, once for each of three gateway MachineSets:
I1220 13:58:27.967672 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-3, need 1, creating 1
I1220 13:58:27.967761 1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
I1220 13:58:53.215748 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-2, need 1, creating 1
I1220 13:58:53.215770 1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
I1220 13:59:10.273560 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-1, need 1, creating 1
I1220 13:59:10.273584 1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
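To total how many machines the machineset-controller requested across MachineSets, the "Too few replicas" lines can be parsed. This is a rough sketch under the assumption that the log lines keep the format shown in the excerpt above; the regular expression and the function name are illustrative, not taken from any OpenShift tooling.

```python
import re

# Matches lines such as:
#   ... Too few replicas for <group/version>, Kind=MachineSet <ns>/<name>, need 1, creating 1
_TOO_FEW = re.compile(
    r"Too few replicas for \S+ Kind=MachineSet ([^\s,]+), need (\d+), creating (\d+)"
)


def machines_requested(log_text):
    """Map each MachineSet (namespace/name) to the number of machines it asked to create."""
    requested = {}
    for line in log_text.splitlines():
        match = _TOO_FEW.search(line)
        if match:
            name = match.group(1)
            requested[name] = requested.get(name, 0) + int(match.group(3))
    return requested


# Lines copied from the machineset-controller excerpt in this report.
SAMPLE_MACHINESET_LOG = """\
I1220 13:58:27.967672 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-3, need 1, creating 1
I1220 13:58:27.967761 1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
I1220 13:58:53.215748 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-2, need 1, creating 1
I1220 13:59:10.273560 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-1, need 1, creating 1
"""

requested_by_set = machines_requested(SAMPLE_MACHINESET_LOG)
for machineset, count in sorted(requested_by_set.items()):
    print(machineset, count)
print("total machines requested:", sum(requested_by_set.values()))
```

On the excerpt, each of the three MachineSets requests one machine, which is consistent with the three gateway nodes observed.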
Full log attached.
In the machine-controller log, a creation error is reported for each gateway node, even though the node was actually created:
I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
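One way to confirm that a FailedCreate event refers to a machine that nevertheless exists is to cross-check the machine names in the events against the node names returned by `oc get nodes`. The sketch below assumes the event line format shown in the excerpt above; the helper names are hypothetical, and the sample lines abbreviate the "message" field for readability.

```python
import re
from collections import Counter

# Extracts the machine name from the "object"={...} part of an event line.
# Whitespace around ':' is tolerated because pasted logs are sometimes re-wrapped.
_MACHINE_NAME = re.compile(r'"kind"\s*:\s*"Machine".*?"name"\s*:\s*"([^"]+)"')


def failed_creates(log_text):
    """Count FailedCreate events per machine name."""
    counts = Counter()
    for line in log_text.splitlines():
        if '"reason"="FailedCreate"' not in line:
            continue
        match = _MACHINE_NAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def failed_but_present(log_text, existing_nodes):
    """Return machines that logged FailedCreate but also appear in the node list."""
    return sorted(set(failed_creates(log_text)) & set(existing_nodes))


# Abbreviated sample lines; the "message" field is shortened for illustration.
SAMPLE_MACHINE_LOG = (
    'I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" '
    '"message"="CreateError: asynchronous operation has not completed" '
    '"object"={"kind":"Machine","namespace":"openshift-machine-api",'
    '"name":"mbabushk-az-pm564-subgw-centralus-3-z5fng"} "reason"="FailedCreate"\n'
) * 2

GATEWAY_NODES = [
    "mbabushk-az-pm564-subgw-centralus-1-jk9x7",
    "mbabushk-az-pm564-subgw-centralus-2-b9jmv",
    "mbabushk-az-pm564-subgw-centralus-3-z5fng",
]

print(failed_but_present(SAMPLE_MACHINE_LOG, GATEWAY_NODES))
```

A non-empty result suggests the error was reported while the asynchronous VM creation was still in progress, rather than indicating a machine that never came up.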
Full log attached.
Subctl gather logs attached.
- is cloned by ACM-3875 [ACM 2.6.5] Submariner: - Azure platform spawns incorrect number of gateway nodes - Closed