-
Bug
-
Resolution: Done
-
Major
-
ACM 2.7.0
Description of problem:
ACM 2.7 / Submariner 0.14.1
During Submariner deployment on the Azure platform, an incorrect number of gateway nodes was created.
The SubmarinerConfig was set to create 1 gateway node:
apiVersion: submarineraddon.open-cluster-management.io/v1alpha1
kind: SubmarinerConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"submarineraddon.open-cluster-management.io/v1alpha1","kind":"SubmarinerConfig","metadata":{"annotations":{},"name":"submariner","namespace":"mbabushk-az"},"spec":{"IPSecNATTPort":4505,"cableDriver":"libreswan","credentialsSecret":{"name":"mbabushk-az-azure-creds"},"gatewayConfig":{"aws":{"instanceType":"c5d.large"},"gateways":1},"loadBalancerEnable":false,"subscriptionConfig":{"channel":"stable-0.14","source":"submariner-catalog","sourceNamespace":"openshift-marketplace","startingCSV":"submariner.v0.14.1"}}}
  creationTimestamp: "2022-12-20T13:57:54Z"
  finalizers:
  - submarineraddon.open-cluster-management.io/config-cleanup
  generation: 2
  name: submariner
  namespace: mbabushk-az
  resourceVersion: "535251"
  uid: 2da38a81-25e6-447f-90e3-673b2524f449
spec:
  IPSecIKEPort: 500
  IPSecNATTPort: 4505
  NATTDiscoveryPort: 4900
  NATTEnable: true
  airGappedDeployment: false
  cableDriver: libreswan
  credentialsSecret:
    name: mbabushk-az-azure-creds
  gatewayConfig:
    aws:
      instanceType: c5d.large
    azure:
      instanceType: Standard_D4s_v3
    gateways: 1
    gcp:
      instanceType: n1-standard-4
    rhos:
      instanceType: PnTAE.CPU_16_Memory_32768_Disk_80
  imagePullSpecs: {}
  insecureBrokerConnection: false
  loadBalancerEnable: false
  subscriptionConfig:
    channel: stable-0.14
    source: submariner-catalog
    sourceNamespace: openshift-marketplace
    startingCSV: submariner.v0.14.1
status:
  conditions:
  - lastTransitionTime: "2022-12-20T13:58:09Z"
    message: SubmarinerConfig was applied
    reason: SubmarinerConfigApplied
    status: "True"
    type: SubmarinerConfigApplied
  - lastTransitionTime: "2022-12-20T13:58:31Z"
    message: Submariner cluster environment was prepared
    reason: SubmarinerClusterEnvPrepared
    status: "True"
    type: SubmarinerClusterEnvironmentPrepared
  - lastTransitionTime: "2022-12-20T14:01:35Z"
    message: 1 node(s) ("mbabushk-az-pm564-subgw-centralus-3-z5fng") are labeled as
      gateways
    reason: Success
    status: "True"
    type: SubmarinerGatewaysLabeled
  managedClusterInfo:
    clusterName: mbabushk-az
    infraId: mbabushk-az-pm564
    networkType: OpenShiftSDN
    platform: Azure
    region: centralus
    vendor: OpenShift
    vendorVersion: 4.10.45
But 3 gateway nodes have been created instead:
$ KUBECONFIG=logs/mbabushk-az-kubeconfig.yaml oc get nodes -l submariner.io/gateway
NAME                                        STATUS   ROLES    AGE     VERSION
mbabushk-az-pm564-subgw-centralus-1-jk9x7   Ready    worker   5h28m   v1.23.12+8a6bfe4
mbabushk-az-pm564-subgw-centralus-2-b9jmv   Ready    worker   5h28m   v1.23.12+8a6bfe4
mbabushk-az-pm564-subgw-centralus-3-z5fng   Ready    worker   5h28m   v1.23.12+8a6bfe4
$ KUBECONFIG=logs/mbabushk-az-kubeconfig.yaml oc -n submariner-operator get pods
NAME                                             READY   STATUS    RESTARTS   AGE
submariner-addon-d6c94587-rzv4q                  1/1     Running   0          5h32m
submariner-gateway-28sdq                         1/1     Running   0          5h28m
submariner-gateway-jt76c                         1/1     Running   0          5h28m
submariner-gateway-qpd9x                         1/1     Running   0          5h11m
submariner-globalnet-2857n                       1/1     Running   0          5h11m
submariner-globalnet-mr557                       1/1     Running   0          5h28m
submariner-globalnet-rp6qn                       1/1     Running   0          5h28m
submariner-lighthouse-agent-846b4f4f9d-6bvd4     1/1     Running   0          5h31m
submariner-lighthouse-coredns-7c446bb7f4-fzq7n   1/1     Running   0          5h31m
submariner-lighthouse-coredns-7c446bb7f4-zr8lw   1/1     Running   0          5h31m
submariner-metrics-proxy-55j78                   2/2     Running   0          5h28m
submariner-metrics-proxy-ckzhs                   2/2     Running   0          5h29m
submariner-metrics-proxy-q9lnj                   2/2     Running   0          5h28m
submariner-operator-6ccdd4f58-jjc4z              1/1     Running   0          5h32m
submariner-routeagent-4gmjp                      1/1     Running   0          5h28m
submariner-routeagent-5scph                      1/1     Running   0          5h28m
submariner-routeagent-9jrtd                      1/1     Running   0          5h10m
submariner-routeagent-bwj6f                      1/1     Running   0          5h31m
submariner-routeagent-cxkr5                      1/1     Running   0          5h31m
submariner-routeagent-fdt8d                      1/1     Running   0          5h31m
submariner-routeagent-n9dkr                      1/1     Running   0          5h31m
submariner-routeagent-x8vg4                      1/1     Running   0          5h31m
submariner-routeagent-xtrjc                      1/1     Running   0          5h31m
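The mismatch between the configured and actual gateway count could be checked automatically. The following is only a minimal sketch, not part of the addon or of any test suite: it counts the node rows in the text output of `oc get nodes -l submariner.io/gateway` and compares the result with the `gateways` value from the SubmarinerConfig. The function name `count_gateway_nodes` and the constants are hypothetical, and the sample text is the output captured in this report.

```python
def count_gateway_nodes(oc_output: str) -> int:
    """Count node rows in `oc get nodes` table output, skipping the header row."""
    rows = [line for line in oc_output.splitlines() if line.strip()]
    # The header row starts with the NAME column; every other row is one node.
    return sum(1 for line in rows if not line.startswith("NAME"))


# Output of `oc get nodes -l submariner.io/gateway` as captured in this report.
SAMPLE_OUTPUT = """\
NAME                                        STATUS   ROLES    AGE     VERSION
mbabushk-az-pm564-subgw-centralus-1-jk9x7   Ready    worker   5h28m   v1.23.12+8a6bfe4
mbabushk-az-pm564-subgw-centralus-2-b9jmv   Ready    worker   5h28m   v1.23.12+8a6bfe4
mbabushk-az-pm564-subgw-centralus-3-z5fng   Ready    worker   5h28m   v1.23.12+8a6bfe4
"""

# spec.gatewayConfig.gateways from the SubmarinerConfig in this report.
EXPECTED_GATEWAYS = 1

actual = count_gateway_nodes(SAMPLE_OUTPUT)
if actual != EXPECTED_GATEWAYS:
    print(f"MISMATCH: expected {EXPECTED_GATEWAYS} gateway node(s), found {actual}")
else:
    print(f"OK: {actual} gateway node(s)")
```

In a live environment the sample text would be replaced by the real command output, for example captured with `subprocess.run`.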
Two issues were found in the logs:
In the log of the cluster machineset-controller, the replica count check is repeated for each of the three gateway MachineSets:
I1220 13:58:27.967672 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-3, need 1, creating 1
I1220 13:58:27.967761 1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
I1220 13:58:53.215748 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-2, need 1, creating 1
I1220 13:58:53.215770 1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
I1220 13:59:10.273560 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-1, need 1, creating 1
I1220 13:59:10.273584 1 controller.go:283] Creating machine 1 of 1, ( spec.replicas(1) > currentMachineCount(0) )
Full log attached.
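To see how many machines the machineset-controller requested in total, the "Too few replicas" lines can be tallied per MachineSet. This is a rough sketch that assumes the log format shown in the excerpt above; the regular expression and the function name `machines_requested` are illustrative and not taken from any OpenShift tooling.

```python
import re
from collections import Counter

# Matches e.g. "Kind=MachineSet openshift-machine-api/<name>, need 1, creating 1".
PATTERN = re.compile(
    r"Kind=MachineSet ([^/\s]+)/([^,\s]+), need (\d+), creating (\d+)"
)


def machines_requested(log_text: str) -> Counter:
    """Sum the 'creating N' values per MachineSet name found in the log text."""
    requested = Counter()
    for match in PATTERN.finditer(log_text):
        _namespace, name, _need, creating = match.groups()
        requested[name] += int(creating)
    return requested


# The three "Too few replicas" lines from the excerpt in this report.
LOG_EXCERPT = """\
I1220 13:58:27.967672 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-3, need 1, creating 1
I1220 13:58:53.215748 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-2, need 1, creating 1
I1220 13:59:10.273560 1 controller.go:277] Too few replicas for machine.openshift.io/v1beta1, Kind=MachineSet openshift-machine-api/mbabushk-az-pm564-subgw-centralus-1, need 1, creating 1
"""

requested = machines_requested(LOG_EXCERPT)
for name, count in sorted(requested.items()):
    print(f"{name}: {count}")
print(f"total gateway machines requested: {sum(requested.values())}")
```

For the excerpt in this report, the tally covers three distinct gateway MachineSets with one machine each, which matches the three gateway nodes observed rather than the single gateway configured.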
In the machine-controller log, a creation error is reported for each gateway node, even though the nodes were actually created:
I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
I1220 13:58:34.960285 1 logr.go:252] events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"mbabushk-az-pm564-subgw-centralus-3-z5fng\"s: failed to create vm mbabushk-az-pm564-subgw-centralus-3-z5fng: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"mbabushk-az-pm564-subgw-centralus-3-z5fng","uid":"a39a5709-b0fc-4d93-af36-bea689832846","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"79170"} "reason"="FailedCreate"
Full log attached.
Subctl gather logs attached.
- clones
-
ACM-2494 [ACM 2.7.2] Submariner - Azure platform spawns incorrect number of gateway nodes
-
- Closed
-