-
Task
-
Resolution: Done
-
Major
-
None
-
None
-
False
-
-
False
-
-
-
Important
-
None
Jan 30 21:08:28.770: Ensure active gateway node "submariner-gateway-x4jm7" has established connections
Jan 30 21:08:29.029: Found submariner endpoint for "submqe-aws": &v1.Endpoint{TypeMeta:v1.TypeMeta{Kind:"Endpoint", APIVersion:"submariner.io/v1"}, ObjectMeta:v1.ObjectMeta{Name:"submqe-aws-submariner-cable-submqe-aws-10-0-114-39", GenerateName:"", Namespace:"submariner-operator", SelfLink:"", UID:"c7c40b49-896c-4096-90fb-d423db0a7710", ResourceVersion:"690022", Generation:1, CreationTimestamp:time.Date(2025, time.January, 30, 20, 59, 57, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"submariner-gateway", Operation:"Update", APIVersion:"submariner.io/v1", Time:time.Date(2025, time.January, 30, 20, 59, 57, 0, time.Local), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc00072ed98), Subresource:""}}}, Spec:v1.EndpointSpec{ClusterID:"submqe-aws", CableName:"submariner-cable-submqe-aws-10-0-114-39", HealthCheckIP:"10.131.2.2", Hostname:"ip-10-0-114-39", Subnets:[]string{"172.30.0.0/16", "10.128.0.0/14"}, PrivateIP:"10.0.114.39", PublicIP:"3.16.152.78", NATEnabled:true, Backend:"libreswan", BackendConfig:map[string]string{"natt-discovery-port":"4490", "preferred-server":"false", "udp-port":"4505"}}}
Jan 30 21:08:29.286: Performing fail-over to passive gateway
Jan 30 21:08:29.543: Jan 30 21:08:29.542: INFO: ExecWithOptions &{Command:[sh -c echo 1 > /proc/sys/kernel/sysrq && echo b > /proc/sysrq-trigger] Namespace:submariner-operator PodName:submariner-gateway-x4jm7 ContainerName:submariner-gateway Stdin:<nil> CaptureStdout:false CaptureStderr:true PreserveWhitespace:false}
Jan 30 21:09:31.017: Jan 30 21:09:31.017: INFO: Retrying due to error Timeout occurred
Basically, ExecWithOptions keeps timing out. I can see the pod restart and the node switch to NotReady, and the gateway also fails over, but the test keeps retrying ExecWithOptions 5 times (with a 5-second delay between tries). I tried increasing the timeout to as much as 1 minute and it doesn't help.
Not an issue on 4.16, so something changed in 4.17.