- Bug
- Resolution: Done
- Critical
- 4.12
- None
- None
- 3
- WINC - Sprint 232
- 1
- Rejected
- False
Description of problem:
When creating services in an OVN-HybridOverlay cluster with Windows workers, we are experiencing intermittent reachability issues for the external IP when the number of pods in the exposed deployment is greater than 1:

    [cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get svc -n winc-38186
    NAME            TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
    win-webserver   LoadBalancer   172.30.38.192   34.136.170.199   80:30246/TCP   41m
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get deploy -n winc-38186
    NAME            READY   UP-TO-DATE   AVAILABLE   AGE
    win-webserver   6/6     6            6           42m
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get pods -n winc-38186
    NAME                             READY   STATUS    RESTARTS   AGE
    win-webserver-597fb4c9cc-8ccwg   1/1     Running   0          6s
    win-webserver-597fb4c9cc-f54x5   1/1     Running   0          6s
    win-webserver-597fb4c9cc-jppxb   1/1     Running   0          97s
    win-webserver-597fb4c9cc-twn9b   1/1     Running   0          6s
    win-webserver-597fb4c9cc-x5rfr   1/1     Running   0          6s
    win-webserver-597fb4c9cc-z8sfv   1/1     Running   0          6s
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
    curl: (7) Failed to connect to 34.136.170.199 port 80: Connection timed out
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
    curl: (7) Failed to connect to 34.136.170.199 port 80: Connection timed out

Looking at the LoadBalancer service, we can see that externalTrafficPolicy is set to "Cluster":

    [cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get svc -n winc-38186 win-webserver -o yaml
    apiVersion: v1
    kind: Service
    metadata:
      creationTimestamp: "2022-11-25T13:29:00Z"
      finalizers:
      - service.kubernetes.io/load-balancer-cleanup
      labels:
        app: win-webserver
      name: win-webserver
      namespace: winc-38186
      resourceVersion: "169364"
      uid: 4a229123-ee88-47b6-99ce-814522803ad8
    spec:
      allocateLoadBalancerNodePorts: true
      clusterIP: 172.30.38.192
      clusterIPs:
      - 172.30.38.192
      externalTrafficPolicy: Cluster
      internalTrafficPolicy: Cluster
      ipFamilies:
      - IPv4
      ipFamilyPolicy: SingleStack
      ports:
      - nodePort: 30246
        port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: win-webserver
      sessionAffinity: None
      type: LoadBalancer
    status:
      loadBalancer:
        ingress:
        - ip: 34.136.170.199

Recreating the Service with externalTrafficPolicy set to Local seems to solve the issue:

    $ oc describe svc win-webserver -n winc-38186
    Name:                     win-webserver
    Namespace:                winc-38186
    Labels:                   app=win-webserver
    Annotations:              <none>
    Selector:                 app=win-webserver
    Type:                     LoadBalancer
    IP Family Policy:         SingleStack
    IP Families:              IPv4
    IP:                       172.30.38.192
    IPs:                      172.30.38.192
    LoadBalancer Ingress:     34.136.170.199
    Port:                     <unset>  80/TCP
    TargetPort:               80/TCP
    NodePort:                 <unset>  30246/TCP
    Endpoints:                10.132.0.18:80,10.132.0.19:80,10.132.0.20:80 + 3 more...
    Session Affinity:         None
    External Traffic Policy:  Cluster
    Events:
      Type    Reason                 Age                 From                Message
      ----    ------                 ----                ----                -------
      Normal  ExternalTrafficPolicy  66m                 service-controller  Cluster -> Local
      Normal  EnsuringLoadBalancer   63m (x3 over 113m)  service-controller  Ensuring load balancer
      Normal  ExternalTrafficPolicy  63m                 service-controller  Local -> Cluster
      Normal  EnsuredLoadBalancer    62m (x3 over 113m)  service-controller  Ensured load balancer

    $ oc get svc -n winc-test
    NAME              TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)          AGE
    linux-webserver   LoadBalancer   172.30.175.95   34.136.11.87   8080:30715/TCP   152m
    win-check         LoadBalancer   172.30.50.151   35.194.12.34   80:31725/TCP     4m33s
    win-webserver     LoadBalancer   172.30.15.95    35.226.129.1   80:30409/TCP     152m
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
    <html><body><H1>Windows Container Web Server</H1></body></html>
    [cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34

Meanwhile, the other service, which has externalTrafficPolicy set to "Cluster", is still failing:

    [cloud-user@preserve-jfrancoa tmp]$ curl 35.226.129.1
    curl: (7) Failed to connect to 35.226.129.1 port 80: Connection timed out
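The workaround above was applied by recreating the Service. As a rough sketch, the same field could probably be changed in place with a merge patch instead. The helper name etp_patch is hypothetical (it is not part of oc) and only builds the JSON body; the oc invocations are shown as comments because they need cluster access, and the service and namespace names are the ones from this report.

```shell
# Build the JSON merge patch that sets spec.externalTrafficPolicy.
# etp_patch is a hypothetical helper name used for illustration only.
etp_patch() {
  printf '{"spec":{"externalTrafficPolicy":"%s"}}' "$1"
}

# Example usage against the service from this report (requires cluster access):
# oc patch svc win-webserver -n winc-38186 --type merge -p "$(etp_patch Local)"
# oc get svc win-webserver -n winc-38186 -o jsonpath='{.spec.externalTrafficPolicy}'
```

Note that with Local, traffic is only delivered to nodes that host an endpoint of the service, so this is a workaround to compare behaviour rather than a fix for the Cluster policy.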
Version-Release number of selected component (if applicable):
    $ oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.12.0-0.nightly-2022-11-24-203151   True        False         7h2m    Cluster version is 4.12.0-0.nightly-2022-11-24-203151

    $ oc get network cluster -o yaml
    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      creationTimestamp: "2022-11-25T06:56:50Z"
      generation: 2
      name: cluster
      resourceVersion: "2952"
      uid: e9ad729c-36a4-4e71-9a24-740352b11234
    spec:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      externalIP:
        policy: {}
      networkType: OVNKubernetes
      serviceNetwork:
      - 172.30.0.0/16
    status:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      clusterNetworkMTU: 1360
      networkType: OVNKubernetes
      serviceNetwork:
      - 172.30.0.0/16
How reproducible:
Always. Sometimes it takes more curl calls to the external IP, but the requests always end up timing out.
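Since the failure is intermittent, counting failures over a fixed number of attempts may make runs easier to compare. The following is a minimal sketch, not part of the original report: count_failures is a hypothetical helper that runs any command N times and prints how many attempts exited non-zero. The IP in the usage comment is the one from this report and will differ per cluster.

```shell
# Run a command N times and print how many attempts failed.
# Usage: count_failures <attempts> <command> [args...]
# The body runs in a subshell so its variables do not leak into the caller.
count_failures() (
  attempts=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Any non-zero exit status (for curl, e.g. 7 or 28) counts as a failure.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
)

# Example usage (requires network access to the load balancer):
# count_failures 50 curl --silent --max-time 5 35.239.175.209
```

A short --max-time keeps each failed attempt from waiting for the full TCP connection timeout.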
Steps to Reproduce:
1. Deploy a Windows cluster with OVN-Hybrid overlay on GCP. The following Jenkins job can be used for it: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/158926/

2. Create a deployment and a service, for example:

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: win-check
      name: win-check
      namespace: winc-test
    spec:
      #externalTrafficPolicy: Local
      ports:
      - port: 80
        targetPort: 80
      selector:
        app: win-check
      type: LoadBalancer
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: win-check
      name: win-check
      namespace: winc-test
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: win-check
      template:
        metadata:
          labels:
            app: win-check
          name: win-check
        spec:
          containers:
          - command:
            - pwsh.exe
            - -command
            - $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Windows Container Web Server</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
            image: mcr.microsoft.com/powershell:lts-nanoserver-ltsc2022
            name: win-check
            securityContext:
              runAsNonRoot: false
              windowsOptions:
                runAsUserName: ContainerAdministrator
          nodeSelector:
            kubernetes.io/os: windows
          tolerations:
          - key: os
            value: Windows

3. Get the external IP for the service:

    $ oc get svc -n winc-test
    NAME              TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
    linux-webserver   LoadBalancer   172.30.175.95   34.136.11.87     8080:30715/TCP   94m
    win-check         LoadBalancer   172.30.82.251   35.239.175.209   80:30530/TCP     29s
    win-webserver     LoadBalancer   172.30.15.95    35.226.129.1     80:30409/TCP     94m

4. Try to curl the external IP:

    $ curl 35.239.175.209
    curl: (7) Failed to connect to 35.239.175.209 port 80: Connection timed out
Actual results:
The Load Balancer IP is not reachable, which impacts service availability.
Expected results:
The Load Balancer IP is available at all times
Additional info: