-
Bug
-
Resolution: Done
-
Critical
-
4.12
-
None
-
Quality / Stability / Reliability
-
False
-
-
3
-
None
-
None
-
None
-
Rejected
-
WINC - Sprint 232
-
1
Description of problem:
When creating services in an OVN-HybridOverlay cluster with Windows workers, we are experiencing intermittent reachability issues for the external IP when the exposed deployment has more than one pod:
[cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get svc -n winc-38186
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
win-webserver   LoadBalancer   172.30.38.192   34.136.170.199   80:30246/TCP   41m
[cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get deploy -n winc-38186
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
win-webserver   6/6     6            6           42m
[cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get pods -n winc-38186
NAME                             READY   STATUS    RESTARTS   AGE
win-webserver-597fb4c9cc-8ccwg   1/1     Running   0          6s
win-webserver-597fb4c9cc-f54x5   1/1     Running   0          6s
win-webserver-597fb4c9cc-jppxb   1/1     Running   0          97s
win-webserver-597fb4c9cc-twn9b   1/1     Running   0          6s
win-webserver-597fb4c9cc-x5rfr   1/1     Running   0          6s
win-webserver-597fb4c9cc-z8sfv   1/1     Running   0          6s
[cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
curl: (7) Failed to connect to 34.136.170.199 port 80: Connection timed out
[cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa openshift-tests-private]$ curl 34.136.170.199
curl: (7) Failed to connect to 34.136.170.199 port 80: Connection timed out
Looking at the LoadBalancer service, we can see that externalTrafficPolicy is set to "Cluster":
[cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get svc -n winc-38186 win-webserver -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-11-25T13:29:00Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app: win-webserver
  name: win-webserver
  namespace: winc-38186
  resourceVersion: "169364"
  uid: 4a229123-ee88-47b6-99ce-814522803ad8
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.38.192
  clusterIPs:
  - 172.30.38.192
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - nodePort: 30246
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: win-webserver
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 34.136.170.199
Recreating the Service with externalTrafficPolicy set to Local seems to solve the issue:
$ oc describe svc win-webserver -n winc-38186
Name:                     win-webserver
Namespace:                winc-38186
Labels:                   app=win-webserver
Annotations:              <none>
Selector:                 app=win-webserver
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.38.192
IPs:                      172.30.38.192
LoadBalancer Ingress:     34.136.170.199
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30246/TCP
Endpoints:                10.132.0.18:80,10.132.0.19:80,10.132.0.20:80 + 3 more...
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                 Age                  From                Message
  ----    ------                 ----                 ----                -------
  Normal  ExternalTrafficPolicy  66m                  service-controller  Cluster -> Local
  Normal  EnsuringLoadBalancer   63m (x3 over 113m)   service-controller  Ensuring load balancer
  Normal  ExternalTrafficPolicy  63m                  service-controller  Local -> Cluster
  Normal  EnsuredLoadBalancer    62m (x3 over 113m)   service-controller  Ensured load balancer
$ oc get svc -n winc-test
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)          AGE
linux-webserver   LoadBalancer   172.30.175.95   34.136.11.87   8080:30715/TCP   152m
win-check         LoadBalancer   172.30.50.151   35.194.12.34   80:31725/TCP     4m33s
win-webserver     LoadBalancer   172.30.15.95    35.226.129.1   80:30409/TCP     152m
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
<html><body><H1>Windows Container Web Server</H1></body></html>
[cloud-user@preserve-jfrancoa tmp]$ curl 35.194.12.34
Meanwhile, the other service, which has externalTrafficPolicy set to "Cluster", is still failing:
[cloud-user@preserve-jfrancoa tmp]$ curl 35.226.129.1
curl: (7) Failed to connect to 35.226.129.1 port 80: Connection timed out
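The description above recreates the Service to change the policy. As a possible alternative (an assumption, not something tried in this report), the same field could be changed in place with a merge patch. The sketch below only builds the `oc patch` command line; the helper name `traffic_policy_patch_cmd` is hypothetical, and the service and namespace names are taken from the output above.

```python
import json
import shlex
from typing import List


def traffic_policy_patch_cmd(service: str, namespace: str, policy: str = "Local") -> List[str]:
    """Build an `oc patch` argv that sets spec.externalTrafficPolicy on a Service."""
    if policy not in ("Local", "Cluster"):
        raise ValueError(f"unsupported externalTrafficPolicy: {policy!r}")
    # Merge patch body touching only the one field under discussion.
    patch = {"spec": {"externalTrafficPolicy": policy}}
    return [
        "oc", "patch", "svc", service,
        "-n", namespace,
        "--type=merge",
        "-p", json.dumps(patch),
    ]


cmd = traffic_policy_patch_cmd("win-webserver", "winc-38186")
# Print a shell-quoted form that can be reviewed before being run by hand.
print(shlex.join(cmd))
```

The command is printed rather than executed so it can be inspected first; whether an in-place patch behaves the same as recreating the Service on this cluster has not been verified here.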
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.0-0.nightly-2022-11-24-203151 True False 7h2m Cluster version is 4.12.0-0.nightly-2022-11-24-203151
$ oc get network cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2022-11-25T06:56:50Z"
  generation: 2
  name: cluster
  resourceVersion: "2952"
  uid: e9ad729c-36a4-4e71-9a24-740352b11234
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  clusterNetworkMTU: 1360
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
How reproducible:
Always. Sometimes it takes more curl calls to the external IP, but a request always ends up timing out.
Steps to Reproduce:
1. Deploy a Windows cluster with OVN-Hybrid overlay on GCP. The following Jenkins job can be used for it: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/158926/
2. Create a deployment and a service, for example:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: win-check
  name: win-check
  namespace: winc-test
spec:
  #externalTrafficPolicy: Local
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: win-check
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: win-check
  name: win-check
  namespace: winc-test
spec:
  replicas: 6
  selector:
    matchLabels:
      app: win-check
  template:
    metadata:
      labels:
        app: win-check
      name: win-check
    spec:
      containers:
      - command:
        - pwsh.exe
        - -command
        - $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/');
          $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening)
          { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Windows
          Container Web Server</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content);
          $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer,
          0, $buffer.Length); $response.Close(); };
        image: mcr.microsoft.com/powershell:lts-nanoserver-ltsc2022
        name: win-check
        securityContext:
          runAsNonRoot: false
          windowsOptions:
            runAsUserName: ContainerAdministrator
      nodeSelector:
        kubernetes.io/os: windows
      tolerations:
      - key: os
        value: Windows
3. Get the external IP for the service:
$ oc get svc -n winc-test
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
linux-webserver   LoadBalancer   172.30.175.95   34.136.11.87     8080:30715/TCP   94m
win-check         LoadBalancer   172.30.82.251   35.239.175.209   80:30530/TCP     29s
win-webserver     LoadBalancer   172.30.15.95    35.226.129.1     80:30409/TCP     94m
4. Try to curl the external-ip:
$ curl 35.239.175.209
curl: (7) Failed to connect to 35.239.175.209 port 80: Connection timed out
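Because the failure is intermittent, a single curl can pass by chance. The sketch below repeats step 4 in a loop and counts the failures. The function names `probe` and `summarize`, the attempt count, and the 5-second timeout are illustrative assumptions; the `opener` parameter exists only so the loop can be exercised without a live cluster.

```python
import urllib.request
from typing import Callable, List


def probe(url: str, attempts: int = 20, timeout: float = 5.0,
          opener: Callable = urllib.request.urlopen) -> List[bool]:
    """Request `url` repeatedly; return True per attempt that got HTTP 200."""
    results = []
    for _ in range(attempts):
        try:
            with opener(url, timeout=timeout) as resp:
                results.append(resp.status == 200)
        except OSError:
            # URLError, connection resets and socket timeouts all derive from OSError.
            results.append(False)
    return results


def summarize(results: List[bool]) -> str:
    """Render the failure count in a form that is easy to paste into a report."""
    return f"{results.count(False)}/{len(results)} requests failed"


# Example call against the win-check external IP from step 3 (substitute your own):
#   print(summarize(probe("http://35.239.175.209/")))
```

Any non-zero failure count over a few dozen attempts would match the behaviour described in this report.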
Actual results:
The Load Balancer IP is intermittently or permanently unreachable, which impacts service availability
Expected results:
The Load Balancer IP is available at all times
Additional info:
- depends on: WINC-977 Update kube-proxy submodule to sdn-4.13-kubernetes-1.26.0 (Closed)
- relates to: CORENET-2562 Kube 1.26 rebase for OpenShift SDN (Closed)
- links to