This is a clone of issue OCPBUGS-26761. The following is the description of the original issue:
—
Description of problem:
Workloads on Windows Server 2019 nodes are experiencing packet loss. When a pod health check tries to reach a Windows web server pod IP, the destination IP of the reply packets is rewritten to some other IP (a pod IP), so the packets never reach the host layer. The SourceVIP on the HNS load balancer is being set to the host IP, which should not be the case.
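As an illustration of the symptom described above, the following is a minimal sketch that flags load balancer entries whose SourceVIP equals the node's host IP. It assumes the HNS load balancer policies have already been exported and parsed into dicts with hypothetical "ID" and "SourceVIP" keys. This is a simplified, made-up record shape for illustration, not the documented HNS schema or tooling.

```python
# Minimal sketch: flag HNS load balancer records whose SourceVIP equals the
# node's host IP. ASSUMPTION: records are pre-parsed dicts with hypothetical
# "ID" and "SourceVIP" keys; this is not the real HNS output schema.
from typing import Iterable, List, Mapping


def suspicious_load_balancers(
    records: Iterable[Mapping[str, str]], host_ip: str
) -> List[str]:
    """Return the IDs of records whose SourceVIP is the host IP."""
    return [r["ID"] for r in records if r.get("SourceVIP") == host_ip]


if __name__ == "__main__":
    # Hypothetical example data; IDs and addresses are made up.
    lbs = [
        {"ID": "lb-1", "SourceVIP": "10.132.0.2"},
        {"ID": "lb-2", "SourceVIP": "10.0.0.5"},
    ]
    print(suspicious_load_balancers(lbs, host_ip="10.0.0.5"))  # ['lb-2']
```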
Version-Release number of selected component (if applicable):
OCP 4.12
How reproducible:
Occurs only on Windows 2019 after several hours of runtime.
Steps to Reproduce:
1. Add a Windows Server 2019 node to an OCP cluster.
2. Deploy 5-10 Windows pods behind a ClusterIP service, for example:

    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: win-webserverlog-2019
      labels:
        app: win-webserver-log-2019
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: win-webserver-log-2019
      template:
        metadata:
          name: win-webserverlog-2019
          labels:
            app: win-webserver-log-2019
        spec:
          nodeSelector:
            kubernetes.io/os: windows
          restartPolicy: Always
          runtimeClassName: windows-2019
          containers:
            - resources: {}
              readinessProbe:
                httpGet:
                  path: /
                  port: 80
                  scheme: HTTP
                initialDelaySeconds: 20
                timeoutSeconds: 2
                periodSeconds: 10
                successThreshold: 1
                failureThreshold: 10
              terminationMessagePath: /dev/termination-log
              name: windowswebserverlog
              livenessProbe:
                httpGet:
                  path: /
                  port: 80
                  scheme: HTTP
                initialDelaySeconds: 10
                timeoutSeconds: 2
                periodSeconds: 10
                successThreshold: 1
                failureThreshold: 10
              imagePullPolicy: IfNotPresent
              volumeMounts:
                - name: foo
                  mountPath: 'C:\Temp\pod'
              terminationMessagePolicy: File
              image: 'mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019'
          volumes:
            - name: foo
              hostPath:
                path: 'C:\Temp\pod'
                type: ''
          dnsPolicy: ClusterFirst
          tolerations:
            - key: os
              value: Windows
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: windows-service-2019
    spec:
      ipFamilies:
        - IPv4
      ports:
        - protocol: TCP
          port: 8080
          targetPort: 80
      internalTrafficPolicy: Cluster
      type: ClusterIP
      ipFamilyPolicy: SingleStack
      sessionAffinity: None
      selector:
        app: win-webserver-log-2019
    ---
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: windows-2019
    handler: 'runhcs-wcow-process'
    scheduling:
      nodeSelector:
        kubernetes.io/os: 'windows'
        kubernetes.io/arch: 'amd64'
        node.kubernetes.io/windows-build: '10.0.17763'
      tolerations:
        - effect: NoSchedule
          key: os
          operator: Equal
          value: "Windows"

3. With no further changes, the pods restart on their own within 1-2 hours due to health probe timeouts.
4. The more pods are deployed, the greater the chance that pods restart quickly.
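To relate the probe settings in the reproducer to the observed restarts: Kubernetes restarts a container only after its liveness probe fails failureThreshold times in a row (readiness failures only remove the pod from Service endpoints), and a probe that gets no response within timeoutSeconds counts as a failure. The sketch below models only that threshold rule. The helper names are hypothetical, and the outage estimate (failureThreshold x periodSeconds) is an approximation rather than the kubelet's exact timing. With the liveness settings above (failureThreshold 10, periodSeconds 10), a restart suggests the probe path was failing for roughly 100 seconds straight, not just dropping an occasional packet.

```python
# Sketch of the liveness "consecutive failures" rule, using the probe values
# from the Deployment above. Models only the threshold logic; this is not the
# kubelet's implementation, and the helper names are hypothetical.
from typing import Optional, Sequence


def restart_index(results: Sequence[bool], failure_threshold: int) -> Optional[int]:
    """Return the index of the probe that triggers a restart, or None.

    results[i] is True if probe i succeeded. Any success resets the counter,
    so only failure_threshold consecutive failures trigger a restart.
    """
    consecutive = 0
    for i, ok in enumerate(results):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            return i
    return None


def approx_outage_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Rough length of sustained failure needed before a restart."""
    return failure_threshold * period_seconds


if __name__ == "__main__":
    print(approx_outage_seconds(10, 10))   # 100
    flaky = [True, False] * 20             # alternating loss never restarts
    print(restart_index(flaky, 10))        # None
    outage = [True] * 5 + [False] * 10     # 10 failures in a row
    print(restart_index(outage, 10))       # 14
```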
Actual results:
All or most of the pods on a node restart because of kubelet health check timeouts.
Expected results:
Health check packets from the kubelet should not be dropped, and pods should not be restarted unnecessarily.
Additional info:
- blocks: OCPBUGS-28226 Packet loss on Windows 2019 nodes (Closed)
- clones: OCPBUGS-26761 Packet loss on Windows 2019 nodes (Closed)
- is blocked by: OCPBUGS-26761 Packet loss on Windows 2019 nodes (Closed)
- is cloned by: OCPBUGS-28226 Packet loss on Windows 2019 nodes (Closed)
- links to: RHSA-2023:120235 Red Hat OpenShift for Windows Containers 10.15.0 security release