This is a clone of issue OCPBUGS-26761. The following is the description of the original issue:
—
Description of problem:
Workloads on Windows Server 2019 nodes are experiencing packet loss. When a pod health check tries to reach a Windows web server pod IP, the destination IP of the reply packets is rewritten to a different IP (another pod IP), so the replies never reach the host layer.
The SourceVIP on the HNS load balancer is being set to the host IP, which should not be the case.
Version-Release number of selected component (if applicable):
OCP 4.12
How reproducible:
Occurs only on Windows 2019 after several hours of runtime.
Steps to Reproduce:
1. Add a Windows 2019 node to an OCP cluster.
2. Deploy 5-10 Windows pods behind a ClusterIP service, using the following manifests:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: win-webserverlog-2019
  labels:
    app: win-webserver-log-2019
spec:
  replicas: 5
  selector:
    matchLabels:
      app: win-webserver-log-2019
  template:
    metadata:
      name: win-webserverlog-2019
      labels:
        app: win-webserver-log-2019
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      restartPolicy: Always
      runtimeClassName: windows-2019
      containers:
        - resources: {}
          readinessProbe:
            httpGet:
              path: /
              port: 80
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 2
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 10
          terminationMessagePath: /dev/termination-log
          name: windowswebserverlog
          livenessProbe:
            httpGet:
              path: /
              port: 80
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 2
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 10
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: foo
              mountPath: 'C:\Temp\pod'
          terminationMessagePolicy: File
          image: 'mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019'
      volumes:
        - name: foo
          hostPath:
            path: 'C:\Temp\pod'
            type: ''
      dnsPolicy: ClusterFirst
      tolerations:
        - key: os
          value: Windows
---
kind: Service
apiVersion: v1
metadata:
  name: windows-service-2019
spec:
  ipFamilies:
    - IPv4
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 80
  internalTrafficPolicy: Cluster
  type: ClusterIP
  ipFamilyPolicy: SingleStack
  sessionAffinity: None
  selector:
    app: win-webserver-log-2019
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: windows-2019
handler: 'runhcs-wcow-process'
scheduling:
  nodeSelector:
    kubernetes.io/os: 'windows'
    kubernetes.io/arch: 'amd64'
    node.kubernetes.io/windows-build: '10.0.17763'
  tolerations:
    - effect: NoSchedule
      key: os
      operator: Equal
      value: "Windows"
3. With no further changes, the pods restart on their own within 1-2 hours due to health probe timeouts.
4. The more pods there are, the sooner pods start restarting.
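To help interpret step 3, here is a minimal Python sketch of what a restart implies about the probe traffic. It assumes the standard kubelet rule that a probe must fail failureThreshold consecutive times before the container is restarted (liveness) or marked unready (readiness), with any success resetting the count when successThreshold is 1. The livenessProbe values are copied from the Deployment above. The helper names (Probe, failure_window_seconds, trigger_index) are hypothetical, and probe scheduling jitter is ignored, so the window is approximate. With periodSeconds: 10, timeoutSeconds: 2 and failureThreshold: 10, a restart means roughly 92 seconds of uninterrupted probe failure, which points to sustained loss of reply packets rather than an occasional dropped packet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Probe:
    """Subset of Kubernetes probe fields (values taken from the manifest)."""
    timeout_seconds: int
    period_seconds: int
    failure_threshold: int


def failure_window_seconds(p: Probe) -> int:
    """Approximate seconds from the first failing probe to the kubelet acting.

    The failing probes are sent period_seconds apart, so the last of the
    failure_threshold consecutive probes starts (failure_threshold - 1)
    periods after the first one, then times out after timeout_seconds.
    Jitter in kubelet probe scheduling is ignored.
    """
    return (p.failure_threshold - 1) * p.period_seconds + p.timeout_seconds


def trigger_index(outcomes: List[bool], failure_threshold: int) -> Optional[int]:
    """Return the index of the probe result that triggers kubelet action.

    outcomes[i] is True if probe i succeeded. A success resets the
    consecutive-failure counter (successThreshold: 1). Returns None if the
    threshold is never reached.
    """
    consecutive = 0
    for i, ok in enumerate(outcomes):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            return i
    return None


# livenessProbe settings from the win-webserverlog-2019 Deployment.
LIVENESS = Probe(timeout_seconds=2, period_seconds=10, failure_threshold=10)

if __name__ == "__main__":
    print(failure_window_seconds(LIVENESS))  # 92
    # Sporadic loss (9 failures, one success, 9 failures) never triggers a restart.
    print(trigger_index([False] * 9 + [True] + [False] * 9, 10))  # None
```

The consecutive-failure rule is why isolated drops would not explain the restarts on their own: the reply path has to stay broken for about ten probe periods in a row.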
Actual results:
All pods, or a large number of pods, on the node restart because of kubelet health check timeouts.
Expected results:
Health check packets from the kubelet should not be dropped, and pods should not be restarted unnecessarily.
Additional info:
- blocks: OCPBUGS-28226 Packet loss on Windows 2019 nodes (Closed)
- clones: OCPBUGS-26761 Packet loss on Windows 2019 nodes (Closed)
- is blocked by: OCPBUGS-26761 Packet loss on Windows 2019 nodes (Closed)
- is cloned by: OCPBUGS-28226 Packet loss on Windows 2019 nodes (Closed)
- links to: RHSA-2023:120235 Red Hat OpenShift for Windows Containers 10.15.0 security release
- mentioned on