This is a clone of issue OCPBUGS-28700. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-28575. The following is the description of the original issue:
—
Description of problem:
Windows Nodes are occasionally failing to reach the ready state due to CNI issues
Version-Release number of selected component (if applicable):
How reproducible:
Able to reproduce in AWS
Steps to Reproduce:
1. SSH onto the Windows Node
2. In PowerShell run: rm /k/cni/config/cni.conf; New-Item /k/cni/config/cni.conf -type file
3. Wait for the node to go to NotReady
Actual results:
Windows Node goes to NotReady and does not go back to Ready. The following can be seen in the logs:
WMCO log:
2024-01-29T06:09:15Z DEBUG wc 10.0.193.178 run {"cmd": "C:\\k\\windows-instance-config-daemon.exe cleanup --kubeconfig C:\\k\\wicd-kubeconfig --namespace openshift-windows-machine-config-operator", "out": "I0129 06:07:50.347432 4868 cleanup.go:111] error getting services ConfigMap associated with version annotation, falling back to use latest services ConfigMap: node is missing version annotation\nI0129 06:09:15.448458 4868 cleanup.go:159] removed services: [\"kube-proxy\" \"csi-proxy\" \"hybrid-overlay-node\" \"windows_exporter\" \"kubelet\" \"containerd\"]\n"}
kubelet log:
I0129 06:06:16.789966 4484 kubelet.go:2393] "Container runtime status" status="Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
E0129 06:06:16.789966 4484 kubelet.go:2396] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Node events are generated:
{
  "lastHeartbeatTime": "2024-01-29T06:07:53Z",
  "lastTransitionTime": "2024-01-29T05:33:40Z",
  "message": "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized",
  "reason": "KubeletNotReady",
  "status": "False",
  "type": "Ready"
}
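For triage, the Ready condition shown above can be matched programmatically. The following is a minimal sketch in Go, not code from WMCO: the nodeCondition type and the cniNotInitialized helper are hypothetical names introduced for illustration, and the match relies only on the message text quoted in this report.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nodeCondition mirrors only the fields of the node condition
// that appear in this report (hypothetical helper type).
type nodeCondition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// cniNotInitialized reports whether raw is a Ready=False condition
// whose message carries the "cni plugin not initialized" error.
func cniNotInitialized(raw []byte) (bool, error) {
	var c nodeCondition
	if err := json.Unmarshal(raw, &c); err != nil {
		return false, err
	}
	match := c.Type == "Ready" && c.Status == "False" &&
		strings.Contains(c.Message, "cni plugin not initialized")
	return match, nil
}

func main() {
	// Condition copied from the node events in this report.
	raw := []byte(`{
		"message": "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized",
		"reason": "KubeletNotReady",
		"status": "False",
		"type": "Ready"
	}`)
	hit, err := cniNotInitialized(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("CNI not initialized:", hit)
}
```

Matching on the message substring is brittle if the kubelet wording changes, so this is only suitable for quick log triage.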
Expected results:
Windows Node goes to NotReady for less than 5 minutes, before returning to Ready
Additional info:
Has only been observed after the merge of https://github.com/openshift/windows-machine-config-operator/pull/1979
Manual workaround: SSH onto the Windows VM, delete /k/cni/config/cni.conf, and wait a few minutes
- blocks: OCPBUGS-29004 Windows Nodes go to not ready due to empty cni config file (Closed)
- clones: OCPBUGS-28700 Windows Nodes go to not ready due to empty cni config file (Closed)
- is blocked by: OCPBUGS-28700 Windows Nodes go to not ready due to empty cni config file (Closed)
- is cloned by: OCPBUGS-29004 Windows Nodes go to not ready due to empty cni config file (Closed)
- links to: RHSA-2023:124870 Red Hat OpenShift for Windows Containers 9.0.1 security update