This is a clone of issue OCPBUGS-28700. The following is the description of the original issue:
This is a clone of issue OCPBUGS-28575. The following is the description of the original issue:
Description of problem:
Windows Nodes are occasionally failing to reach the ready state due to CNI issues
Version-Release number of selected component (if applicable):
How reproducible:
Able to reproduce in AWS
Steps to Reproduce:
1. SSH onto the Windows Node
2. In PowerShell run: rm /k/cni/config/cni.conf; New-Item /k/cni/config/cni.conf -type file
3. Wait for node to go to not ready
Actual results:
Windows Node goes to NotReady, and does not go back to Ready. The following can be seen in the logs:

WMCO log:
2024-01-29T06:09:15Z DEBUG wc 10.0.193.178 run {"cmd": "C:\\k\\windows-instance-config-daemon.exe cleanup --kubeconfig C:\\k\\wicd-kubeconfig --namespace openshift-windows-machine-config-operator", "out": "I0129 06:07:50.347432 4868 cleanup.go:111] error getting services ConfigMap associated with version annotation, falling back to use latest services ConfigMap: node is missing version annotation\nI0129 06:09:15.448458 4868 cleanup.go:159] removed services: [\"kube-proxy\" \"csi-proxy\" \"hybrid-overlay-node\" \"windows_exporter\" \"kubelet\" \"containerd\"]\n"}

kubelet log:
I0129 06:06:16.789966 4484 kubelet.go:2393] "Container runtime status" status="Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
E0129 06:06:16.789966 4484 kubelet.go:2396] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

Node events are generated:
{
  "lastHeartbeatTime": "2024-01-29T06:07:53Z",
  "lastTransitionTime": "2024-01-29T05:33:40Z",
  "message": "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized",
  "reason": "KubeletNotReady",
  "status": "False",
  "type": "Ready"
}
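For triage, the Ready condition reported above can be checked programmatically to tell this failure apart from other NotReady causes. The following is only a minimal sketch: the helper name `is_cni_not_ready` and the substring match on the message are assumptions made for illustration and are not part of WMCO or the kubelet.

```python
import json

# Substring taken from the kubelet message quoted in this report.
CNI_MARKER = "cni plugin not initialized"


def is_cni_not_ready(condition: dict) -> bool:
    """Return True when a node's Ready condition is not "True" and its
    message points at an uninitialized CNI plugin (hypothetical helper)."""
    return (
        condition.get("type") == "Ready"
        and condition.get("status") != "True"
        and CNI_MARKER in condition.get("message", "")
    )


# The Ready condition exactly as reported in this issue.
condition = json.loads("""
{
  "lastHeartbeatTime": "2024-01-29T06:07:53Z",
  "lastTransitionTime": "2024-01-29T05:33:40Z",
  "message": "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized",
  "reason": "KubeletNotReady",
  "status": "False",
  "type": "Ready"
}
""")

print(is_cni_not_ready(condition))
```

Matching on the message text is brittle if the kubelet wording changes, so a check like this is better suited to ad hoc debugging than to automation.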
Expected results:
Windows Node goes to NotReady for less than 5 minutes, before returning to Ready
Additional info:
Has only been observed after the merge of https://github.com/openshift/windows-machine-config-operator/pull/1979
Manual workaround: SSH onto the Windows VM, delete /k/cni/config/cni.conf and wait a few minutes
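The manual workaround amounts to "delete the CNI config file if it is empty". A minimal sketch of that check follows, assuming a hypothetical helper `remove_if_empty`. It is not how WMCO or WICD handle the file, and it relies on the behavior described in the workaround (the node recovering a few minutes after the file is deleted). The demonstration runs against a temporary directory rather than the real /k/cni/config/cni.conf path.

```python
import tempfile
from pathlib import Path


def remove_if_empty(path: Path) -> bool:
    """Delete `path` if it is an existing zero-byte regular file.

    Returns True when the file was deleted, False otherwise
    (hypothetical helper for illustration only).
    """
    if path.is_file() and path.stat().st_size == 0:
        path.unlink()
        return True
    return False


# Demonstration in a scratch directory standing in for /k/cni/config.
with tempfile.TemporaryDirectory() as tmp:
    empty_conf = Path(tmp) / "cni.conf"
    empty_conf.touch()  # mimics the empty file created in the reproduction steps
    print(remove_if_empty(empty_conf), empty_conf.exists())
```

A non-empty config file is left in place, so the check does nothing on a node whose CNI configuration was written successfully.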
- blocks: OCPBUGS-29004 Windows Nodes go to not ready due to empty cni config file (Closed)
- clones: OCPBUGS-28700 Windows Nodes go to not ready due to empty cni config file (Closed)
- is blocked by: OCPBUGS-28700 Windows Nodes go to not ready due to empty cni config file (Closed)
- is cloned by: OCPBUGS-29004 Windows Nodes go to not ready due to empty cni config file (Closed)
- links to: RHSA-2023:124870 Red Hat OpenShift for Windows Containers 9.0.1 security update
Per the announcement sent regarding the removal of "Blocker" as an option in the Priority field, this issue (which was already closed at the time of the bulk update) had Priority = "Blocker." It is being updated to Priority = Critical. No additional fields were changed.