OpenShift Bugs / OCPBUGS-28847

Windows Nodes go to not ready due to empty cni config file

    • Bug
    • Resolution: Done-Errata
    • Critical
    • 4.14.z
    • 4.16.0
    • Windows Containers
    • None
    • No
    • 0
    • WINC - Sprint 249
    • 1
    • False
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-28700. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-28575. The following is the description of the original issue:

      Description of problem:

      Windows Nodes are occasionally failing to reach the ready state due to CNI issues
          

      Version-Release number of selected component (if applicable):

      
          

      How reproducible:

      Able to reproduce in AWS
          

      Steps to Reproduce:

          1. SSH onto the Windows Node
          2. In PowerShell, run: rm /k/cni/config/cni.conf; New-Item /k/cni/config/cni.conf -type file
          3. Wait for the node to go to NotReady
          

      Actual results:

      The Windows Node goes to NotReady and does not return to Ready. The following can be seen in the logs:
      
      WMCO log:
      2024-01-29T06:09:15Z	DEBUG	wc 10.0.193.178	run	{"cmd": "C:\\k\\windows-instance-config-daemon.exe cleanup --kubeconfig C:\\k\\wicd-kubeconfig --namespace openshift-windows-machine-config-operator", "out": "I0129 06:07:50.347432    4868 cleanup.go:111] error getting services ConfigMap associated with version annotation, falling back to use latest services ConfigMap: node is missing version annotation\nI0129 06:09:15.448458    4868 cleanup.go:159] removed services: [\"kube-proxy\" \"csi-proxy\" \"hybrid-overlay-node\" \"windows_exporter\" \"kubelet\" \"containerd\"]\n"}
      
      kubelet log:
      I0129 06:06:16.789966    4484 kubelet.go:2393] "Container runtime status" status="Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
      E0129 06:06:16.789966    4484 kubelet.go:2396] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
      
      The node reports the following Ready condition:
                          {
                              "lastHeartbeatTime": "2024-01-29T06:07:53Z",
                              "lastTransitionTime": "2024-01-29T05:33:40Z",
                              "message": "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized",
                              "reason": "KubeletNotReady",
                              "status": "False",
                              "type": "Ready"
                          }
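
To spot this state without reading logs by hand, the Ready condition can be checked programmatically. The following is a minimal sketch, not part of WMCO: the function name `cni_not_ready` is hypothetical, and it assumes the conditions list has been taken from the node's `.status.conditions` (for example from `oc get node <name> -o json`). The sample condition is the one quoted in this report.

```python
import json

# Message fragment taken from the kubelet log and node condition in this report.
CNI_NOT_INITIALIZED = "cni plugin not initialized"


def cni_not_ready(conditions):
    """Return True if the Ready condition is False because the CNI plugin
    is not initialized. `conditions` is a list of node condition dicts."""
    for cond in conditions:
        if cond.get("type") != "Ready":
            continue
        return (
            cond.get("status") == "False"
            and CNI_NOT_INITIALIZED in cond.get("message", "")
        )
    # No Ready condition present: nothing to conclude.
    return False


# The condition reported in this issue, wrapped in a list.
sample = json.loads("""
[
  {
    "lastHeartbeatTime": "2024-01-29T06:07:53Z",
    "lastTransitionTime": "2024-01-29T05:33:40Z",
    "message": "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized",
    "reason": "KubeletNotReady",
    "status": "False",
    "type": "Ready"
  }
]
""")

print(cni_not_ready(sample))
```

Matching on the message text is brittle if the kubelet wording changes, so this is best treated as a quick triage aid rather than a health check.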
          

      Expected results:

      Windows Node goes to NotReady for less than 5 minutes, before returning to Ready
          

      Additional info:

      This has only been observed after the merge of https://github.com/openshift/windows-machine-config-operator/pull/1979
      Manual workaround: SSH onto the Windows VM, delete /k/cni/config/cni.conf, and wait a few minutes
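
The manual workaround can be sketched as a small script. This is an illustration only, not the fix shipped for this bug: the helper name `remove_if_empty` is hypothetical, and it assumes that a zero-byte config file (as produced by the reproduction steps) is the condition worth acting on. It deletes the file only when it exists and is empty, leaving a populated config untouched.

```python
from pathlib import Path


def remove_if_empty(conf_path):
    """Delete the file at conf_path if it exists and is zero bytes long.

    Returns True if the file was removed, False otherwise.
    """
    path = Path(conf_path)
    if path.is_file() and path.stat().st_size == 0:
        path.unlink()
        return True
    return False
```

On the Windows node this might be called as `remove_if_empty(r"C:\k\cni\config\cni.conf")`, assuming the `/k` path in this report resolves to the `C:` drive as the WMCO log suggests. Per the workaround above, the node is then expected to return to Ready after a few minutes.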
          

              rh-ee-ssoto Sebastian Soto
              openshift-crt-jira-prow OpenShift Prow Bot
              Weinan Liu Weinan Liu