OpenShift Bugs / OCPBUGS-218

containerd failed to recover state: failed to reserve sandbox name

    • Bug
    • Resolution: Obsolete
    • Major
    • None
    • 4.11
    • Windows Containers
    • Moderate
    • None
    • 3
    • Rejected
    • False

      Description of problem:
      After a Windows worker node went NotReady during an attempt to schedule 200 pods, it did not recover after multiple reboots and after deleting the pods through the Kubernetes API.
      kubelet would not start because containerd was not available.
      containerd would not start because of the following error:

      time="2022-08-17T17:39:17.384124700Z" level=info msg="containerd successfully booted in 0.058586s"
      time="2022-08-17T17:39:17.416802600Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"node-density-702_01b7218a-node-density-20220816_b4986a0f-b087-4e14-9c6d-7afe9a1b0160_1\": name \"node-density-702_01b7218a-node-density-20220816_b4986a0f-b087-4e14-9c6d-7afe9a1b0160_1\" is reserved for \"899b7a59bea8109a1ed607facfa73eb10daad77cca58db08e29081431e1c5adb\""
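
The fatal message above names both the sandbox name that could not be reserved and the ID that already holds it. The following is a minimal sketch for extracting those fields from such a log line when triaging a node. It assumes the `name "..." is reserved for "..."` wording shown in this report, and it assumes the sandbox name has four underscore-separated fields (pod, namespace, UID, attempt), which is how the name in this log reads. `parse_reserved_error` and `Conflict` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed wording, taken from the fatal log line in this report. Quotes may
# appear escaped (\") inside the logged error field, so the backslash is optional.
RESERVED_RE = re.compile(
    r'name \\?"([^"\\]+)\\?" is reserved for \\?"([0-9a-f]+)\\?"'
)


class Conflict(NamedTuple):
    pod: str
    namespace: str
    uid: str
    attempt: int
    holder_id: str  # sandbox ID that already holds the name


def parse_reserved_error(line: str) -> Optional[Conflict]:
    """Return the conflicting sandbox details, or None if the line does not match."""
    m = RESERVED_RE.search(line)
    if m is None:
        return None
    name, holder_id = m.groups()
    # Assumption: <pod>_<namespace>_<uid>_<attempt>, as in the name logged here.
    parts = name.split("_")
    if len(parts) != 4 or not parts[3].isdigit():
        return None
    pod, namespace, uid, attempt = parts
    return Conflict(pod, namespace, uid, int(attempt), holder_id)
```

Applied to the fatal line in this report, the sketch would yield pod `node-density-702` in namespace `01b7218a-node-density-20220816`, held by sandbox `899b7a59bea8...`.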
      

      Version-Release number of selected component (if applicable):
      4.11.0-0.nightly-2022-08-15-074436

      How reproducible:
      NodeNotReady is reproducible when a high number of pods is scheduled per node.

      Steps to Reproduce:
      1. Create a 4.11 cluster in AWS with Windows workers (m5.2xlarge used here)
      2. Run node-density workload from this commit, with 200 pods per node.

      Actual results:
      containerd on the Windows workers is unable to start or recover from this error.
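
The wording of the error suggests a name-to-ID reservation table that is rebuilt from persisted sandbox records at startup. The toy model below shows one way such a rebuild can fail on every restart: if two persisted records carry the same sandbox name under different IDs, the second reservation is rejected, and a reboot does not change the persisted records. This is an illustrative sketch under that assumption, not containerd's implementation. `NameRegistry`, `recover`, and the second sandbox ID are hypothetical.

```python
class NameReservedError(Exception):
    """Raised when a name is already held by a different ID."""


class NameRegistry:
    """Toy name -> ID table; a name may be held by only one ID at a time."""

    def __init__(self):
        self._name_to_id = {}

    def reserve(self, name, sandbox_id):
        holder = self._name_to_id.get(name)
        if holder is not None and holder != sandbox_id:
            raise NameReservedError(
                f'name "{name}" is reserved for "{holder}"'
            )
        # Reserving the same name again for the same ID is a no-op.
        self._name_to_id[name] = sandbox_id


def recover(persisted):
    """Rebuild the registry from (sandbox_id, name) records.

    Stops at the first conflict, mirroring a startup that aborts on error.
    """
    registry = NameRegistry()
    for sandbox_id, name in persisted:
        registry.reserve(name, sandbox_id)
    return registry


NAME = ("node-density-702_01b7218a-node-density-20220816_"
        "b4986a0f-b087-4e14-9c6d-7afe9a1b0160_1")
records = [
    ("899b7a59bea8109a1ed607facfa73eb10daad77cca58db08e29081431e1c5adb", NAME),
    ("hypothetical-duplicate-id", NAME),  # assumed stale duplicate record
]
```

In this model, `recover(records)` raises on every call until the duplicate record is removed from the persisted state, which is consistent with the node not recovering across reboots.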

      Expected results:
      Windows workers should be able to clean up their state once pods have been deleted.

      Additional info:

              team-winc Team WinC
              ancollin@redhat.com Andrew Collins