Type: Bug
Resolution: Obsolete
Priority: Major
Affects Version: 4.11
Severity: Moderate
Description of problem:
After a Windows worker node went NotReady during an attempt to schedule 200 pods, it did not recover after multiple reboots and deletion of the pods through the kube-apiserver.
kubelet would not start because containerd was not available.
containerd would not start because of the following error:
time="2022-08-17T17:39:17.384124700Z" level=info msg="containerd successfully booted in 0.058586s" time="2022-08-17T17:39:17.416802600Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"node-density-702_01b7218a-node-density-20220816_b4986a0f-b087-4e14-9c6d-7afe9a1b0160_1\": name \"node-density-702_01b7218a-node-density-20220816_b4986a0f-b087-4e14-9c6d-7afe9a1b0160_1\" is reserved for \"899b7a59bea8109a1ed607facfa73eb10daad77cca58db08e29081431e1c5adb\""
Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-08-15-074436
How reproducible:
The NotReady state is reproducible under high pod counts.
Steps to Reproduce:
1. Create a 4.11 cluster in AWS with Windows workers (m5.2xlarge instances were used here).
2. Run the node-density workload from this commit, with 200 pods per node (a hedged sketch of such a workload follows).
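For illustration, a hypothetical stand-in for such a workload: a small generator that emits 200 pause-pod manifests pinned to Windows nodes. The toleration key/value and the pause image are assumptions for the sketch, not values taken from the referenced commit:

{code:go}
package main

import "fmt"

// podTemplate is a minimal pause pod targeting Windows nodes.
// The toleration and image below are assumed and may need to be
// adjusted to the cluster's actual taints and registries.
const podTemplate = `---
apiVersion: v1
kind: Pod
metadata:
  name: node-density-%d
  labels:
    app: node-density
spec:
  nodeSelector:
    kubernetes.io/os: windows
  tolerations:
  - key: os              # assumed Windows-node taint
    value: Windows
    effect: NoSchedule
  containers:
  - name: pause
    image: mcr.microsoft.com/oss/kubernetes/pause:3.6  # assumed Windows-compatible image
`

func main() {
	// Emit 200 pod manifests, matching the pod count in this report.
	for i := 0; i < 200; i++ {
		fmt.Printf(podTemplate, i)
	}
}
{code}

The output could be piped into the cluster, e.g. go run gen.go | oc apply -f - (gen.go being whatever file holds the sketch).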
Actual results:
containerd on the workers is unable to start or recover from this error.
Expected results:
Windows workers are able to clean up their state once pods have been deleted.
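Until that works, one conceivable manual workaround is to wipe containerd's on-disk state while kubelet and containerd are stopped, so the CRI plugin starts with no stale sandbox records. A sketch, assuming containerd's documented default Windows root/state paths (a WMCO-managed node may configure different ones in config.toml); note this discards all local containers and images on the node:

{code:go}
package main

import (
	"fmt"
	"os"
)

// Assumed containerd default directories on Windows; verify against
// the node's config.toml before running, and stop the kubelet and
// containerd services first.
var stateDirs = []string{
	`C:\ProgramData\containerd\root`,
	`C:\ProgramData\containerd\state`,
}

func main() {
	for _, dir := range stateDirs {
		fmt.Printf("removing %s\n", dir)
		if err := os.RemoveAll(dir); err != nil {
			fmt.Fprintf(os.Stderr, "failed to remove %s: %v\n", dir, err)
			os.Exit(1)
		}
	}
}
{code}

Both services would then need to be restarted for the node to rejoin the cluster.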
Additional info:
Is related to: WINC-1115 "Document or restrict max pods deployed on Windows node" (Closed)