Type: Bug
Resolution: Obsolete
Priority: Major
Affects Version: 4.11
Severity: Moderate
Description of problem:
After a Windows worker node went NotReady during an attempt to schedule 200 pods, it did not recover after multiple reboots and deletion of the pods through the kube-apiserver.
kubelet would not start because containerd was not available.
containerd would not start because of the following error:
time="2022-08-17T17:39:17.384124700Z" level=info msg="containerd successfully booted in 0.058586s" time="2022-08-17T17:39:17.416802600Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"node-density-702_01b7218a-node-density-20220816_b4986a0f-b087-4e14-9c6d-7afe9a1b0160_1\": name \"node-density-702_01b7218a-node-density-20220816_b4986a0f-b087-4e14-9c6d-7afe9a1b0160_1\" is reserved for \"899b7a59bea8109a1ed607facfa73eb10daad77cca58db08e29081431e1c5adb\""
Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-08-15-074436
How reproducible:
The NotReady state is reproducible under high pod counts.
Steps to Reproduce:
1. Create a 4.11 cluster in AWS with Windows workers (m5.2xlarge instances were used here).
2. Run the node-density workload from this commit, with 200 pods per node (a hedged sketch of such a workload follows).
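For illustration, a hypothetical stand-in for such a workload: a small generator that emits 200 pause-pod manifests pinned to Windows nodes. The toleration key/value and the pause image are assumptions for the sketch, not values taken from the referenced commit:

{code:go}
package main

import "fmt"

// podTemplate is a minimal pause pod targeting Windows nodes.
// The toleration and image below are assumed and may need to be
// adjusted to the cluster's actual taints and registries.
const podTemplate = `---
apiVersion: v1
kind: Pod
metadata:
  name: node-density-%d
  labels:
    app: node-density
spec:
  nodeSelector:
    kubernetes.io/os: windows
  tolerations:
  - key: os              # assumed Windows-node taint
    value: Windows
    effect: NoSchedule
  containers:
  - name: pause
    image: mcr.microsoft.com/oss/kubernetes/pause:3.6  # assumed Windows-compatible image
`

func main() {
	// Emit 200 pod manifests, matching the pod count in this report.
	for i := 0; i < 200; i++ {
		fmt.Printf(podTemplate, i)
	}
}
{code}

The output could be piped into the cluster, e.g. go run gen.go | oc apply -f - (gen.go being whatever file holds the sketch).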
Actual results:
containerd on the workers is unable to start or recover from this error.
Expected results:
Windows workers are able to clean up their state once pods have been deleted.
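Until that works, one conceivable manual workaround is to wipe containerd's on-disk state while kubelet and containerd are stopped, so the CRI plugin starts with no stale sandbox records. A sketch, assuming containerd's documented default Windows root/state paths (a WMCO-managed node may configure different ones in config.toml); note this discards all local containers and images on the node:

{code:go}
package main

import (
	"fmt"
	"os"
)

// Assumed containerd default directories on Windows; verify against
// the node's config.toml before running, and stop the kubelet and
// containerd services first.
var stateDirs = []string{
	`C:\ProgramData\containerd\root`,
	`C:\ProgramData\containerd\state`,
}

func main() {
	for _, dir := range stateDirs {
		fmt.Printf("removing %s\n", dir)
		if err := os.RemoveAll(dir); err != nil {
			fmt.Fprintf(os.Stderr, "failed to remove %s: %v\n", dir, err)
			os.Exit(1)
		}
	}
}
{code}

Both services would then need to be restarted for the node to rejoin the cluster.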
Additional info:
Is related to: WINC-1115 "Document or restrict max pods deployed on Windows node" (Closed)