-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
4.16.z
-
None
-
Quality / Stability / Reliability
-
False
-
-
0
-
Moderate
Description:
In a 4.16.40 production environment, for the past 23 weeks a customer has seen 3-4 occurrences of the same issue:
- Pods stay stuck in the Pending state for hours at a time, even though there are MachineSets eligible for autoscaling that could accommodate the Pending workload. The pods remain Pending and the autoscaler does not add any new nodes.
- On further inspection, the customer always finds nodes carrying the ToBeDeletedByClusterAutoscaler taint and proceeds with manual removal, after which autoscaling activity resumes and the Pending pods are finally scheduled.
The above appears to be a direct match for the issue patched in OCPBUGS-54231.
- Because of the time-consuming effort to execute the CNI migration to OVN-K (and a broad list of CNI migration related bugs that took some time to fix), the customer is unfortunately still stuck on 4.16. We therefore need the OCPBUGS-54231 bugfix backported to a 4.16.z stream.
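The manual inspection described above (looking for nodes that still carry the ToBeDeletedByClusterAutoscaler taint) could be automated roughly as follows. This is only a minimal sketch, not part of the reported environment: it assumes the node list has been exported as JSON in the standard Kubernetes NodeList shape (for example with `oc get nodes -o json`), and the function name `nodes_with_deletion_taint` is hypothetical.

```python
import json

# Taint key applied by the cluster autoscaler to nodes it intends to remove.
TAINT_KEY = "ToBeDeletedByClusterAutoscaler"


def nodes_with_deletion_taint(node_list: dict) -> list[str]:
    """Return the names of nodes whose spec.taints contain TAINT_KEY.

    `node_list` is assumed to be a parsed Kubernetes NodeList object.
    """
    tainted = []
    for node in node_list.get("items", []):
        # spec.taints may be missing or null on untainted nodes.
        taints = node.get("spec", {}).get("taints") or []
        if any(t.get("key") == TAINT_KEY for t in taints):
            tainted.append(node["metadata"]["name"])
    return tainted


# Illustrative input only; node names and taint value are made up.
example = json.loads("""
{"items": [
  {"metadata": {"name": "worker-a"},
   "spec": {"taints": [{"key": "ToBeDeletedByClusterAutoscaler",
                        "value": "1700000000", "effect": "NoSchedule"}]}},
  {"metadata": {"name": "worker-b"}, "spec": {}}
]}
""")
print(nodes_with_deletion_taint(example))
```

The sketch only reports candidate nodes. Whether a reported taint is actually stale, and what to do with the node, is left to the operator.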
- duplicates: OCPBUGS-54231 After scale down the last node has ToBeDeletedByClusterAutoscaler taint (Closed)