Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.17.0
Component area: Quality / Stability / Reliability
Severity: Moderate
Description of problem:
After enabling the TechPreviewNoUpgrade feature set, the cluster becomes unstable. PodDisruptionBudgets (PDBs) block the node drain, which leaves the OpenShift API server (OAS) and kube-apiserver operator (KAS-O) pods stuck in Pending state.
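As context for the drain blockage described above, the PDBs involved can be inspected as follows. This is a sketch for reproduction/triage; the namespace is taken from the pod listings later in this report:

```shell
# List PodDisruptionBudgets cluster-wide; an entry with
# ALLOWED DISRUPTIONS of 0 is one that can block a node drain.
oc get pdb -A

# Narrow to the PDBs covering the openshift-apiserver pods.
oc get pdb -n openshift-apiserver -o wide
```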
Version-Release number of selected component (if applicable):
Cluster version is 4.17.0-0.nightly-2024-08-29-115325
How reproducible:
Sometimes
Steps to Reproduce:
1. Enable the TechPreviewNoUpgrade feature gate.
2. Wait for the KAS operator to restart.
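The steps above can be sketched with the following commands (note that enabling TechPreviewNoUpgrade is irreversible on a cluster):

```shell
# Step 1: enable the TechPreviewNoUpgrade feature set.
oc patch featuregate cluster --type=merge \
  -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'

# Step 2: watch the kube-apiserver operator restart and roll out
# new static-pod revisions.
oc get co kube-apiserver -w
```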
Actual results:
The cluster becomes unstable and the kube-apiserver operator ends up in a degraded state:
oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
baremetal                                  4.17.0-0.nightly-2024-08-29-115325   True        False         False      9h
cloud-controller-manager                   4.17.0-0.nightly-2024-08-29-115325   True        False         False      9h
cloud-credential                           4.17.0-0.nightly-2024-08-29-115325   True        False         False      9h
cluster-api                                4.17.0-0.nightly-2024-08-29-115325   True        False         False      7h11m
cluster-autoscaler                         4.17.0-0.nightly-2024-08-29-115325   True        False         False      9h
config-operator                            4.17.0-0.nightly-2024-08-29-115325   True        False         False      9h
console                                    4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
control-plane-machine-set                  4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
csi-snapshot-controller                    4.17.0-0.nightly-2024-08-29-115325   True        True          False      9h      CSISnapshotControllerProgressing: Waiting for Deployment to deploy pods
dns                                        4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
etcd                                       4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
image-registry                             4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
ingress                                    4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
insights                                   4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
kube-apiserver                             4.17.0-0.nightly-2024-08-29-115325   True        True          False      8h      NodeInstallerProgressing: 2 nodes are at revision 8; 1 node is at revision 9
kube-controller-manager                    4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
kube-scheduler                             4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
kube-storage-version-migrator              4.17.0-0.nightly-2024-08-29-115325   True        False         False      9h
machine-api                                4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
machine-approver                           4.17.0-0.nightly-2024-08-29-115325   True        False         False      9h
machine-config                             4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
marketplace                                4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
monitoring                                 4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
network                                    4.17.0-0.nightly-2024-08-29-115325   True        True          False      9h      Deployment "/openshift-multus/multus-admission-controller" is not available (awaiting 1 nodes)...
node-tuning                                4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
olm                                        4.17.0-0.nightly-2024-08-29-115325   True        False         False      7h6m
openshift-apiserver                        4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
openshift-controller-manager               4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
openshift-samples                          4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
operator-lifecycle-manager                 4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
operator-lifecycle-manager-catalog         4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
operator-lifecycle-manager-packageserver   4.17.0-0.nightly-2024-08-29-115325   True        False         False      8h
service-ca                                 4.17.0-0.nightly-2024-08-29-115325   True        True          False      9h      Progressing: ...
storage                                    4.17.0-0.nightly-2024-08-29-115325   True        True          False      7h9m    AWSEBSProgressing: Waiting for Deployment to deploy pods...
oc get pod apiserver-7f457c758c-n6zvd -n openshift-apiserver -o yaml | yq -y '.status.conditions'
- lastProbeTime: null
  lastTransitionTime: '2024-08-30T01:20:50Z'
  message: '0/6 nodes are available: 2 node(s) didn''t match Pod''s node affinity/selector,
    2 node(s) didn''t match pod anti-affinity rules, 2 node(s) were unschedulable. no new
    claims to deallocate, preemption: 0/6 nodes are available: 2 node(s) didn''t match pod
    anti-affinity rules, 4 Preemption is not helpful for scheduling.'
  reason: Unschedulable
  status: 'False'
  type: PodScheduled

oc get no
NAME                                        STATUS                     ROLES                  AGE   VERSION
ip-10-0-22-60.us-east-2.compute.internal    Ready                      control-plane,master   9h    v1.30.3
ip-10-0-29-105.us-east-2.compute.internal   Ready                      worker                 8h    v1.30.3
ip-10-0-34-200.us-east-2.compute.internal   Ready                      worker                 8h    v1.30.3
ip-10-0-52-222.us-east-2.compute.internal   Ready                      control-plane,master   9h    v1.30.3
ip-10-0-73-70.us-east-2.compute.internal    Ready                      worker                 8h    v1.30.3
ip-10-0-84-49.us-east-2.compute.internal    Ready,SchedulingDisabled   control-plane,master   9h    v1.30.3

oc get po -n openshift-apiserver
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-7f457c758c-4dkfj   0/2     Pending   0          66m
apiserver-7f457c758c-c6pv5   2/2     Running   0          7h13m
apiserver-7f457c758c-zlwrj   2/2     Running   0          7h12m

oc get po -n openshift-kube-apiserver-operator
NAME                                       READY   STATUS    RESTARTS   AGE
kube-apiserver-operator-76b6bdc567-72hlr   0/1     Pending   0          6h57m
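The Unschedulable condition above (anti-affinity plus a cordoned node) can be confirmed with the following checks; the pod name is taken from the listing above:

```shell
# Show the scheduler events explaining why the apiserver pod is Pending.
oc describe pod apiserver-7f457c758c-4dkfj -n openshift-apiserver

# List the node(s) still cordoned (SchedulingDisabled) by the stuck drain.
oc get nodes --field-selector spec.unschedulable=true
```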
Expected results:
The cluster should be stable and no cluster operators should be in a degraded state.
Additional info:
must-gather: https://drive.google.com/file/d/1xXyQJLkubZKmCYWQQm_rCIReFN83ggek/view?usp=sharing
Slack discussion: https://redhat-internal.slack.com/archives/CH76YSYSC/p1724997165869729