-
Bug
-
Resolution: Unresolved
-
Undefined
-
None
-
4.14
-
None
-
Important
-
No
-
False
-
Description of problem:
capi pods exited and restarted. There was an original capi pod which failed/finished at Wed, 17 Jan 2024 12:18:04 and the new pod's logs started at 2024-01-17T12:57:50Z. The original pod was evicted because of Status: Failed Reason: Evicted Message: The node was low on resource: ephemeral-storage. Threshold quantity: 6351693871, available: 6143252Ki. Container manager was using 332Ki, request is 0, has larger consumption of ephemeral-storage.
Version-Release number of selected component (if applicable):
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc version Client Version: 4.14.0-0.nightly-2024-01-15-085353 Kustomize Version: v5.0.1 Server Version: 4.14.9 Kubernetes Version: v1.27.9+5c56cc3 (.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc get hc -A --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig NAMESPACE NAME VERSION KUBECONFIG PROGRESS AVAILABLE PROGRESSING MESSAGE clusters hosted-0 4.14.9 hosted-0-admin-kubeconfig Completed True False The hosted control plane is available (.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ Updated hypershift operator version to : https://quay.io/repository/hypershift/hypershift-operator/manifest/sha256:6ffead94e14d7aa56e4fd9adb9e45e337c1b32a3fc191c10a5c1306c3990fc9f
How reproducible:
Happens sometimes
Steps to Reproduce:
1.Deploy a hub cluster, and on it hosted cluster with 6 nodes , agent provider. 2.I ran this automation test,for scaling down : https://gitlab.cee.redhat.com/ocp-edge-qe/ocp-edge-auto/-/blob/master/edge_tests/deployment/installer/scale/test_scale_nodepool.py?ref_type=heads#L43 3. Although the test has passed, the theardown that scale the node-pool back to 2 nodes, took more than 30 min, so the theardown failed for timeout. 4. I don't know when exactly, but after some 3 hours, the nodepool scale to 2 nodes ended, if needed, I can recheck with a longet timeout, to give the accurate time.
Actual results:
more than 30 min scaling up to 2 nodes
Expected results:
scaling in a reasonable time (<10min)
Additional info: