- Bug
- Resolution: Won't Do
- Undefined
- None
- 4.14, 4.15
- None
- Moderate
- No
- Rejected
- False
Description of problem:
The monitoring cluster operator is unavailable:
monitoring   4.15.0-rc.3   False   True   True   29m   UpdatingNodeExporter: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of openshift-monitoring/node-exporter: context deadline exceeded
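The failing rollout can be inspected directly on the hosted cluster with standard oc queries (these commands are illustrative and were not part of the test run):

oc get ds node-exporter -n openshift-monitoring --kubeconfig ~/clusterconfigs/hosted-0/auth/kubeconfig
oc get pods -n openshift-monitoring -o wide --kubeconfig ~/clusterconfigs/hosted-0/auth/kubeconfig | grep node-exporter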
When autoscaling the nodepool from 2 nodes to 5 (by creating a load of 50 pods of 256Mi each), the 3rd node comes up after 04:49 min, but the next 2 nodes are still not ready after the 20 min timeout, and their agents are stuck in the "Joined" stage for most of that time.
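A minimal sketch of the kind of load that drives the scale-up, assuming a plain Deployment whose pods only request memory; the deployment name, namespace, and image here are hypothetical and not taken from the test:

oc apply --kubeconfig ~/clusterconfigs/hosted-0/auth/kubeconfig -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscale-load            # hypothetical name
  namespace: default
spec:
  replicas: 50                    # 50 pods of 256Mi each, as described above
  selector:
    matchLabels:
      app: autoscale-load
  template:
    metadata:
      labels:
        app: autoscale-load
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9    # placeholder image; any small image works
        resources:
          requests:
            memory: 256Mi         # memory request per pod
EOF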
After disabling autoscaling and setting the node count back to 2, the hosted cluster still shows 4 nodes, 2 of which are NotReady. The nodepool itself reports 2 nodes with autoscaling off, as expected, but carries a stale message of "Scaling down MachineSet to 2 replicas (actual 4)".
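The scale-down step can be sketched roughly as follows, assuming the hypershift.openshift.io NodePool API where spec.autoScaling and spec.replicas are mutually exclusive; the exact call made by the test may differ:

oc patch nodepool hosted-0 -n clusters --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig \
  --type=merge -p '{"spec":{"autoScaling":null,"replicas":2}}'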
Version-Release number of selected component (if applicable):
[kni@ocp-edge119 ocp-edge-auto_cluster]$ oc version
Client Version: 4.14.0-0.nightly-2023-07-27-104118
Kustomize Version: v5.0.1
[kni@ocp-edge119 ocp-edge-auto_cluster]$ oc get hc -A --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig
NAMESPACE   NAME       VERSION       KUBECONFIG                  PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
clusters    hosted-0   4.15.0-rc.3   hosted-0-admin-kubeconfig   Completed   True        False         The hosted control plane is available
[kni@ocp-edge119 ocp-edge-auto_cluster]$
How reproducible:
happens sometimes
Steps to Reproduce:
1. Deploy a hub cluster and, on it, a hosted cluster with 6 nodes using the agent provider (I used https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/job/CI/job/job-runner/2205/).
2. Run test_toggling_autoscaling_nodepool (https://gitlab.cee.redhat.com/ocp-edge-qe/ocp-edge-auto/-/blob/master/edge_tests/deployment/installer/scale/test_scale_nodepool.py#L322); a manual equivalent of the autoscaling toggle it exercises is sketched below.
3. The test fails because the expected number of nodes does not come up within the 20 min timeout.
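For reference, a manual equivalent of the autoscaling toggle, again assuming the hypershift.openshift.io NodePool API (spec.autoScaling with min/max replaces spec.replicas); the bounds 2 and 5 match the scale-up described above:

# enable autoscaling between 2 and 5 nodes on the nodepool
oc patch nodepool hosted-0 -n clusters --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig \
  --type=merge -p '{"spec":{"replicas":null,"autoScaling":{"min":2,"max":5}}}'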
Actual results:
(.venv) [kni@ocp-edge119 ocp-edge-auto_cluster]$ oc get co --kubeconfig ~/clusterconfigs/hosted-0/auth/kubeconfig
NAME         VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.15.0-rc.3   False       True          True       63m     UpdatingNodeExporter: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of openshift-monitoring/node-exporter: context deadline exceeded

[kni@ocp-edge119 ocp-edge-auto_cluster]$ oc get nodes --kubeconfig ~/clusterconfigs/hosted-0/auth/kubeconfig
NAME                STATUS     ROLES    AGE   VERSION
hosted-worker-0-1   NotReady   worker   65m   v1.28.5+c84a6b8
hosted-worker-0-2   Ready      worker   18h   v1.28.5+c84a6b8
hosted-worker-0-4   NotReady   worker   96m   v1.28.5+c84a6b8
hosted-worker-0-5   Ready      worker   18h   v1.28.5+c84a6b8

(.venv) [kni@ocp-edge119 ocp-edge-auto_cluster]$ oc get nodepool -A -o wide --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig
NAMESPACE   NAME       CLUSTER    DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION       UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
clusters    hosted-0   hosted-0   2               2               False         False        4.15.0-rc.3                                      Scaling down MachineSet to 2 replicas (actual 4)
Expected results:
(.venv) [kni@ocp-edge119 ocp-edge-auto_cluster]$ oc get co --kubeconfig ~/clusterconfigs/hosted-0/auth/kubeconfig
NAME         VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.15.0-rc.3   True        True          True       63m
We would expect 5 nodes during the test itself, and after disabling autoscaling and scaling explicitly back to 2 nodes:
[kni@ocp-edge119 ocp-edge-auto_cluster]$ oc get nodes --kubeconfig ~/clusterconfigs/hosted-0/auth/kubeconfig
NAME                STATUS   ROLES    AGE   VERSION
hosted-worker-0-2   Ready    worker   18h   v1.28.5+c84a6b8
hosted-worker-0-5   Ready    worker   18h   v1.28.5+c84a6b8

(.venv) [kni@ocp-edge119 ocp-edge-auto_cluster]$ oc get nodepool -A -o wide --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig
NAMESPACE   NAME       CLUSTER    DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION       UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
clusters    hosted-0   hosted-0   2               2               False         False        4.15.0-rc.3
Additional info:
- is incorporated by OCPBUGS-29287: Pods are stuck in "Terminating" status causing nodepool autoscaling to fail adding new nodes (New)
- links to