- Bug
- Resolution: Duplicate
- Undefined
- None
- 4.9.z
- None
This bug is a backport clone of [Bugzilla Bug 2090151](https://bugzilla.redhat.com/show_bug.cgi?id=2090151). The following is the description of the original bug:
—
Version: 4.9.0-0.nightly-2022-05-24-200205
Sometimes the scale-up job hits the following error, but eventually all nodes are Ready and the cluster is healthy.
TASK [openshift_node : Wait for node to report ready] **************************
Wednesday 25 May 2022 14:25:10 +0800 (0:00:19.202) 0:13:32.778 *********
FAILED - RETRYING: Wait for node to report ready (30 retries left).
<-SNIP->
FAILED - RETRYING: Wait for node to report ready (1 retries left).
fatal: [ip-10-0-60-71.us-east-2.compute.internal -> localhost]: FAILED! => {"attempts": 30, "changed": false, "cmd": ["oc", "get", "node", "ip-10-0-60-71.us-east-2.compute.internal", "--kubeconfig=/tmp/installer-aVed14/auth/kubeconfig", "--output=jsonpath=
fatal: [ip-10-0-61-254.us-east-2.compute.internal -> localhost]: FAILED! => {"attempts": 30, "changed": false, "cmd": ["oc", "get", "node", "ip-10-0-61-254.us-east-2.compute.internal", "-
"], "delta": "0:00:00.266898", "end": "2022-05-25 14:35:24.213355", "rc": 0, "start": "2022-05-25 14:35:23.946457", "stderr": "", "stderr_lines": [], "stdout": "False", "stdout_lines": ["False"]}
The timeline is:
1. [6:24-6:34] Approve CSRs and wait for 10 min
TASK [openshift_node : Approve node CSRs] **************************************
Wednesday 25 May 2022 14:24:51 +0800 (0:04:04.743) 0:13:13.576 *********
2. [6:34] Scale-up job reported an error (timed out)
3. [6:37:09] Node reported Ready
May 25 06:37:09 ip-10-0-60-71.us-east-2.compute.internal hyperkube[2526]: I0525 06:37:09.201219 2526 kubelet_node_status.go:581] "Recording event message for node" node="ip-10-0-60-71.us-east-2.compute.internal" event="NodeReady"
- lastHeartbeatTime: "2022-05-25T07:16:01Z"
lastTransitionTime: "2022-05-25T06:37:09Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
How to reproduce it (as minimally and precisely as possible)?
More than 30% of the time.
Steps to Reproduce:
1. Create a cluster with the OVN network plugin
2. Run a scale-up against the above cluster (see the inventory sketch after these steps)
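A hypothetical inventory for such a run, in Ansible's YAML inventory format; the host name, kubeconfig path, and group layout are placeholders following the usual openshift-ansible scale-up convention, not values from the original report:

```yaml
# Hypothetical inventory sketch for an openshift-ansible scale-up run.
# Host name and kubeconfig path are placeholders.
all:
  vars:
    openshift_kubeconfig_path: /path/to/auth/kubeconfig
  children:
    new_workers:
      hosts:
        new-worker-0.example.com:
```

With an inventory like this, the run would be something like `ansible-playbook -i inventory.yml playbooks/scaleup.yml` from an openshift-ansible checkout.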
Expected results:
The scale-up job finishes successfully.
Suggestion:
Increase the wait time to 16-18 minutes.
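A minimal sketch of what that could look like, assuming the poll uses the retries/delay pattern reconstructed earlier; the numbers below are one way to land in the suggested 16-18 minute window:

```yaml
# Sketch only: widening the retry budget from ~10 min (30 × 20 s)
# to ~17 min (50 × 20 s); the task body is otherwise unchanged.
- name: Wait for node to report ready
  command: >
    oc get node {{ inventory_hostname }}
    --kubeconfig={{ openshift_kubeconfig_path }}
    --output=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
  delegate_to: localhost
  register: node_ready
  until: node_ready.stdout == "True"
  retries: 50   # was 30; 50 × 20 s ≈ 16.7 min
  delay: 20
```

Since the node in the timeline went Ready about 13 minutes after CSR approval, a budget in this range would have covered the observed delay.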
Additional info:
This issue applies to 4.9, 4.10, and 4.11.