Type: Bug
Resolution: Duplicate
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.9.z
Component/s: Installer / openshift-ansible
Labels:
None

Regression:
None
Blocked:
False
Blocked Reason:

None
Blocked by Bugzilla Bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2090151
Target Version:

4.9.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This bug is a backport clone of [Bugzilla Bug 2090151](https://bugzilla.redhat.com/show_bug.cgi?id=2090151). The following is the description of the original bug:
—
Version:
4.9.0-0.nightly-2022-05-24-200205

Sometimes the scale-up job hits the following error, even though all nodes eventually become Ready and the cluster is healthy.

TASK [openshift_node : Wait for node to report ready] **************************
Wednesday 25 May 2022 14:25:10 +0800 (0:00:19.202) 0:13:32.778 *********
FAILED - RETRYING: Wait for node to report ready (30 retries left).
<--SNIP-->
FAILED - RETRYING: Wait for node to report ready (1 retries left).
fatal: [ip-10-0-60-71.us-east-2.compute.internal -> localhost]: FAILED! => {"attempts": 30, "changed": false, "cmd": ["oc", "get", "node", "ip-10-0-60-71.us-east-2.compute.internal", "--kubeconfig=/tmp/installer-aVed14/auth/kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.249540", "end": "2022-05-25 14:35:24.212666", "rc": 0, "start": "2022-05-25 14:35:23.963126", "stderr": "", "stderr_lines": [], "stdout": "False", "stdout_lines": ["False"]}
fatal: [ip-10-0-61-254.us-east-2.compute.internal -> localhost]: FAILED! => {"attempts": 30, "changed": false, "cmd": ["oc", "get", "node", "ip-10-0-61-254.us-east-2.compute.internal", "--kubeconfig=/tmp/installer-aVed14/auth/kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.266898", "end": "2022-05-25 14:35:24.213355", "rc": 0, "start": "2022-05-25 14:35:23.946457", "stderr": "", "stderr_lines": [], "stdout": "False", "stdout_lines": ["False"]}
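The failing task polls each node's `Ready` condition with `oc get node ... --output=jsonpath=...` and gives up after 30 attempts. A minimal Python sketch of that polling logic, assuming a fixed delay between attempts; the `wait_for_node_ready` and `node_ready_status` functions, their parameters, and the 20-second default delay are hypothetical illustrations, not openshift-ansible's actual implementation:

```python
import subprocess
import time
from typing import Callable

# jsonpath expression taken from the failing task's command line
READY_JSONPATH = '{.status.conditions[?(@.type=="Ready")].status}'


def node_ready_status(node: str, kubeconfig: str,
                      run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Return the node's Ready condition status ("True", "False" or "Unknown")."""
    cmd = ["oc", "get", "node", node,
           f"--kubeconfig={kubeconfig}",
           f"--output=jsonpath={READY_JSONPATH}"]
    result = run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def wait_for_node_ready(node: str, kubeconfig: str, attempts: int = 30,
                        delay: float = 20.0,
                        run=subprocess.run, sleep=time.sleep) -> int:
    """Poll until Ready is "True" and return the attempt number that succeeded.

    Raises TimeoutError after `attempts` polls, mirroring the
    "attempts": 30 failure in the log. The `delay` default is hypothetical.
    """
    for attempt in range(1, attempts + 1):
        if node_ready_status(node, kubeconfig, run=run) == "True":
            return attempt
        if attempt < attempts:
            # Sleep only between attempts, not after the last one
            sleep(delay)
    raise TimeoutError(f"{node} not Ready after {attempts} attempts")
```

The `run` and `sleep` parameters are injectable so the loop can be exercised without a cluster; in the reported failure, the loop ran out of attempts while the node was still `False`.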

The timeline is:

1. [06:24-06:35 UTC] Approve the node CSRs, then wait about 10 minutes for the nodes to report Ready:
TASK [openshift_node : Approve node CSRs] **************************************
Wednesday 25 May 2022 14:24:51 +0800 (0:04:04.743) 0:13:13.576 *********

2. [06:35 UTC] The scale-up job times out and reports the error above.

3. [06:37:09 UTC] The node reports Ready:
May 25 06:37:09 ip-10-0-60-71.us-east-2.compute.internal hyperkube[2526]: I0525 06:37:09.201219 2526 kubelet_node_status.go:581] "Recording event message for node" node="ip-10-0-60-71.us-east-2.compute.internal" event="NodeReady"

lastHeartbeatTime: "2022-05-25T07:16:01Z"
lastTransitionTime: "2022-05-25T06:37:09Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
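Converting the Ansible timestamps (printed in +08:00) and the node condition (UTC) to a common zone shows how narrowly the node missed the deadline. A small Python check using only the timestamps quoted in this report; treating each task-header timestamp as that task's start time is an assumption:

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))

# Timestamps copied from the log excerpts in this report
csr_approved = datetime(2022, 5, 25, 14, 24, 51, tzinfo=UTC8)      # "Approve node CSRs" task header
wait_started = datetime(2022, 5, 25, 14, 25, 10, tzinfo=UTC8)      # "Wait for node to report ready" task header
wait_gave_up = datetime(2022, 5, 25, 14, 35, 24, tzinfo=UTC8)      # "end" of the last failed attempt
node_ready = datetime(2022, 5, 25, 6, 37, 9, tzinfo=timezone.utc)  # lastTransitionTime of Ready

wait_budget = wait_gave_up - wait_started     # how long the task actually waited
time_to_ready = node_ready - wait_started     # how long it would have needed to wait
shortfall = node_ready - wait_gave_up         # how late the node was
approval_to_ready = node_ready - csr_approved

print("wait budget used:     ", wait_budget)
print("time to NodeReady:    ", time_to_ready)
print("missed by:            ", shortfall)
print("CSR approval to Ready:", approval_to_ready)
```

The task waited about 10 min 14 s, while the node needed about 12 min after the wait began, missing the deadline by 1 min 45 s.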

How reproducible?
More than 30% of the time.

Steps to Reproduce:
1. Create a cluster with the OVN network type.
2. Run a scale-up against that cluster.

Expected results:
The scale-up job finishes successfully.

Suggestion:
Increase the wait time to 16-18 minutes.

Additional info:
This issue applies to 4.9, 4.10, and 4.11.

Assignee:: Unassigned

Reporter:: OpenShift Prow Bot

QA Contact:: Gaoyun Pei

Votes:: 0

Watchers:: 2

Created:: 2022/12/02 9:10 PM

Updated:: 2022/12/02 9:14 PM

Resolved:: 2022/12/02 9:14 PM
