Bug
Resolution: Unresolved
Normal
None
rhel-9.6
None
No
Low
rhel-ha
2
False
False
What were you trying to do that didn't work?
Run a regression test that performs a simple cluster setup and start on the upstream CI machines.
Probable reason, as formulated by a pcs developer: pcs starts corosync and pacemaker on all nodes and then checks whether the nodes have started. The check looks for each node's status in the crm_mon XML output. However, the check appears to run before the started nodes actually appear in the XML, probably because the nodes are slow to start.
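To illustrate the suspected race, here is a minimal Python sketch of a status check that treats "node not listed yet" as a reason to keep waiting rather than as an error. This is not the pcs implementation. The function names, the timeout values, and the simplified XML layout (a `nodes` element containing `node` elements with `name` and `online` attributes) are assumptions made for illustration only.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def node_is_online(status_xml: str, node_name: str) -> Optional[bool]:
    """Return True/False for a listed node, or None if it is not listed yet.

    Assumes a simplified crm_mon-like layout: <nodes><node name=... online=.../></nodes>.
    """
    root = ET.fromstring(status_xml)
    for node in root.findall("./nodes/node"):
        if node.get("name") == node_name:
            return node.get("online") == "true"
    # The node has not appeared in the status output (yet).
    return None


def wait_for_node(
    fetch_status_xml: Callable[[], str],  # hypothetical callable returning status XML
    node_name: str,
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the node is listed and online, or until the timeout expires."""
    deadline = clock() + timeout
    while True:
        # A missing node (None) and an offline node (False) both mean "keep waiting".
        if node_is_online(fetch_status_xml(), node_name):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With this shape, a slow node only delays the result: the caller reports a failure once the whole timeout has elapsed, not on the first poll in which the node is absent.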
Please provide the package NVR for which the bug is seen:
seen in pcs-0.11.9-99+git.57.ed947.el9
How reproducible is this bug?:
Rarely. We could not reproduce the issue on our QA and engineering Beaker instances; it occurs only on the upstream CI machines, and not on every run.
Steps to reproduce
Run the pcs regression test pcs,cli,Setup on the upstream CI machines. The test calls "pcs cluster start --all --wait" after setup.
Expected results
The command "pcs cluster start --all --wait" accounts for slower machines.
Actual results
Snippet from the command output with the --debug option:
--Debug Communication Output End--
node02: Starting Cluster...
node03: Starting Cluster...
Waiting for node(s) to start...
Sending HTTP Request to: https://node01:2224/remote/pacemaker_node_status
Data: None
Sending HTTP Request to: https://node03:2224/remote/pacemaker_node_status
Data: None
Sending HTTP Request to: https://node02:2224/remote/pacemaker_node_status
Data: None
Response Code: 400
--Debug Response Start--
Error: Node 'node03' does not appear to exist in configuration