Bug
Resolution: Done
Normal
rhos-17.1.z
3
False
False
openstack-tripleo-heat-templates-14.3.1-17.1.20250321211026.e7c7ce3.el9osttrunk
None
PIDONE 18.0.4, PIDONE 18.0.5, PIDONE 18.0.6, PIDONE 18.0.7, PIDONE 18.0.8
5
Low
Description of problem:
On large deployments with instanceha, the deployment fails with a variety of errors, such as:
~~~
Oct 17 23:19:42 puppet-user: Error: /Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-compute-0-compute-instanceha-role]: Could not evaluate: pcs -f node attribute overcloud-compute-0 | grep -e ' overcloud-compute-0:.*compute-instanceha-role=true
~~~
and:
~~~
<13>Oct 9 22:41:33 puppet-user: Error: /Stage[main]/Tripleo::Fencing/Pacemaker::Stonith::Fence_ipmilan[00:00:00:00:00:00]/Pcmk_stonith[stonith-fence_ipmilan-0000000000]: Could not evaluate: pcs -f constraint location | grep stonith-fence_ipmilan-000000000 > /dev/null 2>&1 failed: . Too many tries
~~~
Version-Release number of selected component (if applicable):
17.1.3
How reproducible:
Always
Steps to Reproduce:
1. Deploy with 198 hosts (3 controllers and 195 instanceha compute nodes)
Actual results:
The deployment fails at random with errors like those shown above.
Expected results:
No issues
Additional info:
This looks like a concurrency bug in pacemaker: when we run more than 4 concurrent cibadmin --query (and maybe --push) commands, we get a timeout error, but we do not appear to fail at that point. We fail later, when we try to use the generated cib.xml file with "-f", and the "-f" argument appears to be empty because # is undefined for some reason. Apart from the concurrent cibadmin timeouts, which we can reproduce manually, this is speculation at this point.
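For anyone who wants to try the manual reproduction, here is a rough sketch of how the concurrent queries could be driven from Python on a cluster node. It is not the script we used; the helper names `run_once` and `run_concurrently`, the 60 second timeout, and the choice to treat empty output as a failure are all assumptions made for illustration. The empty-output check is there because the suspected problem is a dump that fails silently and only breaks a later "-f" call.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd, timeout):
    """Run one command and return (ok, detail).

    ok is False on a timeout, a missing binary, a non-zero exit code,
    or empty stdout (hypothetical check: an empty CIB dump is useless).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "timed out"
    except FileNotFoundError:
        return False, "command not found"
    if proc.returncode != 0:
        return False, f"exit {proc.returncode}: {proc.stderr.strip()}"
    if not proc.stdout.strip():
        return False, "empty output"
    return True, "ok"


def run_concurrently(cmd, workers, timeout=60):
    """Start `workers` copies of cmd at roughly the same time and collect results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_once, cmd, timeout) for _ in range(workers)]
        return [future.result() for future in futures]


# Example use on a cluster node (assumes cibadmin is in PATH):
#   results = run_concurrently(["cibadmin", "--query"], workers=8)
#   failures = [detail for ok, detail in results if not ok]
```

Running it with a worker count of 4 or fewer and then with a higher count should show whether the timeouts start once concurrency goes past 4, as described above.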
Links to: RHBA-2025:148328 Red Hat OpenStack Platform 17.1 bug fix and enhancement advisory