Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: rhos-17.1.6
Affects Version/s: rhos-17.1.z
Component/s: openstack-tripleo-heat-templates
Labels:
- Triaged

Story Points:
3
Blocked:
False
Blocked Reason:
None
Ready:
False
Bugzilla Bug:
RHBZ: 2319767
Fixed in Build:
openstack-tripleo-heat-templates-14.3.1-17.1.20250321211026.e7c7ce3.el9osttrunk
Regression:
None
Intelligence Requested:
Market:
Errata Link:
https://errata.engineering.redhat.com/advisory/148328
Target Version:

rhos-17.1.6

Sprint:
PIDONE 18.0.4, PIDONE 18.0.5, PIDONE 18.0.6, PIDONE 18.0.7, PIDONE 18.0.8
sprint_count:
5
Severity:
Low

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:
On large deployments with instanceha enabled, the deployment fails with a variety of errors, such as:
~~~
Oct 17 23:19:42 puppet-user: Error: /Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-compute-0-compute-instanceha-role]: Could not evaluate: pcs -f node attribute overcloud-compute-0 | grep -e ' overcloud-compute-0:.*compute-instanceha-role=true
~~~

and:
~~~
<13>Oct 9 22:41:33 puppet-user: Error: /Stage[main]/Tripleo::Fencing/Pacemaker::Stonith::Fence_ipmilan[00:00:00:00:00:00]/Pcmk_stonith[stonith-fence_ipmilan-0000000000]: Could not evaluate: pcs -f constraint location | grep stonith-fence_ipmilan-000000000 > /dev/null 2>&1 failed: . Too many tries
~~~

Version-Release number of selected component (if applicable):
17.1.3

How reproducible:
Always

Steps to Reproduce:
1. Deploy with 198 hosts (3 controllers and 195 instanceha)

Actual results:
Random failures of deployment

Expected results:
No issues

Additional info:
This looks like a concurrency bug in pacemaker: when more than 4 concurrent cibadmin --query (and maybe --push) commands run, we get a timeout error, but the deployment does not appear to fail at that point. It fails later, when we try to use the generated cib.xml file with "-f"; the "-f" argument appears to be empty because #{cib} is undefined for some reason. Apart from the concurrent cibadmin timeouts, which we can reproduce manually, this is speculation at this point.
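The manual reproduction mentioned above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes cibadmin is on the PATH of a cluster node, the helper names (run_once, run_concurrently, summarize) are hypothetical, and the 60-second timeout and default of 8 workers are arbitrary choices, not values taken from this report. It launches the same command several times in parallel and counts how many runs failed, timed out, or exited cleanly with empty output.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Assumption: cibadmin is installed and this runs on a cluster node.
# "--query" is the option quoted in the report.
DEFAULT_CMD = ["cibadmin", "--query"]


def run_once(cmd, timeout):
    """Run cmd once and return (returncode, stdout_length).

    returncode is None if the command timed out or could not be started.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return (None, 0)
    return (proc.returncode, len(proc.stdout))


def run_concurrently(cmd, workers, timeout=60):
    """Start `workers` copies of cmd at roughly the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_once, cmd, timeout) for _ in range(workers)]
        return [f.result() for f in futures]


def summarize(results):
    """Count failed runs and runs that succeeded but printed nothing."""
    failed = sum(1 for rc, _ in results if rc != 0)
    empty = sum(1 for rc, size in results if rc == 0 and size == 0)
    return {"total": len(results), "failed": failed, "empty_output": empty}


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 cib_concurrency.py 8
    workers = int(sys.argv[1]) if len(sys.argv) > 1 else 8
    print(summarize(run_concurrently(DEFAULT_CMD, workers)))
```

Running it with increasing worker counts (for example 2, 4, 8) on a controller would show whether failures start to appear above 4 concurrent queries, as described. A non-zero "empty_output" count would be worth noting separately, since an empty result that is not treated as an error matches the later "-f" failures.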

links to

RHBA-2025:148328 Red Hat OpenStack Platform 17.1 bug fix and enhancement advisory

Assignee: Daniel Barzilay

Reporter: RH Bugzilla Integration

QA Contact: Joe Hakim Rahme

Team: rhos-dfg-pidone

Votes: 0

Watchers: 4

Created: 2024/10/18 12:29 PM

Updated: 2025/09/13 3:46 PM

Resolved: 2025/04/23 1:44 PM
