- Bug
- Resolution: Done
- Normal
- rhos-17.1.z
- openstack-tripleo-heat-templates-14.3.1-17.1.20250321211026.e7c7ce3.el9osttrunk
- PIDONE 18.0.4, PIDONE 18.0.5, PIDONE 18.0.6, PIDONE 18.0.7, PIDONE 18.0.8
- Low

Description of problem:
On large deployments with Instance HA (instanceha), the deployment fails with a variety of errors, such as:
~~~
Oct 17 23:19:42 puppet-user: Error: /Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-compute-0-compute-instanceha-role]: Could not evaluate: pcs -f  node attribute  overcloud-compute-0 | grep -e ' overcloud-compute-0:.*compute-instanceha-role=true
~~~
and:
~~~
<13>Oct  9 22:41:33 puppet-user: Error: /Stage[main]/Tripleo::Fencing/Pacemaker::Stonith::Fence_ipmilan[00:00:00:00:00:00]/Pcmk_stonith[stonith-fence_ipmilan-0000000000]: Could not evaluate: pcs -f  constraint location | grep stonith-fence_ipmilan-000000000 > /dev/null 2>&1 failed: . Too many tries
~~~
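In both errors the logged command has nothing between "-f" and the subcommand ("pcs -f  node attribute", "pcs -f  constraint location"). A minimal sketch, assuming the command line is assembled by joining strings (this is not the actual Puppet provider code, and build_pcs_command and build_pcs_command_checked are hypothetical helpers), shows how an empty CIB path produces exactly that shape, and how a fail-fast check would surface the problem where the path goes missing instead of later inside pcs:

```python
import shlex


def build_pcs_command(cib_path: str, *args: str) -> str:
    # Hypothetical helper: joins the pieces with single spaces, so an
    # empty cib_path leaves two spaces after "-f".
    return " ".join(["pcs", "-f", cib_path, *args])


def build_pcs_command_checked(cib_path: str, *args: str) -> str:
    # Hypothetical fail-fast variant: refuse to build the command when the
    # CIB file path is empty instead of letting pcs fail later.
    if not cib_path:
        raise ValueError("CIB file path is empty; refusing to run 'pcs -f'")
    return " ".join(["pcs", "-f", shlex.quote(cib_path), *args])


if __name__ == "__main__":
    # Same shape as the failing command in the first log excerpt.
    print(build_pcs_command("", "node", "attribute", "overcloud-compute-0"))
    # prints: pcs -f  node attribute overcloud-compute-0
```

With the checked variant, an empty path raises immediately rather than producing a command that only fails once pcs and grep run against it.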
Version-Release number of selected component (if applicable):
17.1.3
How reproducible:
Always
Steps to Reproduce:
1. Deploy 198 hosts (3 controllers and 195 Instance HA compute nodes).
Actual results:
The deployment fails randomly.
Expected results:
The deployment completes without errors.
Additional info:
This looks like a concurrency bug in Pacemaker: when more than 4 "cibadmin --query" commands (and possibly "--push") run concurrently, we get a timeout error. The deployment does not appear to fail at that point; it fails later, when the generated cib.xml file is used with "-f". The "-f" argument appears to be empty (the failing commands in the logs above read "pcs -f  node attribute" and "pcs -f  constraint location", with nothing between "-f" and the subcommand) because # is undefined for some reason. Apart from the concurrent cibadmin timeouts, which we can reproduce manually, this is speculation at this point.
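A manual reproduction of the concurrent cibadmin problem could look roughly like the following Python sketch. It is an illustration under assumptions, not the reproducer that was actually used: run_once, run_concurrent and failures are hypothetical names, the 60-second per-command timeout is arbitrary, and a query is counted as failed if it exits non-zero or prints nothing (the case that would leave a dumped CIB file empty).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# Assumed default: the query that the report says times out under concurrency.
DEFAULT_CMD = ("cibadmin", "--query")


def run_once(cmd: Sequence[str], timeout: float) -> Tuple[int, int]:
    """Run one query; return (exit code, bytes written to stdout)."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return (-1, 0)  # a local timeout is recorded as a failure
    return (proc.returncode, len(proc.stdout))


def run_concurrent(n: int, cmd: Sequence[str] = DEFAULT_CMD,
                   timeout: float = 60.0) -> List[Tuple[int, int]]:
    """Start n copies of cmd at roughly the same time and collect results."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(run_once, cmd, timeout) for _ in range(n)]
        return [f.result() for f in futures]


def failures(results: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """A query fails if it exited non-zero or printed nothing, i.e. the case
    that would leave a dumped CIB file empty."""
    return [r for r in results if r[0] != 0 or r[1] == 0]
```

On a controller node, calling run_concurrent(n) for a few values around the reported threshold (for example 2, 4, 6 and 8) and comparing len(failures(...)) would show whether failures appear only above 4 concurrent queries. The command is a parameter so the helpers can be exercised without Pacemaker installed.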
- Links to: RHBA-2025:148328, Red Hat OpenStack Platform 17.1 bug fix and enhancement advisory