What were you trying to do that didn't work?
I run a pacemaker-2 cluster with a location constraint that contains two rules. The constraint is referenced in an ACL role. After upgrading to pacemaker-3, pacemaker no longer works: 'pcs cluster start' reports success, but 'pcs status' prints an error:
Cannot upgrade configuration (claiming pacemaker-3.10 schema) to at least pacemaker-4.0 because it does not validate with any schema from pacemaker-3.10 to the latest
Upgrade failed: Schema transform failed
Error outputting status info from the fencer or CIB
What is the impact of this issue to you?
I haven't actually tested a cluster upgrade from RHEL 9 to RHEL 10, but it looks like customers could end up with a non-functioning cluster after the upgrade. If this is really the case, it should be prevented.
Please provide the package NVR for which the bug is seen:
pacemaker-2.1.8-39.cfd45a819f.git.el10.x86_64
How reproducible is this bug?:
always, easily
Steps to reproduce
Configure a CIB with an ACL role referencing a location constraint with two rules:
<constraints>
  <rsc_location id="location-d3" rsc="d3">
    <rule id="location-d3-rule" boolean-op="and" score="INFINITY">
      <date_expression id="location-d3-rule-expr" operation="gt" start="2021-01-01"/>
    </rule>
    <rule id="location-d3-rule-1" boolean-op="and" score="INFINITY">
      <date_expression id="location-d3-rule-1-expr" operation="gt" start="2022-01-01"/>
    </rule>
  </rsc_location>
</constraints>
<acls>
  <acl_role id="test">
    <acl_permission id="test-deny" kind="deny" reference="location-d3"/>
  </acl_role>
</acls>
Upgrade from pacemaker 2 to pacemaker 3.
Expected results
I see two options (there may be more):
- Pacemaker transforms the CIB in a way that keeps it valid, so the cluster can start with it
- Upgrade from RHEL 9 to RHEL 10 is prevented with an explanatory error message
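The second option might be backed by a pre-upgrade check that flags affected configurations. The following is only a minimal sketch under stated assumptions, not pacemaker or pcs code: the function name find_at_risk_constraints is hypothetical, and the CIB XML is assumed to be available as a string already (for example, saved from the output of 'cibadmin --query'). It looks only for rsc_location elements that have more than one direct rule child and whose id appears in the reference attribute of an acl_permission, as in the reproducer above.

```python
import xml.etree.ElementTree as ET


def find_at_risk_constraints(cib_xml: str) -> list[str]:
    """Return IDs of rsc_location constraints that have more than one
    direct <rule> child and are referenced by an ACL permission.

    Hypothetical helper for illustration; not part of pacemaker or pcs.
    """
    root = ET.fromstring(cib_xml)

    # IDs that ACL permissions point at via their "reference" attribute.
    referenced = {
        perm.get("reference")
        for perm in root.iter("acl_permission")
        if perm.get("reference")
    }

    at_risk = []
    for constraint in root.iter("rsc_location"):
        rules = constraint.findall("rule")  # direct children only
        if len(rules) > 1 and constraint.get("id") in referenced:
            at_risk.append(constraint.get("id"))
    return at_risk


# The reproducer from this report, wrapped in a single root element
# so that it parses as one XML document.
SAMPLE_CIB = """
<configuration>
  <constraints>
    <rsc_location id="location-d3" rsc="d3">
      <rule id="location-d3-rule" boolean-op="and" score="INFINITY">
        <date_expression id="location-d3-rule-expr" operation="gt" start="2021-01-01"/>
      </rule>
      <rule id="location-d3-rule-1" boolean-op="and" score="INFINITY">
        <date_expression id="location-d3-rule-1-expr" operation="gt" start="2022-01-01"/>
      </rule>
    </rsc_location>
  </constraints>
  <acls>
    <acl_role id="test">
      <acl_permission id="test-deny" kind="deny" reference="location-d3"/>
    </acl_role>
  </acls>
</configuration>
"""

if __name__ == "__main__":
    for constraint_id in find_at_risk_constraints(SAMPLE_CIB):
        print(f"location constraint {constraint_id!r} has multiple rules "
              "and is referenced in an ACL")
```

A check like this could produce the explanatory message before the upgrade starts. It does not cover other kinds of references to the constraint ID that might exist in a CIB.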
Actual results
This is caused by pacemaker dropping support for multiple rules in a location constraint. A transformation in pacemaker modifies the CIB so that it matches the new CIB schema. However, that transformation changes the IDs of the affected location constraints. If those constraints' IDs are referenced in ACLs, the references no longer resolve and the resulting CIB is not valid. In that case, the pacemaker logs contain the following messages:
pacemaker-schedulerd[24945] (xml_log) error: IDREF attribute reference references an unknown ID "location-d3"
pacemaker-schedulerd[24945] (apply_upgrade) error: Schema upgrade from pacemaker-3.10 to pacemaker-4.0 failed: XSL transform pipeline produced an invalid configuration
pacemaker-schedulerd[24945] (xml_log) error: Element rsc_location has extra content: rule
pacemaker-schedulerd[24945] (xml_log) error: Element constraints has extra content: rsc_location
pacemaker-schedulerd[24945] (pcmk__update_configured_schema) error: Cannot upgrade configuration (claiming pacemaker-3.10 schema) to at least pacemaker-4.0 because it does not validate with any schema from pacemaker-3.10 to the latest
pacemaker-schedulerd[24945] (pcmk__log_transition_summary) error: Calculated transition 0 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-6.bz2
pacemaker-schedulerd[24945] (pcmk__log_transition_summary) notice: Configuration errors found during scheduler processing, please run "crm_verify -L" to identify issues
'crm_verify -LVV' explains what's wrong:
(xml_log) error: IDREF attribute reference references an unknown ID "location-d3"
(apply_upgrade) error: Schema upgrade from pacemaker-3.10 to pacemaker-4.0 failed: XSL transform pipeline produced an invalid configuration
(xml_log) error: Element rsc_location has extra content: rule
(xml_log) error: Element constraints has extra content: rsc_location
Cannot upgrade configuration (claiming pacemaker-3.10 schema) to at least pacemaker-4.0 because it does not validate with any schema from pacemaker-3.10 to the latest
The cluster will NOT be able to use this configuration. Please manually update the configuration to conform to the pacemaker-4.0 syntax.
error: CIB did not pass schema validation
Configuration invalid (with errors)
Even though this is easy to debug, since the pacemaker logs point to the root cause of the issue, it would be better if users didn't get into this situation in the first place.