RHEL-7597

After a cluster upgrade, cannot run pcs/crm_attribute on offline CIB on pacemaker remotes

    • pacemaker-2.1.7-2.el8
    • None
    • Moderate
    • rhel-sst-high-availability
    • ssg_filesystems_storage_and_HA
    • 17
    • 20
    • 5
    • QE ack, Dev ack
    • False
    • None
    • None
    • Bug Fix
      Cause (the user action or circumstances that trigger the bug): Attempting to run CIB queries or modifications on a Pacemaker Remote node that has an older Pacemaker version than the cluster that it is connected to
      Consequence (what the user experience is when the bug occurs): CIB queries and modifications may fail because newer schema files are unavailable on the older Pacemaker Remote node
      Fix (what has changed to fix the bug; do not include overly technical details): When the cluster connects to a Pacemaker Remote node, it now sends any schema files newer than the remote's
      Result (what happens now that the patch is applied): CIB queries and modifications complete successfully on a Pacemaker Remote node that has an older Pacemaker version than the cluster it is connected to
    • All
    • 2.1.7
    • None

      Description of problem:
      I want to update a running pacemaker cluster that includes a pacemaker remote node.

      [root@ratester3 ~]# crm_mon -1
      Stack: corosync
      Current DC: ratester1 (version 2.0.1-4.el8_0.4-0eb7991564) - partition with quorum
      Last updated: Tue Oct 8 06:02:14 2019
      Last change: Tue Oct 8 05:49:02 2019 by root via cibadmin on ratester2

      3 nodes configured
      1 resource configured

      Online: [ ratester1 ratester2 ]
      RemoteOnline: [ ratester3 ]

      Active resources:

      ratester3 (ocf::pacemaker:remote): Started ratester1

      I'm updating only the real cluster nodes (ratester1 and ratester2) to a newer pacemaker
      version:

      [root@ratester3 ~]# crm_mon -1
      Stack: corosync
      Current DC: ratester2 (version 2.0.2-3.el8-744a30d655) - partition with quorum
      Last updated: Tue Oct 8 07:36:09 2019
      Last change: Tue Oct 8 06:58:22 2019 by ratester3 via crm_attribute on ratester2

      3 nodes configured
      1 resource configured

      Online: [ ratester1 ratester2 ]
      RemoteOnline: [ ratester3 ]

      Active resources:

      ratester3 (ocf::pacemaker:remote): Started ratester2

      At this point, if I try to set an attribute in the live CIB, everything still
      works OK:

      [root@ratester3 ~]# pcs node attribute ratester2 foo=foo_value

      However, if I try to run the same operation on an offline CIB, it fails:

      [root@ratester3 ~]# pcs cluster cib > cib.xml
      [root@ratester3 ~]# pcs -f cib.xml node attribute ratester2 bar=bar_value
      Error: unable to set attribute bar
      Error performing operation: Protocol not supported
      Error setting bar=bar_value (section=nodes, set=nodes-2): Protocol not supported

      In fact, with debug info enabled, it seems this is because the feature set version
      has been bumped in the cluster, even though the pacemaker remote hasn't been upgraded
      yet:

      [root@ratester3 ~]# PCMK_debug=yes PCMK_logfile=/dev/stdout pcs -f cib.xml node attribute ratester2 bar=bar_value
      Error: unable to set attribute bar
      Set r/w permissions for uid=189, gid=189 on /dev/stdout
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (crm_log_args) notice: Invoked: /usr/sbin/crm_attribute -t nodes --node ratester2 --name bar --update bar_value
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (validate_with_relaxng) info: Creating RNG parser context
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (cib_file_signon) debug: crm_attribute: Opened connection to local file 'cib.xml'
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (cib_file_perform_op_delegate) info: cib_query on /cib/configuration/nodes/node[translate(@uname,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') ='ratester2']|/cib/configuration/resources/primitive[@class='ocf'][@provider='pacemaker'][@type='remote'][translate(@id,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') ='ratester2']|/cib/configuration/resources/primitive/meta_attributes/nvpair[@name='remote-node'][translate(@value,'ABCDEF
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (cib_process_xpath) debug: Processing cib_query op for /cib/configuration/nodes/node[translate(@uname,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') ='ratester2']|/cib/configuration/resources/primitive[@class='ocf'][@provider='pacemaker'][@type='remote'][translate(@id,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz') ='ratester2']|/cib/configuration/resources/primitive/meta_attributes/nvpair[@name='remote-node'][translate(@value,'A
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (query_node_uuid) info: Mapped node name 'ratester2' to UUID 2
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (cib_file_perform_op_delegate) info: cib_query on //cib/configuration/nodes//node[@id='2']//instance_attributes//nvpair[@name='bar']
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (cib_process_xpath) debug: cib_query: //cib/configuration/nodes//node[@id='2']//instance_attributes//nvpair[@name='bar'] does not exist
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (cib_file_perform_op_delegate) info: cib_modify on nodes
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (cib_perform_op) error: Discarding update with feature set '3.2.0' greater than our own '3.1.0'
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (update_attr_delegate) info: Update <node id="2">
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (update_attr_delegate) info: Update <instance_attributes id="nodes-2">
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (update_attr_delegate) info: Update <nvpair id="nodes-2-bar" name="bar" value="bar_value"/>
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (update_attr_delegate) info: Update </instance_attributes>
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (update_attr_delegate) info: Update </node>
      Error performing operation: Protocol not supported
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (cib_file_signoff) debug: Disconnecting from the CIB manager
      Oct 08 08:59:06 ratester3 crm_attribute [26653] (crm_xml_cleanup) info: Cleaning up memory from libxml2
      Error setting bar=bar_value (section=nodes, set=nodes-2): Protocol not supported
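
      The "Discarding update with feature set '3.2.0' greater than our own '3.1.0'" line suggests a simple version comparison between the CIB file and the local tools. The following is a minimal Python sketch of that kind of check, not Pacemaker's actual code (which is written in C and may differ in detail). The helper names and the trimmed sample CIB are made up for illustration; the only thing taken from a real CIB is the crm_feature_set attribute on the <cib> root element.

```python
import xml.etree.ElementTree as ET


def parse_feature_set(text):
    """Turn a dotted feature set such as '3.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cib_feature_set(cib_xml):
    """Return the crm_feature_set attribute of the <cib> root element."""
    root = ET.fromstring(cib_xml)
    return root.get("crm_feature_set")


def update_allowed(cib_xml, local_feature_set):
    """Illustrative check: refuse a CIB whose feature set is newer than ours."""
    cib_version = parse_feature_set(cib_feature_set(cib_xml))
    return cib_version <= parse_feature_set(local_feature_set)


# Hypothetical, heavily trimmed CIB as dumped from the upgraded cluster.
SAMPLE_CIB = (
    '<cib crm_feature_set="3.2.0" admin_epoch="0" epoch="1" num_updates="0">'
    "<configuration/></cib>"
)

if __name__ == "__main__":
    # Non-upgraded remote (3.1.0) against a CIB written by the 3.2.0 cluster.
    print(update_allowed(SAMPLE_CIB, "3.1.0"))
    # Same CIB once the remote has been upgraded as well.
    print(update_allowed(SAMPLE_CIB, "3.2.0"))
```

      With the values from the log above, the first call reports that the update would be refused and the second that it would be accepted, which matches the behaviour seen on ratester3.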

      This is problematic for our use of pacemaker and pcs in OpenStack, for
      several reasons:

      1. operators can upgrade their cluster nodes in a random order, so we
      can't guarantee that they will upgrade all their pacemaker remotes
      before upgrading the real cluster nodes.

      2. likewise, we are using bundles, which run pacemaker remotes, and we
      can't guarantee that operators will restart all containers with
      up-to-date container images before upgrading the real cluster
      nodes.

      3. in OpenStack we have an idiomatic way of calling pcs with offline
      CIB, because we drive the creation of pcs resources from puppet and
      we have to implement a means of checking for resource differences
      between two puppet runs.
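
      To make point 3 more concrete, here is a rough Python sketch of what "checking for resource differences" between two offline CIB snapshots could look like. It is an assumption-laden illustration, not the actual puppet implementation: the function names, the sample CIB snippets and the "dummy" resource are hypothetical, and it only compares the <resources> section after XML canonicalization (xml.etree.ElementTree.canonicalize, Python 3.8+).

```python
import xml.etree.ElementTree as ET


def resources_fingerprint(cib_xml):
    """Return a canonical string for the <resources> section of a CIB."""
    root = ET.fromstring(cib_xml)
    resources = root.find("./configuration/resources")
    if resources is None:
        return ""
    # canonicalize() sorts attributes; strip_text drops indentation whitespace.
    raw = ET.tostring(resources, encoding="unicode")
    return ET.canonicalize(raw, strip_text=True)


def resources_changed(before_xml, after_xml):
    """True if the resource definitions differ between two CIB snapshots."""
    return resources_fingerprint(before_xml) != resources_fingerprint(after_xml)


# Hypothetical snapshots, e.g. saved from two successive `pcs cluster cib` dumps.
BEFORE = (
    '<cib crm_feature_set="3.2.0"><configuration><resources>'
    '<primitive id="ratester3" class="ocf" provider="pacemaker" type="remote"/>'
    "</resources></configuration></cib>"
)
# Same content, different attribute order and indentation.
REORDERED = (
    '<cib crm_feature_set="3.2.0"><configuration><resources>\n'
    '  <primitive class="ocf" type="remote" id="ratester3" provider="pacemaker"/>\n'
    "</resources></configuration></cib>"
)
# One extra (made-up) resource added by a later run.
AFTER = (
    '<cib crm_feature_set="3.2.0"><configuration><resources>'
    '<primitive id="ratester3" class="ocf" provider="pacemaker" type="remote"/>'
    '<primitive id="dummy" class="ocf" provider="pacemaker" type="Dummy"/>'
    "</resources></configuration></cib>"
)

if __name__ == "__main__":
    print(resources_changed(BEFORE, REORDERED))
    print(resources_changed(BEFORE, AFTER))
```

      Any workflow of this kind depends on being able to read and modify a CIB file on the node where it runs, which is exactly what breaks on a non-upgraded remote.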

      Version-Release number of selected component (if applicable):
      pacemaker-2.0.1-4.el8_0.4.x86_64

      How reproducible:
      Always

      Steps to Reproduce:
      1. create a cluster with a pacemaker remote node (with e.g. pacemaker-2.0.1-4.el8_0.4.x86_64)
      2. upgrade the real cluster nodes to a pacemaker rpm that ships a different feature set (e.g. pacemaker-2.0.2-3.el8.x86_64)
      3. from the non-upgraded remote node, try to update a node attribute in an offline CIB

      Actual results:
      no attribute can be updated in the offline CIB because the remote node's feature set lags behind the cluster's.

      Expected results:
      adding/updating attributes in the offline CIB should still work even if the real cluster nodes have been upgraded.

      Additional info:

              rhn-support-msmazova Marketa Smazova
              rhn-engineering-dciabrin Damien Ciabrini
              Christopher Lumens Christopher Lumens
              Marketa Smazova Marketa Smazova