RHEL-85680

'pcs cluster start --all --wait' can return response code 400 when machines are slow to start


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • rhel-9.6
    • pcs
    • None
    • No
    • Low
    • rhel-ha
    • 2
    • False
    • False

      What were you trying to do that didn't work?

      Run a regression test that performs a simple cluster setup and then starts the cluster on upstream CI machines.

      Probable cause, as formulated by a pcs developer: pcs starts corosync and pacemaker on all nodes and then checks whether the nodes have started by looking for each node's status in the crm_mon XML output. However, the check appears to run before the started nodes actually appear in the XML, probably because the nodes are slow to start.
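
      The race described above can be illustrated with a minimal Python sketch. It is not the pcs implementation: the function names, the injected fetch_xml callable, and the simplified XML layout (a nodes element containing node elements with name and online attributes) are assumptions made for illustration. The point it shows is that a node missing from the status XML is treated as "not started yet" and polled again until a timeout, rather than reported as an error on the first check.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Set


def online_nodes(status_xml: str) -> Set[str]:
    """Return the names of nodes reported online in crm_mon-style XML.

    Assumes a simplified layout: <nodes><node name="..." online="true"/></nodes>.
    """
    root = ET.fromstring(status_xml)
    return {
        node.get("name")
        for node in root.findall("./nodes/node")
        if node.get("online") == "true"
    }


def wait_for_nodes(
    fetch_xml: Callable[[], str],
    expected: Iterable[str],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Set[str]:
    """Poll until every expected node is online or the timeout expires.

    Returns the set of nodes still missing (empty on success).
    A node that is absent from the XML is simply polled again.
    """
    wanted = set(expected)
    deadline = clock() + timeout
    while True:
        try:
            started = online_nodes(fetch_xml())
        except ET.ParseError:
            # Status output not usable yet; treat as "nothing started".
            started = set()
        missing = wanted - started
        if not missing:
            return set()
        if clock() >= deadline:
            return missing
        sleep(interval)
```

      In a real check, fetch_xml would wrap whatever produces the status XML, for example the output of crm_mon --output-as=xml (an assumption here; the debug log in this report shows pcs querying each node over HTTP instead).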

      Please provide the package NVR for which the bug is seen:

      seen in pcs-0.11.9-99+git.57.ed947.el9

      How reproducible is this bug?:

      Rarely. We could not reproduce the issue on our QA and engineering Beaker instances; it occurs only on upstream CI machines, and not every time.

      Steps to reproduce

      Run the pcs regression test pcs,cli,Setup on upstream CI machines. The test calls "pcs cluster start --all --wait" after setup.

      Expected results

      The command "pcs cluster start --all --wait" accounts for slower machines: it tolerates nodes that are slow to start and does not fail with response code 400.

      Actual results

      Snippet from the command output with the --debug option:

      --Debug Communication Output End--
      node02: Starting Cluster...
      node03: Starting Cluster...
      Waiting for node(s) to start...
      Sending HTTP Request to: https://node01:2224/remote/pacemaker_node_status
      Data: None
      Sending HTTP Request to: https://node03:2224/remote/pacemaker_node_status
      Data: None
      Sending HTTP Request to: https://node02:2224/remote/pacemaker_node_status
      Data: None
      Response Code: 400
      --Debug Response Start--
      Error: Node 'node03' does not appear to exist in configuration

              tojeline@redhat.com Tomas Jelinek
              mmazoure Michal Mazourek
              Tomas Jelinek Tomas Jelinek
              Cluster QE Cluster QE