Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: rhel-8.10
Affects Version/s: rhel-8.0.0
Component/s: pcs
Labels:

Fixed in Build:
pcs-0.10.17-5.el8
Regression:
None
Severity:
Low
AssignedTeam:
rhel-ha
Sub-System Group:

ssg_filesystems_storage_and_HA

Dev Target Milestone:
13
Internal Target Milestone:
23
Story Points:
3
ACKs Check:

QE ack
Blocked:
False
Ready:
False
Blocked Reason:
None
Product Documentation Required:
No
Sprint:
None

Preliminary Testing:
Pass
Testable Builds:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2786566
Errata Link:
https://errata.devel.redhat.com/advisory/124578
Test Coverage:
None

Release Note Type:
Release Note Not Required
Release Note Text:
Feature:
Provide guidance in error messages when adding or removing a node in a cluster with an odd number of nodes, SBD enabled without disks, and auto_tie_breaker disabled.

Reason:
Originally, pcs in this situation only reported that it was going to enable auto_tie_breaker and then exited with an error saying that corosync was running. This did not explain the problem or give users enough information to resolve it.

Result:
Error messages have been updated. They now explain that auto_tie_breaker must be enabled because of SBD, and they instruct the user to stop the cluster, enable auto_tie_breaker, start the cluster, and then rerun the command for adding or removing a node.


Experience:
Architecture:

Unspecified
Bugzilla Bug:
RHBZ: 2227234

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None
Internal Target Milestone numeric:
57,005

+++ This bug was initially created as a clone of Bug #2175797 +++

Description of problem:
pcs can get into a situation in which adding a new node to a cluster with SBD is impossible because two conditions conflict: one tells the user that the cluster has to be offline so that auto_tie_breaker can be enabled, and the other reports that the CIB cannot be loaded (because the cluster is offline). pcs should be aware of this situation and provide more intuitive output.

Version-Release number of selected component (if applicable):
found in pcs-0.11.4-6.el9

How reproducible:
Always, when the number of nodes would be even after adding a node (so the cluster would need auto_tie_breaker) and SBD is used without disks.
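The trigger condition described above can be sketched as a small predicate. This is a minimal illustration and not the pcs implementation: the function name and its parameters are hypothetical, and it models only the factors named in this report (resulting node count, SBD enabled, SBD disks).

```python
def auto_tie_breaker_needed(node_count, node_delta, sbd_enabled, sbd_has_disks):
    """Return True when the resulting cluster would need auto_tie_breaker.

    Hypothetical helper: it covers only the condition described in this
    report and ignores any other factors pcs may take into account.
    """
    resulting_nodes = node_count + node_delta
    return sbd_enabled and not sbd_has_disks and resulting_nodes % 2 == 0


# The reproducer: a 3-node cluster with diskless SBD gains one node.
print(auto_tie_breaker_needed(3, +1, sbd_enabled=True, sbd_has_disks=False))
```

Removing a node from the same 3-node cluster (node_delta of -1) also yields an even count, which is why the release note covers both adding and removing a node.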

Steps to Reproduce:

1. Enable SBD on a 3-node cluster

[root@virt-553 ~]# pcs stonith sbd enable
Running SBD pre-enabling checks...
virt-484: SBD pre-enabling checks done
virt-493: SBD pre-enabling checks done
virt-553: SBD pre-enabling checks done
Distributing SBD config...
virt-553: SBD config saved
virt-493: SBD config saved
virt-484: SBD config saved
Enabling sbd...
virt-493: sbd enabled
virt-484: sbd enabled
virt-553: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

[root@virt-553 ~]# pcs cluster stop --all && pcs cluster start --all
virt-553: Stopping Cluster (pacemaker)...
virt-484: Stopping Cluster (pacemaker)...
virt-493: Stopping Cluster (pacemaker)...
virt-493: Stopping Cluster (corosync)...
virt-553: Stopping Cluster (corosync)...
virt-484: Stopping Cluster (corosync)...
virt-553: Starting Cluster...
virt-493: Starting Cluster...
virt-484: Starting Cluster...

2. Try to add a new node to the cluster

1. in a running cluster

[root@virt-553 ~]# pcs cluster node add virt-551
No addresses specified for host 'virt-551', using 'virt-551'
No watchdog has been specified for node 'virt-551'. Using default watchdog '/dev/watchdog'
Warning: auto_tie_breaker quorum option will be enabled to make SBD fencing effective. Cluster has to be offline to be able to make this change.
Checking corosync is not running on nodes...
Error: virt-493: corosync is running
Error: virt-484: corosync is running
Error: virt-553: corosync is running
Running SBD pre-enabling checks...
virt-551: SBD pre-enabling checks done
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-553 ~]# echo $?
1

> The node can't be added in a running cluster because of the auto_tie_breaker SBD check.

2. in a stopped cluster

[root@virt-553 ~]# pcs cluster stop --all
virt-553: Stopping Cluster (pacemaker)...
virt-493: Stopping Cluster (pacemaker)...
virt-484: Stopping Cluster (pacemaker)...
virt-484: Stopping Cluster (corosync)...
virt-553: Stopping Cluster (corosync)...
virt-493: Stopping Cluster (corosync)...

[root@virt-553 ~]# pcs cluster node add virt-551
No addresses specified for host 'virt-551', using 'virt-551'
No watchdog has been specified for node 'virt-551'. Using default watchdog '/dev/watchdog'
Error: Unable to load CIB to get guest and remote nodes from it, those nodes cannot be considered in configuration validation, use --force to override
Warning: auto_tie_breaker quorum option will be enabled to make SBD fencing effective. Cluster has to be offline to be able to make this change.
Checking corosync is not running on nodes...
virt-484: corosync is not running
virt-553: corosync is not running
virt-493: corosync is not running
Running SBD pre-enabling checks...
virt-551: SBD pre-enabling checks done
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-553 ~]# echo $?
1

> The node can't be added in a stopped cluster because the CIB is unavailable.

Actual results:
In this state, it is not possible to add a node (without using --force) because the two checks are mutually exclusive: the cluster needs to be stopped to enable auto_tie_breaker, and the cluster needs to be running to load the CIB.
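The deadlock can be illustrated with a simplified model of the two checks seen in the transcripts. The function below is a hypothetical sketch, not pcs code. The error strings are shortened, and it assumes that --force affects only the CIB check, since that is the only place where the report shows it being offered.

```python
def node_add_errors(cluster_running, auto_tie_breaker_enabled, force=False):
    """Return the errors a node add would hit in this simplified model."""
    errors = []
    # Enabling auto_tie_breaker requires corosync to be stopped on all nodes.
    if not auto_tie_breaker_enabled and cluster_running:
        errors.append("corosync is running")
    # Loading the CIB requires a running cluster unless --force is used.
    if not cluster_running and not force:
        errors.append("unable to load CIB")
    return errors


# With auto_tie_breaker disabled, both cluster states fail without --force.
for running in (True, False):
    print(running, node_add_errors(running, auto_tie_breaker_enabled=False))
```

In this model, a running cluster with auto_tie_breaker already enabled returns no errors, which corresponds to the first proposed solution.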

Expected results:
A more intuitive error message that explains why the node can never be added in this state and what to do to resolve it (for example, disable SBD first). Alternative solutions can be discussed as well.

--- Additional comment from Tomas Jelinek on 2023-03-06 16:20:14 CET ---

possible solutions:

enable auto_tie_breaker
disable sbd temporarily
use --force in pcs cluster node add
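The first option follows the sequence given in the release note: stop the cluster, enable auto_tie_breaker, start the cluster, and rerun the node command. The sketch below only builds and prints that sequence and executes nothing. The helper name is hypothetical, and the use of `pcs quorum update auto_tie_breaker=1` to enable the option is an assumption that is not quoted from this report.

```python
import shlex


def auto_tie_breaker_workaround(new_node):
    """Build the command sequence as argument lists; nothing is executed."""
    return [
        ["pcs", "cluster", "stop", "--all"],
        # Assumed command for enabling the option while the cluster is offline.
        ["pcs", "quorum", "update", "auto_tie_breaker=1"],
        ["pcs", "cluster", "start", "--all"],
        ["pcs", "cluster", "node", "add", new_node],
    ]


for command in auto_tie_breaker_workaround("virt-551"):
    print(shlex.join(command))
```

Keeping the commands as argument lists makes it possible to review the sequence first and pass each entry to `subprocess.run` later if desired.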

is blocked by

RHEL-7700 Improve error message when adding a new node to a cluster with sbd is not possible

Closed

external trackers

PnT-DevOps Jira RHELPLAN-156851

Red Hat Issue Tracker RHELPLAN-163767

links to

RHBA-2023:124578 pcs bug fix and enhancement update

mentioned on

Commit - pcs-0.10.17-5

Merge request - pcs-0.10.17-5


Assignee: Cluster QE

Reporter: Tomas Jelinek

Developer: Tomas Jelinek

QA Contact: Nina Hostakova

Votes: 0

Watchers: 12

Created: 2023/09/22 8:32 PM

Updated: 2024/09/23 8:15 PM

Resolved: 2024/05/22 9:16 AM

Dev Target end: 2023/11/27

Target end: 2024/02/05

Release Date: 2024/05/22
