Bug
Resolution: Obsolete
rhel-8.6.0
Moderate
rhel-sst-high-availability
ssg_filesystems_storage_and_HA
x86_64
Description of problem:
Microsoft Azure documentation states that the totem token in the Corosync configuration file should be set to 30000 (milliseconds) to allow for memory-preserving maintenance.
Sometimes we still see Corosync lose connection to its peers even with the 30000 ms token setting. However, the Corosync log suggests that it waits only about 10 seconds before forming a new membership:
Jan 27 02:48:49.832 [14503] <Hostname> corosync notice [TOTEM ] totemsrp.c:timer_function_orf_token_warning:1730 Token has not been received in 7500 ms
Jan 27 02:48:52.332 [14503] <Hostname> corosync notice [TOTEM ] totemsrp.c:timer_function_orf_token_timeout:1746 A processor failed, forming new configuration.
Jan 27 02:48:57.800 [14503] <Hostname> corosync info [KNET ] libknet.h:log_deliver_fn:682 rx: host: 1 link: 0 is up
Jan 27 02:48:57.800 [14503] <Hostname> corosync info [KNET ] libknet.h:log_deliver_fn:682 host: host: 1 (passive) best link: 0 (pri: 1)
Jan 27 02:49:04.337 [14503] <Hostname> corosync notice [TOTEM ] totemsrp.c:memb_state_operational_enter:2096 A new membership (2.93) was formed. Members left: 1
Jan 27 02:49:04.337 [14503] <Hostname> corosync notice [TOTEM ] totemsrp.c:memb_state_operational_enter:2101 Failed to receive the leave message. failed: 1
Jan 27 02:49:04.337 [14503] <Hostname> corosync notice [QUORUM] vsf_quorum.c:log_view_list:131 Members[1]: 2
Jan 27 02:49:04.337 [14503] <Hostname> corosync notice [MAIN ] main.c:corosync_sync_completed:296 Completed service synchronization, ready to provide service.
For reference, here are the corosync.conf and the corosync-cmapctl output.
corosync.conf:
totem {
    version: 2
    cluster_name: <HA Cluster>
    transport: knet
    token: 30000
    crypto_cipher: aes256
    crypto_hash: sha256
}
From corosync-cmapctl:
runtime.config.totem.token (u32) = 30000
runtime.config.totem.token_retransmit (u32) = 7142
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.token_warning (u32) = 75
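One way to approach question 1 is to check that the runtime values are internally consistent with a 30000 ms token. The sketch below parses the corosync-cmapctl lines quoted above and derives the dependent timers. The `token_retransmit` formula (token divided by `token_retransmits_before_loss_const + 0.2`, truncated) is believed to match what Corosync computes, and it does reproduce the reported 7142; the consensus default of 1.2 * token is taken from corosync.conf(5). The function names are hypothetical.

```python
import re

# corosync-cmapctl output for the totem keys, as quoted in this report.
CMAP_OUTPUT = """\
runtime.config.totem.token (u32) = 30000
runtime.config.totem.token_retransmit (u32) = 7142
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.token_warning (u32) = 75
"""

# Each line has the form "<key> (<type>) = <value>".
LINE_RE = re.compile(r"^(\S+) \((\w+)\) = (.*)$")
PREFIX = "runtime.config.totem."


def parse_cmap(text: str) -> dict[str, str]:
    """Map each cmap key to its value string; non-matching lines are skipped."""
    values = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            values[m.group(1)] = m.group(3)
    return values


def expected_timers(token: int, retransmits: int, warning_pct: int) -> dict[str, int]:
    """Timers (ms) derived from the token timeout under the stated assumptions."""
    return {
        # Assumed formula: token / (token_retransmits_before_loss_const + 0.2), truncated.
        "token_retransmit": int(token / (retransmits + 0.2)),
        # token_warning is a percentage of the token timeout.
        "first_warning_at": token * warning_pct // 100,
        # Default consensus when not set explicitly: 1.2 * token.
        "consensus_default": token * 12 // 10,
    }


cmap = parse_cmap(CMAP_OUTPUT)
timers = expected_timers(
    int(cmap[PREFIX + "token"]),
    int(cmap[PREFIX + "token_retransmits_before_loss_const"]),
    int(cmap[PREFIX + "token_warning"]),
)
print(timers)
# Cross-check the derived retransmit interval against what corosync reports.
print("token_retransmit matches:",
      timers["token_retransmit"] == int(cmap[PREFIX + "token_retransmit"]))
```

At the time this output was captured the runtime values agree with a 30000 ms token, so a node honoring it should log its first "Token has not been received" warning after about 22500 ms rather than 7500 ms. On a live node the input text could come from something like `corosync-cmapctl runtime.config.totem` captured right after an incident, so the check reflects the configuration actually in effect.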
Based on the above, I have the following questions:
1. How can I be sure that Corosync is honoring the 30-second token timeout?
2. Are there any additional Corosync (or Pacemaker) settings or workarounds recommended for Azure? Are there any known problems with Corosync/Pacemaker on Azure?
Version-Release number of selected component (if applicable):
corosync-3.0.4-2
How reproducible:
Not reproducible on demand.
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: