Type: Bug
Resolution: Duplicate
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.12.z
Component/s: Networking / runtime-cfg
Labels:
- OPNETriaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Moderate
Regression:
No

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Priority Data:
PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Customer upgraded from 4.12.21 to 4.12.22 on an IPI cluster.
After the upgrade, access to console/auth/routes on the cluster was degraded.

keepalived on two nodes was broadcasting ownership of the ingress VIP address. On the two affected nodes, the unicast src and unicast peer IP entries were missing from /etc/keepalived/keepalived.conf.

After manually editing the file to include the src and peer entries and restarting the affected host pods, the duplicate GARPs stopped, the collisions ended, and resolution of the console/routes was restored.

VRRP traffic did not appear to be blocked: once the entries were added, the node was able to check in, recognize that it was part of the cluster for IP failover, and cease broadcasting. It is odd, however, that these entries were not populated automatically on node boot.

(The customer had previously restarted all nodes in the cluster to mitigate the issue, which implies that the entries were not refreshed on restart.)
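
For triage, the missing-entry condition described above can be checked with a rough text scan. The following Python sketch assumes the standard keepalived directive names `unicast_src_ip` and `unicast_peer` and that each instance is declared as `vrrp_instance <name> {`. The function name, instance names, and sample addresses are hypothetical and are not taken from the affected cluster. It is a text check, not a full keepalived config parser.

```python
import re

def instances_missing_unicast(conf_text):
    """Return names of vrrp_instance blocks lacking unicast_src_ip or unicast_peer."""
    missing = []
    # re.split with one capture group yields:
    # [preamble, name1, body1, name2, body2, ...]
    parts = re.split(r"^\s*vrrp_instance\s+(\S+)\s*\{", conf_text, flags=re.MULTILINE)
    for name, body in zip(parts[1::2], parts[2::2]):
        has_src = re.search(r"^\s*unicast_src_ip\s+\S+", body, re.MULTILINE)
        has_peer = re.search(r"^\s*unicast_peer\s*\{", body, re.MULTILINE)
        if not (has_src and has_peer):
            missing.append(name)
    return missing

# Hypothetical sample: the first instance is complete, the second has no unicast entries.
SAMPLE = """
vrrp_instance example_API {
    state BACKUP
    interface eth0
    virtual_router_id 10
    unicast_src_ip 192.0.2.10
    unicast_peer {
        192.0.2.11
    }
    virtual_ipaddress {
        192.0.2.100/24
    }
}
vrrp_instance example_INGRESS {
    state BACKUP
    interface eth0
    virtual_router_id 20
    virtual_ipaddress {
        192.0.2.101/24
    }
}
"""

if __name__ == "__main__":
    print(instances_missing_unicast(SAMPLE))
```

Running the same check against the file from a healthy node and from an affected node would show which instances differ, without exposing addresses in the output.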

Version-Release number of selected component (if applicable):

4.12.22

How reproducible:

Observed one time. The issue was ongoing until we manually edited /etc/keepalived/keepalived.conf to add the proper entries on the two hosts missing them (a worker and a storage node).

Steps to Reproduce:

1. Observe that the console degrades after multiple calls to the address path.
2. Observe, via the KCS checks in https://access.redhat.com/solutions/7013445, that ARP for the ingress VIP is being sent from keepalived pods on multiple hosts.
3. Observe that the secondary hosts do not have a valid router pod, which leads to dropped packets/port rejection on 443/80 and a degraded state.
4. Confirm that /etc/keepalived/keepalived.conf is missing the unicast peer/src entries on the affected nodes.
5. Modify the entries to match working peers.
6. Observe that the issue is resolved.
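
Step 2 amounts to checking whether more than one MAC address is claiming the ingress VIP. Below is a minimal Python sketch of that comparison, assuming the ARP observations have already been collected as (IP, MAC) pairs by a capture tool. The function name and the sample addresses are hypothetical.

```python
from collections import defaultdict

def duplicate_claimants(observations):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC."""
    seen = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC is not counted twice.
        seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}

# Hypothetical observations: two hosts announce the same VIP.
OBSERVED = [
    ("192.0.2.101", "52:54:00:AA:00:01"),
    ("192.0.2.101", "52:54:00:aa:00:02"),
    ("192.0.2.100", "52:54:00:aa:00:01"),
    ("192.0.2.101", "52:54:00:aa:00:01"),
]

if __name__ == "__main__":
    for ip, macs in duplicate_claimants(OBSERVED).items():
        print(ip, "claimed by", ", ".join(macs))
```

An empty result after the configuration fix would be consistent with the duplicated GARPs having stopped.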

Actual results:

The cluster remained degraded until manual intervention.

Expected results:

The cluster should populate the unicast peer entries for keepalived during cluster updates without interference.

Additional info:

A stateside support case is linked to this bug, which means all uploads will be cleaned prior to submission (no IP addresses/hostnames). We have requested a sosreport from an affected node for analysis and have cleaned the must-gather for review.

Assignee: Benjamin Nemec

Reporter: Will Russell

Need Info From: None

Contributors: None

QA Contact: Zhanqi Zhao

Doc Contact: None

Votes: 2

Watchers: 4

Created: 2023/07/21 7:11 PM

Updated: 2025/10/08 12:55 PM

Resolved: 2023/10/13 10:09 PM
