Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.16.z, 4.18.z, 4.20.z
Component/s: Networking / Metal LB
Labels:
None

Activity Type:
None
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
None
Regression:
None
Architecture:

All

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Whenever a node is cordoned, the VIP is reassigned to another node in the pool and the new IP-to-MAC binding begins to be announced. No other action is needed, such as restarting the speaker pod or rebooting the node. However, this new assignment is never permanent: it looks as if the speakers keep some sort of "note" about the previous node assignment, and once that node is back in the cluster after being uncordoned, the VIP moves back to it immediately. During my tests I saw this every single time I cordoned and uncordoned the node, no matter how long it stayed cordoned.
During this period of cordoning and uncordoning, I saw many cases of two MAC addresses trying to claim the same IP.

MetalLB now tracks this using servicel2statuses.metallb.io, which is updated when these changes happen. However, even this CR is inconsistent: in some cases it is not deleted and simply stays in etcd with an empty status while another one is created with the new assignment.
When the VIP returns to its previous node, the empty CR is updated and the newer one is deleted.

However, this does not seem to be the cause of the duplicate GARPs. It is probably a separate bug, so let me know whether a new one should be opened or whether this is expected for some services.

Looking at the code and the speaker logs, I don't see anything particularly wrong, so the only thing that comes to mind is a timing and synchronization problem between the speakers. I am not sure whether this is caused by MetalLB "wanting" to move the VIP back to the previous node, so that in the meantime two speakers send GARPs at the same time. That would fit what I observed: the duplicate GARPs stop once the VIP has moved back, or when the node is completely broken and never rejoins the cluster.

Version-Release number of selected component (if applicable):

OCP 4.16+

How reproducible:

Every time

Steps to Reproduce:

    1. Configure an IPAddressPool and an L2Advertisement.
    2. Configure a few applications to be accessed via LoadBalancer services backed by MetalLB.
    3. Run tcpdump to monitor ARP traffic for these IPs.
    4. Force the VIPs to be allocated to another node by cordoning a node. Wait a few seconds and uncordon the node. Repeat for other nodes where other VIPs may be allocated.
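
To make the capture from step 3 easier to review, here is a minimal sketch of how the ARP output could be checked for IPs announced by more than one MAC. It is not part of MetalLB. It assumes the capture was taken as text with something like `tcpdump -n -e arp` and that replies appear in the usual "Reply <ip> is-at <mac>" form. The function names and the script name are hypothetical. It only looks at ARP replies, and it flags any IP seen with two MACs anywhere in the capture, so a clean failover would also show up. The timestamps still need to be checked by hand to confirm the announcements overlap.

```python
import re
import sys
from collections import defaultdict

# Assumption: tcpdump prints ARP replies as "Reply <ip> is-at <mac>".
ARP_REPLY = re.compile(
    r"Reply (\d{1,3}(?:\.\d{1,3}){3}) is-at ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def macs_per_ip(lines):
    """Collect the set of MAC addresses announced for each IP."""
    claims = defaultdict(set)
    for line in lines:
        match = ARP_REPLY.search(line)
        if match:
            ip, mac = match.groups()
            claims[ip].add(mac.lower())
    return claims


def conflicting_ips(lines):
    """Return only the IPs that were announced by more than one MAC."""
    return {
        ip: sorted(macs)
        for ip, macs in macs_per_ip(lines).items()
        if len(macs) > 1
    }


if __name__ == "__main__":
    # Reads tcpdump text output from stdin.
    for ip, macs in conflicting_ips(sys.stdin).items():
        print(f"{ip} announced by: {', '.join(macs)}")
```

Example use, with a hypothetical script name: `tcpdump -n -e arp > capture.txt`, then `python3 arp_conflicts.py < capture.txt` after the cordon/uncordon cycle.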

Actual results:

    We see multiple GARPs being sent by both nodes, which tcpdump flags as possible duplicates. Under normal circumstances this is not an issue. However, for the many customers whose solutions do not allow such behavior, it causes the traffic to be blocked.

Expected results:

Additional info:

Assignee: Federico Paolinelli

Reporter: Andre Costa

QA Contact: Arti Sood

Need Info From: None

Votes: 0

Watchers: 3

Created: 2026/02/26 8:46 AM

Updated: 2026/03/03 11:08 AM
