OCPBUGS-37212
pod deletion doesn't occur fast enough, resulting in the new pod's multus interface failing IPv6 duplicate address detection


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Affects Version: 4.14
    • Component: Networking / multus
    • Severity: Critical
      This bug is critically impeding the Nokia UDM redundancy tests, which are essential for ensuring system reliability and stability before going live. Until this issue is resolved, we cannot proceed with these final tests, risking potential delays in the go-live.

      Description of problem:

      On pod deletion, cleanup intermittently takes too long, resulting in the replacement pod's multus interface failing IPv6 duplicate address detection (DAD).
      
      Sample reproduction:
      - Worker 14 begins to remove the pod at 14:21:14:
      
      Jul 17 14:21:14 worker14 kubenswrapper[9796]: I0717 14:21:14.904545    9796 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[NAMESPACE/POD]
      
      - Worker 19 begins to add the pod at 14:21:14:
      
      Jul 17 14:21:14 worker19 kubenswrapper[9438]: I0717 14:21:14.952931    9438 kubelet.go:2425] "SyncLoop ADD" source="api" pods=[NAMESPACE/POD]
      
      - Worker 19 tries adding the network to the pod at Jul 17 14:21:15:
      
      Jul 17 14:21:15 worker19 crio[9376]: time="2024-07-17 14:21:15.294568336Z" level=info msg="Adding pod NAMESPACE/POD to CNI network \"multus-cni-network\" (type=multus-shim)"
      
      - But the network add hits an IPv6 DAD failure at 14:21:17:
      
      Jul 17 14:21:17 worker19 kernel: IPv6: eth1: IPv6 duplicate address <IPv6_ADDRESS> used by <MAC> detected!
      
      - Meanwhile, worker 14 does not finish tearing down the original pod and its netns until around 14:21:38:
      
      Jul 17 14:21:37 worker14 crio[9601]: time="2024-07-17 14:21:37.789184337Z" level=info msg="Got pod network &{Name:<POD> Namespace:<NAMESPACE> ID:a36d6da2c26fb668b3d9a665544ae25629377656b180bd3db2b4e199c59f9793 UID:9b7db4ae-b0bc-4987-ac57-35d3c42afdb3 NetNS:/var/run/netns/9b37d0a3-61c9-4b57-b5ea-51e1964b58c0 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
      Jul 17 14:21:37 worker14 crio[9601]: time="2024-07-17 14:21:37.789403797Z" level=info msg="Deleting pod <POD> from CNI network \"multus-cni-network\" (type=multus-shim)"
      Jul 17 14:21:38 worker14 kubenswrapper[9796]: I0717 14:21:38.924580    9796 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[NAMESPACE/POD]
      Jul 17 14:21:38 worker14 kubenswrapper[9796]: I0717 14:21:38.936882    9796 kubelet.go:2435] "SyncLoop REMOVE" source="api" pods=[NAMESPACE/POD]
      
      This is clearly a timing issue: the replacement pod tries to assign its IPv6 address before the original pod's network has been cleaned up.
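      A rough way to confirm the overlap during a live reproduction is sketched below; the timestamps and netns ID are taken from the sample above and are only illustrative, so adjust them to the actual pod:
      
      # On worker14 (old pod): when did CRI-O start deleting the pod from the CNI network?
      $ journalctl -u crio --since "2024-07-17 14:21:00" --until "2024-07-17 14:22:00" | grep 'Deleting pod .* from CNI network'
      
      # On worker19 (new pod): when did the kernel report the duplicate address?
      $ journalctl -k --since "2024-07-17 14:21:00" --until "2024-07-17 14:22:00" | grep -i 'duplicate address'
      
      # On worker14: is the original pod's netns still present while the new pod is attempting DAD?
      $ ip netns list | grep 9b37d0a3-61c9-4b57-b5ea-51e1964b58c0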
      
      

      Version-Release number of selected component (if applicable):

          4.14

      How reproducible:

          Intermittent, but can be reproduced reliably

      Steps to Reproduce:

      - Delete a pod.
      - Wait for the pod to be rescheduled and log on to the new worker.
      - Determine the pod's network namespace:
      - - $ for ns in $(ip netns | awk '{print $1}'); do if ip netns exec $ns ip a | grep -iq 'IP'; then echo $ns; fi; done
      - Validate that eth1 is stuck in tentative/dadfailed (a consolidated check is sketched after this list):
      - - $ ip netns exec <NS> ip a
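      A minimal node-wide variant of the same check (a sketch that assumes bash on the worker and that the pod namespaces are visible via "ip netns"):

      #!/bin/bash
      # Flag any pod network namespace that has an IPv6 address stuck in dadfailed state.
      for ns in $(ip netns list | awk '{print $1}'); do
          if ip netns exec "$ns" ip -6 addr show 2>/dev/null | grep -q 'dadfailed'; then
              echo "netns $ns has a dadfailed IPv6 address:"
              ip netns exec "$ns" ip -6 addr show | grep 'dadfailed'
          fi
      done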

      Actual results:

       6: eth1@if24: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9000 qdisc noqueue state UNKNOWN
           link/ether 88:e9:a4:71:62:5c brd ff:ff:ff:ff:ff:ff
           inet6 IPv6_ADDRESS/64 scope global tentative dadfailed   <--- FAILED
              valid_lft forever preferred_lft forever
           inet6 fe80::88e9:a400:371:625c/64 scope link
              valid_lft forever preferred_lft forever
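       Once the stale netns on the old worker is finally torn down, DAD can be re-triggered by hand to confirm the address becomes usable again (a manual recovery check only, not a fix; <NS> and IPv6_ADDRESS are the placeholders used above):

       $ ip netns exec <NS> ip -6 addr del IPv6_ADDRESS/64 dev eth1
       $ ip netns exec <NS> ip -6 addr add IPv6_ADDRESS/64 dev eth1
       $ ip netns exec <NS> ip -6 addr show dev eth1   # the address should now come up without tentative/dadfailed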

      Expected results:

       No IPv6 DAD failure.

      Additional info:

          Note: This was not seen in the impacted cluster until it was upgraded to 4.14, so this might be a regression or a new bug.

       

            Assignee: Ben Pickard (bpickard@redhat.com)
            Reporter: Cory Oldford (rhn-support-coldford)
            QA Contact: Weibin Liang