OCPBUGS-37289

An issue with an upgrade was reported, with a node stuck in NotReady state due to NetworkManager (NM) issues.

      Description of problem:

          - An issue during upgrade was reported with a node in NotReady state.
          - The node was up and reachable over SSH, but the kubelet was failing.
          - As a workaround, we forced the config onto the node, which fixed the issue.
      However:
          - The issue can be seen on other nodes as well.
          - The "nodeip-configuration.service" fails, which in turn causes the kubelet service to fail (see the diagnostic sketch after this list).

      We gathered a sosreport and see some suspicious NetworkManager messages about IPv6 Duplicate Address Detection failures:

      May 26 03:52:00 <node_hostname> NetworkManager[1330]: <warn>  [1716695520.3453] ipv6ll[cf77b8ea4343619a,ifindex=2]: changed: no IPv6 link local address to retry after Duplicate Address Detection failures (back off)
      May 26 03:52:10 <node_hostname> kubenswrapper[2072]: I0526 03:52:10.297282    2072 kubelet_getters.go:182] "Pod status updated" pod="openshift-vsphere-infra/keepalived-<node_hostname>" status=Running
      May 26 03:52:10 <node_hostname> kubenswrapper[2072]: I0526 03:52:10.297390    2072 kubelet_getters.go:182] "Pod status updated" pod="openshift-vsphere-infra/coredns-<node_hostname>" status=Running
      May 26 03:52:10 <node_hostname> NetworkManager[1330]: <warn>  [1716695530.3469] platform-linux: do-add-ip6-address[2: fe80::ba32:xxxx:xxx:xxxx]: failure 95 (Operation not supported)
      May 26 03:52:12 <node_hostname> NetworkManager[1330]: <warn>  [1716695532.3493] platform-linux: do-add-ip6-address[2: fe80::957a:xxxx:xxxx:xxxx]: failure 95 (Operation not supported)
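
      The "failure 95 (Operation not supported)" on adding an IPv6 link-local address usually indicates that the IPv6 stack is unavailable on the interface. A hedged sketch of checks one could run on the node (generic commands, not taken from the sosreport; ens192 is reused from the script trace further down):

      # If IPv6 is disabled on the kernel command line, the per-interface sysctl
      # tree is missing entirely, which would also explain the missing
      # forwarding file in the configure-ip-forwarding.sh trace below.
      grep -o 'ipv6.disable=1' /proc/cmdline && echo "IPv6 disabled via kernel cmdline"
      sysctl net.ipv6.conf.all.disable_ipv6 2>/dev/null || echo "no IPv6 sysctl tree"
      ls /proc/sys/net/ipv6/conf/ens192/ 2>/dev/null || echo "no IPv6 conf dir for ens192"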

      The reason behind the nodeip-configuration failure appears to be:

      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1682]: ++ iface=ens192
      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1682]: ++ [[ -z ens192 ]]
      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1682]: ++ echo ens192
      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: + iface=ens192
      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: + echo 'Node IP interface determined as: ens192. Enabling IP forwarding...'
      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: Node IP interface determined as: ens192. Enabling IP forwarding...
      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: + echo 1
      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: + echo 1
      Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: /usr/local/bin/configure-ip-forwarding.sh: line 44: /proc/sys/net/ipv6/conf/ens192/forwarding: No such file or directory
      Jul 15 19:29:19 <node_hostname> systemd[1]: nodeip-configuration.service: Control process exited, code=exited, status=1/FAILURE
      Jul 15 19:29:19 <node_hostname> systemd[1]: nodeip-configuration.service: Failed with result 'exit-code'.
      Jul 15 19:29:19 <node_hostname> systemd[1]: Failed to start Writes IP address configuration so that kubelet and crio services select a valid node IP.
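
      For illustration only, a minimal sketch of the failing step with a defensive guard; this is not the actual configure-ip-forwarding.sh and not necessarily the eventual fix. The interface name and the unguarded write come from the trace above:

      # The trace above shows an unguarded write to the IPv6 forwarding knob,
      # which aborts the unit when IPv6 is disabled because the /proc path
      # does not exist.
      iface=ens192    # interface name as determined in the trace above
      echo 1 > "/proc/sys/net/ipv4/conf/${iface}/forwarding"
      if [ -e "/proc/sys/net/ipv6/conf/${iface}/forwarding" ]; then
          echo 1 > "/proc/sys/net/ipv6/conf/${iface}/forwarding"
      else
          echo "IPv6 appears disabled on ${iface}; skipping IPv6 forwarding" >&2
      fi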

      So what we conclude is:

      IPv6 is enabled by default in the OpenShift cluster, and the cluster carries
      additional MachineConfigs that disable it (an illustrative example of such a
      MachineConfig follows).
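
      Illustrative only, since the customer's exact MachineConfig is not part of this report: one common way to disable IPv6 on worker nodes is a kernel-argument MachineConfig along these lines.

      # Illustrative MachineConfig (not the customer's actual config); disabling
      # IPv6 via ipv6.disable=1 removes /proc/sys/net/ipv6 entirely, which would
      # match the missing /proc path in the configure-ip-forwarding.sh trace above.
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        name: 99-worker-disable-ipv6
        labels:
          machineconfiguration.openshift.io/role: worker
      spec:
        kernelArguments:
          - ipv6.disable=1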

      The issue is only observed and reported during an upgrade, with a node in
      NotReady state. So the question is why IPv6 suddenly causes a problem when the
      cluster was initially running fine. After the issue is seen, forcing the config
      onto the node works around it (a sketch of that workaround follows), but the
      customer would like to know the root cause, as this is likely a bug.
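
      For reference, a hedged sketch of the "force the config" workaround as it is commonly applied through the Machine Config Daemon; the exact commands used in this case are not recorded in the report:

      # Touching the MCD force file makes the machine-config-daemon revalidate
      # and reapply the rendered MachineConfig on that node.
      oc debug node/<node_hostname> -- chroot /host touch /run/machine-config-daemon-force
      # Then watch the node recover:
      oc get nodes -w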

      Version-Release number of selected component (if applicable):

      Upgrade 4.13.39 to 4.14.31
          

      How reproducible:

           During an upgrade

      Steps to Reproduce:

          1. Perform an upgrade from 4.13.39 to 4.14.31 (a CLI sketch of this step follows).
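
      A hedged CLI sketch of the upgrade step; whether the customer drove the upgrade from the console or the CLI is not recorded here:

      # Hedged sketch only; channel name and CLI flow are assumptions.
      oc adm upgrade channel stable-4.14
      oc adm upgrade --to=4.14.31
      # The failure shows up as a node stuck in NotReady with kubelet down.
      oc get clusterversion
      oc get nodes -w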
          

      Actual results:

      The node fails to apply the update due to the kubelet failure.
          

      Expected results:

          
      The upgrade shouldn't fail; the node should return to Ready.

      Additional info:

      Logs on drive: https://drive.google.com/drive/folders/1ls1yAByyzK-Z20i0niqyN9w0IjzrT1pd?usp=sharing (a must-gather and a sosreport from the affected node).
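
      For reference, a hedged sketch of how such data is typically collected; the exact invocations used for the attached archives are not recorded here:

      # Cluster-wide diagnostics:
      oc adm must-gather --dest-dir=./must-gather
      # On the affected node (reachable over SSH per the description), a sosreport
      # is typically generated from inside a toolbox container on RHCOS:
      toolbox                 # opens a support container shell on the node
      sos report --batch      # run inside that shell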
          
