Type: Bug
Resolution: Done
Priority: Normal
Affects Version(s): 4.13.z, 4.14.z
Severity: Moderate
Description of problem:
- An upgrade issue was reported with a node stuck in NotReady state.
- The node was up and reachable over SSH, but the kubelet service was failing.
- As a workaround, we forced the config onto the node, which fixed the issue.
However:
- The issue can be seen on other nodes as well.
- The "nodeip-configuration.service" fails, and that in turn fails the kubelet service (the units can be inspected with the commands below).
So we gathered an sosreport, and it contains some suspicious NetworkManager messages about IPv6 Duplicate Address Detection and failed address adds:
May 26 03:52:00 <node_hostname> NetworkManager[1330]: <warn> [1716695520.3453] ipv6ll[cf77b8ea4343619a,ifindex=2]: changed: no IPv6 link local address to retry after Duplicate Address Detection failures (back off)
May 26 03:52:10 <node_hostname> kubenswrapper[2072]: I0526 03:52:10.297282 2072 kubelet_getters.go:182] "Pod status updated" pod="openshift-vsphere-infra/keepalived-<node_hostname>" status=Running
May 26 03:52:10 <node_hostname> kubenswrapper[2072]: I0526 03:52:10.297390 2072 kubelet_getters.go:182] "Pod status updated" pod="openshift-vsphere-infra/coredns-<node_hostname>" status=Running
May 26 03:52:10 <node_hostname> NetworkManager[1330]: <warn> [1716695530.3469] platform-linux: do-add-ip6-address[2: fe80::ba32:xxxx:xxx:xxxx]: failure 95 (Operation not supported)
May 26 03:52:12 <node_hostname> NetworkManager[1330]: <warn> [1716695532.3493] platform-linux: do-add-ip6-address[2: fe80::957a:xxxx:xxxx:xxxx]: failure 95 (Operation not supported)
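The "failure 95 (Operation not supported)" when NetworkManager tries to add an IPv6 link-local address is consistent with IPv6 being disabled kernel-wide (e.g. via an ipv6.disable=1 kernel argument), in which case the /proc/sys/net/ipv6 tree is also absent. A quick check from a shell on the affected node (a sketch under that assumption; ens192 is the interface seen in the logs):

    # Was IPv6 disabled on the kernel command line for this boot?
    grep -o 'ipv6.disable=[01]' /proc/cmdline
    # If IPv6 is disabled kernel-wide, the per-interface sysctl tree does not exist
    test -d /proc/sys/net/ipv6/conf/ens192 || echo "no IPv6 sysctl tree for ens192"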
The reason behind the nodeip-configuration failure appears to be the following:
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1682]: ++ iface=ens192
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1682]: ++ [[ -z ens192 ]]
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1682]: ++ echo ens192
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: + iface=ens192
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: + echo 'Node IP interface determined as: ens192. Enabling IP forwarding...'
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: Node IP interface determined as: ens192. Enabling IP forwarding...
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: + echo 1
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: + echo 1
Jul 15 19:29:19 <node_hostname> configure-ip-forwarding.sh[1680]: /usr/local/bin/configure-ip-forwarding.sh: line 44: /proc/sys/net/ipv6/conf/ens192/forwarding: No such file or directory
Jul 15 19:29:19 <node_hostname> systemd[1]: nodeip-configuration.service: Control process exited, code=exited, status=1/FAILURE
Jul 15 19:29:19 <node_hostname> systemd[1]: nodeip-configuration.service: Failed with result 'exit-code'.
Jul 15 19:29:19 <node_hostname> systemd[1]: Failed to start Writes IP address configuration so that kubelet and crio services select a valid node IP.
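The failing step is an unguarded write of "1" into /proc/sys/net/ipv6/conf/ens192/forwarding, which fails with "No such file or directory" when IPv6 is disabled. As an illustration only (not the shipped script), a defensive version of that write could look like:

    # Sketch: only touch the IPv6 forwarding knob when the kernel exposes it
    iface=ens192   # interface name as determined in the log above
    ipv6_fwd="/proc/sys/net/ipv6/conf/${iface}/forwarding"
    if [ -w "${ipv6_fwd}" ]; then
        echo 1 > "${ipv6_fwd}"
    else
        echo "IPv6 appears disabled on ${iface}; skipping IPv6 forwarding" >&2
    fi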
So what we conclude is:
IPv6 is enabled by default in the OpenShift cluster, and the cluster carries additional MachineConfigs that disable it.
The issue is only observed and reported during an upgrade, with a node ending up in NotReady state. So the question is why IPv6 suddenly causes a failure when the node was initially running fine with the same configuration. Also, once the issue is seen, forcing the config onto the node works around it, but the customer would like to know the root cause, as this is likely a bug. (See the checks after this paragraph.)
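To help pin down the root cause, one could compare what the MachineConfigs request with what the affected node actually booted with (a sketch using standard oc commands; the grep pattern and placeholder node name are assumptions):

    # Which MachineConfigs mention IPv6 at all (e.g. an ipv6.disable kernel argument)
    oc get machineconfig -o yaml | grep -in 'ipv6'
    # What the affected node actually booted with
    oc debug node/<node_hostname> -- chroot /host cat /proc/cmdline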
Version-Release number of selected component (if applicable):
Upgrade from 4.13.39 to 4.14.31
How reproducible:
During an upgrade
Steps to Reproduce:
1. Perform an upgrade from 4.13.39 to 4.14.31
Actual results:
The node fails to apply the update because the kubelet service fails to start.
Expected results:
The upgrade should complete without the kubelet failing and the node going NotReady.
Additional info:
Logs on drive: https://drive.google.com/drive/folders/1ls1yAByyzK-Z20i0niqyN9w0IjzrT1pd?usp=sharing (a must-gather and an sosreport from the affected node)