-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
4.12.z
-
Important
-
No
-
Rejected
-
False
-
-
Customer Escalated
-
-
-
Description of problem:
Upgrade is stuck due to network cluster operator. Network cluster operator is failing as couple of ovnkube-node PODs are failing with below error in ovnkube-node container. ~~~ 2023-04-21T15:07:31.178062995Z F0421 15:07:31.178047 181667 ovnkube.go:133] error waiting for node readiness: failed to set the node masquerade route to OVN: could not find node IPv6 address to configure OVN masquerade route, addresses: [{Type:InternalIP Address:<IPv4 IP>} {Type:Hostname Address:<hostname>}] ~~~ This issue may come up if IPv6 IP assignment to the primary interface is delayed for some reason. As a result IPv6 IP does not get written to 20-ndenet.conf created by nodeip-cofiguratio service and in turn we do not get IPv6 IP in status of node custom resource. I have followed below workarounds mentioned in bug[1]. - Changed NM_ONLINE_TIMEOUT of NetworkManager-wait-online service to 300 second so that IPv6 Ip gets assigned to primary interface before starting nodeip-configuration service. NetworkManager-wait-online service is still failing after 5 minutes. - Created an environment file for kubelet to refer and hard coded both IPv4 and Ipv6 IP. After rebooting the node, I do not see IPv6 IP in status section of node custom resource. Next action plan: - Add 'may-fail=false' under IPv6 in nmconnection file of bond0. Could not apply this yet as machine config operator is in Degraded state for a different reason and the nmconnection file is being managed by machine config operator. I am not sure why nodeip-configuration service is not writing IPv6 IP to environment file 20-nodenet.conf even after 300 second delay was induced due to 'NM_ONLINE_TIMEOUT' in NetworkManager-wait-online service. I am also not sure why IPv6 IP is not showing in the status section even after mentioning both IPv4 and IPv6 Ip at 98-nodenet-override.conf in /etc/systemd/system/kubelet.service.d/ directory. [1] - https://issues.redhat.com/browse/OCPBUGS-6009
Version-Release number of selected component (if applicable):
OpenShift 4.12.4
How reproducible:
Not Sure
Steps to Reproduce:
1.
2.
3.
Actual results:
Upgrade is stuck due to this reason.
Expected results:
Upgrade should proceed
Additional info:
Will add in comments section
- is caused by
-
OCPBUGS-4411 ovnkube node pod crashed after converting to a dual-stack cluster network
- Closed
- links to