[OCPBUGS-12431] ovnkube-node POD is failing due to late assignment of IPv6 IP to primary interface in dual stack environment - Red Hat Issue Tracker

Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.12.z
Component/s: Networking / runtime-cfg
Labels:
- hh1

Severity:
Important
Regression:
No
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Customer Impact:

Customer Escalated
RH Private Keywords:
Escape Impact:
SDLC stage when should've been found:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:
PX Priority Data:

Description of problem:

The upgrade is stuck on the network cluster operator. The operator is failing because a couple of ovnkube-node pods are failing with the following error in the ovnkube-node container.

~~~
2023-04-21T15:07:31.178062995Z F0421 15:07:31.178047 181667 ovnkube.go:133] error waiting for node readiness: failed to set the node masquerade route to OVN: could not find node IPv6 address to configure OVN masquerade route, addresses: [{Type:InternalIP Address:<IPv4 IP>} {Type:Hostname Address:<hostname>}]

~~~
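The condition the error describes (no IPv6 entry among the node's reported addresses) can be checked with a short sketch like the one below. It is only an illustration of the check, not the ovnkube-node code; the function name and the sample addresses (documentation ranges) are hypothetical, and the input is assumed to be the `status.addresses` list of the node resource as JSON.

```python
import ipaddress


def has_ipv6_internal_ip(addresses):
    """Return True if any InternalIP entry parses as an IPv6 address."""
    for entry in addresses:
        if entry.get("type") != "InternalIP":
            continue
        try:
            ip = ipaddress.ip_address(entry.get("address", ""))
        except ValueError:
            continue  # not an IP literal, skip it
        if ip.version == 6:
            return True
    return False


# Hypothetical sample data; real values come from the node's status.addresses.
single_stack = [
    {"type": "InternalIP", "address": "192.0.2.10"},
    {"type": "Hostname", "address": "worker-0"},
]
dual_stack = single_stack + [{"type": "InternalIP", "address": "2001:db8::10"}]

print(has_ipv6_internal_ip(single_stack))
print(has_ipv6_internal_ip(dual_stack))
```

A node in the failing state would look like `single_stack` here: an IPv4 InternalIP and a Hostname, but no IPv6 InternalIP.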

This issue may occur if assignment of the IPv6 address to the primary interface is delayed for some reason. As a result, the IPv6 address is not written to 20-nodenet.conf, which is created by the nodeip-configuration service, and in turn the IPv6 address does not appear in the status of the node custom resource. I have tried the workarounds below, which are mentioned in bug [1].

- Changed NM_ONLINE_TIMEOUT of the NetworkManager-wait-online service to 300 seconds so that the IPv6 address is assigned to the primary interface before the nodeip-configuration service starts. The NetworkManager-wait-online service still fails after 5 minutes.
- Created an environment file for kubelet and hard-coded both the IPv4 and IPv6 addresses in it. After rebooting the node, the IPv6 address still does not appear in the status section of the node custom resource.
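To confirm which address families a kubelet drop-in actually carries, a sketch along these lines may help. It scans the text of a drop-in file for IP literals and groups them by family. The function name is hypothetical, and the sample content (including the `KUBELET_NODE_IP` and `KUBELET_NODE_IPS` variable names and the documentation-range addresses) is an assumption about what such a file looks like, not a copy of the file from this cluster.

```python
import ipaddress
import re


def ip_families_in_dropin(text):
    """Collect every IP literal found in a systemd drop-in, grouped by family."""
    found = {4: [], 6: []}
    # Split on whitespace, quotes, '=' and ',' which separate values
    # in Environment= lines.
    for token in re.split(r'[\s"=,]+', text):
        try:
            ip = ipaddress.ip_address(token)
        except ValueError:
            continue  # section headers, variable names, empty tokens
        found[ip.version].append(str(ip))
    return found


# Assumed file layout, for illustration only.
ipv4_only = '''[Service]
Environment="KUBELET_NODE_IP=192.0.2.10" "KUBELET_NODE_IPS=192.0.2.10"
'''
dual_stack = '''[Service]
Environment="KUBELET_NODE_IP=192.0.2.10" "KUBELET_NODE_IPS=192.0.2.10,2001:db8::10"
'''

print(ip_families_in_dropin(ipv4_only))
print(ip_families_in_dropin(dual_stack))
```

An empty list under key `6` would indicate that the drop-in was written before an IPv6 address was available, or that the override did not include one.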

Next action plan:

- Add 'may-fail=false' to the IPv6 section of the nmconnection file for bond0. This could not be applied yet because the machine config operator is in a Degraded state for a different reason, and the nmconnection file is managed by the machine config operator.
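As a rough illustration of what the planned change would produce, the sketch below sets `may-fail=false` in the `[ipv6]` section of a keyfile-style connection profile using Python's `configparser`. The sample profile contents and the function name are hypothetical. Because the real file is managed by the machine config operator, this only shows the intended resulting content, not how the change would be rolled out.

```python
import configparser
import io


def require_ipv6(keyfile_text):
    """Return the profile text with may-fail=false set under [ipv6]."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key names exactly as written
    cfg.read_string(keyfile_text)
    if not cfg.has_section("ipv6"):
        cfg.add_section("ipv6")
    cfg.set("ipv6", "may-fail", "false")
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical profile; the real bond0 nmconnection file will differ.
sample = """[connection]
id=bond0
type=bond
interface-name=bond0

[ipv6]
method=auto
"""

updated = require_ipv6(sample)
print(updated)
```

With `may-fail=false`, NetworkManager treats the connection as not fully activated until IPv6 configuration succeeds, which is why it is being considered here.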

I am not sure why the nodeip-configuration service is not writing the IPv6 address to the environment file 20-nodenet.conf, even after a 300-second delay was introduced through 'NM_ONLINE_TIMEOUT' in the NetworkManager-wait-online service. I am also not sure why the IPv6 address does not show up in the status section even after specifying both the IPv4 and IPv6 addresses in 98-nodenet-override.conf in the /etc/systemd/system/kubelet.service.d/ directory.

[1] - https://issues.redhat.com/browse/OCPBUGS-6009

Version-Release number of selected component (if applicable):

OpenShift 4.12.4

How reproducible:

Not Sure

Steps to Reproduce:

1.
2.
3.

Actual results:

Upgrade is stuck due to this reason.

Expected results:

Upgrade should proceed

Additional info:

Will add in comments section

is caused by

OCPBUGS-4411 ovnkube node pod crashed after converting to a dual-stack cluster network

Closed

links to

https://access.redhat.com/support/cases/#/case/03482454

Assignee: Benjamin Nemec

Reporter: Arnab Ghosh

QA Contact: Zhanqi Zhao

Need Info From: Arnab Ghosh

Votes: 0

Watchers: 17

Created: 2023/04/24 6:30 AM

Updated: 2024/07/31 6:00 PM

Resolved: 2023/06/26 1:36 PM
