- Bug
- Resolution: Not a Bug
- Major
- None
- 4.10.z
- Important
- None
- Proposed
- False
Description of problem:
Connections to the cluster are lost almost every time an out-of-tree driver is installed. The state persists until the host server is manually rebooted.
I will share the link to the sosreport collected while the issue was present. The following observations were made (a sketch of commands to re-check each service from the node console follows the list):
1. Kubelet service is inactive.
* kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             `-10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
     Active: inactive (dead)
2. NetworkManager-wait-online.service is in the failed state.
* NetworkManager-wait-online.service - Network Manager Wait Online
     Loaded: loaded (/usr/lib/systemd/system/NetworkManager-wait-online.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Mon 2022-12-12 14:47:17 UTC; 33min ago
       Docs: man:nm-online(1)
    Process: 3111 ExecStart=/usr/bin/nm-online -s -q (code=exited, status=1/FAILURE)
   Main PID: 3111 (code=exited, status=1/FAILURE)
        CPU: 347ms

Dec 12 14:46:17 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net systemd[1]: Starting Network Manager Wait Online...
Dec 12 14:47:17 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Dec 12 14:47:17 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
Dec 12 14:47:17 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net systemd[1]: Failed to start Network Manager Wait Online.
Dec 12 14:47:17 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net systemd[1]: NetworkManager-wait-online.service: Consumed 347ms CPU time
3. ovs-configuration.service is in the failed state.
* ovs-configuration.service - Configures OVS with proper host networking configuration
     Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Mon 2022-12-12 14:49:41 UTC; 30min ago
    Process: 3632 ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes (code=exited, status=1/FAILURE)
   Main PID: 3632 (code=exited, status=1/FAILURE)
        CPU: 5.391s

Dec 12 14:49:41 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net configure-ovs.sh[3632]: link/ether b4:96:91:c0:85:53 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Dec 12 14:49:41 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Dec 12 14:49:42 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net configure-ovs.sh[3632]: + ip route show
Dec 12 14:49:41 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Dec 12 14:49:42 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net configure-ovs.sh[3632]: 10.88.0.0/16 dev cni-podman0 proto kernel scope link src 10.88.0.1 linkdown
Dec 12 14:49:41 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net systemd[1]: ovs-configuration.service: Consumed 5.391s CPU time
Dec 12 14:49:42 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net configure-ovs.sh[3632]: + ip -6 route show
Dec 12 14:49:42 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net configure-ovs.sh[3632]: ::1 dev lo proto kernel metric 256 pref medium
Dec 12 14:49:42 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net configure-ovs.sh[3632]: fe80::/64 dev cni-podman0 proto kernel metric 256 linkdown pref medium
Dec 12 14:49:42 master0.a1202-7-11-u10-s3-oe20rannic-sno.lab.neat.nsn-rdnet.net configure-ovs.sh[3632]: + exit 1
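For reference, the following is a minimal sketch of commands that can be run from the node console (or BMC) to confirm the three observations above once API/SSH access to the cluster is gone. The individual commands are standard systemd/NetworkManager/OVS tooling; the specific sequence is illustrative and is not taken from the sosreport.

  # 1. Confirm kubelet is inactive and look at its last log lines
  systemctl status kubelet
  journalctl -u kubelet -b --no-pager | tail -n 50

  # 2. NetworkManager-wait-online runs 'nm-online -s -q'; re-running it shows
  #    whether startup connections ever complete (non-zero exit = still failing)
  systemctl status NetworkManager-wait-online
  /usr/bin/nm-online -s -q; echo "nm-online exit code: $?"
  nmcli device status
  nmcli connection show

  # 3. ovs-configuration wraps /usr/local/bin/configure-ovs.sh OVNKubernetes;
  #    its journal contains the shell trace that ends in '+ exit 1' shown above
  systemctl status ovs-configuration
  journalctl -u ovs-configuration -b --no-pager | tail -n 100
  ovs-vsctl show
  ip addr show br-ex 2>/dev/null || echo "br-ex not present"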
Version-Release number of selected component (if applicable):
4.10.32
How reproducible:
Always
Steps to Reproduce:
1. Apply the MC that loads the drivers (the MC is available in the must-gather); a minimal apply-and-watch sketch follows these steps.
2. During the MCP-initiated reboot, the connection to the node is lost.
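A short sketch of the reproduction flow, assuming cluster-admin access; the file name below is a placeholder for the actual MC attached in the must-gather:

  # Placeholder file name; the real MC is the one included in the must-gather
  oc apply -f out-of-tree-driver-mc.yaml

  # Watch the MachineConfigPool roll out; the node reboots as part of this
  oc get mcp -w
  oc get nodes -w

  # After the MCP-triggered reboot the node never comes back Ready and
  # SSH/API connections to it time out
  oc get nodes
  ssh core@<node-ip>    # <node-ip> is a placeholder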
Actual results:
Connection to the node is lost after the MCP reboot.
Expected results:
The connection to the node should not be lost after the MCP reboot.
Additional info:
Workaround: Manually rebooting the node restores the connection.
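A hedged sketch of how recovery can be verified after the manual reboot (assuming console/SSH access is back); the exact checks are ours, not from the sosreport:

  # After the manual reboot, the previously failed units come up cleanly
  systemctl is-active NetworkManager-wait-online ovs-configuration kubelet

  # And the node rejoins the cluster
  oc get nodes
  oc get co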