Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.15.z
Component/s: Networking / ovn-kubernetes
Labels:
- sbr-untriaged

Regression:
None
Blocked:
False
Blocked Reason:

None
Target Version:

4.18, 4.18.z
Target Backport Versions:

4.15.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:
PX Priority Data:
PX Technical Impact Notes:
10/08 Single node failing pod to api; Sev 3 case
PX Review Complete:

Description of problem:

Pod-to-pod communication times out, but only on one node of the cluster.

The issue first appeared while setting up the nvidia-driver-daemonset.

Not all pods are affected: the "openshift-network-diagnostics" pods running on that host seem to work, but others fail.

All failures report the same error:
dial tcp 172.30.0.1:443: i/o timeout
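
To confirm which pods on the node are affected, the same dial can be repeated with a short timeout. The following is a minimal sketch, not the check used in this case: the helper name `can_connect` is hypothetical, and it assumes Python is available in the pod or debug container being tested. In a default OpenShift install 172.30.0.1 is normally the `kubernetes` service ClusterIP, but that is an assumption about this cluster.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for triage; it only tests the TCP handshake,
    not TLS or the API response.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout and ConnectionRefusedError are both OSError subclasses,
        # so an i/o timeout like the one reported here lands in this branch.
        return False
```

Calling `can_connect("172.30.0.1", 443)` from a failing pod and from one of the working "openshift-network-diagnostics" pods on the same host would show whether the timeout follows the pod or the node.
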
Version-Release number of selected component (if applicable):

OpenShift 4.15.28

How reproducible:

Appears to be always reproducible on that specific node.

Steps to Reproduce:

1. Deploy nvidia-driver-daemonset

Actual results:

The only error observed on that node so far is:

$ less openvswitch/journalctl_--no-pager_--unit_ovs-vswitchd
...
Aug 08 04:11:43 node.cluster.example.com ovs-vswitchd[3116]: ovs|00002|dpif(handler416)|WARN|system@ovs-system: execute ct(commit,zone=111,mark=0/0x1,nat(src)),ct(zone=42,nat),recirc(0x11a957) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=0a:58:xx:yy:zz:e7,dl_dst=0a:58:xx:yy:zz:18,nw_src=10.xxx.17.231,nw_dst=10.xxx.16.24,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=8140,tp_dst=40832,tcp_flags=psh|ack tcp_csum:6a30
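
When the journal contains many of these warnings, it can help to pull out the conntrack zones and the packet tuple from each one. The following is a rough sketch under assumptions: the function name `parse_ct_failure` and the regular expressions are illustrative, and they are written against the single log line shown above rather than against any documented ovs-vswitchd log format.

```python
import re

# Matches each ct(...) action and captures its zone number.
CT_ZONE_RE = re.compile(r"ct\([^)]*?zone=(\d+)")
# Captures a few packet fields of interest from the "on packet ..." part.
FIELD_RE = re.compile(r"\b(nw_src|nw_dst|tp_src|tp_dst|tcp_flags)=([^,\s]+)")


def parse_ct_failure(line: str):
    """Return zones and packet fields for a ct 'Invalid argument' warning, else None."""
    if "failed (Invalid argument)" not in line:
        return None
    # Split the datapath actions from the packet description.
    actions, _, packet = line.partition(" on packet ")
    zones = [int(z) for z in CT_ZONE_RE.findall(actions)]
    fields = dict(FIELD_RE.findall(packet))
    return {"zones": zones, "fields": fields}
```

Grouping the parsed results by zone or by `nw_dst` would show whether the failures are confined to particular pods on the node.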

Expected results:

No error

Additional info:

This is a baremetal node with a GPU, but it is not the only one: there are two others, which are part of a different machine-config-pool and have no reported issues.

Affected Platforms:
Agnostic cluster with virtualized and baremetal nodes

is related to

OCPBUGS-12251 Continuation of reopened BZ2100045 - OVS complains Invalid Argument on TCP packets going into conntrack

Closed

Assignee: sdn-team bot

Reporter: Mario Abajo Duran

QA Contact: Anurag Saxena

Votes: 0

Watchers: 8

Created: 2024/10/01 3:37 PM

Updated: 2024/11/11 5:43 PM
