OpenShift Bugs: OCPBUGS-37053

When using OVN with routingViaHost:true packets are dropped

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • Affects Version/s: 4.12.z
      Description of problem:
      On an external host (not part of the OpenShift cluster), the customer has created static routes pointing to the pod CIDR ranges.

      • When routingViaHost:true is not set, he can ping the pod without any problem.
      • When routingViaHost:true is set, the pod receives the ICMP request and sends out an ICMP reply, but the reply never leaves the OpenShift node. No extra routes have been added on the OpenShift hosts.
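      The gateway mode currently in effect can be confirmed read-only from the operator config. A minimal check, assuming the default OVN-Kubernetes configuration object named cluster:

      ```
      # Print the current routingViaHost setting.
      # "true" means local-gateway mode (routing via host) is active;
      # empty output means the field is unset (shared-gateway default).
      oc get network.operator.openshift.io cluster \
        -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost}'
      ```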

      The following capture, taken on the node, shows traffic between the external host (10.1.10.7) and the destination pod (10.130.0.60):

      ```
      sh-4.4# tcpdump -nel -i any host 10.1.10.7
      dropped privs to tcpdump
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
      10:14:46.306134 In 52:54:00:81:ae:50 ethertype IPv4 (0x0800), length 100: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1841, length 64
      10:14:46.306222 Out 0a:58:0a:82:00:01 ethertype IPv4 (0x0800), length 100: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1841, length 64
      10:14:46.306253 P 0a:58:0a:82:00:3c ethertype IPv4 (0x0800), length 100: 10.130.0.60 > 10.1.10.7: ICMP echo reply, id 65037, seq 1841, length 64
      10:14:47.306098 In 52:54:00:81:ae:50 ethertype IPv4 (0x0800), length 100: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1842, length 64
      10:14:47.306193 Out 0a:58:0a:82:00:01 ethertype IPv4 (0x0800), length 100: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1842, length 64
      10:14:47.306224 P 0a:58:0a:82:00:3c ethertype IPv4 (0x0800), length 100: 10.130.0.60 > 10.1.10.7: ICMP echo reply, id 65037, seq 1842, length 64
      10:14:48.305904 In 52:54:00:81:ae:50 ethertype IPv4 (0x0800), length 100: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1843, length 64
      10:14:48.306005 Out 0a:58:0a:82:00:01 ethertype IPv4 (0x0800), length 100: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1843, length 64
      10:14:48.306041 P 0a:58:0a:82:00:3c ethertype IPv4 (0x0800), length 100: 10.130.0.60 > 10.1.10.7: ICMP echo reply, id 65037, seq 1843, length 64
      ^C

      ```

      Why is the ICMP reply not forwarded to an external interface?

      Capturing on the pod's interface shows the following:

      ```
      sh-4.4# ip link | grep 160
      160: 4dd0988f2194e24@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default
      sh-4.4# tcpdump -nel -i 4dd0988f2194e24
      dropped privs to tcpdump
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on 4dd0988f2194e24, link-type EN10MB (Ethernet), capture size 262144 bytes
      10:16:22.307980 0a:58:0a:82:00:01 > 0a:58:0a:82:00:3c, ethertype IPv4 (0x0800), length 98: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1937, length 64
      10:16:22.308089 0a:58:0a:82:00:3c > 0a:58:0a:82:00:01, ethertype IPv4 (0x0800), length 98: 10.130.0.60 > 10.1.10.7: ICMP echo reply, id 65037, seq 1937, length 64
      10:16:23.308037 0a:58:0a:82:00:01 > 0a:58:0a:82:00:3c, ethertype IPv4 (0x0800), length 98: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1938, length 64
      10:16:23.308101 0a:58:0a:82:00:3c > 0a:58:0a:82:00:01, ethertype IPv4 (0x0800), length 98: 10.130.0.60 > 10.1.10.7: ICMP echo reply, id 65037, seq 1938, length 64
      10:16:24.308036 0a:58:0a:82:00:01 > 0a:58:0a:82:00:3c, ethertype IPv4 (0x0800), length 98: 10.1.10.7 > 10.130.0.60: ICMP echo request, id 65037, seq 1939, length 64
      10:16:24.308112 0a:58:0a:82:00:3c > 0a:58:0a:82:00:01, ethertype IPv4 (0x0800), length 98: 10.130.0.60 > 10.1.10.7: ICMP echo reply, id 65037, seq 1939, length 64
      ^C
      6 packets captured
      6 packets received by filter
      0 packets dropped by kernel
      ```
      The destination MAC of the ICMP reply belongs to the following interface:

      ```
      sh-4.4# ip link | grep 0a:58:0a:82:00:01
      <nothing>
      sh-4.4# arp -an | grep 0a:58:0a:82:00:01
      ? (10.130.0.1) at 0a:58:0a:82:00:01 [ether] PERM on ovn-k8s-mp0
      ```
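      As an aside, because ip -o link prints one line per interface, this MAC-to-interface lookup is easy to script. The helper below (find_iface_by_mac is a hypothetical name, not an OpenShift tool) is a small sketch:

      ```
      # Print the interface name whose `ip -o link` line contains the given MAC.
      # Reads `ip -o link` output on stdin; field 2 is the name with a trailing colon.
      find_iface_by_mac() {
        awk -v mac="$1" '$0 ~ mac { sub(/:$/, "", $2); print $2 }'
      }
      # Usage on the node: ip -o link | find_iface_by_mac 0a:58:0a:82:00:01
      ```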

      However, a tcpdump on that interface shows nothing:

      ```
      sh-4.4# tcpdump -nel -i ovn-k8s-mp0 host 10.1.10.7
      dropped privs to tcpdump
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on ovn-k8s-mp0, link-type EN10MB (Ethernet), capture size 262144 bytes
      ```
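      To see where the reply is lost inside OVS, one option is to replay it through the flow tables with ovs-appctl ofproto/trace on the node. This is only a sketch: br-int and the pod's veth name (4dd0988f2194e24, from the output above) are specific to this cluster, and the flow fields may need adjusting:

      ```
      # Trace the ICMP reply through br-int to find the flow that drops it.
      # in_port is the pod's veth; MACs/IPs are taken from the captures above.
      ovs-appctl ofproto/trace br-int \
        'in_port=4dd0988f2194e24,icmp,dl_src=0a:58:0a:82:00:3c,dl_dst=0a:58:0a:82:00:01,nw_src=10.130.0.60,nw_dst=10.1.10.7'
      ```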
      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:

      Reproduction, from an external Linux host (10.1.10.4):

      ```
      [cloud-user@ocp-provisioner ~]$ oc get network cluster -o yaml
      [...]
      spec:
        clusterNetwork:
        - cidr: 10.130.0.0/16
          hostPrefix: 20
      ```

      This is a single-node OpenShift cluster; otherwise I would need to add routes for each node's /20 subnet instead of the whole /16:

      ```
      [root@ocp-provisioner cloud-user]# ip route add 10.130.0.0/16 via 10.1.10.9
      [cloud-user@ocp-provisioner ~]$ oc -n default get pods -o wide
      NAME         READY   STATUS    RESTARTS   AGE   IP            NODE                       NOMINATED NODE   READINESS GATES
      shell-demo   1/1     Running   0          10m   10.130.0.25   master-1.ocp3.f5-udf.com   <none>           <none>
      [cloud-user@ocp-provisioner ~]$ ping 10.130.0.25
      PING 10.130.0.25 (10.130.0.25) 56(84) bytes of data.
      64 bytes from 10.130.0.25: icmp_seq=1 ttl=62 time=4.89 ms
      64 bytes from 10.130.0.25: icmp_seq=2 ttl=62 time=1.78 ms
      64 bytes from 10.130.0.25: icmp_seq=3 ttl=62 time=0.793 ms
      ^C
      --- 10.130.0.25 ping statistics ---
      3 packets transmitted, 3 received, 0% packet loss, time 2002ms
      rtt min/avg/max/mdev = 0.793/2.488/4.893/1.747 ms
      ```
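      On a multi-node cluster, the external host would instead need one route per node, targeting each node's /20 pod subnet. The /20s carved from 10.130.0.0/16 with hostPrefix: 20 can be enumerated with plain arithmetic (a sketch; no cluster access needed):

      ```
      # A /16 split into /20s yields 16 subnets; each /20 steps the third octet by 16.
      list_node_subnets() {
        for i in $(seq 0 15); do
          echo "10.130.$((i * 16)).0/20"
        done
      }
      list_node_subnets
      # Prints 10.130.0.0/20 through 10.130.240.0/20, one per line.
      ```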
      Applying the change:

      ```
      [cloud-user@ocp-provisioner ~]$ oc patch network.operator/cluster --type merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'
      network.operator.openshift.io/cluster patched
      ```
      Wait until the OVN change is applied, then test:

      ```
      [cloud-user@ocp-provisioner ~]$ watch oc get co
      [...]
      [cloud-user@ocp-provisioner ~]$ ping 10.130.0.25
      PING 10.130.0.25 (10.130.0.25) 56(84) bytes of data.
      ^C
      --- 10.130.0.25 ping statistics ---
      5 packets transmitted, 0 received, 100% packet loss, time 4123ms
      ```
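      Since routingViaHost:true sends pod traffic through the host network stack, one common culprit worth ruling out (an assumption, not a confirmed root cause here) is strict reverse-path filtering dropping the now-asymmetric reply:

      ```
      # Inspect reverse-path filtering on the node; a value of 1 (strict) can
      # silently drop replies whose return route differs from the ingress path.
      sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.ovn-k8s-mp0.rp_filter
      ```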

      Actual results:
      The ICMP reply from the pod is dropped inside the node; the external host never receives it.

      Expected results:
      The reply should be forwarded back to the external host; pod connectivity should work in both gateway modes.

      Additional info:

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms:

      Is it an

      1. internal CI failure
      2. customer issue / SD
      3. internal Red Hat testing failure

      If it is an internal Red Hat testing failure:

      • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

      If it is a CI failure:

      • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
      • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
      • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
      • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
      • If it's a connectivity issue,
      • What is the srcNode, srcIP and srcNamespace and srcPodName?
      • What is the dstNode, dstIP and dstNamespace and dstPodName?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

      If it is a customer / SD issue:

      • Provide enough information in the bug description that Engineering doesn't need to read the entire case history.
      • Don't presume that Engineering has access to Salesforce.
      • Do presume that Engineering will access attachments through supportshell.
      • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
      • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
        • If the issue is in a customer namespace then provide a namespace inspect.
        • If it is a connectivity issue:
          • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
          • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
          • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
          • Please provide the UTC timestamp networking outage window from must-gather
          • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
        • If it is not a connectivity issue:
          • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
      • When showing the results from commands, include the entire command in the output.  
      • For OCPBUGS in which the issue has been identified, label with "sbr-triaged"
      • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with "sbr-untriaged"
      • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
      • Note: bugs that do not meet these minimum standards will be closed with label "SDN-Jira-template"
      • For guidance on using this template please see
        OCPBUGS Template Training for Networking  components

            bbennett@redhat.com Ben Bennett
            hepatil Hemant Patil
            Anurag Saxena Anurag Saxena