Type: Bug
Resolution: Not a Bug
Priority: Major
Affects Version: 4.19
Quality / Stability / Reliability
Severity: Important
Affected Platforms: All
Description of problem:
Customer is experiencing intermittent packet drops when traffic flows from an external mainframe application through MetalLB to a pod (Orchestrator). The workaround is to set rp_filter=0 on all interfaces, but this setting is not persistent and requires manual re-application after pod restarts or node reboots. The customer requests a persistent, supported mechanism to set rp_filter=0 that also applies to CNI-created pod interfaces (e.g., those created by OVN-Kubernetes).
Version-Release number of selected component (if applicable):
OpenShift 4.1x
Component: ovn-kubernetes; possibly tuned or machine-config-operator if the fix involves applying system-level settings.
How reproducible:
Intermittent – the issue appears during certain pod rollouts or under high-volume traffic scenarios.
With rp_filter=0 manually configured on all interfaces, the issue has not occurred for up to 16 days of continuous operation.
Without the configuration, the issue may appear within 3–5 days during normal application load or redeployments.
Steps to reproduce:
- Deploy a pod that receives traffic from an external source via a MetalLB load balancer.
- Ensure the external client sends persistent or high-volume TCP traffic to the MetalLB VIP.
- Without setting rp_filter=0 on the pod network interfaces, observe that:
  - Some initial TCP SYN packets do not reach the pod.
  - Packet retransmissions may occur.
  - In some cases, the application fails to receive connections or data.
- Apply rp_filter=0 on all interfaces, including pod interfaces created by OVN-Kubernetes (run as root on the node):
  # disable reverse path filtering on every IPv4 interface
  for iface in /proc/sys/net/ipv4/conf/*; do echo 0 > "$iface/rp_filter"; done
- Observe that traffic flows correctly and reliably reaches the application pod over extended timeframes.
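The state the steps above toggle can be inspected from the node with a read-only sketch like the one below (assumes a Linux node with /proc mounted; per the kernel's ip-sysctl documentation, the effective rp_filter mode for an interface is the maximum of the "all" setting and the per-interface setting):

```shell
#!/bin/sh
# Print configured and effective rp_filter mode for every IPv4 interface.
# 0 = off, 1 = strict reverse-path filtering, 2 = loose.
all=$(cat /proc/sys/net/ipv4/conf/all/rp_filter)
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
  iface=$(basename "$(dirname "$f")")
  val=$(cat "$f")
  # effective mode is max(all, per-interface)
  eff=$val
  [ "$all" -gt "$val" ] && eff=$all
  printf '%-24s rp_filter=%s effective=%s\n' "$iface" "$val" "$eff"
done
```

Running this before and after the workaround makes it easy to spot pod veths that came up with a non-zero value. The kernel also increments a reverse-path-filter drop counter (shown by `nstat -az` as TcpExtIPReversePathFilter), which can help distinguish these drops from other packet loss.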
Actual results:
After configuring rp_filter=0 on all interfaces (including pod network interfaces) via a manual script, the customer observed significantly improved stability of their application (Pulse), increasing the successful runtime from 5 days to over 16 days. However, this setting is not persistent across node reboots, pod deployments, or node restarts, requiring manual re-application. Additionally, current tools like Tuned profiles do not apply rp_filter=0 to pod interfaces due to CNI limitations. As a result, there is no supported, persistent mechanism to enforce rp_filter=0 for all network interfaces, especially those used by pods.
Expected results:
A supported and persistent mechanism to configure rp_filter=0 on all relevant network interfaces, including those dynamically created for pod attachments by the CNI (OVN-Kubernetes). This mechanism should survive node reboots, pod redeployments, and workload rollouts without requiring manual intervention. Ideally, this configuration should be manageable via MachineConfig, Tuned profiles, or an operator-level setting that applies the value consistently across the entire node network stack.
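One shape the requested mechanism could take, subject to engineering confirmation, is a MachineConfig that drops a sysctl fragment onto each worker. This is a sketch only: the object name and file path are illustrative, and note that net.ipv4.conf.default.rp_filter only seeds interfaces created after it is applied, while the CNI may still set per-interface values on pod veths, which is exactly the gap described above.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-rp-filter          # illustrative name
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysctl.d/99-rp-filter.conf
          mode: 420
          overwrite: true
          contents:
            # data URL decodes to:
            #   net.ipv4.conf.all.rp_filter=0
            #   net.ipv4.conf.default.rp_filter=0
            source: data:,net.ipv4.conf.all.rp_filter%3D0%0Anet.ipv4.conf.default.rp_filter%3D0%0A
```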
Additional info:
Affected Platforms:
Customer Environment:
Cluster: OpenShift on vSphere
CNI: OVN-Kubernetes
IPsec: Enabled
MetalLB: Used to expose services externally
rp_filter: Set to 0 manually by customer for all interfaces, including pod veths
Current workaround / observation:
- Setting rp_filter=0 persistently on all interfaces including pod interfaces mitigates the issue.
- Without it, traffic is dropped or delayed due to reverse path filtering logic.
- However, customer confirms that Tuned does not apply this setting to pod interfaces due to CNI constraints.
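For reference, the node-level half of this is already expressible as a Tuned profile; the sketch below (profile name rp-filter-off is illustrative) sets the all/default sysctls on workers, but, consistent with the customer's observation, it would not reach per-interface values on pod veths created afterward by the CNI:

```yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: rp-filter-off                # illustrative name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    - name: rp-filter-off
      data: |
        [main]
        summary=Disable IPv4 reverse path filtering at node scope
        [sysctl]
        net.ipv4.conf.all.rp_filter=0
        net.ipv4.conf.default.rp_filter=0
  recommend:
    - match:
        - label: node-role.kubernetes.io/worker
      priority: 20
      profile: rp-filter-off
```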
Customer is requesting:
- Confirmation from engineering on best way to persist rp_filter=0 on all interfaces (including pod interfaces).
- RFE (Request for Enhancement) to support managing rp_filter on pod veths, possibly via MachineConfig or another supported mechanism.