Bug
Resolution: Won't Do
4.14.z
Quality / Stability / Reliability
Important
Description of problem:
OpenShift 4.14.44 cluster running IPsec on OVN-Kubernetes.
- Certain nodes in the cluster are unable to communicate with one another as expected: packets are dropped (or lost) in transit and are not answered by peers.
- Tested using the IPsec validation script to query between nodes [1] and observed that we are unable to successfully call peer pods (openshift-dns pod to openshift-dns pod across hosts; an unobstructed call is expected and requires Geneve + IPsec tunnel encapsulation). The call succeeds from all nodes to all nodes except one; that one impacted node fails to call about half of the nodes in the cluster.
- Confirmed that the IC-subnet range for the nodes is valid (100.96.0.0/16, including on the impacted node).
- Restarted the ovnkube-node pods on all hosts; no change.
- Validated to the best of our ability that the IPsec handshakes look good and normal, but pulled samples from these host nodes plus sosreports for review.
- Need assistance confirming that the OVN-Kubernetes and IPsec flows are working properly.
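The failure pattern described above (one node failing to reach a subset of its peers) can be picked out of per-pair test results with a small helper. This is only a minimal sketch: it assumes the results have been collected into `(source, destination, ok)` tuples, which is a hypothetical format and not the output of the validation script referenced as [1]. The node names are placeholders as well.

```python
from collections import defaultdict


def failed_peers_by_source(results):
    """Group failed destinations by source node.

    `results` is an iterable of (source, destination, ok) tuples
    (a hypothetical format chosen for this sketch).
    """
    failures = defaultdict(set)
    for source, destination, ok in results:
        if not ok:
            failures[source].add(destination)
    # Sort destinations so the summary is stable and easy to diff.
    return {src: sorted(dsts) for src, dsts in failures.items()}


# Hypothetical sample data, not taken from the affected cluster.
sample = [
    ("node-a", "node-b", True),
    ("node-a", "node-c", True),
    ("node-b", "node-a", True),
    ("node-b", "node-c", True),
    ("node-c", "node-a", False),
    ("node-c", "node-b", True),
]

summary = failed_peers_by_source(sample)
print(summary)
```

A source node that appears in the summary with many destinations, while no other source appears at all, would match the single-impacted-node behavior reported here.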
Version-Release number of selected component (if applicable):
4.14.44
How reproducible:
- Replicated on two clusters (customer environments). On one cluster, rebooting the host node appeared to mitigate the behavior for a few days before it came back; the other cluster was left impacted for diagnostic purposes.
Steps to Reproduce:
- Unknown
- 4.14.44 cluster with a manually migrated IC-subnet value (100.96.0.0/16)
- IPsec defined at install time
- vSphere
- Pod subnet overlap in the 100.80.0.0/12 subnet requires IC-migration
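As a sanity check on the two ranges quoted above, the following sketch uses Python's standard `ipaddress` module to show the address span of each and whether they overlap. Only the two CIDR values come from this report; the variable names are illustrative.

```python
import ipaddress

# CIDR values as quoted in this report.
pod_overlap_range = ipaddress.ip_network("100.80.0.0/12")
migrated_ic_subnet = ipaddress.ip_network("100.96.0.0/16")


def describe(net):
    """Return the first and last address covered by a network."""
    return f"{net} covers {net[0]} - {net[-1]}"


print(describe(pod_overlap_range))
print(describe(migrated_ic_subnet))
print("overlap:", pod_overlap_range.overlaps(migrated_ic_subnet))
```

The /12 range ends at 100.95.255.255, so the migrated 100.96.0.0/16 value begins immediately after it and the two do not overlap.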
Actual results:
Pod to pod communication failure between impacted host and neighboring node.
Expected results:
Communication should not be blocked on 4.14.44. This version is already past the fixes for both the IPsec and OVN-Kubernetes handling issues that previously impacted communication, as outlined in:
- https://access.redhat.com/solutions/7091399
- https://access.redhat.com/solutions/7088635
- https://access.redhat.com/solutions/7103865
Additional info:
- Additional data and uploads in next comments (internal).
- Possibly related to: https://issues.redhat.com/browse/OCPBUGS-42616 (?)