Type: Bug
Resolution: Cannot Reproduce
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13.0
Component/s: Networking / ovn-kubernetes
Labels:
- SDN-Bug-Backlog-Pruning

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Moderate
Regression:
No

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
SDN Sprint 240, SDN Sprint 241
sprint_count:
2

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Priority Data:
PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Issue: The customer deployed a new cluster and observed that the console/auth routes are degraded. The openshift-ingress-operator is firing an alert on reachability of the canary route.

- Observed that the canary route pods are up/serving traffic
- Observed that the router pods CAN reach the route/pods for canary traffic
- Observed that the openshift-ingress-operator pod CANNOT communicate with the coreDNS service IP at 172.30.0.10 (therefore failing route lookup/subsequent call)
- Observed that the openshift-ingress-operator pod CANNOT communicate with the service for ingress-canary 
- Observed that a TEST POD in a NEW NAMESPACE also CANNOT call the service for ingress-canary
- Observed that the service is not automatically recreated after it is deleted; rebuilding a fresh service with a new clusterIP allows the TEST POD to succeed in calls to the new service for ingress-canary, but the operator pod is still unable to call the service IP.
- Observed that the target lr-lb-list for the service on all hosts exists, and is valid/plumbed as expected to the correct canary backends. 
- Attempted to rebuild the OVNKUBE database (*succeeded) --> no change.

- No firewall rules in the way (single switch interconnect for all vms on the cluster)
- geneve port is unblocked; and we can call specific pod IP's throughout the cluster
- The default kube-apiserver service clusterIP (172.30.0.1) is reachable from all pods and is not blocked, which implies that OVN flows are generally working but CERTAIN flows are obstructed

- all nodes are on the same vlan/subnet and the network plane is flat for the platform.
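The reachability observations above could be re-checked from inside any pod with a small probe like the following. This is only a sketch: the two IPs come from this report, while the ports (53/TCP for the DNS service, 443/TCP for the API service) and the names `probe`, `report`, and `TARGETS` are assumptions made for illustration. The ingress-canary clusterIP is not given in this report, so it would have to be added by hand.

```python
import socket

# IPs taken from this report; ports are assumed defaults, not confirmed here.
TARGETS = {
    "coredns-service": ("172.30.0.10", 53),
    "kube-apiserver-service": ("172.30.0.1", 443),
}


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def report(targets: dict, timeout: float = 2.0) -> dict:
    """Probe every target and return a name -> reachable mapping."""
    return {name: probe(host, port, timeout) for name, (host, port) in targets.items()}
```

Running `print(report(TARGETS))` in the operator pod and in a test pod would show side by side which clusterIPs each pod can open a TCP connection to.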

Version-Release number of selected component (if applicable):

4.13.0, VMware, UPI, OVN-Kubernetes

How reproducible:

every time

Steps to Reproduce:

1. Spin up a test pod using the generic ubi8 image with IP tools from quay.io/rhn_support_wrussell/iputils-container:latest
2. Curl the target services. Attempt dig against a service name and observe a timeout from the CoreDNS service. Specify an upstream DNS server with @<upstreamIP> and observe that dig succeeds immediately.
3. Observe that curl to the service IP fails from both the node and a pod on the node. Recreated test services work (at least for the ingress-canary namespace service that we deleted and rebuilt).
4. Observe that the openshift-ingress-operator pod can NOT curl the ingress-canary service even after the service is recreated.
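The DNS comparison in step 2 can also be scripted when dig is not available. The sketch below sends one hand-built A-record query over UDP and reports whether any matching reply arrives before the timeout. The function names and the default lookup name are hypothetical, and the upstream server address is left for the caller to supply, as in the original steps.

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (recursion desired)."""
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def dns_answers(server: str, name: str, timeout: float = 2.0, port: int = 53) -> bool:
    """Return True if the server replies to the query within the timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Timeout or network error: treat as "no answer".
            return False
    # A valid reply echoes the 16-bit query ID in its first two bytes.
    return len(reply) >= 2 and reply[:2] == query[:2]
```

For example, `dns_answers("172.30.0.10", "kubernetes.default.svc.cluster.local")` would be expected to return False from an affected pod, while the same call against the upstream DNS server should return True. The check only confirms that a reply arrived; it does not parse the answer section.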

Actual results:

communication failure in the cluster

Expected results:

Pods and services that are not obstructed by NetworkPolicy, on network layers that are flat, should be able to communicate.

Additional info:

Data uploads are attached to the issue in the next comment.

Assignee: Flavio Fernandes (Inactive)

Reporter: Will Russell

Need Info From: None

Contributors: None

QA Contact: Anurag Saxena

Doc Contact: None

Votes: 1

Watchers: 6

Created: 2023/07/26 2:51 PM

Updated: 2025/09/12 11:01 PM

Resolved: 2023/10/11 9:50 PM
