Type: Bug
Resolution: Won't Do
Priority: Major
Fix Version/s: None
Affects Version/s: 4.13.z
Component/s: Networking / openshift-sdn
Labels:
- SDN:Platform:SDN

Severity:
Important
Regression:
No
Sprint:
SDN Sprint 251, SDN Sprint 252
sprint_count:
2
Blocked:
False
Blocked Reason:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:
PX Priority Data:
PX Impact Range:
PX Review Complete:
PX Technical Impact:

Description of problem:

The customer is updating a cluster running OpenShift SDN from 4.12.45 to 4.13.34. During the upgrade, Routes for namespaces with NetworkPolicies no longer work as expected (timeouts, Router returns HTTP 503) and traffic is blocked. After the upgrade finishes (Nodes are restarted with the new version), traffic works as expected again. The customer cluster is `platform: vsphere`.

The symptoms are the same as in OCPBUGS-28920. Accessing a Route in a namespace with NetworkPolicies fails with HTTP 503:

$ curl -I https://nginx-unprivileged-2-poi-user-walds-dev.apps.cl1.ocp4-sandbox.example.com
HTTP/1.1 503 Service Unavailable
Content-Type: text/html
Connection: close
pragma: no-cache
cache-control: private, max-age=0, no-cache, no-store
Strict-Transport-Security: max-age=31536000

However, after the upgrade completes, traffic works as expected again. During the upgrade, the labels on the "openshift-host-network" namespace appear to be correct:

$ oc get namespace openshift-host-network --show-labels
NAME                     STATUS   AGE   LABELS
openshift-host-network   Active   23h   kubernetes.io/metadata.name=openshift-host-network,network.openshift.io/policy-group=ingress,policy-group.network.openshift.io/host-network=,policy-group.network.openshift.io/ingress=

The issue appears to be related to Node configuration. The workaround is to apply the labels described in https://access.redhat.com/solutions/7055050 for OpenShift SDN.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.12.45
OpenShift Container Platform 4.13.34

How reproducible:

So far, we have been unable to reproduce the issue on AWS; the customer cluster is on vSphere. The customer can reproduce it consistently.

When pausing the `worker` MachineConfigPool before the upgrade, the issue can be reproduced and the cluster can be kept in the non-working state.

Steps to Reproduce:

1. Install a cluster with OCP 4.12.45 with OpenshiftSDN on vSphere
2. Create an application and create a NetworkPolicy allowing traffic from OpenShift Ingress ("allow-from-openshift-ingress") using the "network.openshift.io/policy-group: ingress" label
3. Observe that the application is reachable via the application Route
4. Pause the MachineConfigPool for workers: `oc patch mcp/worker --type merge --patch '{"spec":{"paused":true}}'`
5. Start the upgrade to OCP 4.13.34
6. Wait until the Cluster Network Operator is updated
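
For reference, the NetworkPolicy from step 2 might look like the following minimal sketch. The policy name and the namespace label are taken from the step above; the empty `podSelector` (applying the policy to all pods in the application namespace) is an assumption, since the customer's actual policy is not included in this report and may differ.

```yaml
# Sketch of the policy described in step 2 (not the customer's actual manifest).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
spec:
  # Assumption: select every pod in the namespace where the policy is created.
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    # Allow traffic from namespaces carrying the ingress policy-group label.
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress
```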

Actual results:

During the upgrade and while the "worker" MCP is paused, traffic to the Route results in HTTP 503.

Expected results:

A short timeframe where traffic does not work is expected. However, this should be less than a minute. Traffic should not be blocked once all Cluster Operators have finished updating.

Additional info:

must-gather from a broken cluster is available in Support Case 03770009 (comment #22)
sosreport before the upgrade for a node is available in Support Case 03770009 (comment #24)
sosreport after the upgrade for a node is available in Support Case 03770009 (comment #25)

links to

During upgrade from OCP 4.12 to 4.13, namespaces with NetworkPolicies are not reachable

Assignee: Nadia Pinaeva

Reporter: Simon Krenger

QA Contact: Anurag Saxena

Votes: 0

Watchers: 8

Created: 2024/03/27 3:43 PM

Updated: 2024/04/17 1:23 PM

Resolved: 2024/04/17 1:23 PM
