OpenShift Bugs / OCPBUGS-54610

During OCP upgrade from 4.12 to 4.14, with SDN CO still at 4.13, iptables-restore takes ~5x more time (for the same number of svcs/pods)


      Context:

      Before starting the upgrade, the cluster was:

      • all COs  at 4.12.30 (incl. SDN CO)
      • all masters at 4.12.30 / CoreOS 412.86 (RHEL 8.6)
      • all other nodes  at 4.12.30 / CoreOS 412.86 (RHEL 8.6)

      Then the upgrade from 4.12.30 to 4.14.39 was started.

      It was paused because of an issue on quay.io that prevented image pulling.

      At the moment, the situation is as follows (a cross-check sketch follows the list):

      • all COs updated to 4.13.52 (incl. SDN CO)
      • all masters updated to 4.13.52 / CoreOS 413.92 (RHEL 9.2)
      • all other nodes still at 4.12.30 / CoreOS 412.86 (RHEL 8.6)
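
      For reference, this state can be cross-checked against the must-gather with omc (the same tool used for the log commands below). The commands are standard oc-style queries and are only a sketch; adjust them to whatever data was actually collected:

      ~~~
      # Cross-check the upgrade/version state from the must-gather.
      omc get clusterversion              # overall upgrade progress (4.12.30 -> 4.14.39)
      omc get clusteroperator network     # SDN/network CO version (expected 4.13.52 here)
      omc get nodes -o wide               # OS IMAGE column shows the 412.86 vs 413.92 mix
      ~~~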

      Issue

      We now see consistently higher iptables-restore times after the SDN upgrade from 4.12 to 4.13 (a short averaging sketch follows the samples):

      1. Worker1 - before SDN upgrade (that is, SDN CO still at 4.12):
        ~~~
        $ omc get pod -n openshift-sdn -o wide | grep worker1 | awk '{print $1}' | xargs -I {} omc logs -n openshift-sdn {} -c sdn | grep 'iptables restore' | grep -oE 'total time.*' | head
        total time: 2841ms):
        total time: 2048ms):
        total time: 2004ms):
        total time: 2041ms):
        total time: 2579ms):
        total time: 2639ms):
        total time: 2018ms):
        total time: 2029ms):
        total time: 2891ms):
        total time: 2850ms):
        ~~~
      2. Worker1 - after SDN upgrade (that is, SDN CO now at 4.13):
        ~~~
        $ omc get pod -n openshift-sdn -o wide | grep worker1 | awk '{print $1}' | xargs -I {} omc logs -n openshift-sdn {} -c sdn | grep 'iptables restore' | grep -oE 'total time.*' | head
        total time: 6361ms):
        total time: 11866ms):
        total time: 11121ms):
        total time: 9864ms):
        total time: 10564ms):
        total time: 10325ms):
        total time: 10485ms):
        total time: 10224ms):
        total time: 11450ms):
        total time: 11144ms):
        ~~~
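
      To quantify the regression beyond the first ten samples, the same pipeline can be fed through awk to average the reported times. This is only an illustrative sketch that reuses the grep pattern above; the awk aggregation is not part of the original data collection:

      ~~~
      # Average iptables-restore time (ms) over all matching log lines of worker1's sdn pod.
      omc get pod -n openshift-sdn -o wide | grep worker1 | awk '{print $1}' \
        | xargs -I {} omc logs -n openshift-sdn {} -c sdn \
        | grep 'iptables restore' | grep -oE 'total time: [0-9]+ms' \
        | awk '{gsub(/[^0-9]/, "", $3); sum += $3; n++} END {if (n) printf "samples=%d avg=%.0fms\n", n, sum/n}'
      ~~~

      Averaging just the ten samples shown above gives roughly 2.4s per restore before the SDN upgrade vs. roughly 10.3s after, i.e. the ~4-5x slowdown from the summary.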
         

      NOTE

      Before the start of the upgrade, all nodes were on CoreOS 412.86 (RHEL 8.6) and there was no issue.
      Now we have a mix of CoreOS 412.86 (RHEL 8.6) / CoreOS 413.92 (RHEL 9.2),
      and we see the issue on BOTH node types; therefore we suspect it is something related to the CNI and not the node OS version (a per-node-type sketch follows).
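
      A minimal sketch of how the per-node-type comparison can be repeated, assuming node names are taken from the OS IMAGE column of "omc get nodes -o wide" (the second name below, master0, is a placeholder for a CoreOS 413.92 node, not a name taken from this cluster):

      ~~~
      # Compare restore times on one CoreOS 412.86 (RHEL 8.6) node and one CoreOS 413.92 (RHEL 9.2) node.
      for node in worker1 master0; do
        echo "== $node =="
        omc get pod -n openshift-sdn -o wide | grep "$node" | awk '{print $1}' \
          | xargs -I {} omc logs -n openshift-sdn {} -c sdn \
          | grep 'iptables restore' | grep -oE 'total time.*' | head -5
      done
      ~~~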
