Bug
Resolution: Unresolved
Normal
4.20.0
Quality / Stability / Reliability
Beginning around June 18th, we can see an increase in disruption metrics for 'kube-api-new-connections' on Metal and vSphere in '4.20' during 'micro' upgrades. Changing the 'lookback days' setting to '1' makes it possible to see the pattern in the graph that points to the 06/18/25 date. In '4.19' the pattern is less clear on Metal, but does appear to be present on vSphere. Many job runs in the window show disruption of 4s or more, and the percentage of runs with non-zero disruption stays around 85-90% throughout the window.
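For reference, the following is a minimal sketch (in Go, not the actual Sippy/origin tooling, and using hypothetical per-run disruption values) of how the non-zero and "4s and above" percentages described above can be computed from per-run disruption seconds:

```go
// Minimal sketch: summarize per-run disruption seconds for the
// 'kube-api-new-connections' backend. The data below is hypothetical.
package main

import "fmt"

func main() {
	// Hypothetical disruption seconds observed per job run in the window.
	runs := []float64{0, 5, 7, 4, 12, 6, 0, 9, 4, 15}

	var nonZero, atLeast4s int
	for _, d := range runs {
		if d > 0 {
			nonZero++
		}
		if d >= 4 {
			atLeast4s++
		}
	}

	fmt.Printf("non-zero disruption: %.0f%% of runs\n", 100*float64(nonZero)/float64(len(runs)))
	fmt.Printf(">= 4s disruption:    %.0f%% of runs\n", 100*float64(atLeast4s)/float64(len(runs)))
}
```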
The increased disruption values are present in various jobs, including the following:
- periodic-ci-openshift-release-master-ci-4.20-e2e-vsphere-runc-upgrade (example)
- periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-upgrade-runc (example)
- periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-bm-upgrade (example)
- periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-upgrade-ovn-ipv6 (example)
- It appears that the disruption occurs at the same time as alerts in the KubeletLog about failures to update the lease.
- These intervals show that a portion of the disruption correlates with the kube-apiserver shutdown. The disruption starts prior to the shutdown, but there may still be some relevance (a minimal overlap check is sketched after this list).
- Looking at PRs merged around 6/18, this one sticks out as potentially relevant (worth looking at, but I have no reason to believe it is the cause other than the timing).
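As referenced above, here is a minimal sketch of the interval-overlap check used to correlate disruption with the kube-apiserver shutdown window. It is illustrative only, not the origin intervals tooling; the timestamps and the lead window (to allow for disruption beginning shortly before the shutdown) are hypothetical:

```go
// Minimal sketch: flag disruption intervals that overlap (or start shortly
// before) a kube-apiserver shutdown window. All timestamps are hypothetical.
package main

import (
	"fmt"
	"time"
)

type interval struct {
	name       string
	start, end time.Time
}

// overlaps reports whether a overlaps b, allowing a to begin up to `lead`
// before b starts (the disruption was observed to start prior to shutdown).
func overlaps(a, b interval, lead time.Duration) bool {
	return a.start.Before(b.end) && b.start.Add(-lead).Before(a.end)
}

func main() {
	base := time.Date(2025, 6, 18, 12, 0, 0, 0, time.UTC)

	shutdown := interval{"kube-apiserver shutdown", base.Add(10 * time.Minute), base.Add(12 * time.Minute)}
	disruptions := []interval{
		{"kube-api-new-connections", base.Add(9 * time.Minute), base.Add(11 * time.Minute)},
		{"kube-api-new-connections", base.Add(30 * time.Minute), base.Add(30*time.Minute + 5*time.Second)},
	}

	for _, d := range disruptions {
		fmt.Printf("%s %s-%s overlaps shutdown: %v\n",
			d.name, d.start.Format("15:04:05"), d.end.Format("15:04:05"),
			overlaps(d, shutdown, 2*time.Minute))
	}
}
```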
relates to: OCPBUGS-37153 kubenswrapper: Unexpected EOF during watch stream event decoding errors (New)