Type: Bug
Resolution: Cannot Reproduce
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.18.z
Component/s: kube-apiserver
Labels:
- rits-work

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

While investigating the latest round of unexpected node-not-ready failures on metal upgrade jobs, I found two jobs with similar issues. These failures are new to me. Both appear to fail during a graceful shutdown of the kube-apiserver.

https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?Aggregation=none&Architecture=amd64&FeatureSet=default&Installer=ipi&LayeredProduct=none&Network=ovn&NetworkAccess=default&Platform=aws&Procedure=none&Scheduler=default&SecurityMode=default&Suite=serial&Topology=ha&Upgrade=none&baseEndTime=2024-10-01%2023%3A59%3A59&baseRelease=4.17&baseStartTime=2024-09-01%2000%3A00%3A00&capability=Other&columnGroupBy=Architecture%2CNetwork%2CPlatform%2CTopology&component=Node%20%2F%20Kubelet&confidence=95&dbGroupBy=Platform%2CArchitecture%2CNetwork%2CTopology%2CFeatureSet%2CUpgrade%2CSuite%2CInstaller&environment=amd64%20default%20ipi%20ovn%20aws%20serial%20ha%20none&ignoreDisruption=true&ignoreMissing=false&includeVariant=Architecture%3Aamd64&includeVariant=CGroupMode%3Av2&includeVariant=ContainerRuntime%3Arunc&includeVariant=FeatureSet%3Adefault&includeVariant=Installer%3Aipi&includeVariant=Installer%3Aupi&includeVariant=Network%3Aovn&includeVariant=Owner%3Aeng&includeVariant=Platform%3Aaws&includeVariant=Platform%3Aazure&includeVariant=Platform%3Agcp&includeVariant=Platform%3Ametal&includeVariant=Platform%3Avsphere&includeVariant=Topology%3Aha&includeVariant=Topology%3Amicroshift&minFail=3&passRateAllTests=0&passRateNewTests=95&pity=5&sampleEndTime=2024-11-05%2023%3A59%3A59&sampleRelease=4.18&sampleStartTime=2024-10-29%2000%3A00%3A00&testId=openshift-tests%3A1f3a2a9f8d7b6e8deb502468746bc363&testName=%5Bsig-node%5D%20node-lifecycle%20detects%20unexpected%20not%20ready%20node

Job run 1: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-upgrade-from-stable-4.17-e2e-metal-ipi-ovn-upgrade/1864621506026278912

Job run 2: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-upgrade-from-stable-4.17-e2e-metal-ipi-ovn-upgrade/1864621506026278912

Drilling into job run 1 (job run 2 has a similar problem):

I see that the UnexpectedNodeNotReady failure occurs during a graceful shutdown of the kube-apiserver.

There are also interesting events at this time: etcd and kube-apiserver pods are being killed.

Could the apiserver team look deeper into these issues?

links to

CR regression

openshift/ptp-operator#615: OCPBUGS-45846,OCPBUGS-58794: CVE-2024-45339 openshift4/ose-ptp-rhel9-operator: Vulnerability when creating log files in github.com/golang/glog [openshift-4.18.z]

Assignee:: Arda Guclu

Reporter:: Kevin Hannon

Need Info From:: None

Contributors:: None

QA Contact:: Ke Wang

Doc Contact:: None

Votes:: 0

Watchers:: 6

Created:: 2024/12/06 3:26 PM

Updated:: 2025/07/18 1:13 PM

Resolved:: 2025/03/05 9:13 AM
