Bug
Resolution: Unresolved
Normal
None
4.22
Description of problem:
The EnsureGlobalPullSecret e2e test is flaky because CountAvailableNodes() is called while nodes are temporarily NotReady due to kubelet restarts triggered by global-pull-secret-syncer.

When additional-pull-secret is updated or deleted, the global-pull-secret-syncer DaemonSet updates /var/lib/kubelet/config.json and restarts kubelet on each node. If CountAvailableNodes() is called during this window, it returns a stale count (e.g., 1 instead of 2). This causes waitForDaemonSetsReady() to either:
- False pass: actualReady(1) == nodeCount(1) passes prematurely
- False fail: actualReady(2) != nodeCount(1) fails forever with "2/1 pods ready"
Version-Release number of selected component (if applicable):
HyperShift main branch (affects all versions with GlobalPullSecret e2e test)
How reproducible:
Intermittent; depends on the timing of the kubelet restart relative to the node count query.
Steps to Reproduce:
N/A; it's a flake that shows up intermittently in pre-submit jobs.
Actual results:
Test either passes prematurely (with pods missing) or fails with a message like: "DaemonSet global-pull-secret-syncer not ready: 2/1 pods ready"
Expected results:
Test should wait for nodes to stabilize after kubelet-restarting operations and use the NodePool replica count as the authoritative source for the expected node count.
- blocks: OCPBUGS-77371 GlobalPullSecret e2e test flaky due to race condition with CountAvailableNodes during kubelet restarts (New)
- is cloned by: OCPBUGS-77371 GlobalPullSecret e2e test flaky due to race condition with CountAvailableNodes during kubelet restarts (New)