Bug
Resolution: Unresolved
Normal
None
4.22
Description of problem:
The EnsureGlobalPullSecret e2e test is flaky because CountAvailableNodes() is called while nodes are temporarily NotReady due to kubelet restarts triggered by global-pull-secret-syncer.

When additional-pull-secret is updated or deleted, the global-pull-secret-syncer DaemonSet updates /var/lib/kubelet/config.json and restarts kubelet on each node. If CountAvailableNodes() is called during this window, it returns a stale count (e.g., 1 instead of 2). This causes waitForDaemonSetsReady() to either:
- False pass: actualReady(1) == nodeCount(1) passes prematurely
- False fail: actualReady(2) != nodeCount(1) fails forever with "2/1 pods ready"
Version-Release number of selected component (if applicable):
HyperShift main branch (affects all versions with GlobalPullSecret e2e test)
How reproducible:
Intermittent; depends on the timing of the kubelet restart relative to the node count query.
Steps to Reproduce:
N/A; it's a flake that shows up intermittently in pre-submit jobs.
Actual results:
Test either passes prematurely (with pods missing) or fails with a message like: "DaemonSet global-pull-secret-syncer not ready: 2/1 pods ready"
Expected results:
Test should wait for nodes to stabilize after kubelet-restarting operations and use the NodePool replica count as the authoritative source for the expected node count.
- blocks: OCPBUGS-77371 GlobalPullSecret e2e test flaky due to race condition with CountAvailableNodes during kubelet restarts (New)
- is cloned by: OCPBUGS-77371 GlobalPullSecret e2e test flaky due to race condition with CountAvailableNodes during kubelet restarts (New)