OCPBUGS-77371

GlobalPullSecret e2e test flaky due to race condition with CountAvailableNodes during kubelet restarts

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.22
    • HyperShift
    • Proposed
    • Release Note Not Required

      This is a clone of issue OCPBUGS-67262. The following is the description of the original issue:

      Description of problem:

      The EnsureGlobalPullSecret e2e test is flaky because CountAvailableNodes() is called while nodes are temporarily NotReady due to kubelet restarts triggered by global-pull-secret-syncer.
      
      When additional-pull-secret is updated or deleted, the global-pull-secret-syncer DaemonSet updates /var/lib/kubelet/config.json and restarts kubelet on each node. If CountAvailableNodes() is called during this window, it returns a transiently low count (e.g., 1 instead of 2) because the restarting nodes are briefly NotReady.
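To illustrate why the count dips, here is a minimal sketch. The `node` type and `countAvailableNodes` function are hypothetical stand-ins, not the actual HyperShift helper; the sketch only assumes that the helper counts nodes that report Ready at the moment of the call.

```go
package main

// node is a hypothetical, simplified stand-in for a Kubernetes Node.
type node struct {
	name  string
	ready bool // assumed to mirror the node's Ready condition
}

// countAvailableNodes is an illustrative model only: it counts the nodes
// that report Ready at the instant of the call, so a node whose kubelet is
// restarting is not counted.
func countAvailableNodes(nodes []node) int {
	count := 0
	for _, n := range nodes {
		if n.ready {
			count++
		}
	}
	return count
}
```

With two nodes that are both Ready, this model returns 2. If the same call lands while one node's kubelet is restarting, it returns 1, even though the cluster still has two nodes.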
      
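      As a result, waitForDaemonSetsReady() compares against the wrong node count and produces one of two incorrect outcomes:
      - False pass: actualReady(1) == nodeCount(1), so the check succeeds prematurely while a pod is still missing
      - False fail: actualReady(2) != nodeCount(1), so the check fails forever with "2/1 pods ready"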
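Both outcomes can be reproduced with a small model of the comparison. `checkDaemonSet` is a hypothetical function written for this illustration; it assumes only that the real check requires the ready pod count to equal the captured node count and reports a mismatch in the "ready/expected" form quoted in this issue.

```go
package main

import "fmt"

// checkDaemonSet is a hypothetical model of the readiness comparison: the
// DaemonSet counts as ready only when its ready pods equal the expected
// node count that was captured earlier.
func checkDaemonSet(name string, actualReady, nodeCount int) error {
	if actualReady != nodeCount {
		return fmt.Errorf("DaemonSet %s not ready: %d/%d pods ready", name, actualReady, nodeCount)
	}
	return nil
}
```

With a node count of 1 captured during a kubelet restart, `checkDaemonSet("global-pull-secret-syncer", 1, 1)` returns nil (the false pass), while `checkDaemonSet("global-pull-secret-syncer", 2, 1)` returns the "2/1 pods ready" error on every retry (the false fail), because the captured count never changes.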
      

      Version-Release number of selected component (if applicable):

      HyperShift main branch (affects all versions with GlobalPullSecret e2e test)
      

      How reproducible:

      Intermittent - depends on timing of kubelet restart vs. node count query
      

      Steps to Reproduce:

      N/A. This is a flake that shows up intermittently in presubmit jobs.

      Actual results:

      Test either passes prematurely (missing pods) or fails with a message like:
      "DaemonSet global-pull-secret-syncer not ready: 2/1 pods ready"
      

      Expected results:

      Test should wait for nodes to stabilize after kubelet-restarting operations and use NodePool replicas as the authoritative source for the expected node count.
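One possible shape for such a wait is sketched below. It is not the proposed fix itself: `waitForStableNodes` and `replay` are hypothetical names, `wantReplicas` stands in for the replica count read from the NodePool, and requiring several consecutive matching polls is an assumption about what "stabilize" could mean here.

```go
package main

import (
	"fmt"
	"time"
)

// waitForStableNodes polls readyCount every interval until it has matched
// wantReplicas on stableChecks consecutive polls, or until timeout elapses.
// Every name here is hypothetical; wantReplicas stands in for the replica
// count declared on the NodePool.
func waitForStableNodes(wantReplicas, stableChecks int, interval, timeout time.Duration, readyCount func() int) error {
	deadline := time.Now().Add(timeout)
	consecutive := 0
	for {
		if readyCount() == wantReplicas {
			consecutive++
			if consecutive >= stableChecks {
				return nil
			}
		} else {
			// A node that is still NotReady resets the stability window.
			consecutive = 0
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("nodes did not stabilize at %d ready within %s", wantReplicas, timeout)
		}
		time.Sleep(interval)
	}
}

// replay returns a readyCount function that walks through a fixed,
// non-empty series of observations and then keeps returning the last one.
// It exists only to simulate a node dipping to NotReady without a cluster.
func replay(observations []int) func() int {
	i := 0
	return func() int {
		n := observations[i]
		if i < len(observations)-1 {
			i++
		}
		return n
	}
}
```

For example, `waitForStableNodes(2, 3, 10*time.Millisecond, time.Second, replay([]int{2, 1, 1, 2, 2, 2}))` returns nil only after the simulated node has recovered and the count has held at 2 for three polls. Because the expected value comes from the declared replicas rather than from a one-time snapshot, a dip during a kubelet restart delays the wait but cannot lower the target.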
      

              rh-ee-aabdelre Ahmed Abdalla Abdelrehim
              Yu Li