OpenShift Bugs: OCPBUGS-66063

[oc] Kubectl subresource GET test flaking



      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      [sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type

      Significant regression detected.
Fisher's Exact probability of a regression: 100.00%.
      Test pass rate dropped from 100.00% to 96.58%.

      Sample (being evaluated) Release: 4.21
      Start Time: 2025-11-19T00:00:00Z
      End Time: 2025-11-26T12:00:00Z
      Success Rate: 96.58%
      Successes: 141
      Failures: 5
      Flakes: 0
      Base (historical) Release: 4.20
      Start Time: 2025-09-21T00:00:00Z
      End Time: 2025-10-21T23:59:59Z
      Success Rate: 100.00%
      Successes: 741
      Failures: 0
      Flakes: 0

      View the test details report for additional context.

      Summary

      Test "[sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type" fails due to race condition with Node Lifecycle test

      Description

      The test [sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type fails intermittently when running in parallel with [sig-node] Node Lifecycle should run through the lifecycle of a node [Conformance].

      Root Cause

A race condition between two concurrently running tests (demonstrated by the sketch after this list):

1. The Node Lifecycle test creates a fake node with the name pattern e2e-fake-node-<random>
2. The kubectl subresource test lists all nodes and selects the first node in the list
3. The Node Lifecycle test deletes the fake node as part of its lifecycle testing
4. The kubectl subresource test attempts to GET the selected node, but it no longer exists
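
This window can be reproduced deterministically without a cluster. The snippet below is a minimal sketch using client-go's fake clientset, not code from the repository; the node names are made up to mirror the CI failure:

package main

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes/fake"
)

func main() {
    ctx := context.Background()

    // Seed the fake API with two nodes; the fake node is added first so
    // List returns it first, mirroring the unlucky ordering seen in CI.
    c := fake.NewSimpleClientset(
        &v1.Node{ObjectMeta: metav1.ObjectMeta{Name: "e2e-fake-node-9r4fx"}},
        &v1.Node{ObjectMeta: metav1.ObjectMeta{Name: "worker-0"}},
    )

    // Step 2: the kubectl test lists nodes and picks the first one.
    nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    picked := nodes.Items[0].Name

    // Step 3: the Node Lifecycle test deletes its fake node in the window
    // between the kubectl test's List and Get.
    if err := c.CoreV1().Nodes().Delete(ctx, "e2e-fake-node-9r4fx", metav1.DeleteOptions{}); err != nil {
        panic(err)
    }

    // Step 4: the GET now fails with NotFound, matching the CI error.
    _, err = c.CoreV1().Nodes().Get(ctx, picked, metav1.GetOptions{})
    fmt.Printf("GET %s -> IsNotFound=%v\n", picked, apierrors.IsNotFound(err))
}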

      Error Message

      Error from server (NotFound): nodes "e2e-fake-node-9r4fx" not found
      exit status 1
      

      Steps to Reproduce

      1. Run e2e tests with parallel execution enabled
      2. Wait for both tests to run concurrently
3. Observe an intermittent failure when the fake node is selected by the kubectl subresource test

      Evidence

      Timing from Build Log

      Both tests started and completed within the same second:

      • started: 0/237/639 "[sig-node] Node Lifecycle should run through the lifecycle of a node [Conformance]"
      • started: 0/239/639 "[sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type"
      • passed: (1.7s) 2025-11-21T07:35:09 "[sig-node] Node Lifecycle should run through the lifecycle of a node [Conformance]"
      • failed: (1.8s) 2025-11-21T07:35:09 "[sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type"

      Test Code Locations

      Node Lifecycle test (creates fake node):
      vendor/k8s.io/kubernetes/test/e2e/node/node_lifecycle.go:61

      fakeNode := v1.Node{
          ObjectMeta: metav1.ObjectMeta{
              Name: "e2e-fake-node-" + utilrand.String(5),
          },
          Spec: v1.NodeSpec{
              Unschedulable: true,
          },
    // ... (remaining fields elided)
      }
      

      kubectl subresource test (picks first node):
      vendor/k8s.io/kubernetes/test/e2e/kubectl/kubectl.go:2141-2144

      nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
      framework.ExpectNoError(err)
      gomega.Expect(nodes.Items).ToNot(gomega.BeEmpty())
      node := nodes.Items[0]  // Gets first node - could be fake node!
      

      Cluster Operator Warning

At 2025-11-21T07:35:09.123, the kube-apiserver cluster operator logged:

      clusteroperator/kube-apiserver condition/Upgradeable reason/KubeletMinorVersion_KubeletVersionUnknown
      KubeletMinorVersionUpgradeable: Unable to determine the kubelet version on node e2e-fake-node-9r4fx: Version string empty
      

      This confirms the fake node existed in the cluster but had no real kubelet.

      Impact

• Test flakes intermittently under parallel e2e execution
• Causes CI job failures despite there being no underlying product issue
      • Affects OpenShift 4.21 nightly upgrade testing

      Affected Versions

      • 4.21 (observed)
      • Likely affects all versions with parallel e2e test execution

      Recommendations

      The kubectl subresource test should be more defensive when selecting a node:

      1. Filter out nodes with name pattern e2e-fake-node-* from the list
      2. Select a node that is actually Ready and has a kubelet running
      3. Add retry logic if the selected node disappears between list and get operations

Example fix (addressing recommendation 1) in vendor/k8s.io/kubernetes/test/e2e/kubectl/kubectl.go; a sketch covering recommendations 2 and 3 follows it:

      nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
      framework.ExpectNoError(err)
      gomega.Expect(nodes.Items).ToNot(gomega.BeEmpty())
      
      // Filter out fake nodes created by other tests
      var realNode *v1.Node
      for i := range nodes.Items {
          if !strings.HasPrefix(nodes.Items[i].Name, "e2e-fake-node-") {
              realNode = &nodes.Items[i]
              break
          }
      }
      gomega.Expect(realNode).ToNot(gomega.BeNil(), "No real nodes found in cluster")
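
Recommendations 2 and 3 could be layered on top of that filter. The following is a rough sketch rather than the actual upstream change: pickStableNode and isNodeReady are hypothetical helper names, and it assumes an apimachinery release that provides wait.PollUntilContextTimeout (v0.27 or later):

package e2e

import (
    "context"
    "strings"
    "time"

    v1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/kubernetes"
)

// pickStableNode lists nodes, skips e2e-fake-node-* names and not-Ready
// nodes, and re-reads the chosen node so a deletion between List and Get
// just triggers another poll instead of failing the caller.
func pickStableNode(ctx context.Context, c kubernetes.Interface) (*v1.Node, error) {
    var node *v1.Node
    err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 30*time.Second, true,
        func(ctx context.Context) (bool, error) {
            nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
            if err != nil {
                return false, err
            }
            for i := range nodes.Items {
                candidate := &nodes.Items[i]
                if strings.HasPrefix(candidate.Name, "e2e-fake-node-") || !isNodeReady(candidate) {
                    continue
                }
                got, err := c.CoreV1().Nodes().Get(ctx, candidate.Name, metav1.GetOptions{})
                if apierrors.IsNotFound(err) {
                    continue // node vanished after the List; try the next one
                }
                if err != nil {
                    return false, err
                }
                node = got
                return true, nil
            }
            return false, nil // no suitable node this round; poll again
        })
    return node, err
}

// isNodeReady reports whether the node's Ready condition is True.
func isNodeReady(n *v1.Node) bool {
    for _, cond := range n.Status.Conditions {
        if cond.Type == v1.NodeReady {
            return cond.Status == v1.ConditionTrue
        }
    }
    return false
}

Because the List/Get pair is retried as a unit, a node deleted mid-selection simply drops out of the next candidate set.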
      

      Additional Information

      Filed by: dgoodwin@redhat.com

Assignee: Arda Guclu (aguclu@redhat.com)
Reporter: OpenShift Technical Release Team (openshift-trt)