Type: Bug
Resolution: Unresolved
Affects Version: 4.21
(Feel free to update this bug's summary to be more specific.)
Component Readiness has found a potential regression in the following test:
[sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type
Significant regression detected.
Fishers Exact probability of a regression: 100.00%.
Test pass rate dropped from 100.00% to 96.58%.
Sample (being evaluated) Release: 4.21
Start Time: 2025-11-19T00:00:00Z
End Time: 2025-11-26T12:00:00Z
Success Rate: 96.58%
Successes: 141
Failures: 5
Flakes: 0
Base (historical) Release: 4.20
Start Time: 2025-09-21T00:00:00Z
End Time: 2025-10-21T23:59:59Z
Success Rate: 100.00%
Successes: 741
Failures: 0
Flakes: 0
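For reference, the sample pass rate is successes divided by total runs: 141 / (141 + 5) ≈ 96.58%, versus 741 / 741 = 100.00% for the base release.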
View the test details report for additional context.
Summary
Test "[sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type" fails due to race condition with Node Lifecycle test
Description
The test [sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type fails intermittently when running in parallel with [sig-node] Node Lifecycle should run through the lifecycle of a node [Conformance].
Root Cause
Race condition between two concurrently running tests:
- The Node Lifecycle test creates a fake node with pattern e2e-fake-node-<random>
- The kubectl subresource test lists all nodes and selects the first node from the list
- The Node Lifecycle test deletes the fake node as part of its lifecycle testing
- The kubectl subresource test then attempts to GET that node, but it no longer exists (see the sketch below)
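A minimal Go sketch of the race window follows, assuming a client-go clientset c plus the ctx, framework, and metav1 identifiers already in scope in the test code quoted under Evidence; it mirrors the shape of the test, not its exact implementation:
// Illustration only: how a node listed at step 1 can be gone by step 2.
nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
framework.ExpectNoError(err)
target := nodes.Items[0].Name // may be "e2e-fake-node-<random>" from the Node Lifecycle test

// If the Node Lifecycle test deletes its fake node at this point,
// the follow-up lookup (and the equivalent kubectl GET) fails:
_, err = c.CoreV1().Nodes().Get(ctx, target, metav1.GetOptions{})
// err is now a NotFound error, which kubectl surfaces as:
//   Error from server (NotFound): nodes "e2e-fake-node-..." not found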
Error Message
Error from server (NotFound): nodes "e2e-fake-node-9r4fx" not found
exit status 1
Steps to Reproduce
- Run e2e tests with parallel execution enabled
- Wait for the suite to schedule both tests concurrently
- Observe the intermittent failure when the fake node happens to be selected by the kubectl subresource test
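In OpenShift CI both tests run as part of the parallel conformance suite (for example, openshift-tests run openshift/conformance/parallel), so any scheduling that overlaps the two tests can hit this window.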
Evidence
Timing from Build Log
Both tests completed within the same second and each ran for under two seconds, so their execution windows overlapped:
- started: 0/237/639 "[sig-node] Node Lifecycle should run through the lifecycle of a node [Conformance]"
- started: 0/239/639 "[sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type"
- passed: (1.7s) 2025-11-21T07:35:09 "[sig-node] Node Lifecycle should run through the lifecycle of a node [Conformance]"
- failed: (1.8s) 2025-11-21T07:35:09 "[sig-cli] Kubectl client kubectl subresource flag GET on status subresource of built-in type (node) returns identical info as GET on the built-in type"
Test Code Locations
Node Lifecycle test (creates fake node):
vendor/k8s.io/kubernetes/test/e2e/node/node_lifecycle.go:61
fakeNode := v1.Node{
    ObjectMeta: metav1.ObjectMeta{
        Name: "e2e-fake-node-" + utilrand.String(5),
    },
    Spec: v1.NodeSpec{
        Unschedulable: true,
    },
    ...
}
kubectl subresource test (picks first node):
vendor/k8s.io/kubernetes/test/e2e/kubectl/kubectl.go:2141-2144
nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
framework.ExpectNoError(err)
gomega.Expect(nodes.Items).ToNot(gomega.BeEmpty())
node := nodes.Items[0] // Gets first node - could be fake node!
Cluster Operator Warning
At 2025-11-21T07:35:09.123, the kube-apiserver cluster operator logged:
clusteroperator/kube-apiserver condition/Upgradeable reason/KubeletMinorVersion_KubeletVersionUnknown KubeletMinorVersionUpgradeable: Unable to determine the kubelet version on node e2e-fake-node-9r4fx: Version string empty
This confirms the fake node existed in the cluster but had no real kubelet.
Impact
- Test flakes intermittently under parallel test execution
- Causes CI job failures even though there is no actual product issue
- Affects OpenShift 4.21 nightly upgrade testing
Affected Versions
- 4.21 (observed)
- Likely affects all versions with parallel e2e test execution
Recommendations
The kubectl subresource test should be more defensive when selecting a node:
- Filter out nodes with name pattern e2e-fake-node-* from the list
- Select a node that is actually Ready and has a kubelet running
- Add retry logic in case the selected node disappears between the list and get operations (a sketch follows the example fix below)
Example fix in vendor/k8s.io/kubernetes/test/e2e/kubectl/kubectl.go:
nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
framework.ExpectNoError(err)
gomega.Expect(nodes.Items).ToNot(gomega.BeEmpty())
// Filter out fake nodes created by other tests (requires the "strings" import)
var realNode *v1.Node
for i := range nodes.Items {
    if !strings.HasPrefix(nodes.Items[i].Name, "e2e-fake-node-") {
        realNode = &nodes.Items[i]
        break
    }
}
gomega.Expect(realNode).ToNot(gomega.BeNil(), "No real nodes found in cluster")
node := *realNode // use the filtered node for the rest of the test
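The retry recommendation could look roughly like the sketch below. This is illustrative only: the helper name pickStableNode is hypothetical, and it assumes the additional imports context, strings, time, apierrors (k8s.io/apimachinery/pkg/api/errors), wait (k8s.io/apimachinery/pkg/util/wait), and kubernetes (k8s.io/client-go/kubernetes):
// pickStableNode is a hypothetical helper: it lists nodes, skips fake ones,
// and confirms the chosen node still exists immediately before returning it,
// retrying for a short period if the node disappears between List and Get.
func pickStableNode(ctx context.Context, c kubernetes.Interface) (*v1.Node, error) {
    var picked *v1.Node
    err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 30*time.Second, true,
        func(ctx context.Context) (bool, error) {
            nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
            if err != nil {
                return false, err
            }
            for i := range nodes.Items {
                name := nodes.Items[i].Name
                if strings.HasPrefix(name, "e2e-fake-node-") {
                    continue // skip fake nodes created by other tests
                }
                if _, err := c.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{}); err != nil {
                    if apierrors.IsNotFound(err) {
                        continue // node vanished after the List; try the next one
                    }
                    return false, err
                }
                picked = &nodes.Items[i]
                return true, nil
            }
            return false, nil // no suitable node found this round; poll again
        })
    return picked, err
}
Checking node conditions (for example, Ready) inside the same loop would also address the second recommendation above.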
Additional Information
- Prow Job: periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-upgrade-fips
- Build ID: 1991735005105098752
- Job URL: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-upgrade-fips/1991735005105098752
- Test Type: Conformance test (upstream Kubernetes)
- Platform: AWS with OVN and FIPS
Filed by: dgoodwin@redhat.com