OpenShift Bugs / OCPBUGS-32934

Daemon set test failing occasionally [4.16]


      Component Readiness has found a potential regression in [sig-apps] Daemon set [Serial] should surge pods onto nodes when spec was updated and update strategy is RollingUpdate [Suite:openshift/conformance/serial] [Suite:k8s].

      Probability of significant regression: 98.27%

      Sample (being evaluated) Release: 4.16
      Start Time: 2024-04-18T00:00:00Z
      End Time: 2024-04-24T23:59:59Z
      Success Rate: 90.00%
      Successes: 27
      Failures: 3
      Flakes: 0

      Base (historical) Release: 4.15
      Start Time: 2024-02-01T00:00:00Z
      End Time: 2024-02-28T23:59:59Z
      Success Rate: 100.00%
      Successes: 83
      Failures: 0
      Flakes: 0

      View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=openshift-controller-manager%20%2F%20apps&confidence=95&environment=sdn%20no-upgrade%20amd64%20aws%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&pity=5&platform=aws&sampleEndTime=2024-04-24%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-04-18%2000%3A00%3A00&testId=openshift-tests%3Aee8012d039dcb8b357bb3ddb513b54dd&testName=%5Bsig-apps%5D%20Daemon%20set%20%5BSerial%5D%20should%20surge%20pods%20onto%20nodes%20when%20spec%20was%20updated%20and%20update%20strategy%20is%20RollingUpdate%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%20%5BSuite%3Ak8s%5D&upgrade=no-upgrade&variant=serial
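
      For context, the test exercises the DaemonSet maxSurge rolling-update path: with a surge budget, the controller may start a new-revision pod on a node before the old one is deleted, but the two are only allowed to overlap briefly. Below is a minimal sketch of such a DaemonSet built with the client-go API types; the name, labels, and image are placeholders, not the e2e fixture's own:

      package main

      import (
          appsv1 "k8s.io/api/apps/v1"
          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/apimachinery/pkg/util/intstr"
      )

      // surgeDaemonSet builds a DaemonSet whose update strategy surges: with
      // maxSurge=1 the controller may start the new-revision pod on a node
      // while the old pod is still running (maxUnavailable must then be 0).
      func surgeDaemonSet() *appsv1.DaemonSet {
          maxSurge := intstr.FromInt(1)
          maxUnavailable := intstr.FromInt(0)
          labels := map[string]string{"app": "daemon-set"}
          return &appsv1.DaemonSet{
              ObjectMeta: metav1.ObjectMeta{Name: "daemon-set"},
              Spec: appsv1.DaemonSetSpec{
                  Selector: &metav1.LabelSelector{MatchLabels: labels},
                  UpdateStrategy: appsv1.DaemonSetUpdateStrategy{
                      Type: appsv1.RollingUpdateDaemonSetStrategyType,
                      RollingUpdate: &appsv1.RollingUpdateDaemonSet{
                          MaxSurge:       &maxSurge,
                          MaxUnavailable: &maxUnavailable,
                      },
                  },
                  Template: corev1.PodTemplateSpec{
                      ObjectMeta: metav1.ObjectMeta{Labels: labels},
                      Spec: corev1.PodSpec{
                          Containers: []corev1.Container{{
                              Name:  "app",
                              Image: "registry.k8s.io/pause:3.9", // placeholder image
                          }},
                      },
                  },
              },
          }
      }

      Updating the pod template of such a DaemonSet is what triggers the surge behavior the test then watches.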

       

      Looking at this job as an example, the test failed with this error message:

       

       invariants were violated during daemonset update: An old pod with UID f44d840a-4430-4666-addd-cc3fae7a1e8a has been running alongside a newer version for longer than 1m0s { s: "invariants were violated during daemonset update:\nAn old pod with UID f44d840a-4430-4666-addd-cc3fae7a1e8a has been running alongside a newer version for longer than 1m0s", }
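
      The message comes from an invariant the test enforces while polling: an old-revision pod and a new-revision pod may coexist on a node only for a bounded time. The following is a hedged sketch of that kind of check; the function name, the map-based bookkeeping, and the grace-period parameter are illustrative, and the real check in the upstream e2e suite tracks individual pods by UID:

      package main

      import (
          "fmt"
          "time"

          corev1 "k8s.io/api/core/v1"
      )

      // checkSurgeInvariant is a simplified version of the violated check:
      // pods from the old and new revisions may share a node only for a
      // bounded grace period. firstSeen records when the overlap on a node
      // was first observed.
      func checkSurgeInvariant(pods []corev1.Pod, newHash string,
          firstSeen map[string]time.Time, now time.Time, grace time.Duration) error {
          byNode := map[string][]corev1.Pod{}
          for _, p := range pods {
              byNode[p.Spec.NodeName] = append(byNode[p.Spec.NodeName], p)
          }
          for node, ps := range byNode {
              hasOld, hasNew := false, false
              for _, p := range ps {
                  if p.Labels["controller-revision-hash"] == newHash {
                      hasNew = true
                  } else {
                      hasOld = true
                  }
              }
              if hasOld && hasNew {
                  start, seen := firstSeen[node]
                  if !seen {
                      firstSeen[node] = now
                  } else if now.Sub(start) > grace {
                      return fmt.Errorf("an old pod on %s has been running alongside a newer version for longer than %s", node, grace)
                  }
              } else {
                  delete(firstSeen, node) // overlap resolved; reset the timer
              }
          }
          return nil
      }

      With a 1m0s grace period, a deletion that lands even a few hundred milliseconds after the last poll still shows up as a violation.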

       

      The last query from the log (blob:https://prow.ci.openshift.org/bbb02cd3-e004-48be-aef8-8ce6ef7acc47) shows the conflicting pods at 18:46:47.812:

       
      Apr 23 18:46:47.812: INFO: Node Version Name UID Deleted Ready
      Apr 23 18:46:47.812: INFO: ip-10-0-22-60.us-west-1.compute.internal 1 daemon-set-42bhb f44d840a-4430-4666-addd-cc3fae7a1e8a false true
      Apr 23 18:46:47.812: INFO: ip-10-0-22-60.us-west-1.compute.internal 2 daemon-set-c5p92 02f9a3ba-cc0c-4954-bc29-65b4b954b93b false true
      Apr 23 18:46:47.812: INFO: ip-10-0-28-200.us-west-1.compute.internal 1 daemon-set-7p7hs 4ba591c7-623a-4397-b99d-3b2616b5a787 false true
      Apr 23 18:46:47.812: INFO: ip-10-0-95-178.us-west-1.compute.internal 2 daemon-set-chhhl 53d07493-c0f4-46f1-9365-f67be6ac993b false true
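
      Note that ip-10-0-22-60.us-west-1.compute.internal is the only node still listing pods from both revisions: daemon-set-42bhb (version 1, UID f44d840a-4430-4666-addd-cc3fae7a1e8a, matching the error message) alongside daemon-set-c5p92 (version 2). That pair is exactly the overlap the invariant timed out on.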

       

      Yet if you look at the journal from ip-10-0-22-60.us-west-1.compute.internal, the kubelet starts deleting pod daemon-set-42bhb at 18:46:48.140434:

       

      Apr 23 18:46:48.140434 ip-10-0-22-60 kubenswrapper[1432]: I0423 18:46:48.140408    1432 kubelet.go:2445] "SyncLoop DELETE" source="api" pods=["e2e-daemonsets-9316/daemon-set-42bhb"]

       
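      That is 18:46:48.140 minus 18:46:47.812, about 0.33s after the final poll above, so the DELETE for the old pod was effectively in flight when the test declared the invariant violated.
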

      Should the test be waiting longer, or is there a legitimate problem with the delay?
