- Bug
- Resolution: Duplicate
- Major
- None
- 4.16
- None
- Proposed
- False
Component Readiness has found a potential regression in [sig-apps] Daemon set [Serial] should surge pods onto nodes when spec was updated and update strategy is RollingUpdate [Suite:openshift/conformance/serial] [Suite:k8s].
Probability of significant regression: 98.27%
Sample (being evaluated) Release: 4.16
Start Time: 2024-04-18T00:00:00Z
End Time: 2024-04-24T23:59:59Z
Success Rate: 90.00%
Successes: 27
Failures: 3
Flakes: 0
Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 100.00%
Successes: 83
Failures: 0
Flakes: 0
Looking at this job as an example, the test failed with this error message:
invariants were violated during daemonset update: An old pod with UID f44d840a-4430-4666-addd-cc3fae7a1e8a has been running alongside a newer version for longer than 1m0s { s: "invariants were violated during daemonset update:\nAn old pod with UID f44d840a-4430-4666-addd-cc3fae7a1e8a has been running alongside a newer version for longer than 1m0s", }
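For context, the invariant being enforced is roughly: once a newer-generation DaemonSet pod is ready on a node, an old-generation pod may keep running alongside it only for a bounded window (1m0s here). The sketch below is a minimal, self-contained restatement of that check for illustration only; it is not the actual test code, and the types, field names, and tolerance handling are assumptions.

{code:go}
package main

import (
	"fmt"
	"time"
)

// podRecord mirrors the columns of the test's status dump
// (node, template generation, name, deleted, ready). Illustrative only.
type podRecord struct {
	Node      string
	Version   int
	Name      string
	Deleted   bool
	Ready     bool
	FirstSeen time.Time // when this pod was first observed by the poll loop
}

// checkSurgeInvariant is a hypothetical restatement of the violated invariant:
// on any node, once a newer-generation pod is ready, an older-generation pod
// may linger only up to `tolerance` before the check reports a violation.
func checkSurgeInvariant(pods []podRecord, now time.Time, tolerance time.Duration) error {
	maxVersion := map[string]int{}
	newestReadySince := map[string]time.Time{}

	// First pass: find the newest ready, non-deleted generation per node
	// and when it was first observed.
	for _, p := range pods {
		if p.Ready && !p.Deleted && p.Version > maxVersion[p.Node] {
			maxVersion[p.Node] = p.Version
			newestReadySince[p.Node] = p.FirstSeen
		}
	}

	// Second pass: any older-generation pod still present after the newer
	// one has existed longer than the tolerance is a violation.
	for _, p := range pods {
		if p.Deleted || p.Version >= maxVersion[p.Node] {
			continue
		}
		if since, ok := newestReadySince[p.Node]; ok && now.Sub(since) > tolerance {
			return fmt.Errorf("an old pod %s has been running alongside a newer version for longer than %v", p.Name, tolerance)
		}
	}
	return nil
}

func main() {
	now := time.Now()
	pods := []podRecord{
		// Mirrors the two pods on ip-10-0-22-60 from the table below.
		{Node: "ip-10-0-22-60", Version: 1, Name: "daemon-set-42bhb", Ready: true, FirstSeen: now.Add(-5 * time.Minute)},
		{Node: "ip-10-0-22-60", Version: 2, Name: "daemon-set-c5p92", Ready: true, FirstSeen: now.Add(-90 * time.Second)},
	}
	fmt.Println(checkSurgeInvariant(pods, now, time.Minute))
}
{code}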
The last query from the log (https://prow.ci.openshift.org/bbb02cd3-e004-48be-aef8-8ce6ef7acc47) shows the conflicting pods at 18:46:47.812:
{{ Apr 23 18:46:47.812: INFO: Node Version Name UID Deleted Ready
Apr 23 18:46:47.812: INFO: ip-10-0-22-60.us-west-1.compute.internal 1 daemon-set-42bhb f44d840a-4430-4666-addd-cc3fae7a1e8a false true
Apr 23 18:46:47.812: INFO: ip-10-0-22-60.us-west-1.compute.internal 2 daemon-set-c5p92 02f9a3ba-cc0c-4954-bc29-65b4b954b93b false true
Apr 23 18:46:47.812: INFO: ip-10-0-28-200.us-west-1.compute.internal 1 daemon-set-7p7hs 4ba591c7-623a-4397-b99d-3b2616b5a787 false true
Apr 23 18:46:47.812: INFO: ip-10-0-95-178.us-west-1.compute.internal 2 daemon-set-chhhl 53d07493-c0f4-46f1-9365-f67be6ac993b false true
}}
Yet if you look at the journal from ip-10-0-22-60.us-west-1.compute.internal, the kubelet starts deleting pod daemon-set-42bhb at 18:46:48.140434:
Apr 23 18:46:48.140434 ip-10-0-22-60 kubenswrapper[1432]: I0423 18:46:48.140408 1432 kubelet.go:2445] "SyncLoop DELETE" source="api" pods=["e2e-daemonsets-9316/daemon-set-42bhb"]
Should the test be waiting longer, or is there a legitimate problem with the delay?
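If the delay turns out to be acceptable, one option would be for the test to re-check the invariant for a short grace period before failing; in this run the delete landed only ~330ms after the final poll that reported the violation. The sketch below is a hypothetical illustration of that idea, not a proposed patch; the helper name and the grace/interval values are made up.

{code:go}
package main

import (
	"errors"
	"fmt"
	"time"
)

// recheckWithGrace re-runs a failing check for a short grace period before
// treating the failure as real, to tolerate deletions that land just after
// the final poll. Illustrative only; values and names are assumptions.
func recheckWithGrace(check func() error, grace, interval time.Duration) error {
	deadline := time.Now().Add(grace)
	var lastErr error
	for {
		lastErr = check()
		if lastErr == nil {
			return nil // the old pod disappeared; not a real violation
		}
		if time.Now().After(deadline) {
			return lastErr // still violating after the grace period
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulate the situation from the logs: the old pod is deleted ~330ms
	// after the poll that reported the violation.
	deletedAt := time.Now().Add(330 * time.Millisecond)
	check := func() error {
		if time.Now().Before(deletedAt) {
			return errors.New("old pod daemon-set-42bhb still running alongside a newer version")
		}
		return nil
	}
	fmt.Println(recheckWithGrace(check, 5*time.Second, 100*time.Millisecond))
}
{code}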
- clones OCPBUGS-32934 Daemon set test failing occasionally [4.16] (ASSIGNED)