-
Bug
-
Resolution: Done-Errata
-
Undefined
-
None
-
4.13, 4.14
-
Important
-
None
-
Rejected
-
False
Description of problem:
On an SNO DU node, some of the test workload pods are restarted because of failed probes after the node has been left running for multiple days.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-11-06-054655
How reproducible:
Intermittent. The issue reproduced on 2 setups after the application had been left running for more than 2 days.
Steps to Reproduce:
1. Deploy SNO with DU profile
2. Create test app from https://gitlab.cee.redhat.com/ocp-edge-qe/vdu-workload-emulator
3. Leave the setup running for 2 days
4. Check test app pods status
Actual results:
Some of the pods report restarts:
[kni@registry.kni-qe-22 ~]$ oc -n test get pods
NAME                                  READY   STATUS    RESTARTS        AGE
test1-deployment-69499db98d-688q9     1/1     Running   0               3d20h
test10-0                              2/2     Running   0               3d20h
test10-1                              2/2     Running   0               3d20h
test10-1-deployment-5b89b4668f-tmbp6  3/3     Running   0               3d20h
test11-0                              2/2     Running   0               3d20h
test12-0                              2/2     Running   0               3d20h
test13-0                              2/2     Running   0               3d20h
test14-0                              2/2     Running   0               3d20h
test15-0                              2/2     Running   0               3d20h
test15-1                              2/2     Running   0               3d20h
test16-deployment-5b59fc6758-bcdmh    1/1     Running   0               3d20h
test16-deployment-5b59fc6758-wl68b    1/1     Running   0               3d20h
test17-0                              2/2     Running   0               3d20h
test17-1                              2/2     Running   0               3d20h
test17-2                              2/2     Running   0               3d20h
test18-deployment-7cd4558687-9zfp2    1/1     Running   0               3d20h
test19-deployment-f56c5c8bc-jcd6w     2/2     Running   0               3d20h
test19-deployment-f56c5c8bc-vmqht     2/2     Running   0               3d20h
test2-deployment-7c8b7b7c65-2cbsq     1/1     Running   0               3d20h
test21-deployment-6cf7bbc97f-rdr5c    1/1     Running   0               3d20h
test22-deployment-bf6d49d7c-26hlv     2/2     Running   0               3d20h
test23-deployment-8d6d46dcd-db9t6     2/2     Running   0               3d20h
test23-deployment-8d6d46dcd-x7wbn     2/2     Running   0               3d20h
test24-deployment-859b544f64-jjfq9    1/1     Running   1 (5h44m ago)   3d20h
test25-deployment-6ffcc8cc6d-mc7wr    1/1     Running   1 (5h44m ago)   3d20h
test26-deployment-58546d77d4-n4cc7    3/3     Running   1 (5h44m ago)   3d20h
test27-deployment-8565cb6fff-4s2w8    1/1     Running   0               3d20h
test28-deployment-555c6ccf69-9gqsr    5/5     Running   2 (5h43m ago)   3d20h
test29-deployment-7d59454fb7-5thrb    2/2     Running   2 (3h13m ago)   3d20h
test30-deployment-69f7b5bfcb-tvwnj    4/4     Running   0               3d20h
test32-deployment-55d7f69dfb-x4ctb    5/5     Running   0               3d20h
test33-deployment-5ddfb848d6-dc8t8    2/2     Running   1 (5h44m ago)   3d20h
test33-deployment-5ddfb848d6-gjj88    2/2     Running   1 (5h44m ago)   3d20h
test33-deployment-5ddfb848d6-xbm99    2/2     Running   1 (5h44m ago)   3d20h
test34-deployment-6bc769fb95-g8shl    2/2     Running   3 (3h13m ago)   3d20h
test34-deployment-6bc769fb95-kf2z4    2/2     Running   2 (3h13m ago)   3d20h
test34-deployment-6bc769fb95-vxqzq    2/2     Running   2 (3h13m ago)   3d20h
test35-deployment-776bcbb759-g8hhb    2/2     Running   2 (3h13m ago)   3d20h
test36-deployment-745c767b74-f5bc2    2/2     Running   3 (3h13m ago)   3d20h
test37-deployment-596b7ddd78-jb6hg    4/4     Running   0               3d20h
test38-deployment-755d7664b8-x9dms    7/7     Running   0               3d20h
test39-deployment-6c8d89c4b6-wqtfx    3/3     Running   0               3d20h
test4-deployment-7f9665cc69-qpx6k     1/1     Running   0               3d20h
test40-deployment-5bb5f98549-ldt8v    1/1     Running   0               3d20h
test41-deployment-694fdf6865-7642k    2/2     Running   2 (3h13m ago)   3d20h
test42-0                              1/1     Running   0               3d20h
test43-deployment-9b7f9ffcf-h77bh     2/2     Running   2 (3h13m ago)   3d20h
test44-deployment-649dccb999-pzmhg    2/2     Running   0               3d20h
test45-0                              2/2     Running   2 (3h13m ago)   3d20h
test46-0                              2/2     Running   0               3d20h
test47-deployment-6ccd4cfddb-6kqmh    3/3     Running   0               3d20h
test48-deployment-d7dcfd86c-fqx69     3/3     Running   0               3d20h
test48-deployment-d7dcfd86c-s27zw     3/3     Running   0               3d20h
test49-deployment-6645d7d66d-tx7nf    1/1     Running   2 (3h13m ago)   3d20h
test5-deployment-6d759959fc-mfl9z     7/7     Running   8 (3h11m ago)   3d20h
test50-deployment-86c9ddc4c4-mvjkv    1/1     Running   0               3d20h
test6-0                               1/1     Running   0               3d20h
test7-0                               2/2     Running   0               3d20h
test8-0                               2/2     Running   0               3d20h
test9-0                               2/2     Running   0               3d20h
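To pick the restarted pods out of a listing this long, the `oc get pods` text output can be filtered for rows with a non-zero RESTARTS count. The following is a minimal sketch, not part of the original report: the function name `restarted_pods` and the `SAMPLE` excerpt are illustrative assumptions, and the parser only assumes the five-column layout shown in the output above.

```python
import re

# Matches one data row of `oc get pods` text output, for example:
#   test5-deployment-6d759959fc-mfl9z 7/7 Running 8 (3h11m ago) 3d20h
# The "(<time> ago)" part is optional and only present when RESTARTS > 0.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*)\s+ago\))?\s+(?P<age>\S+)$"
)


def restarted_pods(text):
    """Return (name, restarts, time_since_last_restart) for pods with restarts > 0.

    `restarted_pods` is a hypothetical helper written for this illustration.
    """
    result = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            # Header line, shell prompt, or blank line: skip it.
            continue
        count = int(match.group("restarts"))
        if count > 0:
            result.append((match.group("name"), count, match.group("last")))
    return result


# Three rows copied from the listing above, plus the header line.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
test1-deployment-69499db98d-688q9 1/1 Running 0 3d20h
test24-deployment-859b544f64-jjfq9 1/1 Running 1 (5h44m ago) 3d20h
test5-deployment-6d759959fc-mfl9z 7/7 Running 8 (3h11m ago) 3d20h
"""

if __name__ == "__main__":
    for name, count, last in restarted_pods(SAMPLE):
        print(f"{name}: {count} restart(s), last {last} ago")
```

In practice the text passed to the function would be the captured output of `oc -n test get pods`; the sketch works on both single-space and column-aligned output because it splits on runs of whitespace.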
Expected results:
Pods are not restarted.
Additional info:
Attaching must-gather and sosreport.
- clones
-
OCPBUGS-3547 On an SNO DU some of the test workload pods get restarted due to failed probes when leaving the node running for multiple days
- Closed
- links to
-
RHBA-2023:6257 OpenShift Container Platform 4.13.z bug fix update