-
Story
-
Resolution: Won't Do
-
Major
-
None
-
None
-
None
-
Improvement
-
3
-
False
-
False
-
-
SDN Sprint 212, SDN Sprint 211, SDN Sprint 214
-
0
-
0.000
Four containers on our OpenShift master nodes do not respond to SIGTERM when
the node is scheduled to reboot during an upgrade. The host times out 30
seconds after sending the SIGTERM and issues a SIGKILL, which terminates the
container so that the node can continue its reboot.
This ticket is for the agnhost container, with the relevant logs below.
The ovndbchecker container has already been fixed with this commit as an
example.
❯ rg 622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25 journal
13802:Nov 17 10:54:35.437143 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: Started crio-conmon-622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.scope.
13803:Nov 17 10:54:35.502725 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: run-runc-622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25-runc.C48ZnV.mount: Succeeded.
13804:Nov 17 10:54:35.507217 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: Started libcontainer container 622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.
13806:Nov 17 10:54:35.648448 ci-op-n70c47rd-82914-54mxf-master-0 crio[1900]: time="2021-11-17 10:54:35.648363913Z" level=info msg="Created container 622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25: e2e-k8s-sig-apps-daemonset-upgrade-9253/ds1-l6j6f/ds1" id=12ce2a57-cc11-4bd7-bf54-c568f8422f3a name=/runtime.v1alpha2.RuntimeService/CreateContainer
13807:Nov 17 10:54:35.652309 ci-op-n70c47rd-82914-54mxf-master-0 crio[1900]: time="2021-11-17 10:54:35.650009675Z" level=info msg="Starting container: 622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25" id=0ef79ce7-a089-4872-a824-c88e7064a1d7 name=/runtime.v1alpha2.RuntimeService/StartContainer
13808:Nov 17 10:54:35.723533 ci-op-n70c47rd-82914-54mxf-master-0 crio[1900]: time="2021-11-17 10:54:35.723453845Z" level=info msg="Started container" PID=92281 containerID=622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25 description=e2e-k8s-sig-apps-daemonset-upgrade-9253/ds1-l6j6f/ds1 id=0ef79ce7-a089-4872-a824-c88e7064a1d7 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=93b6249b837bba6b6479c71fae1f72fd1e4b69aba7c8ad5df7cf7eff9f2dd975
13809:Nov 17 10:54:36.156269 ci-op-n70c47rd-82914-54mxf-master-0 hyperkube[1925]: I1117 10:54:36.156229 1925 kubelet.go:2114] "SyncLoop (PLEG): event for pod" pod="e2e-k8s-sig-apps-daemonset-upgrade-9253/ds1-l6j6f" event=&{ID:b4bb63ec-1b81-4f37-a92b-1428f37da348 Type:ContainerStarted Data:622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25}
29893:Nov 17 11:39:49.041298 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: Stopping libcontainer container 622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.
30280:Nov 17 11:40:19.151721 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: crio-622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.scope: *Stopping timed out. Killing*.
30281:Nov 17 11:40:19.151940 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: crio-622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.scope: Killing process 92281 (agnhost) with signal SIGKILL.
30289:Nov 17 11:40:19.180974 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: crio-conmon-622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.scope: Succeeded.
30290:Nov 17 11:40:19.182091 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: crio-conmon-622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.scope: Consumed 60ms CPU time
30299:Nov 17 11:40:19.205993 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: crio-622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.scope: Failed with result 'timeout'.
30300:Nov 17 11:40:19.207206 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: Stopped libcontainer container 622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.
30301:Nov 17 11:40:19.218412 ci-op-n70c47rd-82914-54mxf-master-0 systemd[1]: crio-622f9a946738f495201087c8ebf0f03ff704219d58c1a17db2a5e7673bcb7a25.scope: Consumed 195ms CPU time
The log above came from this journal file, produced by this job.
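For reference, the general shape of a fix for this class of problem is an explicit SIGTERM handler. The sketch below is illustrative only: it is written in Python for brevity, it is not the agnhost code or the ovndbchecker commit, and the names `install_handlers`, `stop_requested` and `run` are hypothetical. It also assumes the cause is the usual one, a process running as PID 1 in the container's PID namespace, which only receives signals it has installed a handler for. That assumption has not been confirmed for agnhost in this ticket.

```python
import signal
import time

# Flipped by the signal handler; the main loop polls it.
_stop = False


def _request_stop(signum, frame):
    """Signal handler: only record the request, do no real work here."""
    global _stop
    _stop = True


def stop_requested():
    """Return True once SIGTERM or SIGINT has been received."""
    return _stop


def install_handlers():
    # Without an explicit handler, a PID 1 process ignores SIGTERM,
    # so the runtime ends up waiting for the stop timeout and sending SIGKILL.
    signal.signal(signal.SIGTERM, _request_stop)
    signal.signal(signal.SIGINT, _request_stop)


def run(poll_interval=0.5):
    """Hypothetical main loop: work until a stop is requested, then exit 0."""
    install_handlers()
    while not stop_requested():
        time.sleep(poll_interval)  # placeholder for the container's real work
    # Release resources here (close listeners, flush logs) before returning.
    return 0
```

Used as the container entry point (for example `raise SystemExit(run())`), the process would exit within roughly one poll interval of the SIGTERM, well inside the 30 second stop timeout seen in the journal.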
- is cloned by
-
SDN-2505 fix webhook container from ignoring SIGTERM on node reboot
-
- Closed
-