OpenShift Bugs / OCPBUGS-28626

Stateful pods deletion taking time


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • Affects Versions: 4.12.z, 4.14.z
    • Component: Node / CRI-O
    • Quality / Stability / Reliability
    • Sprint: OCPNODE Sprint 249 (Blue)

      Description of problem:

      We observed a delay in removing a StatefulSet pod after it is deleted. Deleting the StatefulSet pod took 2m34s:
      ~~~
      [cranuser17@master2 ~]$ date
      Wed Jan 24 09:37:36 UTC 2024
      [cranuser17@master2 ~]$ time kubectl delete pods po-cran17-oamext-0
      pod "po-cran17-oamext-0" deletedreal    2m34.064s
      user    0m0.347s
      sys     0m0.062s
      ~~~
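      For reference, a minimal sketch of how the measurement can be reproduced and the matching kubelet logs captured (the node name below is a placeholder for the worker actually hosting the pod; cluster-admin access via oc debug is assumed):
      ~~~
      # Time the deletion of the StatefulSet pod (pod and namespace names taken from this report)
      time kubectl delete pod po-cran17-oamext-0 -n cran17

      # In a second terminal, follow the kubelet journal on the node hosting the pod
      # (replace "worker26" with the actual node name)
      oc debug node/worker26 -- chroot /host journalctl -u kubelet -f
      ~~~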
      Kubelet logs:
      Deletion was initiated at Jan 24 09:37:41 and the pod was removed at Jan 24 09:40:15. "failed to "KillContainer"" errors were logged for a few containers.
      ~~~
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.402881 2020497 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[cran17/po-cran17-oamext-0]
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.720960 2020497 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[cran17/po-cran17-oamext-0]
      Jan 24 09:40:11 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: E0124 09:40:11.404904 2020497 kuberuntime_container.go:772] "Kill container failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerName="ctr-cran17-givaxi" containerID={Type:cri-o ID:3d88eb75190929106d71ae370ff00898c8762c71af95e314a3b14edb8ebf9442}
      Jan 24 09:40:11 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: E0124 09:40:11.404917 2020497 kuberuntime_container.go:772] "Kill container failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerName="ctr-cran17-tracecontroller" containerID={Type:cri-o ID:8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36}
      Jan 24 09:40:11 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: E0124 09:40:11.404917 2020497 kuberuntime_container.go:772] "Kill container failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerName="ctr-cran17-ipsecmgmt" containerID={Type:cri-o ID:a81a41ed021eebb421b52d1ca32217a6b8b57ff6251068ec03e0a54a930705b6}
      Jan 24 09:40:13 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:40:13.630339 2020497 event.go:307] "Event occurred" object="cran17/po-cran17-oamext-0" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FailedKillPod" message="error killing pod: [failed to \"KillContainer\" for \"ctr-cran17-givaxi\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\", failed to \"KillContainer\" for \"ctr-cran17-tracecontroller\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\", failed to \"KillContainer\" for \"ctr-cran17-ipsecmgmt\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\"]"
      Jan 24 09:40:13 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:40:13.630384 2020497 kubelet.go:2038] "SyncTerminatingPod exit" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7
      Jan 24 09:40:13 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: E0124 09:40:13.630438 2020497 pod_workers.go:1300] "Error syncing pod, skipping" err="[failed to \"KillContainer\" for \"ctr-cran17-givaxi\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\", failed to \"KillContainer\" for \"ctr-cran17-tracecontroller\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\", failed to \"KillContainer\" for \"ctr-cran17-ipsecmgmt\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\"]" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7
      Jan 24 09:40:15 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:40:15.275130 2020497 kubelet.go:2441] "SyncLoop DELETE" source="api" pods=[cran17/po-cran17-oamext-0]
      Jan 24 09:40:15 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:40:15.280642 2020497 kubelet.go:2435] "SyncLoop REMOVE" source="api" pods=[cran17/po-cran17-oamext-0]
      ~~~
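      The gap between the SyncLoop DELETE at 09:37:41 and the "StopContainer from runtime service failed" errors at 09:40:11 is exactly 2m30s, which matches the pod's 30s grace period plus the kubelet's default 2m runtime request timeout, assuming the default --runtime-request-timeout is in effect. A hedged sketch for checking that setting on the node (node name and config path are assumptions for a stock RHCOS worker):
      ~~~
      # 09:37:41 -> 09:40:11 is 150s: 30s grace period + the default 120s runtime request timeout
      # Check whether runtimeRequestTimeout is overridden in the rendered kubelet config
      oc debug node/worker26 -- chroot /host grep -i runtimeRequestTimeout /etc/kubernetes/kubelet.conf
      ~~~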
      
      Below are the kubelet logs for one specific container:
      ~~~
      $ cat kubelet.log | grep "8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36"
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.404327 2020497 kuberuntime_container.go:738] "Killing container with a grace period override" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerName="ctr-cran17-tracecontroller" containerID="cri-o://8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36" gracePeriod=30
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.404356 2020497 kuberuntime_container.go:742] "Killing container with a grace period" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerName="ctr-cran17-tracecontroller" containerID="cri-o://8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36" gracePeriod=30
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.514172 2020497 manager.go:1027] Destroyed container: "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podff632c79_8f50_462b_8c34_2b9bd4b052c7.slice/crio-conmon-8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36.scope" (aliases: [], namespace: "")
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.514200 2020497 handler.go:325] Added event &{/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podff632c79_8f50_462b_8c34_2b9bd4b052c7.slice/crio-conmon-8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36.scope 2024-01-24 09:37:41.514191992 +0000 UTC m=+644.593828427 containerDeletion {<nil>}}
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.744891 2020497 generic.go:184] "GenericPLEG" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerID="8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36" oldState=running newState=exited
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.815924 2020497 generic.go:334] "Generic (PLEG): container finished" podID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerID="8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36" exitCode=0
      Jan 24 09:37:41 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:41.816340 2020497 kubelet.go:2457] "SyncLoop (PLEG): event for pod" pod="cran17/po-cran17-oamext-0" event=&{ID:ff632c79-8f50-462b-8c34-2b9bd4b052c7 Type:ContainerDied Data:8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36}
      Jan 24 09:37:58 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: I0124 09:37:58.174099 2020497 handler.go:325] Added event &{/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podff632c79_8f50_462b_8c34_2b9bd4b052c7.slice/crio-8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36.scope/container 2024-01-24 09:37:58.174094763 +0000 UTC m=+661.253731201 containerDeletion {<nil>}}
      Jan 24 09:40:11 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: E0124 09:40:11.404832 2020497 remote_runtime.go:349] "StopContainer from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36"
      Jan 24 09:40:11 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: E0124 09:40:11.404871 2020497 kuberuntime_container.go:747] "Container termination failed with gracePeriod" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerName="ctr-cran17-tracecontroller" containerID="cri-o://8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36" gracePeriod=30
      Jan 24 09:40:11 worker26.hzdc-vrf2-10-110-11-130.ocp.hz.nsn-rdnet.net kubenswrapper[2020497]: E0124 09:40:11.404917 2020497 kuberuntime_container.go:772] "Kill container failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="cran17/po-cran17-oamext-0" podUID=ff632c79-8f50-462b-8c34-2b9bd4b052c7 containerName="ctr-cran17-tracecontroller" containerID={Type:cri-o ID:8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36}
      ~~~
      A ContainerDied event was received for containerName="ctr-cran17-tracecontroller" at Jan 24 09:37:41, yet the kubelet still reported "StopContainer from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" for the same container.
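      To narrow down where the StopContainer call gets stuck even though the container has already exited, the CRI-O side can be checked for the same container. A minimal sketch (node name is a placeholder; container ID taken from the logs above):
      ~~~
      # Inspect the container state as CRI-O sees it (status, finishedAt, exit code)
      oc debug node/worker26 -- chroot /host crictl inspect 8992870f1915e558ba810245fa40e7c5b274157b08d51f33eccc2ea01cde1a36

      # Search the CRI-O journal for this container ID around the deletion window
      oc debug node/worker26 -- chroot /host journalctl -u crio --since "2024-01-24 09:37:00" --until "2024-01-24 09:41:00" | grep 8992870f1915
      ~~~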
      
      
      The application is configured with a Ceph storage volume (PVC). To confirm whether the delay is caused by storage, the application was also run without the PVC (replacing the PVC with an emptyDir), but the issue is observed without storage as well.
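      A minimal sketch of that storage-free comparison (the pod spec, name, and image below are illustrative placeholders, not the actual workload):
      ~~~
      # Deploy a throwaway copy of the pod with emptyDir instead of the Ceph PVC,
      # then time its deletion the same way (name and image are placeholders)
      kubectl apply -n cran17 -f - <<'EOF'
      apiVersion: v1
      kind: Pod
      metadata:
        name: po-cran17-oamext-emptydir-test
      spec:
        containers:
        - name: app
          image: registry.example.com/cran17/oamext:latest
          volumeMounts:
          - name: data
            mountPath: /data
        volumes:
        - name: data
          emptyDir: {}
      EOF

      time kubectl delete pod po-cran17-oamext-emptydir-test -n cran17
      ~~~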
          

      Version-Release number of selected component (if applicable):

          4.12, 4.14

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      StatefulSet pod deletion takes 2m34s.

      Expected results:

      The pod should be deleted within its termination grace period.

      Additional info:

          

              harpatil@redhat.com Harshal Patil
              rhn-support-aksjadha Akshata Jadhav
              Sunil Choudhary Sunil Choudhary