OpenShift Bugs / OCPBUGS-36892

Node shutdown time varies with crun when containers start on boot vs are started later

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version/s: 4.14.z
    • Component/s: Containers
    • Severity: Low

      Node shutdown time varies with crun depending on whether containers are started on boot or later. With runc, this does not occur: the system always respects every container's terminationGracePeriodSeconds on shutdown.

      With crun:
      If I reboot the node, wait for it to come up fully including all pods, and then create the deployment, scale it, or simply delete the pod so that it is recreated:

      • shutdown, and more specifically the network target, is blocked for 300 seconds until the process is stopped with SIGKILL

      If I reboot the node and wait for all CRI-O containers, including that very same pod, to be brought up on boot, but do not instruct the API to start or stop any pods:

      • shutdown, and more specifically the network target, is not blocked; the journal and network targets are stopped after a few short seconds

      With runc:
      I always get:

      • shutdown, and more specifically the network target, is blocked for 300 seconds until the process is stopped with SIGKILL

      I'm running this test on a 4.14.23 bare-metal SNO node with crun. Note that I found this in a lab while testing something for a customer with a broken configuration. That specific customer uses runc rather than crun, but I still think this issue is worth investigating for crun: it is easy to reproduce and yields confusing results that diverge from the node's behavior with runc.

      Spawn the following deployment:

      cat <<'EOF' | oc apply -f -
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app: test
        name: test
        namespace: default
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: test
        template:
          metadata:
            labels:
              app: test
          spec:
            containers:
            - command:
              - /bin/bash
              - "-c"
              - |
                trap -- '' SIGINT SIGTERM
                while true; do
                    date
                    sleep 1
                done
              image: registry.fedoraproject.org/fedora:latest
              imagePullPolicy: IfNotPresent
              name: fedora
            terminationGracePeriodSeconds: 300
      EOF
      

      The pod of this deployment has the following attributes:

      • its command cannot be stopped by SIGINT or SIGTERM, so SIGKILL is needed
      • the kubelet waits 300 seconds (terminationGracePeriodSeconds) for the process to exit gracefully before sending SIGKILL

      --> When the deployment's pod is deleted, it takes 300 seconds to stop.
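      The signal behavior can be verified locally outside Kubernetes. This is a minimal sketch: it runs the same trap-protected loop as the deployment's command, confirms that SIGTERM is ignored, and that only SIGKILL terminates the process.

      ```shell
      # Same signal-ignoring loop as in the pod spec, run as a background job
      bash -c 'trap -- "" SIGINT SIGTERM; while true; do sleep 1; done' &
      pid=$!
      sleep 1                              # give the trap time to install
      kill -TERM "$pid" 2>/dev/null        # SIGTERM is ignored by the trap
      sleep 1
      if kill -0 "$pid" 2>/dev/null; then survived_term=yes; else survived_term=no; fi
      kill -KILL "$pid" 2>/dev/null        # SIGKILL cannot be trapped
      wait "$pid" 2>/dev/null
      if kill -0 "$pid" 2>/dev/null; then killed=no; else killed=yes; fi
      echo "survived SIGTERM: $survived_term, dead after SIGKILL: $killed"
      ```

      This mirrors what the kubelet observes: the graceful stop request has no effect, so the pod only goes away once the grace period expires and SIGKILL is sent.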

      Test how long it takes to shut down the node:
      ==========

      Ping the node IP:

      ping <host IP>
      

      Shut down the node:

      ssh core@<host IP>
      reboot
      

      Results:

      i) SSH disconnects immediately; the network target and the journal shut down immediately; ping stops almost immediately;
         processes running in CRI-O containers are stopped **after** the journal has stopped (can be seen via IPMI)  (1_1.png)

      ii) SSH disconnects immediately; the shutdown of CRI-O takes a few minutes to complete before the network target and the journal shut down; ping keeps working for minutes
         a) shutdown takes less than 5 minutes, and some containers are still stopped _after_ the journal shut down
         b) shutdown takes the expected 5 minutes because the test-... pod's containers are blocking (2_b_1.png)
            even after this, we still see: Waiting for process  (2_b_2.png)
      

      The problem is that the node shutdown times look random at first. Sometimes the node takes north of 5 minutes to shut down, sometimes it is much faster. As stated above, further testing showed that this is tied to crun and to when the pod is started: if the pod is started during the system startup phase, the network is shut down almost immediately after running the `reboot` command. However, with crun, when I start a pod with a terminationGracePeriodSeconds some time after the node booted, the shutdown is delayed until the pod exits.
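      For illustration, the stop sequence a container runtime performs (SIGTERM, wait out the grace period, then SIGKILL) can be simulated locally. This is a sketch only, not CRI-O's actual implementation, and the grace period is shortened from 300 to 3 seconds so it finishes quickly:

      ```shell
      GRACE=3                                # illustrative; the pod spec uses 300
      bash -c 'trap -- "" SIGINT SIGTERM; while true; do sleep 1; done' &
      pid=$!
      start=$(date +%s)
      kill -TERM "$pid" 2>/dev/null          # step 1: graceful stop request, ignored here
      i=0
      while [ "$i" -lt "$GRACE" ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1; i=$((i+1))                  # step 2: wait out the grace period
      done
      kill -0 "$pid" 2>/dev/null && kill -KILL "$pid"   # step 3: grace expired, force kill
      wait "$pid" 2>/dev/null
      elapsed=$(( $(date +%s) - start ))
      echo "stopped after ${elapsed}s"
      ```

      With the real 300-second grace period, this sequence is exactly what blocks the node shutdown for 5 minutes when it is honored.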

      In the logs, look for the time between:

      Jul 11 17:36:41 sno10.workload.bos2.lab systemd-logind[2242]: System is rebooting.
      (...)
      

      and the actual end of the log. The provided data is from the exact same system on two consecutive boots; the only difference is whether my pod was started automatically on boot or via the API at some later point.
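      As a sketch, that window can be computed from a saved journal excerpt with a small helper. The function name `shutdown_window_seconds`, the sample log lines, and the year argument are all illustrative (journal short timestamps omit the year); GNU date is assumed:

      ```shell
      # Print the seconds between the "System is rebooting." line and the
      # last line of a saved journal excerpt.
      shutdown_window_seconds() {
        local log="$1" year="$2"
        local f l fs ls
        f=$(grep -m1 'System is rebooting' "$log")
        l=$(tail -n 1 "$log")
        # Rebuild "Mon DD YYYY HH:MM:SS" so GNU date can parse it
        fs=$(date -d "$(echo "$f" | awk -v y="$year" '{print $1, $2, y, $3}')" +%s)
        ls=$(date -d "$(echo "$l" | awk -v y="$year" '{print $1, $2, y, $3}')" +%s)
        echo $(( ls - fs ))
      }

      # Example with a hypothetical two-line excerpt:
      cat > /tmp/journal-excerpt.txt <<'EOF'
      Jul 11 17:36:41 sno10.workload.bos2.lab systemd-logind[2242]: System is rebooting.
      Jul 11 17:41:43 sno10.workload.bos2.lab systemd-journald[1123]: Journal stopped
      EOF
      shutdown_window_seconds /tmp/journal-excerpt.txt 2024   # prints 302
      ```

      A window of roughly 300 seconds indicates the grace period was honored; a window of a few seconds indicates the fast-shutdown case described above.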

              Assignee: Kirill Kolyshkin (kolyshkin)
              Reporter: Andreas Karis (akaris@redhat.com)
              QA Contact: Sunil Choudhary
              Votes: 0
              Watchers: 10