
[RHEL-70694] podman cannot stop/restart container because /run/crun/<container_id>/tmpmount is not empty

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Versions: rhel-9.6, rhel-9.4
    • Component: crun
    • Severity: Moderate
    • ZStream
    • Team: rhel-sst-container-tools
    • Blocker/Exception: Approved Blocker
    • Architecture: x86_64

      What were you trying to do that didn't work?

      I am using Ansible's podman_container module to deploy the kepler container, which stores some data inside /run/crun/<container_id>/tmpmount. Stopping the container fails with an error that the state directory cannot be removed because "tmpmount" is not empty.

      [root@compute-0 ~]# podman ps | grep kepler
      acb107052ec8  registry.redhat.io/openshift-power-monitoring/kepler-rhel9:v0.7.10                                        -v=2                  7 minutes ago  Up 6 minutes (healthy)  0.0.0.0:8888->8888/tcp  kepler
      
      [root@compute-0 ~]# ls /run/crun/acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040/tmpmount/
      cgroup.controllers      cgroup.subtree_control  cpuset.mems.effective  machine.slice     misc.current                   sys-kernel-tracing.mount
      cgroup.max.depth        cgroup.threads          dev-hugepages.mount    memory.numa_stat  proc-sys-fs-binfmt_misc.mount  system.slice
      cgroup.max.descendants  cpu.stat                dev-mqueue.mount       memory.reclaim    sys-fs-fuse-connections.mount  user.slice
      cgroup.procs            cpuset.cpus.effective   init.scope             memory.stat       sys-kernel-config.mount
      cgroup.stat             cpuset.cpus.isolated    io.stat                misc.capacity     sys-kernel-debug.mount
      
      [root@compute-0 ~]# podman stop kepler
      2024-12-09T12:45:06.067289Z: cannot rm state directory `/run/crun/acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040`: Directory not empty
      Error: removing container acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040 from runtime: `/usr/bin/crun delete --force acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040` failed: exit status 1  
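
      The tmpmount listing above matches the root of a cgroup2 hierarchy, which suggests a leftover mount under the state directory rather than stray files. A quick way to check this (a diagnostic sketch using standard util-linux tooling, not commands captured in this ticket):

      # Is anything still mounted under the container's crun state dir?
      cid=acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040
      findmnt -R "/run/crun/$cid/tmpmount"
      grep "/run/crun/$cid" /proc/self/mountinfo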

      The same happens with restart:

      [root@compute-0 ~]# podman restart kepler
      2024-12-10T12:03:39.751426Z: cannot rm state directory `/run/crun/ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c`: Directory not empty
      Error: removing container ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c from runtime: `/usr/bin/crun delete --force ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c` failed: exit status 1

      However, on CentOS Stream 9 with podman-5.3.1-1.el9.x86_64 this issue does not surface. In case it's helpful, here are the parameters that were passed when creating this container:

      {
          "image": "{{ edpm_telemetry_kepler_image }}",
          "privileged": "true",
          "restart": "always",
          "ports": ["8888:8888"],
          "command": "-v=2",
          "net": "host",
          "environment": {
              "ENABLE_GPU": "true",
              "EXPOSE_CONTAINER_METRICS": "true",
              "ENABLE_PROCESS_METRICS": "true",
              "EXPOSE_VM_METRICS": "true",
              "EXPOSE_ESTIMATED_IDLE_POWER_METRICS": "false",
              "LIBVIRT_METADATA_URI": "http://openstack.org/xmlns/libvirt/nova/1.1"
          },
      {% if edpm_telemetry_power_monitoring_healthcheck %}
          "healthcheck": {
              "test": "/openstack/healthcheck kepler",
              "mount": "/var/lib/openstack/healthchecks/kepler"
          },
      {% endif %}
          "volumes": [
              "/lib/modules:/lib/modules:ro",
              "/run/libvirt:/run/libvirt:shared,ro",
              "/sys/:/sys:ro",
              "/proc:/proc"
          ]
      } 
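
      For readers without the playbook context, those podman_container parameters translate roughly into the following direct podman invocation (a hand-written equivalent, not a command from this ticket; note that with --net host the published port mapping is effectively redundant):

      podman run -d --name kepler \
        --privileged --restart always --net host -p 8888:8888 \
        -e ENABLE_GPU=true -e EXPOSE_CONTAINER_METRICS=true \
        -e ENABLE_PROCESS_METRICS=true -e EXPOSE_VM_METRICS=true \
        -e EXPOSE_ESTIMATED_IDLE_POWER_METRICS=false \
        -e LIBVIRT_METADATA_URI=http://openstack.org/xmlns/libvirt/nova/1.1 \
        -v /lib/modules:/lib/modules:ro \
        -v /run/libvirt:/run/libvirt:shared,ro \
        -v /sys/:/sys:ro -v /proc:/proc \
        registry.redhat.io/openshift-power-monitoring/kepler-rhel9:v0.7.10 -v=2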

      What is the impact of this issue to you?

      Due to this, restarting the kepler podman container on OpenStack compute nodes is not possible, as the failed stop puts the container into an unrecoverable state. Deleting and recreating it is the only option.

      Please provide the package NVR for which the bug is seen:

      podman-4.9.4-16.el9_4.x86_64

      How reproducible is this bug?:

      Steps to reproduce

      1. Deploy the kepler container with the parameters shown above (privileged, host networking, /sys and /proc bind-mounted).
      2. Run podman stop kepler (or podman restart kepler).
      3. Observe the failure.

      Expected results

      The container stops (or restarts) cleanly.

      Actual results

      crun delete --force fails with "cannot rm state directory ...: Directory not empty", leaving the container in a state that can only be recovered by deleting and recreating it.
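
      A minimal reproduction sketch, assuming (this is inferred from the configuration above, not confirmed in the ticket) that the trigger is running a privileged container with /sys bind-mounted read-only:

      # Hypothetical minimal reproducer; image is a placeholder.
      podman run -d --name tmpmount-repro --privileged \
        -v /sys:/sys:ro registry.access.redhat.com/ubi9/ubi sleep infinity
      ls "/run/crun/$(podman inspect -f '{{.Id}}' tmpmount-repro)/tmpmount/"
      # On affected crun versions the next command is expected to fail
      # with "Directory not empty":
      podman stop tmpmount-repro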


            Giuseppe Scrivano added a comment (edited):

            The easiest fix would be to move to a newer crun version (I suggest 1.16 in this case) rather than hurrying to the latest release, which contains other changes.

            If moving to 1.16 is not possible, we can look at a backport, and in that case I also suggest the "partial" fix we have in 1.16.

            Tom Sweeney added a comment:

            ykulkarn@redhat.com I am planning to have Giuseppe patch 1.16 with this fix and release that in the Zstream in February. We generally do not backport entire versions (i.e. replacing crun 1.16 in RHEL 9 with crun 1.19); there is a lot of danger of messing up upgrades and adding unanticipated changes. However, we'll need to hear back from Giuseppe.

            Yadnesh Kulkarni added a comment:

            Also, as an addition to my previous comment: in the next release cycle, would we get the package with the partial fix (1.16), or a much higher version including gscrivan@redhat.com's recent fix https://github.com/containers/crun/pull/1621 ?

            Yadnesh Kulkarni added a comment:

            tsweeney@redhat.com We've cooked up a workaround in our playbooks so that customers don't hit this issue, at least during deployment. Manually restarting containers will still be an issue for them. Since there's no other option, we will wait until the next release cycle (Feb 25th).

            Thank you all for your help.
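
            The workaround itself is not shown in the ticket; a hypothetical sketch of what such a playbook-level mitigation could look like (the unmount-and-force-remove approach here is an assumption, not the actual change):

            # If stop fails, unmount whatever is left under the crun state
            # directory and force-remove the container:
            cid=$(podman inspect -f '{{.Id}}' kepler)
            if ! podman stop kepler; then
              umount -R "/run/crun/$cid/tmpmount" 2>/dev/null || true
              podman rm -f kepler
            fi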

            Jindrich Novy added a comment:

            tsweeney@redhat.com currently we have this in RHEL 9/10:

            package  branch        version
            ---      ---           ---
            crun     rhel-10-main  1.19-1
            crun     rhel-9.0.0    1.4.4-2
            crun     rhel-9.1.0    1.5-1
            crun     rhel-9.2.0    1.8.4-1
            crun     rhel-9.3.0    1.8.7-1
            crun     rhel-9.4.0    1.14.3-2
            crun     rhel-9.5.0    1.16.1-1
            crun     rhel-9-main   1.19-2

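            To map a given host against this table (plain RPM usage, added for convenience; not commands from the ticket):

            # Which crun build does this host actually carry?
            rpm -q crun
            crun --version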

            RHEL Jira bot added a comment:

            Watson Automation set this Blocker/Exception as "Approved".

            Yadnesh Kulkarni added a comment:

            Hey guys, would it be possible to get an ETA on when this fix will land in 9.4? Would it be anytime soon, like in a week or so?

            I want to know because I am in touch with the release delivery team to get 1.16 added to the rhosp repo. But if your fix can land early, I can try to hold them off for a while and get the latest crun added directly.

            Jan 3 is the code freeze date for our current OpenStack release.

            Giuseppe Scrivano added a comment:

            Unfortunately the fix is not in 1.19; it will be in the next release.
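
            Since the fix is slated for the release after 1.19, one way to verify whether an installed build actually carries a given patch is the packaged changelog (generic RPM usage, not specific to this ticket):

            # Scan the RPM changelog of the installed crun for the fix:
            rpm -q --changelog crun | head -n 20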

            Yadnesh Kulkarni added a comment:

            Hello rhn-support-jnovy, will this fix be backported to 9.4? That's what we are using on OpenStack compute nodes.

            Jindrich Novy added a comment:

            ykulkarn@redhat.com crun-1.19 should be in today's RHEL 9.6 compose.
