Bug
Resolution: Unresolved
Normal
rhel-9.4
Moderate
ZStream
rhel-sst-container-tools
Approved Blocker
x86_64
What were you trying to do that didn't work?
I am using Ansible's podman_container module to deploy a kepler container, which stores some data inside /run/crun/<container_id>/tmpmount. Stopping the container fails because the container cannot be removed from the runtime: the "tmpmount" directory is not empty.
[root@compute-0 ~]# podman ps | grep kepler
acb107052ec8  registry.redhat.io/openshift-power-monitoring/kepler-rhel9:v0.7.10  -v=2  7 minutes ago  Up 6 minutes (healthy)  0.0.0.0:8888->8888/tcp  kepler
[root@compute-0 ~]# ls /run/crun/acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040/tmpmount/
cgroup.controllers      cgroup.subtree_control  cpuset.mems.effective  machine.slice     misc.current                   sys-kernel-tracing.mount
cgroup.max.depth        cgroup.threads          dev-hugepages.mount    memory.numa_stat  proc-sys-fs-binfmt_misc.mount  system.slice
cgroup.max.descendants  cpu.stat                dev-mqueue.mount       memory.reclaim    sys-fs-fuse-connections.mount  user.slice
cgroup.procs            cpuset.cpus.effective   init.scope             memory.stat       sys-kernel-config.mount
cgroup.stat             cpuset.cpus.isolated    io.stat                misc.capacity     sys-kernel-debug.mount
[root@compute-0 ~]# podman stop kepler
2024-12-09T12:45:06.067289Z: cannot rm state directory `/run/crun/acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040`: Directory not empty
Error: removing container acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040 from runtime: `/usr/bin/crun delete --force acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040` failed: exit status 1
The same happens with restart:
[root@compute-0 ~]# podman restart kepler
2024-12-10T12:03:39.751426Z: cannot rm state directory `/run/crun/ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c`: Directory not empty
Error: removing container ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c from runtime: `/usr/bin/crun delete --force ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c` failed: exit status 1
However, on CentOS 9 with podman-5.3.1-1.el9.x86_64 this issue does not occur. In case it helps, here are the parameters that were passed when creating this container:
{
  "image": "{{ edpm_telemetry_kepler_image }}",
  "privileged": "true",
  "restart": "always",
  "ports": ["8888:8888"],
  "command": "-v=2",
  "net": "host",
  "environment": {
    "ENABLE_GPU": "true",
    "EXPOSE_CONTAINER_METRICS": "true",
    "ENABLE_PROCESS_METRICS": "true",
    "EXPOSE_VM_METRICS": "true",
    "EXPOSE_ESTIMATED_IDLE_POWER_METRICS": "false",
    "LIBVIRT_METADATA_URI": "http://openstack.org/xmlns/libvirt/nova/1.1"
  },
{% if edpm_telemetry_power_monitoring_healthcheck %}
  "healthcheck": {
    "test": "/openstack/healthcheck kepler",
    "mount": "/var/lib/openstack/healthchecks/kepler"
  },
{% endif %}
  "volumes": [
    "/lib/modules:/lib/modules:ro",
    "/run/libvirt:/run/libvirt:shared,ro",
    "/sys/:/sys:ro",
    "/proc:/proc"
  ]
}
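For anyone trying to confirm the same symptom on another node, the check done by hand with ls above could be scripted roughly as follows. This is only a sketch: the helper name has_leftover_tmpmount is made up for illustration, and it assumes the state directory layout shown in this report (/run/crun/<container_id>/tmpmount).

```shell
#!/bin/sh
# Hypothetical helper (not part of podman or crun): succeeds when the given
# crun state directory contains a non-empty "tmpmount" subdirectory, which is
# the condition reported in the "Directory not empty" error above.
has_leftover_tmpmount() {
    state_dir=$1
    # The directory must exist and "ls -A" must print at least one entry.
    [ -d "$state_dir/tmpmount" ] && [ -n "$(ls -A "$state_dir/tmpmount" 2>/dev/null)" ]
}

# Example usage (the container ID is a placeholder to fill in):
#   if has_leftover_tmpmount "/run/crun/<container_id>"; then
#       echo "tmpmount still populated; crun delete will likely fail"
#   fi
```

The function only reads the directory, so it should be safe to run against a live container.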
What is the impact of this issue to you?
Because of this, restarting the kepler podman container on OpenStack compute nodes is not possible: the restart leaves the container in an unrecoverable state. Deleting and recreating the container is the only option.
Please provide the package NVR for which the bug is seen:
podman-4.9.4-16.el9_4.x86_64
The easiest fix would be to move to a newer crun version. I suggest 1.16 in this case rather than jumping to the latest release, which contains other changes.
If moving to 1.16 is not possible, we can look at a backport, and in that case I also suggest taking the "partial" fix we have in 1.16.
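To see whether a given node already runs a crun at or above the suggested 1.16, a version comparison along these lines may help. It is a sketch under assumptions: version_ge is a hypothetical helper, it relies on GNU sort -V, and the commented usage assumes the first line of `crun --version` has the form "crun version X.Y.Z".

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to
# version $2, using GNU sort's version ordering (-V).
version_ge() {
    # If the smaller of the two versions is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (assumes the "crun version X.Y.Z" output format):
#   installed=$(crun --version | awk 'NR==1 {print $3}')
#   if version_ge "$installed" 1.16; then
#       echo "crun $installed is at or above 1.16"
#   else
#       echo "crun $installed is older than 1.16"
#   fi
```

The check only compares version strings, so it says nothing about whether a distribution build carries a backported fix.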