
[RHEL-70694] podman cannot stop/restart container because /run/crun/<container_id>/tmpmount is not empty

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Versions: rhel-9.6, rhel-9.4
    • Component: crun
    • Severity: Moderate
    • ZStream
    • Team: rhel-sst-container-tools
    • Blocker/Exception: Approved Blocker
    • Architecture: x86_64

      What were you trying to do that didn't work?

      I am using Ansible's podman_container module to deploy the kepler container, which stores some data inside /run/crun/<container_id>/tmpmount. Stopping the container fails with an error that the state directory cannot be removed because "tmpmount" is not empty.

      [root@compute-0 ~]# podman ps | grep kepler
      acb107052ec8  registry.redhat.io/openshift-power-monitoring/kepler-rhel9:v0.7.10                                        -v=2                  7 minutes ago  Up 6 minutes (healthy)  0.0.0.0:8888->8888/tcp  kepler
      
      [root@compute-0 ~]# ls /run/crun/acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040/tmpmount/
      cgroup.controllers      cgroup.subtree_control  cpuset.mems.effective  machine.slice     misc.current                   sys-kernel-tracing.mount
      cgroup.max.depth        cgroup.threads          dev-hugepages.mount    memory.numa_stat  proc-sys-fs-binfmt_misc.mount  system.slice
      cgroup.max.descendants  cpu.stat                dev-mqueue.mount       memory.reclaim    sys-fs-fuse-connections.mount  user.slice
      cgroup.procs            cpuset.cpus.effective   init.scope             memory.stat       sys-kernel-config.mount
      cgroup.stat             cpuset.cpus.isolated    io.stat                misc.capacity     sys-kernel-debug.mount
      
      [root@compute-0 ~]# podman stop kepler
      2024-12-09T12:45:06.067289Z: cannot rm state directory `/run/crun/acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040`: Directory not empty
      Error: removing container acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040 from runtime: `/usr/bin/crun delete --force acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040` failed: exit status 1  
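
      The tmpmount listing above matches the root of a cgroup2 hierarchy, which suggests a leftover mount under the state directory rather than stray files. A quick way to check this (a diagnostic sketch using standard util-linux tooling, not commands captured in this ticket):

      # Is anything still mounted under the container's crun state dir?
      cid=acb107052ec878d975ecc40b4fad42d1faf18876143081d4fff13d32d48f8040
      findmnt -R "/run/crun/$cid/tmpmount"
      grep "/run/crun/$cid" /proc/self/mountinfo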

      The same happens with restart:

      [root@compute-0 ~]# podman restart kepler
      2024-12-10T12:03:39.751426Z: cannot rm state directory `/run/crun/ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c`: Directory not empty
      Error: removing container ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c from runtime: `/usr/bin/crun delete --force ab3be17325c8d70bf6feaa9be8e78f67c51107268d742e323a8a9c892505988c` failed: exit status 1

      However, on CentOS Stream 9 with podman-5.3.1-1.el9.x86_64 this issue does not surface. In case it's helpful, here are the parameters that were passed when creating this container:

      {
          "image": "{{ edpm_telemetry_kepler_image }}",
          "privileged": "true",
          "restart": "always",
          "ports": ["8888:8888"],
          "command": "-v=2",
          "net": "host",
          "environment": {
              "ENABLE_GPU": "true",
              "EXPOSE_CONTAINER_METRICS": "true",
              "ENABLE_PROCESS_METRICS": "true",
              "EXPOSE_VM_METRICS": "true",
              "EXPOSE_ESTIMATED_IDLE_POWER_METRICS": "false",
              "LIBVIRT_METADATA_URI": "http://openstack.org/xmlns/libvirt/nova/1.1"
          },
      {% if edpm_telemetry_power_monitoring_healthcheck %}
          "healthcheck": {
              "test": "/openstack/healthcheck kepler",
              "mount": "/var/lib/openstack/healthchecks/kepler"
          },
      {% endif %}
          "volumes": [
              "/lib/modules:/lib/modules:ro",
              "/run/libvirt:/run/libvirt:shared,ro",
              "/sys/:/sys:ro",
              "/proc:/proc"
          ]
      } 
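
      For readers without the playbook context, those podman_container parameters translate roughly into the following direct podman invocation (a hand-written equivalent, not a command from this ticket; note that with --net host the published port mapping is effectively redundant):

      podman run -d --name kepler \
        --privileged --restart always --net host -p 8888:8888 \
        -e ENABLE_GPU=true -e EXPOSE_CONTAINER_METRICS=true \
        -e ENABLE_PROCESS_METRICS=true -e EXPOSE_VM_METRICS=true \
        -e EXPOSE_ESTIMATED_IDLE_POWER_METRICS=false \
        -e LIBVIRT_METADATA_URI=http://openstack.org/xmlns/libvirt/nova/1.1 \
        -v /lib/modules:/lib/modules:ro \
        -v /run/libvirt:/run/libvirt:shared,ro \
        -v /sys/:/sys:ro -v /proc:/proc \
        registry.redhat.io/openshift-power-monitoring/kepler-rhel9:v0.7.10 -v=2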

      What is the impact of this issue to you?

      Due to this, restarting the kepler podman container on OpenStack compute nodes is not possible, as the failed stop puts the container into an unrecoverable state. Deleting and recreating it is the only option.

      Please provide the package NVR for which the bug is seen:

      podman-4.9.4-16.el9_4.x86_64

      How reproducible is this bug?:

      Steps to reproduce

      1. Deploy the kepler container with the parameters shown above (privileged, host networking, /sys and /proc bind-mounted).
      2. Run podman stop kepler (or podman restart kepler).
      3. Observe the failure.

      Expected results

      The container stops (or restarts) cleanly.

      Actual results

      crun delete --force fails with "cannot rm state directory ...: Directory not empty", leaving the container in a state that can only be recovered by deleting and recreating it.
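
      A minimal reproduction sketch, assuming (this is inferred from the configuration above, not confirmed in the ticket) that the trigger is running a privileged container with /sys bind-mounted read-only:

      # Hypothetical minimal reproducer; image is a placeholder.
      podman run -d --name tmpmount-repro --privileged \
        -v /sys:/sys:ro registry.access.redhat.com/ubi9/ubi sleep infinity
      ls "/run/crun/$(podman inspect -f '{{.Id}}' tmpmount-repro)/tmpmount/"
      # On affected crun versions the next command is expected to fail
      # with "Directory not empty":
      podman stop tmpmount-repro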


            Giuseppe Scrivano added a comment (edited):

            The easiest fix would be to move to a newer crun version (I suggest 1.16 in this case) rather than hurrying to the latest release, which contains other changes.

            If moving to 1.16 is not possible, we can look at a backport, and in that case I also suggest the "partial" fix we have in 1.16.

            Tom Sweeney added a comment:

            ykulkarn@redhat.com I am planning to have Giuseppe patch 1.16 with this fix and release that in the Zstream in February. We generally do not backport entire versions (i.e. replacing crun 1.16 in RHEL 9 with crun 1.19); there is a lot of danger of messing up upgrades and adding unanticipated changes. However, we'll need to hear back from Giuseppe.

            Yadnesh Kulkarni added a comment:

            Also, as an addition to my previous comment: in the next release cycle, would we get the package with the partial fix (1.16), or a much higher version including gscrivan@redhat.com's recent fix https://github.com/containers/crun/pull/1621 ?

            Yadnesh Kulkarni added a comment:

            tsweeney@redhat.com We've cooked up a workaround in our playbooks so that customers don't hit this issue, at least during deployment. Manually restarting containers will still be an issue for them. Since there's no other option, we will wait until the next release cycle (Feb 25th).

            Thank you all for your help.
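
            The workaround itself is not shown in the ticket; a hypothetical sketch of what such a playbook-level mitigation could look like (the unmount-and-force-remove approach here is an assumption, not the actual change):

            # If stop fails, unmount whatever is left under the crun state
            # directory and force-remove the container:
            cid=$(podman inspect -f '{{.Id}}' kepler)
            if ! podman stop kepler; then
              umount -R "/run/crun/$cid/tmpmount" 2>/dev/null || true
              podman rm -f kepler
            fi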

            Jindrich Novy added a comment:

            tsweeney@redhat.com currently we have this in RHEL 9/10:

            package  branch        version
            ---      ---           ---
            crun     rhel-10-main  1.19-1
            crun     rhel-9.0.0    1.4.4-2
            crun     rhel-9.1.0    1.5-1
            crun     rhel-9.2.0    1.8.4-1
            crun     rhel-9.3.0    1.8.7-1
            crun     rhel-9.4.0    1.14.3-2
            crun     rhel-9.5.0    1.16.1-1
            crun     rhel-9-main   1.19-2

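            To map a given host against this table (plain RPM usage, added for convenience; not commands from the ticket):

            # Which crun build does this host actually carry?
            rpm -q crun
            crun --version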

            RHEL Jira bot added a comment:

            Watson Automation set this Blocker/Exception as "Approved".

            Yadnesh Kulkarni added a comment:

            Hey guys, would it be possible to get an ETA on when this fix will land in 9.4? Would it be anytime soon, like in a week or so?

            I want to know because I am in touch with the release delivery team to get 1.16 added to the rhosp repo. But if your fix can land early, I can try to hold them off for a while and get the latest crun added directly.

            Jan 3 is the code freeze date for our current OpenStack release.

            Giuseppe Scrivano added a comment:

            Unfortunately the fix is not in 1.19; it will be in the next release.
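
            Since the fix is slated for the release after 1.19, one way to verify whether an installed build actually carries a given patch is the packaged changelog (generic RPM usage, not specific to this ticket):

            # Scan the RPM changelog of the installed crun for the fix:
            rpm -q --changelog crun | head -n 20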

            Yadnesh Kulkarni added a comment:

            Hello rhn-support-jnovy, will this fix be backported to 9.4? That's what we are using on OpenStack compute nodes.

            Jindrich Novy added a comment:

            ykulkarn@redhat.com crun-1.19 should be in today's RHEL 9.6 compose.
