-
Story
-
Resolution: Done
-
rhel-container-tools
RUN 266
[2816256047] Upstream Reporter: GeorgFleig
Upstream issue status: Closed
Upstream description:
Issue Description
I'm running podman inside a VirtualBox VM (Rocky Linux) on my Ubuntu host system.
7 containers are running. From time to time, when I boot up the VM again, the containers fail to start (they are running in rootless mode and started using Quadlet).
The logs indicate that something fails with setting up the netns. The containers go through a loop of restarts by systemd, until eventually another error comes up regarding IP address allocation (looks very similar to https://github.com/containers/podman/issues/18615, which is supposedly fixed in 4.8, while I am running 5.2.2).
Steps to reproduce the issue
Unfortunately I don't have any idea how to reproduce this. Podman runs without problems for a while, and VM reboots and power-offs are no issue, until all of a sudden things go wrong again. I wish I could provide some precise steps.
Describe the results you received
Systemd service (generated by Quadlet) fails to start:
Jan 22 16:47:03 vbox systemd[694]: Starting traefik.service...
Jan 22 16:47:04 vbox podman[1357]: 2025-01-22 16:47:04.145663341 +0000 UTC m=+0.201975522 container create e5d4df364d473fc917ee87137554c78ea69b032b8c3d81cc32c0f028823f966e (image=dockerhub.internal/traefik:3.1, name=traef>
Jan 22 16:47:04 vbox podman[1357]: 2025-01-22 16:47:04.09747131 +0000 UTC m=+0.153783507 image pull 075808f3fdf72baa7b647b63631bf5fee7d143164049ebfc40a54d9f238d4b83 dockerhub.internal/traefik:3.1
Jan 22 16:47:04 vbox podman[1357]: 2025-01-22 16:47:04.278834145 +0000 UTC m=+0.335146331 container remove e5d4df364d473fc917ee87137554c78ea69b032b8c3d81cc32c0f028823f966e (image=dockerhub.internal/traefik:3.1, name=traef>
Jan 22 16:47:04 vbox traefik[1357]: Error: rootless netns: create netns: open /tmp/containers-user-1002/containers/networks/rootless-netns/rootless-netns: file exists
Jan 22 16:47:04 vbox systemd[694]: traefik.service: Main process exited, code=exited, status=126/n/a
Jan 22 16:47:04 vbox systemd[694]: traefik.service: Failed with result 'exit-code'.
Jan 22 16:47:04 vbox systemd[694]: Failed to start traefik.service.
Jan 22 16:47:04 vbox systemd[694]: traefik.service: Scheduled restart job, restart counter is at 1.
Jan 22 16:47:04 vbox systemd[694]: Stopped traefik.service.
The file indeed exists, as well as the corresponding pid file. No process is running though with this pid:
[pulp@vbox rootless-netns]$ ls -la
total 12
drwx------ 3 pulp pulp 106 Dec 16 11:23 .
drwx------ 4 pulp pulp  84 Jan 23 16:50 ..
-rw------- 1 pulp pulp   1 Dec 16 11:24 ref-count
-rw-r--r-- 1 pulp pulp  60 Dec 16 11:23 resolv.conf
-rw------- 1 pulp pulp   0 Dec 16 11:23 rootless-netns
-rw------- 1 pulp pulp   6 Dec 16 11:23 rootless-netns-conn.pid
drwx------ 4 pulp pulp  33 Dec 16 11:23 run
[pulp@vbox rootless-netns]$ cat rootless-netns-conn.pid
16522
[pulp@vbox rootless-netns]$ ps aux | grep 16522
pulp 36794 0.0 0.0 6408 2176 pts/1 S+ 10:43 0:00 grep --color=auto 16522
Then a loop of restarts with the same error follows. At some point the IP address pool is exhausted and the errors look like this:
Jan 22 16:52:11 vbox systemd[694]: Starting traefik.service...
Jan 22 16:52:11 vbox podman[59059]: 2025-01-23 02:55:11.622213283 +0000 UTC m=+0.041797034 container create 9381bfa69bc1755035aa1942ed4884aeabdd4b70fa3ee48d5958daa50f147096 (image=dockerhub.internal/traefik:3.3, name=trae>
Jan 22 16:52:11 vbox podman[59059]: 2025-01-23 02:55:11.670531125 +0000 UTC m=+0.090114884 container remove 9381bfa69bc1755035aa1942ed4884aeabdd4b70fa3ee48d5958daa50f147096 (image=dockerhub.internal/traefik:3.3, name=trae>
Jan 22 16:52:11 vbox podman[59059]: 2025-01-23 02:55:11.607151849 +0000 UTC m=+0.026735633 image pull 88eafdd76c933a76798a389d994b4fdd6b5edb89d702aae10c4350ecaa3febb9 dockerhub.internal/traefik:3.3
Jan 22 16:52:11 vbox traefik[59059]: Error: IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
Jan 22 16:52:11 vbox systemd[694]: traefik.service: Main process exited, code=exited, status=126/n/a
Jan 22 16:52:11 vbox systemd[694]: traefik.service: Failed with result 'exit-code'.
Jan 22 16:52:11 vbox systemd[694]: Failed to start traefik.service.
Jan 22 16:52:11 vbox systemd[694]: traefik.service: Scheduled restart job, restart counter is at 37.
Jan 22 16:52:11 vbox systemd[694]: Stopped traefik.service.
Which seems similar to what is described in https://github.com/containers/podman/issues/18615 and was supposedly fixed in 4.8. Yet I see the same error message with 5.2.2.
Once I delete rootless-netns (the file named in the first error) and its pid file rootless-netns-conn.pid, and then restart the service, the container starts without problems.
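The manual check behind this workaround (read the recorded PID, confirm no such process exists, then treat the leftover files as stale) can be sketched as below. This is only an illustration of the reporter's steps, not what Podman does internally. The function names `pid_is_running` and `stale_netns_files` are hypothetical; the file names and the directory come from the listing and error message above. The sketch only reports candidates and deletes nothing.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stale_netns_files(netns_dir: Path) -> list[Path]:
    """Return the leftover files if the recorded PID is gone, else []."""
    # File names as seen in the directory listing of this report.
    netns_file = netns_dir / "rootless-netns"
    pid_file = netns_dir / "rootless-netns-conn.pid"
    if not pid_file.exists():
        return []
    try:
        pid = int(pid_file.read_text().strip())
    except ValueError:
        return []  # unreadable PID: leave it for manual inspection
    if pid_is_running(pid):
        return []
    return [p for p in (netns_file, pid_file) if p.exists()]


if __name__ == "__main__":
    # Directory taken from the "file exists" error message in this report.
    directory = Path("/tmp/containers-user-1002/containers/networks/rootless-netns")
    for path in stale_netns_files(directory):
        print(f"stale: {path}")
```

A live PID is not proof that the original process is still around, since PIDs can be reused, so a result of "not stale" may still need a manual look.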
Describe the results you expected
- Rootless netns parts are cleaned up properly in case they failed previously
- IP address pool is not exhausted when containers fail to start
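The second expectation can be illustrated with a toy model: if every failed start keeps the address it was given, the range 10.89.0.1 - 10.89.0.254 from the error message eventually runs dry, whereas releasing the address on failure keeps the pool usable. `ToyPool` and `failed_starts_until_exhausted` are hypothetical names for this sketch. It is not netavark's or Podman's IPAM code, and it makes no claim about how many restarts the real system needs.

```python
import ipaddress


class ToyPool:
    """Toy address pool; NOT netavark's or Podman's actual IPAM."""

    def __init__(self, first: str, last: str) -> None:
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        self.free = [ipaddress.IPv4Address(n) for n in range(start, end + 1)]
        self.used = set()

    def allocate(self) -> ipaddress.IPv4Address:
        if not self.free:
            raise RuntimeError("failed to find free IP in range")
        ip = self.free.pop(0)
        self.used.add(ip)
        return ip

    def release(self, ip: ipaddress.IPv4Address) -> None:
        if ip in self.used:
            self.used.remove(ip)
            self.free.append(ip)


def failed_starts_until_exhausted(pool, release_on_failure, limit=1000):
    """Simulate starts that always fail after an address was allocated.

    Returns the attempt number at which allocation itself fails,
    or None if the pool survives `limit` attempts.
    """
    for attempt in range(1, limit + 1):
        try:
            ip = pool.allocate()
        except RuntimeError:
            return attempt
        if release_on_failure:
            pool.release(ip)  # the cleanup the report asks for
    return None


if __name__ == "__main__":
    leaky = ToyPool("10.89.0.1", "10.89.0.254")
    print(failed_starts_until_exhausted(leaky, release_on_failure=False))  # 255
    clean = ToyPool("10.89.0.1", "10.89.0.254")
    print(failed_starts_until_exhausted(clean, release_on_failure=True))  # None
```

In the toy model the range holds 254 addresses, so the 255th leaking attempt is the first to fail. In the reported setup several services restart concurrently, so the real pool could drain at a different rate.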
podman info output
host:
  arch: amd64
  buildahVersion: 1.37.5
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: 5859d6167f22954414ce804d3f2ae9cf6208f929'
  cpuUtilization:
    idlePercent: 99.79
    systemPercent: 0.1
    userPercent: 0.11
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: rocky
    version: "9.5"
  eventLogger: journald
  freeLocks: 2012
  hostname: vbox
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
  kernel: 5.14.0-503.15.1.el9_5.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 5903810560
  memTotal: 8057950208
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: netavark-1.12.2-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.16.1-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.16.1
      commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
      rundir: /run/user/1002/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240806.gee36266-2.el9.x86_64
    version: |
      pasta 0^20240806.gee36266-2.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
      <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1002/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 2147479552
  swapTotal: 2147479552
  uptime: 20h 40m 45.00s (Approximately 0.83 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/pulp/.config/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 7
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /mnt/container_storage/pulp
  graphRootAllocated: 52517371904
  graphRootUsed: 7091998720
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 12
  runRoot: /tmp/containers-user-1002/containers
  transientStore: false
  volumePath: /mnt/container_storage/pulp/volumes
version:
  APIVersion: 5.2.2
  Built: 1731414899
  BuiltTime: Tue Nov 12 12:34:59 2024
  GitCommit: ""
  GoVersion: go1.22.7 (Red Hat 1.22.7-2.el9_5)
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.2
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
Additional environment details
Additional information
No response
Upstream URL: https://github.com/containers/podman/issues/25144