Story
Resolution: Done
Upstream
[3100582369] Upstream Reporter: Dusty Mabe
Upstream issue status: Closed
Upstream description:
We use multi-arch builders running FCOS to build the FCOS artifacts we ship.
We recently noticed a regression in performance when compressing all the different disk images we have built. It seems this regression was introduced in the `42.20250427.2.0` -> `42.20250512.2.0` transition. The regression still occurs in the newly released `42.20250526.2.0`.
The performance is pretty bad, taking 30 minutes to compress a single qemu qcow2:
```
[2025-05-29T12:44:09.396Z] 2025-05-29 12:44:09,288 INFO - Running command: ['xz', '-c9', '-T64', '/home/jenkins/agent/workspace/build-arch/builds/42.20250529.20.0/aarch64/fedora-coreos-42.20250529.20.0-qemu.aarch64.qcow2']
[2025-05-29T13:14:01.459Z] Compressed: fedora-coreos-42.20250529.20.0-qemu.aarch64.qcow2.xz
```
The package changes in the original `42.20250427.2.0` -> `42.20250512.2.0` transition were:
```
ostree diff commit from: 44ca61fe51cc777383a9542bc316b350eea16961fe57b04901f9bdf0d29c7c5f
ostree diff commit to: 90531dd4705453f2e4af9f2b15810fb3a65e487003bbee6b77ebab7338834ed0
Upgraded:
alternatives 1.32-1.fc42 -> 1.33-1.fc42
bind-libs 32:9.18.35-2.fc42 -> 32:9.18.36-1.fc42
bind-utils 32:9.18.35-2.fc42 -> 32:9.18.36-1.fc42
bootupd 0.2.26-3.fc42 -> 0.2.27-2.fc42
container-selinux 4:2.236.0-1.fc42 -> 4:2.237.0-1.fc42
elfutils-default-yama-scope 0.192-9.fc42 -> 0.193-2.fc42
elfutils-libelf 0.192-9.fc42 -> 0.193-2.fc42
elfutils-libs 0.192-9.fc42 -> 0.193-2.fc42
filesystem 3.18-36.fc42 -> 3.18-42.fc42
fwupd 2.0.8-1.fc42 -> 2.0.9-1.fc42
glibc 2.41-3.fc42 -> 2.41-5.fc42
glibc-common 2.41-3.fc42 -> 2.41-5.fc42
glibc-gconv-extra 2.41-3.fc42 -> 2.41-5.fc42
glibc-minimal-langpack 2.41-3.fc42 -> 2.41-5.fc42
hwdata 0.394-1.fc42 -> 0.395-1.fc42
iptables-legacy 1.8.11-5.fc42 -> 1.8.11-7.fc42
iptables-legacy-libs 1.8.11-5.fc42 -> 1.8.11-7.fc42
iptables-libs 1.8.11-5.fc42 -> 1.8.11-7.fc42
iptables-nft 1.8.11-5.fc42 -> 1.8.11-7.fc42
iptables-services 1.8.11-5.fc42 -> 1.8.11-7.fc42
iptables-utils 1.8.11-5.fc42 -> 1.8.11-7.fc42
iscsi-initiator-utils 6.2.1.10-0.gitd0f04ae.fc42.3 -> 6.2.1.11-0.git4b3e853.fc42
iscsi-initiator-utils-iscsiuio 6.2.1.10-0.gitd0f04ae.fc42.3 -> 6.2.1.11-0.git4b3e853.fc42
kernel 6.14.3-300.fc42 -> 6.14.5-300.fc42
kernel-core 6.14.3-300.fc42 -> 6.14.5-300.fc42
kernel-modules 6.14.3-300.fc42 -> 6.14.5-300.fc42
kernel-modules-core 6.14.3-300.fc42 -> 6.14.5-300.fc42
libatomic 15.0.1-0.11.fc42 -> 15.1.1-1.fc42
libgcc 15.0.1-0.11.fc42 -> 15.1.1-1.fc42
libnfsidmap 1:2.8.2-1.rc8.fc42 -> 1:2.8.3-1.rc1.fc42
libstdc++ 15.0.1-0.11.fc42 -> 15.1.1-1.fc42
nfs-utils-coreos 1:2.8.2-1.rc8.fc42 -> 1:2.8.3-1.rc1.fc42
passim-libs 0.1.9-1.fc42 -> 0.1.10-1.fc42
passt 0^20250415.g2340bbf-1.fc42 -> 0^20250507.geea8a76-1.fc42
passt-selinux 0^20250415.g2340bbf-1.fc42 -> 0^20250507.geea8a76-1.fc42
rpcbind 1.2.7-1.rc1.fc42.4 -> 1.2.7-2.rc1.fc42
runc 2:1.2.5-1.fc42 -> 2:1.3.0-1.fc42
selinux-policy 41.38-1.fc42 -> 41.39-1.fc42
selinux-policy-targeted 41.38-1.fc42 -> 41.39-1.fc42
systemd 257.5-2.fc42 -> 257.5-6.fc42
systemd-container 257.5-2.fc42 -> 257.5-6.fc42
systemd-libs 257.5-2.fc42 -> 257.5-6.fc42
systemd-pam 257.5-2.fc42 -> 257.5-6.fc42
systemd-resolved 257.5-2.fc42 -> 257.5-6.fc42
systemd-shared 257.5-2.fc42 -> 257.5-6.fc42
systemd-sysusers 257.5-2.fc42 -> 257.5-6.fc42
systemd-udev 257.5-2.fc42 -> 257.5-6.fc42
zincati 0.0.30-2.fc42 -> 0.0.30-3.fc42
```
We need to try to track down this problem. The likeliest way is to spin up new AWS `m6g.metal` nodes with various versions of CoreOS and try to reproduce there. Note that this is a bare metal instance type, so it can take 10 minutes after launch before you can SSH into the box.
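A minimal launch sketch using the AWS CLI, assuming credentials are already configured. The AMI ID, key pair, security group, and subnet below are placeholders, not values from this report; the aarch64 AMI for a given FCOS release would need to be looked up (e.g. from the stream metadata at https://builds.coreos.fedoraproject.org/streams/stable.json):
```
# Placeholder values; substitute the real FCOS aarch64 AMI and your own
# key pair, security group, and subnet.
AMI_ID=ami-xxxxxxxxxxxxxxxxx

INSTANCE_ID=$(aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type m6g.metal \
  --key-name my-key \
  --security-group-ids sg-xxxxxxxx \
  --subnet-id subnet-xxxxxxxx \
  --query 'Instances[0].InstanceId' --output text)

# Bare metal instances take a while to come up; wait before trying to SSH.
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
```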
Steps to investigate this problem:
- Create a reproducer for the problem (a test):
- Reproduce this problem with `42.20250512.2.0`.
- Observe that, compared to an instance booted with `42.20250427.2.0`, compressing something trivial like `/usr/lib/dracut/modules.d/30ignition/ignition` takes much longer on `42.20250512.2.0` (a timing sketch follows after this list).
- Test with the latest `rawhide` and see if the reproducer passes or fails, i.e. is this already fixed in newer kernels?
- Based on whether it's already fixed upstream: if it's not fixed upstream:
- Trace the `testing-devel` stream to find when the problem was introduced so we can narrow down the software that was changed.
Ultimately we may have to do a kernel bisect.
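A minimal sketch of the timing reproducer described above, to be run on instances booted with each of the two releases. The loop count is an arbitrary choice; `xz -c9 -T64` mirrors the command from the build log earlier in this report:
```
# Record which release is booted, then time compression of a small fixed input.
# Run on 42.20250427.2.0 and again on 42.20250512.2.0 and compare wall-clock times.
rpm-ostree status | head -n 10

INPUT=/usr/lib/dracut/modules.d/30ignition/ignition
for i in 1 2 3; do
    time xz -c9 -T64 "$INPUT" > /dev/null
done
```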