  1. CoreOS OCP
  2. COS-3342

[coreos/fedora-coreos-tracker] performance regression on aarch64 starting with 42.20250512.X.0

      [3100582369] Upstream Reporter: Dusty Mabe
      Upstream issue status: Closed
      Upstream description:

      We use multi-arch builders running FCOS to build the FCOS artifacts we ship.

      We recently noticed a regression in performance when compressing all the different disk images we have built. It seems this regression was introduced in the `42.20250427.2.0` -> `42.20250512.2.0` transition. The regression still occurs in the newly released `42.20250526.2.0`.

      The performance is quite bad: compressing a single QEMU qcow2 image takes about 30 minutes:

      ```
      [2025-05-29T12:44:09.396Z] 2025-05-29 12:44:09,288 INFO - Running command: ['xz', '-c9', '-T64', '/home/jenkins/agent/workspace/build-arch/builds/42.20250529.20.0/aarch64/fedora-coreos-42.20250529.20.0-qemu.aarch64.qcow2']
      [2025-05-29T13:14:01.459Z] Compressed: fedora-coreos-42.20250529.20.0-qemu.aarch64.qcow2.xz
      ```
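As a quick sanity check on the "about 30 minutes" figure, the elapsed time can be computed from the two Jenkins timestamps in the log above. This is a minimal sketch; the helper name `elapsed_seconds` is illustrative and not part of any existing tooling.

```python
from datetime import datetime

# Timestamp layout used by the Jenkins log prefix, e.g. 2025-05-29T12:44:09.396Z
LOG_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return delta.total_seconds()


# The two timestamps are copied from the log excerpt above.
secs = elapsed_seconds("2025-05-29T12:44:09.396Z", "2025-05-29T13:14:01.459Z")
minutes, seconds = divmod(secs, 60)
print(f"{int(minutes)} min {seconds:.3f} s")  # prints "29 min 52.063 s"
```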

      The package changes in the original `42.20250427.2.0` -> `42.20250512.2.0` transition were:

      ```
      ostree diff commit from: 44ca61fe51cc777383a9542bc316b350eea16961fe57b04901f9bdf0d29c7c5f
      ostree diff commit to: 90531dd4705453f2e4af9f2b15810fb3a65e487003bbee6b77ebab7338834ed0
      Upgraded:
      alternatives 1.32-1.fc42 -> 1.33-1.fc42
      bind-libs 32:9.18.35-2.fc42 -> 32:9.18.36-1.fc42
      bind-utils 32:9.18.35-2.fc42 -> 32:9.18.36-1.fc42
      bootupd 0.2.26-3.fc42 -> 0.2.27-2.fc42
      container-selinux 4:2.236.0-1.fc42 -> 4:2.237.0-1.fc42
      elfutils-default-yama-scope 0.192-9.fc42 -> 0.193-2.fc42
      elfutils-libelf 0.192-9.fc42 -> 0.193-2.fc42
      elfutils-libs 0.192-9.fc42 -> 0.193-2.fc42
      filesystem 3.18-36.fc42 -> 3.18-42.fc42
      fwupd 2.0.8-1.fc42 -> 2.0.9-1.fc42
      glibc 2.41-3.fc42 -> 2.41-5.fc42
      glibc-common 2.41-3.fc42 -> 2.41-5.fc42
      glibc-gconv-extra 2.41-3.fc42 -> 2.41-5.fc42
      glibc-minimal-langpack 2.41-3.fc42 -> 2.41-5.fc42
      hwdata 0.394-1.fc42 -> 0.395-1.fc42
      iptables-legacy 1.8.11-5.fc42 -> 1.8.11-7.fc42
      iptables-legacy-libs 1.8.11-5.fc42 -> 1.8.11-7.fc42
      iptables-libs 1.8.11-5.fc42 -> 1.8.11-7.fc42
      iptables-nft 1.8.11-5.fc42 -> 1.8.11-7.fc42
      iptables-services 1.8.11-5.fc42 -> 1.8.11-7.fc42
      iptables-utils 1.8.11-5.fc42 -> 1.8.11-7.fc42
      iscsi-initiator-utils 6.2.1.10-0.gitd0f04ae.fc42.3 -> 6.2.1.11-0.git4b3e853.fc42
      iscsi-initiator-utils-iscsiuio 6.2.1.10-0.gitd0f04ae.fc42.3 -> 6.2.1.11-0.git4b3e853.fc42
      kernel 6.14.3-300.fc42 -> 6.14.5-300.fc42
      kernel-core 6.14.3-300.fc42 -> 6.14.5-300.fc42
      kernel-modules 6.14.3-300.fc42 -> 6.14.5-300.fc42
      kernel-modules-core 6.14.3-300.fc42 -> 6.14.5-300.fc42
      libatomic 15.0.1-0.11.fc42 -> 15.1.1-1.fc42
      libgcc 15.0.1-0.11.fc42 -> 15.1.1-1.fc42
      libnfsidmap 1:2.8.2-1.rc8.fc42 -> 1:2.8.3-1.rc1.fc42
      libstdc++ 15.0.1-0.11.fc42 -> 15.1.1-1.fc42
      nfs-utils-coreos 1:2.8.2-1.rc8.fc42 -> 1:2.8.3-1.rc1.fc42
      passim-libs 0.1.9-1.fc42 -> 0.1.10-1.fc42
      passt 0^20250415.g2340bbf-1.fc42 -> 0^20250507.geea8a76-1.fc42
      passt-selinux 0^20250415.g2340bbf-1.fc42 -> 0^20250507.geea8a76-1.fc42
      rpcbind 1.2.7-1.rc1.fc42.4 -> 1.2.7-2.rc1.fc42
      runc 2:1.2.5-1.fc42 -> 2:1.3.0-1.fc42
      selinux-policy 41.38-1.fc42 -> 41.39-1.fc42
      selinux-policy-targeted 41.38-1.fc42 -> 41.39-1.fc42
      systemd 257.5-2.fc42 -> 257.5-6.fc42
      systemd-container 257.5-2.fc42 -> 257.5-6.fc42
      systemd-libs 257.5-2.fc42 -> 257.5-6.fc42
      systemd-pam 257.5-2.fc42 -> 257.5-6.fc42
      systemd-resolved 257.5-2.fc42 -> 257.5-6.fc42
      systemd-shared 257.5-2.fc42 -> 257.5-6.fc42
      systemd-sysusers 257.5-2.fc42 -> 257.5-6.fc42
      systemd-udev 257.5-2.fc42 -> 257.5-6.fc42
      zincati 0.0.30-2.fc42 -> 0.0.30-3.fc42
      ```
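To make a list like this easier to triage, the `name old -> new` lines can be parsed into tuples and filtered. The sketch below filters on the `kernel` prefix only because the issue itself asks whether newer kernels fix the problem; it is not evidence that the kernel is the cause. The function name `parse_upgrades` and the shortened sample are illustrative.

```python
def parse_upgrades(text):
    """Parse 'name old -> new' lines into (name, old, new) tuples.

    Lines that do not match that shape (headers, commit lines) are skipped.
    """
    upgrades = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[2] == "->":
            upgrades.append((parts[0], parts[1], parts[3]))
    return upgrades


# A few lines copied from the diff above; the full list parses the same way.
SAMPLE = """\
Upgraded:
glibc 2.41-3.fc42 -> 2.41-5.fc42
kernel 6.14.3-300.fc42 -> 6.14.5-300.fc42
kernel-core 6.14.3-300.fc42 -> 6.14.5-300.fc42
systemd 257.5-2.fc42 -> 257.5-6.fc42
"""

kernel_changes = [u for u in parse_upgrades(SAMPLE) if u[0].startswith("kernel")]
for name, old, new in kernel_changes:
    print(f"{name}: {old} -> {new}")
```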

      We need to track down this problem. The most promising approach is to spin up new AWS `m6g.metal` nodes with various versions of CoreOS and try to reproduce it there. Note that this is a bare-metal instance type, so it can take 10 minutes before you can SSH into the box.

      Steps to investigate this problem:

      • Create a reproducer for the problem (a test):
          • Reproduce this problem with `42.20250512.2.0`.
          • Observe that, compared to an instance booted with `42.20250427.2.0`, compressing something trivial like `/usr/lib/dracut/modules.d/30ignition/ignition` takes much longer on `42.20250512.2.0`.
      • Test with the latest `rawhide` and see whether the reproducer passes or fails, i.e. is this fixed in newer kernels?
      • Depending on whether it's already fixed upstream:
          • If it's not fixed upstream: trace the `testing-devel` stream to find when the problem was introduced, so we can narrow down which software changed.
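A reproducer along the lines of the steps above could be as small as timing the same `xz` invocation on each instance and comparing the numbers. The sketch below reuses the flags from the Jenkins log (`-c9 -T64`) and the file path suggested in the steps. The helper names are illustrative, and any pass/fail threshold would have to be chosen from measurements on a known-good build.

```python
import shutil
import subprocess
import sys
import time
from pathlib import Path


def xz_command(path, threads=64):
    """Build the xz invocation seen in the build log: xz -c9 -T<threads> <path>."""
    return ["xz", "-c9", f"-T{threads}", str(path)]


def time_command(cmd):
    """Run cmd, discard its stdout, and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # File suggested in the issue as a trivial compression target.
    target = Path("/usr/lib/dracut/modules.d/30ignition/ignition")
    if shutil.which("xz") and target.exists():
        print(f"xz took {time_command(xz_command(target)):.2f} s")
    else:
        print("xz or the target file is missing; nothing to measure", file=sys.stderr)
```

Running this on an instance booted from `42.20250427.2.0` and again on one booted from `42.20250512.2.0` would give two numbers to compare directly.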

      Ultimately we may have to do a kernel bisect.
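Both narrowing down the `testing-devel` stream and a kernel bisect amount to a binary search over an ordered history with a known-good start and a known-bad end. The sketch below shows the idea in isolation. The build labels and the `is_bad` predicate are hypothetical stand-ins for "boot this build and run the reproducer"; a real kernel bisect would more likely be driven by `git bisect`.

```python
def first_bad(history, is_bad):
    """Return the first bad entry in an ordered history.

    Assumes history[0] is known good, history[-1] is known bad, and that
    every entry after the first bad one is also bad.
    """
    lo, hi = 0, len(history) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid
        else:
            lo = mid
    return history[hi]


# Hypothetical ordered builds; in practice these would be stream build IDs
# or kernel commits between the good and bad versions.
builds = [f"build-{i}" for i in range(10)]
regressed_from = 6  # hypothetical: pretend the regression landed in build-6


def is_bad(build):
    # Stand-in for booting the build and running the reproducer.
    return int(build.split("-")[1]) >= regressed_from


print(first_bad(builds, is_bad))  # prints "build-6"
```

With N candidate builds this needs roughly log2(N) test boots, which matters when each `m6g.metal` instance takes about 10 minutes to become reachable.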
