RHEL / RHEL-68240

[RHEL EPIC] Provide zstd:chunked compression for RHEL 9.6 - RHEL 9.6

    • [RHEL EPIC] Provide zstd:chunked compression for RHEL 10 - RHEL 10 GA

      The compression format "zstd:chunked" must be turned on in the containers.conf file and a complete set of regression tests run.  In addition, tests should be created that do not have that option turned on in containers.conf but instead pass `zstd:chunked` through the `--compression-format` option.

      The full set of regression tests should be run at least with the containers.conf setting to ensure there are no issues running Podman.  In addition, tests should be created, or older ones modified, to use `--compression-format=zstd:chunked` within the regression test suite.
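
      The two test modes described above (compression set in containers.conf versus passed on the command line) could be prepared along these lines. This is a minimal sketch, not the team's actual harness. It assumes the `compression_format` key lives in the `[engine]` table of containers.conf and that the `CONTAINERS_CONF` environment variable points Podman at an alternate file. The helper names and the image reference are hypothetical.

```python
import os
import tempfile
from pathlib import Path

COMPRESSION = "zstd:chunked"


def write_containers_conf(directory: Path, compression: str = COMPRESSION) -> Path:
    """Write a minimal containers.conf that sets the default compression.

    Assumption: the key is `compression_format` under `[engine]`.
    """
    conf = directory / "containers.conf"
    conf.write_text(f'[engine]\ncompression_format = "{compression}"\n')
    return conf


def env_for_config_run(conf: Path) -> dict:
    """Mode 1: compression comes from containers.conf, not from a flag.

    Assumption: CONTAINERS_CONF selects the configuration file to load.
    """
    env = dict(os.environ)
    env["CONTAINERS_CONF"] = str(conf)
    return env


def push_command(image: str, use_flag: bool) -> list:
    """Build the podman push argv; mode 2 passes the flag explicitly."""
    cmd = ["podman", "push"]
    if use_flag:
        cmd.append(f"--compression-format={COMPRESSION}")
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    image = "localhost:5000/test/image:latest"  # hypothetical test image
    with tempfile.TemporaryDirectory() as tmp:
        conf = write_containers_conf(Path(tmp))
        print(conf.read_text())
        print("mode 1:", push_command(image, use_flag=False))
        print("mode 2:", push_command(image, use_flag=True))
```

      Keeping the two modes separate makes it clear which code path a failing test exercised: the configuration default or the explicit option.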

      Simple performance measurements should be made as a baseline using the regression tests without zstd:chunked being used.  At a minimum, the time for completion and, if possible, CPU, memory, network, and disk usage should also be gathered.  The same tests should then be run, and the same numbers captured, with zstd:chunked turned on within containers.conf, and the results compared to the initial run.
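
      A baseline-versus-zstd:chunked comparison of this kind could be captured with a small wrapper like the one below. It is a sketch under stated assumptions: it records only wall-clock time, child CPU time, and peak resident memory through the Unix-only `resource` module, and it does not measure network or disk usage. The `measure` and `compare` helpers are hypothetical, and the command run in the demo is a placeholder rather than the real regression suite.

```python
import resource
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunStats:
    label: str
    wall_seconds: float
    cpu_seconds: float
    max_rss_kib: int
    returncode: int


def measure(label: str, argv: list, env: Optional[dict] = None) -> RunStats:
    """Run argv once and record wall time plus child CPU time."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    proc = subprocess.run(argv, env=env, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is a high-water mark over all children waited for so far
    # (KiB on Linux), so it is not specific to this single run.
    return RunStats(label, wall, cpu, after.ru_maxrss, proc.returncode)


def compare(baseline: RunStats, candidate: RunStats) -> dict:
    """Express the candidate run as a ratio of the baseline run."""
    def ratio(new: float, old: float) -> Optional[float]:
        return new / old if old else None

    return {
        "wall_ratio": ratio(candidate.wall_seconds, baseline.wall_seconds),
        "cpu_ratio": ratio(candidate.cpu_seconds, baseline.cpu_seconds),
    }


if __name__ == "__main__":
    # Placeholder command; a real run would invoke the regression suite,
    # once without and once with zstd:chunked enabled in containers.conf.
    placeholder = [sys.executable, "-c", "sum(range(100000))"]
    base = measure("baseline", placeholder)
    chunked = measure("zstd:chunked", placeholder)
    print(base)
    print(chunked)
    print(compare(base, chunked))
```

      Reporting ratios rather than absolute numbers keeps the comparison meaningful across machines of different speeds, though several repetitions per configuration would be needed before drawing conclusions.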

       

      Further, per Ed Santiago in https://issues.redhat.com/browse/BIFROST-349:

       
      edsantiago added a comment - 2024/10/23 8:06 AM
      The above acceptance criteria are woefully incomplete. I suggest, as a start:

       - Current (20241023) Podman upstream CI passes when run against a local cache registry in which all test images have been pushed using zstd:chunked compression. Test matrix shall include overlay and vfs storage drivers.

       - Current (20241023) Podman gating tests will pass in Fedora and RHEL when run against quay.io using at least two test images that have been pushed using zstd:chunked compression.

       - Intense manual testing. Actual humans will hammer away at podman and skopeo:
         - pushing to and pulling from a variety of registries (quay, redhat, fedora, local)
         - including manifests
         - overwriting existing already-pushed-with-other-compression images
         - interrupting (^C) pushes and pulls
         - running, saving, loading, checkpoint/restoring using the above pulled images
         - anything else I haven't thought of because this is just a quick list

       - New CI tests will be devised to identify and stress failure points identified by manual testing

       - New safeguards are added to CI such that the above conditions/assumptions are confirmed: if test image compression changes, tests will fail
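
      The last safeguard in the list above (fail if a test image's compression changes) could be approximated by inspecting the image manifest. The sketch below is illustrative only. It assumes the manifest is an OCI image manifest already parsed from JSON (for example the output of `skopeo inspect --raw`), that zstd layers carry a media type ending in `+zstd`, and that chunked layers carry the annotation `io.github.containers.zstd-chunked.manifest-checksum`. The annotation name is an assumption that should be verified against the containers/storage sources, and the digests in the sample are made up.

```python
import json

# Assumption: annotation key written on zstd:chunked layers.
CHUNKED_ANNOTATION = "io.github.containers.zstd-chunked.manifest-checksum"


def non_chunked_layers(manifest: dict) -> list:
    """Return digests of layers that do not look like zstd:chunked layers."""
    offenders = []
    for layer in manifest.get("layers", []):
        media_type = layer.get("mediaType", "")
        annotations = layer.get("annotations") or {}
        is_zstd = media_type.endswith("+zstd")
        is_chunked = CHUNKED_ANNOTATION in annotations
        if not (is_zstd and is_chunked):
            offenders.append(layer.get("digest", "<no digest>"))
    return offenders


if __name__ == "__main__":
    # Hypothetical manifest with one chunked layer and one gzip layer.
    sample = json.loads("""
    {
      "schemaVersion": 2,
      "layers": [
        {"mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
         "digest": "sha256:aaaa",
         "annotations": {
           "io.github.containers.zstd-chunked.manifest-checksum": "sha256:bbbb"}},
        {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
         "digest": "sha256:cccc"}
      ]
    }
    """)
    bad = non_chunked_layers(sample)
    if bad:
        print("layers not pushed as zstd:chunked:", bad)
```

      A CI job could run such a check against every test image before the suite starts and abort if the returned list is not empty.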

       

    • rhel-sst-container-tools
    • 26
    • 26
    • False
    • QE ack, Dev ack

      Description

      zstd:chunked compression is a highly critical deliverable for the Image Mode team and other teams within RHEL.

      Based on a discussion held by the Container Tools team on November 15, 2024, it has been decided to deliver a mitigation that will always compute the traditional uncompressed digest.  This mitigation addresses the two major problems identified with zstd:chunked partial pulls over the past several weeks of study and testing:

      1. Ambiguity in image IDs.
      2. A gap in image signing could result in a security exploit. 

      This mitigation has a cost in CPU utilization when pulling images but preserves the substantial disk space and network throughput savings that zstd:chunked provides. The Image Mode team is aware of this tradeoff and finds it acceptable. We will provide a configuration option to disable the mitigation, along with documentation explaining the tradeoffs, so that customers can decide whether the performance gains are worth the serious problems described above that they would expose themselves to. This workaround requires the least engineering effort of the available options and guarantees we can deliver on time for RHEL 10.0 GA in late January 2025.
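
      The idea of a digest computed over the uncompressed layer, and the CPU cost it implies, can be pictured with the conceptual sketch below. It is not Podman's implementation. gzip and xz from the Python standard library stand in for the real compression formats, and the helper names are hypothetical. It only shows that the digest of the compressed blob depends on the compression used, while the digest of the uncompressed stream does not, and that checking the latter means decompressing and hashing every byte.

```python
import gzip
import hashlib
import lzma
from typing import Callable


def sha256_hex(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def layer_digests(uncompressed: bytes) -> dict:
    """Compress the same content two ways and digest each form."""
    gz_blob = gzip.compress(uncompressed, mtime=0)  # stand-in format 1
    xz_blob = lzma.compress(uncompressed)           # stand-in format 2
    return {
        "uncompressed": sha256_hex(uncompressed),
        "gzip_blob": sha256_hex(gz_blob),
        "xz_blob": sha256_hex(xz_blob),
    }


def verify_uncompressed(blob: bytes,
                        decompress: Callable[[bytes], bytes],
                        expected: str) -> bool:
    """Decompress the whole blob and compare its digest to the expected one.

    The full pass over the data is where the extra CPU time goes.
    """
    return sha256_hex(decompress(blob)) == expected


if __name__ == "__main__":
    content = b"example layer contents\n" * 1000
    digests = layer_digests(content)
    for name, value in digests.items():
        print(name, value)
    ok = verify_uncompressed(lzma.compress(content), lzma.decompress,
                             digests["uncompressed"])
    print("uncompressed digest matches:", ok)
```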

      In the Spring of 2025, after RHEL 9.6 and RHEL 10.0 have been delivered, the Container Tools team will aim to develop and deliver a superior solution that addresses the performance tradeoff. 

      Further details are contained in this Design Document.  See Option B.

              tsweeney@redhat.com Tom Sweeney
              tsweeney@redhat.com Tom Sweeney
              Container Runtime Eng Bot Container Runtime Eng Bot
              Container Runtime Bugs Bot Container Runtime Bugs Bot