
Quay as a cache proxy / pull-through cache for other registries

    • Quay acts as a cache proxy / pull-through cache
    • Green
    • To Do
    • Quay Enterprise
    • 0% To Do, 0% In Progress, 100% Done

      Will be Tech Preview in 3.7


      Customer Problem: Developers require access to some upstream registries, e.g. the DockerHub library project or GitHub packages, to use open source software. Administrators and SecOps personnel, however, do not want to allow generic access to these registries. Some of these registries also apply limits to unauthenticated pulls, or to pulls using an account on a lower service tier, which can be exhausted quickly in a shared environment. In other environments the connection to the upstream registry might be very slow and bandwidth-constrained.

      Goal: Quay acts as a cache proxy / pull-through cache for arbitrary upstream registries

      Problem:

      • as of today clusters behind Quay can only pull images which are already in Quay (either pushed directly or added via repo mirroring)
      • customers without governance requirements want to allow developers to reference / pull arbitrary images even if they're not synced into Quay yet; Quay should then act as a proxy cache
      • customers want to circumvent pull rate limitation / throttling on upstream registries (e.g. DockerHub)
      • customers want to accelerate pull performance of upstream registries (e.g. in edge deployment or remote locations)

      Why is this important: 

      • requested by several customers
      • perceived as a threshold feature
      • to achieve feature parity with the OCP internal registry
      • DockerHub introduced severe rate limiting in Nov 2020
      • developers don't want to rewrite all the pull specs in their manifest / helm chart / operators to benefit from a caching registry

      Dependencies (internal and external):

      Prioritized epics + deliverables (in scope / not in scope):

      • As a Quay user I want to be able to define an organization in Quay that acts as a cache for a specific upstream registry
      • As a Quay user I want to be able to supply credentials to the upstream registry when defining a cache organization so that I can circumvent / extend possible pull-rate limits or access private repositories
      • As a Quay admin I want to be able to leverage the storage quota of an organization to limit the cache size so that backend storage consumption remains predictable by discarding images from the cache according to least recently used or pull frequency
      • As a user I can pull an image from Quay from the above organization even if it doesn't exist there yet. When I pull an image that isn't cached locally yet, Quay fetches it from the explicitly defined upstream / source registries, caches it locally and streams it to the client transparently
      • As a user I want to rely on Quay's cache to stay coherent with the upstream source so that I transparently get newer images from the cache when tags have been overwritten in the upstream registry immediately or after a certain period of time (see next story)
      • As a user I want to be able to configure a staleness period in order to control when the cache checks for upstream image changes so that I can lower the amount of upstream registry dependency (e.g. in case of pull limits being reached or temporary connectivity issues)
      • As a user I want to be able to use the cache explicitly by referring to the upstream images by their original path components, which are simply appended to the URL of the Quay cache location, e.g. docker.io/library/postgres:latest -> quay.corp/cache/library/postgres:latest
      • As a user I want to be able to use the cache transparently by adjusting my registries.conf file so that it uses the Quay cache organization as a mirror of a specific upstream registry (or of a namespace in it)
      • As an administrator I want to be able to choose between caching an entire upstream registry (e.g. cache all of docker.io, i.e. docker.io/library/postgres:latest -> quay.corp/cache/library/postgres:latest) or just selected namespaces (e.g. just cache docker.io/library, i.e. docker.io/library/postgres:latest -> quay.corp/docker-cache/postgres:latest) so that I can constrain access to potentially untrusted upstream registries
      • Given the sensitive nature of accessing potentially untrusted content from an upstream registry, all cache pulls need to be logged (audit log)
      • As a user I want to be able to flush the cache so that I can eliminate excess storage consumption
      • Cached images are subject to image scanning via Clair
      • Robot accounts can be created in the cache organization as usual, their RBAC can be managed for all existing cached repositories at a given point in time
      • Assuming that many existing customers with strong governance requirements explicitly do not want to offer this capability and use explicit whitelisting via repo mirroring instead, the feature needs to be configurable (globally on or off) via config file / app.
      • Metrics should exist to track cache activity and efficiency (hit rate, size, evictions)
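
      The two addressing modes in the stories above (cache an entire upstream registry, or only one namespace of it) can be sketched as a pure string mapping. This is only an illustration of the examples given in this ticket, not Quay's implementation: the function name to_cache_ref and all of its parameters are hypothetical, and short names without a registry host (e.g. postgres:latest) are deliberately not handled.

```python
def to_cache_ref(upstream_ref: str, quay_host: str, cache_org: str,
                 upstream_registry: str, upstream_namespace: str = "") -> str:
    """Map an upstream pull spec to its location in a Quay cache organization.

    Hypothetical helper: it only mirrors the path examples in this ticket.
    upstream_ref must be fully qualified (registry host first).
    """
    registry, _, path = upstream_ref.partition("/")
    if registry != upstream_registry or not path:
        raise ValueError(f"{upstream_ref!r} is not served by {upstream_registry!r}")

    if upstream_namespace:
        # Namespace mode: the cache org stands in for registry + namespace,
        # so the namespace prefix is dropped from the cached path.
        prefix = upstream_namespace.strip("/") + "/"
        if not path.startswith(prefix):
            raise ValueError(f"{upstream_ref!r} is outside namespace {upstream_namespace!r}")
        path = path[len(prefix):]

    # Whole-registry mode keeps the full upstream path below the cache org.
    return f"{quay_host}/{cache_org}/{path}"


# Whole registry: docker.io -> quay.corp/cache
whole = to_cache_ref("docker.io/library/postgres:latest",
                     "quay.corp", "cache", "docker.io")

# Single namespace: docker.io/library -> quay.corp/docker-cache
scoped = to_cache_ref("docker.io/library/postgres:latest",
                      "quay.corp", "docker-cache", "docker.io", "library")
```

      With the inputs above, whole is quay.corp/cache/library/postgres:latest and scoped is quay.corp/docker-cache/postgres:latest, matching the two examples in the administrator story. In namespace mode, a reference outside the configured namespace is rejected, which is one way to read "constrain access to potentially untrusted upstream registries".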

      Estimate (XS, S, M, L, XL, XXL):  L

      Previous Work: 

      Open questions:

      • What happens when an upstream image is intermittently not accessible but still cached?
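
      This question interacts with the staleness-period story in the list of deliverables. One possible policy is sketched below purely as a discussion aid; it is not a decision recorded in this ticket, and the names Action and decide as well as the idea of serving a stale copy when upstream is unreachable are assumptions.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Action(Enum):
    SERVE_CACHED = "serve the cached copy without contacting upstream"
    CHECK_UPSTREAM = "ask upstream whether the tag changed, then serve"
    SERVE_STALE = "upstream unreachable: serve the possibly stale cached copy"
    FAIL = "nothing cached and upstream unreachable: fail the pull"


def decide(cached_at: Optional[datetime], now: datetime,
           staleness: timedelta, upstream_reachable: bool) -> Action:
    """Pick a behaviour for one pull (hypothetical policy, see lead-in)."""
    if cached_at is None:
        # Never pulled before: the image can only come from upstream.
        return Action.CHECK_UPSTREAM if upstream_reachable else Action.FAIL
    if now - cached_at < staleness:
        # Inside the staleness period the cache is trusted as-is.
        return Action.SERVE_CACHED
    # Staleness period expired: revalidate if possible, otherwise degrade.
    return Action.CHECK_UPSTREAM if upstream_reachable else Action.SERVE_STALE


now = datetime(2022, 5, 1, 12, 0, tzinfo=timezone.utc)
one_hour = timedelta(hours=1)

fresh = decide(now - timedelta(minutes=10), now, one_hour, upstream_reachable=False)
expired_offline = decide(now - timedelta(hours=3), now, one_hour, upstream_reachable=False)
```

      Under this assumed policy, fresh is Action.SERVE_CACHED and expired_offline is Action.SERVE_STALE. A stricter deployment could instead fail the pull once the staleness period has expired and upstream cannot be reached.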

            [PROJQUAY-465] Quay as a cache proxy / pull-through cache for other registries

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory, and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHEA-2022:1736


            Flavian Missi added a comment -

            Documenting here so we have it outside Slack.

            Known limitations

            • Repositories are auto-created on first pull via this feature with visibility set to "private" (Quay's default, see https://issues.redhat.com/browse/PROJQUAY-1224).
            • Anonymous users get a 401 when pulling a proxied repo for the first time, because our API blocks anonymous access to repos that don't exist and returns the 401 before the endpoint code is reached.

            Daniel Messer added a comment -

            fmissi Good find. Worth keeping an eye on when it lands.

            Flavian Missi added a comment -

            FTR - there's an open PR on the distribution-spec repo [1] covering proxying that's worth keeping an eye on.

            [1] https://github.com/opencontainers/distribution-spec/pull/66

            Daniel Messer added a comment -

            rhn-support-rbobek Actually that exists already: https://issues.redhat.com/browse/RFE-1608

            Roman Bobek added a comment -

            DanielMesser, does it mean that we need to raise additional RFEs for OCP and CRI-O?


            Daniel Messer added a comment -

            While most of the implementation work will happen in Quay, it should also be noted that OpenShift customers need additional functionality in ImageContentSourcePolicy and/or the CRI-O runtime config. The latter currently only redirects pull requests to a mirror/cache when the image is referenced by digest, which is likely not how most developers and users operate.

            Thomas Mckay (Inactive) added a comment - edited

            https://docs.pulpproject.org/en/2.21/dev-guide/design/deferred-download.html
            https://pulp.plan.io/issues/3693

            Dirk Herrmann (Inactive) added a comment -

            Since bulk operations to mirror entire namespaces are already tracked in https://issues.redhat.com/browse/PROJQUAY-213, I've changed the ticket scope and description to the second topic described in the RFE: cache proxy.

            Ivan Bazulic added a comment -

            dirk.herrmann Sure! I agree.

              fmissi Flavian Missi
              rhn-support-ibazulic Ivan Bazulic
              Dongbo Yan Dongbo Yan
              Votes: 17
              Watchers: 38
