  1. OpenShift Bugs
  2. OCPBUGS-59933

Get manifest and metadata caching does not support named tagged images


    • Bug
    • Resolution: Done
    • Major
    • 4.19.z
    • 4.17, 4.18, 4.19
    • HyperShift
    • Quality / Stability / Reliability
    • False
    • None
    • Important
    • Yes
    • None
    • None
    • None
    • In Progress
    • Bug Fix
    • Before this update, calls to retrieve an image manifest and metadata by a tagged image name did not cache the result of the lookup. As a consequence, HyperShift memory usage grew quickly, which caused performance issues. With this release, lookups for HyperShift images referenced by a named tag or by a canonical name are cached for 12 hours. As a result, HyperShift memory usage is reduced. (link:https://issues.redhat.com/browse/OCPBUGS-59933[OCPBUGS-59933])

      This is a clone of issue OCPBUGS-53265. The following is the description of the original issue:

      Description of problem:

      As described in [Hypershift issues at scale - OCPBUGS-52256|https://issues.redhat.com/browse/OCPBUGS-52256], HyperShift memory usage grows quickly, which causes issues.
      
      One persistent error that shows up in the logs is:
      
      {"level":"error","ts":"2025-03-17T14:14:17Z","msg":"Reconciler error","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"cluster1","namespace":"control"},"namespace":"control","name":"cluster1","reconcileID":"e785ff72-6253-48bc-ba86-8bc63da26c10","error":"failed to determine if release image multi-arch: failed to retrieve manifest us.icr.io/am/ocp-release:4.18.1-x86_64: toomanyrequests","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222"}
      
      This leads to a steady memory leak, caused in part by lookups in which the image reference is a named tag (for example, us.icr.io/am/ocp-release:4.18.1-x86_64) rather than a canonical name. Because the cache is always keyed by the image reference ID (canonical name), the cache check fails whenever the image reference has a named tag, so every such lookup goes back to the registry.
      
      This is true for:
      * GetManifest - https://github.com/openshift/hypershift/blob/00164ec3e6df7c140283872bb6fe6d524f76cc75/support/util/imagemetadata.go#L221
      * ImageMetadata - https://github.com/openshift/hypershift/blob/00164ec3e6df7c140283872bb6fe6d524f76cc75/support/util/imagemetadata.go#L72
      
      This appears to have been introduced in: https://github.com/openshift/hypershift/commit/1af27dbff1cc063c8bb29ac4c6543eb466d1ebe6

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

              rcradick Ryan Cradick
              rcradick Ryan Cradick
              None
              None
              XiuJuan Wang XiuJuan Wang
              None