OCPBUGS-6809 (OpenShift Bugs)

Uploading large layers fails with "blob upload invalid"


Details

    • Bug
    • Resolution: Done
    • Normal
    • None
    • 4.11.z
    • Image Registry
    • Moderate
    • Sprint 232
    • 1
    • False
    • Previously, customers running the image registry using AWS S3 would see a "blob upload invalid" error when uploading large image layers to the image registry. This update adds code to handle the pagination of the parts being uploaded, enabling customers to upload large image layers to AWS S3.
    • Bug Fix

    Description

      Description of problem:

      A customer runs machine learning (ML) tasks on OpenShift Container Platform, for which large models need to be embedded in the container image. Building a new container image with large container image layers (>= 10 GB) and pushing it to the internal image registry fails with the following error message:

      error: build error: Failed to push image: writing blob: uploading layer to https://image-registry.openshift-image-registry.svc:5000/v2/example/example-image/blobs/uploads/b305b374-af79-4dce-afe0-afe6893b0ada?_state=[..]: blob upload invalid

      The image registry Pod logs the following error messages:

      time="2023-01-30T14:12:22.315726147Z" level=error msg="upload resumed at wrong offest: 10485760000 != 10738341637" [..]
      time="2023-01-30T14:12:22.338264863Z" level=error msg="response completed with error" err.code="blob upload invalid" err.message="blob upload invalid" [..]

      Backend storage is AWS S3. We suspect that this could be the following upstream bug: https://github.com/distribution/distribution/issues/1698
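
The two offsets in the registry log are consistent with a part listing that stops after the first page. The S3 ListParts API returns at most 1000 parts per response, and 10485760000 is exactly 1000 x 10 MiB. The sketch below simulates that situation; it is not the registry's actual code. The 10 MiB part size is an assumption inferred from the logged number, and `make_parts`, `list_parts_page`, `offset_first_page_only`, and `offset_all_pages` are hypothetical helpers that stand in for the storage driver and the S3 API.

```python
from typing import List, Optional, Tuple

PART_SIZE = 10 * 1024 * 1024   # assumed part size: 10 MiB (inferred, not confirmed)
PAGE_LIMIT = 1000              # S3 ListParts returns at most 1000 parts per response
LAYER_SIZE = 10_738_341_637    # actual upload size taken from the registry log


def make_parts(total: int, part_size: int) -> List[int]:
    """Split an upload of `total` bytes into part sizes; the last part may be short."""
    full, rest = divmod(total, part_size)
    return [part_size] * full + ([rest] if rest else [])


def list_parts_page(parts: List[int], marker: int = 0) -> Tuple[List[int], Optional[int]]:
    """Hypothetical stand-in for one ListParts call: one page plus the next marker."""
    page = parts[marker:marker + PAGE_LIMIT]
    next_marker = marker + len(page)
    return page, (next_marker if next_marker < len(parts) else None)


def offset_first_page_only(parts: List[int]) -> int:
    """Resume offset when only the first page of parts is counted."""
    page, _ = list_parts_page(parts)
    return sum(page)


def offset_all_pages(parts: List[int]) -> int:
    """Resume offset when every page is followed until the listing is exhausted."""
    total, marker = 0, 0
    while marker is not None:
        page, marker = list_parts_page(parts, marker)
        total += sum(page)
    return total


parts = make_parts(LAYER_SIZE, PART_SIZE)
print(len(parts))                      # 1025
print(offset_first_page_only(parts))   # 10485760000
print(offset_all_pages(parts))         # 10738341637
```

Under these assumptions, counting only the first page reproduces the "wrong offset" value from the log, while following every page yields the real upload size.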

      Version-Release number of selected component (if applicable):

      The customer encountered the issue on OCP 4.11.20. We reproduced the issue on OCP 4.11.21:

      $ oc version
      Client Version: 4.12.0
      Kustomize Version: v4.5.7
      Server Version: 4.11.21
      Kubernetes Version: v1.24.6+5658434

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install OpenShift Container Platform cluster 4.11.21 on AWS
      2. Confirm registry storage is on AWS S3
      3. Create a new build including a 10GB file using the following command: `printf "FROM registry.fedoraproject.org/fedora:37\nRUN dd if=/dev/urandom of=/bigfile bs=1M count=10240" | oc new-build -D -`
      4. Wait for some time for the build to run
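
As a rough sizing aid for step 3, the following sketch estimates how many upload parts a file of a given size produces and the smallest `dd` count that needs more than one listing page. It assumes a 10 MiB part size and the 1000-part page limit of S3 ListParts, and it ignores tar and compression overhead, so real layer sizes differ slightly. `parts_needed` and `min_dd_count_to_exceed_one_page` are hypothetical helper names.

```python
MIB = 1024 * 1024
PART_SIZE = 10 * MIB          # assumed part size; not confirmed by the report
MAX_PARTS_PER_PAGE = 1000     # S3 ListParts page limit


def parts_needed(size_bytes: int, part_size: int = PART_SIZE) -> int:
    """Number of parts for an upload of size_bytes (ceiling division)."""
    return -(-size_bytes // part_size)


def min_dd_count_to_exceed_one_page(block_size: int = MIB) -> int:
    """Smallest dd count (with bs=block_size) whose output needs more than one page."""
    threshold = MAX_PARTS_PER_PAGE * PART_SIZE
    return threshold // block_size + 1


print(parts_needed(10240 * MIB))            # 1024 parts for the file from step 3
print(min_dd_count_to_exceed_one_page())    # 10001
```

With these assumptions, the `count=10240` used in step 3 is comfortably above the point where a second page of parts is required.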

      Actual results:

      Pushing the new build fails with the following error message:
      
      error: build error: Failed to push image: writing blob: uploading layer to https://image-registry.openshift-image-registry.svc:5000/v2/example/example-image/blobs/uploads/b305b374-af79-4dce-afe0-afe6893b0ada?_state=[..]: blob upload invalid

      Expected results:

      Push of large container image layers succeeds

      Additional info:

            People

              fmissi Flavian Missi
              rhn-support-skrenger Simon Krenger
              xiujuan wang xiujuan wang