-
Epic
-
Resolution: Done
-
Make image pulls more resilient
-
rhel-sst-container-tools
-
0% To Do, 0% In Progress, 100% Done
Epic Goal
- When the network is unstable, CRI-O cannot pull images from docker.io, whereas the Docker CLI can. This makes OCP 4 less tolerant of network outages than OCP 3.
The Docker CLI retries each blob individually, which is very effective; the CRI-O stack has no such mechanism.
- Upstream issue https://github.com/containers/image/issues/1145
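To illustrate the per-blob retry behaviour described above, here is a minimal Python sketch. It is not the Docker or containers/image implementation. `FlakySource`, `read_from`, and `pull_blob` are hypothetical names, and the "registry" is an in-memory byte string that resets the connection at chosen offsets. The point it shows is that each retry restarts from the number of bytes already received rather than from zero.

```python
class FlakySource:
    """Hypothetical stand-in for a blob endpoint behind an unstable proxy."""

    def __init__(self, data, fail_at):
        self.data = data
        # Absolute offsets at which the connection is reset, once each.
        self.fail_at = list(fail_at)

    def read_from(self, offset, chunk_size=4):
        """Yield chunks starting at `offset`, like a ranged HTTP GET would."""
        pos = offset
        while pos < len(self.data):
            if self.fail_at and pos >= self.fail_at[0]:
                self.fail_at.pop(0)
                raise ConnectionResetError("connection reset by peer")
            chunk = self.data[pos:pos + chunk_size]
            yield chunk
            pos += len(chunk)


def pull_blob(source, max_retries=3):
    """Download one blob, resuming after each reset from the bytes already held."""
    buf = bytearray()
    retries = 0
    while True:
        try:
            # Resume at len(buf): nothing already received is fetched again.
            for chunk in source.read_from(len(buf)):
                buf.extend(chunk)
            return bytes(buf), retries
        except ConnectionResetError:
            retries += 1
            if retries > max_retries:
                raise  # give up; the caller sees the original error


if __name__ == "__main__":
    data = b"0123456789abcdef"
    blob, retries = pull_blob(FlakySource(data, fail_at=[8, 12]))
    print(blob == data, retries)
```

Because the retry is scoped to a single blob, a reset on one layer does not discard layers that have already completed. The retry limit of 3 is an arbitrary value chosen for the sketch.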
Why is this important?
- Customers planning to migrate from OCP 3.11 to 4.x may worry about cluster and application availability when the network becomes unstable or the company proxy drops a connection.
Scenarios
- Pull with buildah --> KO
- https_proxy=<obfuscated> buildah pull docker://docker.io/image:tag
Getting image source signatures
Copying blob 04f220ee9266 done
Copying blob 034655750c88 done
Copying blob f0b757a2a0f0 done
Copying blob 89c8a77f7842 done
Copying blob 4bbcce26bc5e done
Copying blob 7b1a6ab2e44d done
Copying blob d1de5652303b done
Copying blob ef669123e59e done
Copying blob b14b1ba1d651 done
Copying blob e5cec468d3a6 done
while pulling "docker://docker.io/image:tag" as "docker.io/image:tag": Error writing blob: error storing blob to file "/var/tmp/storage487953815/6": read tcp x.x.x.x:port->x.x.x.x:3128: read: connection reset by peer
- Pull with docker --> OK
- https_proxy=<obfuscated> docker pull docker.io/image:tag
Trying to pull repository docker.io/library/image ...
10.6.5: Pulling from docker.io/library/image
7b1a6ab2e44d: Pull complete
034655750c88: Pull complete
f0b757a2a0f0: Pull complete
4bbcce26bc5e: Pull complete
04f220ee9266: Pull complete
89c8a77f7842: Pull complete
d1de5652303b: Pull complete
ef669123e59e: Pull complete
e5cec468d3a6: Pull complete
b14b1ba1d651: Pull complete
Status: Downloaded newer image for docker.io/image:tag
The Docker logs show that it had to resume the download of a blob while pulling the image:
- journalctl -u docker | grep resume
dockerd-current[pid]: time="" level=debug msg="attempting to resume download of \"sha256:7b1a6ab2e44d...\" from 653653 bytes"
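The debug line above carries the two pieces of information needed to resume: the blob digest and the byte offset. As a rough sketch, the Python helper below extracts both and builds the standard HTTP `Range` header value that a client resuming at that offset would typically send. `resume_range_header` is a hypothetical helper written for this illustration, and the regular expression assumes the message format is exactly as shown in the log excerpt.

```python
import re

# Matches the debug message shown in the journal excerpt; the quotes around
# the digest may or may not be backslash-escaped depending on the log format.
RESUME_RE = re.compile(
    r'attempting to resume download of \\?"(?P<digest>[^"\\]+)\\?"'
    r' from (?P<offset>\d+) bytes'
)


def resume_range_header(log_line):
    """Return (digest, Range header value) for a resume log line, else None."""
    match = RESUME_RE.search(log_line)
    if match is None:
        return None
    offset = int(match.group("offset"))
    # An open-ended byte range: everything from `offset` to the end of the blob.
    return match.group("digest"), f"bytes={offset}-"


if __name__ == "__main__":
    line = (
        r'dockerd-current[pid]: time="" level=debug '
        r'msg="attempting to resume download of '
        r'\"sha256:7b1a6ab2e44d...\" from 653653 bytes"'
    )
    print(resume_range_header(line))
```

Counting such matches in `journalctl -u docker` output gives a quick estimate of how often a pull survived only because a blob download was resumed.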
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: https://github.com/containers/image/pull/1816 and https://github.com/containers/image/pull/1847 (unit tests only; no test cases that artificially trigger a pull disruption); https://github.com/cri-o/cri-o/pull/6664 and https://github.com/cri-o/cri-o/pull/6665
- DEV - Upstream documentation merged: N/A, this is automatic with no tunables at this point.
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- is related to
-
RFE-2306 Make image pull with crio as resilient as with docker on unstable network
- Deferred