OMR: Quay running with a single gunicorn-registry worker, backed by a SQLite database. When mirroring images and pulling from the same registry concurrently, Quay intermittently errors out completely because the lone gunicorn-registry worker is overwhelmed. When this happens, the worker is sent a SIGKILL, and nginx surfaces the failure as either a 502 Bad Gateway or a 413 Request Entity Too Large. Examples:
502 Bad Gateway:
nginx stdout | 2025/07/04 13:21:00 [error] 111#0: *1469 upstream prematurely closed connection while reading response header from upstream, client: 172.24.10.50, server: _, request: "PUT /v2/rhel/modh/vllm/manifests/sha256-7e1d1985b0dd2b5ba2df41fc9c8c3edf13a2d9ed8a4d84db8f00eb6c753bc5c5 HTTP/1.1", upstream: "http://unix:/tmp/gunicorn_registry.sock:/v2/rhel/modh/vllm/manifests/sha256-7e1d1985b0dd2b5ba2df41fc9c8c3edf13a2d9ed8a4d84db8f00eb6c753bc5c5", host: "rhel.skynet:8443"
gunicorn-registry stdout | 2025-07-04 13:21:00,756 [66] [ERROR] [gunicorn.error] Worker (pid:1028) was sent SIGKILL! Perhaps out of memory?
...
nginx stdout | 172.24.10.50 (-) - - [04/Jul/2025:13:21:00 +0000] "PUT /v2/rhel/modh/vllm/manifests/sha256-7e1d1985b0dd2b5ba2df41fc9c8c3edf13a2d9ed8a4d84db8f00eb6c753bc5c5 HTTP/1.1" 502 287 "-" "oc-mirror" (32.877 1622 32.873 : 0.002)
413 Request Entity Too Large:
nginx stdout | 2025/07/04 14:25:09 [error] 103#0: *3624 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 172.24.10.50, server: _, request: "PATCH /v2/rhel/modh/text-generation-inference/blobs/uploads/b8817c19-4edb-4e00-9fe3-89b5660d079e HTTP/1.1", upstream: "http://unix:/tmp/gunicorn_registry.sock:/v2/rhel/modh/text-generation-inference/blobs/uploads/b8817c19-4edb-4e00-9fe3-89b5660d079e", host: "rhel.skynet:8443"
nginx stdout | 2025/07/04 14:25:09 [error] 103#0: *3624 client intended to send too large body: 3985836408 bytes, client: 172.24.10.50, server: _, request: "PATCH /v2/rhel/modh/text-generation-inference/blobs/uploads/b8817c19-4edb-4e00-9fe3-89b5660d079e HTTP/1.1", upstream: "http://unix:/tmp/gunicorn_registry.sock/v2/rhel/modh/text-generation-inference/blobs/uploads/b8817c19-4edb-4e00-9fe3-89b5660d079e", host: "rhel.skynet:8443"
1 "PATCH /v2/rhel/modh/text-generation-inference/blobs/uploads/b8817c19-4edb-4e00-9fe3-89b5660d079e HTTP/1.1" 413
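The "client intended to send too large body: 3985836408 bytes" message indicates the ~3.9 GB blob PATCH exceeded nginx's request body limit, which is governed by the `client_max_body_size` directive. As an illustration only (Quay templates its own nginx configuration, so the actual file, scope, and value in a given deployment may differ):

```
# Illustrative nginx fragment, not Quay's shipped config:
# a value of 0 disables the request-body size check entirely,
# letting large blob uploads pass through the proxy.
http {
    client_max_body_size 0;
}
```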
All tests were done on a 16-core/32 GiB RAM VM running RHEL 9.6. The same issue occurs on a PostgreSQL-backed Quay instance when it, too, runs only one gunicorn-registry worker. Increasing the number of workers resolves the problem on PostgreSQL-backed Quay; on SQLite this is not an option, because concurrent workers overlap and SQLite locks the database during writes.
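The SQLite limitation can be reproduced in isolation. A minimal sketch (hypothetical table name, not Quay's schema) with two connections standing in for two registry workers:

```python
import os
import sqlite3
import tempfile

# Demonstrates why multiple writer processes and SQLite do not mix:
# while one connection holds a write transaction, a second writer
# fails with "database is locked" once its busy timeout expires.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

a = sqlite3.connect(path, timeout=0.1)  # short busy timeout for the demo
b = sqlite3.connect(path, timeout=0.1)
a.execute("CREATE TABLE blobs (digest TEXT)")
a.commit()

a.execute("BEGIN IMMEDIATE")  # "worker A" takes the write lock
a.execute("INSERT INTO blobs VALUES ('sha256:aaa')")

try:
    b.execute("BEGIN IMMEDIATE")  # "worker B" waits, then gives up
    locked = False
except sqlite3.OperationalError as e:
    locked = "locked" in str(e)

a.commit()
print(locked)  # the second writer could not acquire the lock
```

With more than one gunicorn-registry worker, concurrent blob and manifest writes hit exactly this contention, which is why the worker-count workaround only applies to the PostgreSQL-backed deployment.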