Project Quay / PROJQUAY-5502

Regular Quay slowness / server errors disrupt OSUS operations


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • quay.io

      The RH OSUS (aka Cincinnati) instance periodically scrapes the quay.io/openshift-release-dev/ocp-release repo to discover new OCP releases to be added to the OCP upgrade graph. We are seeing a regular event on Monday mornings where the scrapes partly time out and partly fail with various 50x errors:

      [2023-04-17T11:51:59Z INFO graph_builder::graph] graph update triggered
      [2023-04-17T11:56:59Z ERROR graph_builder::graph] Processing all plugins with a timeout of 300s
      [2023-04-17T11:56:59Z ERROR graph_builder::graph] deadline has elapsed
      [2023-04-17T12:01:59Z INFO graph_builder::graph] graph update triggered
      [2023-04-17T12:03:47Z ERROR graph_builder::graph] failed to fetch all release metadata from quay.io/openshift-release-dev/ocp-release
      [2023-04-17T12:03:47Z ERROR graph_builder::graph] fetching manifest and manifestref for openshift-release-dev/ocp-release:4.12.0-0.nightly-multi-2022-08-08-134208: unexpected HTTP status 504 Gateway Timeout
      [2023-04-17T12:08:47Z INFO graph_builder::graph] graph update triggered
      [2023-04-17T12:09:45Z INFO graph_builder::graph] graph update completed, 7169 valid releases
      

      The actual scrape is known to be a quite expensive operation (quay.io/openshift-release-dev/ocp-release has over 7k tags), but it is performed periodically and steadily, without any peaks on our side (it is not request-driven and cannot be triggered externally), and we only see these disruptions in a fairly deterministic time window every week, which hints at interference from some kind of batch job scheduled in that window.
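      For illustration, here is a rough, hypothetical probe of what the scrape amounts to on the registry side: a paginated walk of the v2 tag list for the repo. The anonymous token flow, page size, and pagination parameters follow the standard registry v2 API and are my assumptions, not something lifted from the Cincinnati scraper:

      # Rough probe of the tag-listing cost on quay.io; endpoint paths, the anonymous
      # token flow and the pagination parameters follow the registry v2 spec and are
      # assumptions, not taken from Cincinnati's code.
      import time
      import requests

      REPO = "openshift-release-dev/ocp-release"

      def anon_pull_token():
          # Anonymous pull token for the public repo, via the token endpoint that
          # quay.io advertises in its WWW-Authenticate header.
          resp = requests.get(
              "https://quay.io/v2/auth",
              params={"service": "quay.io", "scope": f"repository:{REPO}:pull"},
              timeout=30,
          )
          resp.raise_for_status()
          return resp.json()["token"]

      def walk_tags(page_size=100):
          headers = {"Authorization": f"Bearer {anon_pull_token()}"}
          url = f"https://quay.io/v2/{REPO}/tags/list"
          last, total = None, 0
          while True:
              params = {"n": page_size}
              if last:
                  params["last"] = last
              start = time.monotonic()
              resp = requests.get(url, params=params, headers=headers, timeout=60)
              elapsed = time.monotonic() - start
              print(f"{resp.status_code} in {elapsed:.2f}s ({total} tags so far)")
              if resp.status_code != 200:
                  break  # the 5xx class of failure from the log shows up here
              tags = resp.json().get("tags") or []
              if not tags:
                  break
              total += len(tags)
              last = tags[-1]
          return total

      if __name__ == "__main__":
          print(f"listed {walk_tags()} tags")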

      The affected time window seems to be between 09:00Z and 12:00Z every Monday. We do not see other regular disruptions like this, but the Monday one is very reliable.
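      A sketch of a standalone probe that could confirm the window: fetch a single manifest (the operation that returned 504 above) every few minutes and log UTC time, weekday, status, and latency. The tag, interval, and accepted media types are illustrative assumptions:

      # Periodic probe of a single manifest fetch, logging UTC time and weekday so
      # failures can be correlated with the Monday 09:00Z-12:00Z window.
      import time
      from datetime import datetime, timezone
      import requests

      REPO = "openshift-release-dev/ocp-release"
      TAG = "4.12.17-x86_64"  # hypothetical tag choice; pick any existing one
      ACCEPT = ", ".join([
          "application/vnd.docker.distribution.manifest.list.v2+json",
          "application/vnd.docker.distribution.manifest.v2+json",
      ])

      def anon_pull_token():
          resp = requests.get(
              "https://quay.io/v2/auth",
              params={"service": "quay.io", "scope": f"repository:{REPO}:pull"},
              timeout=30,
          )
          resp.raise_for_status()
          return resp.json()["token"]

      while True:
          now = datetime.now(timezone.utc)
          start = time.monotonic()
          try:
              resp = requests.get(
                  f"https://quay.io/v2/{REPO}/manifests/{TAG}",
                  headers={"Authorization": f"Bearer {anon_pull_token()}",
                           "Accept": ACCEPT},
                  timeout=60,
              )
              outcome = str(resp.status_code)
          except requests.RequestException as exc:
              outcome = f"error: {exc}"
          elapsed = time.monotonic() - start
          print(f"{now:%Y-%m-%dT%H:%M:%SZ} {now:%a} {outcome} {elapsed:.2f}s", flush=True)
          time.sleep(300)  # arbitrary 5-minute interval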

      I have also created OTA-971 on our side to make Cincinnati use webhooks instead of periodic listing. That will reduce OSUS sensitivity to the problem, but there are other consumers of that repo (like CI) which are affected by it, so it should be investigated on the Quay side, too.
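      For completeness, the rough shape of the webhook-driven approach in OTA-971 would be a small receiver for Quay repository-push notifications that hands only the changed tags to the graph builder. The payload field names below ("repository", "updated_tags") are my recollection of Quay's notification payload and need verification against the Quay docs; the in-process queue is just a stand-in for the real handoff:

      # Minimal sketch of a webhook receiver for Quay "repository push" notifications;
      # payload fields are assumed, the queue stands in for feeding the graph builder.
      import json
      from http.server import BaseHTTPRequestHandler, HTTPServer
      from queue import Queue

      new_tags: Queue = Queue()  # tags the graph builder should (re)scrape

      class QuayPushHandler(BaseHTTPRequestHandler):
          def do_POST(self):
              length = int(self.headers.get("Content-Length", 0))
              try:
                  event = json.loads(self.rfile.read(length) or b"{}")
              except json.JSONDecodeError:
                  self.send_response(400)
                  self.end_headers()
                  return
              repo = event.get("repository", "")
              for tag in event.get("updated_tags", []):
                  new_tags.put((repo, tag))
              self.send_response(204)  # acknowledge quickly; process asynchronously
              self.end_headers()

      if __name__ == "__main__":
          # In a real deployment this would sit behind auth/TLS and feed Cincinnati's
          # graph builder instead of an in-process queue.
          HTTPServer(("0.0.0.0", 8080), QuayPushHandler).serve_forever()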

      Filed as a follow-up to a Slack thread: https://redhat-internal.slack.com/archives/C7WH69HCY/p1679913378496759

              Assignee: Unassigned
              Reporter: Petr Muller (afri@afri.cz)
              Votes: 0
              Watchers: 2
