Type: Story
Resolution: Done
Priority: Major
Description:
Fallout from OCPBUGS-50510: we can now get etcdserver timeouts where previously clients would retry and succeed. This seems to hit pod sandbox creation specifically, during the window when we're setting up monitortests prior to e2e testing.
Two goals here:
Dig into how heavy the load is during the monitortest StartCollection phase, using the kube-apiserver audit logs. Per deads, "openshift/cluster-debug-tools has a handy audit command". We should document whatever use of the tool proves useful here somewhere in our team drive.
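For the first goal, a minimal sketch of the kind of counting we're after, assuming audit logs in the standard audit.k8s.io/v1 JSON-lines format (e.g. gathered with oc adm node-logs --role=master --path=kube-apiserver/audit.log). This is not the cluster-debug-tools audit command itself, just an illustration of tallying requests by user and verb inside the StartCollection window:

// Rough sketch only, not the cluster-debug-tools audit command: tally
// kube-apiserver audit events by user and verb within a time window
// (e.g. the window where monitortest StartCollection runs).
// Usage: go run countaudit.go -from 2025-02-10T14:00:00Z -to 2025-02-10T14:05:00Z < audit.log
package main

import (
	"bufio"
	"encoding/json"
	"flag"
	"fmt"
	"os"
	"sort"
	"time"
)

// auditEvent pulls out just the fields we need from an audit.k8s.io/v1 Event line.
type auditEvent struct {
	Verb                     string    `json:"verb"`
	Stage                    string    `json:"stage"`
	RequestReceivedTimestamp time.Time `json:"requestReceivedTimestamp"`
	User                     struct {
		Username string `json:"username"`
	} `json:"user"`
}

func main() {
	from := flag.String("from", "", "window start, RFC3339")
	to := flag.String("to", "", "window end, RFC3339")
	flag.Parse()

	start, err1 := time.Parse(time.RFC3339, *from)
	end, err2 := time.Parse(time.RFC3339, *to)
	if err1 != nil || err2 != nil {
		fmt.Fprintln(os.Stderr, "both -from and -to must be RFC3339 timestamps")
		os.Exit(1)
	}

	counts := map[string]int{} // "username verb" -> request count
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024) // audit lines can be long

	for scanner.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(scanner.Bytes(), &ev); err != nil {
			continue // skip malformed lines
		}
		if ev.Stage != "ResponseComplete" {
			continue // count each request once, not once per audit stage
		}
		if ev.RequestReceivedTimestamp.Before(start) || ev.RequestReceivedTimestamp.After(end) {
			continue
		}
		counts[ev.User.Username+" "+ev.Verb]++
	}

	// Print the heaviest callers first.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return counts[keys[i]] > counts[keys[j]] })
	for _, k := range keys {
		fmt.Printf("%8d  %s\n", counts[k], k)
	}
}

Whatever the cluster-debug-tools audit command gives us beyond this (by resource, by verb, latency buckets) is what we should capture in the team drive doc.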
Then we likely proceed to slow down monitortest initialization, perhaps running only a few monitortest setups at a time instead of launching all of them in parallel at once: https://github.com/openshift/origin/blob/50451ebe907765cbe3a5537ed089eb0045f9e0f6/pkg/monitortestframework/impl.go#L86
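For the second goal, a minimal sketch of the direction, assuming a simplified interface; monitorTest, StartCollection, and startCollectionBounded below are stand-ins rather than the real origin types, and the actual change would live around the linked line in impl.go:

// Sketch of bounding StartCollection concurrency instead of launching every
// monitortest at once. The interface is simplified; origin's real monitortest
// interface takes more arguments.
package monitortestsketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

type monitorTest interface {
	StartCollection(ctx context.Context) error
}

// startCollectionBounded runs StartCollection for every registered monitortest,
// but only maxInFlight at a time; errgroup's SetLimit makes Go() block once the
// limit is reached, so setups queue up instead of all firing at once.
func startCollectionBounded(ctx context.Context, tests []monitorTest, maxInFlight int) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxInFlight) // e.g. 3 at a time instead of one goroutine per monitortest

	for _, mt := range tests {
		mt := mt // capture the loop variable for the goroutine (pre-Go 1.22 semantics)
		g.Go(func() error {
			return mt.StartCollection(ctx)
		})
	}
	return g.Wait() // first error cancels ctx for the remaining setups
}

A plain buffered-channel semaphore would work equally well; the interesting knob is maxInFlight, tuned against the etcdserver error rate we see in the query below.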
We'll then need to monitor for these errors coming out of that test; this Postgres query should help:
select r.timestamp, r.url
from prow_job_run_tests t,
     prow_job_run_test_outputs o,
     prow_job_runs r,
     prow_jobs j
where t.test_id = 260
  and t.id = o.prow_job_run_test_id
  and r.id = t.prow_job_run_id
  and j.id = r.prow_job_id
  and j.release = '4.18'
  and o.output like '%etcdserver%'
order by r.timestamp asc;
is related to: OCPBUGS-50510 etcd timeouts causing failed pod sandbox creation writing network status (Closed)
links to: