Type: Story
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Labels:
- trt-rotation

Activity Type:
None
Blocked:
False
Blocked Reason:

None
Ready:
False
Epic Link:
None
Story Points:
None

Target Version:
None
Release Blocker:
None
Sprint:
None

Today we have imperfect panic detection that catches quite a few problems in the gather extra step by grepping the logs we pull down. However, if the panic was not in the current/last pod log, it is not detected.
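To make the limitation concrete, here is a minimal sketch of grep-style detection over logs that have already been collected. It is not the actual gather extra implementation. The `find_panics` helper, the file names, and the `panic: ` pattern are assumptions for illustration only.

```python
import re

# Assumed pattern: the Go runtime prints "panic: <message>" when a goroutine panics.
# The real gather extra step may match different strings.
PANIC_RE = re.compile(r"(^|\s)panic: ")


def find_panics(logs_by_file):
    """Return (file name, line number, line) for every line that looks like a Go panic.

    logs_by_file maps a hypothetical artifact name to the log text pulled down for it.
    """
    hits = []
    for name, text in logs_by_file.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PANIC_RE.search(line):
                hits.append((name, lineno, line.strip()))
    return hits
```

Because the function only sees the text it is handed, a panic from an earlier container restart whose log was never pulled down cannot be found, which is the gap described above.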

Deads requested trying a better approach. His suggestion was to have origin stream all pod logs.

I fear this is too heavyweight, and instead think that Loki might be a better option, but it needs investigation. Auth needs solving (auth to our Grafana, or to Loki directly?), and there could be a delay before logs show up in queries (not sure how this works yet).
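As a starting point for that investigation, here is a rough sketch of what a panic query against Loki's HTTP `query_range` endpoint could look like. Everything specific here is an assumption: the `build_panic_query_request` helper is hypothetical, the `namespace` label depends on how logs are labeled in our Loki, and bearer-token auth is only one possible answer to the open auth question.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_panic_query_request(base_url, token, start_ns, end_ns,
                              namespace_regex="openshift-.*", limit=100):
    """Build (but do not send) a request for log lines containing "panic: ".

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    # LogQL: select streams by an assumed "namespace" label, then filter lines.
    logql = '{namespace=~"%s"} |= "panic: "' % namespace_regex
    params = urlencode({
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
    })
    url = base_url.rstrip("/") + "/loki/api/v1/query_range?" + params
    # Assumption: a bearer token is accepted; the real auth flow is still undecided.
    return Request(url, headers={"Authorization": "Bearer " + token})
```

If ingestion does lag, the caller would probably need to wait for some time after the test run ends before querying. How long is unknown and is part of what needs investigating.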

Additionally, this would need to be done in a way that keeps origin usable for external users. We could disable the check when Loki is not enabled, but then those origin users would not get panic detection. Pod log streaming should work for anyone, but it likely dramatically increases the memory/CPU/network requirements for running the origin tests, in ways that could majorly impact the CI clusters and possibly the cluster under test as well.

We might be able to use Loki once OAuth secrets via TRT-1933 and the PostAnalysis Framework are in place. This could be in addition to the existing panic detection, which currently only matches captured artifacts.

is related to

TRT-1764 Define a post test framework to generate smart junits based on module dependencies

TRT-2275 Define New Post Analysis Command

Closed

TRT-1905 Breakout panic test by namespace

Closed

links to

openshift/origin#30211: TRT-2275: introduce cluster health-check subcommand for openshift-tests

Assignee: Unassigned

Reporter: Devan Goodwin

Need Info From: None

Votes: 0

Watchers: 3

Created: 2024/07/17 11:34 AM

Updated: 2025/10/14 12:34 AM
