Analysis of historic queries in Thanos for large scale deployment
Type: Epic
Resolution: Unresolved
Priority: Major
Status: To Do
Progress: 0% To Do, 0% In Progress, 100% Done
Sprint: Observability Sprint 2023-05
Value Statement
https://docs.google.com/document/d/1bQurT6175AqKAiTCf6vLrkD5DdrQtju1o0tcfbf41co/
Address:
- Historical queries from the OOTB Grafana dashboards time out. We know this happens when the time range is increased.
- We also know of cases where, even at a shorter time range, the sheer cardinality of the metrics causes problems. For example, a namespace with 4k pods breaks certain displays.
- The compactor crashes or gets stuck. The causes may differ (identifying them is the goal of this exercise), but the problem does arise once data starts accumulating.
- Store PVs keep growing in size (we have seen this even when the compactor works).
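As a starting point for the analysis, the first two problems can be sized with a rough estimate: a range query returns at most one point per step for every matching series, so the result grows with both the time range and the cardinality. The sketch below is illustrative only. It assumes the Thanos Query endpoint exposes the Prometheus-compatible `/api/v1/query_range` API; the base URL, the PromQL expression, and the 60-second step are hypothetical placeholders, not values taken from any deployment.

```python
from urllib.parse import urlencode


def range_query_url(base_url, query, start, end, step):
    """Build a Prometheus-compatible /api/v1/query_range URL.

    base_url is a hypothetical Thanos Query address; start and end are
    Unix timestamps in seconds, step is the resolution in seconds.
    """
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"


def estimated_samples(series, range_seconds, step_seconds):
    """Upper bound on returned samples: one point per step per series."""
    points_per_series = range_seconds // step_seconds + 1
    return series * points_per_series


if __name__ == "__main__":
    HOUR = 3600
    # 4000 series stands in for the "namespace with 4k pods" case above,
    # assuming one series per pod; the 60 s step is an assumption.
    for hours in (1, 6, 24, 24 * 7, 24 * 30):
        total = estimated_samples(series=4000, range_seconds=hours * HOUR, step_seconds=60)
        print(f"{hours:>4} h range -> up to {total:,} samples")
```

Timing real requests built with `range_query_url` at each of these ranges would show where the dashboards start to time out, and whether the limit is driven by range, cardinality, or both.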
Ensure the issue title clearly reflects the value of this user story to the
intended persona. (Explain the "WHY")
Definition of Done for Engineering Story Owner (Checklist)
- ...
Development Complete
- The code is complete.
- Functionality is working.
- Any required downstream Docker file changes are made.
Tests Automated
- [ ] Unit/function tests have been automated and incorporated into the build.
- [ ] 100% automated unit/function test coverage for new or changed APIs.
Secure Design
- [ ] Security has been assessed and incorporated into your threat model.
Multidisciplinary Teams Readiness
- [ ] Create an informative documentation issue using the [Customer Portal_doc_issue template](https://github.com/stolostron/backlog/issues/new?assignees=&labels=squad%3Adoc&template=doc_issue.md&title=), and ensure doc acceptance criteria is met. Link the development issue to the doc issue.
- [ ] Provide input to the QE team, and ensure QE acceptance criteria (established between story owner and QE focal) are met.
Support Readiness
- [ ] The must-gather script has been updated.