Story
Resolution: Done
Normal
None
Value Statement
Exposing metrics from the search components will help with debugging and SLOs.
Definition of Done for Engineering Story Owner (Checklist)
Reference: https://sre.google/sre-book/monitoring-distributed-systems/
Metrics to collect and expose from search-indexer:
- Time it takes to complete a request from a managed cluster (latency).
  - Histogram.
    - Full re-sync, 0 to 10,000 resources
    - Full re-sync, 10,001 to 25,000 resources
    - Full re-sync, 25,001 to 100,000 resources
    - Full re-sync, over 100,000 resources
    - Delta sync, 0 to 100 resources
    - Delta sync, over 100 resources
- Count of requests currently in flight (saturation).
  - A gauge that increases as requests arrive and decreases as they are resolved.
  - This value may change too quickly to be useful, because a typical request takes less than 1 second. We will need to determine a useful sampling interval.
  - Alternatively, we could infer this from the request start and end times.
- Count of rejected requests from managed clusters (saturation/errors).
  - We may already have this from HTTP response code 429.
- Count of error responses (errors).
  - Use the HTTP response code.
- Retry count from managed clusters.
  - We can infer this from the rejected request count.
- DB connection count.
  - Not relevant enough; the pool will use all available connections.
- Document all currently exposed metrics in the README.
- Document the metrics in the metrics-chronicole repo.
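The metrics listed above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the search-indexer's actual code. The label names, the latency bucket bounds, and the choice to count every non-429 4xx/5xx response as an error are assumptions made for the example. It maps a request to the resource-count ranges, records latency per range, tracks in-flight requests as a gauge, infers peak concurrency from request start and end times, and splits response codes into rejected and error counts.

```python
import bisect
import threading
from collections import Counter, defaultdict

# Hypothetical label names for the resource-count ranges in the story.
FULL_SYNC_BOUNDS = [10_000, 25_000, 100_000]
FULL_SYNC_LABELS = ["full_0_10000", "full_10001_25000",
                    "full_25001_100000", "full_over_100000"]
DELTA_SYNC_BOUNDS = [100]
DELTA_SYNC_LABELS = ["delta_0_100", "delta_over_100"]

# Hypothetical latency bucket upper bounds, in seconds.
LATENCY_BOUNDS = [0.1, 0.5, 1.0, 5.0, 10.0, 30.0]


def sync_category(full_sync: bool, resource_count: int) -> str:
    """Map a sync request to its resource-count range (upper bounds inclusive)."""
    if full_sync:
        return FULL_SYNC_LABELS[bisect.bisect_left(FULL_SYNC_BOUNDS, resource_count)]
    return DELTA_SYNC_LABELS[bisect.bisect_left(DELTA_SYNC_BOUNDS, resource_count)]


class LatencyHistogram:
    """Per-category latency histogram; not thread-safe."""

    def __init__(self, bounds=LATENCY_BOUNDS):
        self.bounds = list(bounds)
        # One slot per upper bound, plus a final overflow slot, per category.
        self.buckets = defaultdict(lambda: [0] * (len(self.bounds) + 1))
        self.sums = defaultdict(float)

    def observe(self, category: str, seconds: float) -> None:
        # Each observation lands in the first bucket whose bound is >= seconds.
        self.buckets[category][bisect.bisect_left(self.bounds, seconds)] += 1
        self.sums[category] += seconds


class InFlightGauge:
    """Gauge of requests currently being processed; rises and falls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def __enter__(self):
        with self._lock:
            self.value += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self.value -= 1
        return False  # never swallow exceptions raised by the request handler


def max_concurrent(intervals):
    """Infer peak in-flight requests from (start, end) timestamps with a sweep line."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort by time, then delta, so at equal timestamps ends come before starts.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def summarize_status_codes(codes):
    """Return (rejected, errors): 429 responses, and all other 4xx/5xx responses."""
    counts = Counter(codes)
    rejected = counts[429]
    errors = sum(n for code, n in counts.items() if code >= 400 and code != 429)
    return rejected, errors


# Example: record one delta sync of 42 resources that took 0.3 s.
histogram = LatencyHistogram()
gauge = InFlightGauge()
with gauge:
    histogram.observe(sync_category(False, 42), 0.3)
```

Using `bisect_left` keeps each range's upper bound inclusive, which matches ranges such as "0 to 10,000" followed by "10,001 to 25,000". In a real service these values would more likely be backed by a metrics library (for example, a Prometheus client, where in-flight requests map to a gauge and rejected or error totals map to counters) than by hand-rolled classes.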
Development Complete
- The code is complete.
- Functionality is working.
- Any required downstream Docker file changes are made.
Tests Automated
- [ ] Unit/function tests have been automated and incorporated into the build.
- [ ] 100% automated unit/function test coverage for new or changed APIs.
Secure Design
- [ ] Security has been assessed and incorporated into your threat model.
Multidisciplinary Teams Readiness
- [ ] Create an informative documentation issue using the [Customer Portal_doc_issue template](https://github.com/stolostron/backlog/issues/new?assignees=&labels=squad%3Adoc&template=doc_issue.md&title=), and ensure the doc acceptance criteria are met. Link the development issue to the doc issue.
- [ ] Provide input to the QE team, and ensure the QE acceptance criteria (established between the story owner and the QE focal) are met.
Support Readiness
- [ ] The must-gather script has been updated.
- is cloned by: ACM-3288 Search should expose usage metrics (search-api) (Closed)