- Epic
- Resolution: Done
- Blocker
- None
Epic Goal
- Define SLIs for components that will be used by Service Delivery.
- Instrument the code for the agreed-upon SLIs and expose the resulting metrics.
- Define alerting rules for SLIs.
- Determine a starting SLO based on an aggregation of our SLIs.
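As a rough illustration of how these goals fit together, the sketch below computes an availability SLI per component, aggregates it across components, and checks the result against a candidate SLO using a simple error-budget burn rate. Everything here is an assumption made for illustration: the `RequestCounts` type, the function names, the sample counts, and the choice of availability as the SLI are hypothetical and are not agreed definitions from this Epic.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RequestCounts:
    """Hypothetical per-component counters, e.g. read from exposed metrics."""
    total: int
    failed: int


def availability_sli(counts: RequestCounts) -> float:
    """Fraction of successful requests; 1.0 when there was no traffic."""
    if counts.total == 0:
        return 1.0
    return (counts.total - counts.failed) / counts.total


def aggregate_sli(per_component: Mapping[str, RequestCounts]) -> float:
    """Traffic-weighted availability across all components."""
    total = sum(c.total for c in per_component.values())
    failed = sum(c.failed for c in per_component.values())
    return availability_sli(RequestCounts(total, failed))


def should_alert(sli: float, slo: float, burn_rate_threshold: float = 1.0) -> bool:
    """Alert when errors consume the error budget faster than the threshold."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    error_budget = 1.0 - slo
    burn_rate = (1.0 - sli) / error_budget
    return burn_rate > burn_rate_threshold


# Illustrative numbers only; real values would come from the exposed metrics.
counts = {
    "registration-controller": RequestCounts(total=1000, failed=2),
    "work-webhook": RequestCounts(total=500, failed=10),
}
sli = aggregate_sli(counts)
```

With these sample counts, a candidate SLO of 0.99 would not trigger an alert, while a stricter 0.995 would. Comparing several candidate targets against observed SLIs in this way is one possible approach to choosing a starting SLO.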
Why is this important?
- Meet SLA requirements that will be established as part of SD.
- Service monitoring and alerting will be essential for quick RCA and resolution of service disruptions across environments.
Scenarios
Any code path executed in any of the Server Foundation components must be reviewed and addressed as part of this Epic.
The component list is taken from: https://docs.google.com/spreadsheets/d/1d7nfEl7OhvDe69HDK132NX9NzWRLjOmHrm2nDlJOaXw/edit#gid=1946150399
- Components:
- cluster-manager
- registration-controller
- registration-webhook
- work-webhook
- placement-controller
- managedcluster-import-controller
- ocm-controller
- ocm-proxyserver?
- ocm-webhook?
- klusterlet-addon-controller
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- Server Foundation F2F 2022 discussion
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>