- Epic
- Resolution: Duplicate
- Blocker
- ACM 2.7.0
Epic Goal
- Define SLIs for components that will be used by Service Delivery.
- Instrument code for the agreed-upon SLIs and expose the resulting metrics.
- Define alerting rules for SLIs.
- Determine starting SLO based on aggregation of our SLIs.
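The last goal, deriving a starting SLO from the SLIs, can be illustrated with a minimal sketch. It assumes each SLI is a good-events / total-events ratio and that the starting SLO is simply the worst observed ratio. The epic does not define the aggregation method, so both are assumptions. The names `SLI`, `starting_slo`, and `error_budget_remaining` and the sample counts are hypothetical and used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SLI:
    """One service level indicator, measured as good events over total events."""
    name: str
    good: int   # events that met the target (hypothetical sample counts)
    total: int  # all events observed in the measurement window

    def ratio(self) -> float:
        # Assumption: an SLI with no observed events counts as fully healthy.
        return 1.0 if self.total == 0 else self.good / self.total


def starting_slo(slis: list[SLI]) -> float:
    """Return a conservative starting SLO: the worst observed SLI ratio."""
    if not slis:
        raise ValueError("at least one SLI is required")
    return min(s.ratio() for s in slis)


def error_budget_remaining(sli: SLI, slo: float) -> float:
    """Fraction of the error budget left; a negative value means it is overspent."""
    allowed_bad = (1.0 - slo) * sli.total
    actual_bad = sli.total - sli.good
    if allowed_bad == 0:
        # No failures are allowed (SLO of 1.0) or nothing was observed.
        return 0.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Component names follow the Scenarios section; the counts are made up.
    slis = [
        SLI("hypershift-addon", good=995, total=1000),
        SLI("policy-controllers", good=999, total=1000),
        SLI("foundation", good=1000, total=1000),
    ]
    slo = starting_slo(slis)
    for sli in slis:
        print(sli.name, round(error_budget_remaining(sli, slo), 3))
```

Taking the minimum gives a target every component already meets. A weighted average or a per-component SLO would be equally valid starting points once Service Delivery states its requirements.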
Why is this important?
- Meet the SLA requirements that will be established as part of Service Delivery (SD).
- Service monitoring and alerting will be essential for quick root cause analysis (RCA) and resolution of service disruptions across environments.
Scenarios
- ACM health pre and post install/upgrade
- Hypershift Addon and its installer components
- Policy controllers, but not the actual policy being applied
- Foundation components
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
Dependencies (internal and external)
- Hypershift-addon
  - agent addon
  - agent Hypershift-operator
  - agent External DNS
- Policy
  - All controllers
  - agents
- Foundation
  - Hub side controllers
  - Agent side controllers
Previous Work (Optional):
- Server Foundation F2F 2022 discussion
- Hypershift addon document
Open questions:
- Is there a set of SLI signals that Service Delivery requires or suggests?
- How many of the signals can be implemented as rules only (no code change required)?
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build