Feature
Resolution: Done
BU Product Work
OCPPLAN-9733 HyperShift: Central Control-Plane Management (aka Hosted Control Planes)
0% To Do, 0% In Progress, 100% Done
Feature Overview
Assuming that software is immune to problems is a recipe for disaster. Understanding, however, that any system, no matter how perfectly engineered it is, IS prone to errors (machine or human) is the first step towards building truly reliable software.
This feature is about Observe...wait-for-it...ability. Observability is a system property based on one or more of the following assumptions:
- Permanent health for a complex system is a fallacy.
- If it's complex and it's a collection of services working together, then it's likely unpredictable.
- Failure is a given in every aspect/state of the product, from design to operation.
- Debuggability is key to building robust, supportable, and reliable systems.
Given these assumptions, the question becomes: how do we propagate signals from the different parts of the product that indicate a problem with a reasonable degree of accuracy? Answering that requires insight, and gaining insight requires mechanisms for signal propagation.
Background and strategic fit
Observability is key to supporting and maintaining a system and to ensuring that it stays alive. With observability in place, different stakeholders can assist with, and be accountable for, problem resolution. Additionally, KPIs for product success can only be built on sufficient metrics.
Goals
The goal for this feature is to implement the basic building blocks required for gaining insight into, and observability of, all HyperShift components. This translates into four basic goals:
- Build the basic blocks for monitoring HyperShift:
  - Expose and enable metrics for the HyperShift Operator as well as the control-plane components.
  - Create basic dashboards to view the metrics.
  - Create the necessary alerts for failures.
- Build the basic blocks for making HyperShift loggable:
  - Instrument logging and make log verbosity configurable in the different HyperShift components.
  - Integrate with the log collection tools in OpenShift.
- Expose meaningful events that reflect each component's state.
- Identify specific points in the request path of HyperShift components and build traces around them.
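The first two goals can be made concrete with a small, dependency-free Python sketch: a counter rendered in the Prometheus text exposition format (assuming a Prometheus-compatible stack scrapes the metrics) and a numeric verbosity flag mapped to a log level. This is not HyperShift code, and a real component would typically rely on an existing Prometheus client library rather than hand-rendering the format. The metric name `example_reconcile_errors_total`, the `component` label, and the verbosity mapping (0 = warnings, 1 = info, 2+ = debug) are hypothetical.

```python
import logging


class Counter:
    """Monotonically increasing counter, keyed by the value of one label.

    Hypothetical illustration only; not a HyperShift or client-library API.
    """

    def __init__(self, name: str, help_text: str, label: str) -> None:
        self.name = name
        self.help_text = help_text
        self.label = label
        self._values: dict[str, float] = {}

    def inc(self, label_value: str, amount: float = 1.0) -> None:
        # Counters may only go up; rates are derived from them at query time.
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[label_value] = self._values.get(label_value, 0.0) + amount

    def get(self, label_value: str) -> float:
        return self._values.get(label_value, 0.0)

    def render(self) -> str:
        """Render all series in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for label_value in sorted(self._values):
            # Escape backslash, double quote, and newline in label values.
            escaped = (
                label_value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            lines.append(
                f'{self.name}{{{self.label}="{escaped}"}} {self._values[label_value]}'
            )
        return "\n".join(lines) + "\n"


def level_for_verbosity(verbosity: int) -> int:
    """Map a hypothetical numeric verbosity flag to a logging level."""
    if verbosity <= 0:
        return logging.WARNING
    if verbosity == 1:
        return logging.INFO
    return logging.DEBUG


def make_logger(name: str, verbosity: int) -> logging.Logger:
    """Return a named logger whose level follows the verbosity flag."""
    logger = logging.getLogger(name)
    logger.setLevel(level_for_verbosity(verbosity))
    return logger


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    log = make_logger("example-component", verbosity=2)
    errors = Counter(
        "example_reconcile_errors_total",
        "Total reconcile errors observed.",
        "component",
    )
    errors.inc("kube-apiserver")
    # Emitted only because verbosity=2 maps to DEBUG.
    log.debug("recorded reconcile error for %s", "kube-apiserver")
    print(errors.render(), end="")
```

Because counters only ever increase, `inc` rejects negative amounts; alert thresholds would normally be expressed over the rate of such a counter rather than its raw value.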
Documentation Considerations
Questions to be addressed:
- What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc.)?
- Does this feature have doc impact?
  - New Content, Updates to existing content, Release Note, or No Doc Impact
  - If unsure and no Technical Writer is available, please contact Content Strategy.
- What concepts do customers need to understand to be successful in [action]?
- How do we expect customers will use the feature? For what purpose(s)?
- What reference material might a customer want/need to complete [action]?
- Is there source material that can be used as a reference for the Technical Writer in writing the content? If yes, please link if available.
- What is the doc impact (New Content, Updates to existing content, or Release Note)?
Is blocked by:
- MON-2143 [B1] HyperShift Monitoring Feature Parity (Closed)
- OCPPLAN-8109 Setup Alert Configuration in RHOBS for HyperShift Monitoring (Closed)
- OCPPLAN-8110 Setup dashboards and Grafana against RHOBS (Closed)