Type: Feature
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: PM Power-monitoring
Labels:
- no_epic

Blocked: False
Blocked Reason: None
Ready: False
Color Status: Not Selected

Proposed title of this feature request

Bring the Kepler operator to level 4

Description

As explained in the operator SDK, level 4 is related to providing deep insights:

Set up full monitoring and alerting for your operand. All resources such as Prometheus rules (alerts) and Grafana dashboards should be created by the operator when the operand CR is instantiated. The RED method (rate, errors, duration) is a good place to start with knowing what metrics to expose. Aim to have as few alerts as possible, by alerting on symptoms that are associated with end-user pain rather than trying to catch every possible way that pain could be caused. Alerts should link to relevant consoles and make it easy to figure out which component is at fault.

Native k8s objects emit events (“Events” objects) for situations users or administrators should be alerted about. Your operator should do similar for state changes related to your operand. “Custom”, here, means that it should emit events specific to your operator/operand outside of the events already emitted by their deployment methodology. This, in conjunction with status descriptors for the CR conditions, gives much needed visibility into actions taken by your operator/operand. Operators are codified domain-specific knowledge. Your end user should not need this domain-specific knowledge to gain visibility into what’s happening with their resource.

Please ensure that you look at the Kubernetes API conventions in the Events and status sections to know how to properly deal with them.
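To make the "Prometheus rules created by the operator" requirement concrete, here is a minimal sketch of the manifest an operator could generate when the operand CR is instantiated. It is illustrative Python, not the operator's actual code. The alert name, the `job` label value, the `managed-by` value, and the runbook URL are all hypothetical; only the `PrometheusRule` layout (`monitoring.coreos.com/v1`, `spec.groups[].rules[]`) follows the Prometheus Operator CRD.

```python
import json


def build_prometheus_rule(operand_name: str, namespace: str) -> dict:
    """Return a PrometheusRule manifest (as a dict) for one operand instance.

    A single symptom-based alert is defined: the operand's scrape target
    has not been reported as up for 10 minutes.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": f"{operand_name}-alerts",
            "namespace": namespace,
            # Hypothetical value; the label key is the standard recommended one.
            "labels": {"app.kubernetes.io/managed-by": "kepler-operator"},
        },
        "spec": {
            "groups": [
                {
                    "name": f"{operand_name}.rules",
                    "rules": [
                        {
                            # Hypothetical alert name.
                            "alert": "OperandDown",
                            # Assumes the scrape job is named after the operand.
                            "expr": f'absent(up{{job="{operand_name}"}} == 1)',
                            "for": "10m",
                            "labels": {"severity": "critical"},
                            "annotations": {
                                "summary": f"{operand_name} is not reporting metrics",
                                # Placeholder URL standing in for a real SOP.
                                "runbook_url": "https://example.com/runbooks/operand-down",
                            },
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_prometheus_rule("kepler", "monitoring"), indent=2))
```

Keeping the rule set to one symptom-level alert follows the guidance above to alert on end-user pain rather than on every possible cause.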

List any affected packages or components.

Kepler
Kepler operator

Acceptance criteria

Monitoring

Operator exposes metrics about its own health
Operator exposes health and performance metrics about the Operand
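As a rough illustration of the first criterion, the sketch below tracks RED-style figures for a reconcile loop and renders them in the Prometheus text exposition format. The metric names (`kepler_operator_reconcile_*`) are hypothetical, and a real operator would normally use a Prometheus client library rather than formatting the text by hand.

```python
from dataclasses import dataclass


@dataclass
class ReconcileStats:
    """Accumulates rate, error, and duration figures for reconcile calls."""

    count: int = 0
    errors: int = 0
    duration_seconds_sum: float = 0.0

    def observe(self, duration_seconds: float, failed: bool = False) -> None:
        """Record one reconcile attempt."""
        self.count += 1
        if failed:
            self.errors += 1
        self.duration_seconds_sum += duration_seconds

    def render(self) -> str:
        """Render the stats in Prometheus text exposition format."""
        # Hypothetical metric names. The summary carries only _sum and
        # _count, which is enough to derive rate and mean duration.
        lines = [
            "# HELP kepler_operator_reconcile_errors_total Failed reconcile attempts.",
            "# TYPE kepler_operator_reconcile_errors_total counter",
            f"kepler_operator_reconcile_errors_total {self.errors}",
            "# HELP kepler_operator_reconcile_duration_seconds Reconcile duration.",
            "# TYPE kepler_operator_reconcile_duration_seconds summary",
            f"kepler_operator_reconcile_duration_seconds_sum {self.duration_seconds_sum}",
            f"kepler_operator_reconcile_duration_seconds_count {self.count}",
        ]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    stats = ReconcileStats()
    stats.observe(0.25)
    stats.observe(0.5, failed=True)
    print(stats.render(), end="")
```

Serving the output of `render()` on a `/metrics` HTTP path would give Prometheus a health endpoint to scrape.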

Alerting and Events

Operand sends useful alerts
Custom Resources emit custom events

Example: A database Operator continuously parses the logging output of the database software, recognizes noteworthy log events (e.g. running out of space for database files), and produces alerts. The operator also instruments the database and exposes application-level metrics, e.g. database queries per second.
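The database example can be sketched as a function that maps a noteworthy log line to a custom Kubernetes Event for the owning CR. This is an assumption-laden illustration: the log patterns, the reason strings, the `Database` kind, the `example.com/v1` API group, and the `database-operator` component name are all hypothetical. The field names follow the core `v1` Event object.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical log patterns mapped to (reason, event type).
NOTEWORTHY = [
    (re.compile(r"no space left on device", re.IGNORECASE), "StorageExhausted", "Warning"),
    (re.compile(r"checkpoint complete", re.IGNORECASE), "CheckpointCompleted", "Normal"),
]


def event_for_log_line(
    line: str, cr_name: str, namespace: str, now: Optional[datetime] = None
) -> Optional[dict]:
    """Return a core/v1 Event manifest for a noteworthy log line, else None."""
    for pattern, reason, event_type in NOTEWORTHY:
        if pattern.search(line):
            ts = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
            return {
                "apiVersion": "v1",
                "kind": "Event",
                "metadata": {"generateName": f"{cr_name}-", "namespace": namespace},
                # Hypothetical CR kind and API group.
                "involvedObject": {
                    "apiVersion": "example.com/v1",
                    "kind": "Database",
                    "name": cr_name,
                    "namespace": namespace,
                },
                "reason": reason,
                "message": line.strip(),
                "type": event_type,
                "firstTimestamp": ts,
                "lastTimestamp": ts,
                "count": 1,
                "source": {"component": "database-operator"},
            }
    return None


if __name__ == "__main__":
    print(event_for_log_line("write failed: No space left on device", "db1", "prod"))
```

Attaching the event to the CR through `involvedObject` is what lets a user see it when describing their own resource, without needing to read operand logs.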

Guiding questions to determine Operator reaching Level 4

Does your Operator expose a health metrics endpoint?

Does your Operator expose Operand alerts?

Do you have Standard Operating Procedures (SOPs) for each alert?

Does your Operator create critical alerts when the service is down and warning alerts for all other alerts?

Does your Operator watch the Operand to create alerts?

Does your Operator emit custom Kubernetes events?

Does your Operator expose Operand performance metrics?
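Two of the questions above (an SOP for each alert, and critical severity only for service-down alerts) can be checked mechanically. The following is a minimal sketch of such a check over Prometheus alerting rules represented as dicts. The function name, the alert names in the example, and the use of a `runbook_url` annotation as the SOP link are assumptions, not something the feature request prescribes.

```python
from typing import Dict, List, Set


def lint_alert_rules(rules: List[Dict], down_alerts: Set[str]) -> List[str]:
    """Return a list of problems found in the given alerting rules.

    Alerts named in `down_alerts` must be critical; every other alert must
    be a warning. Every alert must link an SOP via `runbook_url`.
    """
    problems = []
    for rule in rules:
        name = rule.get("alert", "<unnamed>")
        severity = rule.get("labels", {}).get("severity")
        expected = "critical" if name in down_alerts else "warning"
        if severity != expected:
            problems.append(f"{name}: severity is {severity!r}, expected {expected!r}")
        if not rule.get("annotations", {}).get("runbook_url"):
            problems.append(f"{name}: missing runbook_url annotation")
    return problems


if __name__ == "__main__":
    # Hypothetical alert names and placeholder URL.
    example_rules = [
        {
            "alert": "OperandDown",
            "labels": {"severity": "critical"},
            "annotations": {"runbook_url": "https://example.com/runbooks/operand-down"},
        },
        {"alert": "OperandSlowScrape", "labels": {"severity": "critical"}},
    ]
    for problem in lint_alert_rules(example_rules, down_alerts={"OperandDown"}):
        print(problem)
```

A check like this could run in CI against the rules the operator ships, so that the severity convention and SOP coverage do not drift as alerts are added.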

Assignee: Roger Florén

Reporter: Roger Florén

Votes: 0

Watchers: 1

Created: 2023/08/28 9:39 AM

Updated: 2024/08/01 7:53 PM
