Observability and Data Analysis Program
OBSDA-507

Bring kepler operator to level 3


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Normal
    • PM Power-monitoring
    • 100% To Do, 0% In Progress, 0% Done

      Proposed title of this feature request

      Bring kepler operator to level 3

      Description

      As explained in the operator SDK, level 3 is related to providing full lifecycle support:

      It should be possible to back up and restore the operand from the operator itself, without any manual intervention beyond triggering these operations. The operand data to back up is any stateful data managed by the operand; you don't need to back up the CR itself or the k8s resources created by the operator, as the operator should return all resources to the same state if the CR is recreated.

      If your operator does not already set up the operand with other k8s resilience best practices, this should be completed to achieve this capability level. These include liveness and readiness probes, multiple replicas, rolling deployment strategies, pod disruption budgets, and CPU and memory requests and limits.

      List any affected packages or components.

      • Kepler
      • Kepler operator

      Acceptance criteria

      • Operator provides the ability to create backups of the Operand
      • Operator is able to restore a backup of an Operand
      • Operator orchestrates complex re-configuration flows on the Operand
      • Operator implements fail-over and fail-back of clustered Operands
      • Operator supports adding/removing members of a clustered Operand
      • Operator enables application-aware scaling of the Operand

      Guiding questions to determine Operator reaching Level 3

      • Does your Operator support backing up the Operand?
      • Does your Operator support restoring an Operand from a backup and bringing it back under management?
      • Does your Operator wait for reconfiguration work to be finished and in the expected sequence?
      • Is your Operator taking cluster quorum into account, if present?
      • Does your Operator allow adding/removing read-only replica instances of your Operand?
      • Does your operand have a Liveness probe?
      • Does your operand have a Readiness probe that fails if any aspect of the operand is not ready, e.g., if the connection to the database fails?
      • Does your operand use a rolling deployment strategy?
      • Does your operator create a PodDisruptionBudget resource for your operand pods?
      • Does your operand have CPU and memory requests and limits set?
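
      The resilience practices asked about above (probes, a rolling update strategy, a PodDisruptionBudget, and resource requests and limits) can be sketched as the kind of Kubernetes manifests the operator would generate for the operand. This is a minimal illustration only, not the kepler operator's actual output; the names, image, port, and probe paths below are all assumptions.

      ```yaml
      # Hypothetical Deployment the operator might generate for the operand.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: kepler-exporter            # assumed operand name
      spec:
        replicas: 2                      # multiple replicas
        strategy:
          type: RollingUpdate            # rolling deployment strategy
          rollingUpdate:
            maxUnavailable: 1
        selector:
          matchLabels:
            app: kepler-exporter
        template:
          metadata:
            labels:
              app: kepler-exporter
          spec:
            containers:
              - name: kepler
                image: example.io/kepler:latest   # placeholder image
                ports:
                  - containerPort: 9102           # assumed metrics port
                livenessProbe:                    # liveness probe
                  httpGet:
                    path: /healthz                # assumed health endpoint
                    port: 9102
                readinessProbe:                   # readiness probe
                  httpGet:
                    path: /metrics                # assumed readiness endpoint
                    port: 9102
                resources:                        # CPU and memory requests/limits
                  requests:
                    cpu: 100m
                    memory: 200Mi
                  limits:
                    cpu: 500m
                    memory: 512Mi
      ---
      # PodDisruptionBudget keeping at least one replica up during voluntary disruptions.
      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: kepler-exporter-pdb
      spec:
        minAvailable: 1
        selector:
          matchLabels:
            app: kepler-exporter
      ```

      Each manifest maps directly to one of the guiding questions; an operator at level 3 would reconcile these resources itself rather than requiring them to be applied by hand.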

              rh-ee-rfloren Roger Florén