CNV-21957

[spike] Plan the monitoring code refactoring


      Create the design, split the effort into the action items it requires, and provide an effort estimate for each one.
    • CNV-8094 - CNV Observability
    • CNV I/U Operators Sprint 229, CNV I/U Operators Sprint 230

      The KubeVirt metrics code is currently embedded in the heart of the operator code.
      This hurts code readability, increases code complexity, and makes the code harder to maintain.

      There are 2 suggestions for improving the monitoring code in KubeVirt:
      1. KubeVirt monitoring to be done externally
      This story is about redesigning our monitoring components so that they are developed and deployed externally to KubeVirt. In other words, the monitoring components would be developed in a different repository and deployed separately (similarly to CDI, for example).

      This has many advantages, for example:

      • Enhanced development speed
      • Decoupling monitoring code from operator code
      • Enhanced security: monitor publicly available data only
      • Becoming more modular and generic - resilient to future changes

      Note that this approach is generic: it could be applied to any operator, not only to KubeVirt. In addition, it can be used to export data not only to Prometheus, but to other tools as well.

      For more info, motivation, goals and architecture design, please look at:

      https://github.com/kubevirt/community/pull/189

      (At the time of writing, the design proposal is still a draft. Many changes are expected. Feedback is much appreciated.)

      2. Create a monitoring directory in each operator repository that holds all the monitoring logic (metrics, alerts, runbooks).

      More details can be found here: https://docs.google.com/document/d/1L2lcri3SogFhjaIutbVdvnSkFNMrXeBQ7L0Zs_izJzM/edit?usp=sharing
      An example is here: https://github.com/operator-framework/operator-sdk/pull/5996
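
      As a rough illustration of the second suggestion, the sketch below shows how all metric definitions could live in one place, so operator code only calls into it. It is not taken from the linked document or pull request: the Registry type, its methods, and the metric name are hypothetical, and it uses only the Go standard library where a real implementation would more likely wrap the Prometheus client.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// MetricOpts describes one metric. The type and its fields are hypothetical.
type MetricOpts struct {
	Name string
	Help string
}

// Registry keeps every metric definition and counter value in one place,
// standing in for what a dedicated monitoring directory could own.
type Registry struct {
	mu       sync.Mutex
	opts     map[string]MetricOpts
	counters map[string]float64
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{
		opts:     make(map[string]MetricOpts),
		counters: make(map[string]float64),
	}
}

// Register adds a metric definition and rejects duplicate names.
func (r *Registry) Register(o MetricOpts) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.opts[o.Name]; exists {
		return fmt.Errorf("metric %q is already registered", o.Name)
	}
	r.opts[o.Name] = o
	return nil
}

// Inc increments a registered counter; unknown names are an error.
func (r *Registry) Inc(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.opts[name]; !ok {
		return fmt.Errorf("metric %q is not registered", name)
	}
	r.counters[name]++
	return nil
}

// Value returns the current value of a counter (0 if never incremented).
func (r *Registry) Value(name string) float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.counters[name]
}

// Names returns the sorted list of registered metric names.
func (r *Registry) Names() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.opts))
	for n := range r.opts {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	reg := NewRegistry()
	// Hypothetical metric name, used for illustration only.
	const reconcileTotal = "example_operator_reconcile_total"
	if err := reg.Register(MetricOpts{Name: reconcileTotal, Help: "Reconcile loops run."}); err != nil {
		fmt.Println("register:", err)
		return
	}
	if err := reg.Inc(reconcileTotal); err != nil {
		fmt.Println("inc:", err)
		return
	}
	fmt.Println(reg.Names(), reg.Value(reconcileTotal))
}
```

      With this shape, the operator's core code never declares metrics itself; it only calls Inc on names that the monitoring code registered, so adding or renaming a metric is a change in one place.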

      In this spike we need to determine which of the implementations is better for KubeVirt and whether it can also be considered a best practice for other operators that want to have monitoring.

      Things to consider during the evaluation of the different alternatives:
      1. Be able to have all the labels that are taken from the environment, such as namespace, pod, container, instance, job, and endpoint.
      2. Be able to collect and report both metrics driven by changes to resources in the environment and metrics that must be collected periodically, such as CPU and memory.
      3. Have a way to catch breaking changes in the operators' core code so that they don't cause bugs on the monitoring side.
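
      For the third consideration, one possible approach (an assumption, not something either proposal prescribes) is to keep a baseline list of metric names and fail a check whenever a name disappears. The sketch below shows the comparison only; RemovedMetrics and the metric names are hypothetical, and in practice the baseline would be a file checked into the repository.

```go
package main

import (
	"fmt"
	"sort"
)

// RemovedMetrics returns the names that are present in baseline but missing
// from current, sorted alphabetically. A renamed metric shows up here as a
// removal of its old name.
func RemovedMetrics(baseline, current []string) []string {
	have := make(map[string]struct{}, len(current))
	for _, n := range current {
		have[n] = struct{}{}
	}
	var removed []string
	for _, n := range baseline {
		if _, ok := have[n]; !ok {
			removed = append(removed, n)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	// Hypothetical metric names, used for illustration only.
	baseline := []string{"example_vmi_phase_count", "example_operator_reconcile_total"}
	current := []string{"example_operator_reconcile_total"}

	if removed := RemovedMetrics(baseline, current); len(removed) > 0 {
		fmt.Println("metrics removed or renamed:", removed)
	}
}
```

      Run as part of CI, a check like this would flag a core-code change that drops or renames a metric before alerts and dashboards that depend on it break. It does not detect changes to labels or metric semantics.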

            sradco Shirly Radco
            iholder@redhat.com Itamar Holder