OpenShift Monitoring / MON-2277

Onboard CNV-QE OpenShift Cluster Metrics Onto Observatorium

    • Type: Task
    • Resolution: Done
    • Priority: Normal
    • Observatorium

      Overview:

      • The OpenShift Container Native Virtualization (CNV) QE team frequently deploys clusters to run tests, validate bug fixes, and more.
      • OpenShift Container Platform includes a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. ([1])
      • Currently, we do not persist any of the data collected by Prometheus, so the data's life cycle is tied to the life cycle of the Prometheus pod.
      • We would like to start persisting the cluster data collected by Prometheus.
      • We would also like to take the current default stack and scale it horizontally with a centralized Prometheus cluster.

      Why do we want to persist metrics?

      • The life cycle of Prometheus data is tied to the pod's life cycle.
      • We should be able to debug failures and events on clusters post-mortem.

      Why do we need a centralized cluster?

      • Phase #1: We would like a dashboard where we can watch and analyze metrics data from our clusters.
      • A one-stop source for all metrics related to running clusters.

      Requirements:

      1. What resources need to be collected?
        Metrics from Prometheus
      2. What volume of data will be collected?
        rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[4d]) = 2723.675487
        rate(prometheus_tsdb_compaction_chunk_samples_sum[4d]) = 2819.718854
        rate(prometheus_tsdb_head_samples_appended_total[4d]) = 3064.62018
      3. What is the expected or forecasted growth in collected data?
        10% growth in data per month
      4. How many users will require read access?
        The entire OpenShift Virtualization (CNV) department.
      5. How many clients will be emitting data?
        1 Prometheus server per cluster.
        For the PoC we are planning to write metrics from 15 clusters.
      6. How long should data be retained?
        1 month
      7. How many independent service accounts do you need to write data into the platform?
        1 service account will be used for all clusters hosted on all platforms (AWS/Azure/IBMC/PSI/other)
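
      The figures above can be combined into a rough storage estimate. The sketch below applies the sizing rule from the Prometheus storage documentation (retention time × ingested samples per second × bytes per sample). It rests on assumptions the ticket does not state: that the three rates are per-second averages for a single cluster, that "1 Month" means 30 days, and that the 10% growth compounds monthly. The function names are illustrative only, and the storage behind Observatorium may size differently from a local Prometheus TSDB.

```python
# Rough capacity estimate from the figures reported in the requirements.
# Assumptions (NOT stated in the ticket): the three rates describe a single
# cluster, a month is 30 days, and the 10% growth compounds monthly.

SECONDS_PER_DAY = 86_400

# Values reported in the ticket (PromQL rate() over a 4-day window).
chunk_bytes_per_sec = 2723.675487      # prometheus_tsdb_compaction_chunk_size_bytes_sum
chunk_samples_per_sec = 2819.718854    # prometheus_tsdb_compaction_chunk_samples_sum
ingested_samples_per_sec = 3064.62018  # prometheus_tsdb_head_samples_appended_total

CLUSTERS = 15          # PoC size from the ticket
RETENTION_DAYS = 30    # "1 Month", assumed to be 30 days
MONTHLY_GROWTH = 0.10  # 10% per month from the ticket


def bytes_per_sample(chunk_bytes: float, chunk_samples: float) -> float:
    """Average compressed size of one sample, in bytes."""
    return chunk_bytes / chunk_samples


def retained_bytes(samples_per_sec: float, bps: float,
                   retention_days: int, clusters: int = 1) -> float:
    """Bytes held at steady state: retention x ingest rate x sample size."""
    return samples_per_sec * bps * retention_days * SECONDS_PER_DAY * clusters


def grown(value: float, monthly_growth: float, months: int) -> float:
    """Value after `months` of compounded monthly growth."""
    return value * (1 + monthly_growth) ** months


if __name__ == "__main__":
    bps = bytes_per_sample(chunk_bytes_per_sec, chunk_samples_per_sec)
    per_cluster = retained_bytes(ingested_samples_per_sec, bps, RETENTION_DAYS)
    fleet = retained_bytes(ingested_samples_per_sec, bps, RETENTION_DAYS, CLUSTERS)
    print(f"bytes per sample:        {bps:.3f}")
    print(f"per cluster, 30 days:    {per_cluster / 1e9:.2f} GB")
    print(f"{CLUSTERS} clusters, 30 days:    {fleet / 1e9:.2f} GB")
    print(f"fleet after 12 months:   {grown(fleet, MONTHLY_GROWTH, 12) / 1e9:.2f} GB")
```

      Under these assumptions the estimate comes to roughly 7.7 GB per cluster and about 115 GB for the 15 PoC clusters over a 30-day retention window, before growth. Treat it as an order-of-magnitude figure rather than a measured requirement.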

      Request Checklist:

      [V] Description of your service
      [V] Explanation of why you would benefit from sending data to Observatorium
      [V] All of the answers to the collected requirements from section 1
      [V] The organization ID of the new customer portal account from section 2
      `260Z0RxxPFealmmU9ApvsvOZaps`
      [V] A link to the Service Now ticket from section 3
      [V] The email address specified in the service account in section 3
      `cnv-qe-devops@redhat.com`
      [V] A link to the merge request from section 4
      https://gitlab.cee.redhat.com/service/ocm-resources/-/merge_requests/2073

            prekumar@redhat.com Prem Saraswat (Inactive)
            rhn-support-mocohen Mor Cohen (Inactive)