OpenShift Monitoring / MON-2384

Double scrape_interval for CMO controlled ServiceMonitors for SNO


    • SNO scrape intervals

      Epic Goal

      • Offer the option to double the scrape intervals for CMO-controlled ServiceMonitors in single-node (SNO) deployments
      • Alternatively, automatically double the same scrape intervals when CMO detects an SNO setup

      The potential target ServiceMonitors are:

      • kubelet
      • kube-state-metrics
      • node-exporter
      • etcd
      • openshift-state-metrics
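The epic leaves the mechanism open. A minimal Go sketch of the automatic variant, assuming CMO treats a cluster as SNO when the `config.openshift.io/v1` Infrastructure resource reports `status.infrastructureTopology` as `SingleReplica`. The `ServiceMonitor` struct, `snoTargets`, and `adjustForTopology` are hypothetical stand-ins for illustration, not CMO or prometheus-operator API.

```go
package main

import (
	"fmt"
	"time"
)

// ServiceMonitor is a hypothetical, trimmed-down stand-in for the
// prometheus-operator ServiceMonitor, keeping only the fields used here.
type ServiceMonitor struct {
	Name     string
	Interval string // Prometheus-style duration string, e.g. "30s"
}

// snoTargets holds the candidate ServiceMonitors listed in this epic.
var snoTargets = map[string]bool{
	"kubelet":                 true,
	"kube-state-metrics":      true,
	"node-exporter":           true,
	"etcd":                    true,
	"openshift-state-metrics": true,
}

// adjustForTopology returns a copy of sms in which every targeted
// ServiceMonitor has its scrape interval doubled, but only when the
// (assumed) infrastructure topology value is "SingleReplica".
func adjustForTopology(sms []ServiceMonitor, infraTopology string) ([]ServiceMonitor, error) {
	if infraTopology != "SingleReplica" {
		return sms, nil // not SNO: leave intervals untouched
	}
	out := make([]ServiceMonitor, len(sms))
	for i, sm := range sms {
		out[i] = sm
		if !snoTargets[sm.Name] {
			continue // not in the target list, keep the default interval
		}
		// time.ParseDuration handles "30s"/"1m" but not Prometheus-only
		// units such as "1d"; that is acceptable for this sketch.
		d, err := time.ParseDuration(sm.Interval)
		if err != nil {
			return nil, fmt.Errorf("servicemonitor %s: %w", sm.Name, err)
		}
		// Emit whole seconds, e.g. "60s"; sub-second parts are truncated.
		out[i].Interval = fmt.Sprintf("%ds", int64((2*d)/time.Second))
	}
	return out, nil
}

func main() {
	sms := []ServiceMonitor{{"kubelet", "30s"}, {"alertmanager-main", "30s"}}
	adjusted, err := adjustForTopology(sms, "SingleReplica")
	if err != nil {
		panic(err)
	}
	for _, sm := range adjusted {
		fmt.Println(sm.Name, sm.Interval)
	}
}
```

Keeping the target list explicit means ServiceMonitors outside it keep their default interval, which matches the epic's scoping to a small set of high-volume targets.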

      Why is this important?

      • Reduce CPU usage in SNO setups
      • Doubling the scrape interval specifically (rather than some other factor) is important because:
      1. We are confident that this has the least chance of interfering with existing rules. Our rules typically use rate queries over a 2 minute window, never a shorter one. With the current default 30 second scrape interval, every 2 minute window contains 4 samples: rate needs at least 2 samples to work, and the other 2 give us tolerance for failed scrapes. With a doubled (60 second) interval, most 2 minute windows still contain 2 samples; if a scrape fails, a few rule evaluations might fail intermittently.
      2. We expect a measurable reduction in CPU usage (see Previous Work)

      Scenarios

      1. RAN deployments (Telco Edge) are SNO deployments. In these setups, a full CMO deployment is often unnecessary, and the default setup consumes too many resources. OpenShift as a whole has very limited CPU cycles available, and too many of them are spent on monitoring.

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Previous Work (Optional):

      1. https://issues.redhat.com/browse/MON-1569

      Open questions:

      1. Does doubling some scrape intervals reduce CPU usage enough to fit within the assigned budget?

      Non goals

      • Allowing arbitrarily long scrape intervals. This would interfere with alerting and recording rules.
      • Implementing a global override for scrape intervals.

            jfajersk@redhat.com Jan Fajerski
            Tai Gao