OpenShift Monitoring / MON-1985

Allow admin users to create new alerting rules based on platform metrics

    • NEW
    • To Do
    • OBSDA-2 - Allow administrators to create new alerting rules based on platform-defined metrics
    • Impediment
    • NEW
    • 0% To Do, 0% In Progress, 100% Done
      ==== New option to create alerts based on core platform metrics

      With this release, administrators can create new alerting rules based on core platform metrics.
      You can now modify settings for existing platform alerting rules by adjusting thresholds and by changing labels.
      You can also define and add new custom alerting rules by constructing a query expression based on core platform metrics in the openshift-monitoring namespace.
      For more information, see xref:../monitoring/managing-alerts.adoc#managing-core-platform-alerting-rules_managing-alerts[Managing alerting rules for core platform monitoring].
    • Enhancement
    • Done

      Epic Goal

      • Allow admin users to create new alerting rules that target metrics in any namespace
      • Allow cloning of existing rules to simplify rule creation
      • Allow creation of silences for existing alerting rules

      Why is this important?

      • Currently, platform-related metrics (those exposed in an openshift-*, kube-*, or default namespace) cannot be used to form a new alerting rule. That makes it very difficult for administrators to enrich the out-of-the-box experience of OpenShift Container Platform with new rules that are specific to their environments.
      • Additionally, customers have asked to modify our existing, out-of-the-box alerting rules (for instance, tweaking the alert expression or changing the severity label). Unfortunately, that is not easy: most rules come from several open source projects or from other OpenShift components, and any in-place modification would mean upgrades are no longer seamless. Imagine K8s changes metrics again (see 1.14) and we have to update our rules. We would not know what modifications had been made (even a changed threshold is hard to reconcile if upstream changes it as well), and we would not be able to upgrade those rules.

      Scenarios

      • I'd like to modify the query expression of an existing rule (because the threshold value doesn't match my environment).

      Cloning the existing rule should produce a new rule in the same namespace.
      Modifications can then be made to the new rule.
      (Optional) You can silence the existing rule.
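
The clone-then-modify flow in this scenario could look roughly like the Python sketch below. It treats an alerting rule as a plain dictionary in the shape used inside a PrometheusRule group. The helper `clone_rule`, the alert names, and the metric `example_platform_metric` are hypothetical and exist only for this illustration. Silencing the original rule is a separate step and is not shown.

```python
import copy


def clone_rule(rule, new_alert, new_expr=None, extra_labels=None):
    """Return a modified deep copy of an alerting rule.

    The original rule dictionary is left untouched, so the platform
    rule can still be upgraded independently of the clone.
    """
    cloned = copy.deepcopy(rule)
    cloned["alert"] = new_alert  # the clone needs its own alert name
    if new_expr is not None:
        cloned["expr"] = new_expr  # e.g. the same query with a different threshold
    if extra_labels:
        cloned.setdefault("labels", {}).update(extra_labels)
    return cloned


# Hypothetical existing platform rule (names and metric are made up).
original = {
    "alert": "ExamplePlatformAlert",
    "expr": "example_platform_metric > 0.8",
    "for": "10m",
    "labels": {"severity": "warning"},
}

# Clone it with a threshold and severity that suit this environment.
custom = clone_rule(
    original,
    new_alert="ExamplePlatformAlertCustom",
    new_expr="example_platform_metric > 0.9",
    extra_labels={"severity": "critical"},
)

print(custom)
```

Working on a deep copy keeps the user's changes out of the operator-managed rule, which is the concern raised under "Why is this important?".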

      • I'd like to create a new rule based on a metric only available to an openshift-* namespace

      Create a new PrometheusRule object inside the namespace that includes the metrics you need to form the alerting rule.
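
As a rough illustration of this scenario, the following Python sketch assembles a PrometheusRule manifest as a plain dictionary and prints it as JSON, which could then be piped to `oc apply -f -`. The helper `build_prometheus_rule`, the object name, the `openshift-etcd` namespace, the alert name, and the 0.5-second threshold are all assumptions chosen for the example. Only the `monitoring.coreos.com/v1` PrometheusRule layout is taken from prometheus-operator.

```python
import json


def build_prometheus_rule(name, namespace, alert, expr, duration, severity, summary):
    """Return a PrometheusRule manifest as a plain dictionary."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,
                            "for": duration,  # how long expr must hold before firing
                            "labels": {"severity": severity},
                            "annotations": {"summary": summary},
                        }
                    ],
                }
            ]
        },
    }


# All values below are illustrative assumptions, not recommended settings.
manifest = build_prometheus_rule(
    name="example-etcd-rules",
    namespace="openshift-etcd",
    alert="ExampleEtcdSlowWalFsync",
    expr=(
        "histogram_quantile(0.99, sum by (le, instance) "
        "(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))) > 0.5"
    ),
    duration="10m",
    severity="warning",
    summary="etcd WAL fsync latency is high on {{ $labels.instance }}.",
)

# JSON is accepted wherever a YAML manifest is, so this can be applied directly.
print(json.dumps(manifest, indent=2))
```

The object is created in the namespace that exposes the metric, matching the scenario text above.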

      • I'd like to update the label of an existing rule.

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • Ability to distinguish between rules deployed by us (CMO) and user-created rules

      Dependencies (internal and external)

      Previous Work (Optional):

      Open questions:

      1. Distinguish between operator-created rules and user-created rules
        Currently no such mechanism exists. This will need to be added to prometheus-operator or cluster-monitoring-operator.

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            Jan Fajerski (jfajersk@redhat.com)
            Junqi Zhao