Cloud Infrastructure Security & Compliance / CMP-1441

Compliance Operator Alerting & Healing




      Epic Goal

      Enhance the compliance operator to expose metrics about its health. When the operator cannot function normally, it should provide detailed alerts.

      Why is this important?

      The Compliance Operator on all 4 FedRAMP hive clusters has been in a failing state since 4/10. Investigation found that its pods were failing because of an expired certificate. This appears to have been caused by a compliancesuite "stuck" in the RUNNING phase without ever completing it. Manually deleting the stuck compliancesuite returns all pods and the operator to a healthy state.
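
      The failure mode above suggests one health signal the operator could report: a suite that stays in the RUNNING phase longer than expected. The following is only a minimal sketch of that check, not the operator's implementation. The `SuiteStatus` record, its `phase_started` field, and the six-hour threshold are all assumptions made for illustration; the phase name RUNNING is the only detail taken from the incident described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SuiteStatus:
    """Hypothetical snapshot of one compliancesuite (not a real API type)."""
    name: str
    phase: str
    phase_started: datetime  # assumed: when the suite entered its current phase


def find_stuck_suites(suites, now, max_running=timedelta(hours=6)):
    """Return names of suites that have been RUNNING longer than max_running.

    The 6-hour default is an arbitrary placeholder, not a recommended value.
    """
    return [
        s.name
        for s in suites
        if s.phase == "RUNNING" and now - s.phase_started > max_running
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    suites = [
        SuiteStatus("healthy-suite", "DONE", now - timedelta(days=2)),
        SuiteStatus("recent-suite", "RUNNING", now - timedelta(minutes=20)),
        SuiteStatus("stuck-suite", "RUNNING", now - timedelta(days=3)),
    ]
    print(find_stuck_suites(suites, now))
```

      A count of such suites could be published as a metric and used as an alert condition, so that the stuck state is noticed before an administrator has to find and delete the suite by hand.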

      Scenarios

      1. As a Kubernetes/OpenShift administrator, I need to know when the compliance operator isn't functioning properly or is unable to perform scans (e.g., through a PagerDuty integration)

      Acceptance Criteria

      • The compliance operator must provide, or integrate with, a notification system
      • Detailed documentation must describe how to configure and integrate the compliance operator for alerting
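
      To make the PagerDuty scenario concrete, the sketch below builds a trigger event in the shape used by the PagerDuty Events API v2 (`routing_key`, `event_action`, `dedup_key`, and a `payload` with `summary`, `source`, and `severity`). It is one possible integration shape, not a decision about how the operator will alert. The function name, the summary wording, the dedup key format, and the example cluster and suite names are hypothetical, and the routing key is a placeholder. The block only constructs the event body and does not send it.

```python
import json

# Severity values accepted by the PagerDuty Events API v2 payload.
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_pagerduty_event(routing_key, cluster, suite_name, severity="critical"):
    """Build an Events API v2 trigger body for a suite that is not completing.

    Summary text and dedup key format are illustrative assumptions.
    """
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Same key for repeat alerts, so one stuck suite maps to one incident.
        "dedup_key": f"compliance-operator/{cluster}/{suite_name}",
        "payload": {
            "summary": f"Compliance Operator suite {suite_name} on {cluster} is not completing scans",
            "source": cluster,
            "severity": severity,
        },
    }


if __name__ == "__main__":
    event = build_pagerduty_event("ROUTING_KEY_PLACEHOLDER", "example-cluster", "example-suite")
    print(json.dumps(event, indent=2))
```

      Whichever notification system is chosen, the documentation required above would need to cover the equivalent of these inputs: where alerts are routed, how they are deduplicated, and which severity they carry.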

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            Assignee: Unassigned
            Reporter: Lance Bragstad (lbragsta@redhat.com)