  1. Red Hat Advanced Cluster Management
  2. ACM-19978

Develop an AI-powered OSDFM bot to monitor and analyze all OSDFM metrics and alerts


    • Type: Story
    • Resolution: Unresolved
    • Priority: Normal
    • Fleet Manager
    • Quality / Stability / Reliability
    • False
    • False
    • Moderate

      OSDFM is a service team whose top priorities are service reliability and availability.

      Currently, OSDFM team members are responsible for monitoring two main categories of information on a daily basis:

      1. Production Environment Monitoring: We track the status of every region in the production environment using several metrics. These include:
        • The status of all Service Clusters (SCs) in each region
        • The status of all Management Clusters (MCs) under each SC
        • The state of HCPs on each MC, SC, and region, including metrics such as HCP count and installation time SLO
      2. Pipeline Alert Monitoring (Integration, Stage, and Production): OSDFM manages multiple pipelines used for deployment to the Stage and Production environments and for testing in the Integration and Stage environments. Any failure in these pipelines triggers an alert that requires manual investigation by the OSDFM team.

      To reduce the operational effort on the team, we plan to develop an AI-powered bot with the following functionality:

      1. Slack Integration: All metric anomalies and pipeline alerts will be forwarded to the #osd-fm-alerts Slack channel.
      2. AI Monitoring & Summarization: The bot will continuously monitor alerts in #osd-fm-alerts, analyze and summarize them, and post a single consolidated message in the #wg-osd-fleet-manager channel.

      This will help the team stay informed without being overwhelmed by alert noise, while maintaining visibility and rapid response for critical issues.
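
      The consolidation step in item 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the Alert fields, the severity levels, and the output format are hypothetical and not defined in this ticket, and the sketch uses plain deterministic grouping rather than an AI model. Reading from #osd-fm-alerts and posting to #wg-osd-fleet-manager are left out entirely.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity ordering; the ticket does not define severity levels.
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


@dataclass(frozen=True)
class Alert:
    """One alert read from the alerts channel (all fields are assumptions)."""
    environment: str  # e.g. "integration", "stage", "production"
    source: str       # e.g. a pipeline name or a metric name
    severity: str     # expected to be a key of SEVERITY_ORDER
    text: str         # raw alert text, unused by this simple grouping


def summarize(alerts: list[Alert]) -> str:
    """Collapse many alerts into one consolidated message."""
    if not alerts:
        return "No new alerts."

    # Repeated alerts from the same source in the same environment
    # become a single line with a count.
    groups = Counter((a.severity, a.environment, a.source) for a in alerts)

    # Most severe first; unknown severities sort last.
    def sort_key(item):
        (severity, environment, source), _count = item
        rank = SEVERITY_ORDER.get(severity, len(SEVERITY_ORDER))
        return (rank, environment, source)

    lines = [f"{len(alerts)} alerts, {len(groups)} distinct:"]
    for (severity, environment, source), count in sorted(groups.items(), key=sort_key):
        lines.append(f"[{severity}] {environment}/{source} x{count}")
    return "\n".join(lines)
```

      In a real bot, the grouped lines might be passed to a language model for a prose summary instead of being posted as-is; grouping first keeps the input small and ensures critical items are listed ahead of the rest.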

              Unassigned
              Chunxi Luo (chuluo@redhat.com)
              Eveline Cai