  1. OpenStack Strategy
  2. RHOSSTRAT-1068

Default Alerting for Network Observability


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Major
    • rhos-18.0.17 FR 5
    • NFV
    • rhos-connectivity-nfv

      1. Proposed title of this feature request

      Default Alerting for Network Observability

      2. What is the nature and description of the request?

      The openstack-network-exporter already collects metrics, and some of them always indicate a problem when non-zero, for example buffer overflows and dropped packets. We should ship a base set of alerts that trigger when these metrics are non-zero. This covers both the kernel data path and the userspace data path.
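
The "alert when non-zero" rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the metric names are hypothetical placeholders and not the names actually exposed by openstack-network-exporter, and the `Alert` type and `evaluate` function are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical metric names, for illustration only; the real names exposed by
# openstack-network-exporter may differ.
ALWAYS_BAD_METRICS = {
    "example_rx_dropped_packets": "packets dropped on receive",
    "example_tx_dropped_packets": "packets dropped on transmit",
    "example_buffer_overflows": "buffer overflow",
}


@dataclass(frozen=True)
class Alert:
    metric: str
    value: float
    summary: str


def evaluate(samples: dict[str, float]) -> list[Alert]:
    """Return one alert per always-bad metric whose sampled value is non-zero."""
    alerts = []
    for metric, summary in ALWAYS_BAD_METRICS.items():
        # A metric missing from the sample set is treated as zero (no alert).
        value = samples.get(metric, 0.0)
        if value != 0:
            alerts.append(Alert(metric, value, summary))
    return alerts
```

Metrics outside the always-bad set are ignored, so the same sample set can carry ordinary throughput counters without triggering anything. A real deployment would more likely express this as alerting rules evaluated by the monitoring stack rather than as application code.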

      3. Why does the customer need this? (List the business requirements here)

      Metrics observability is valuable on its own, but alerting customers to metric patterns that are known to be bad shows that we understand those patterns and take a holistic approach to observability design. Providing these alerts by default, with no extra effort from the customer, could be a significant value add.

      4. List any affected packages or components.

      openstack-network-exporter

      thanos

       

      Acceptance/Done Criteria:

      This feature automatically provides operators (e.g., Site Reliability Engineers) with essential OpenStack insight from the environment.

      1. Enablement and Provisioning
        1. Provide a way for the customer to enable or disable these alerts
      2. Console Visibility and Integration
        1. Determine where to push these alerts and how to integrate with or reach those external tools
      3. Core Alert Functionality (control plane only for now)
        1. Service health of OpenStack networking services such as ovn-controller (down, unresponsive, etc.)
        2. Interface Flapping
      4. Configuration and Documentation
        1. Documentation explaining how to configure and use these alerts
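
As a rough illustration of the "Interface Flapping" and enable/disable criteria above, one possible approach is to count link-state transitions inside a sliding time window. This is a sketch, not the intended implementation: the `FlapDetector` class, its thresholds, and the `enabled` switch are all assumptions made for this example.

```python
from __future__ import annotations

from collections import deque


class FlapDetector:
    """Hypothetical detector: fires when an interface changes state too often."""

    def __init__(self, max_transitions: int = 3,
                 window_seconds: float = 60.0, enabled: bool = True):
        self.max_transitions = max_transitions  # assumed threshold
        self.window_seconds = window_seconds    # assumed window length
        self.enabled = enabled                  # operator-facing on/off switch
        self._last_state: bool | None = None
        self._transitions: deque[float] = deque()

    def observe(self, timestamp: float, is_up: bool) -> bool:
        """Record one link-state sample; return True if the alert should fire."""
        # A transition is any change relative to the previous sample.
        if self._last_state is not None and is_up != self._last_state:
            self._transitions.append(timestamp)
        self._last_state = is_up
        # Forget transitions that have aged out of the window.
        while (self._transitions
               and timestamp - self._transitions[0] > self.window_seconds):
            self._transitions.popleft()
        return self.enabled and len(self._transitions) >= self.max_transitions
```

With the default thresholds, an interface that goes up, down, up, down within 30 seconds triggers on the third transition, while a disabled detector keeps tracking state but never fires.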

      Definition of Done:

      1) Document all limitations encountered while implementing and testing this feature
      2) Document a "config guide" for this feature (along with the topology, if possible)
      3) All QE tests are automated, executed, and passing in CI
      4) Enablement for Support

      References:

      Alerts in Specification

              hakhande Haresh Khandelwal
              rh-ee-gurpsing Gurpreet Singh
              Gurpreet Singh Gurpreet Singh
              Edu Alcaniz Edu Alcaniz
              rhos-dfg-nfv