RHIDP-3596 (Red Hat Internal Developer Platform)

Expose metrics for critical functionality

    • Feature
    • Resolution: Unresolved
    • Undefined
    • 1.4
    • 1.3
    • Core platform
    • 80% To Do, 20% In Progress, 0% Done

      Feature Overview (aka. Goal Summary)

      RHDH supports monitoring through Prometheus metrics, but the only metrics exposed today come from upstream (catalog and scaffolder). We need to identify the failure areas that could affect the availability and integrity of RHDH and the services it integrates with.

      Goals (aka. expected user outcomes)

      • The RHDH platform engineer can use the OpenTelemetry client to forward Prometheus metrics to their monitoring stack and configure alerting for critical areas of concern.
      • Expose metrics that customers can act on, for example:
        • Number of IdP sync failures
        • Number of auth failures
        • System errors that the admin can either recover from (for example with a pod restart or catalog cleanup) or follow up on by monitoring external service outages (e.g. GitHub, Azure, Quay)
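
      As a rough illustration of the failure metrics listed above, the sketch below keeps per-provider counters in memory and renders them in the Prometheus text exposition format. It is not the upstream Backstage or OpenTelemetry API. The `FailureCounter` class, the metric names `rhdh_idp_sync_failures_total` and `rhdh_auth_failures_total`, and the `provider` label are all hypothetical and only show the shape such metrics might take.

```typescript
// Minimal sketch of actionable failure counters. All names are hypothetical;
// a real implementation would use the instrumentation library chosen for RHDH.
type Labels = Record<string, string>;

// Escape a label value as required by the Prometheus text format.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

class FailureCounter {
  readonly name: string;
  readonly help: string;
  // One running total per distinct label set, keyed by the rendered labels.
  private values: Record<string, number> = {};

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  // Increment the series identified by the given label set.
  inc(labels: Labels = {}): void {
    const key = FailureCounter.seriesKey(labels);
    this.values[key] = (this.values[key] || 0) + 1;
  }

  // Render all series in the Prometheus text exposition format.
  render(): string {
    const lines = [
      "# HELP " + this.name + " " + this.help,
      "# TYPE " + this.name + " counter",
    ];
    Object.keys(this.values)
      .sort()
      .forEach((key) => {
        lines.push(this.name + key + " " + this.values[key]);
      });
    return lines.join("\n") + "\n";
  }

  // Build a stable `{a="1",b="2"}` suffix with label names sorted.
  private static seriesKey(labels: Labels): string {
    const names = Object.keys(labels).sort();
    if (names.length === 0) {
      return "";
    }
    const pairs = names.map(
      (n) => n + '="' + escapeLabelValue(String(labels[n])) + '"',
    );
    return "{" + pairs.join(",") + "}";
  }
}

// Usage: one counter per failure type named in the goals (hypothetical names).
const idpSyncFailures = new FailureCounter(
  "rhdh_idp_sync_failures_total",
  "Number of failed IdP sync runs",
);
const authFailures = new FailureCounter(
  "rhdh_auth_failures_total",
  "Number of failed authentication attempts",
);

idpSyncFailures.inc({ provider: "azure" });
authFailures.inc({ provider: "github" });
authFailures.inc({ provider: "github" });

// Text that a metrics endpoint could serve for scraping.
const exposition = idpSyncFailures.render() + authFailures.render();
```

      Keeping one counter per failure type with a `provider` label would let an alert target the rate of increase for a single provider rather than one global number.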

      Requirements (aka. Acceptance Criteria):

      • Upstream instrumentation has deprecated prom-client in favour of OpenTelemetry. We need to determine what that means for migration and support. For example, the catalog metrics are considered experimental, and we cannot declare support for them while they carry that designation.
      • Review the metrics that are exposed upstream to see whether they cover our customers' situations. The Scaffolder and Catalog metrics seem to be based on the deprecated prom-client, so we need to determine whether a replacement exists.
      • Instrument the key plugins that require metrics. Integrations with external service providers are a primary candidate.
      • Build scenarios for Docs and QE testing.
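
      One way to picture the instrumentation of external integrations is a wrapper that records the outcome of each call per provider. The sketch below is an assumption about how that could look, not an existing RHDH or Backstage helper: `instrumented`, `record`, and `callCounts` are hypothetical names. The wrapper is synchronous to keep the example short; real integration calls are asynchronous and would need the same try/catch around the awaited call.

```typescript
// Hypothetical in-memory tally, keyed by "provider:outcome".
const callCounts: Record<string, number> = {};

function record(provider: string, outcome: "success" | "failure"): void {
  const key = provider + ":" + outcome;
  callCounts[key] = (callCounts[key] || 0) + 1;
}

// Run one call against an external provider and record how it ended.
// The error is re-thrown so callers keep their existing error handling.
function instrumented<T>(provider: string, call: () => T): T {
  try {
    const result = call();
    record(provider, "success");
    return result;
  } catch (err) {
    record(provider, "failure");
    throw err;
  }
}

// Usage with stand-in functions in place of real provider clients.
instrumented("github", function (): string {
  return "ok";
});

try {
  instrumented("quay", function (): string {
    throw new Error("simulated outage");
  });
} catch (err) {
  // The caller still sees the original error; the failure is already counted.
}
```

      Because the wrapper re-throws, adding it around an existing call should not change the plugin's behaviour, which also makes the resulting counts a candidate scenario for QE testing.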

      Out of Scope (Optional)

      High-level list of items that are out of scope.

      <your text here>

      Customer Considerations (Optional)

      Provide any additional customer-specific considerations that must be made
      when designing and delivering the Feature. Initial completion during
      Refinement status.

      <your text here>

      Documentation Considerations

      Provide information that needs to be considered and planned so that
      documentation will meet customer needs. If the feature extends existing
      functionality, provide a link to its current documentation.

      <your text here>

            ktsao@redhat.com Kim Tsao
            RHIDP - Security