OpenShift GitOps / GITOPS-8812

Multi-cluster: metrics and profiling

    • GITOPS-8810 Multi-cluster GitOps - Tech Preview
    • 0% To Do, 0% In Progress, 100% Done

      Feature Overview

      In order to collect enough data for debugging and to improve scalability going forward, we need the code to produce both metrics and profiling data.

      Metrics should be very detailed, covering for example the timing of certain operations, network connections, and authentication. However, they should not expose any sensitive data.

      Goals

      Provide metrics and insight into the runtime behavior of both the principal and agent components to aid operations and troubleshooting.

      Requirements

       

      Requirement    Notes    Is MVP
      Both agent and principal expose a metrics server on a configurable TCP port
      The metrics server can be enabled or disabled, with enabled being the default
      Metrics are exported in a widely understood format such as the Prometheus metric format
      Installation manifests expose the metrics servers through Services
      The initial set of described metrics is implemented in the code
      The Go profiler (pprof) can be enabled or disabled, with disabled being the default
      Documentation exists on which metrics are available and how to interpret them
      Documentation exists on how to turn the Go profiler on and off and how to access it
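
      The requirements above can be sketched in Go using only the standard library. This is a minimal sketch under stated assumptions, not the actual agent or principal implementation: the Config type, its field names, the port 8181, and the metric example_connections_total are all hypothetical. It shows a metrics endpoint that is enabled by default and serves the Prometheus text format, and pprof handlers that are registered only when explicitly enabled. The demonstration issues in-memory requests instead of opening a port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
	"sync/atomic"
)

// Config holds hypothetical observability settings. The field names and
// defaults are illustrative, not the real agent or principal options.
type Config struct {
	MetricsEnabled bool // metrics server on or off
	MetricsPort    int  // configurable TCP port
	PprofEnabled   bool // Go profiler on or off
}

// DefaultConfig mirrors the defaults named in the requirements:
// metrics enabled, profiler disabled. The port is an arbitrary placeholder.
func DefaultConfig() Config {
	return Config{MetricsEnabled: true, MetricsPort: 8181, PprofEnabled: false}
}

// connections is a hypothetical counter with no labels attached.
var connections int64

// NewMux builds a dedicated handler for the observability endpoints.
// Using a private mux keeps pprof off http.DefaultServeMux, where the
// net/http/pprof import would otherwise register it as a side effect.
func NewMux(cfg Config) *http.ServeMux {
	mux := http.NewServeMux()
	if cfg.MetricsEnabled {
		mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
			w.Header().Set("Content-Type", "text/plain; version=0.0.4")
			fmt.Fprintln(w, "# HELP example_connections_total Connections accepted (hypothetical).")
			fmt.Fprintln(w, "# TYPE example_connections_total counter")
			fmt.Fprintf(w, "example_connections_total %d\n", atomic.LoadInt64(&connections))
		})
	}
	if cfg.PprofEnabled {
		mux.HandleFunc("/debug/pprof/", pprof.Index)
		mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
		mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
		mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
		mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	}
	return mux
}

// Status performs an in-memory GET against h and returns the HTTP status
// code and response body, so the sketch can run without binding a port.
func Status(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	cfg := DefaultConfig()
	atomic.AddInt64(&connections, 1)
	mux := NewMux(cfg)

	code, body := Status(mux, "/metrics")
	fmt.Println("GET /metrics ->", code)
	fmt.Print(body)

	code, _ = Status(mux, "/debug/pprof/")
	fmt.Println("GET /debug/pprof/ ->", code)

	// A real process would listen on the configured port instead, e.g.:
	//   http.ListenAndServe(fmt.Sprintf(":%d", cfg.MetricsPort), mux)
}
```

      With the profiler enabled, a profile could be fetched with the standard tooling, for example go tool pprof pointed at the /debug/pprof/profile path of the configured port. Whether pprof shares the metrics port or gets its own listener is a design choice this sketch does not settle.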

      Use Cases

      • Metrics will help customers run and tune both the agent and principal components
      • Metrics and profiling data will help engineering and customer support to troubleshoot production environments

      Out of scope

      • Enhanced metrics, introspection, or telemetry such as OpenTelemetry are out of scope for this feature (to be handled in a different feature)

      Dependencies

      <Link or at least explain any known dependencies.>

      Background, and strategic fit

      <What does the person writing code, testing, documenting need to know?>

      Assumptions

      <Are there assumptions being made regarding prerequisites and dependencies?>

      <Are there assumptions about hardware, software or people resources?>

      Customer Considerations

      <Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

      Documentation/QE Considerations

      <What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?>

      <Does this feature have a doc impact? Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact?>

      Impact

      <If the feature is ordered with other work, state the impact of this feature on the other work>

      Related Architecture/Technical Documents

      <links>

      Definition of Ready

      • The objectives of the feature are clearly defined and aligned with the business strategy.
      • All feature requirements have been clearly defined by Product Owners.
      • The feature has been broken down into epics.
      • The feature has been stack ranked.
      • Definition of the business outcome is in the Outcome Jira (which must have a parent Jira).

       
       

              jparsai Jayendra Parsai
              jfischer@redhat.com Jann Fischer
