OpenShift Top Level Product Strategy / OCPPLAN-8490

Kata Observability Building Blocks

    • 14% To Do, 0% In Progress, 86% Done

      Feature Overview

      Assuming that software is immune to problems is a recipe for disaster. Understanding that any system, no matter how perfectly engineered, IS prone to errors (machine or human) is the first step towards building truly reliable software.

      This feature is about Observe...wait-for-it...ability. Observability is a system property that rests on one or more of the following assumptions:

      • Permanent health for a complex system is a fallacy.
      • If a system is complex and is a collection of services working together, then it's likely unpredictable.
      • Failure is a given in every aspect/state of the product from design to operation.
      • Debug-ability is key for building robust, supportable, and reliable systems.

      Given that the above assumptions hold, the question is how to propagate signals from different parts of the product that indicate a problem with a fair amount of accuracy. That requires insight, and to get insight we need to implement mechanisms for signal propagation.

      Background, and strategic fit

      To be able to support, maintain, and ensure a system is alive, observability is key. Using observability, different stakeholders are able to assist and be accountable/responsible for problem resolution. Additionally, building KPIs for product success can only happen with sufficient metrics.

      Goals

      The goal of this feature is to implement the basic building blocks required for gaining insight into, and observability of, all Kata pieces. This translates into four basic goals:

      • Build the basic blocks for monitoring Kata:
        • Identify and expose basic metrics
          • Kubernetes metrics.
          • Kata specific metrics.
        • Create basic dashboards to view the metrics.
        • Create necessary alerts for failures.
      • Build the basic blocks for making Kata log-able.
        • Instrument and configure verbosity of logs in the different Kata components upstream and downstream.
        • Integrate with log collection tools in OpenShift.
      • Expose meaningful events that reflect each component's state.
      • Identify specific points in Kata components along the request path and build traces around them.
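
As a rough illustration of the first goal (expose metrics, then alert on failures), here is a minimal sketch using only the Python standard library. It renders a counter in the Prometheus text exposition format and applies a simple threshold check. The metric name `kata_example_sandbox_start_failures_total`, the `Registry` and `Counter` classes, and the `should_alert` helper are all hypothetical and chosen for illustration; they are not taken from Kata or OpenShift, and a real deployment would more likely rely on an existing metrics client library and alerting rules.

```python
from dataclasses import dataclass, field


@dataclass
class Counter:
    """A monotonically increasing metric, e.g. a count of failures."""
    name: str
    help_text: str
    value: float = 0.0

    def inc(self, amount: float = 1.0) -> None:
        # Counters never decrease; reject negative increments.
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


@dataclass
class Registry:
    """Hypothetical in-process registry holding named counters."""
    counters: dict = field(default_factory=dict)

    def counter(self, name: str, help_text: str) -> Counter:
        # Return the existing counter so repeated registration is harmless.
        if name not in self.counters:
            self.counters[name] = Counter(name, help_text)
        return self.counters[name]

    def render(self) -> str:
        # Emit HELP, TYPE, and sample lines for each counter, sorted by name.
        lines = []
        for c in sorted(self.counters.values(), key=lambda c: c.name):
            lines.append(f"# HELP {c.name} {c.help_text}")
            lines.append(f"# TYPE {c.name} counter")
            lines.append(f"{c.name} {c.value:g}")
        return "\n".join(lines) + "\n"


def should_alert(failures: float, threshold: float) -> bool:
    """Illustrative alert condition: fire once failures reach the threshold."""
    return failures >= threshold


# Usage: record three failures, print the exposition text, check the alert.
registry = Registry()
failures = registry.counter(
    "kata_example_sandbox_start_failures_total",  # hypothetical metric name
    "Number of sandbox start failures (illustrative).",
)
failures.inc()
failures.inc(2)
print(registry.render())
print("alert:", should_alert(failures.value, threshold=3))
```

The sketch keeps collection (the counter), exposure (the rendered text), and alerting (the threshold check) as separate pieces, mirroring the way the goals above list metrics, dashboards, and alerts as distinct building blocks.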

      Documentation Considerations

      Questions to be addressed:

      • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
      • Does this feature have doc impact?
      • New Content, Updates to existing content, Release Note, or No Doc Impact
      • If unsure and no Technical Writer is available, please contact Content Strategy.
      • What concepts do customers need to understand to be successful in [action]?
      • How do we expect customers will use the feature? For what purpose(s)?
      • What reference material might a customer want/need to complete [action]?
      • Is there source material that can be used as a reference for the Technical Writer in writing the content? If yes, please link if available.
      • What is the doc impact (New Content, Updates to existing content, or Release Note)?

              Unassigned Unassigned
              azaalouk Adel Zaalouk