  1. OpenShift Service Mesh
  2. OSSM-1312

Service Mesh admin (w/ Kiali) in OpenShift Console

    • Epic
    • Resolution: Done
    • Undefined
    • OSSM 2.3.0
    • None
    • Kiali
    • None
    • Service Mesh admin (w/ Kiali) in OpenShift Console
    • False
    • None
    • False
    • doc_ack
    • Release Notes, User Experience
    • To Do
    • Dev Preview
    • Technology Preview
    • Done

      The Problem: After users install the OpenShift Service Mesh operators and create their first control plane(s), they have introduced a new, complex and potentially critical infrastructure layer into their cluster.

      There is no path to administer this new layer of infrastructure from the OpenShift console. Is it working? Is it doing what I expect it to? Are the workloads that I added part of the mesh? Is it healthy? If something's not working as expected, where do I turn?

      We do provide a "Service Mesh enabled" note and a Kiali link in the Project page, though these are easily missed, and there is no guidance for users who expect to see them but don't.

      Potential use cases (to be broken out into stories/issues):

      • Validate that the mesh control plane has successfully started. From the command line, this might be done with "oc get pods -n istio-system" to see that all pods have started successfully. Potential reasons for failure include a lack of resources or permissions.
        • Validate that the mesh was created as expected. There are several possible configuration options in SMCP - enabling/disabling components like Jaeger/ES, configuring Ingress and Egress Gateways, configuring a trust domain, logging settings, etc.
        • Validate that the mesh has the scope expected (which projects are involved? Is it a whole or partial cluster?)
      • How to monitor control plane components (CPU, memory, logs, etc.)? How to set up alerts?
      • How to add workloads to the mesh? How to validate that they've been added?
        • This is shown in the Kiali graph, though it's easily missed in a graph view.
      • How to monitor data plane components (CPU, memory, logs, config, etc.)? How to set up alerts?
      • How to configure and access services via an Ingress Gateway?
      • How to set up and validate mTLS between services? (graph view?)
      • How to manage network policies around a service mesh?
        • OSSM creates some by default, though we let the user override in SMCP.
      • How to set up and validate authorization policies within a service mesh?
      • How to view application metrics?
        • Why can't I view application metrics? How to troubleshoot?
      • Do any federation peers exist?
        • What is their status?
        • How to troubleshoot? Gateway logs, Istiod logs, etc...
      • And more along these lines... validation, monitoring of control/data plane components, resources and troubleshooting. 
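
      As a rough illustration of the first use case above, the check a user does by eye on "oc get pods -n istio-system" could be sketched as follows. This assumes the pod list has been fetched as JSON (for example by adding "-o json" to that command) and relies only on the standard Kubernetes pod fields status.phase and status.containerStatuses[].ready. The function name and the sample pod names are hypothetical, and the sample data is made up so the sketch runs without a cluster.

```python
import json

# Hypothetical sample shaped like the JSON output of
# "oc get pods -n istio-system -o json"; pod names are made up.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "istiod-basic-abc"},
   "status": {"phase": "Running",
              "containerStatuses": [{"name": "discovery", "ready": true}]}},
  {"metadata": {"name": "jaeger-xyz"},
   "status": {"phase": "Pending"}}
]}
""")


def unhealthy_pods(pod_list):
    """Return (name, reason) for each pod that is not Running with all containers ready."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed pods are not failures
        if phase != "Running":
            problems.append((name, f"phase is {phase}"))
            continue
        # A Running pod can still have containers that are not ready yet.
        not_ready = [c["name"] for c in status.get("containerStatuses", [])
                     if not c.get("ready")]
        if not_ready:
            problems.append((name, "containers not ready: " + ", ".join(not_ready)))
    return problems


if __name__ == "__main__":
    for name, reason in unhealthy_pods(SAMPLE):
        print(f"{name}: {reason}")
```

      An empty result means every pod is running and ready. It says nothing about why a pod failed (resources, permissions), which would still require looking at events or logs.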

      Many (most?) of the above are available in Kiali, though it's not always obvious how to get to them in the OpenShift Console. Integration within the OpenShift Console will hopefully make it easier to leverage existing facilities for monitoring workloads, but with a focus on service mesh control and data plane workloads.

      Approach: We should look to deliver an initial OpenShift Console plugin that introduces a Service Mesh menu and lays the groundwork for other integrations, which can then happen incrementally along the OpenShift Service Mesh and Kiali operator lifecycles. Even a tab that displays the control plane status as a start would have value.
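
      To give a feel for how little logic a first "control plane status" tab might need, here is a minimal sketch. It assumes the control plane resource exposes Kubernetes-style status conditions (a list of entries with type, status, reason and message) and that "True" is the healthy value for every condition. The condition names and message in the sample, and the function name, are hypothetical and are not taken from the operator.

```python
# Hypothetical conditions, shaped like a Kubernetes-style status.conditions list.
SAMPLE_CONDITIONS = [
    {"type": "Installed", "status": "True", "reason": "InstallSuccessful"},
    {"type": "Reconciled", "status": "True", "reason": "ReconcileSuccessful"},
    {"type": "Ready", "status": "False",
     "message": "Some components are not ready"},
]


def summarize_conditions(conditions):
    """Collapse a condition list into (ok, lines) suitable for a status tab.

    Assumption: every condition is positive-polarity, so "True" means healthy.
    An empty list is treated as not OK because nothing has been reported yet.
    """
    ok = bool(conditions)
    lines = []
    for cond in conditions:
        is_true = cond.get("status") == "True"
        ok = ok and is_true
        line = f"{cond.get('type', '?')}: {cond.get('status', 'Unknown')}"
        # Only show detail for conditions that need attention.
        detail = cond.get("message") or cond.get("reason") or ""
        if detail and not is_true:
            line += f" ({detail})"
        lines.append(line)
    return ok, lines


if __name__ == "__main__":
    ok, lines = summarize_conditions(SAMPLE_CONDITIONS)
    print("Control plane OK" if ok else "Control plane needs attention")
    for line in lines:
        print("  " + line)
```

      An actual console plugin would be written against the console's plugin SDK rather than in Python; the sketch only shows the shape of the summary.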

      Note: This epic focuses on admin use cases - in particular, getting started, monitoring and troubleshooting of the mesh control and data planes. We know these are areas our customers struggle with today, while the dev workflows are much more diverse between customers. We should create a different epic for more dev-oriented use cases. Mesh responsibilities span multiple roles: admins who work closely with dev teams, devs who are also admins (SREs, etc.), and security and networking roles.

              Unassigned
              Jamie Longmuir (jlongmui@redhat.com)