  1. OpenShift Service Mesh
  2. OSSM-11376

Support ACM-based multi-cluster Kiali monitoring


    • Epic
    • Resolution: Unresolved
    • Undefined
    • None
    • OSSM 3.2.0
    • Kiali
    • None
    • Support ACM-based multi-cluster Kiali monitoring
    • False
    • False
    • To Do

      As of OSSM 3.2 (Kiali 2.17), Kiali supports an external deployment model that targets a multi-cluster environment in which Kiali is deployed on a cluster separate from the mesh clusters. This RFE has that deployment model in mind, although it likely applies to any multi-cluster mesh deployment, including one where Kiali is co-located with an Istio control plane.

      The main architectural point is that for Kiali to present an end-to-end visualization of a multi-cluster environment, it requires unified metrics, because Kiali connects to only a single metric store (Prometheus). This is possible for customers who are willing to provide their own Prometheus solution, but not all customers can, and it is not the preferred solution for OSSM and Red Hat because it forgoes the benefits of the standard OpenShift observability stack.

      For OpenShift, the recommended metric integration is to leverage the User Workload Monitoring (UWM) Prometheus storage, so each mesh cluster should have its metrics scraped locally into its UWM store. UWM does not support standard Prometheus federation; the supported approach for unifying UWM metrics across clusters is via ACM. The unified metrics are then available to Kiali via the Thanos front-end.
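
As a rough illustration of what a unified metric store means for a client such as Kiali, the sketch below builds an authenticated instant query against a Prometheus-compatible HTTP API, which is the kind of API a Thanos query front-end exposes. The hub URL, the placeholder token, and the assumption that ACM-collected series carry a `cluster` label are illustrative and not taken from this issue; the request is only constructed, not sent.

```python
import urllib.parse
import urllib.request


def build_query_request(base_url: str, token: str, promql: str) -> urllib.request.Request:
    """Build a GET request for the Prometheus HTTP API instant-query endpoint."""
    query_string = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query_string}"
    # The bearer token is sent in the standard Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical values, for illustration only.
HUB_METRICS_URL = "https://observability.hub.example.com"
# Assumes the unified store distinguishes mesh clusters with a `cluster` label.
PROMQL = "sum by (cluster) (rate(istio_requests_total[5m]))"

request = build_query_request(HUB_METRICS_URL, "REPLACE_WITH_TOKEN", PROMQL)
print(request.full_url)
```

A single query of this shape returning one series per mesh cluster is what lets one metric store back an end-to-end view; with per-cluster stores the client would need one connection per cluster.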

      The general deployment model is shown in the attached diagram (dm3.png).

      It is possible to set up this deployment model using OSSM 3.2, but Kiali needs an OAuth token to authenticate to the Observability Service on the hub cluster. ddemoiti was able to get this working using a short-lived (24h) token of an OpenShift user on the hub. Kiali also does not support authentication using longer-lived certificates.
      We need a supported, documented, ACM-based solution that uses a long-lived token.
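
To make the operational cost of the 24h token concrete, here is a minimal sketch of the expiry arithmetic. The function name and the timestamps are hypothetical; only the 24-hour lifetime comes from the description above.

```python
from datetime import datetime, timedelta, timezone


def token_expired(issued_at: datetime, lifetime: timedelta, now: datetime) -> bool:
    """Return True once the token's lifetime has fully elapsed."""
    return now >= issued_at + lifetime


# Hypothetical issue time; the 24h lifetime matches the token described above.
issued_at = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
lifetime = timedelta(hours=24)

same_day = issued_at + timedelta(hours=8)
next_day = issued_at + timedelta(hours=25)

print(token_expired(issued_at, lifetime, same_day))
print(token_expired(issued_at, lifetime, next_day))
```

With a token like this, metric queries from Kiali would start failing authentication a day after setup unless someone replaces the token by hand, which is why a long-lived credential is needed for a supported deployment.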

      Note that there are reasons to take the ACM approach over COO or other approaches.

      @Simon Pasquier mentioned, "...I wouldn't encourage customers to use Prometheus for remote-write ingestion beyond prototyping and small environments. Our (Red Hat) go-to solution is definitely ACM ... Because ACM can(should) scale horizontally while Prometheus can only scale vertically. If you ingest metrics coming from many clusters into a single Prometheus instance, you would be quickly limited".

      Note that this should likely get tied to an overall multi-cluster ACM effort for OSSM. The exploratory effort into ACM has been closed, but an actual implementation epic does not yet exist (cc jlongmui@redhat.com). See https://issues.redhat.com/browse/OSSM-9061

       

        1. dm3.png
          57 kB
          Jay Shaughnessy

              Unassigned Unassigned
              rhn-engineering-jshaughn Jay Shaughnessy