  1. Red Hat Advanced Cluster Management
  2. ACM-18871

Documentation for sharding feature in 2.12, 2.13


    • Product / Portfolio Work
    • 6
    • False
    • False
    • None

      Note: Doc team updates the current version of the documentation and the
      two previous versions (n-2), but we address *only high-priority, or
      customer-reported issues* for -2 releases in support.
      Describe the changes in the doc and link to your dev story:

      1. - [x] Mandatory: Add the required version to the Fix version/s field.

      2.14

      2. - [x] Mandatory: Choose the type of documentation change or review.

      • [ ] We need to update an existing topic
      • [x] We need to add a new document to an existing section

      This should be added to the Customizing Observability Configuration section, with the topic title "Scaling up metrics collection".

      • [ ] We need a whole new section; this is a function not
        documented before and doesn't belong in any current section
      • [ ] We need an Operator Advisory review and approval
      • [ ] We need a z-Stream (Errata) Advisory and Release note for
        MCE and/or ACM

      3. - [x] Mandatory: Find the link to where the documentation update
      should go and add it to the recommended changes. You can either use the
      published doc or the staged repo for this step:

      https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.12/html-single/observability/index#customizing-observability

      Note: As the feature and doc are understood, this recommendation may
      change. If this is new documentation, link to the section where you think
      it should be placed.

      Customer Portal published version

      https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.12

      Doc staged repo within the ACM Workspace:
      https://github.com/stolostron/rhacm-docs

      4. - [x] Mandatory for GA content:

      • [x] Add steps, the diff, known issue, and/or other important
        conceptual information in the following space:

      We added a new sharding feature for metrics-collector in spoke clusters.

      With the default configuration, the metrics-collector runs a single internal goroutine that makes a /federate request to the in-cluster Prometheus, passing the metrics allowlist as request parameters. The response contains the latest sample value of each matching time series, together with its labels, as stored in the Prometheus TSDB.

       

      The metrics-collector then converts that response into a Prometheus Remote Write request and sends it to the hub cluster's Observatorium API endpoint, after which it is ingested into the hub cluster's Thanos instance.
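
      As a rough illustration of the first step, the sketch below builds a /federate URL in which each allowlist entry becomes one `match[]` parameter, which is the parameter Prometheus federation uses for series selectors. The function name `buildFederateURL`, the host name, and the sample selector are hypothetical; this is not the metrics-collector's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildFederateURL is a hypothetical helper (not taken from metrics-collector).
// It returns a /federate URL that asks Prometheus for the series selected by
// matchers. Each selector is sent as one match[] query parameter.
func buildFederateURL(base string, matchers []string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/federate"
	q := url.Values{}
	for _, m := range matchers {
		q.Add("match[]", m)
	}
	u.RawQuery = q.Encode() // percent-encodes brackets, braces, and quotes
	return u.String(), nil
}

func main() {
	// The Prometheus address and the selector are placeholders.
	federateURL, err := buildFederateURL(
		"http://prometheus.example:9090",
		[]string{`{__name__="up"}`},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(federateURL)
}
```

      The longer the allowlist, the longer this URL and the larger the response, which is the motivation for splitting the work across several requests.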

      Starting with 2.12, the MCO CRD has a new `workers` field, which defaults to 1. The metrics-collector starts that number of goroutines internally and shards the /federate calls across them: each goroutine takes only part of the allowlist as its URL parameters and sends the result as its own remote write request. Each request is therefore much smaller, so users can scale up metrics collection and grow their allowlist without any single request becoming too large. Smaller, independent requests also make collection and sending more reliable.
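
      The sharding idea can be sketched as follows. This is a minimal illustration that assumes the allowlist is split into contiguous, near-equal parts; the real metrics-collector may partition it differently. The names `shardMatchers` and `collectAll` and the sample metric names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// shardMatchers splits matchers into at most `workers` contiguous shards of
// near-equal size. Hypothetical helper; the partitioning scheme is an assumption.
func shardMatchers(matchers []string, workers int) [][]string {
	if workers < 1 {
		workers = 1
	}
	if workers > len(matchers) {
		workers = len(matchers)
	}
	shards := make([][]string, 0, workers)
	for i := 0; i < workers; i++ {
		start := i * len(matchers) / workers
		end := (i + 1) * len(matchers) / workers
		shards = append(shards, matchers[start:end])
	}
	return shards
}

// collectAll runs one goroutine per shard and waits for all of them.
// collect stands in for "federate this shard, then remote write the result".
func collectAll(shards [][]string, collect func(shard []string) error) []error {
	errs := make([]error, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard []string) {
			defer wg.Done()
			errs[i] = collect(shard) // each goroutine writes only its own slot
		}(i, shard)
	}
	wg.Wait()
	return errs
}

func main() {
	allowlist := []string{"metric_a", "metric_b", "metric_c", "metric_d", "metric_e"}
	shards := shardMatchers(allowlist, 2)
	errs := collectAll(shards, func(shard []string) error {
		fmt.Printf("collecting %d metrics in one request\n", len(shard))
		return nil
	})
	fmt.Println("shards:", len(shards), "errors:", errs)
}
```

      Because each shard is collected and sent independently, a failure in one request does not prevent the other shards from being delivered.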

       

      Usually you want the same workers value across all your spoke clusters. In that case, set it on the MCO CRD, and the value is propagated to every spoke cluster through the spoke's ObservabilityAddon CRD.

       

      In some cases, users might want to override the workers value for specific spoke clusters only. To do this, they can set the annotation observability.open-cluster-management.io/addon-source: "override" on the spoke's ObservabilityAddon resource and set a different workers value in its spec.

      To go back to setting the value through MCO, they can change the annotation back to observability.open-cluster-management.io/addon-source: "mco".
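
      The precedence described above can be summarized in a few lines. This is a sketch of the documented behavior only, not the operator's reconcile logic; the function `effectiveWorkers` and its parameters are hypothetical, while the annotation key and its "override" and "mco" values come from the description above.

```go
package main

import "fmt"

// Annotation key as given in this issue.
const addonSourceAnnotation = "observability.open-cluster-management.io/addon-source"

// effectiveWorkers is a hypothetical helper that returns the workers value a
// spoke cluster ends up with: the ObservabilityAddon's own value when the
// annotation is "override", otherwise the value propagated from MCO.
func effectiveWorkers(mcoWorkers int, addonAnnotations map[string]string, addonWorkers int) int {
	if addonAnnotations[addonSourceAnnotation] == "override" {
		return addonWorkers
	}
	return mcoWorkers
}

func main() {
	override := map[string]string{addonSourceAnnotation: "override"}
	fromMCO := map[string]string{addonSourceAnnotation: "mco"}

	fmt.Println(effectiveWorkers(2, override, 8)) // spoke-specific value is used
	fmt.Println(effectiveWorkers(2, fromMCO, 8))  // MCO value is used
}
```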

      • [ ] Add *Required access level* (example, *Cluster
        Administrator*) for the user to complete the task:
      • [x] Add verification at the end of the task, how does the user
        verify success (a command to run or a result to see?)

      Users can verify the change by checking the arguments on the metrics-collector pods in the spoke cluster.

      • [x] Add link to dev story here:

      https://github.com/stolostron/multicluster-observability-operator/pull/1718 

      https://issues.redhat.com/browse/ACM-14316

      5. - [ ] Mandatory for bugs: What is the diff? Clearly define what the
      problem is, what the change is, and link to the current documentation. Only
      use this for a documentation bug.

              mdockery@redhat.com Mikela Jackson
              hsabhnan@redhat.com Harshil Sabhnani