Story
Resolution: Unresolved
Major
For the new ACM 2.15 MCOA Addon (metrics will be GA), we need to create as much content as possible.
This doc (it could also be a blog post, etc.) describes how to enable right-sizing.
1. For custom metrics, create a new ScrapeConfig:
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: acm-virtualization-metrics # A unique name for your ScrapeConfig
  namespace: open-cluster-management-observability # As specified in the documentation
  labels:
    app.kubernetes.io/component: platform-metrics-collector # For platform metrics
    app: metrics # Common label from your existing config
    app.kubernetes.io/managed-by: multicluster-observability-operator # From existing config
    app.kubernetes.io/part-of: multicluster-observability-addon # From existing config
    app.kubernetes.io/version: 1.0.0 # From existing config
    chart: metrics-1.0.0 # From existing config
    release: multicluster-observability-addon # From existing config
spec:
  jobName: acm-virtualization # A descriptive job name
  metricsPath: /federate # As per the existing configuration for federated metrics
  params:
    match[]:
      # ACM Resource Claims (acm_rs) metrics
      - '{__name__="acm_rs:namespace:cpu_request_hard"}'
      - '{__name__="acm_rs:namespace:cpu_request"}'
      - '{__name__="acm_rs:namespace:cpu_usage"}'
      - '{__name__="acm_rs:namespace:cpu_recommendation"}'
      - '{__name__="acm_rs:namespace:memory_request_hard"}'
      - '{__name__="acm_rs:namespace:memory_request"}'
      - '{__name__="acm_rs:namespace:memory_usage"}'
      - '{__name__="acm_rs:namespace:memory_recommendation"}'
      - '{__name__="acm_rs:cluster:cpu_request_hard"}'
      - '{__name__="acm_rs:cluster:cpu_request"}'
      - '{__name__="acm_rs:cluster:cpu_usage"}'
      - '{__name__="acm_rs:cluster:cpu_recommendation"}'
      - '{__name__="acm_rs:cluster:memory_request_hard"}'
      - '{__name__="acm_rs:cluster:memory_request"}'
      - '{__name__="acm_rs:cluster:memory_usage"}'
      - '{__name__="acm_rs:cluster:memory_recommendation"}'
      # ACM Resource Claims VM (acm_rs_vm) metrics
      - '{__name__="acm_rs_vm:namespace:cpu_request"}'
      - '{__name__="acm_rs_vm:namespace:cpu_usage"}'
      - '{__name__="acm_rs_vm:namespace:memory_request"}'
      - '{__name__="acm_rs_vm:namespace:memory_usage"}'
      - '{__name__="acm_rs_vm:namespace:cpu_recommendation"}'
      - '{__name__="acm_rs_vm:namespace:memory_recommendation"}'
      - '{__name__="acm_rs_vm:cluster:cpu_request"}'
      - '{__name__="acm_rs_vm:cluster:cpu_usage"}'
      - '{__name__="acm_rs_vm:cluster:memory_request"}'
      - '{__name__="acm_rs_vm:cluster:memory_usage"}'
      - '{__name__="acm_rs_vm:cluster:cpu_recommendation"}'
      - '{__name__="acm_rs_vm:cluster:memory_recommendation"}'
  scheme: HTTPS # From existing configuration
  scrapeClass: ocp-monitoring # From existing configuration
  staticConfigs:
    - targets:
        - prometheus-k8s.openshift-monitoring.svc:9091 # From existing configuration
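The match[] list in the ScrapeConfig follows a regular naming pattern (prefix, scope, resource, kind), so it can be generated rather than typed by hand. The following is a minimal Python sketch of that idea, not part of ACM or MCOA. The dimension tables and the function name build_selectors are assumptions read off the list in this ticket; if the actual recording rules differ, the tables need to be adjusted.

```python
from itertools import product

# Assumption: these dimensions are inferred from the match[] list in this
# ticket. They are not taken from any ACM or MCOA API.
SCOPES = ("namespace", "cluster")
RESOURCES = ("cpu", "memory")
KINDS = {
    "acm_rs": ("request_hard", "request", "usage", "recommendation"),
    "acm_rs_vm": ("request", "usage", "recommendation"),
}


def build_selectors() -> list[str]:
    """Return one /federate match[] selector per metric name."""
    selectors = []
    for prefix, kinds in KINDS.items():
        for scope, resource, kind in product(SCOPES, RESOURCES, kinds):
            name = f"{prefix}:{scope}:{resource}_{kind}"
            selectors.append('{__name__="' + name + '"}')
    return selectors


if __name__ == "__main__":
    for selector in build_selectors():
        # Emit each selector as a single-quoted YAML list item.
        print(f"      - '{selector}'")
```

Running the script prints list items that can be pasted under match[]. The order differs from the hand-written list, which does not matter for federation.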
2. Reference the ScrapeConfig in the ClusterManagementAddOn:
---
# 2. Strategic Merge Patch for ClusterManagementAddOn
# Apply this YAML to your HUB CLUSTER.
# This will SAFELY add the new ScrapeConfig reference to your existing
# 'multicluster-observability-addon'.
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ClusterManagementAddOn
metadata:
  name: multicluster-observability-addon
  namespace: open-cluster-management-observability # Ensure this matches your ClusterManagementAddOn's namespace
spec:
  installStrategy:
    placements:
      - name: global # Name of the existing placement
        namespace: open-cluster-management-global-set # <<< ADDED: Required namespace from your ClusterManagementAddOn
        configs: # Path to the list of configs
          - $patch: append # Directive to add to the list
            group: monitoring.coreos.com
            resource: scrapeconfigs
            name: acm-virtualization-metrics # Must match your ScrapeConfig's name
            namespace: open-cluster-management-observability # Must match your ScrapeConfig's namespace
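The intent of this step is to append one entry to the configs list of an existing placement without disturbing the entries already there. As a hedged illustration of that logic only, the Python sketch below works on a plain dictionary, for example the ClusterManagementAddOn exported as JSON. The function name add_config_ref is hypothetical, and the sketch does not talk to a cluster; how the updated object is written back is left open.

```python
import copy


def add_config_ref(addon: dict, placement_name: str,
                   placement_namespace: str, ref: dict) -> dict:
    """Return a copy of `addon` with `ref` in the matching placement's configs.

    The input is not modified. Adding a reference that is already present
    is a no-op, so the function can be applied repeatedly.
    Raises KeyError when no placement matches.
    """
    updated = copy.deepcopy(addon)
    placements = (
        updated.get("spec", {}).get("installStrategy", {}).get("placements", [])
    )
    for placement in placements:
        if (placement.get("name") == placement_name
                and placement.get("namespace") == placement_namespace):
            configs = placement.setdefault("configs", [])
            if ref not in configs:
                configs.append(ref)
            return updated
    raise KeyError(f"placement {placement_namespace}/{placement_name} not found")


# Reference values taken from this ticket.
SCRAPE_CONFIG_REF = {
    "group": "monitoring.coreos.com",
    "resource": "scrapeconfigs",
    "name": "acm-virtualization-metrics",
    "namespace": "open-cluster-management-observability",
}
```

Calling add_config_ref(addon, "global", "open-cluster-management-global-set", SCRAPE_CONFIG_REF) returns the object with the extra reference while keeping any configs the placement already had.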
3. Verification
The config reference is now found:
- desiredConfig:
    name: acm-virtualization-metrics
    namespace: open-cluster-management-observability
    specHash: 71c5b4244e7c13e1cd715901c811baf74359e8dc1060032d049fca17e727ef7f
The ScrapeConfig is also deployed in the open-cluster-management-agent-addon namespace.
Then, from a terminal in the prom-agent-platform-metrics-collector-0 pod in the open-cluster-management-agent-addon namespace, I can see that it is picked up:
$ less /etc/prometheus/config_out/prometheus.env.yaml
global:
  scrape_interval: 120s
  scrape_timeout: 30s
  external_labels:
    prometheus: open-cluster-management-agent-addon/platform-metrics-collector
    prometheus_replica: prom-agent-platform-metrics-collector-0
scrape_configs:
- job_name: scrapeConfig/open-cluster-management-agent-addon/acm-virtualization-metrics
  metrics_path: /federate
  params:
    match[]:
    - '{__name__="acm_rs:namespace:cpu_request_hard"}'
    - '{__name__="acm_rs:namespace:cpu_request"}'
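Checking the rendered file by eye gets tedious with 28 selectors. As a rough aid, the Python sketch below scans the text of prometheus.env.yaml for `{__name__="..."}` selectors and reports which expected metric names are absent. The helper names are hypothetical, and the sketch assumes the file content has already been copied out of the pod into a string. It uses a plain regular expression rather than a YAML parser, so it only checks that each selector appears somewhere in the file.

```python
import re

# Matches selectors of the form {__name__="metric_name"} and captures the name.
SELECTOR_RE = re.compile(r'\{__name__="([^"]+)"\}')


def federated_metric_names(config_text: str) -> set[str]:
    """Return every metric name that appears in a __name__ selector."""
    return set(SELECTOR_RE.findall(config_text))


def missing_metrics(config_text: str, expected: list[str]) -> list[str]:
    """Return the expected metric names not found in the config, sorted."""
    return sorted(set(expected) - federated_metric_names(config_text))


if __name__ == "__main__":
    # Sample taken from the excerpt in this ticket.
    sample = """
    match[]:
    - '{__name__="acm_rs:namespace:cpu_request_hard"}'
    - '{__name__="acm_rs:namespace:cpu_request"}'
    """
    expected = [
        "acm_rs:namespace:cpu_request_hard",
        "acm_rs:namespace:cpu_request",
        "acm_rs:namespace:cpu_usage",
    ]
    print(missing_metrics(sample, expected))
```

An empty result means every expected metric name has a selector in the rendered configuration.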
1. - [ ] Mandatory: Add the required version to the Fix version/s field.
2. - [ ] Mandatory: Choose the type of documentation change or review.
- [ ] We need to update an existing topic
- [ ] We need to add a new document to an existing section
- [ ] We need a whole new section; this is a function not
documented before and doesn't belong in any current section
- [ ] We need an Operator Advisory review and approval
- [ ] We need a z-Stream (Errata) Advisory and Release note for
MCE and/or ACM
3. - [ ] Mandatory: Find the link to where the documentation update
should go and add it to the recommended changes. You can either use the
published doc or the staged repo for this step:
Note: As the feature and doc are better understood, this recommendation may
change. If this is new documentation, link to the section where you think
it should be placed.
Customer Portal published version
https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.12
Doc staged repo within the ACM Workspace:
https://github.com/stolostron/rhacm-docs
4. - [ ] Mandatory for GA content:
- [ ] Add steps, the diff, known issue, and/or other important
conceptual information in the following space:
- [ ] *Add Required access level *(example, *Cluster
Administrator*) for the user to complete the task:
- [ ] Add verification at the end of the task, how does the user
verify success (a command to run or a result to see?)
- [ ] Add link to dev story here:
5. - [ ] Mandatory for bugs: What is the diff? Clearly define what the
problem is, what the change is, and link to the current documentation. Only
use this for a documentation bug.