Type: Bug
Resolution: Done
Related issue: OBSDA-450 - Use OTEL collector to export all metrics from one cluster
Sprint: Tracing Sprint #253
Version of components:
opentelemetry-operator.v0.97.1-3-gb58917e3
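The installed operator version can be confirmed with something along these lines (assuming the operator is installed in the default openshift-opentelemetry-operator namespace; adjust if yours differs):
% oc get csv -n openshift-opentelemetry-operator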
Description of problem:
When an OTEL collector is created with a Prometheus receiver that scrapes in-cluster monitoring metrics, many of the metrics cannot be scraped, and the following error is logged:
{"kind": "exporter", "data_type": "metrics", "name": "debug"} 2024-04-12T08:57:11.763Z warn internal/transaction.go:123 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712912231734, "target_labels": "{_name_=\"up\", instance=\"prometheus-k8s.openshift-monitoring.svc.cluster.local:9091\", job=\"federate\"}"} 2024-04-12T08:57:11.768Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
Steps to reproduce the issue:
- Create the OTEL collector with a Prometheus receiver using the following config (an apply command sketch follows the manifests).
apiVersion: v1
kind: Namespace
metadata:
  name: chainsaw-scrape-in-cluster-monitoring
spec: {}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view
subjects:
- kind: ServiceAccount
  name: otel-collector
  namespace: chainsaw-scrape-in-cluster-monitoring
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: cabundle
  namespace: chainsaw-scrape-in-cluster-monitoring
  annotations:
    service.beta.openshift.io/inject-cabundle: "true"
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: chainsaw-scrape-in-cluster-monitoring
spec:
  volumeMounts:
  - name: cabundle-volume
    mountPath: /etc/pki/ca-trust/source/service-ca
    readOnly: true
  volumes:
  - name: cabundle-volume
    configMap:
      name: cabundle
  mode: deployment
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'federate'
            scrape_interval: 15s
            scheme: https
            tls_config:
              ca_file: /etc/pki/ca-trust/source/service-ca/service-ca.crt
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            honor_labels: true
            params:
              'match[]':
              - '{__name__="kube_pod_container_state_started"}'
            metrics_path: '/federate'
            static_configs:
            - targets:
              - "prometheus-k8s.openshift-monitoring.svc.cluster.local:9091"
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: []
          exporters: [debug]
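Save the manifests above to a file and apply them; a minimal sketch (the file name is illustrative):
% oc apply -f otel-federate.yaml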
- Check the collector logs: no metrics are scraped, and the following failure appears (a sketch of the log command follows the output).
2024-04-12T09:07:35.010Z warn internal/transaction.go:123 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712912854999, "target_labels": "{__name__=\"up\", instance=\"prometheus-k8s.openshift-monitoring.svc.cluster.local:9091\", job=\"federate\"}"}
2024-04-12T09:07:35.011Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
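The logs above can be fetched with something like the following (the Deployment name otel-collector assumes the operator's <name>-collector naming convention; adjust if your instance differs):
% oc logs deployment/otel-collector -n chainsaw-scrape-in-cluster-monitoring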
- Check the same metric from the in-cluster monitoring /federate endpoint (documented here); it is exposed there as expected:
https://docs.openshift.com/container-platform/4.15/observability/monitoring/accessing-third-party-monitoring-apis.html#monitoring-querying-metrics-by-using-the-federation-endpoint-for-prometheus_accessing-monitoring-apis-by-using-the-cli
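Assuming TOKEN and HOST are set roughly as the linked documentation describes, for example (the prometheus-k8s-federate route name is taken from those docs):
% TOKEN=$(oc whoami -t)
% HOST=$(oc get route prometheus-k8s-federate -n openshift-monitoring -o jsonpath='{.spec.host}')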
% curl -G -k -H "Authorization: Bearer $TOKEN" https://$HOST/federate --data-urlencode 'match[]=kube_pod_container_state_started'
# TYPE kube_pod_container_state_started untyped
kube_pod_container_state_started{container="alertmanager",endpoint="https-main",job="kube-state-metrics",namespace="openshift-monitoring",pod="alertmanager-main-0",service="kube-state-metrics",uid="6bbd5162-191e-457c-90d8-42e8392bad71",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887789e+09 1712913040234
kube_pod_container_state_started{container="alertmanager-proxy",endpoint="https-main",job="kube-state-metrics",namespace="openshift-monitoring",pod="alertmanager-main-0",service="kube-state-metrics",uid="6bbd5162-191e-457c-90d8-42e8392bad71",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887793e+09 1712913040234
kube_pod_container_state_started{container="approver",endpoint="https-main",job="kube-state-metrics",namespace="openshift-network-node-identity",pod="network-node-identity-4t45w",service="kube-state-metrics",uid="f5bc2d09-fe89-4f4f-9c47-dbbc0babf61d",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887764e+09 1712913040234
kube_pod_container_state_started{container="authentication-operator",endpoint="https-main",job="kube-state-metrics",namespace="openshift-authentication-operator",pod="authentication-operator-ffbbb5674-cj2jx",service="kube-state-metrics",uid="cfe068b5-2d3e-46c0-9bd5-d56fcb03666a",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887791e+09 1712913040234