Type: Bug
Resolution: Done
Related issue: OBSDA-450 - Use OTEL collector to export all metrics from one cluster
Sprint: Tracing Sprint #253
Version of components:
opentelemetry-operator.v0.97.1-3-gb58917e3
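The installed operator version can be confirmed with something along these lines (assuming the operator is installed in the default openshift-opentelemetry-operator namespace; adjust if yours differs):
% oc get csv -n openshift-opentelemetry-operator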
Description of problem:
When an OTEL collector is created with a Prometheus receiver that scrapes in-cluster monitoring metrics, many of the metrics cannot be scraped, and the following error is logged:
{"kind": "exporter", "data_type": "metrics", "name": "debug"} 2024-04-12T08:57:11.763Z warn internal/transaction.go:123 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712912231734, "target_labels": "{_name_=\"up\", instance=\"prometheus-k8s.openshift-monitoring.svc.cluster.local:9091\", job=\"federate\"}"} 2024-04-12T08:57:11.768Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
Steps to reproduce the issue:
- Create the OTEL collector with a Prometheus receiver using the following config (an apply command sketch follows the manifests).
apiVersion: v1
kind: Namespace
metadata:
  name: chainsaw-scrape-in-cluster-monitoring
spec: {}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view
subjects:
- kind: ServiceAccount
  name: otel-collector
  namespace: chainsaw-scrape-in-cluster-monitoring
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: cabundle
  namespace: chainsaw-scrape-in-cluster-monitoring
  annotations:
    service.beta.openshift.io/inject-cabundle: "true"
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: chainsaw-scrape-in-cluster-monitoring
spec:
  volumeMounts:
  - name: cabundle-volume
    mountPath: /etc/pki/ca-trust/source/service-ca
    readOnly: true
  volumes:
  - name: cabundle-volume
    configMap:
      name: cabundle
  mode: deployment
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'federate'
            scrape_interval: 15s
            scheme: https
            tls_config:
              ca_file: /etc/pki/ca-trust/source/service-ca/service-ca.crt
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            honor_labels: true
            params:
              'match[]':
              - '{__name__="kube_pod_container_state_started"}'
            metrics_path: '/federate'
            static_configs:
            - targets:
              - "prometheus-k8s.openshift-monitoring.svc.cluster.local:9091"
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: []
          exporters: [debug]
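Save the manifests above to a file and apply them; a minimal sketch (the file name is illustrative):
% oc apply -f otel-federate.yaml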
- Check the collector logs: no metrics are scraped, and the following failure appears (a sketch of the log command follows the output).
2024-04-12T09:07:35.010Z warn internal/transaction.go:123 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712912854999, "target_labels": "{__name__=\"up\", instance=\"prometheus-k8s.openshift-monitoring.svc.cluster.local:9091\", job=\"federate\"}"}
2024-04-12T09:07:35.011Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
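The logs above can be fetched with something like the following (the Deployment name otel-collector assumes the operator's <name>-collector naming convention; adjust if your instance differs):
% oc logs deployment/otel-collector -n chainsaw-scrape-in-cluster-monitoring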
- Check the same metric from the in-cluster monitoring /federate endpoint (documented here); it is exposed there as expected:
https://docs.openshift.com/container-platform/4.15/observability/monitoring/accessing-third-party-monitoring-apis.html#monitoring-querying-metrics-by-using-the-federation-endpoint-for-prometheus_accessing-monitoring-apis-by-using-the-cli
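Assuming TOKEN and HOST are set roughly as the linked documentation describes, for example (the prometheus-k8s-federate route name is taken from those docs):
% TOKEN=$(oc whoami -t)
% HOST=$(oc get route prometheus-k8s-federate -n openshift-monitoring -o jsonpath='{.spec.host}')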
% curl -G -k -H "Authorization: Bearer $TOKEN" https://$HOST/federate --data-urlencode 'match[]=kube_pod_container_state_started'
# TYPE kube_pod_container_state_started untyped
kube_pod_container_state_started{container="alertmanager",endpoint="https-main",job="kube-state-metrics",namespace="openshift-monitoring",pod="alertmanager-main-0",service="kube-state-metrics",uid="6bbd5162-191e-457c-90d8-42e8392bad71",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887789e+09 1712913040234
kube_pod_container_state_started{container="alertmanager-proxy",endpoint="https-main",job="kube-state-metrics",namespace="openshift-monitoring",pod="alertmanager-main-0",service="kube-state-metrics",uid="6bbd5162-191e-457c-90d8-42e8392bad71",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887793e+09 1712913040234
kube_pod_container_state_started{container="approver",endpoint="https-main",job="kube-state-metrics",namespace="openshift-network-node-identity",pod="network-node-identity-4t45w",service="kube-state-metrics",uid="f5bc2d09-fe89-4f4f-9c47-dbbc0babf61d",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887764e+09 1712913040234
kube_pod_container_state_started{container="authentication-operator",endpoint="https-main",job="kube-state-metrics",namespace="openshift-authentication-operator",pod="authentication-operator-ffbbb5674-cj2jx",service="kube-state-metrics",uid="cfe068b5-2d3e-46c0-9bd5-d56fcb03666a",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887791e+09 1712913040234