Distributed Tracing / TRACING-4139

Cannot scrape many of the metrics from the in-cluster monitoring stack using the OTEL Prometheus receiver.


    • Type: Bug
    • Resolution: Done
    • Component: OpenTelemetry
    • Sprint: Tracing Sprint # 253

      Version of components:

      opentelemetry-operator.v0.97.1-3-gb58917e3

      Description of problem: 

      When an OTEL collector is created with the Prometheus receiver to scrape in-cluster monitoring metrics, many of the metrics cannot be scraped and the collector logs the following error:

      {"kind": "exporter", "data_type": "metrics", "name": "debug"}
      2024-04-12T08:57:11.763Z warn internal/transaction.go:123 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712912231734, "target_labels": "{__name__=\"up\", instance=\"prometheus-k8s.openshift-monitoring.svc.cluster.local:9091\", job=\"federate\"}"}
      2024-04-12T08:57:11.768Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
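
      The warn line above does not include the underlying reason for the scrape failure. As a minimal sketch (assuming the collector CR from the reproduction steps below), the collector's own log level can be raised in the service.telemetry section of the config; the embedded Prometheus scrape loop typically logs the per-target error only at debug level, so this makes the receiver report why each scrape fails.

          service:
            telemetry:
              logs:
                level: debug   # default is info; debug also logs the per-target scrape error
            pipelines:
              metrics:
                receivers: [prometheus]
                processors: []
                exporters: [debug]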
      

      Steps to reproduce the issue:

      • Create the OTEL collector with the Prometheus receiver using the following config (a note on verifying the injected CA bundle follows the reproduction steps).
      apiVersion: v1
      kind: Namespace
      metadata:
        name: chainsaw-scrape-in-cluster-monitoring
      spec: {}
      
      
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: otel-collector
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: cluster-monitoring-view 
      subjects:
        - kind: ServiceAccount
          name: otel-collector
          namespace: chainsaw-scrape-in-cluster-monitoring
      
      
      ---
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: cabundle
        namespace: chainsaw-scrape-in-cluster-monitoring
        annotations:
          service.beta.openshift.io/inject-cabundle: "true" 
      
      ---
      apiVersion: opentelemetry.io/v1alpha1
      kind: OpenTelemetryCollector
      metadata:
        name: otel
        namespace: chainsaw-scrape-in-cluster-monitoring
      spec:
        volumeMounts:
          - name: cabundle-volume
            mountPath: /etc/pki/ca-trust/source/service-ca
            readOnly: true
        volumes:
          - name: cabundle-volume
            configMap:
              name: cabundle
        mode: deployment
        config: |
          receivers:
            prometheus: 
              config:
                scrape_configs:
                  - job_name: 'federate'
                    scrape_interval: 15s
                    scheme: https
                    tls_config:
                      ca_file: /etc/pki/ca-trust/source/service-ca/service-ca.crt
                    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                    honor_labels: true
                    params:
                      'match[]':
                        - '{__name__="kube_pod_container_state_started"}'
                    metrics_path: '/federate'
                    static_configs:
                      - targets:
                        - "prometheus-k8s.openshift-monitoring.svc.cluster.local:9091"
      
      
          exporters:
            debug: 
              verbosity: detailed
      
      
          service:
            pipelines:
              metrics:
                receivers: [prometheus]
                processors: []
                exporters: [debug]

      • Check the collector logs. No metrics are scraped and the following failure is seen in the logs:

      2024-04-12T09:07:35.010Z warn internal/transaction.go:123 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1712912854999, "target_labels": "{__name__=\"up\", instance=\"prometheus-k8s.openshift-monitoring.svc.cluster.local:9091\", job=\"federate\"}"}
      2024-04-12T09:07:35.011Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
      

      • Check the same metric on the in-cluster monitoring /federate endpoint. The metric is available there:
      https://docs.openshift.com/container-platform/4.15/observability/monitoring/accessing-third-party-monitoring-apis.html#monitoring-querying-metrics-by-using-the-federation-endpoint-for-prometheus_accessing-monitoring-apis-by-using-the-cli

      % curl -G -k -H "Authorization: Bearer $TOKEN" https://$HOST/federate --data-urlencode 'match[]=kube_pod_container_state_started'
      
      # TYPE kube_pod_container_state_started untyped
      kube_pod_container_state_started{container="alertmanager",endpoint="https-main",job="kube-state-metrics",namespace="openshift-monitoring",pod="alertmanager-main-0",service="kube-state-metrics",uid="6bbd5162-191e-457c-90d8-42e8392bad71",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887789e+09 1712913040234
      kube_pod_container_state_started{container="alertmanager-proxy",endpoint="https-main",job="kube-state-metrics",namespace="openshift-monitoring",pod="alertmanager-main-0",service="kube-state-metrics",uid="6bbd5162-191e-457c-90d8-42e8392bad71",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887793e+09 1712913040234
      kube_pod_container_state_started{container="approver",endpoint="https-main",job="kube-state-metrics",namespace="openshift-network-node-identity",pod="network-node-identity-4t45w",service="kube-state-metrics",uid="f5bc2d09-fe89-4f4f-9c47-dbbc0babf61d",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887764e+09 1712913040234
      kube_pod_container_state_started{container="authentication-operator",endpoint="https-main",job="kube-state-metrics",namespace="openshift-authentication-operator",pod="authentication-operator-ffbbb5674-cj2jx",service="kube-state-metrics",uid="cfe068b5-2d3e-46c0-9bd5-d56fcb03666a",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 1.712887791e+09 1712913040234
      

       

       

            Assignee: Pavol Loffay (ploffay@redhat.com)
            Reporter: Ishwar Kanse (rhn-support-ikanse)