Distributed Tracing / TRACING-4082

Upstream - Cannot scrape metrics exported by the prometheus exporter.


    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • rhosdt-3.3
    • None
    • OpenTelemetry
    • None
    • 1
    • Tracing Sprint # 252, Tracing Sprint # 253
    • Critical

      Version of components:

      opentelemetry-operator.v0.96.0-7-geaf998f2

      Description of the issue:
      When we create a collector instance with the Prometheus exporter, the ServiceMonitor that gets created uses the following label selector:

        selector:
          matchLabels:
            app.kubernetes.io/component: opentelemetry-collector
            app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
            app.kubernetes.io/managed-by: opentelemetry-operator
            app.kubernetes.io/part-of: opentelemetry
            operator.opentelemetry.io/collector-monitoring-service: Exists
      

      These labels are present on the monitoring Service that the operator creates to expose the collector's own metrics on port 8888. We can see those metrics being scraped by the user workload monitoring stack in the OCP web console.

      oc get svc cluster-collector-collector-monitoring -o yaml
      apiVersion: v1
      kind: Service
      metadata:
        creationTimestamp: "2024-03-13T13:16:22Z"
        labels:
          app.kubernetes.io/component: opentelemetry-collector
          app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
          app.kubernetes.io/managed-by: opentelemetry-operator
          app.kubernetes.io/name: cluster-collector-collector-monitoring
          app.kubernetes.io/part-of: opentelemetry
          app.kubernetes.io/version: latest
          operator.opentelemetry.io/collector-monitoring-service: Exists
        name: cluster-collector-collector-monitoring
        namespace: chainsaw-otlp-metrics
        ownerReferences:
        - apiVersion: opentelemetry.io/v1alpha1
          blockOwnerDeletion: true
          controller: true
          kind: OpenTelemetryCollector
          name: cluster-collector
          uid: dd3a6653-a670-4508-92d4-9db9b9e816f2
        resourceVersion: "369446"
        uid: fba2fc76-0fa3-4f13-9c09-3fe00614c5d8
      spec:
        clusterIP: 172.30.45.7
        clusterIPs:
        - 172.30.45.7
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: SingleStack
        ports:
        - name: monitoring
          port: 8888
          protocol: TCP
          targetPort: 8888
        selector:
          app.kubernetes.io/component: opentelemetry-collector
          app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
          app.kubernetes.io/managed-by: opentelemetry-operator
          app.kubernetes.io/part-of: opentelemetry
        sessionAffinity: None
        type: ClusterIP
      status:
        loadBalancer: {}
      

      However, the collector Service and the collector headless Service that are created are missing the operator.opentelemetry.io/collector-monitoring-service: Exists label. Because of this, the ServiceMonitor selector does not match the Service that exposes the prometheus port (8889), and the Prometheus exporter metrics are not scraped by the user workload monitoring stack. The collector Service, for example:

      apiVersion: v1
      kind: Service
      metadata:
        creationTimestamp: "2024-03-13T13:16:22Z"
        labels:
          app.kubernetes.io/component: opentelemetry-collector
          app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
          app.kubernetes.io/managed-by: opentelemetry-operator
          app.kubernetes.io/name: cluster-collector-collector
          app.kubernetes.io/part-of: opentelemetry
          app.kubernetes.io/version: latest
        name: cluster-collector-collector
        namespace: chainsaw-otlp-metrics
        ownerReferences:
        - apiVersion: opentelemetry.io/v1alpha1
          blockOwnerDeletion: true
          controller: true
          kind: OpenTelemetryCollector
          name: cluster-collector
          uid: dd3a6653-a670-4508-92d4-9db9b9e816f2
        resourceVersion: "369432"
        uid: 761aef05-8a5d-48bf-a99f-991699fe7615
      spec:
        clusterIP: 172.30.253.40
        clusterIPs:
        - 172.30.253.40
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: SingleStack
        ports:
        - appProtocol: grpc
          name: otlp-grpc
          port: 4317
          protocol: TCP
          targetPort: 4317
        - appProtocol: http
          name: otlp-http
          port: 4318
          protocol: TCP
          targetPort: 4318
        - name: prometheus
          port: 8889
          protocol: TCP
          targetPort: 8889
        selector:
          app.kubernetes.io/component: opentelemetry-collector
          app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
          app.kubernetes.io/managed-by: opentelemetry-operator
          app.kubernetes.io/part-of: opentelemetry
        sessionAffinity: None
        type: ClusterIP
      status:
        loadBalancer: {}
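To see why the exporter endpoint is never reached, here is a minimal Python sketch of ServiceMonitor matchLabels selection, where every selector key must be present on the Service with the same value. It models only matchLabels and ignores matchExpressions and namespaceSelector. It also keeps only the selector-relevant labels and the ports of the two Services above. The helper names are hypothetical, and this is not operator code.

```python
# Minimal sketch of ServiceMonitor matchLabels selection.
# Assumption: only matchLabels is modelled (no matchExpressions/namespaceSelector).
# Labels/ports come from the two Services in this report; app.kubernetes.io/name
# and app.kubernetes.io/version are omitted because the selector does not use them.
# Helper names are hypothetical.

MONITORING_KEY = "operator.opentelemetry.io/collector-monitoring-service"

BASE_LABELS = {
    "app.kubernetes.io/component": "opentelemetry-collector",
    "app.kubernetes.io/instance": "chainsaw-otlp-metrics.cluster-collector",
    "app.kubernetes.io/managed-by": "opentelemetry-operator",
    "app.kubernetes.io/part-of": "opentelemetry",
}

# The ServiceMonitor selector: the four base labels plus the monitoring key.
SELECTOR = {**BASE_LABELS, MONITORING_KEY: "Exists"}

SERVICES = {
    "cluster-collector-collector-monitoring": {
        "labels": {**BASE_LABELS, MONITORING_KEY: "Exists"},
        "ports": {"monitoring": 8888},
    },
    "cluster-collector-collector": {
        "labels": dict(BASE_LABELS),  # MONITORING_KEY label is missing
        "ports": {"otlp-grpc": 4317, "otlp-http": 4318, "prometheus": 8889},
    },
}


def matches(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def selected_ports(selector: dict) -> dict:
    """Map each selected Service name to the named ports it exposes."""
    return {
        name: svc["ports"]
        for name, svc in SERVICES.items()
        if matches(selector, svc["labels"])
    }


print(selected_ports(SELECTOR))
# {'cluster-collector-collector-monitoring': {'monitoring': 8888}}
```

Only the monitoring Service is selected. It exposes port 8888 but not the prometheus port 8889, so nothing selected by this ServiceMonitor serves the exporter's metrics.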
      

      Steps to reproduce the issue:

      • Install the latest operator bundle built from upstream.
      • Run the otlp-metrics-traces test case:
        chainsaw test --skip-delete tests/e2e-openshift/otlp-metrics-traces
      • Confirm that the test fails at the check metrics step.
      • In the chainsaw-otlp-metrics project, set the collector instance to unmanaged.
      • Edit the ServiceMonitor and remove the following entry from its label selector:
        operator.opentelemetry.io/collector-monitoring-service: Exists
      • Rerun the metrics and traces generator job:
        oc create -f 03-metrics-traces-gen.yaml
      • Run the check_metrics.sh script. It exits after some time, once metrics are found.
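The workaround succeeds because removing that entry widens the selector. This small Python sketch shows the effect. It models only matchLabels and the selector-relevant labels of the two Services shown in this report, and the helper names are hypothetical. The headless Service is left out because its labels are not shown here.

```python
# Sketch of the workaround: dropping the collector-monitoring-service entry
# from the ServiceMonitor matchLabels selector.
# Only selector-relevant labels from this report's two Services are modelled;
# names are illustrative, not operator code.

MONITORING_KEY = "operator.opentelemetry.io/collector-monitoring-service"

BASE_LABELS = {
    "app.kubernetes.io/component": "opentelemetry-collector",
    "app.kubernetes.io/instance": "chainsaw-otlp-metrics.cluster-collector",
    "app.kubernetes.io/managed-by": "opentelemetry-operator",
    "app.kubernetes.io/part-of": "opentelemetry",
}

SERVICE_LABELS = {
    "cluster-collector-collector-monitoring": {**BASE_LABELS, MONITORING_KEY: "Exists"},
    "cluster-collector-collector": dict(BASE_LABELS),
}

ORIGINAL_SELECTOR = {**BASE_LABELS, MONITORING_KEY: "Exists"}
# Workaround: the same selector without the monitoring entry.
RELAXED_SELECTOR = {k: v for k, v in ORIGINAL_SELECTOR.items() if k != MONITORING_KEY}


def selected(selector: dict) -> list:
    """Sorted names of Services whose labels satisfy every matchLabels entry."""
    return sorted(
        name
        for name, labels in SERVICE_LABELS.items()
        if all(labels.get(k) == v for k, v in selector.items())
    )


print(selected(ORIGINAL_SELECTOR))  # ['cluster-collector-collector-monitoring']
print(selected(RELAXED_SELECTOR))   # ['cluster-collector-collector', 'cluster-collector-collector-monitoring']
```

With the relaxed selector, the collector Service (which exposes port 8889) is selected too. Any other Service carrying the same four labels would also be picked up, which is one way the same pods could end up being scraped more than once.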

      Expected Behaviour:

      When the Prometheus exporter is used, the operator.opentelemetry.io/collector-monitoring-service: Exists label is added to exactly one of the collector Services, so that the same metrics are not scraped twice.
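One way to express the expected behaviour is as a hypothetical Python sketch. Among the Services that expose the prometheus port, exactly one gets the monitoring label. The "prefer the non-headless Service" rule, the headless Service name, and its ports are assumptions for illustration, not the operator's actual logic. The one grounded detail is that headless Services in Kubernetes have clusterIP set to "None".

```python
# Hypothetical sketch: label exactly one Service that exposes the exporter
# port, so a ServiceMonitor selecting on the label scrapes each pod once.
# Assumptions: headless Services are recognised by clusterIP == "None" and
# the non-headless Service is preferred. Not the operator's implementation.
from copy import deepcopy

MONITORING_LABEL = "operator.opentelemetry.io/collector-monitoring-service"


def add_monitoring_label(services: list, exporter_port: str = "prometheus") -> list:
    """Return copies of `services` in which exactly one Service exposing
    `exporter_port` carries MONITORING_LABEL=Exists (none if no Service does)."""
    result = deepcopy(services)
    candidates = [s for s in result if exporter_port in s["ports"]]
    if not candidates:
        return result  # exporter not configured: nothing to label
    # Stable sort: False (non-headless) before True (headless).
    candidates.sort(key=lambda s: s.get("clusterIP") == "None")
    candidates[0]["labels"][MONITORING_LABEL] = "Exists"
    for other in candidates[1:]:
        other["labels"].pop(MONITORING_LABEL, None)  # keep it on one Service only
    return result


# Illustrative input; the headless Service name and ports are hypothetical.
services = [
    {"name": "cluster-collector-collector-headless", "clusterIP": "None",
     "labels": {}, "ports": ["otlp-grpc", "otlp-http", "prometheus"]},
    {"name": "cluster-collector-collector", "clusterIP": "172.30.253.40",
     "labels": {}, "ports": ["otlp-grpc", "otlp-http", "prometheus"]},
]

labeled = [s["name"] for s in add_monitoring_label(services) if MONITORING_LABEL in s["labels"]]
print(labeled)  # ['cluster-collector-collector']
```

Returning copies rather than mutating the input keeps the sketch easy to test. A real reconciler would instead compute the desired labels for each Service it manages.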

      Additional Notes:
      The issue was detected in our upstream testing job. Refer to https://github.com/openshift/open-telemetry-opentelemetry-operator/pull/23

              rhn-support-iblancas Israel Blancas Alvarez
              rhn-support-ikanse Ishwar Kanse
