Type: Bug
Resolution: Done
Priority: Blocker
Severity: Critical
Sprint: Tracing Sprint # 252, Tracing Sprint # 253
Version of components:
opentelemetry-operator.v0.96.0-7-geaf998f2
Description of the issue:
When we create a collector instance with the Prometheus exporter, the ServiceMonitor created by the operator requires the following label selectors:
selector:
  matchLabels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
    operator.opentelemetry.io/collector-monitoring-service: Exists
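For context, a collector CR with a Prometheus exporter of roughly the following shape triggers this. This is a minimal sketch, not the actual chainsaw test manifest; the config values are assumptions based on the ports visible in the service dumps below:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: cluster-collector
  namespace: chainsaw-otlp-metrics
spec:
  observability:
    metrics:
      enableMetrics: true   # asks the operator to create the ServiceMonitor
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      prometheus:
        endpoint: 0.0.0.0:8889
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [prometheus]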
These selector labels are present on the collector monitoring service the operator creates, which is used to scrape the collector's own telemetry, and we can see those metrics being scraped by the user workload monitoring stack in the OCP web console.
oc get svc cluster-collector-collector-monitoring -o yaml

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-03-13T13:16:22Z"
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: cluster-collector-collector-monitoring
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
    operator.opentelemetry.io/collector-monitoring-service: Exists
  name: cluster-collector-collector-monitoring
  namespace: chainsaw-otlp-metrics
  ownerReferences:
  - apiVersion: opentelemetry.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: OpenTelemetryCollector
    name: cluster-collector
    uid: dd3a6653-a670-4508-92d4-9db9b9e816f2
  resourceVersion: "369446"
  uid: fba2fc76-0fa3-4f13-9c09-3fe00614c5d8
spec:
  clusterIP: 172.30.45.7
  clusterIPs:
  - 172.30.45.7
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: monitoring
    port: 8888
    protocol: TCP
    targetPort: 8888
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
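Outside the web console, scraping can also be confirmed against the monitoring stack's query endpoint. A hedged sketch using the thanos-querier route; the route, service account, and query here are assumptions based on a standard OCP user workload monitoring setup, not taken from the test:

# mint a token and look up the thanos-querier route host
TOKEN=$(oc create token prometheus-k8s -n openshift-monitoring)
HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
# query for scrape targets in the test namespace
curl -sk -G -H "Authorization: Bearer $TOKEN" \
  --data-urlencode 'query=up{namespace="chainsaw-otlp-metrics"}' \
  "https://$HOST/api/v1/query"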
However, the collector service and the collector headless service are missing the operator.opentelemetry.io/collector-monitoring-service: Exists label, so the ServiceMonitor selector does not match them and the Prometheus exporter metrics are not scraped by the user workload monitoring stack.
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-03-13T13:16:22Z"
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: cluster-collector-collector
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
  name: cluster-collector-collector
  namespace: chainsaw-otlp-metrics
  ownerReferences:
  - apiVersion: opentelemetry.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: OpenTelemetryCollector
    name: cluster-collector
    uid: dd3a6653-a670-4508-92d4-9db9b9e816f2
  resourceVersion: "369432"
  uid: 761aef05-8a5d-48bf-a99f-991699fe7615
spec:
  clusterIP: 172.30.253.40
  clusterIPs:
  - 172.30.253.40
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: grpc
    name: otlp-grpc
    port: 4317
    protocol: TCP
    targetPort: 4317
  - appProtocol: http
    name: otlp-http
    port: 4318
    protocol: TCP
    targetPort: 4318
  - name: prometheus
    port: 8889
    protocol: TCP
    targetPort: 8889
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
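A quick way to compare the labels across the collector services in one view (a sketch; the instance label value is taken from the dumps above):

# list all services belonging to this collector instance with their labels
oc get svc -n chainsaw-otlp-metrics \
  -l app.kubernetes.io/instance=chainsaw-otlp-metrics.cluster-collector \
  --show-labels

Only cluster-collector-collector-monitoring should show the operator.opentelemetry.io/collector-monitoring-service: Exists label in the output.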
Steps to reproduce the issue:
- Install the latest operator bundle built off upstream.
- Run the otlp-metrics-traces test case.
chainsaw test --skip-delete tests/e2e-openshift/otlp-metrics-traces
- Check that the test fails on the check metrics step.
- Go to the chainsaw-otlp-metrics project and set the collector instance to unmanaged.
- Edit the ServiceMonitor and remove the following label selector (see the sketch after this list):
operator.opentelemetry.io/collector-monitoring-service: Exists
- Rerun the metrics traces generator job.
oc create -f 03-metrics-traces-gen.yaml
- Then execute the check_metrics.sh script; it exits after some time once the metrics are found.
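For reference, after the workaround above the ServiceMonitor selector is left with only the common labels, roughly as follows (a sketch based on the selector shown in the description):

selector:
  matchLabels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/part-of: opentelemetry

Note that this relaxed selector matches all of the collector services, which is workable as a manual workaround but is why a proper fix needs the label on exactly one service.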
Expected Behaviour:
The operator.opentelemetry.io/collector-monitoring-service: Exists label is added to exactly one of the collector services (to prevent duplicate metrics) when the Prometheus exporter is used.
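In other words, the labels on whichever single service exposes the Prometheus exporter port would be expected to look roughly like this (a sketch of the expected outcome, shown here on the ClusterIP collector service, not the merged fix):

metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: chainsaw-otlp-metrics.cluster-collector
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: cluster-collector-collector
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
    operator.opentelemetry.io/collector-monitoring-service: Exists
  name: cluster-collector-collector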
Additional Notes:
The issue was detected in our upstream testing job. See https://github.com/openshift/open-telemetry-opentelemetry-operator/pull/23
Links:
- is related to: OBSDA-773 - Service Monitor should only point to the collector service and not include headless (Closed)