Bug
Resolution: Unresolved
openshift-4.16.z, openshift-4.17.z
Quality / Stability / Reliability
NEW
Moderate
ISSUE 1.
The runbook "PrometheusDuplicateTimestamps" [0] contains the command:
$ oc -n $NAMESPACE logs -l 'app.kubernetes.io/name=prometheus' | \
    grep 'Error on ingesting samples with different value but same timestamp.*' \
    | sort | uniq -c | sort -n
And the command:
$ oc -n $NAMESPACE logs -l 'app.kubernetes.io/name=prometheus' | \
    grep 'Duplicate sample for timestamp.*' | sort | uniq -c | sort -n
These commands don't return anything. Let's review a cluster where the problem is present:
$ oc logs prometheus-k8s-0 -n openshift-monitoring | grep -c 'Error on ingesting samples with different value but same timestamp.*'
198
$ oc logs prometheus-k8s-1 -n openshift-monitoring | grep -c 'Error on ingesting samples with different value but same timestamp.*'
200
Now let's run the command as it appears in the runbook, dropping everything after the "grep" and adding the "-c" option to the grep:
$ NAMESPACE="openshift-monitoring"
$ oc -n $NAMESPACE logs -l 'app.kubernetes.io/name=prometheus' | grep -c 'Error on ingesting samples with different value but same timestamp.*'
0
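The runbook command returns 0 because, when a label selector ("-l") is used, "oc logs" prints only the last 10 lines of each pod by default. With two Prometheus pods the command below therefore always returns only 20 lines of logs, which may or may not contain the error: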
$ oc -n openshift-monitoring logs -l app.kubernetes.io/name=prometheus | wc -l
20
$ oc -n openshift-monitoring logs prometheus-k8s-0 | wc -l
19019
$ oc -n openshift-monitoring logs prometheus-k8s-1 | wc -l
19047
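The 20-line limit can be lifted while keeping the label selector by passing "--tail=-1". The following is a minimal sketch, not the runbook's wording: the helper name "prometheus_logs" is hypothetical, and it assumes "oc logs" honours "--tail" the same way "kubectl logs" documents it.

```shell
# Sketch: fetch the complete log of every pod matching the label selector.
# --tail=-1 overrides the 10-lines-per-pod default that applies when -l is used.
prometheus_logs() {
  # $1: namespace (hypothetical helper, not part of the runbook)
  oc -n "$1" logs -l 'app.kubernetes.io/name=prometheus' --tail=-1
}

# Example (requires a logged-in oc session):
# prometheus_logs openshift-monitoring | grep -c 'Duplicate sample for timestamp'
```

If more than a handful of pods match the selector, the number of concurrent log requests may also need to be raised; that case is not covered by this sketch.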
ISSUE 2.
The same two commands end with "| sort | uniq -c | sort -n":
$ oc -n $NAMESPACE logs -l 'app.kubernetes.io/name=prometheus' | \
    grep 'Error on ingesting samples with different value but same timestamp.*' \
    | sort | uniq -c | sort -n
$ oc -n $NAMESPACE logs -l 'app.kubernetes.io/name=prometheus' | \
    grep 'Duplicate sample for timestamp.*' | sort | uniq -c | sort -n
Because every log entry starts with its own timestamp, as shown below, each line is unique. The "| sort | uniq -c | sort -n" stage therefore cannot group anything and only consumes computational resources:
1 ts=2025-01-24T16:08:41.846Z caller=scrape.go:1783 level=debug component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/openshift-state-metrics/0 target=https://10.128.2.9:8443/metrics msg="Duplicate sample for timestamp" series="openshift_group_user_account{group=\"cluster-admins\",user=\"admin\"}"
1 ts=2025-01-24T16:10:41.846Z caller=scrape.go:1783 level=debug component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/openshift-state-metrics/0 target=https://10.128.2.9:8443/metrics msg="Duplicate sample for timestamp" series="openshift_group_user_account{group=\"cluster-admins\",user=\"admin\"}"
Suggestion
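Commands like the following could be used instead. They read the full log of each pod and strip the timestamp (the first 28 characters) before counting: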
$ pods=$(oc -n $NAMESPACE get pods -l 'app.kubernetes.io/name=prometheus' -o jsonpath='{.items[*].metadata.name}')
$ for pod in $pods; do oc -n $NAMESPACE logs $pod; done | cut -c29- | sort | uniq -c | sort -n
212 caller=scrape.go:1783 level=debug component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/openshift-state-metrics/0 target=https://10.128.2.9:8443/metrics msg="Duplicate sample for timestamp" series="openshift_group_user_account{group=\"cluster-admins\",user=\"admin\"}"
$ for pod in $pods; do oc -n $NAMESPACE logs $pod; done | \
    grep 'Error on ingesting samples with different value but same timestamp.*' | cut -c29- | sort | uniq -c | sort -n
433 caller=scrape.go:1744 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/openshift-state-metrics/0 target=https://10.128.2.9:8443/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
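The "cut -c29-" step relies on the "ts=..." field always being exactly 28 characters wide. As a possible alternative, the sketch below removes the field by pattern instead of by column. The helper names "strip_ts" and "sample_logs" are hypothetical, and the sample lines are abbreviated versions of the entries shown earlier, so the block runs without a cluster.

```shell
# Remove the leading "ts=<timestamp> " field so identical messages collapse.
strip_ts() {
  sed -E 's/^ts=[^ ]+ //'
}

# Two abbreviated log entries that differ only in their timestamp.
sample_logs() {
  cat <<'EOF'
ts=2025-01-24T16:08:41.846Z caller=scrape.go:1783 level=debug msg="Duplicate sample for timestamp"
ts=2025-01-24T16:10:41.846Z caller=scrape.go:1783 level=debug msg="Duplicate sample for timestamp"
EOF
}

# With timestamps: two distinct lines, each counted once.
sample_logs | sort | uniq -c | sort -n

# Without timestamps: a single line with a count of 2.
sample_logs | strip_ts | sort | uniq -c | sort -n
```

Against a cluster, "strip_ts" would simply take the place of "cut -c29-" in the pipelines above.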