OpenShift Monitoring / MON-2609

error Prometheus openshift-monitoring/prometheus-k8s-0 has missed 2 rule group evaluations



      This appears to stem from the long-standing bug in `kube-prometheus-stack`, which is currently in a re-opened state [1].

       

      ts=2022-05-31T16:27:19.926Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=92
      ts=2022-05-31T16:27:20.043Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=45
      ts=2022-05-31T16:27:20.300Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=247
      ts=2022-05-31T16:27:20.553Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=247
      ts=2022-05-31T16:27:20.759Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=247
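To gauge how many samples are being lost, the warnings above can be aggregated per rule group. The following is a minimal sketch, assuming the log lines are plain logfmt as shown; the helper names `parse_logfmt` and `dropped_per_group` are illustrative and not part of Prometheus.

```python
import re
from collections import defaultdict

# Matches key=value pairs where the value is either double-quoted or a bare token.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key/value pairs of one logfmt line as a dict."""
    return {key: value.strip('"') for key, value in _PAIR.findall(line)}


def dropped_per_group(lines):
    """Sum numDropped per rule group over out-of-order ingestion warnings."""
    totals = defaultdict(int)
    for line in lines:
        fields = parse_logfmt(line)
        if "out-of-order" in fields.get("msg", "") and "numDropped" in fields:
            totals[fields.get("group", "unknown")] += int(fields["numDropped"])
    return dict(totals)


# The five warnings quoted in this report.
SAMPLE = [
    'ts=2022-05-31T16:27:19.926Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=92',
    'ts=2022-05-31T16:27:20.043Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=45',
    'ts=2022-05-31T16:27:20.300Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=247',
    'ts=2022-05-31T16:27:20.553Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=247',
    'ts=2022-05-31T16:27:20.759Z caller=manager.go:660 level=warn component="rule manager" group=kube-apiserver.rules msg="Error on ingesting out-of-order result from rule evaluation" numDropped=247',
]

if __name__ == "__main__":
    print(dropped_per_group(SAMPLE))
```

For the five lines quoted here, all drops belong to the `kube-apiserver.rules` group and span less than one second.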

      We usually see this error in one of the following two situations, although it is not limited to them:

      - NTP issues

      - Slow storage backend performance

      Checking storage, we see no issue: the backend is fibre channel attached flashcore storage, which should be really fast.

      Checking NTP, we found no relevant issue; clock sync seems to be OK on the nodes:

      chronyc sources
      210 Number of sources = 2
      MS Name/IP address         Stratum Poll Reach LastRx Last sample               
      ===============================================================================
      ^* xxx.cloud.local       2  10   377    61    45us[ -382us] +/ 7464us
      ^- xxx.cloud.local       3   9   377    21    33us[  -33us] +/   25ms
       date
      Tue Jun  7 06:27:36 UTC 2022
      chronyc sources
      210 Number of sources = 2
      MS Name/IP address         Stratum Poll Reach LastRx Last sample               
      ===============================================================================
      ^- xxx.cloud.local       3   9   377   312   579us[ -579us] +/   25ms
      ^* xxx.cloud.local       2  10   377   750   101us[ +220us] +/ 8198us
      date
      Tue Jun  7 06:27:36 UTC 2022
      chronyc sources
      210 Number of sources = 2
      MS Name/IP address         Stratum Poll Reach LastRx Last sample               
      ===============================================================================
      ^* xxx.cloud.local       2  10   377   137   102us[ -134us] +/ 7312us
      ^- xxx.cloud.local       3  10   377   651   101us[ -132us] +/   25ms
      date
      Tue Jun  7 06:27:36 UTC 2022
      chronyc sources
      210 Number of sources = 2
      MS Name/IP address         Stratum Poll Reach LastRx Last sample               
      ===============================================================================
      ^* xxx.cloud.local       2  10   377   738    73us[ -105us] +/ 8074us
      ^- xxx.cloud.local       3  10   377   650   527us[ -527us] +/   26ms
      date
      Tue Jun  7 06:27:36 UTC 2022
      chronyc sources
      210 Number of sources = 2
      MS Name/IP address         Stratum Poll Reach LastRx Last sample               
      ===============================================================================
      ^- xxx.cloud.local       3   8   377   179    83us[  -83us] +/   25ms
      ^* xxx.cloud.local       2  10   377   220   121us[ -136us] +/ 7868us
      date
      Tue Jun  7 06:27:36 UTC 2022
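Comparing the offsets by eye across nodes is error-prone, so the `chronyc sources` output can be parsed and reduced to the largest measured offset per node. This is a minimal sketch, assuming the column layout shown above; the function names and the returned dictionary keys are illustrative choices, not part of chrony.

```python
import re

# Conversion factors from chronyc unit suffixes to seconds.
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# A source line starts with a mode character (^, = or #) and a state character,
# followed by name, stratum, poll, reach, LastRx and the last sample,
# e.g. "45us[ -382us]" (adjusted offset, then measured offset in brackets).
_SOURCE = re.compile(
    r"^\s*([\^=#])([*+\-?x~])\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+"
    r"([+-]?\d+)(ns|us|ms|s)\[\s*([+-]?\d+)(ns|us|ms|s)\]"
)


def parse_sources(text):
    """Parse `chronyc sources` output into a list of dicts, one per source."""
    sources = []
    for line in text.splitlines():
        m = _SOURCE.match(line)
        if not m:
            continue  # header, separator, or unrelated line
        _mode, state, name, stratum, _poll, reach, _last_rx, adj, adj_u, meas, meas_u = m.groups()
        sources.append({
            "selected": state == "*",   # '*' marks the currently selected source
            "name": name,
            "stratum": int(stratum),
            "reach": int(reach, 8),     # reach is an octal bitmask; 377 means the last 8 polls succeeded
            "adjusted_offset_s": int(adj) * _UNITS[adj_u],
            "measured_offset_s": int(meas) * _UNITS[meas_u],
        })
    return sources


def max_abs_offset(sources):
    """Largest absolute measured offset, in seconds, across all sources."""
    return max(abs(s["measured_offset_s"]) for s in sources)


# Output of the first node, copied from this report.
SAMPLE = """\
210 Number of sources = 2
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* xxx.cloud.local       2  10   377    61    45us[ -382us] +/ 7464us
^- xxx.cloud.local       3   9   377    21    33us[  -33us] +/   25ms
"""

if __name__ == "__main__":
    worst = max_abs_offset(parse_sources(SAMPLE))
    print(f"largest measured offset: {worst * 1e6:.0f}us")
```

On the outputs pasted in this report, every measured offset is below one millisecond and every source shows a reach of 377, which is consistent with the conclusion that clock sync is not the cause.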

       

      [1] https://github.com/prometheus-community/helm-charts/issues/1283

              Unassigned Unassigned
              rhn-support-vmedina1 Victor Medina