OCPBUGS-12525

node role is calculated twice in thanos-querier API


    • Bug
    • Resolution: Done-Errata
    • Major
    • 4.14.0
    • premerge
    • Monitoring
    • None
    • Moderate
    • No
    • MON Sprint 235
    • 1
    • Rejected
    • False
    • Before this update, Thanos Querier failed to de-duplicate metrics by node roles. This update fixes the issue so that Thanos Querier now properly de-duplicates metrics by node roles. link:https://issues.redhat.com/browse/OCPBUGS-12525[OCPBUGS-12525]
    • Bug Fix
    • Done

      Description of problem:

      Tested https://issues.redhat.com/browse/OCPBUGS-10387 with the PR:

      launch 4.14-ci,openshift/cluster-monitoring-operator#1926 no-spot

      The cluster has 3 masters and 3 workers, each node with 4 CPUs, and no infra nodes:

      $ oc get node
      NAME                                         STATUS   ROLES                  AGE   VERSION
      ip-10-0-132-193.us-east-2.compute.internal   Ready    control-plane,master   23m   v1.26.2+d2e245f
      ip-10-0-135-65.us-east-2.compute.internal    Ready    control-plane,master   23m   v1.26.2+d2e245f
      ip-10-0-149-72.us-east-2.compute.internal    Ready    worker                 14m   v1.26.2+d2e245f
      ip-10-0-158-0.us-east-2.compute.internal     Ready    worker                 14m   v1.26.2+d2e245f
      ip-10-0-229-135.us-east-2.compute.internal   Ready    worker                 17m   v1.26.2+d2e245f
      ip-10-0-234-36.us-east-2.compute.internal    Ready    control-plane,master   23m   v1.26.2+d2e245f

      The node role labels are as follows:

      control-plane: node-role.kubernetes.io/control-plane: ""
      master: node-role.kubernetes.io/master: ""
      worker: node-role.kubernetes.io/worker: ""

      Searching for "cluster:capacity_cpu_cores:sum" in the admin console under "Observe -> Metrics" returns both the label_node_role_kubernetes_io=master series and the series with label_node_role_kubernetes_io="" twice:

      Name                            label_beta_kubernetes_io_instance_type  label_kubernetes_io_arch  label_node_openshift_io_os_id  label_node_role_kubernetes_io  prometheus                Value
      cluster:capacity_cpu_cores:sum  m6a.xlarge                              amd64                     rhcos                                                         openshift-monitoring/k8s  12
      cluster:capacity_cpu_cores:sum  m6a.xlarge                              amd64                     rhcos                          master                         openshift-monitoring/k8s  12
      cluster:capacity_cpu_cores:sum  m6a.xlarge                              amd64                     rhcos                                                         openshift-monitoring/k8s  12
      cluster:capacity_cpu_cores:sum  m6a.xlarge                              amd64                     rhcos                          master                         openshift-monitoring/k8s  12

      The thanos-querier API returns the same result as the console UI (the console UI uses the thanos-querier API):

      $ token=`oc create token prometheus-k8s -n openshift-monitoring`
      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=cluster:capacity_cpu_cores:sum' | jq
      {
        "status": "success",
        "data": {
          "resultType": "vector",
          "result": [
            {
              "metric": {
                "__name__": "cluster:capacity_cpu_cores:sum",
                "label_beta_kubernetes_io_instance_type": "m6a.xlarge",
                "label_kubernetes_io_arch": "amd64",
                "label_node_openshift_io_os_id": "rhcos",
                "prometheus": "openshift-monitoring/k8s"
              },
              "value": [
                1682394655.248,
                "12"
              ]
            },
            {
              "metric": {
                "__name__": "cluster:capacity_cpu_cores:sum",
                "label_beta_kubernetes_io_instance_type": "m6a.xlarge",
                "label_kubernetes_io_arch": "amd64",
                "label_node_openshift_io_os_id": "rhcos",
                "label_node_role_kubernetes_io": "master",
                "prometheus": "openshift-monitoring/k8s"
              },
              "value": [
                1682394655.248,
                "12"
              ]
            },
            {
              "metric": {
                "__name__": "cluster:capacity_cpu_cores:sum",
                "label_beta_kubernetes_io_instance_type": "m6a.xlarge",
                "label_kubernetes_io_arch": "amd64",
                "label_node_openshift_io_os_id": "rhcos",
                "prometheus": "openshift-monitoring/k8s"
              },
              "value": [
                1682394655.248,
                "12"
              ]
            },
            {
              "metric": {
                "__name__": "cluster:capacity_cpu_cores:sum",
                "label_beta_kubernetes_io_instance_type": "m6a.xlarge",
                "label_kubernetes_io_arch": "amd64",
                "label_node_openshift_io_os_id": "rhcos",
                "label_node_role_kubernetes_io": "master",
                "prometheus": "openshift-monitoring/k8s"
              },
              "value": [
                1682394655.248,
                "12"
              ]
            }
          ]
        }
      } 
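
The duplication in the response above can also be detected programmatically. The following is a minimal sketch, assuming the JSON response has already been parsed into a Python dict. The names `find_duplicate_series`, `_series`, and `SAMPLE` are hypothetical and introduced only for illustration; the sample payload keeps a reduced label set for brevity and is not part of any OpenShift or Thanos tooling.

```python
from collections import Counter


def find_duplicate_series(response):
    """Map each label set that occurs more than once to its number of occurrences."""
    counts = Counter(
        frozenset(sample["metric"].items())
        for sample in response["data"]["result"]
    )
    return {labels: n for labels, n in counts.items() if n > 1}


def _series(role=None):
    # Hypothetical helper: builds one sample shaped like the API output,
    # keeping only some of the labels for brevity.
    metric = {
        "__name__": "cluster:capacity_cpu_cores:sum",
        "prometheus": "openshift-monitoring/k8s",
    }
    if role is not None:
        metric["label_node_role_kubernetes_io"] = role
    return {"metric": metric, "value": [1682394655.248, "12"]}


# Same shape as the reported response: each series appears twice.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [_series(), _series("master"), _series(), _series("master")],
    },
}

if __name__ == "__main__":
    for labels, n in find_duplicate_series(SAMPLE).items():
        print(n, dict(sorted(labels)))
```

An empty return value would indicate that every series in the response has a distinct label set.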

      The issue does not occur when the expression behind "cluster:capacity_cpu_cores:sum" is queried directly:

      Name                            label_beta_kubernetes_io_instance_type  label_kubernetes_io_arch  label_node_openshift_io_os_id  label_node_role_kubernetes_io  prometheus                Value
      cluster:capacity_cpu_cores:sum  m6a.xlarge                              amd64                     rhcos                                                         openshift-monitoring/k8s  12
      cluster:capacity_cpu_cores:sum  m6a.xlarge                              amd64                     rhcos                          master                         openshift-monitoring/k8s  12

      The thanos-querier API should de-duplicate these series.
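
To illustrate why the duplicates matter, here is a small sketch of the expected outcome, assuming series with identical label sets are collapsed to a single sample. This is not how Thanos Querier implements de-duplication (it merges series that differ only in the configured replica labels); `dedupe_series`, `total_cores`, and the reduced sample data are hypothetical. With 6 nodes of 4 CPUs each, the total should be 24 cores, but summing the duplicated result gives 48.

```python
def dedupe_series(result):
    """Keep the first sample for each distinct label set."""
    seen = set()
    unique = []
    for sample in result:
        key = frozenset(sample["metric"].items())
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique


def total_cores(result):
    """Sum the sample values (the query API returns them as strings)."""
    return sum(float(sample["value"][1]) for sample in result)


# Reduced copies of the reported series (only two labels kept).
NO_ROLE = {
    "metric": {"__name__": "cluster:capacity_cpu_cores:sum"},
    "value": [1682394655.248, "12"],
}
MASTER = {
    "metric": {
        "__name__": "cluster:capacity_cpu_cores:sum",
        "label_node_role_kubernetes_io": "master",
    },
    "value": [1682394655.248, "12"],
}
RESULT = [NO_ROLE, MASTER, NO_ROLE, MASTER]

if __name__ == "__main__":
    print("with duplicates:", total_cores(RESULT))
    print("de-duplicated:  ", total_cores(dedupe_series(RESULT)))
```

Any dashboard or alert that aggregates over this metric through the thanos-querier API would be affected in the same way until the duplicates are removed.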

      Version-Release number of selected component (if applicable):

      tested https://issues.redhat.com/browse/OCPBUGS-10387 with PR

      How reproducible:

      always

      Steps to Reproduce:

      1. See the description.

      Actual results:

      Each node role is calculated twice in the thanos-querier API.

      Expected results:

      Each node role should be calculated only once in the thanos-querier API.

              rhn-support-bburt Brian Burt
              juzhao@redhat.com Junqi Zhao
              Junqi Zhao Junqi Zhao
              Brian Burt Brian Burt