OpenShift Bugs / OCPBUGS-5447

[4.9] Storage I/O metrics on some console dashboards show as no datapoints found

Details

    • Bug
    • Resolution: Obsolete
    • Undefined
    • None
    • 4.9.z
    • Monitoring
    • None
    • False
    Description

      Description of problem:

      Same as OCPBUGS-2623; the issue also exists in 4.9. The "Storage IO" graphs show "No datapoints found." on the following dashboards:

      "Kubernetes / Compute Resources / Cluster" dashboard
      "Kubernetes / Compute Resources / Namespace (Pods)" dashboard
      "Kubernetes / Compute Resources / Pod" dashboard

      The Prometheus expressions used by the affected dashboards are:

      a. "Kubernetes / Compute Resources / Cluster" dashboard
      Storage IO
      IOPS(Reads+Writes) expression:
      ceil(sum by(namespace) (rate(container_fs_reads_total{container!="", cluster=""}[5m]) + rate(container_fs_writes_total{container!="", cluster=""}[5m])))
      
      ThroughPut(Read+Write) expression:
      sum by(namespace) (rate(container_fs_reads_bytes_total{container!="", cluster=""}[5m]) + rate(container_fs_writes_bytes_total{container!="", cluster=""}[5m]))
      
      Storage IO - Distribution
      Current Storage IO expression:
      1. sum by(namespace) (rate(container_fs_reads_total{container!="", cluster=""}[5m]))
      2. sum by(namespace) (rate(container_fs_writes_total{container!="", cluster=""}[5m]))
      3. sum by(namespace) (rate(container_fs_reads_total{container!="", cluster=""}[5m]) + rate(container_fs_writes_total{container!="", cluster=""}[5m]))
      4. sum by(namespace) (rate(container_fs_reads_bytes_total{container!="", cluster=""}[5m]))
      5. sum by(namespace) (rate(container_fs_writes_bytes_total{container!="", cluster=""}[5m]))
      6. sum by(namespace) (rate(container_fs_reads_bytes_total{container!="", cluster=""}[5m]) + rate(container_fs_writes_bytes_total{container!="", cluster=""}[5m]))
      
      b. "Kubernetes / Compute Resources / Namespace (Pods)" dashboard, selected openshift-apiserver in this bug
      Storage IO
      IOPS(Reads+Writes) expression:
      ceil(sum by(pod) (rate(container_fs_reads_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]) + rate(container_fs_writes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m])))
      
      ThroughPut(Read+Write) expression:
      sum by(pod) (rate(container_fs_reads_bytes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]) + rate(container_fs_writes_bytes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]))
      
      Storage IO - Distribution
      Current Storage IO expression:
      1. sum by(pod) (rate(container_fs_reads_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]))
      2. sum by(pod) (rate(container_fs_writes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]))
      3. sum by(pod) (rate(container_fs_reads_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]) + rate(container_fs_writes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]))
      4. sum by(pod) (rate(container_fs_reads_bytes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]))
      5. sum by(pod) (rate(container_fs_writes_bytes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]))
      6. sum by(pod) (rate(container_fs_reads_bytes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]) + rate(container_fs_writes_bytes_total{container!="", cluster="",namespace="openshift-apiserver"}[5m]))
      
      c. "Kubernetes / Compute Resources / Pod" dashboard, selected one pod from openshift-apiserver in this bug
      Storage IO - Distribution(Containers)
      IOPS(Reads+Writes) expression:
      ceil(sum by(container) (rate(container_fs_reads_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]) + rate(container_fs_writes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m])))
      
      ThroughPut(Read+Write) expression:
      sum by(container) (rate(container_fs_reads_bytes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]) + rate(container_fs_writes_bytes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]))
      
      Storage IO - Distribution
      Current Storage IO expression:
      1. sum by(container) (rate(container_fs_reads_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]))
      2. sum by(container) (rate(container_fs_writes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]))
      3. sum by(container) (rate(container_fs_reads_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]) + rate(container_fs_writes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]))
      4. sum by(container) (rate(container_fs_reads_bytes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]))
      5. sum by(container) (rate(container_fs_writes_bytes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]))
      6. sum by(container) (rate(container_fs_reads_bytes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m]) + rate(container_fs_writes_bytes_total{container!="", cluster="",namespace="openshift-apiserver", pod="apiserver-5d6c8bbbc8-2hwwj"}[5m])) 
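The expressions listed for the three dashboards differ only in the grouping label and the extra label matchers; every one of them filters on container!="". A minimal Python sketch of that shared pattern follows. The helper names selector and iops_expr are hypothetical and exist only for illustration; they are not part of the console or of any dashboard definition.

```python
def selector(**matchers: str) -> str:
    """Build the label selector; every listed query filters container!="" and cluster=""."""
    parts = ['container!=""', 'cluster=""']
    parts += [f'{name}="{value}"' for name, value in matchers.items()]
    return ", ".join(parts)


def iops_expr(group_by: str, **matchers: str) -> str:
    """Rebuild the IOPS(Reads+Writes) expression for a given grouping label."""
    sel = selector(**matchers)
    return (
        f"ceil(sum by({group_by}) ("
        f"rate(container_fs_reads_total{{{sel}}}[5m]) + "
        f"rate(container_fs_writes_total{{{sel}}}[5m])))"
    )


# Cluster dashboard: grouped by namespace, no extra matchers.
cluster_expr = iops_expr("namespace")
# Namespace (Pods) dashboard: grouped by pod within one namespace.
namespace_expr = iops_expr("pod", namespace="openshift-apiserver")

print(cluster_expr)
print(namespace_expr)
```

Because the container!="" matcher is common to all of the generated expressions, any scrape-side rule that removes series with a non-empty container label would empty every one of these panels at once.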

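      The cause is that in 4.9 the kubelet ServiceMonitor drops container_fs_.* series that have a non-empty container label, so the container!="" selector used by these expressions matches nothing: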

      $ oc -n openshift-monitoring get servicemonitor kubelet -oyaml
      ...
          - action: drop
            regex: ;(container_fs_.*);.+
            sourceLabels:
            - __tmp_keep_metric
            - __name__
            - container
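To see which series this rule removes, the matching step can be approximated in Python. This is a sketch under two assumptions: the source label values are joined with the default ";" separator, and __tmp_keep_metric is empty for these metrics (the earlier rules that would set it are elided from the output above). The label values in the two sample series are illustrative, and is_dropped is a hypothetical helper, not a Prometheus API.

```python
import re

# Regex and source labels copied from the ServiceMonitor rule shown above.
DROP_REGEX = re.compile(r";(container_fs_.*);.+")
SOURCE_LABELS = ["__tmp_keep_metric", "__name__", "container"]


def is_dropped(labels: dict) -> bool:
    """Return True if the drop rule would discard a series with these labels."""
    # Missing labels count as empty strings; values are joined with ";".
    joined = ";".join(labels.get(name, "") for name in SOURCE_LABELS)
    # Relabel regexes are anchored at both ends, hence fullmatch.
    return DROP_REGEX.fullmatch(joined) is not None


# A container-level series (illustrative container name).
container_series = {
    "__name__": "container_fs_reads_bytes_total",
    "container": "openshift-apiserver",
}
# A series without a container label, like the systemd slice in this report.
slice_series = {
    "__name__": "container_fs_reads_bytes_total",
    "id": "/system.slice/systemd-udevd.service",
}

print(is_dropped(container_series))  # True
print(is_dropped(slice_series))      # False
```

Under these assumptions, only series without a container label survive the scrape, which is exactly the set that the container!="" matcher in the dashboard queries excludes.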

      Example query; the returned series has no container label:

      $ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=container_fs_reads_bytes_total' | jq
      .... 
            {
              "metric": {
                "__name__": "container_fs_reads_bytes_total",
                "device": "/dev/nvme0n1",
                "endpoint": "https-metrics",
                "id": "/system.slice/systemd-udevd.service",
                "instance": "10.0.204.243:10250",
                "job": "kubelet",
                "metrics_path": "/metrics/cadvisor",
                "node": "ip-10-0-204-243.us-east-2.compute.internal",
                "service": "kubelet"
              },
              "value": [
                1672995939.851,
                "7093760"
              ]
            },
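Checking a full response by eye is tedious, so the same check can be scripted. The sketch below assumes the response follows the standard Prometheus HTTP API envelope (status, data, result); the embedded sample is trimmed to the single series shown above, and series_with_container is a hypothetical helper name.

```python
import json

# Trimmed response in the shape returned by /api/v1/query; the one series
# is taken from the output above (some labels omitted for brevity).
response_text = """
{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__name__": "container_fs_reads_bytes_total",
              "device": "/dev/nvme0n1",
              "id": "/system.slice/systemd-udevd.service",
              "job": "kubelet"},
   "value": [1672995939.851, "7093760"]}
]}}
"""


def series_with_container(response: dict) -> list:
    """Return the series whose metric has a non-empty container label."""
    return [s for s in response["data"]["result"] if s["metric"].get("container")]


response = json.loads(response_text)
matching = series_with_container(response)
total = len(response["data"]["result"])
print(f"{len(matching)} of {total} series carry a container label")
```

On a cluster, the curl output from the command above could be piped into a script like this in place of the embedded sample; a count of zero would be consistent with the empty Storage IO panels.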

      Version-Release number of selected component (if applicable):

      4.9.0-0.nightly-2023-01-03-103749

      How reproducible:

      always

      Steps to Reproduce:

      1. Check the monitoring dashboards listed above.


      Actual results:

      The Storage IO panels on the affected dashboards show no data and display "No datapoints found."

      Expected results:

      The Storage IO panels show data.

      Additional info:

       

          People

            janantha@redhat.com Jayapriya Pai
            juzhao@redhat.com Junqi Zhao
            Junqi Zhao Junqi Zhao