
Investigate adding console graphs for Windows workloads

    • BU Product Work
    • 3
    • False
    • False
    • Undefined
    • WINC - Sprint 246

      USER STORY:
      As an OpenShift admin, I want to see the console graphs for Windows pods so that the user experience on Windows nodes is on par with Linux nodes.

      DESCRIPTION:
      We want to export our own Prometheus rules with the queries required to display pod graphs. We could not rename the Windows metrics to match the existing console queries for pods, because on the Linux side the pod metrics come from cAdvisor. We need to investigate whether we can get the metrics from `windows-exporter`, or get support in cAdvisor for the required pod queries.

       

      ACCEPTANCE CRITERIA:
      1. Set of queries to be added to PrometheusRule object that display pod graphs

      2. Story that captures the implementation of the above queries in WMCO

      ENGINEERING DETAILS:
      sig-windows thread for getting container metrics

      Discussion with monitoring team

      Issue for Windows container support in upstream cadvisor

            [WINC-568] Investigate adding console graphs for Windows workloads

            Calling an end to the investigation. Created WINC-1181 and WINC-1180 in order to track the work done for this.

            (old account) Sebastian Soto added a comment.

             

            Update on the network metrics:

            Using windows_exporter 0.24.0, I get `No datapoints found` for the query made by the console:
            (sum(irate(container_network_receive_bytes_total{pod='win-webserver-685cd6c5cc-8298l'}[5m])) by (pod, namespace, interface)) + on(namespace,pod,interface) group_left(network_name) (pod_network_name_info)

            The query `irate(container_network_receive_bytes_total{pod='windows-machine-config-operator-7c8bcc7b64-sjqxw'}[5m])` does return data,
            which makes me believe the error is due to pod_network_name_info not having data for the Windows pods I am looking at.

            I confirmed this by running the following query against the namespace the workloads are deployed to: pod_network_name_info{namespace="openshift-windows-machine-config-operator"}
            I only see metrics for the Linux pods in the namespace.

            Looking into this, these metrics appear to come from https://github.com/openshift/network-metrics-daemon, which runs on each Linux node and creates a metric for applicable pods running on that node.
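
            The check described above could also be written as a single query. This is a minimal sketch, not something run as part of this investigation: it lists the pods in the namespace that report receive bytes but have no matching pod_network_name_info series. The metric names and the namespace are taken from the queries above; the `unless` set operation is an addition for illustration.

            ```promql
            # Pods that have receive-byte series but no pod_network_name_info
            # series to join against (illustrative sketch, not validated).
            count by (namespace, pod) (
              container_network_receive_bytes_total{namespace="openshift-windows-machine-config-operator"}
            )
            unless on(namespace, pod)
            pod_network_name_info
            ```

            Any pod returned by such a query would be dropped by the console's `+ on(namespace,pod,interface) group_left(network_name)` join, because that join only keeps series that have a match on both sides.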

             

            (old account) Sebastian Soto added a comment.

            Some findings that came out of a customer issue investigation:

            Confirmed that network and storage data are available using windows_exporter 0.24.0.
            There is no direct parallel for the storage graphs, which use filesystem usage. There is read/write data, but that's not helpful for the graphs.
            Unfortunately, the network data is a little different as well. It's not obvious to me how to make it work for the pod metrics graph, because the transformation the console applies for the data to show on the graph:
            
            (sum(irate(container_network_receive_bytes_total{pod='windows-machine-config-operator-5d474f5796-5hqcp', namespace='openshift-windows-machine-config-operator'}[5m])) by (pod, namespace, interface)) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
            
            isn't type-compatible with the transformation I need in order to add the pod information to the data:
            (windows_container_network_receive_bytes_total * on(container_id) group_left(namespace, pod, container) kube_pod_container_info{container_id!=""})
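
            One possible reading of the mismatch, stated here as an assumption and not confirmed in the comment: `irate` needs a range vector, and a `[5m]` range selector can only be attached to a plain selector, not to the result of the `* on(container_id)` join. A minimal sketch that takes the rate first and joins afterwards follows. It assumes the Windows series carry `container_id` and `interface` labels, and that `kube_pod_container_info` has the value 1 so the multiplication leaves the rate unchanged. It has not been validated against the console graphs.

            ```promql
            # Assumption: rate the raw Windows series first, while the operand is
            # still a plain selector that accepts [5m], then attach pod labels.
            sum by (namespace, pod, interface) (
                irate(windows_container_network_receive_bytes_total[5m])
              * on(container_id) group_left(namespace, pod, container)
                kube_pod_container_info{container_id!=""}
            )
            ```

            The result has the same `namespace`, `pod`, and `interface` grouping as the left-hand side of the console query, but it would still need a matching pod_network_name_info series to survive the console's final join.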

            (old account) Sebastian Soto added a comment.

            Converting issue type to Spike based on label 'Spike'

            Russell Teague added a comment.

            WINC-722 needs to be completed before working on this story.

            Mansi Kulkarni (Inactive) added a comment.

              rh-ee-ssoto Sebastian Soto
              vhire Vaishnavi Hire