
Missing metric - example: cluster_autoscaler_failed_scale_ups_total

    • Moderate
    • No
    • False
      * Previously, some cluster autoscaler metrics were not initialized, and therefore were not available. With this release, these metrics are initialized and available. (link:https://issues.redhat.com/browse/OCPBUGS-46416[*OCPBUGS-46416*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-25852. The following is the description of the original issue:
      ---
      Description of problem:

      Missing metrics - example: cluster_autoscaler_failed_scale_ups_total 

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Always 

      Steps to Reproduce:

      # Curl the autoscaler's metrics endpoint:
      
      $ oc exec deployment/cluster-autoscaler-default -- curl -s http://localhost:8085/metrics | grep cluster_autoscaler_failed_scale_ups_total 
          

      Actual results:

      The metric is not exposed at all until the first corresponding event has occurred.

      Expected results:

      The metric counter should be initialized at startup so that it reports a value of zero.
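
The difference between the actual and expected results can be illustrated with a small stand-alone sketch. This is not the autoscaler's code and does not use the Prometheus client library; `counterVec` is a hypothetical toy type that mimics one property of a labelled counter family: a series is created only when a label combination is first touched. The label value `reason="example"` is also an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counterVec is a toy stand-in for a labelled counter family (hypothetical,
// not the real library type). A series exists only after its label set has
// been touched at least once.
type counterVec struct {
	name   string
	series map[string]float64 // key: rendered label set
}

func newCounterVec(name string) *counterVec {
	return &counterVec{name: name, series: map[string]float64{}}
}

// Add increments the series for the given label set, creating it if needed.
// Calling Add(labels, 0) at startup is the "initialize to zero" step.
func (c *counterVec) Add(labels string, v float64) {
	c.series[labels] += v
}

// Expose renders all existing series in a simplified text format.
func (c *counterVec) Expose() string {
	keys := make([]string, 0, len(c.series))
	for k := range c.series {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s{%s} %g\n", c.name, k, c.series[k])
	}
	return b.String()
}

func main() {
	failed := newCounterVec("cluster_autoscaler_failed_scale_ups_total")
	// Nothing has been touched yet, so a scrape would see no series.
	fmt.Printf("before init: %q\n", failed.Expose())

	// Assumed label value, for illustration only.
	failed.Add(`reason="example"`, 0)
	// The series now exists and reports zero.
	fmt.Printf("after init:  %q\n", failed.Expose())
}
```

Under these assumptions, the first call to `Expose` returns an empty string, which corresponds to the `grep` in the reproduction steps finding nothing, and the second returns a single line with the value 0.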

      Additional info:

      I went through the file:
      
      https://raw.githubusercontent.com/openshift/kubernetes-autoscaler/master/cluster-autoscaler/metrics/metrics.go 
      
      and checked off the metrics that do not appear when scraping the metrics endpoint immediately after deployment.
      
      The following metrics are defined in metrics.go but are missing from the scrape:
      
      ~~~
      node_group_min_count
      node_group_max_count
      pending_node_deletions
      errors_total
      scaled_up_gpu_nodes_total
      failed_scale_ups_total
      failed_gpu_scale_ups_total
      scaled_down_nodes_total
      scaled_down_gpu_nodes_total
      unremovable_nodes_count 
      skipped_scale_events_count
      ~~~
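
The manual check described above could be automated along these lines. This is a minimal sketch, not part of the original report: it reads a scrape from standard input and prints which of the listed metrics have no sample line. It assumes every listed name is exposed with the `cluster_autoscaler_` prefix, as in the `cluster_autoscaler_failed_scale_ups_total` example, and the function names `sampleNames` and `missing` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// prefix is assumed from the example metric in the report.
const prefix = "cluster_autoscaler_"

// listed holds the metric names from the report, without the prefix.
var listed = []string{
	"node_group_min_count", "node_group_max_count", "pending_node_deletions",
	"errors_total", "scaled_up_gpu_nodes_total", "failed_scale_ups_total",
	"failed_gpu_scale_ups_total", "scaled_down_nodes_total",
	"scaled_down_gpu_nodes_total", "unremovable_nodes_count",
	"skipped_scale_events_count",
}

// sampleNames returns the metric names that have at least one sample line.
// Comment lines (# HELP, # TYPE) do not count as samples.
func sampleNames(scrape string) map[string]bool {
	names := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(scrape))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The name ends at the first '{' (labels) or space (value).
		if i := strings.IndexAny(line, "{ "); i > 0 {
			names[line[:i]] = true
		}
	}
	return names
}

// missing returns the entries of want that have no sample in scrape.
func missing(scrape string, want []string) []string {
	have := sampleNames(scrape)
	var out []string
	for _, w := range want {
		if !have[w] {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	want := make([]string, 0, len(listed))
	for _, n := range listed {
		want = append(want, prefix+n)
	}
	for _, m := range missing(string(data), want) {
		fmt.Println("missing:", m)
	}
}
```

One way to use it would be to pipe the output of the `curl` command from the reproduction steps into the program, for example through `go run` on a file containing this sketch.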

       

            [OCPBUGS-46416] Missing metric - example: cluster_autoscaler_failed_scale_ups_total

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.18.1 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:6122


            Milind Yadav added a comment -

            Validated; looks good. Adding the pre-merge-tested label.

            miyadav@miyadav-thinkpadx1carbongen8:~/cas$ oc get clusterversion
            NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.18.0-0.ci.test-2024-12-30-050338-ci-ln-26g5crb-latest   True        False         13m     Cluster version is 4.18.0-0.ci.test-2024-12-30-050338-ci-ln-26g5crb-latest
            miyadav@miyadav-thinkpadx1carbongen8:~/cas$ oc create -f cas.yaml 
            clusterautoscaler.autoscaling.openshift.io/default created
            miyadav@miyadav-thinkpadx1carbongen8:~/cas$ oc get clusterautoscaler
            NAME      AGE
            default   5s
            miyadav@miyadav-thinkpadx1carbongen8:~/cas$ oc get servicemonitors
            NAME                          AGE
            cluster-autoscaler-default    15s
            cluster-autoscaler-operator   34m
            machine-api-controllers       34m
            machine-api-operator          34m

              rh-ee-tbarberb Theo Barber-Bany
              openshift-crt-jira-prow OpenShift Prow Bot
              Milind Yadav Milind Yadav