Nodepool metric does not correctly reflect nodepool state

Details

    • Bug
    • Resolution: Done-Errata
    • Normal
    • 4.15.0
    • 4.13
    • HyperShift
    • No
    • Hypershift Sprint 244, Hypershift Sprint 245
    • 2
    • False
    • Previously, the HyperShift Operator would report time series for node pools that no longer existed. With this release, the HyperShift Operator reports time series for node pools correctly. (link:https://issues.redhat.com/browse/OCPBUGS-20179[*OCPBUGS-20179*])

    • Bug Fix
    • Done

    Description

      Description of problem:

      The hypershift_nodepools_available_replicas metric does not accurately reflect the nodepools that exist: it keeps reporting time series for nodepools that are no longer present.
      
      $ oc get nodepools -n ocm-production-12345678
      NAME              CLUSTER   DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
      re-test-workers   re-test   2               0               False         True         4.12.35                                      Minimum availability requires 2 replicas, current 0 available
      
      Only one nodepool exists, yet there are three hypershift_nodepools_available_replicas time series:
      - re-test-worker2 reporting 1
      - re-test-worker3 reporting 1
      - re-test-workers reporting 0 (accurate)
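
The comparison above can be sketched in a few lines of Python. This is only an illustration, not part of HyperShift: the function name `find_stale_series` is hypothetical, and the inputs are typed in by hand from the `oc get nodepools` output and the series list in this report rather than fetched from a cluster.

```python
def find_stale_series(existing_nodepools, series):
    """Return names that have a time series but no matching nodepool."""
    existing = set(existing_nodepools)
    return sorted(name for name in series if name not in existing)


# Nodepools listed by `oc get nodepools` in this report.
existing_nodepools = ["re-test-workers"]

# hypershift_nodepools_available_replicas series and their reported values.
series = {
    "re-test-worker2": 1,
    "re-test-worker3": 1,
    "re-test-workers": 0,
}

stale = find_stale_series(existing_nodepools, series)
print(stale)  # ['re-test-worker2', 're-test-worker3']
```

Any name returned here is a series with no backing nodepool, which is the symptom described in this ticket.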
      
      The issue is the two extra time series (re-test-worker2 and re-test-worker3): they should not exist, because no nodepool with those names exists.
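
One general way to avoid this class of problem is to drop a labelled series as soon as the object it describes disappears. The sketch below models that idea with a toy, dictionary-backed gauge. It is an assumption-laden illustration only: `NodePoolGauge` and its methods are hypothetical names, and this is not the HyperShift Operator's code nor the change that resolved this bug.

```python
class NodePoolGauge:
    """Toy stand-in for a labelled gauge, keyed by nodepool name."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        """Record the available-replica count for one nodepool."""
        self._values[name] = value

    def prune(self, existing_names):
        """Drop every series whose nodepool is no longer present."""
        keep = set(existing_names)
        for name in list(self._values):
            if name not in keep:
                del self._values[name]

    def collect(self):
        """Return a copy of the series that would be exposed."""
        return dict(self._values)


gauge = NodePoolGauge()
gauge.set("re-test-worker2", 1)
gauge.set("re-test-worker3", 1)
gauge.set("re-test-workers", 0)

# Only re-test-workers still exists, so the other two series are removed.
gauge.prune(["re-test-workers"])
print(gauge.collect())  # {'re-test-workers': 0}
```

Without the `prune` step, the gauge would keep exposing the last value set for every nodepool it has ever seen, which matches the behaviour reported here.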

      Version-Release number of selected component (if applicable):

      4.12.35

      How reproducible:

      This particular cluster had its OIDC configuration along with other customer AWS account resources deleted, which might be connected to the misbehaviour of the metric.

      Steps to Reproduce:

      1.
      2.
      3.
      

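      Actual results:

      Three hypershift_nodepools_available_replicas time series are reported, two of them (re-test-worker2 and re-test-worker3) for nodepools that do not exist.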

      Expected results:

      A single hypershift_nodepools_available_replicas time series, for re-test-workers, reporting 0.

      Additional info:

      Adding must-gather and metric time series in the ticket

          People

            ngrauss.openshift Nicolas Grauss
            cbusse.openshift Claudio Busse
            Jie Zhao Jie Zhao