OpenShift Virtualization / CNV-52735

Metric kubevirt_hco_system_health_status reports the value 3


    • CNV I/U Operators Sprint 264, CNV I/U Operators Sprint 266
    • None

      Description of problem:

      While triaging a failure of the kubemacpooldown test, I found that the test fails because the metric kubevirt_hco_system_health_status reports the value 3 instead of 2, which is the value expected for a critical condition. The value 3 contradicts the design of the metric, is not mentioned in the documentation, and cannot be presented by the UI, which expects only 0, 1, or 2.
      
      The metric should report only healthy (0), warning (1), or error (2).
      The metric is used to assess Operator health, so this bug is highly visible in the UI: it affects the Operator health status shown on the OCP Overview page.
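The value contract described above (0, 1, or 2 only) can be illustrated with a small shell helper. This is a hypothetical sketch, not part of the metric's implementation or of any tool: the function name `health_status_name` is invented here, and it simply maps each documented value to its meaning and rejects anything else, such as the 3 observed in this bug.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a health-status sample into its documented state.
# Only 0, 1, and 2 are defined; any other value is reported on stderr
# and the function returns a non-zero exit code.
health_status_name() {
  case "$1" in
    0) echo "healthy" ;;
    1) echo "warning" ;;
    2) echo "error" ;;
    *) echo "unexpected health status value: $1" >&2; return 1 ;;
  esac
}
```

For example, `health_status_name 2` prints `error`, while `health_status_name 3` fails, which mirrors why the UI cannot render the value.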

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      100%

      Steps to Reproduce:

      1. Scale down the “cluster-network-addons-operator” deployment to zero (otherwise, it reverts the changes made to kubemacpool-mac-controller-manager):
      
      oc -n openshift-cnv scale deployment cluster-network-addons-operator --replicas=0 
      
      2. Scale down the “kubemacpool-mac-controller-manager” deployment to zero:
      
      oc -n openshift-cnv scale deployment kubemacpool-mac-controller-manager --replicas=0
      
      Observe the alert and check that its labels are:
      severity=critical
      operator_health_impact=critical
      
      3. Check the value of the kubevirt_hyperconverged_operator_health_status metric.
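The reproduction steps above could be scripted roughly as follows. This is a sketch under several assumptions that are not in the original report: the function names are hypothetical; querying the metric by exec'ing into the `prometheus-k8s-0` pod in `openshift-monitoring` assumes the default OpenShift monitoring stack with `curl` available in the `prometheus` container; and restoring cluster-network-addons-operator to 1 replica assumes that was its original count (check it before scaling down).

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps; all function names are hypothetical.
NS=openshift-cnv

# Steps 1-2: stop CNAO first so it cannot revert the kubemacpool change,
# then scale kubemacpool down so the critical alert fires.
reproduce_kubemacpool_down() {
  oc -n "$NS" scale deployment cluster-network-addons-operator --replicas=0
  oc -n "$NS" scale deployment kubemacpool-mac-controller-manager --replicas=0
}

# Step 3: run an instant query against the in-cluster Prometheus HTTP API.
# Assumption: default openshift-monitoring stack, curl present in the prometheus container.
# The metric name defaults to the one in step 3 and can be overridden.
query_health_status() {
  local metric="${1:-kubevirt_hyperconverged_operator_health_status}"
  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -sG --data-urlencode "query=${metric}" http://localhost:9090/api/v1/query
}

# Cleanup: scale CNAO back up so it reconciles kubemacpool again.
# Assumption: the original replica count was 1.
restore_cnao() {
  oc -n "$NS" scale deployment cluster-network-addons-operator --replicas=1
}
```

After `reproduce_kubemacpool_down` and waiting for the alert, `query_health_status` returns a JSON response whose `data.result[].value` holds a `[timestamp, "value"]` pair; `query_health_status kubevirt_hco_system_health_status` checks the metric named in the title instead.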

      Actual results:

      3

      Expected results:

      2

      Additional info:

      This bug affects 4.18

              alitman@redhat.com Aviv Litman
              rh-ee-orevah Ohad Revah
              Ohad Revah