OpenShift Bugs / OCPBUGS-30043

mcd_state metric showing 3 states concurrently


      Description of problem:

      A customer defined an alert on the mcd_state metric for the "Degraded" state and noticed that the alert never clears.

      Version-Release number of selected component (if applicable):

      4.13   

      How reproducible:

      I have not been able to reproduce this, but I have captured the state in logs and command output.

      Steps to Reproduce:

      Curl the metrics endpoint of the machine config daemonset pod from a Prometheus pod (NODEIP is the node's IP address, TOKEN a bearer token):
      
      
      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k "https://${NODEIP}:9001/metrics" -H "Authorization: Bearer $TOKEN" | grep mcd_state
        
      # HELP mcd_state state of daemon on specified node
      # TYPE mcd_state gauge
      mcd_state{reason="",state="Done"} 1.708730448680743e+09
      mcd_state{reason="",state="Working"} 1.7087304270475767e+09
      mcd_state{reason="failed to drain node: $NODE after 1 hour. Please see machine-config-controller logs for more information",state="Degraded"} 1.7067804848773804e+09
       
      
      The metric is reported three times: once with the state "Done", once with "Working", and once with "Degraded", which also carries a reason.
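
To tell which of the three series is current, the scrape output can be parsed and compared by value. The following is a minimal sketch, assuming the gauge values are Unix timestamps of the last time each state was set (they look like it, but this report does not confirm it). The function names `parse_mcd_state` and `latest_state` are hypothetical helpers, not part of any OpenShift tooling.

```python
import re

# Sample scrape output copied from this report (reason text kept verbatim).
SAMPLE = '''\
# HELP mcd_state state of daemon on specified node
# TYPE mcd_state gauge
mcd_state{reason="",state="Done"} 1.708730448680743e+09
mcd_state{reason="",state="Working"} 1.7087304270475767e+09
mcd_state{reason="failed to drain node: $NODE after 1 hour. Please see machine-config-controller logs for more information",state="Degraded"} 1.7067804848773804e+09
'''

# Matches one mcd_state sample line: label set in braces, then the value.
LINE_RE = re.compile(r'^mcd_state\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
# Matches key="value" pairs; assumes no escaped quotes inside label values.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_mcd_state(text):
    """Return a list of (state, reason, value) tuples from exposition text."""
    series = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip comment lines and any other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        series.append((labels.get("state", ""),
                       labels.get("reason", ""),
                       float(match.group("value"))))
    return series


def latest_state(text):
    """Return the state with the largest value (assumed: newest timestamp)."""
    series = parse_mcd_state(text)
    if not series:
        return None
    return max(series, key=lambda item: item[2])[0]


if __name__ == "__main__":
    for state, reason, value in parse_mcd_state(SAMPLE):
        print(f"{state:<9} {value:.0f} {reason}")
    print("latest:", latest_state(SAMPLE))
```

Under that assumption, the "Done" series is the newest in the captured output and the "Degraded" series is about 22.5 days older (roughly 1,949,964 seconds), which would make it a leftover from an earlier failed drain rather than the current state.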
      
      After the customer restarted the machine config daemonset pod, the metric showed only one state:
      
      # HELP mcd_state state of daemon on specified node
      # TYPE mcd_state gauge
      mcd_state{reason="",state="Done"} 1.7091117838342226e+09
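
The report does not identify a root cause, but the behaviour matches how labelled gauges generally work in Prometheus client libraries: each distinct label combination is its own series, and setting one series does not remove the others until they are deleted or the process restarts. The toy model below illustrates that reading only. `ToyGaugeVec` and `set_exclusive` are hypothetical names and are not the machine-config-daemon implementation.

```python
class ToyGaugeVec:
    """Toy in-memory model of a labelled gauge (hypothetical, not MCO code)."""

    def __init__(self):
        self._series = {}  # maps (state, reason) -> value

    def set(self, state, reason, value):
        # Setting one label combination leaves every other combination in place.
        self._series[(state, reason)] = value

    def reset(self):
        # Drop all series, which is also what a process restart does
        # to metrics that are held only in memory.
        self._series.clear()

    def states(self):
        """Return the sorted list of state labels currently exported."""
        return sorted(state for state, _reason in self._series)


def set_exclusive(vec, state, reason, value):
    """Hypothetical fix: clear stale series before recording the new state."""
    vec.reset()
    vec.set(state, reason, value)


if __name__ == "__main__":
    vec = ToyGaugeVec()
    vec.set("Degraded", "failed to drain node", 1.0)
    vec.set("Working", "", 2.0)
    vec.set("Done", "", 3.0)
    print(vec.states())  # all three states are still exported

    set_exclusive(vec, "Done", "", 4.0)
    print(vec.states())  # only the most recent state remains
```

In this model, three plain `set` calls leave three series exported at once, as in the first capture, while a reset before each update leaves a single series, as seen after the pod restart.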
      

       

      Actual results:

      An alert configured with `query: mcd_state{state="Degraded"}` never clears, because the "Degraded" series remains present alongside the other states.

      Expected results:

      The metric reports a single, clearly defined state for the node.
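
The expected result can be expressed as a check on the scrape output: exactly one `mcd_state` series per node. This is a small sketch for verifying a capture by hand. The helper names `reported_states` and `has_single_state` are assumptions made for illustration, and the regular expression assumes label values contain no escaped quotes.

```python
import re

# Captures the value of the state label from one mcd_state sample line.
STATE_RE = re.compile(r'^mcd_state\{.*\bstate="([^"]*)".*\}\s+\S+$')


def reported_states(metrics_text):
    """Return the state label of every mcd_state series in the text."""
    states = []
    for line in metrics_text.splitlines():
        match = STATE_RE.match(line.strip())
        if match:
            states.append(match.group(1))
    return states


def has_single_state(metrics_text):
    """True when the scrape exposes exactly one mcd_state series."""
    return len(reported_states(metrics_text)) == 1


if __name__ == "__main__":
    import sys
    # Usage: pipe the curl output from the reproduction step into this script.
    text = sys.stdin.read()
    print("states:", reported_states(text))
    print("single state:", has_single_state(text))
```

Applied to the two captures in this report, the check would fail for the first (three series) and pass for the one taken after the pod restart (one series).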

      Additional info:

          

              djoshy David Joshy
              rhn-support-nigsmith Nigel Smith
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa