Bug
Resolution: Obsolete
Normal
4.13
Quality / Stability / Reliability
Description of problem:
A customer has defined an alert on mcd_state in the "Degraded" state and noticed that the alert does not clear.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
I have not been able to reproduce this, but have captured the state in logs and command output.
Steps to Reproduce:
Curling the machine config daemonset pod from a Prometheus instance:
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k https://<$NODEIP>:9001/metrics -H "Authorization: Bearer $TOKEN" | grep mcd_state
# HELP mcd_state state of daemon on specified node
# TYPE mcd_state gauge
mcd_state{reason="",state="Done"} 1.708730448680743e+09
mcd_state{reason="",state="Working"} 1.7087304270475767e+09
mcd_state{reason="failed to drain node: $NODE after 1 hour. Please see machine-config-controller logs for more information",state="Degraded"} 1.7067804848773804e+09
We can see the metric reported three times: once with state "Done", once with "Working", and once with "Degraded", the latter including a reason.
Asking the customer to restart the machine config daemonset pod results in the metric showing just one state:
# HELP mcd_state state of daemon on specified node
# TYPE mcd_state gauge
mcd_state{reason="",state="Done"} 1.7091117838342226e+09
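As an explicit form of the restart workaround above, a sketch assuming the machine-config-daemon pods carry the label `k8s-app=machine-config-daemon` in the `openshift-machine-config-operator` namespace and that `<node>` is the affected node's name (verify both on your cluster before running):

```shell
# Delete the machine-config-daemon pod on the affected node; the daemonset
# recreates it, and the fresh pod exports only the current mcd_state series.
oc -n openshift-machine-config-operator delete pod \
  -l k8s-app=machine-config-daemon \
  --field-selector "spec.nodeName=<node>"
```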
Actual results:
An alert configured against `query: mcd_state{state="Degraded"}` never clears.
Expected results:
The metric reports a single, well-defined state per node; stale series such as an old "Degraded" sample are removed when the state changes, so the alert can clear.
Additional info: