Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Node Healthcheck
Labels:
None

Blocked:
False
Blocked Reason:

None
Ready:
False
Intelligence Requested:
Market:

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Steps to reproduce:
{
  "metric": {
    "__name__": "nodehealthcheck_ongoing_remediation",
    "container": "kube-rbac-proxy",
    "endpoint": "https",
    "exported_instance": "x.x.x.x:yy",
    "exported_job": "node-healthcheck-controller-manager-metrics-service",
    "exported_namespace": "openshift-workload-availability",
    "instance": "my-redacted-instance:my-port",
    "job": "prometheus-federate-job",
    "name": "<node-name>", // This is the node name
    "namespace": "openshift-workload-availability",
    "pod": "node-healthcheck-controller-manager-7f6475d7d4-zvrxw",
    "prometheus": "openshift-monitoring/k8s",
    "prometheus_replica": "prometheus-k8s-0",
    "remediation": "SelfNodeRemediation",
    "service": "node-healthcheck-controller-manager-metrics-service"
  },
  "value": [
    1771598863.253,
    "1" // Value is "1" for the node name
  ]
},
{
  "metric": {
    "__name__": "nodehealthcheck_ongoing_remediation",
    "container": "kube-rbac-proxy",
    "endpoint": "https",
    "exported_instance": "x.x.x.x:yy",
    "exported_job": "node-healthcheck-controller-manager-metrics-service",
    "exported_namespace": "openshift-workload-availability",
    "instance": "my-redacted-instance:my-port",
    "job": "prometheus-federate",
    "name": "<node-name>-mh7vw", // This is the remediation CR
    "namespace": "openshift-workload-availability",
    "pod": "node-healthcheck-controller-manager-7f6475d7d4-zvrxw",
    "prometheus": "openshift-monitoring/k8s",
    "prometheus_replica": "prometheus-k8s-0",
    "remediation": "SelfNodeRemediation",
    "service": "node-healthcheck-controller-manager-metrics-service"
  },
  "value": [
    1771598863.253,
    "0" // Value is "0" for the remediation CR
  ]
}

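The issue appears to have been introduced in PR #231, when the call
metrics.ObserveNodeHealthCheckRemediationDeleted(node.GetName(), remediationCR.GetNamespace(), remediationCR.GetKind())

was moved into the deleteRemediationCR method and its first argument changed from the node name to the remediation CR name:
metrics.ObserveNodeHealthCheckRemediationDeleted(remediationCR.GetName(), remediationCR.GetNamespace(), remediationCR.GetKind())

As a result, the deletion is recorded under the remediation CR name (the series with value "0" above), while the series labeled with the node name stays at "1".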
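The effect of the mismatched first argument can be illustrated with a small stdlib-only model. This is a sketch, not the operator's code: the `labels` struct, the `gaugeVec` map, and the `observeCreated` / `observeDeleted` helpers are hypothetical stand-ins, and it assumes that creation sets the gauge to 1 under the node name, as the samples above suggest.

```go
package main

import "fmt"

// labels is a simplified stand-in for a Prometheus label set
// (name, namespace, remediation); it is not the operator's real type.
type labels struct {
	name, namespace, remediation string
}

// gaugeVec models a gauge vector: one independent value per distinct label set.
type gaugeVec map[labels]float64

// observeCreated and observeDeleted are hypothetical helpers that mirror the
// set-to-1 / set-to-0 behavior suggested by the samples in this report.
func (g gaugeVec) observeCreated(name, ns, kind string) { g[labels{name, ns, kind}] = 1 }
func (g gaugeVec) observeDeleted(name, ns, kind string) { g[labels{name, ns, kind}] = 0 }

// simulate runs one create/delete cycle and returns the resulting series.
func simulate(createName, deleteName string) gaugeVec {
	const ns, kind = "openshift-workload-availability", "SelfNodeRemediation"
	g := gaugeVec{}
	g.observeCreated(createName, ns, kind)
	g.observeDeleted(deleteName, ns, kind)
	return g
}

func main() {
	// Mismatched names: created under the node name, deleted under the CR name.
	// Two separate series result, and the node-name series is never reset.
	for l, v := range simulate("node-a", "node-a-mh7vw") {
		fmt.Printf("name=%s value=%v\n", l.name, v)
	}
}
```

Calling `simulate` with the same name for both arguments yields a single series that returns to 0, which is the behavior the original call with `node.GetName()` would have produced under these assumptions.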

links to

medik8s/node-healthcheck-operator#390: nodehealthcheck_ongoing_remediation metric stuck at 1 after remediation completes due to mismatched label sets

Assignee: Unassigned

Reporter: Michael Habash

Votes: 0

Watchers: 4

Created: 2026/02/22 9:38 AM

Updated: 2026/03/02 4:47 PM
