Type: Bug
Resolution: Unresolved
Affects: 4.18.z
Component area: Quality / Stability / Reliability
Description of problem:
When a node is marked down for maintenance (for example, via the Node Maintenance Operator), the KubeDaemonSetMisScheduled and KubeDaemonSetRolloutStuck alerts begin firing for every platform "openshift-*" daemonset.
On a cluster with many Red Hat operators installed, this can reach 20 or more alerts of each type (40 in total).
https://github.com/openshift/cluster-monitoring-operator/blob/release-4.18/assets/control-plane/prometheus-rule.yaml#L116
https://github.com/openshift/cluster-monitoring-operator/blob/release-4.18/assets/control-plane/prometheus-rule.yaml#L167
When a node is put into maintenance, the operator adds a new taint:
{"effect":"NoSchedule","key":"medik8s.io/drain"}
The alerting rules should account for a node being in maintenance / cordoned / SchedulingDisabled, even while it is in Ready state.
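One possible shape for such a change, sketched as a PrometheusRule fragment. This is illustrative only, not the shipped rule: the `unless on ()` clause and the reliance on kube-state-metrics' kube_node_spec_unschedulable metric are assumptions about one coarse approach (suppressing the alert while any node is cordoned); the base expression mirrors the upstream KubeDaemonSetMisScheduled rule.

```yaml
# Hypothetical variant: stay silent while any node is cordoned/unschedulable.
- alert: KubeDaemonSetMisScheduled
  expr: |
    kube_daemonset_status_number_misscheduled{job="kube-state-metrics"} > 0
    unless on ()
    (count(kube_node_spec_unschedulable{job="kube-state-metrics"} == 1) > 0)
  for: 15m
  labels:
    severity: warning
```

A finer-grained fix would exclude only the pods displaced by cordoned nodes rather than muting the alert cluster-wide, but that requires joining per-pod metrics to per-node state.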
Version-Release number of selected component (if applicable):
4.18.15
How reproducible:
Always
Steps to Reproduce:
1. OpenShift 4.18 Cluster
2. Install Node Maintenance Operator
3. Put a node into maintenance:
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: nodemaintenance-cr
spec:
  nodeName: worker-0.ocp418shared.tamlab.brq2.redhat.com
  reason: "node maint"
Actual results:
Many extra alerts that provide no actionable information.
Expected results:
KubeDaemonSetMisScheduled and KubeDaemonSetRolloutStuck do not fire for daemonset pods displaced by a node that is cordoned / under maintenance.
Additional info: