OpenShift Workloads / WRKLDS-888

Spike - alert for cronjobs in FailedNeedsStart condition



      1. Proposed title of this feature request
      Prometheus rule for cronjobs in FailedNeedsStart condition

      2. What is the nature and description of the request?
      When a cronjob misses its scheduled start time too many times (more than 100), the cronjob-controller stops scheduling it and the cronjob enters a permanently failed state. This can occur when the cluster is shut down for an extended period, or when there are temporary issues in the cluster. The customer is requesting that our Alertmanager fire an alert when an infrastructure cronjob is in this state; a sketch of a possible alerting rule follows the example event below. An example message you might see in the events:

      The elasticsearch index management cronjobs fail after a maintenance window brought down the nodes for 72 hours. The event in the cronjob description is: Warning FailedNeedsStart 69s (x24505 over 2d20h) cronjob-controller Cannot determine if job needs to be started: too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
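      A minimal sketch of what such a rule could look like, assuming the kube-state-metrics series kube_cronjob_next_schedule_time and kube_cronjob_spec_suspend are available. The FailedNeedsStart event itself does not appear to be exposed as a metric, so this sketch approximates the condition by alerting on non-suspended cronjobs whose next scheduled run is long overdue; the rule name, namespace, threshold, and severity are illustrative only:

          apiVersion: monitoring.coreos.com/v1
          kind: PrometheusRule
          metadata:
            name: cronjob-missed-schedule        # illustrative name
            namespace: openshift-monitoring      # assumption: where platform rules would live
          spec:
            groups:
            - name: cronjob.rules
              rules:
              - alert: CronJobMissedSchedule     # illustrative alert name
                # Fires when a non-suspended CronJob's next scheduled start time is
                # more than one hour in the past, the observable symptom of the
                # "too many missed start time (> 100)" / FailedNeedsStart state.
                expr: |
                  (time() - kube_cronjob_next_schedule_time > 3600)
                  and on (namespace, cronjob)
                  (kube_cronjob_spec_suspend == 0)
                for: 15m
                labels:
                  severity: warning
                annotations:
                  summary: CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} has not started on schedule.

      The one-hour threshold is a placeholder; in practice it would need to be tuned per schedule so that infrequent cronjobs do not fire false positives.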

      3. Why does the customer need this? (List the business requirements here)
      Reliability and uptime.

      4. List any affected packages or components.
      Any cronjobs for components supported by Red Hat.

      This is known Kubernetes behavior: the upstream cronjob-controller gives up on a cronjob once it has missed more than 100 start times.
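      The event message quoted above points at the usual mitigation: setting .spec.startingDeadlineSeconds so the controller only counts start times missed within a bounded window instead of everything since the last run. A minimal illustrative CronJob snippet (name, schedule, and image are hypothetical):

          apiVersion: batch/v1
          kind: CronJob
          metadata:
            name: index-management                  # hypothetical name
          spec:
            schedule: "*/15 * * * *"
            # Only consider start times missed in the last 10 minutes; after an
            # extended outage the controller skips the older runs instead of
            # giving up once more than 100 start times have been missed.
            startingDeadlineSeconds: 600
            jobTemplate:
              spec:
                template:
                  spec:
                    restartPolicy: OnFailure
                    containers:
                    - name: index-management
                      image: registry.example.com/index-management:latest   # placeholder image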

            Assignee: Unassigned
            Reporter: Roger Florén (rh-ee-rfloren)
            Votes: 0
            Watchers: 5
