[LOG-1918] Alert `FluentdNodeDown` always firing - Red Hat Issue Tracker

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: Logging 5.4.0
Affects Version/s: Logging 5.3.0
Component/s: Log Collection
Labels:
- devel_ack+
- rn-done-resolved

Blocked:
False
Ready:
False
Epic Link:
Deploy Vector Collector as alternate Alpha offering
Docs QE Status:
NEW
QE Status:
VERIFIED
Release Note Text:
Before this update, the deployed collector was renamed in the 5.3 release, which caused the `FluentdNodeDown` alert to fire continuously.
Sprint:
Logging (Core) - Sprint 209

Description of problem:

The alert `FluentdNodeDown` is always firing. The rule is:

    - alert: FluentdNodeDown
      annotations:
        message: Prometheus could not scrape fluentd {{ $labels.instance }} for more
          than 10m.
        summary: Fluentd cannot be scraped
      expr: |
        absent(up{job="fluentd"} == 1)
      for: 10m
      labels:
        service: fluentd
        severity: critical

In 5.3, the collector name changed from `fluentd` to `collector`, so no `up` series with `job="fluentd"` exists and `absent(...)` always returns a result. The expression should probably be `absent(up{job="collector"} == 1)`.
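
A minimal sketch of the rule with the job matcher suggested above, assuming the collector's scrape job is named `collector` in 5.3. Only the `job` value in `expr` differs from the rule quoted in this report; the alert name, annotations, and labels are left unchanged for illustration, and this is not necessarily the change merged in the linked pull request:

    - alert: FluentdNodeDown
      annotations:
        message: Prometheus could not scrape fluentd {{ $labels.instance }} for more
          than 10m.
        summary: Fluentd cannot be scraped
      # Assumption: the scrape job is named "collector" in 5.3 (was "fluentd").
      expr: |
        absent(up{job="collector"} == 1)
      for: 10m
      labels:
        service: fluentd
        severity: critical

With this matcher, the alert would fire only when no `up` series for the `collector` job has the value 1 for 10 minutes, rather than unconditionally.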

Version-Release number of selected component (if applicable):

cluster-logging.5.3.0-46

How reproducible:

Always

Steps to Reproduce:
1. Deploy Logging 5.3.
2. Check the alerts in the OpenShift console.

Actual results:

Expected results:

Additional info:

links to

openshift/cluster-logging-operator#1239: LOG-1918: Fix the collector alert job name

Assignee: Jeffrey Cantrill

Reporter: Qiaoling Tang

Votes: 0

Watchers: 2

Created: 2021/11/04 5:13 AM

Updated: 2022/04/08 1:07 PM

Resolved: 2021/11/05 3:17 PM
