Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: Logging 6.0.0
Affects Version/s: Logging 5.6.0
Component/s: Log Collection
Labels:
- devel_ack+
- need-info

Blocked: False
Blocked Reason: None
Ready: False
Docs QE Status: NEW
QE Status: NEW
Release Note Text: The CollectorHighErrorRate and CollectorVeryHighErrorRate alerts are removed in the Logging 6.0 release and may be reintroduced in a future release.
Release Note Type: Bug Fix

Sprint:
Log Collection - Sprint 235, Log Collection - Sprint 236, Log Collection - Sprint 237, Log Collection - Sprint 238, Log Collection - Sprint 239, Log Collection - Sprint 240, Log Collection - Sprint 241, Log Collection - Sprint 242, Log Collection - Sprint 243, Log Collection - Sprint 244, Log Collection - Sprint 245, Log Collection - Sprint 258

Description of problem:

The CollectorHighErrorRate and CollectorVeryHighErrorRate alerts do not fire even when the Vector component error rate is high.

Version-Release number of selected component (if applicable):

cluster-logging.v5.6.0

elasticsearch-operator.v5.6.0

How reproducible:

Always

Steps to Reproduce:

* Create a ClusterLogging instance with Vector as the collector and ES as the default log store.

* Delete the ES pods repeatedly to generate Vector errors.

while true; do oc delete pod -l component=elasticsearch; done

* Wait 15 to 20 minutes, then query vector_component_errors_total and confirm that the CollectorHighErrorRate and CollectorVeryHighErrorRate alerts have not fired.

sum(irate(vector_component_errors_total[2m]))
1.6332200316965575
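Note that the check above uses irate() while the alert expression under "Additional info" uses rate(), so the two numbers are not directly comparable. The sketch below shows how they can differ for the same counter. The sample values are hypothetical, and simple_rate() is a simplification that skips the window extrapolation PromQL's rate() performs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples only, like PromQL irate()."""
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    # A counter that went down is treated as having reset to zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second rate over the whole window (no extrapolation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # Same reset handling as above, applied to each adjacent pair.
        increase += cur if cur < prev else cur - prev
    return increase / (samples[-1][0] - samples[0][0])


# Hypothetical 2m window: the error counter is flat, then jumps by 90.
window = [(0.0, 100.0), (30.0, 100.0), (60.0, 100.0), (90.0, 100.0), (120.0, 190.0)]

print(irate(window))        # 3.0
print(simple_rate(window))  # 0.75
```

With a bursty error counter like this one, the instantaneous rate is four times the averaged rate, so a high irate() reading does not by itself mean a rate()-based alert should fire.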

Other steps tried to generate the errors:

* Create a ClusterLogForwarder instance that sends logs to a non-existent external ES instance.

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - name: es-created-by-user
    type: elasticsearch
    url: 'http://elasticsearch-server.aosqe-es.svc:9200'
  pipelines:
  - name: forward-to-external-es
    inputRefs:
    - infrastructure
    - application
    - audit
    outputRefs:
    - es-created-by-user

* Create a ClusterLogging instance.

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance" 
  namespace: "openshift-logging"
spec:
  managementState: "Managed"  
  collection:
    logs:
      type: "vector"  
      vector: {}

* Wait 15 to 20 minutes, then query vector_component_errors_total and confirm that the CollectorHighErrorRate and CollectorVeryHighErrorRate alerts have not fired.

Additional info:

The alert's metric expression returns no value:

100 * (sum by(pod, instance) (rate(vector_component_errors_total[2m])) / sum by(pod, instance) (rate(vector_component_received_events_total[2m]))) > 10
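For reference, an empty result from this expression is what PromQL produces whenever no (pod, instance) pair ends up above the threshold. The sketch below models that evaluation, assuming the two `sum by(pod, instance)` aggregations have already been computed. The pod names, instance addresses and received-event rates are hypothetical; only the 1.63 error rate echoes the value observed above. This is one possible reading of the behaviour, not a confirmed root cause.

```python
import math
from typing import Dict, Tuple

Key = Tuple[str, str]  # (pod, instance) label pair


def error_percent(errors: Dict[Key, float], received: Dict[Key, float]) -> Dict[Key, float]:
    """100 * errors / received for label pairs present on both sides."""
    result: Dict[Key, float] = {}
    for key, err_rate in errors.items():
        if key not in received:
            # One-to-one vector matching drops elements without a partner.
            continue
        recv_rate = received[key]
        if recv_rate == 0:
            # Float division as in PromQL: 0/0 is NaN, x/0 is +Inf for x > 0.
            ratio = math.nan if err_rate == 0 else math.inf
        else:
            ratio = err_rate / recv_rate
        result[key] = 100 * ratio
    return result


def firing(percent: Dict[Key, float], threshold: float = 10.0) -> Dict[Key, float]:
    """Keep only elements above the threshold; NaN never compares greater."""
    return {key: value for key, value in percent.items() if value > threshold}


# Hypothetical per-pod rates after aggregation.
a = ("collector-a", "10.0.0.1:24231")
b = ("collector-b", "10.0.0.2:24231")
c = ("collector-c", "10.0.0.3:24231")
errors = {a: 1.63, b: 0.0, c: 2.0}
received = {a: 500.0, b: 0.0}  # no received-events series for collector-c

print(firing(error_percent(errors, received)))  # {}
```

In this model collector-a sits at roughly 0.3 percent, collector-b evaluates to NaN, and collector-c is dropped for lack of a matching series, so nothing is returned even though errors are being counted.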

The same issue is also reported for FluentdHighErrorRate and FluentdVeryHighErrorRate:

https://issues.redhat.com/browse/LOG-2457

It seems that either we cannot trigger these alerts reliably or we are simulating the scenario incorrectly.

is related to

LOG-2467 Configure lokistack-gateway to honor the global tlsSecurityProfile

Closed

links to

openshift/cluster-logging-operator#1991: LOG-3432: Replace existing rules with rate function and add new alert for success log sync events

openshift/cluster-logging-operator#2777: LOG-3432: Remove collector error alerts

RHBA-2024:137361 Logging for Red Hat OpenShift - 6.0.0

mentioned on

Merge request - Updated US source to: 82c9caa LOG-5907: add trace log level detection

Assignee: Jeffrey Cantrill

Reporter: Ishwar Kanse

QA Contact: Anping Li

Votes: 0

Watchers: 5

Created: 2022/12/15 10:12 AM

Updated: 2024/09/24 3:25 PM

Resolved: 2024/09/24 3:25 PM
