Bug
Resolution: Done-Errata
Major
Logging 5.6.0
The CollectorHighErrorRate and CollectorVeryHighErrorRate alerts were removed in the Logging 6.0 release and may be reintroduced in a future release.
-
Bug Fix
Log Collection - Sprint 235, Log Collection - Sprint 236, Log Collection - Sprint 237, Log Collection - Sprint 238, Log Collection - Sprint 239, Log Collection - Sprint 240, Log Collection - Sprint 241, Log Collection - Sprint 242, Log Collection - Sprint 243, Log Collection - Sprint 244, Log Collection - Sprint 245, Log Collection - Sprint 258
Description of problem:
The alerts CollectorHighErrorRate and CollectorVeryHighErrorRate do not fire even when the Vector component error rate is high.
Version-Release number of selected component (if applicable):
cluster-logging.v5.6.0
elasticsearch-operator.v5.6.0
How reproducible:
Always
Steps to Reproduce:
* Create a ClusterLogging instance with Vector as the collector and Elasticsearch (ES) as the default log store.
* Delete the ES pods in a loop to generate Vector errors:
while true; do oc delete pod -l component=elasticsearch; done
* Wait 15-20 minutes, check vector_component_errors_total, and confirm that the alerts CollectorHighErrorRate and CollectorVeryHighErrorRate are not triggered. Query and observed value:
sum(irate(vector_component_errors_total[2m])) 1.6332200316965575
Other steps tried to generate the errors:
* Create a ClusterLogForwarder instance that sends logs to a non-existent external ES instance:
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: es-created-by-user
      type: elasticsearch
      url: 'http://elasticsearch-server.aosqe-es.svc:9200'
  pipelines:
    - name: forward-to-external-es
      inputRefs:
        - infrastructure
        - application
        - audit
      outputRefs:
        - es-created-by-user
* Create a ClusterLogging instance:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  collection:
    logs:
      type: "vector"
      vector: {}
* Wait 15-20 minutes, check vector_component_errors_total, and confirm that the alerts CollectorHighErrorRate and CollectorVeryHighErrorRate are not triggered.
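The alert check in the last step can also be scripted. The following is a minimal sketch and not part of the original report. It filters the JSON returned by the Prometheus HTTP API endpoint /api/v1/alerts for the two collector alerts. The helper names (fetch_alerts, alert_states), the base URL and the bearer token are all assumptions to adapt to the cluster under test.

```python
import json
import urllib.request

COLLECTOR_ALERTS = ("CollectorHighErrorRate", "CollectorVeryHighErrorRate")


def fetch_alerts(base_url, token):
    """Return the parsed response of GET <base_url>/api/v1/alerts.

    base_url and token are assumptions: for example, the cluster's
    monitoring route and a bearer token that is allowed to read alerts.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/alerts",
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def alert_states(payload, names=COLLECTOR_ALERTS):
    """Map each alert name to the list of states reported for it.

    An empty list means the alert is neither pending nor firing.
    """
    states = {name: [] for name in names}
    for alert in payload.get("data", {}).get("alerts", []):
        name = alert.get("labels", {}).get("alertname")
        if name in states:
            states[name].append(alert.get("state"))
    return states
```

Calling alert_states(fetch_alerts(url, token)) after the wait period and getting an empty list for both names would match the behavior described here: the alerts never reach the pending or firing state.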
Additional info:
The metrics expression used by the alert doesn't return any value:
100 * (sum by(pod, instance) (rate(vector_component_errors_total[2m])) / sum by(pod, instance) (rate(vector_component_received_events_total[2m]))) > 10
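One way to reason about why such an expression can return nothing: in PromQL, a binary operation between two instant vectors only produces results for label sets present on both sides, and a comparison such as > 10 drops every sample that does not satisfy it. The sketch below models that behavior in Python with made-up numbers. The function name error_rate_alert and the sample pod are hypothetical, and it illustrates possible causes rather than a confirmed diagnosis of this bug.

```python
import math


def error_rate_alert(error_rates, received_rates, threshold=10.0):
    """Model 100 * (errors / received) > threshold per (pod, instance) key.

    Both arguments map a (pod, instance) tuple to a per-second rate.
    Keys missing on either side yield no result, mirroring PromQL
    one-to-one vector matching.
    """
    result = {}
    for key, errors in error_rates.items():
        if key not in received_rates:
            continue  # no matching series on the right-hand side
        received = received_rates[key]
        if received == 0:
            # Float division as in PromQL: 0/0 is NaN, x/0 is +Inf for x > 0.
            ratio = math.nan if errors == 0 else math.inf
        else:
            ratio = errors / received
        percent = 100 * ratio
        if percent > threshold:  # NaN never compares greater
            result[key] = percent
    return result


# Hypothetical sample: 1.6 errors/s against 100 received events/s is 1.6%,
# which stays below the 10% threshold, so the result is empty.
key = ("collector-abcde", "10.0.0.1:24231")
print(error_rate_alert({key: 1.6}, {key: 100.0}))
```

In this model an empty result has three distinct causes: the denominator series is missing for the pod, both rates are zero, or the ratio is simply below the threshold. Checking each side of the division separately in the metrics console would show which case applies.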
This issue is also reported for FluentdHighErrorRate and FluentdVeryHighErrorRate:
https://issues.redhat.com/browse/LOG-2457
It seems we can't trigger these alerts reliably, or we are simulating the scenario incorrectly.
- is related to: LOG-2467 Configure lokistack-gateway to honor the global tlsSecurityProfile (Closed)
- links to: RHBA-2024:137361 Logging for Red Hat OpenShift - 6.0.0