- Bug
- Resolution: Unresolved
- Normal
- None
- Logging 5.8.2
- False
- None
- False
- NEW
- NEW
- Bug Fix
- Moderate
Description of problem:
The "CollectorHighErrorRate" alert for Vector is being triggered by messages like the following:
2024-01-17T15:09:15.991612Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
2024-01-17T15:09:15.991676Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true
2024-01-17T15:09:53.249900Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
2024-01-17T15:09:53.249944Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true
2024-01-17T15:11:53.272709Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
2024-01-17T15:11:53.272761Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true
These messages are warnings, not errors: the request is retried after the connection is closed, so no logs are lost. The "CollectorHighErrorRate" alert should therefore not fire in this case. The Vector logs provided by the customer contain no other errors (confirmed in the must-gather with `grep -R -v "WARN" collector////*.log`).
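The check described above can also be scripted when triaging a must-gather. The following is a minimal sketch in Python, assuming each collector log line starts with a timestamp followed by the log level, as in the excerpt above. The function names `count_levels` and `non_warn_lines` are hypothetical helpers and are not part of Vector or OpenShift Logging.

```python
import re
from collections import Counter

# Assumption: a Vector log line starts with an RFC 3339 UTC timestamp,
# followed by whitespace and an upper-case level such as WARN or ERROR.
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+Z)\s+(?P<level>[A-Z]+)\s")


def count_levels(lines):
    """Count log lines per level; lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("level")] += 1
    return counts


def non_warn_lines(lines):
    """Return the parsed log lines whose level is anything other than WARN."""
    result = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") != "WARN":
            result.append(line)
    return result


# Two shortened lines modelled on the excerpt in this report.
sample = [
    '2024-01-17T15:09:15.991612Z WARN sink{component_id=splunk_infra_receiver}: '
    'vector::internal_events::http_client: HTTP error. '
    'error=connection closed before message completed',
    '2024-01-17T15:09:15.991676Z WARN sink{component_id=splunk_infra_receiver}: '
    'vector::sinks::util::retries: Retrying after error.',
]

if __name__ == "__main__":
    print(count_levels(sample))
    print(non_warn_lines(sample))
```

An empty result from `non_warn_lines` corresponds to the `grep -v "WARN"` check returning nothing, which is what was seen in the customer logs.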
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12
OpenShift Logging 5.8
How reproducible:
Observed in the customer environment
Steps to Reproduce:
- Configure log forwarding to a sink that has a low connection timeout
- Observe Vector log messages that contain "connection closed before message completed"
- Confirm that the failed requests are retried and the logs are delivered successfully
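The retry part of the last step can be approximated from the collector logs alone. The sketch below, in Python, pairs each "HTTP error." warning with a later "Retrying after error." message for the same `component_id`. It assumes the message texts match the excerpt in this report, and `unretried_errors` is a hypothetical helper name. It only shows that a retry was attempted; successful delivery still has to be confirmed on the receiving side.

```python
import re

# Assumption: every relevant line carries a component_id=<name> field,
# as in the log excerpt from this report.
COMPONENT_RE = re.compile(r"component_id=([^\s}]+)")


def unretried_errors(lines):
    """Return "HTTP error." lines with no later retry line for the same component."""
    pending = {}    # component_id -> HTTP error line still waiting for a retry
    unretried = []
    for line in lines:
        match = COMPONENT_RE.search(line)
        if not match:
            continue
        component = match.group(1)
        if "HTTP error." in line:
            # A second error before any retry leaves the first one unretried.
            if component in pending:
                unretried.append(pending[component])
            pending[component] = line
        elif "Retrying after error." in line:
            pending.pop(component, None)
    unretried.extend(pending.values())
    return unretried


error_line = (
    '2024-01-17T15:09:15.991612Z WARN sink{component_id=splunk_infra_receiver}: '
    'vector::internal_events::http_client: HTTP error. '
    'error=connection closed before message completed'
)
retry_line = (
    '2024-01-17T15:09:15.991676Z WARN sink{component_id=splunk_infra_receiver}: '
    'vector::sinks::util::retries: Retrying after error.'
)

if __name__ == "__main__":
    print(unretried_errors([error_line, retry_line]))
```

An empty list means every logged HTTP error was followed by a retry for the same sink, which matches the pattern in the customer logs.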
Actual results:
Alert "CollectorHighErrorRate" is firing
Expected results:
Alert "CollectorHighErrorRate" is not firing for these warnings.
Additional info:
- Issue was previously discussed here: https://redhat-internal.slack.com/archives/CB3HXM2QK/p1705660299265829