  1. OpenShift Logging
  2. LOG-5346

"CollectorHighErrorRate" alert should not fire for "connection closed" warnings


    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • Logging 5.8.2
    • Log Collection
    • False
    • None
    • False
    • NEW
    • NEW
    • Bug Fix
    • Moderate

      Description of problem:

      The "CollectorHighErrorRate" alert for Vector is being triggered by messages such as the following:

       

      2024-01-17T15:09:15.991612Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
      2024-01-17T15:09:15.991676Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true
      2024-01-17T15:09:53.249900Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
      2024-01-17T15:09:53.249944Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true
      2024-01-17T15:11:53.272709Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::internal_events::http_client: HTTP error. error=connection closed before message completed error_type="request_failed" stage="processing" internal_log_rate_limit=true
      2024-01-17T15:11:53.272761Z WARN sink{component_kind="sink" component_id=splunk_infra_receiver component_type=splunk_hec_logs component_name=splunk_infra_receiver}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection closed before message completed internal_log_rate_limit=true 

      These messages are warnings. No logs are lost, because the failed request is retried, so the "CollectorHighErrorRate" alert should not fire in this case. The Vector logs provided by the customer contain no other errors (confirmed in the must-gather via `grep R -v "WARN" collector////*.log`).
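
The check described above (only WARN entries, no real errors) can also be sketched as a small script. This is a minimal illustration, not part of the must-gather tooling: the function names `count_levels` and `error_lines` are hypothetical, and the regular expression assumes that every Vector log line starts with a timestamp followed by the level, as in the excerpt above.

```python
import re
from collections import Counter

# Assumption: each Vector log line begins with "<timestamp>Z <LEVEL> ...",
# as in the excerpt quoted in this report.
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+Z)\s+(?P<level>[A-Z]+)(?:\s|$)")


def count_levels(lines):
    """Count how many parsed lines were logged at each level."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("level")] += 1
    return counts


def error_lines(lines):
    """Return the lines logged at ERROR level, unchanged and in order."""
    result = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            result.append(line)
    return result


if __name__ == "__main__":
    # One line copied from the excerpt in this report.
    sample = [
        '2024-01-17T15:09:15.991612Z WARN sink{component_kind="sink" '
        'component_id=splunk_infra_receiver component_type=splunk_hec_logs '
        'component_name=splunk_infra_receiver}: vector::internal_events::http_client: '
        'HTTP error. error=connection closed before message completed '
        'error_type="request_failed" stage="processing" internal_log_rate_limit=true',
    ]
    print(dict(count_levels(sample)))
    print(error_lines(sample))
```

An empty result from `error_lines` over all collector logs would match the observation that only warnings are present.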

       

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.12

      OpenShift Logging 5.8

      How reproducible:

      Observed in the customer environment

      Steps to Reproduce:

      1. Configure log forwarding with a sink that has a low connection timeout
      2. Observe that the Vector logs contain "connection closed before message completed"
      3. Confirm that the affected requests are retried and the logs are delivered successfully
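
Step 3 can be partly checked from the collector logs alone. The sketch below pairs each "HTTP error." warning with a later "Retrying after error." warning for the same `component_id`. It assumes the two message texts and the `component_id=` field appear exactly as in the excerpt above, and the function name `unretried_http_errors` is hypothetical. It only shows that a retry was attempted; confirming delivery still requires checking the receiving side.

```python
import re

# Assumption: the sink is identified by a "component_id=<name>" field,
# as in the log excerpt in this report.
COMPONENT_RE = re.compile(r"component_id=(\S+)")


def unretried_http_errors(lines):
    """Return 'HTTP error.' lines that have no later retry message for the same sink."""
    pending = {}  # component_id -> most recent HTTP error line still awaiting a retry
    unretried = []
    for line in lines:
        match = COMPONENT_RE.search(line)
        if not match:
            continue
        component_id = match.group(1)
        if "HTTP error." in line:
            # A second error before any retry means the first one was never retried.
            if component_id in pending:
                unretried.append(pending[component_id])
            pending[component_id] = line
        elif "Retrying after error." in line and component_id in pending:
            del pending[component_id]
    # Anything still pending at the end of the log was not followed by a retry.
    unretried.extend(pending.values())
    return unretried
```

For the six lines quoted in this report, every "HTTP error." entry is followed by a matching retry entry, so the function would return an empty list.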

      Actual results:

      Alert "CollectorHighErrorRate" is firing

      Expected results:

      Alert "CollectorHighErrorRate" is not firing for these warnings.

      Additional info:

            Unassigned
            Simon Krenger (rhn-support-skrenger)