Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: Logging 6.3.2
Affects Version/s: Logging 5.8.z, Logging 5.9.z, Logging 6.0.z, Logging 6.1.z, Logging 6.2.z, Logging 6.3.z, Logging 6.4.z
Component/s: Log Collection
Labels:
- devel_ack+

Activity Type:
Incidents & Support
Blocked:
False
Blocked Reason:

None
Ready:
False
Docs QE Status:
NEW
QE Status:
VERIFIED
Release Note Text:
Before this fix, Vector could not recover from silently closed TCP connections. With this fix, Vector now uses keepalive probes to detect and automatically re-establish unresponsive TCP connections.
Release Note Type:
Bug Fix
Intelligence Requested:
Market:

Sprint:
Logging - Sprint 277, Logging - Sprint 278
Severity:
Critical

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

When Vector is configured to forward logs to syslog (socket sink) and the TCP session is killed or dropped for any reason (firewall, load balancer, etc.), Vector reports no error, but logs are no longer sent. Vector cannot forward logs again until the collector pods are restarted.

This issue is confirmed upstream in https://github.com/vectordotdev/vector/issues/4933.

Version-Release number of selected component (if applicable):

All Logging versions using Vector and syslog output (socket sink using TCP)

How reproducible:

Detailed in the upstream bug.

Steps to Reproduce:

The steps for reproducing it are detailed in the upstream issue https://github.com/vectordotdev/vector/issues/4933#issuecomment-1185617943

Actual results:

Vector stops forwarding logs over a half-closed network connection and does not retry until the collector pods are restarted and new TCP connections are created.

Expected results:

Vector detects that the TCP connection no longer works and creates a new TCP connection.

Additional info:

Not tested, but the same problem should affect other sinks that use TCP, such as Elasticsearch, because Vector has not implemented TCP keepalive.

Workaround

Restart the collector pods to create new TCP connections, or:

1. Set the variables

    $ cr="collector"
    $ ns="openshift-logging"

2. Set the ClusterLogForwarder CR to "Unmanaged"

    $ oc patch obsclf/$cr -n $ns -p '{"spec":{"managementState": "Unmanaged"}}' --type=merge
    clusterlogforwarder.observability.openshift.io/collector patched

3. Extract the collector configmap. This extracts the files "run-vector.sh" and "vector.toml"

    $ mkdir config
    $ cd config/
    $ oc extract cm/$cr-config -n $ns
    run-vector.sh
    vector.toml

4. Modify "vector.toml" to add a keepalive setting to each syslog sink

    $ servers=$(oc get obsclf/$cr -n $ns -o jsonpath='{.spec.outputs[?(.type=="syslog")].name}')
    $ for server in $(echo $servers|tr "-" "_"); do echo $server; sed -i  "/sinks\.output_$server\]/a keepalive.time_secs = 60" vector.toml ; done

5. Delete the current Vector configuration

    $ oc delete cm $cr-config -n $ns

6. Recreate the configmap

    $ oc create configmap $cr-config --from-file=run-vector.sh --from-file=vector.toml -n $ns

7. Restart the collector pods to use the new configuration

    $ oc delete pods -l app.kubernetes.io/component=collector -n $ns
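
Before applying step 4 to the real configuration, the sed edit can be tried on a throwaway file. The following is a minimal sketch, not taken from the actual collector configuration: the output names "rsyslog-east" and "rsyslog-west" and the sample sink sections are hypothetical stand-ins, and it assumes GNU sed (for "-i" without a suffix and the one-line "a" command).

```shell
#!/bin/sh
# Hypothetical syslog output names; in the workaround these come from the
# ClusterLogForwarder CR (step 4).
servers="rsyslog-east rsyslog-west"

# Simplified stand-in for the extracted vector.toml (assumed layout).
workdir=$(mktemp -d)
cat > "$workdir/vector.toml" <<'EOF'
[sinks.output_rsyslog_east]
type = "socket"
mode = "tcp"

[sinks.output_rsyslog_west]
type = "socket"
mode = "tcp"
EOF

# Same edit as step 4: convert "-" to "_" in each name, then append the
# keepalive setting right after the matching sink header.
for server in $(echo "$servers" | tr "-" "_"); do
  sed -i "/sinks\.output_$server\]/a keepalive.time_secs = 60" "$workdir/vector.toml"
done

# Count the inserted lines; there should be one per syslog output.
added=$(grep -c '^keepalive.time_secs = 60$' "$workdir/vector.toml")
echo "keepalive lines added: $added"
cat "$workdir/vector.toml"
```

If the count does not match the number of syslog outputs, the sink names in "vector.toml" probably differ from the pattern the sed expression expects, and the file should be inspected before recreating the configmap.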

clones

LOG-7502 Vector stops of log forwarding when the TCP session is killed

Closed

is cloned by

LOG-7753 [release-6.2] Vector stops of log forwarding when the TCP session is killed

Verified

links to

openshift/cluster-logging-operator#3112: [release-6.3] LOG-7502: Vector stops log forwarding when the TCP session is killed

Assignee: Calvin Lee

Reporter: Oscar Casal Sanchez

QA Contact: Kabir Bharti

Votes: 0

Watchers: 2

Created: 2025/09/18 9:10 PM

Updated: 2025/10/08 7:26 PM

Resolved: 2025/10/08 7:25 PM
