Bug
Resolution: Done-Errata
Major
Logging 5.5.z
False
None
False
NEW
VERIFIED
Bug Fix
High
Log Collection - Sprint 238, Log Collection - Sprint 239
Important
Description of problem:
Vector pods running on the worker nodes intermittently panic and become unresponsive.
After the pods are restarted manually, logs are processed successfully again.
The relevant entries in the Vector pod logs are:
2023-06-26T03:14:23.503804380Z 2023-06-26T03:14:23.487362Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=Failed to make HTTP(S) request: error trying to connect: dns error: failed to lookup address information: Name or service not known component_kind="sink" component_type="elasticsearch" component_id=external_elasticsearch_ecp component_name=external_elasticsearch_ecp
2023-06-26T03:14:28.566100553Z 2023-06-26T03:14:28.566022Z ERROR kube_client::client::builder: failed with error error trying to connect: dns error: failed to lookup address information: Name or service not known
2023-06-26T03:14:28.566100553Z thread 'vector-worker' panicked at 'all branches are disabled and there is no else branch', src/kubernetes/reflector.rs:26:9
2023-06-26T03:14:28.566152028Z note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2023-06-26T03:14:33.615156912Z 2023-06-26T03:14:33.615108Z ERROR kube_client::client::builder: failed with error error trying to connect: dns error: failed to lookup address information: Name or service not known
2023-06-26T03:14:33.615156912Z thread 'vector-worker' panicked at 'all branches are disabled and there is no else branch', src/kubernetes/reflector.rs:26:9
2023-06-26T03:14:33.844423262Z 2023-06-26T03:14:33.844348Z ERROR sink{component_kind="sink" component_id=default component_type=elasticsearch component_name=default}: vector::internal_events::http_client: HTTP error. error=error trying to connect: dns error: failed to lookup address information: Name or service not known error_type="request_failed" stage="processing"
2023-06-26T03:14:33.844477233Z 2023-06-26T03:14:33.844433Z WARN sink{component_kind="sink" component_id=default component_type=elasticsearch component_name=default}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: error trying to connect: dns error: failed to lookup address information: Name or service not known
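To check how often the worker thread panics in a captured pod log, the panic lines can be extracted with a regular expression. This is a minimal Python sketch, assuming the log has been saved as plain text. The helper name `find_panics` is hypothetical, and the pattern is based only on the panic line format shown in the log above.

```python
import re

# Pattern for the Rust panic line format seen in the Vector pod log above.
PANIC_RE = re.compile(
    r"thread '(?P<thread>[^']+)' panicked at '(?P<msg>[^']*)', (?P<loc>\S+)"
)

def find_panics(log_text):
    """Return (thread, message, source location) for every panic in log_text."""
    return [(m["thread"], m["msg"], m["loc"]) for m in PANIC_RE.finditer(log_text)]

# Two lines copied from the log in this report, used as sample input.
sample = (
    "2023-06-26T03:14:28.566100553Z thread 'vector-worker' panicked at "
    "'all branches are disabled and there is no else branch', "
    "src/kubernetes/reflector.rs:26:9\n"
    "2023-06-26T03:14:28.566152028Z note: run with `RUST_BACKTRACE=1` "
    "environment variable to display a backtrace\n"
)

for thread, msg, loc in find_panics(sample):
    print(f"{thread}: {msg} ({loc})")
```

Running the same function over a full pod log would show whether every panic originates from the same source location.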
Version-Release number of selected component (if applicable):
RHOL 5.6.7
RHOL 5.7.2
How reproducible:
-
Actual results:
Vector pods panic and logs are lost.
Expected results:
Vector pods keep working properly.
Additional info:
After some investigation, I found this upstream bug --> https://github.com/vectordotdev/vector/issues/12245
It was fixed in Vector release 0.21.0 --> https://vector.dev/releases/0.21.0/#known-issues
RHOL 5.6 and 5.7 use Vector v0.20.1, if I am not mistaken --> https://github.com/ViaQ/vector/tree/release-5.6 and https://github.com/ViaQ/vector/tree/release-5.7
For the upcoming RHOL 5.8, the Vector release appears to be v0.28.1 --> https://github.com/ViaQ/vector/tree/release-5.8
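The version reasoning above can be sketched as a simple comparison. This is only an illustration, assuming the fix first shipped in Vector 0.21.0 as stated above and that the ViaQ release branches carry no backport (not verified here). The helper names `parse_version` and `has_fix` are hypothetical.

```python
def parse_version(text):
    """Turn 'v0.20.1' or '0.21.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))

# Release reported above as containing the upstream fix.
FIXED_IN = parse_version("0.21.0")

# Vector versions as stated in this report; the 5.6/5.7 value is the
# reporter's understanding, not independently confirmed.
bundled = {
    "RHOL 5.6": "v0.20.1",
    "RHOL 5.7": "v0.20.1",
    "RHOL 5.8": "v0.28.1",
}

def has_fix(version):
    """True if the given Vector version is at or after the fixed release."""
    return parse_version(version) >= FIXED_IN

for release, version in bundled.items():
    status = "includes fix" if has_fix(version) else "affected"
    print(f"{release}: Vector {version} -> {status}")
```

Under these assumptions, 5.6 and 5.7 are affected and 5.8 would pick up the fix through the Vector rebase.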