[OBSDA-482] Enable option for debugging Vector for troubleshooting

Type: Feature
Resolution: Unresolved
Priority: Major
Fix Version/s: Logging 6.0
Affects Version/s: Logging 5.7, Logging 5.8
Component/s: Log Collection, PM Logging
Labels:
- CEE.neXT

Blocked: False
Blocked Reason: None
Ready: False
Color Status: Not Selected


Proposed title of this feature request

Enable option for debugging Vector for troubleshooting

What is the nature and description of the request?

With fluentd, debugging was straightforward. Some examples:

- Enabling the debug option (although this required moving to Unmanaged).
- Checking the content of the forwarded log data: it was easy to review the buffer files and confirm that they held the expected content, including metadata.
- Counting the buffer files in the queue, which indicated delays or problems forwarding logs to a specific output (this can probably be observed in the metrics now).
- Verifying that a specific configuration, for example enabling JSON parsing, really results in the content being parsed as expected and sent to the expected index.
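
As a rough illustration of the metrics-based check mentioned in the list above, the following sketch sums a buffer gauge per component from Prometheus text-format output. It assumes the collector exposes a gauge named `vector_buffer_events` with a `component_id` label; both names should be verified against the deployed Vector version. The sample text is invented purely to exercise the parser, and the parser itself is simplified (it does not handle commas inside label values).

```python
from collections import defaultdict


def buffered_events_by_component(metrics_text, metric="vector_buffer_events"):
    """Sum one gauge per component_id from Prometheus text exposition output.

    The metric and label names are assumptions; check them against the
    metrics actually exposed by the collector.
    """
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, rest = line.partition("{")
        if name != metric or not rest:
            continue  # different metric, or a sample without labels
        labels_part, _, value_part = rest.rpartition("}")
        labels = {}
        for pair in labels_part.split(","):
            key, sep, val = pair.partition("=")
            if sep:
                labels[key.strip()] = val.strip().strip('"')
        fields = value_part.split()
        if not fields:
            continue
        totals[labels.get("component_id", "unknown")] += float(fields[0])
    return dict(totals)


# Invented sample input, for illustration only.
SAMPLE = """\
# TYPE vector_buffer_events gauge
vector_buffer_events{component_id="output_a",component_type="elasticsearch"} 120
vector_buffer_events{component_id="output_b",component_type="loki"} 0
vector_component_sent_events_total{component_id="output_a"} 5000
"""

if __name__ == "__main__":
    for component, events in buffered_events_by_component(SAMPLE).items():
        print(f"{component}: {events:.0f} buffered events")
```

A count that keeps growing for one output would play the same role that a growing number of fluentd buffer files used to play.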

Because Vector works in memory, we are effectively blind when troubleshooting: it is hard to understand and explain why things are happening.

It is therefore desirable to have an easy way to enable debugging in order to troubleshoot and identify possible problems, ideally while remaining in a "Managed" state.

Instructions for dumping the data that Vector holds in memory, so that it can be reviewed, are also needed.

Why does the customer need this? (List the business requirements)

Any kind of troubleshooting related to Vector requires more detail about what is happening: the content of the forwarded logs, the indices to which the content is sent, and so on.

List any affected packages or components.

ClusterLogging - ClusterLogForwarder API

Vector

NOTES:

From grooming on Dec 19, there are three parts to this request:
1. Add the ability to set Vector's log level (debug) while in a Managed state
2. Add the ability to do a "core dump" of Vector data/state
3. Add a feature similar to vector tap (delivered)

What is needed is exactly what vector tap provides: https://vector.dev/guides/level-up/vector-tap-guide/. See also https://vector.dev/guides/level-up/troubleshooting/
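
To make the vector tap part of the request concrete, here is a minimal sketch that assembles an `oc exec` command line running `vector tap` inside a collector pod. The namespace `openshift-logging`, the container name `collector`, the pod name, and the component ID are all assumptions for illustration and will differ per cluster. `vector tap` also needs the Vector API to be enabled in the running collector.

```python
import shlex


def build_tap_command(pod, component_id, namespace="openshift-logging",
                      container="collector", limit=10):
    """Return an argv list that runs `vector tap` in a collector pod.

    namespace, container, pod, and component_id are hypothetical defaults
    or placeholders; adjust them to the actual deployment.
    """
    return [
        "oc", "exec", "-n", namespace, pod, "-c", container, "--",
        "vector", "tap",
        "--outputs-of", component_id,  # sample events leaving this component
        "--format", "json",
        "--limit", str(limit),         # max events sampled per interval
    ]


if __name__ == "__main__":
    # Placeholder pod and component names; nothing is executed here.
    cmd = build_tap_command("collector-abcde", "output_default")
    print(shlex.join(cmd))
```

The function only builds the argument list, so it can be inspected or logged before being passed to something like `subprocess.run`. Note that `vector tap` keeps streaming until it is interrupted.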

incorporates

OBSDA-620 Enable Stream (stdout/stderr) information in logs in vector (Release Pending)

is blocked by

LOG-5262 Log Collection 6.0 Tech Debt (Closed)

LOG-5295 Investigate tasks to support troubleshooting Vector (Closed)

is related to

LOG-4556 Enable vector API and CLI (Closed)

links to

[KCS] How to change the log level in the Vector in RHOCP 4

[KCS] How to debug Vector in RHOCP 4

openshift/cluster-logging-operator#2174: LOG-4556: Enable vector API and CLI

ViaQ/vector#152: LOG-4556: Enable vector API and CLI

mentioned on

Merge request - Updated 2 upstream sources

Merge request - Updated US source to: 4c21e86 LOG-4556: Enable vector API and CLI


Jeffrey Cantrill added a comment - 2024/02/21 3:08 PM

> I was reviewing the pull-request from LOG-4556 and inside is not present a way to modify it in the end-user for setting tap and being able to us

rhn-support-ocasalsa Please clarify what you mean here. My experience is anyone who is able to jump on the running container can use 'vector top'. I'm not certain what exactly there is to modify or enable for its use. Is it a separate binary not present in the production image?


GitLab CEE Bot added a comment - 2023/09/26 5:26 PM

CPaaS Service Account mentioned this issue in a merge request of openshift-logging / Log Collection Midstream on branch openshift-logging-5.8-rhel-9_upstream_2ea2330f9798009afde64ad1eb3adbde:

Updated 2 upstream sources


GitLab CEE Bot added a comment - 2023/09/26 11:19 AM

CPaaS Service Account mentioned this issue in a merge request of openshift-logging / Log Collection Midstream on branch openshift-logging-5.7-rhel-8_upstream_fb7ca74910c1e8efabe58996df2ab695:

Updated US source to: 4c21e86 LOG-4556: Enable vector API and CLI


GitLab CEE Bot added a comment - 2023/09/25 7:50 PM

CPaaS Service Account mentioned this issue in a merge request of openshift-logging / Log Collection Midstream on branch openshift-logging-5.8-rhel-9_upstream_ea9378f8029c95668f7b7458e58c5929:

Updated US source to: 4c21e86 LOG-4556: Enable vector API and CLI


Emmanuel Kasprzyk added a comment - 2023/08/25 10:38 AM - edited

> For checking the content of the data log forwarded, it was easy to review the content of the buffer files and if the content of it was the expected: metadata

Would collecting a core dump (as the CRI-O team sometimes asks for: https://access.redhat.com/solutions/5488871) be something engineering would consider in such a case?

