Multi-line logs, such as stack traces, carry valuable information for debugging and troubleshooting application problems. Unfortunately, collecting and storing them correctly is difficult for two main reasons:
- If your logs are not written as JSON, most log management systems treat each line of a multi-line log as a separate event.
- Different formats (e.g., stack traces from different programming languages) and different runtimes make it difficult to determine where a log message begins and ends.
Without support for multi-line logs, troubleshooting becomes much harder and sometimes impossible (e.g., the lines of a single stack trace may be stored out of order). There are two main ways to handle multi-line logs:
- Log everything in JSON format.
- Use a log collector to assemble multi-line logs into a single event.
In some situations, however, you cannot log in JSON. Switching might require changes to your code or logging strategy that you are not in a position to make, or your environment might use a third-party logging tool that cannot be configured to write JSON.
As described above, reassembling multi-line logs can be difficult. We may want to expose a new configuration option that lets customers tell the log collector which patterns mark the beginning of a new log entry. Since most RFE cases concern stack traces, we may also want to provide a simplified API in which customers only specify the programming languages they care about, so that Logging can configure the log collector automatically with the appropriate patterns for those languages. A simplified API would also make it easier to migrate customers to a different log collector, if necessary.
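The start-of-entry approach described above can be illustrated with a minimal Python sketch (the document has no code of its own, so the language is an arbitrary choice). The function `assemble`, the timestamp pattern, and the sample log are all hypothetical and do not reflect the configuration API or behavior of any particular log collector. Every line that matches the start pattern opens a new event, and every other line is appended to the current one. Under this model, a language preset in the simplified API could resolve to such a start pattern.

```python
import re
from typing import Iterable, Iterator, List


def assemble(lines: Iterable[str], start_pattern: str) -> Iterator[str]:
    """Join physical log lines into logical events.

    A line matching `start_pattern` (anchored at the start of the line)
    begins a new event; any other line continues the current event.
    """
    start = re.compile(start_pattern)
    buffer: List[str] = []
    for line in lines:
        if start.match(line) and buffer:
            # A new entry begins, so emit the event collected so far.
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        # Flush the final event at end of input.
        yield "\n".join(buffer)


# Hypothetical start-of-entry pattern: a "YYYY-MM-DD HH:MM:SS" timestamp.
TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

# Hypothetical sample log with one embedded Java stack trace.
log = [
    "2021-06-01 12:00:00,001 INFO  Starting service",
    "2021-06-01 12:00:01,002 ERROR Request failed",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Handler.handle(Handler.java:42)",
    "\tat com.example.Server.run(Server.java:17)",
    "2021-06-01 12:00:02,003 INFO  Recovered",
]

events = list(assemble(log, TIMESTAMP))
```

Here the ERROR line and the three stack-trace lines that follow it become one event, giving three events in total. A real collector also needs a timeout to flush the last buffered event, because it cannot know when a stream has stopped producing continuation lines.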
Goal & Success
- If logs cannot be parsed as JSON, OpenShift Logging should provide a configuration option to reassemble multi-line logs into a single event, so that users can search them and use them to identify problems more easily.
- A user must be able to configure patterns for stack traces so that the log collector joins them into a single event.
- A user should be able to configure patterns for multi-line logs other than stack traces.
Use Case: Company X runs more than a hundred Java-based microservices. These were recently broken out of several legacy monolithic applications and are developed and owned by multiple teams. The company wants to keep using the logging standards it put in place a year ago to avoid rebuilding or changing more of its code base, so it cannot use JSON. Now that the services are in production, the AppSRE team constantly hits the problem that OpenShift Logging collects and stores Java stack traces as individual log records, mostly out of order, which makes identifying the actual problem much harder and in many cases impossible.
- The AppSRE team configures Logging to match Java stack trace patterns, so that all lines belonging to one stack trace are stored as a single event.
- AppSRE logs into Kibana, queries a specific time frame, and sees each stack trace as a single event again.
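For the Java use case, a collector does not necessarily need to know the application's log-line prefix; it can instead recognize lines that continue a stack trace. The Python sketch below takes that continuation-based approach as an alternative to a start-of-entry pattern. The regexes in the hypothetical `JAVA_CONTINUATION` list (frame lines, `Caused by:`/`Suppressed:` headers, `... N more` markers, and a fully qualified exception class name) are heuristics assumed for illustration, not an exhaustive description of Java stack trace output or of any patterns shipped with OpenShift Logging; `is_continuation`, `assemble_java`, and the sample log are likewise hypothetical.

```python
import re
from typing import Iterable, Iterator, List

# Heuristic patterns for lines that continue a Java stack trace. These are
# illustrative assumptions, not a complete grammar of Java stack trace output.
JAVA_CONTINUATION = [
    re.compile(r"^\s+at \S"),                     # frame: "\tat com.example.Foo.bar(Foo.java:10)"
    re.compile(r"^\s*\.\.\. \d+ more"),           # elided frames: "\t... 3 more"
    re.compile(r"^\s*(Caused by|Suppressed): "),  # chained or suppressed exception
    # Exception header such as "java.lang.IllegalStateException: boom"
    re.compile(r"^([A-Za-z_$][\w$]*\.)+[A-Z][\w$]*(Exception|Error|Throwable)(: .*)?$"),
]


def is_continuation(line: str) -> bool:
    """Return True if `line` looks like part of a Java stack trace."""
    return any(p.match(line) for p in JAVA_CONTINUATION)


def assemble_java(lines: Iterable[str]) -> Iterator[str]:
    """Attach stack-trace lines to the log line that precedes them."""
    buffer: List[str] = []
    for line in lines:
        if buffer and is_continuation(line):
            # Part of the current stack trace: keep it in the same event.
            buffer.append(line)
            continue
        if buffer:
            # Not a continuation: emit the previous event.
            yield "\n".join(buffer)
        buffer = [line]
    if buffer:
        yield "\n".join(buffer)


# Hypothetical sample log with a chained exception.
log = [
    "ERROR Request failed",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Handler.handle(Handler.java:42)",
    "Caused by: java.io.IOException: disk full",
    "\tat com.example.Store.write(Store.java:7)",
    "\t... 1 more",
    "INFO Recovered",
]

events = list(assemble_java(log))
```

With this sample, the first six lines collapse into one event and `INFO Recovered` stays separate. Heuristics like these can misfire, for example on an application log line that happens to look like an exception class name, or on exception messages that themselves span several lines.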
Open Questions & Key Decisions (optional)
- How does CRI-O treat multi-line logs?
- In addition to stack traces, do we also want to provide a general regex-based configuration that allows assembling any multi-line log?
- Where do we want to provide this functionality? Should it be a global function (a regex applied to all incoming logs), or should users configure patterns only for specific apps/services selected by certain criteria?
Is blocked by: LOG-843 Assemble multi-line exception stacktrace log messages