Type: Epic
Resolution: Done
Priority: Critical
Fix Version/s: Logging 5.1
Affects Version/s: None
Component/s: Log Collection, Log Storage
Labels:

Epic Name: JSON Logs
Epic Status: Done
Hierarchy Progress Bar: 0% To Do, 0% In Progress, 100% Done
Release Note Text:

JSON logs can now be forwarded as JSON objects, rather than quoted strings.
For the default Elasticsearch store, JSON logs with different formats can be directed to different indices.
See the documentation for ClusterLogForwarder for details.


SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Goals

Forward records to any output type that are valid JSON objects and contain the original JSON payload as a JSON object.
Provide sufficient flexibility to combine JSON payloads and additional JSON metadata to suit common cases.
Handle a mixture of JSON and non-JSON log entries from the same source correctly.
Allow multi-tenant querying for fields inside a JSON document from Kibana.

Non-Goals

General-purpose JSON queries and transformations.
Recording or validating user-defined schema. The forwarder only identifies indices by name.
Support for nested JSON objects (objects inside an object) due to some Elasticsearch limitations for dynamic mapping.
Individual rollover policies for the related indices in Elasticsearch.
Preserving JSON structure for logs generated by containers in a platform-related namespace (e.g. any openshift-*).

Motivation

When applications write structured JSON logs, consumers want to access fields of JSON log entries for indexing and other purposes. The current logging data model stores the log entry as a JSON string, not a JSON object. Consumers can't access the log entry fields without a second JSON parse of this string. The current implementation also 'flattens' labels to work around Elasticsearch limitations.
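The double parse described above can be illustrated with a minimal Python sketch. The record layout here is a simplified assumption for illustration: a `message` field holds the application's log line as a string, and a `structured` field holds the parsed object. The field names inside the application log line are hypothetical.

```python
import json

# Hypothetical application log line (the fields inside are illustrative).
app_line = '{"level": "info", "user": "alice"}'

# Current model: the log entry is embedded as a quoted JSON string.
record_as_string = json.dumps({"message": app_line})

# A consumer has to parse twice to reach a field of the log entry.
outer = json.loads(record_as_string)
inner = json.loads(outer["message"])
level_via_double_parse = inner["level"]

# Goal of this epic: the entry is forwarded as a JSON object.
record_as_object = json.dumps({"structured": json.loads(app_line)})

# A single parse is now enough to reach the same field.
level_via_single_parse = json.loads(record_as_object)["structured"]["level"]
```

Both variables end up holding the same value; the difference is only in how much work the consumer has to do to get there.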

Many customers have applications deployed on OpenShift (built by over 1,000 developers) that write their log messages in nested JSON format (e.g. by using log4j). They can't tell all their developers to change all their applications to stop doing that just because we can't support them.

JSON, or structured logging, is probably the most common format used in the Java world, and our inability to process and store those logs impacts not only our customers but also partners such as IBM WebSphere. It will essentially lessen the value of our logging solution.

Alternatives

We do not process any JSON logs, and customers always need to forward JSON logs to a custom log normalizer (e.g. logstash) to do the parsing, or use third-party systems that parse incoming logs before storing them (e.g. DataDog).

Since JSON logs are not just an edge case but are used by a majority of our customers in one way or another, this alternative would lessen the need for our log solution and therefore the value we provide.

Acceptance Criteria

Verify that parse only allows "JSON" or "json" and no other value.
Verify that if parse is not defined in a pipeline, there is no structured field in the resulting log record.
Verify that if structuredTypeKey and structuredTypeName are both defined and the value for structuredTypeKey is empty or missing, structuredTypeName is used as the index for the forwarded log record.
Verify that structuredTypeKey always takes precedence over structuredTypeName if both are defined and the key is present on a record.
Verify that if a record does not get assigned an index name by the structured... fields, then the resulting record has no structured field.
Verify that with structuredTypeKey: kubernetes.namespaceName, the resulting record is sent to the index corresponding to the namespace.
Verify that an index corresponding to what is defined in the record is created inside ES.
Verify that any new index is under the ES index management (specifically rollover).
- All new indices will be under the application source rollover policy. Therefore, what you configure there applies to all other "new" indices.
Verify that you can query a specific field inside the JSON document through the Kibana UI.
Verify that a user does not have access to log messages coming from different namespaces but logged by the same application.
- Deploy the same application into two different namespaces.
- Add an app=myapp label to the related pods.
- Both should now log into the same index called myapp.
- User1 who only has access to namespace1 should not have access to the JSON logs from the other namespace even if both log records are colocated inside the same index.
Verify that if you have multiple apps (e.g. 10) and only a subset (e.g. 2) write JSON logs, we parse only that subset and put those logs into their corresponding indices, while the rest are not parsed and go into the app index.
- We could set up multiple namespaces with multiple apps deployed, where a small subset logs in JSON.
- We could then configure a CLF CR with parse: JSON on the input without any filtering. At this point we try to validate everything that comes in.
- If the incoming message is not valid JSON, we keep the message untouched and it goes to Elasticsearch, which puts it into the app index (since the structured field is missing).
- If the incoming message is valid JSON, we parse it and it goes into a separate ES index.
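The last criterion (mixed JSON and non-JSON sources) can be sketched in Python as follows. This is not the collector's implementation, only an illustration of the expected behaviour. The function name `parse_record` is hypothetical, and two choices are assumptions of this sketch: only JSON objects (not bare scalars or arrays) count as structured, and the original `message` is kept alongside the parsed object.

```python
import json


def parse_record(record):
    """Return a copy of record, adding 'structured' only when message is a JSON object."""
    out = dict(record)
    try:
        parsed = json.loads(record.get("message", ""))
    except (TypeError, ValueError):
        # Not valid JSON: the record is left untouched.
        return out
    if isinstance(parsed, dict):
        out["structured"] = parsed
    return out


records = [
    {"message": '{"level": "warn", "code": 7}'},  # JSON log line
    {"message": "plain text line"},               # non-JSON log line
]
parsed = [parse_record(r) for r in records]
```

Records that come out without a `structured` field are the ones that would land in the app index; the others are candidates for a separate index.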

Risk and Assumptions

Elasticsearch limitations around non-standardized JSON schemas and nested objects inside the JSON message will continue to cause problems. Even Elastic.co suggests standardizing fields and types, and avoiding nested objects where possible. The mapping conflicts are described at https://www.elastic.co/blog/great-mapping-refactoring#conflicting-mappings. We try to work around this by giving users more choice to distribute common JSON schemas to their own indices.
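As a rough illustration of why differing schemas in one index are a problem, the Python sketch below compares two hypothetical JSON log documents and reports fields whose value types disagree. Python type names stand in for Elasticsearch field types here, which is a simplification: real dynamic mapping has its own type rules. The function names and sample documents are assumptions made for this example.

```python
def field_types(doc, prefix=""):
    """Map each dotted field path in doc to a type name for its value."""
    types = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            # Record the object itself, then descend into its fields.
            types[path] = "object"
            types.update(field_types(value, path + "."))
        else:
            types[path] = type(value).__name__
    return types


def mapping_conflicts(doc_a, doc_b):
    """Return fields present in both documents with different value types."""
    a, b = field_types(doc_a), field_types(doc_b)
    return {f: (a[f], b[f]) for f in a.keys() & b.keys() if a[f] != b[f]}


# Two hypothetical applications that log the same field names differently.
app_one = {"status": 200, "user": {"name": "alice"}}
app_two = {"status": "ok", "user": "bob"}

conflicts = mapping_conflicts(app_one, app_two)
```

Documents like these two are the kind that would be better served by separate indices, one per format.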

Documentation Considerations

See https://issues.redhat.com/browse/RHDEVDOCS-2677
Explicitly highlight that although JSON logs go to a dedicated index, users will read them through the app alias. This is especially important when creating index patterns in Kibana.

The following points caused confusion during the implementation; make sure to be clear in the docs:

Final names for the Elasticsearch API fields: structuredTypeName, structuredTypeKey

Don't call these "index" names; they are used to form the index name by adding the 'app-' prefix.
Emphasise that the structured type should identify JSON docs with different formats; it should not be used to identify applications, namespaces, users or any other type of group. Keep the number small.
The "app-" prefix is added to the structured type to form the index name, conforming to the existing Elasticsearch index model.
There is not (currently) any way to route logs to an index that is not named "app-...". This may be a future feature.
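The precedence rules from the acceptance criteria, combined with the "app-" prefix, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the forwarder's code: `lookup` and `structured_index` are hypothetical helper names, and the naive split on "." would not handle label keys that themselves contain dots.

```python
def lookup(record, dotted_key):
    """Follow a dotted path such as 'kubernetes.namespaceName' into nested dicts."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def structured_index(record, type_key=None, type_name=None):
    """Return the index name for a record, or None when no structured type applies."""
    # structuredTypeKey takes precedence when it resolves to a non-empty value.
    structured_type = lookup(record, type_key) if type_key else None
    if not structured_type:
        # Otherwise fall back to the fixed structuredTypeName, if any.
        structured_type = type_name
    if not structured_type:
        return None
    return "app-" + str(structured_type)


record = {"kubernetes": {"namespaceName": "ns1"}}
by_key = structured_index(record, "kubernetes.namespaceName", "fallback")
by_name = structured_index(record, "kubernetes.labels.app", "fallback")
no_type = structured_index(record)
```

A `None` result corresponds to the case where the record gets no structured field and goes to the regular app index.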

Open Questions

Additional Notes


screenshot-1.png
63 kB
2021/07/28 1:07 PM

is documented by

RHDEVDOCS-2521 Document our JSON log entry format.

Closed

RHDEVDOCS-2677 Document "Allow storing and querying of structured logs (JSON)"

Closed

is related to

LOG-1296 Send JSON logs from containers in the same pod to separate indices

Closed

RHDEVDOCS-4029 Send JSON logs from containers in the same pod to separate indices

Closed

links to

openshift/cluster-logging-operator#908: Parsing JSON formatted field on OCP 4.6

Alan Conway added a comment - 2022/01/24 4:45 PM

jvilaca@redhat.com the intent was that the pod labels would continue to be flattened when sending logs to the Elasticsearch store, for backwards compatibility and because otherwise Elasticsearch gets indexing indigestion.

They should appear as a JSON object when sent to other types of output (fluentd, syslog etc.)

FYI: We always parsed the metadata envelope, including labels, into JSON - we actually have to take steps to flatten the labels again, to work around Elasticsearch's indexing problems. Since we will move away from ES as the default store, we want to isolate any ES-specific weirdness to the ES output only.

João Vilaça added a comment - 2022/01/24 11:24 AM

rhn-engineering-aconway We see that pod labels are still being flattened in CNV logs. Does this epic also parse pod labels as JSON or only the pod logs?
/cc sradco

Amr Elganzory added a comment - 2021/08/09 7:08 AM

Enabling this feature is now documented here:

https://docs.openshift.com/container-platform/4.7/logging/cluster-logging-enabling-json-logging.html

David Karlsen (Inactive) added a comment - 2021/07/28 12:57 PM

cvogel1 that slack seems to be internal for RH. Maybe you hang out on #openshift-dev on the k8s slack?

David Karlsen (Inactive) added a comment - 2021/07/27 1:21 PM

where is your slack?

Christian Heidenreich (Inactive) added a comment - 2021/07/27 12:42 PM

davidkarlsen Can you go and drop information (your CLF CRD) into "forum-logging" in our Slack and ask for help! That way, you may get some responses way faster than on this ticket

David Karlsen (Inactive) added a comment - 2021/07/27 12:21 PM

hm, I still do not get any "structured" field for logs, when message contains json.

David Karlsen (Inactive) added a comment - 2021/07/27 12:04 PM

Seems so, https://github.com/openshift/cluster-logging-operator/blob/be12cdf2cc83d9343f93a9eff2cc967f73a2f9d9/test/functional/normalization/json_parsing_test.go#L110

David Karlsen (Inactive) added a comment - 2021/07/27 11:55 AM

rhn-engineering-aconway like this?

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  pipelines:
  - name: all-to-default
    parse: json
    inputRefs:
    - infrastructure
    - application
    - audit
    outputRefs:
    - default

Alan Conway added a comment - 2021/07/26 8:59 PM - edited

davidkarlsen You need to explicitly enable with parse: json on your forwarder's pipeline.
We did not enable by default because attempting a JSON parse on every message would have negative performance effects on users who don't care about structured JSON.

The enhancement proposal may help til we get proper user docs:
https://github.com/openshift/enhancements/blob/master/enhancements/cluster-logging/forwarding-json-structured-logs.md#L33

Assignee: Alan Conway
Reporter: Christian Heidenreich (Inactive)
QA Contact: Giriyamma Karagere Ramaswamy (Inactive)
Votes: 12
Watchers: 46
Created: 2020/05/26 11:31 AM
Updated: 2024/12/20 11:10 PM
Resolved: 2021/07/21 1:24 PM

[LOG-785] Allow storing and querying of structured logs (JSON)
