  Observability and Data Analysis Program / OBSDA-64

Logging strategy in OpenShift 4 for application logs with Kafka


      What is the problem that your customer is facing?

      1. The log forwarder in OCP 4 should forward application logs to ONE Kafka bus, separated per namespace into per-namespace topics. The examples in the Red Hat docs show that this may be solved by pointing the application logs to separate (MULTIPLE) Kafka buses, one per namespace (see the sketch below this list). Is this the only solution?
      2. To work around this limitation, our current OCP 4 setup combines fluentd with a fluentbit-mux: fluentd points the application logs to the fluentbit-mux, which forwards them to Kafka with the correct application topic. Is this combination a setup supported by Red Hat?
      3. Why was the fluentbit-mux capability dropped from the current ClusterLogging framework in OCP 4 in general? Why should all nodes be able to connect to Kafka, instead of collecting the logs at a central point (the fluentbit-mux) and forwarding them to Kafka from there, enriched with the Kubernetes API metadata?
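
      For reference, a minimal sketch of how we read the documented ClusterLogForwarder approach: one input and one Kafka output per namespace, each output carrying its own topic in its URL. Broker address, namespace and topic names are placeholders, and we have not validated this exact resource; it only illustrates the per-namespace wiring the docs describe.

{code:yaml}
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  inputs:
    # One input per tenant namespace (names are placeholders).
    - name: team-a-logs
      application:
        namespaces:
          - team-a
    - name: team-b-logs
      application:
        namespaces:
          - team-b
  outputs:
    # One Kafka output per topic; the topic is the path component of the URL.
    - name: kafka-team-a
      type: kafka
      url: tls://kafka.example.com:9093/team-a-topic
    - name: kafka-team-b
      type: kafka
      url: tls://kafka.example.com:9093/team-b-topic
  pipelines:
    # Each namespace needs its own pipeline wiring its input to its output.
    - name: team-a-pipeline
      inputRefs:
        - team-a-logs
      outputRefs:
        - kafka-team-a
    - name: team-b-pipeline
      inputRefs:
        - team-b-logs
      outputRefs:
        - kafka-team-b
{code}

      Every newly onboarded namespace then requires another input/output/pipeline block in this resource; that manual step is exactly what we would like to avoid.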

      What is the business impact, if any, if this request is not fulfilled?

      The customer is looking for a supported method for sending application logs to Kafka; this needs to be addressed as part of the customer's enterprise OpenShift platform. The impact could well be financial, as this is a large FSI customer.

      What are your expectations for this feature

      A supported implementation that lets the cluster send application logs to Kafka, using Kafka topics (one per namespace) as the keys that separate them.

      Have you done this before and/or outside of support and if yes, how?

      The reason we use fluentbit for separating the app logs is that this solution has worked for us since 3.11 (it works perfectly since the newer fluentbit version that dynamically adds new Kafka topics for newly onboarded namespaces). The log separation and Kafka output parts that are in place in our 3.11 clusters were easy to adapt for the 4.x clusters. In 3.x we had several performance issues with fluentd-based logging (related to the Ruby stack) and with the additional Kafka output plugin (which required a custom image with rubygems dependencies). The issue we are dealing with now is that this logging stack is not supported by Red Hat.
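
      To make this concrete, below is a rough sketch of the kind of fluentbit "mux" pipeline we mean, written in Fluent Bit's YAML configuration format for readability (our real setup uses the classic format and splits collection, done by fluentd on the nodes, from the central mux). The kubernetes filter enriches each record with namespace/pod metadata, the nest filter lifts the namespace name to a top-level key, and the kafka output uses that key as the topic, so newly onboarded namespaces get their own topic without a config change. Broker address, paths and the default topic name are illustrative.

{code:yaml}
service:
  flush: 1

pipeline:
  inputs:
    # Collection stage; in our real deployment fluentd tails the node logs
    # and forwards them to the central fluentbit mux instead.
    - name: tail
      tag: kube.*
      path: /var/log/containers/*.log

  filters:
    # Enrich each record with Kubernetes metadata (namespace, pod, labels).
    - name: kubernetes
      match: kube.*

    # Lift the nested "kubernetes" map so that the namespace name becomes a
    # top-level key (kubernetes_namespace_name) usable as a topic selector.
    - name: nest
      match: kube.*
      operation: lift
      nested_under: kubernetes
      add_prefix: kubernetes_

  outputs:
    # Single Kafka output: the topic is read from the record's namespace key,
    # and topics not listed under "topics" are added on the fly.
    - name: kafka
      match: kube.*
      brokers: kafka.example.com:9092
      topics: app-logs-default
      topic_key: kubernetes_namespace_name
      dynamic_topic: on
{code}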

      With our first steps on 4.5/4.6 we wanted to drop this custom solution and followed the official OCP 4 logging documentation. We set up the logging operator (based on fluentd) with a custom config map to split the app logs by namespace. These splits should have been routed into different logging pipelines, each representing a namespace-related Kafka topic. We offer a service to our customers that enriches all container logs with Kubernetes metadata (the muxer function), splits the log entries by the namespace of their pods, and forwards them to namespace-dedicated Kafka topics. These topics are transported over one logging bus and can afterwards be consumed by our customers (customer ELK stacks).

      Due to internal regulatory requirements and performance concerns, we can send these app logs neither into one single Kafka topic nor through one single log collector instance. After the final Kafka log-shipping step our service responsibility is fulfilled, and we rely on the Kafka service to transport the logs to the consumers, i.e. our customers.

      Looking at the official OCP 4.x logging setup, we asked ourselves whether the setup we have chosen since 3.x is simply uncommon among other customers and therefore not something Red Hat offers: one logging bus with several independent, customer-related Kafka topics that individual customers can consume via Kafka.

      From the official documentation we cannot see how this solution could be implemented and operated with the fluentd/logging operator setup.

      We also wondered why Red Hat dropped the muxer concept offered by fluentd/fluentbit with 4.x; it was introduced in 3.x with arguments about performance (hitting the Kubernetes API for each log entry) and security (instead of running the muxer on the infra nodes, all nodes now put load on the Kubernetes API servers).

            Assignee: Unassigned
            Reporter: Andy Bartlett (rhn-support-andbartl)
            Votes: 0
            Watchers: 3