Uploaded image for project: 'Managed Service - Streams'
  1. Managed Service - Streams
  2. MGDSTRM-10042

Investigate why KafkaColocatingBrokers alert fired almost 100 times in a week

XMLWordPrintable

    • MK - Sprint 231

      WHAT

      Identify why the KafkaColocatingBrokers alert is firing so frequently.

      WHY

      During Oct. 21 ~ Oct. 28, the `KafkaColocatingBrokers` alert fired 93 times but seems to resolve itself. This is unexpected behaviour because brokers can't move between AZs. This is because the storage volumes for individual brokers is tied to an AZ. So we would not expect this alert to fire and resolve itself.

      How

      Investigate when this alert has fired, identify the cause and fix if possible.
      The alert is coded here: https://github.com/bf2fc6cc711aee1a0c2a/observability-resources-mk/blob/main/resources/prometheus/prometheus-rules.yaml#L160
      There is a unit test here: https://github.com/bf2fc6cc711aee1a0c2a/observability-resources-mk/blob/main/resources/prometheus/unit_tests/KafkaColocatingBrokers.yaml

      We suspect it's firing during a restart, perhaps when a pod moves from one node to another and that the query is mistakenly seeing this as two brokers being colocated.

      To investigate:

              keithbwall Keith Wall
              lukchen@redhat.com Luke Chen
              Kafka Integrations
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated:
                Resolved: