Subscription Watch / SWATCH-4494

Alert when malformed messages are received in the UMB consumers

    • Type: Task
    • Resolution: Unresolved
    • Priority: Critical
    • swatch-contracts
    • 3
    • subs-swatch-lightning
    • Swatch Lightning Sprint 9

      We need to set up monitoring to catch when the swatch-contracts service receives malformed UMB messages that can't be parsed properly.

      In SWATCH-4490, we fixed an issue where the service was crashing when it received invalid JSON messages (like a simple "test" string instead of proper contract data). The fix now handles these cases gracefully by:

      • Supporting both byte[] and String message types
      • Logging errors when messages can't be parsed
      • Preventing service restarts by using the accept failure strategy
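
      The behavior listed above can be illustrated with a small, language-neutral sketch. The actual fix lives in the Java service and is not reproduced here; the Python below only mirrors the idea, assuming a hypothetical function name (parse_umb_message) and logger name, and reusing two of the error strings that this ticket's query searches for. Malformed input is logged and reported as None instead of raising, so the consumer keeps running.

```python
import json
import logging
from typing import Any, Optional

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("umb_consumer_sketch")


def parse_umb_message(payload: Any) -> Optional[dict]:
    """Return the parsed message, or None when the payload is malformed.

    Illustration only: accepts bytes or str, logs an error for anything
    it cannot parse, and never raises, so the caller is not restarted.
    """
    if isinstance(payload, bytes):
        # Undecodable bytes become replacement characters and then
        # fail JSON parsing below instead of raising here.
        text = payload.decode("utf-8", errors="replace")
    elif isinstance(payload, str):
        text = payload
    else:
        logger.error("Unsupported message type: %s", type(payload).__name__)
        return None

    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        logger.error("Unable to read UMB message from JSON")
        return None

    # A bare JSON scalar (for example a quoted string) is not contract data.
    if not isinstance(parsed, dict):
        logger.error("Unable to read UMB message from JSON")
        return None
    return parsed
```

      For instance, parse_umb_message("test") logs an error and returns None, while a well-formed JSON object passed as either bytes or str is returned as a dictionary.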

      However, although the service no longer crashes, malformed messages are now only logged and then dropped. If this happens in production, nobody will notice unless someone actively searches the logs. That could hide issues with our message producers or point to problems with the message broker.

      We need an alert that triggers when these parsing errors occur so we can investigate and fix the root cause. For example:

      index=<your-contracts-index> 
      (
        "Unable to read UMB message from JSON" 
        OR "Unsupported message type" 
        OR "Deserialized object is not a String" 
        OR "Failed to deserialize Java object"
      )
      | stats count by message, exception, logger_name
      | where count > 0
      

      This query looks for the specific error messages that were added in the fix:

      • JSON parsing failures in UMB consumers like ContractUMBMessageConsumer
      • Unsupported message types (neither byte[] nor String)
      • Java deserialization errors in MessageUtils

      The alert should trigger when any of these messages appear, as they indicate something is sending malformed data to our contract queue.
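
      To check the alert condition locally before wiring it into the logging platform, the same matching logic can be sketched in Python. This is only an approximation under stated assumptions: the function names (count_parse_errors, should_alert) are hypothetical, the input is a plain list of log lines, and matching is an exact, case-sensitive substring test, which may differ from how the log platform matches terms.

```python
from collections import Counter
from typing import Iterable

# The four error strings searched for by the query in this ticket.
ERROR_PATTERNS = (
    "Unable to read UMB message from JSON",
    "Unsupported message type",
    "Deserialized object is not a String",
    "Failed to deserialize Java object",
)


def count_parse_errors(log_lines: Iterable[str]) -> Counter:
    """Count matching log lines per error pattern.

    A line is counted once for every pattern it contains.
    """
    counts: Counter = Counter()
    for line in log_lines:
        for pattern in ERROR_PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts


def should_alert(log_lines: Iterable[str]) -> bool:
    """Mirror the `where count > 0` condition: alert on any match."""
    return sum(count_parse_errors(log_lines).values()) > 0
```

      Feeding a captured log sample through should_alert gives a quick sanity check that the four strings still match what the service actually logs.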

      Note that this work might not be necessary since we're migrating from UMB to Kafka, so this depends on SWATCH-3899.
      Also, our alerts currently live in Splunk, which we're migrating to Sumo Logic, so the implementation might change after SWATCH-4374 is done.

      Acceptance Criteria

      • Create the alert in Sumo Logic, not Splunk. See guide here.
      • The alert is configured to send email notifications.

              rh-ee-tlencion Tommaso Lencioni
              jcarvaja@redhat.com Jose Carvajal Hilario