- Bug
- Resolution: Done
- Critical
- None
- None
- 5
- False
- True
We're receiving a lot of Splunk alerts from the following query:
index=rh_rhsm namespace=rhsm-prod "TooManyMessagesWithoutAckException" | rex field=message ".+has waited for (?<seconds>\d+).*" | table host, seconds, _time
Error message:
2025-05-21 05:22:21,184 WARN [io.sma.rea.mes.kafka] (vert.x-eventloop-thread-0) SRMSG18228: A failure has been reported for Kafka topics '[platform.rhsm-subscriptions.offering-sync]': io.smallrye.reactive.messaging.kafka.commit.KafkaThrottledLatestProcessedCommit$TooManyMessagesWithoutAckException: The record 498888 from topic/partition 'platform.rhsm-subscriptions.offering-sync-19' has waited for 64 seconds to be acknowledged. At the moment 163 messages from this partition are awaiting acknowledgement. The last committed offset for this partition was 498887.
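For triage, the fields in this message can also be extracted outside Splunk, in the same spirit as the rex in the query above. The following is only a minimal sketch, assuming Java 16+ (for records) and assuming the message wording stays exactly as in the sample above. The names AckTimeoutParser and AckTimeout are hypothetical helpers and are not part of the service.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper, not part of the swatch code base.
class AckTimeoutParser {

    /** Fields pulled from one TooManyMessagesWithoutAckException message. */
    record AckTimeout(long offset, String topicPartition, int seconds,
                      int pending, long lastCommitted) {}

    // Mirrors the wording of the sample log line; an assumption that may
    // break if the library changes its message text.
    private static final Pattern PATTERN = Pattern.compile(
        "The record (\\d+) from topic/partition '([^']+)' has waited for (\\d+) seconds"
        + ".*?(\\d+) messages from this partition are awaiting acknowledgement"
        + ".*?last committed offset for this partition was (\\d+)");

    /** Returns the parsed fields, or empty if the line is not an ack-timeout message. */
    static Optional<AckTimeout> parse(String logLine) {
        Matcher m = PATTERN.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new AckTimeout(
            Long.parseLong(m.group(1)),
            m.group(2),
            Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4)),
            Long.parseLong(m.group(5))));
    }

    public static void main(String[] args) {
        // Pass a log line as the first argument to see the extracted fields.
        if (args.length > 0) {
            parse(args[0]).ifPresentOrElse(
                t -> System.out.println(t.topicPartition() + " waited " + t.seconds()
                    + "s, pending=" + t.pending()),
                () -> System.out.println("not an ack-timeout message"));
        }
    }
}
```

Comparing the record offset with the last committed offset shows whether the oldest unacknowledged record is the one immediately after the last commit (as in the sample, 498888 vs 498887), which would point at a single slow or stuck record holding up the rest of the partition.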
When this error happens, the service gets restarted, which in turn causes other issues such as:
2025-05-21 05:16:38,979 ERROR [io.sma.rea.mes.provider] (executor-thread-1) SRMSG00200: The method com.redhat.swatch.contract.service.OfferingSyncTaskConsumer#consumeFromTopic has thrown an exception [Error Occurred After Shutdown]: jakarta.ws.rs.ProcessingException: java.io.IOException: Connection was closed
at org.jboss.resteasy.reactive.client.handlers.ClientSendRequestHandler$4.handle(ClientSendRequestHandler.java:392)
at org.jboss.resteasy.reactive.client.handlers.ClientSendRequestHandler$4.handle(ClientSendRequestHandler.java:383)
at io.vertx.core.impl.future.FutureImpl$2.onFailure(FutureImpl.java:117)
We need to investigate why the messages are not being acked within the expected time so that we can avoid these service restarts.