-
Bug
-
Resolution: Done
-
Major
-
False
-
False
-
CounterfactualExplainer is set to timeout after 60,000ms.
SmallRye's Kafka's connector has a default timeout of 60,000ms in which to ACK a message too.
Should the actual Counterfactual timeout be slightly larger the ACK occurs outside of the configured Kafka timeout and integrity of the channel is lost.
In this example Counterfactuals timeout after 60,00*1*ms leading to catastrophic failure of the channel.
trusty_1 | 2021-09-07 15:45:17,835 INFO [org.kie.kog.tru.ser.com.mes.out.ExplainabilityRequestProducer] (executor-thread-53) Sending explainability request with id 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 explainability_1 | 2021-09-07 15:45:17,840 INFO [org.kie.kog.exp.mes.ExplainabilityMessagingHandler] (vert.x-eventloop-thread-15) Received CloudEvent with id 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 from trustyService/ExplainabilityRequestProducer explainability_1 | 2021-09-07 15:45:17,859 INFO [org.opt.cor.imp.sol.DefaultSolver] (pool-28-thread-1) Solving started: time spent (8), best score ([-1.0/0/0]hard/[0.0/0]soft), environment mode (REPRODUCIBLE), move thread count (NONE), random (JDK with seed 0). explainability_1 | 2021-09-07 15:45:19,810 INFO [org.kie.kog.exp.mes.ExplainabilityMessagingHandler] (pool-28-thread-1) Explainability service emits explainability CounterfactualExplainabilityResultDto for execution with ID 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 trusty_1 | 2021-09-07 15:45:19,813 INFO [org.kie.kog.tru.ser.com.han.CounterfactualExplainerServiceHandler] (vert.x-eventloop-thread-15) Storing Counterfactual Explainability result for execution 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 trusty_1 | 2021-09-07 15:45:19,815 INFO [org.kie.kog.tru.ser.com.han.CounterfactualExplainerServiceHandler] (vert.x-eventloop-thread-15) Purging Counterfactual Explainability results storage for Counterfactual 543c9a7e-8d5c-4d0c-a48d-982356546058 explainability_1 | 2021-09-07 15:45:24,435 INFO [org.kie.kog.exp.mes.ExplainabilityMessagingHandler] (pool-28-thread-1) Explainability service emits explainability CounterfactualExplainabilityResultDto for execution with ID 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 trusty_1 | 2021-09-07 15:45:24,439 INFO [org.kie.kog.tru.ser.com.han.CounterfactualExplainerServiceHandler] (vert.x-eventloop-thread-15) Storing Counterfactual Explainability result for execution 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 trusty_1 | 2021-09-07 15:45:24,440 INFO [org.kie.kog.tru.ser.com.han.CounterfactualExplainerServiceHandler] (vert.x-eventloop-thread-15) Purging Counterfactual Explainability results storage for Counterfactual 543c9a7e-8d5c-4d0c-a48d-982356546058 explainability_1 | 2021-09-07 15:45:29,799 INFO [org.kie.kog.exp.mes.ExplainabilityMessagingHandler] (pool-28-thread-1) Explainability service emits explainability CounterfactualExplainabilityResultDto for execution with ID 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 trusty_1 | 2021-09-07 15:45:29,803 INFO [org.kie.kog.tru.ser.com.han.CounterfactualExplainerServiceHandler] (vert.x-eventloop-thread-15) Storing Counterfactual Explainability result for execution 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 trusty_1 | 2021-09-07 15:45:29,804 INFO [org.kie.kog.tru.ser.com.han.CounterfactualExplainerServiceHandler] (vert.x-eventloop-thread-15) Purging Counterfactual Explainability results storage for Counterfactual 543c9a7e-8d5c-4d0c-a48d-982356546058 explainability_1 | 2021-09-07 15:46:17,852 INFO [org.opt.cor.imp.loc.DefaultLocalSearchPhase] (pool-28-thread-1) Local Search phase (0) ended: time spent (60001), best score ([0.0/0/0]hard/[-0.3645823412806745/-1]soft), score calculation speed (3254/sec), step total (5). explainability_1 | 2021-09-07 15:46:17,852 INFO [org.opt.cor.imp.sol.DefaultSolver] (pool-28-thread-1) Solving ended: time spent (60001), best score ([0.0/0/0]hard/[-0.3645823412806745/-1]soft), score calculation speed (3254/sec), phase total (1), environment mode (REPRODUCIBLE), move thread count (NONE). explainability_1 | 2021-09-07 15:46:17,854 WARN [io.sma.rea.mes.kafka] (vert.x-eventloop-thread-15) SRMSG18231: The amount of received messages without acking is too high for topic partition 'trusty-explainability-request-0', amount 1. The last committed offset was 22. It means that the Kafka connector received Kafka Records that have neither be acked nor nacked in a timely fashion. The connector accumulates records in memory, but that buffer reached its maximum capacity or the deadline for ack/nack expired. The connector cannot commit as a record processing has not completed. explainability_1 | 2021-09-07 15:46:17,855 WARN [io.sma.rea.mes.kafka] (vert.x-eventloop-thread-15) SRMSG18228: A failure has been reported for Kafka topics '[trusty-explainability-request]': io.smallrye.reactive.messaging.kafka.commit.KafkaThrottledLatestProcessedCommit$TooManyMessagesWithoutAckException: Too Many Messages without acknowledgement in topic trusty-explainability-request (partition:0), last committed offset is 22 explainability_1 | explainability_1 | 2021-09-07 15:46:17,859 INFO [org.kie.kog.exp.mes.ExplainabilityMessagingHandler] (executor-thread-0) Explainability service emits explainability CounterfactualExplainabilityResultDto for execution with ID 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 explainability_1 | 2021-09-07 15:46:17,859 INFO [org.apa.kaf.cli.con.int.ConsumerCoordinator] (smallrye-kafka-consumer-thread-0) [Consumer clientId=kafka-consumer-trusty-explainability-request, groupId=explainability-service] Revoke previously assigned partitions trusty-explainability-request-0 explainability_1 | 2021-09-07 15:46:17,865 INFO [org.apa.kaf.cli.con.int.AbstractCoordinator] (smallrye-kafka-consumer-thread-0) [Consumer clientId=kafka-consumer-trusty-explainability-request, groupId=explainability-service] Member kafka-consumer-trusty-explainability-request-13c848fa-29b2-4be4-b132-f5c5f6e23b88 sending LeaveGroup request to coordinator kafka:9093 (id: 2147482646 rack: null) due to the consumer is being closed trusty_1 | 2021-09-07 15:46:17,865 INFO [org.kie.kog.tru.ser.com.han.CounterfactualExplainerServiceHandler] (vert.x-eventloop-thread-15) Storing Counterfactual Explainability result for execution 7c8c9490-7fb5-4a23-9247-e76e7c6c3397 kafka_1 | [2021-09-07 15:46:17,870] INFO [GroupCoordinator 1001]: Member kafka-consumer-trusty-explainability-request-13c848fa-29b2-4be4-b132-f5c5f6e23b88 in group explainability-service has left, removing it from the group (kafka.coordinator.group.GroupCoordinator) trusty_1 | 2021-09-07 15:46:17,871 INFO [org.kie.kog.tru.ser.com.han.CounterfactualExplainerServiceHandler] (vert.x-eventloop-thread-15) Purging Counterfactual Explainability results storage for Counterfactual 543c9a7e-8d5c-4d0c-a48d-982356546058 kafka_1 | [2021-09-07 15:46:17,873] INFO [GroupCoordinator 1001]: Preparing to rebalance group explainability-service in state PreparingRebalance with old generation 1 (__consumer_offsets-43) (reason: removing member kafka-consumer-trusty-explainability-request-13c848fa-29b2-4be4-b132-f5c5f6e23b88 on LeaveGroup) (kafka.coordinator.group.GroupCoordinator) kafka_1 | [2021-09-07 15:46:17,876] INFO [GroupCoordinator 1001]: Group explainability-service with generation 2 is now empty (__consumer_offsets-43) (kafka.coordinator.group.GroupCoordinator) explainability_1 | 2021-09-07 15:46:17,883 INFO [org.apa.kaf.com.met.Metrics] (smallrye-kafka-consumer-thread-0) Metrics scheduler closed explainability_1 | 2021-09-07 15:46:17,884 INFO [org.apa.kaf.com.met.Metrics] (smallrye-kafka-consumer-thread-0) Closing reporter org.apache.kafka.common.metrics.JmxReporter explainability_1 | 2021-09-07 15:46:17,885 INFO [org.apa.kaf.com.met.Metrics] (smallrye-kafka-consumer-thread-0) Metrics reporters closed explainability_1 | 2021-09-07 15:46:17,891 INFO [org.apa.kaf.com.uti.AppInfoParser] (smallrye-kafka-consumer-thread-0) App info kafka.consumer for kafka-consumer-trusty-explainability-request unregistered explainability_1 | 2021-09-07 15:46:17,892 ERROR [io.sma.rea.mes.kafka] (smallrye-kafka-consumer-thread-0) SRMSG18217: Unable to read a record from Kafka topics '[trusty-explainability-request]': java.lang.IllegalStateException: This consumer has already been closed. explainability_1 | at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2432) explainability_1 | at org.apache.kafka.clients.consumer.KafkaConsumer.paused(KafkaConsumer.java:2048) explainability_1 | at io.smallrye.reactive.messaging.kafka.impl.ReactiveKafkaConsumer.lambda$resume$12(ReactiveKafkaConsumer.java:180) explainability_1 | at io.smallrye.reactive.messaging.kafka.impl.ReactiveKafkaConsumer.lambda$runOnPollingThread$1(ReactiveKafkaConsumer.java:103) explainability_1 | at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21) explainability_1 | at io.smallrye.mutiny.operators.uni.builders.UniCreateFromItemSupplier.subscribe(UniCreateFromItemSupplier.java:28) explainability_1 | at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36) explainability_1 | at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27) explainability_1 | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) explainability_1 | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) explainability_1 | at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) explainability_1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) explainability_1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) explainability_1 | at java.base/java.lang.Thread.run(Thread.java:834)
The proposal is to configure the explainability-request channel timeout to be greater than the Counterfactual timeout.
- is related to
-
FAI-566 Explainer invocation: Message ACK
-
- Backlog
-