Uploaded image for project: 'AMQ Streams'
  1. AMQ Streams
  2. ENTMQST-3762

"VertxException: Thread blocked" during Topic Operator startup

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done
    • Critical
    • 2.2.0.GA
    • None
    • None
    • None

    Description

      When the Topic Operator is starting, the log is often full with with Thread blocked exceptions. That suggests that some blocking operations are being executed outside of executeBlocking code. This should be investigated:

       

      2022-02-02 23:10:54,49092 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 9407 ms, time limit is 2000 ms
      io.vertx.core.VertxException: Thread blocked
              at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
              at java.lang.ClassLoader.defineClass(ClassLoader.java:1017) ~[?:?]
              at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
              at jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800) ~[?:?]
              at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698) ~[?:?]
              at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621) ~[?:?]
              at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579) ~[?:?]
              at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
              at java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[?:?]
              at com.fasterxml.jackson.databind.DeserializationConfig.<init>(DeserializationConfig.java:108) ~[com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:2.11.3]
              at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:629) ~[com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:2.11.3]
              at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:542) ~[com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:2.11.3]
              at io.fabric8.kubernetes.client.utils.Serialization.<clinit>(Serialization.java:51) ~[io.fabric8.kubernetes-client-5.12.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.<clinit>(OperationSupport.java:63) ~[io.fabric8.kubernetes-client-5.12.0.jar:?]
              at io.fabric8.kubernetes.client.ResourceHandlerImpl.operation(ResourceHandlerImpl.java:77) ~[io.fabric8.kubernetes-client-5.12.0.jar:?]
              at io.fabric8.kubernetes.client.Handlers.getOperation(Handlers.java:137) ~[io.fabric8.kubernetes-client-5.12.0.jar:?]
              at io.fabric8.kubernetes.client.BaseKubernetesClient.resources(BaseKubernetesClient.java:439) ~[io.fabric8.kubernetes-client-5.12.0.jar:?]
              at io.fabric8.kubernetes.client.BaseKubernetesClient.resources(BaseKubernetesClient.java:128) ~[io.fabric8.kubernetes-client-5.12.0.jar:?]
              at io.strimzi.operator.topic.Session.startWatcher(Session.java:353) ~[io.strimzi.topic-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
              at io.strimzi.operator.topic.Session.lambda$start$9(Session.java:231) ~[io.strimzi.topic-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT]
              at io.strimzi.operator.topic.Session$$Lambda$283/0x0000000840326c40.handle(Unknown Source) ~[?:?]
              at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
              at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4]
              at io.vertx.core.impl.future.FutureBase$$Lambda$296/0x0000000840378440.run(Unknown Source) ~[?:?]
              at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at java.lang.Thread.run(Thread.java:829) ~[?:?] 

      The attached code is from a fresh cluster where the topic is empty. You can see that it happens there as well. But it seems that this sometimes gets much worse with clusters in use. I saw logs where the thread was blocked for 30+ seconds and in some cases it was reported that the TO does not start without it and users sometimes need to delete the TO topics to get it fixed. This is from my long running cluster with ~30 topics:

      2022-02-02 22:31:54,89312 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 24658 ms, time limit is 2000 ms
      io.vertx.core.VertxException: Thread blocked
              at io.vertx.core.net.impl.transport.Transport.serverChannelFactory(Transport.java:196) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.net.impl.AsyncResolveConnectHelper.doBind(AsyncResolveConnectHelper.java:37) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.net.impl.TCPServerBase.listen(TCPServerBase.java:143) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.http.impl.HttpServerImpl.listen(HttpServerImpl.java:220) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.http.impl.HttpServerImpl.listen(HttpServerImpl.java:150) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.http.impl.HttpServerImpl.listen(HttpServerImpl.java:164) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.strimzi.operator.topic.Session.startHealthServer(Session.java:379) ~[io.strimzi.topic-operator-0.27.1.jar:0.27.1]
              at io.strimzi.operator.topic.Session.lambda$start$7(Session.java:236) ~[io.strimzi.topic-operator-0.27.1.jar:0.27.1]
              at io.strimzi.operator.topic.Session$$Lambda$720/0x0000000840556040.apply(Unknown Source) ~[?:?]
              at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.impl.future.FutureImpl.addListener(FutureImpl.java:196) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.impl.future.FutureBase.compose(FutureBase.java:84) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.Future.compose(Future.java:199) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.strimzi.operator.topic.Session.lambda$start$9(Session.java:236) ~[io.strimzi.topic-operator-0.27.1.jar:0.27.1]
              at io.strimzi.operator.topic.Session$$Lambda$283/0x000000084031dc40.handle(Unknown Source) ~[?:?]
              at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1]
              at io.vertx.core.impl.future.FutureBase$$Lambda$422/0x000000084042f840.run(Unknown Source) ~[?:?]
              at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final]
              at java.lang.Thread.run(Thread.java:829) ~[?:?] 

      This has to be investigated and fixed:

       

      • Blocking calls should be run as blocking calls to avoid blocking the main event loop
      • We should make sure the TO is able to handle this situation even after running for a long time with many topics.

      Related to Strimzi#6046

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              scholzj JAkub Scholz
              Jakub Stejskal Jakub Stejskal
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: