-
Bug
-
Resolution: Done
-
Critical
-
None
-
None
-
None
-
False
-
False
-
When the Topic Operator is starting, the log is often full with with Thread blocked exceptions. That suggests that some blocking operations are being executed outside of executeBlocking code. This should be investigated:
2022-02-02 23:10:54,49092 WARN [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 9407 ms, time limit is 2000 ms io.vertx.core.VertxException: Thread blocked at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?] at java.lang.ClassLoader.defineClass(ClassLoader.java:1017) ~[?:?] at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?] at jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800) ~[?:?] at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698) ~[?:?] at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621) ~[?:?] at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579) ~[?:?] at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?] at java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[?:?] at com.fasterxml.jackson.databind.DeserializationConfig.<init>(DeserializationConfig.java:108) ~[com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:2.11.3] at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:629) ~[com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:2.11.3] at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:542) ~[com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:2.11.3] at io.fabric8.kubernetes.client.utils.Serialization.<clinit>(Serialization.java:51) ~[io.fabric8.kubernetes-client-5.12.0.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.<clinit>(OperationSupport.java:63) ~[io.fabric8.kubernetes-client-5.12.0.jar:?] at io.fabric8.kubernetes.client.ResourceHandlerImpl.operation(ResourceHandlerImpl.java:77) ~[io.fabric8.kubernetes-client-5.12.0.jar:?] at io.fabric8.kubernetes.client.Handlers.getOperation(Handlers.java:137) ~[io.fabric8.kubernetes-client-5.12.0.jar:?] at io.fabric8.kubernetes.client.BaseKubernetesClient.resources(BaseKubernetesClient.java:439) ~[io.fabric8.kubernetes-client-5.12.0.jar:?] at io.fabric8.kubernetes.client.BaseKubernetesClient.resources(BaseKubernetesClient.java:128) ~[io.fabric8.kubernetes-client-5.12.0.jar:?] at io.strimzi.operator.topic.Session.startWatcher(Session.java:353) ~[io.strimzi.topic-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.topic.Session.lambda$start$9(Session.java:231) ~[io.strimzi.topic-operator-0.28.0-SNAPSHOT.jar:0.28.0-SNAPSHOT] at io.strimzi.operator.topic.Session$$Lambda$283/0x0000000840326c40.handle(Unknown Source) ~[?:?] at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.4.jar:4.2.4] at io.vertx.core.impl.future.FutureBase$$Lambda$296/0x0000000840378440.run(Unknown Source) ~[?:?] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at java.lang.Thread.run(Thread.java:829) ~[?:?]
The attached code is from a fresh cluster where the topic is empty. You can see that it happens there as well. But it seems that this sometimes gets much worse with clusters in use. I saw logs where the thread was blocked for 30+ seconds and in some cases it was reported that the TO does not start without it and users sometimes need to delete the TO topics to get it fixed. This is from my long running cluster with ~30 topics:
2022-02-02 22:31:54,89312 WARN [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 24658 ms, time limit is 2000 ms io.vertx.core.VertxException: Thread blocked at io.vertx.core.net.impl.transport.Transport.serverChannelFactory(Transport.java:196) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.net.impl.AsyncResolveConnectHelper.doBind(AsyncResolveConnectHelper.java:37) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.net.impl.TCPServerBase.listen(TCPServerBase.java:143) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.http.impl.HttpServerImpl.listen(HttpServerImpl.java:220) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.http.impl.HttpServerImpl.listen(HttpServerImpl.java:150) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.http.impl.HttpServerImpl.listen(HttpServerImpl.java:164) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.strimzi.operator.topic.Session.startHealthServer(Session.java:379) ~[io.strimzi.topic-operator-0.27.1.jar:0.27.1] at io.strimzi.operator.topic.Session.lambda$start$7(Session.java:236) ~[io.strimzi.topic-operator-0.27.1.jar:0.27.1] at io.strimzi.operator.topic.Session$$Lambda$720/0x0000000840556040.apply(Unknown Source) ~[?:?] at io.vertx.core.impl.future.Composition.onSuccess(Composition.java:38) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:60) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.impl.future.FutureImpl.addListener(FutureImpl.java:196) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.impl.future.FutureBase.compose(FutureBase.java:84) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.Future.compose(Future.java:199) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.strimzi.operator.topic.Session.lambda$start$9(Session.java:236) ~[io.strimzi.topic-operator-0.27.1.jar:0.27.1] at io.strimzi.operator.topic.Session$$Lambda$283/0x000000084031dc40.handle(Unknown Source) ~[?:?] at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.2.1.jar:4.2.1] at io.vertx.core.impl.future.FutureBase$$Lambda$422/0x000000084042f840.run(Unknown Source) ~[?:?] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) ~[io.netty.netty-transport-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.71.Final.jar:4.1.71.Final] at java.lang.Thread.run(Thread.java:829) ~[?:?]
This has to be investigated and fixed:
- Blocking calls should be run as blocking calls to avoid blocking the main event loop
- We should make sure the TO is able to handle this situation even after running for a long time with many topics.
Related to Strimzi#6046
- account is impacted by
-
ENTMQST-5198 Support ZooKeeper based TopicStore solution for "VertxException: Thread blocked" during Topic Operator startup
- Closed