-
Bug
-
Resolution: Done
-
Critical
-
None
-
1.4.1.GA, 1.6.2.GA
A class-loader-level deadlock occurs when running more than one Connect worker node and starting a large number of connectors with their SMTs. It seems to be much more likely on an idle cluster (with all connectors in a paused state). By taking thread dumps from the worker pods, we confirmed that this is the same issue as upstream KAFKA-7421.
In our case, it happens frequently when a worker pod is deleted or when a rolling update starts because of a configuration change. In both cases all Connect workers block because of the deadlock and the rebalance process hangs. The logs show that the workers are all at different generations and unable to join the cluster. The deadlocks are also reported by the jvm_threads_deadlocked_monitor metric.
Found one Java-level deadlock:
=============================
"StartAndStopExecutor-connect-1-8":
  waiting to lock monitor 0x00007f3bc4004458 (object 0x0000000085220f10, a org.apache.kafka.connect.runtime.isolation.PluginClassLoader),
  which is held by "StartAndStopExecutor-connect-1-7"
"StartAndStopExecutor-connect-1-7":
  waiting to lock monitor 0x00007f3be8009c78 (object 0x0000000085214aa0, a org.apache.kafka.connect.runtime.isolation.PluginClassLoader),
  which is held by "StartAndStopExecutor-connect-1-1"
"StartAndStopExecutor-connect-1-1":
  waiting to lock monitor 0x00007f3bc4006fa8 (object 0x000000008007d7d8, a org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader),
  which is held by "StartAndStopExecutor-connect-1-4"
"StartAndStopExecutor-connect-1-4":
  waiting to lock monitor 0x00007f3be8009c78 (object 0x0000000085214aa0, a org.apache.kafka.connect.runtime.isolation.PluginClassLoader),
  which is held by "StartAndStopExecutor-connect-1-1"

Java stack information for the threads listed above:
===================================================
"StartAndStopExecutor-connect-1-8":
    at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:91)
    - waiting to lock <0x0000000085220f10> (a org.apache.kafka.connect.runtime.isolation.PluginClassLoader)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:716)
    at org.apache.kafka.connect.runtime.ConnectorConfig.enrich(ConnectorConfig.java:308)
    at org.apache.kafka.connect.runtime.ConnectorConfig.<init>(ConnectorConfig.java:212)
    at org.apache.kafka.connect.runtime.ConnectorConfig.<init>(ConnectorConfig.java:206)
    at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:430)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1147)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:126)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1162)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1158)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
"StartAndStopExecutor-connect-1-7":
    at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:91)
    - waiting to lock <0x0000000085214aa0> (a org.apache.kafka.connect.runtime.isolation.PluginClassLoader)
    at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.loadClass(DelegatingClassLoader.java:394)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
    - locked <0x00000000f9f80b98> (a java.lang.Object)
    at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
    - locked <0x00000000f9f80b98> (a java.lang.Object)
    - locked <0x0000000085220f10> (a org.apache.kafka.connect.runtime.isolation.PluginClassLoader)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:716)
    at org.apache.kafka.connect.runtime.ConnectorConfig.enrich(ConnectorConfig.java:308)
    at org.apache.kafka.connect.runtime.ConnectorConfig.<init>(ConnectorConfig.java:212)
    at org.apache.kafka.connect.runtime.ConnectorConfig.<init>(ConnectorConfig.java:206)
    at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:430)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1147)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:126)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1162)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1158)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
"StartAndStopExecutor-connect-1-1":
    at java.lang.ClassLoader.loadClass(ClassLoader.java:398)
    - waiting to lock <0x000000008007d7d8> (a org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
    at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.loadClass(DelegatingClassLoader.java:397)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
    - locked <0x00000000f9f823c0> (a java.lang.Object)
    at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
    - locked <0x00000000f9f823c0> (a java.lang.Object)
    - locked <0x0000000085214aa0> (a org.apache.kafka.connect.runtime.isolation.PluginClassLoader)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:719)
    at org.apache.kafka.connect.runtime.ConnectorConfig.enrich(ConnectorConfig.java:308)
    at org.apache.kafka.connect.runtime.ConnectorConfig.<init>(ConnectorConfig.java:212)
    at org.apache.kafka.connect.runtime.ConnectorConfig.<init>(ConnectorConfig.java:206)
    at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:249)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:1190)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:126)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1206)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1202)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
"StartAndStopExecutor-connect-1-4":
    at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:91)
    - waiting to lock <0x0000000085214aa0> (a org.apache.kafka.connect.runtime.isolation.PluginClassLoader)
    at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.loadClass(DelegatingClassLoader.java:394)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:719)
    at org.apache.kafka.connect.runtime.ConnectorConfig.enrich(ConnectorConfig.java:308)
    at org.apache.kafka.connect.runtime.ConnectorConfig.<init>(ConnectorConfig.java:212)
    at org.apache.kafka.connect.runtime.ConnectorConfig.<init>(ConnectorConfig.java:206)
    at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:249)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:1190)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:126)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1206)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1202)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
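For readers who want to reproduce the general shape of this failure outside Connect, the following is a minimal sketch and not Kafka code. It assumes two plain monitors (the hypothetical LOADER_A and LOADER_B) standing in for the class loader instances, has two threads lock them in opposite order, and then asks the JVM for monitor deadlocks through the standard ThreadMXBean.findMonitorDeadlockedThreads() call. The class name DeadlockSketch and its methods are illustrative only. A monitor-deadlock metric such as jvm_threads_deadlocked_monitor is presumably derived from the same JVM facility, but that is an assumption here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockSketch {
    // Hypothetical stand-ins for two class loader monitors; not Kafka objects.
    private static final Object LOADER_A = new Object();
    private static final Object LOADER_B = new Object();

    /** Starts two daemon threads that take the two monitors in opposite order. */
    public static void provokeDeadlock() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startDaemon("sketch-1", LOADER_A, LOADER_B, bothHoldFirst);
        startDaemon("sketch-2", LOADER_B, LOADER_A, bothHoldFirst);
    }

    private static void startDaemon(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    // Wait until the other thread also holds its first monitor.
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds this monitor.
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the stuck threads
        t.start();
    }

    /**
     * Polls the JVM until it reports a monitor deadlock or the timeout expires.
     * Returns the number of deadlocked threads, or 0 if none were found in time.
     */
    public static int countDeadlockedThreads(long timeoutMillis) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads(); // null when no deadlock
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        provokeDeadlock();
        System.out.println("deadlocked threads: " + countDeadlockedThreads(5000));
    }
}
```

The latch only makes the interleaving deterministic for the sketch. In the dump above the same kind of cycle arises from concurrent loadClass calls on PluginClassLoader and DelegatingClassLoader, where the ordering depends on timing.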
- clones ENTMQST-2639 Deadlock in Kafka Connect (Closed)