Uploaded image for project: 'AMQ Streams'
  1. AMQ Streams
  2. ENTMQST-1153

Scaling Kafka logs and exception and doesn't work properly

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • 1.4.0.GA
    • 1.3.0.GA
    • cluster-operator
    • None
    • 0
    • 2019.9, 2019.10, 2019.12, 2019.13, 2019.14

      As reported by jstejska, when you try to scale Kafka cluster, you can often see following errors (especially if it uses external listener) and does rolling update before the scaleup:

      2019-07-12 11:25:37 INFO  KafkaSetOperator:75 - Scaling up to 4 replicas
      2019-07-12 11:25:37 ERROR StatefulSetOperationsImpl:126 - 0/4 pod(s) ready for StatefulSet: my-cluster-kafka in namespace: kafka-cluster-test  after waiting for 0 seconds so giving up
      2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Endpoints resource my-cluster-kafka-bootstrap in namespace kafka-cluster-test to get ready
      2019-07-12 11:25:37 ERROR StatefulSetOperationsImpl:115 - Error while waiting for resource to be scaled.
      io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [StatefulSet]  with name: [my-cluster-kafka]  in namespace: [kafka-cluster-test]  failed.
              at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:161) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.internal.RollableScalableResourceOperation.lambda$waitUntilScaled$0(RollableScalableResourceOperation.java:94) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_212]
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_212]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_212]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_212]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
      Caused by: java.io.InterruptedIOException: interrupted
              at okio.Timeout.throwIfReached(Timeout.java:146) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okio.Okio$2.read(Okio.java:137) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226) ~[com.squareup.okio.okio-1.15.0.jar:?]
              at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$1(HttpClientUtils.java:100) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at okhttp3.RealCall.execute(RealCall.java:92) ~[com.squareup.okhttp3.okhttp-3.12.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:404) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:365) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:330) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:311) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:749) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217) ~[io.fabric8.kubernetes-client-4.3.0.jar:?]
              ... 9 more
      2019-07-12 11:25:37 DEBUG Util:52 - Endpoints resource my-cluster-kafka-bootstrap in namespace kafka-cluster-test is ready
      2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Endpoints resource my-cluster-kafka-brokers in namespace kafka-cluster-test to get ready
      2019-07-12 11:25:37 DEBUG Util:52 - Endpoints resource my-cluster-kafka-brokers in namespace kafka-cluster-test is ready
      2019-07-12 11:25:37 DEBUG ServiceAccountOperator:107 - ServiceAccount kafka-cluster-test/my-cluster-topic-operator does not exist, noop
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:107 - RoleBinding kafka-cluster-test/strimzi-my-cluster-topic-operator does not exist, noop
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:107 - ConfigMap kafka-cluster-test/my-cluster-topic-operator-config does not exist, noop
      2019-07-12 11:25:37 DEBUG SecretOperator:107 - Secret kafka-cluster-test/my-cluster-topic-operator-certs does not exist, noop
      2019-07-12 11:25:37 DEBUG DeploymentOperator:107 - Deployment kafka-cluster-test/my-cluster-topic-operator does not exist, noop
      2019-07-12 11:25:37 DEBUG ServiceAccountOperator:98 - ServiceAccount kafka-cluster-test/my-cluster-entity-operator already exists, patching it
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:98 - RoleBinding kafka-cluster-test/strimzi-my-cluster-entity-topic-operator already exists, patching it
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:164 - RoleBinding strimzi-my-cluster-entity-topic-operator in namespace kafka-cluster-test has been patched
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:98 - RoleBinding kafka-cluster-test/strimzi-my-cluster-entity-user-operator already exists, patching it
      2019-07-12 11:25:37 DEBUG RoleBindingOperator:164 - RoleBinding strimzi-my-cluster-entity-user-operator in namespace kafka-cluster-test has been patched
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:98 - ConfigMap kafka-cluster-test/my-cluster-entity-topic-operator-config already exists, patching it
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:47 - ConfigMap my-cluster-entity-topic-operator-config in namespace kafka-cluster-test has not been patched because resources are equal
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:98 - ConfigMap kafka-cluster-test/my-cluster-entity-user-operator-config already exists, patching it
      2019-07-12 11:25:37 DEBUG ConfigMapOperator:47 - ConfigMap my-cluster-entity-user-operator-config in namespace kafka-cluster-test has not been patched because resources are equal
      2019-07-12 11:25:37 DEBUG SecretOperator:98 - Secret kafka-cluster-test/my-cluster-entity-operator-certs already exists, patching it
      2019-07-12 11:25:37 DEBUG SecretOperator:164 - Secret my-cluster-entity-operator-certs in namespace kafka-cluster-test has been patched
      2019-07-12 11:25:37 DEBUG DeploymentOperator:98 - Deployment kafka-cluster-test/my-cluster-entity-operator already exists, patching it
      2019-07-12 11:25:37 DEBUG DeploymentOperator:164 - Deployment my-cluster-entity-operator in namespace kafka-cluster-test has been patched
      2019-07-12 11:25:37 DEBUG Util:30 - Waiting for Deployment resource my-cluster-entity-operator in namespace kafka-cluster-test to get ready
      2019-07-12 11:25:37 DEBUG Util:52 - Deployment resource my-cluster-entity-operator in namespace kafka-cluster-test is ready
      2019-07-12 11:25:37 DEBUG StatusDiff:37 - Ignoring Status diff {"op":"replace","path":"/conditions/0/lastTransitionTime","value":"2019-07-12T11:25:37+0000"}
      2019-07-12 11:25:37 DEBUG StatusDiff:42 - Status differs: {"op":"replace","path":"/observedGeneration","value":2}
      2019-07-12 11:25:37 DEBUG StatusDiff:43 - Current Status path /observedGeneration has value 1
      2019-07-12 11:25:37 DEBUG StatusDiff:44 - Desired Status path /observedGeneration has value 2
      2019-07-12 11:25:37 DEBUG KafkaAssemblyOperator:394 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Completed status update
      2019-07-12 11:25:37 DEBUG KafkaAssemblyOperator:206 - Status for my-cluster is up to date
      2019-07-12 11:25:37 DEBUG AbstractAssemblyOperator:188 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Lock lock::kafka-cluster-test::Kafka::my-cluster released
      2019-07-12 11:25:37 INFO  AbstractAssemblyOperator:317 - Reconciliation #5(watch) Kafka(kafka-cluster-test/my-cluster): Assembly reconciled
      

      There are two problems:

      • An exception from Fabric8 is printed to the log (it is not thrown to use, is is caught inside Fabric8 and logged)
      • The scale() method should wait for the scaling but it doesn't. And we do not seem to do any further check for readiness - so we pretty much finish the reconciliation without the new broker being ready which is wrong.

              jstejska@redhat.com Jakub Stejskal
              jstejska@redhat.com Jakub Stejskal
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: