  1. AMQ Streams
  2. ENTMQST-1411

tls-sidecar can terminate earlier than Kafka container itself


    • Bug
    • Resolution: Done
    • Major
    • 1.4.0.GA
    • 1.2.0.GA
    • None
    • None
    • 0
    • The workaround is to avoid terminating a Kafka pod and a ZooKeeper pod at the same time.
      However, a customized PodDisruptionBudget is difficult to use because of [ENTMQST-1412] Provide a way to disable default podDisruptionBudget, so the only workaround is manual operation.

    • This problem can happen when upgrading an OpenShift cluster or executing a drain command, because a Kafka pod and a ZooKeeper pod can terminate at the same time.

      The following procedure confirms that there can be moments with no ESTABLISHED connection.

      1. Show the ESTABLISHED connection count
                oc rsh -c tls-sidecar my-cluster-kafka-0
                while true; do echo -n `date +"[%m-%d %H:%M:%S]"`; echo -n " : "; netstat -ant | grep -w 
        
      2. Restart the ZooKeeper ensemble
      3. => The ESTABLISHED connection count will sometimes be 0. If the Kafka pod is terminated at that moment, the tls-sidecar can terminate earlier than the Kafka container itself, and the Kafka container cannot terminate gracefully.
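
      The counting step in this procedure can be sketched as a small helper. This is only an illustration under stated assumptions: `count_established` is a hypothetical name, the input is assumed to be `netstat -ant`-style output with the state in the sixth column, and the optional port filter is an addition that is not part of the command shown in this issue.

```shell
#!/bin/sh
# Hypothetical helper (not from the issue): count ESTABLISHED TCP connections
# in `netstat -ant`-style output read from stdin.
# Optional $1: a port number. When given, only lines whose local or foreign
# address ends in ":<port>" are counted.
count_established() {
  _port="${1:-}"
  awk -v port="$_port" '
    # Column 6 is the TCP state, columns 4 and 5 are local and foreign address.
    $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
      if (port == "" || $4 ~ (":" port "$") || $5 ~ (":" port "$")) n++
    }
    END { print n + 0 }   # prints 0 when nothing matched
  '
}
```

      Inside the tls-sidecar container it could be fed with `netstat -ant | count_established 2181`, where 2181 is assumed to be the ZooKeeper client port. A printed value of 0 corresponds to the window described in step 3.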
    • 2019.13, 2019.14, 2019.15, 2020.1, 2020.2

      tls-sidecar can terminate earlier than Kafka container itself

      • The tls-sidecar checks only the ESTABLISHED connection count in `kafka_stunnel_pre_stop.sh`.
      • So if a Kafka pod is stopped while there is no connection to ZooKeeper, the tls-sidecar in the Kafka pod can terminate earlier than the Kafka container itself.
      • The Kafka container then cannot shut down gracefully.
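
      The race described above can be modelled with a short sketch. It is an illustrative model only and not the contents of `kafka_stunnel_pre_stop.sh`: the function names `wait_until_no_connections` and `no_connections` are hypothetical, and the only behaviour taken from this issue is that the check looks at nothing but the ESTABLISHED connection count.

```shell
#!/bin/sh
# Illustrative model only: NOT the real kafka_stunnel_pre_stop.sh.
# Polls a connection-count command until it reports 0, then returns.
#   $1: command that prints the current ESTABLISHED connection count (hypothetical)
#   $2: seconds to sleep between polls
# Prints how many polls saw at least one open connection.
wait_until_no_connections() {
  _count_cmd="$1"
  _interval="$2"
  _polls=0
  while [ "$($_count_cmd)" -gt 0 ]; do
    _polls=$((_polls + 1))
    sleep "$_interval"
  done
  echo "$_polls"
}

# Case from the issue: the ZooKeeper connection is already gone when the
# pre-stop check starts, so the count is 0 and the wait ends immediately,
# regardless of whether the Kafka container in the same pod is still running.
no_connections() { echo 0; }
wait_until_no_connections no_connections 0   # prints 0
```

      Because a count of 0 is the only exit condition, the model cannot tell "Kafka has finished and closed its connection" apart from "ZooKeeper is restarting and the connection is temporarily down", which matches the failure described in this issue.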

      This problem can happen when upgrading an OpenShift cluster or executing a drain command, because a Kafka pod and a ZooKeeper pod can terminate at the same time.

              Unassigned
              Tomonari Yamashita (rhn-support-tyamashi)
              Jakub Stejskal
              Votes: 1
              Watchers: 5
