  1. AMQ Streams
  2. ENTMQST-4822

Certificate key replacement fails when Cluster Operator crashes before the trust is established


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • 2.5.0.GA
    • 2.4.0.GA

      When a Cluster CA private key is replaced, the operator needs to go through a series of three rolling updates:

      1. The first rolling update establishes trust in the new CA while the operands still use the old server certificates and still trust the old CA.
      2. The second rolling update issues and rolls out the new server certificates signed by the new CA, while the old CA is still trusted.
      3. The third rolling update removes the trust in the old CA.
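
      The ordering of the three rolling updates can be illustrated with a small toy model. This is only a sketch, not the operator's code: the Pod class, the "old-ca"/"new-ca" identifiers and the helper functions are hypothetical names, and a TLS handshake is reduced to "the client trusts the issuer of the server's certificate".

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy operand pod: the CAs it trusts and the CA that signed its server cert."""
    name: str
    trusted: set = field(default_factory=lambda: {"old-ca"})
    cert_issuer: str = "old-ca"


def can_connect(client: Pod, server: Pod) -> bool:
    # Simplified handshake: the client must trust the issuer of the server cert.
    return server.cert_issuer in client.trusted


def all_connected(pods) -> bool:
    return all(can_connect(a, b) for a in pods for b in pods if a is not b)


def roll(pods, update) -> None:
    """Apply `update` to one pod at a time and check the cluster after each restart."""
    for pod in pods:
        update(pod)
        if not all_connected(pods):
            raise RuntimeError(f"cluster broken after rolling {pod.name}")


def add_new_trust(pod: Pod) -> None:    # rolling update 1
    pod.trusted.add("new-ca")


def use_new_cert(pod: Pod) -> None:     # rolling update 2
    pod.cert_issuer = "new-ca"


def drop_old_trust(pod: Pod) -> None:   # rolling update 3
    pod.trusted.discard("old-ca")


pods = [Pod(f"kafka-{i}") for i in range(3)]
roll(pods, add_new_trust)
roll(pods, use_new_cert)
roll(pods, drop_old_trust)
```

      In this model, running the second update on pods that never received the first one makes roll() raise on the first restarted pod, because the remaining pods do not trust the issuer of its new certificate.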

      If the operator crashes before the first step completes, some operands may not yet trust the new CA. However, the operator does not detect this and recover from it; instead, it proceeds with phase 2 and rolls out the new certificates. This does not work: because trust has not been established, the operands cannot sync during the rolling updates. If trust is missing only in the Entity Operator (EO), Kafka Exporter (KE), or Cruise Control (CC), the cluster might still recover later. But if the crash happens before trust is rolled out to ZooKeeper or Kafka, the cluster might never come back online.
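
      One way to picture the missing check is a guard that runs when the operator resumes a key replacement. The sketch below is an assumption made for illustration and is not taken from Strimzi#8402: resume_phase, the phase names and the pod_trust mapping are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def resume_phase(pod_trust: Dict[str, Set[str]], new_ca: str) -> Tuple[str, List[str]]:
    """Pick the rolling update to run when resuming a CA key replacement.

    pod_trust is a hypothetical input mapping each pod name to the set of
    CA identifiers that pod currently trusts.
    """
    # Pods that have not yet picked up the new CA in their trust store.
    missing = sorted(name for name, trusted in pod_trust.items() if new_ca not in trusted)
    if missing:
        # Trust is incomplete: repeat the first rolling update for these pods
        # instead of moving on to the new server certificates.
        return ("establish-trust", missing)
    return ("roll-certificates", [])


# Hypothetical state after a crash in the middle of the first rolling update.
state = {
    "zookeeper-0": {"old-ca", "new-ca"},
    "kafka-0": {"old-ca"},
}
phase, pending = resume_phase(state, "new-ca")
```

      With this state the guard selects the trust step again and reports kafka-0 as pending, rather than proceeding to the certificate rollout.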

      Done in Strimzi#8402

              morsak Maros Orsak
              scholzj Jakub Scholz
              Maros Orsak