When the Cluster CA private key is replaced, the operator has to perform a series of three rolling updates:
- The first rolling update establishes trust in the new CA while still using the old server certificates and still trusting the old CA
- The second rolling update issues and rolls out the new server certificates signed by the new CA (while still trusting the old CA)
- The third rolling update removes trust in the old CA
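
The reason for this ordering can be sketched with a small model. This is an illustrative Java sketch, not Strimzi's code: the names `CaRenewalPhases`, `PhaseState`, and `canTalkTo` are hypothetical, and it assumes that during a rolling update pods from two adjacent states run side by side and must be able to verify each other's certificates.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of the three rolling updates; not Strimzi's actual classes. */
final class CaRenewalPhases {
    enum Ca { OLD, NEW }

    /** Which CAs a pod trusts and which CA signed its server certificate. */
    record PhaseState(String name, Set<Ca> trusted, Ca serverCertSigner) {
        /** Two pods interoperate only if each trusts the signer of the other's certificate. */
        boolean canTalkTo(PhaseState other) {
            return trusted.contains(other.serverCertSigner())
                    && other.trusted().contains(serverCertSigner);
        }
    }

    static final PhaseState BEFORE =
            new PhaseState("before renewal", Set.of(Ca.OLD), Ca.OLD);
    static final PhaseState TRUST_NEW =
            new PhaseState("1: trust new CA", Set.of(Ca.OLD, Ca.NEW), Ca.OLD);
    static final PhaseState NEW_CERTS =
            new PhaseState("2: new server certificates", Set.of(Ca.OLD, Ca.NEW), Ca.NEW);
    static final PhaseState DROP_OLD =
            new PhaseState("3: remove old CA trust", Set.of(Ca.NEW), Ca.NEW);

    static final List<PhaseState> SEQUENCE = List.of(BEFORE, TRUST_NEW, NEW_CERTS, DROP_OLD);

    /** Checks that every pair of adjacent states can coexist during a rolling update. */
    static boolean everyStepIsSafe() {
        for (int i = 0; i + 1 < SEQUENCE.size(); i++) {
            if (!SEQUENCE.get(i).canTalkTo(SEQUENCE.get(i + 1))) {
                return false;
            }
        }
        return true;
    }
}
```

In this model `CaRenewalPhases.everyStepIsSafe()` holds for the full sequence, while `BEFORE.canTalkTo(NEW_CERTS)` does not: a pod that never received the new CA cannot verify a certificate signed by it, which is why the first rolling update must be complete before the second one starts.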
When the operator crashes before the first step is completed, some operands might not yet trust the new CA. However, the operator does not detect this and recover from it; instead, it proceeds with the second rolling update and rolls out the new certificates. This does not work, because trust has not been established and the operands cannot sync during the rolling updates. If the only components still missing the trust are the Entity Operator (EO), Kafka Exporter (KE), or Cruise Control (CC), the cluster might still recover later. But if the operator fails before trust is rolled out to ZooKeeper or Kafka, the clusters might not come back online.
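
One way to guard against this failure mode is to verify, before issuing new certificates, that every component already trusts the new CA. The following is a minimal sketch of such a check, not the implementation from the fix: the class `TrustRolloutCheck`, its methods, and the idea of tracking a per-component "trusted CA generation" number are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-check before the second rolling update; not the Strimzi implementation. */
final class TrustRolloutCheck {
    /**
     * Returns the components whose pods do not yet trust the new CA.
     * The map value is the CA generation a component currently trusts (an assumed bookkeeping scheme).
     */
    static List<String> missingTrust(Map<String, Integer> trustedGeneration, int newCaGeneration) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : trustedGeneration.entrySet()) {
            if (entry.getValue() < newCaGeneration) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    /** New server certificates should only be issued once no component is missing the trust. */
    static boolean safeToIssueNewCertificates(Map<String, Integer> trustedGeneration, int newCaGeneration) {
        return missingTrust(trustedGeneration, newCaGeneration).isEmpty();
    }
}
```

With this sketch, an operator that restarts after a crash would repeat the first rolling update for every component returned by `missingTrust` rather than moving on to the second one.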
Done in Strimzi#8402