AMQ Streams / ENTMQST-1548

Rethink the Kafka upgrade procedure - discussion

Details

    • Type: Task
    • Resolution: Done
    • Priority: Major

    Description

      Our current upgrade procedure has some advantages, but also some significant disadvantages:

      • We need to maintain multiple Kafka productisation builds
      • The automatic operator upgrades through OLM can run users into problems when the operator automatically upgrades to a version which no longer supports the Kafka version the user runs. OLM makes operator upgrades easy, but also somewhat opaque to the user.
      • We need to support multiple Kafka versions in the code (different Admin API support, etc.)
      • Users cannot skip Strimzi / AMQ Streams versions when upgrading; they have to upgrade step by step, which will be a problem (we have a lot of users running 1.0 or 1.1 today, and I can hardly imagine them upgrading version by version, so they are probably stuck).
      • With the increasing need for LTS releases, we will need to support more Kafka versions under the current upgrade model (e.g. the one from the previous LTS release plus the one from the previous STS release).
      • While it has its reasons, the fact that the upgrade is decoupled might be confusing for many people.
      • It is harder to test and makes the code more complex. Ideally we would run all our tests with all versions, but we do not, and that is a fairly big gap.
      • Reusing example files between versions is harder, given that they hardcode the broker version and not just the protocol / message format versions.
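      To illustrate the last point, a typical example file today pins the broker version itself, so it only works with the operator releases which ship that Kafka version. This is only a sketch (abbreviated, with listeners and storage omitted; the version numbers are placeholders):

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # Broker version hardcoded per release - this is what makes the
    # example file unusable with other operator / Kafka versions.
    version: 2.3.0
    replicas: 3
    config:
      log.message.format.version: "2.3"
  zookeeper:
    replicas: 3
```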

      I think we should reconsider the approach. What if we:

      • Support only one Kafka version
      • Deprecate and ignore the version field
      • Include specific protocol and message format versions in all our YAMLs
      • The upgrade procedure would start by ensuring the versions are fixed in the configuration (manually by the user), then upgrading clients (manually by the user), then upgrading the CO, and only after the upgrade changing the protocol / message format versions.
      • When the operator is auto-updated in OLM, users will either have Kafka updated while keeping the old protocol versions, or have both Kafka and the default protocol versions updated if they removed them from their configs. But in general that should provide a better experience than the current situation.
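      As a sketch of what the proposed approach could look like (field names follow the existing Kafka CRD, version numbers are placeholders, and listeners / storage are omitted for brevity), the example files would pin only the protocol / message format versions and omit the broker version entirely:

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # No spec.kafka.version field: the operator always deploys the one
    # Kafka version it supports.
    replicas: 3
    config:
      # Pinned explicitly so that an operator upgrade does not roll the
      # brokers forward to a new protocol before the clients are ready.
      inter.broker.protocol.version: "2.3"
      log.message.format.version: "2.3"
  zookeeper:
    replicas: 3
```

      With this, bumping the two config values becomes the user's final, explicit step of the upgrade, after the operator and the clients have been upgraded.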

      One particular thing will be harder with this approach: running different Kafka versions in the same cluster. One operator will always run only one version across all its clusters. However, even today this option is very limited given the number of supported Kafka versions, so in a lot of cases it might not be enough anyway.

      The current model also gives us more opportunities to intervene while the upgrade is happening, e.g. to modify some files. This has not really been needed so far, but it might be in the future. Such interventions would now probably need to be handled by the scripts inside the Docker images.


People

    Assignee: Unassigned
    Reporter: Jakub Scholz (scholzj)
