-
Task
-
Resolution: Done
-
Major
-
MGDSRVS-336 - Keep Openshift Streams components up-to-date
MK - Sprint 232
WHAT
Now that CPU utilisation is more reasonable (MGDSTRM-10045), we must configure the Strimzi Operator for high availability and enable leader election so that exactly one replica is active at any one time.
WHY
High availability.
HOW
- Configure the Strimzi deployment: https://gitlab.cee.redhat.com/mk-ci-cd/rhosak-pipeline-configs/-/blob/rhosak-0.1-rhel-8/distgit/containers/rhosak-kas-strimzi-operator-bundle/src/templates/strimzi-cluster-operator.deployment.yaml#L5
- Increase the replica count from 1 to 2
- Set the STRIMZI_LEADER_ELECTION_ENABLED env var to true
- Set the STRIMZI_LEADER_ELECTION_LEASE_NAME env var to match the name of the operator, so that two different operator versions deployed side by side use different Lease objects. We can do this using the Kubernetes downward API: https://kubernetes.io/docs/concepts/workloads/pods/downward-api/ The configuration for Strimzi leader election is documented here: https://strimzi.io/docs/operators/latest/configuring.html#ref-operator-cluster-leader-election-str
- Review the Strimzi operator alerting: https://github.com/bf2fc6cc711aee1a0c2a/observability-resources-mk/blob/main/resources/prometheus/prometheus-rules.yaml#L469
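The deployment changes listed above could look roughly like the fragment below. This is a sketch, not the actual template: the deployment name, the `name` pod label, and the image are hypothetical placeholders, and it assumes the pod template carries a label whose value is the versioned operator name, since the downward API exposes pod fields (name, namespace, labels) rather than the Deployment name itself. The STRIMZI_LEADER_ELECTION_LEASE_NAMESPACE and STRIMZI_LEADER_ELECTION_IDENTITY variables are taken from the Strimzi leader election documentation and are not mentioned in this ticket.

```yaml
# Sketch only: illustrative fragment, not the real strimzi-cluster-operator.deployment.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-cluster-operator-example          # hypothetical operator name
spec:
  replicas: 2                                     # was 1; one leader plus one standby
  selector:
    matchLabels:
      name: strimzi-cluster-operator-example
  template:
    metadata:
      labels:
        name: strimzi-cluster-operator-example    # assumed label carrying the operator name
    spec:
      containers:
        - name: strimzi-cluster-operator
          image: example.invalid/strimzi-operator:placeholder   # hypothetical image
          env:
            - name: STRIMZI_LEADER_ELECTION_ENABLED
              value: "true"
            # Lease name follows the operator name, so each operator
            # version competes for its own Lease object.
            - name: STRIMZI_LEADER_ELECTION_LEASE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['name']
            # Namespace in which the Lease is created.
            - name: STRIMZI_LEADER_ELECTION_LEASE_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # Each replica identifies itself by its pod name.
            - name: STRIMZI_LEADER_ELECTION_IDENTITY
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
```

The operator's service account also needs permission to manage Lease resources in the chosen namespace; whether the existing bundle already grants this would need to be checked.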
Resources:
- Issue where leader election was added to Strimzi: https://github.com/strimzi/strimzi-kafka-operator/issues/7174
- Video of leader election in action: https://strimzi.io/blog/2022/09/13/leader-election/
DONE
- Strimzi reconfigured to run with two replicas.
- Verify that when two Strimzi deployments both use leader election, their cohorts are indeed separate.
- Verify that when an upgrade adds a new version of the operator, the old Lease is cleaned up (this should have been handled by MGDSTRM-10325).
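For the cohort check, one would expect to see one Lease per operator version in the operator namespace, each held by a pod of that version. The fragment below is only an illustration of the shape of such a Lease; the names, namespace, and holder identity are hypothetical, and the duration value is an assumption rather than something read from a cluster.

```yaml
# Illustration only: the kind of Lease object to look for during verification.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: strimzi-cluster-operator-example        # hypothetical; matches STRIMZI_LEADER_ELECTION_LEASE_NAME
  namespace: example-operator-namespace         # hypothetical namespace
spec:
  # Pod name of the replica that currently holds leadership (hypothetical).
  holderIdentity: strimzi-cluster-operator-example-7d9f8c6b5-abcde
  leaseDurationSeconds: 15                       # assumed value
  leaseTransitions: 0
```

Two operator versions running side by side should show two such objects with different names and different holders; after an upgrade, the Lease named after the old version should no longer be present.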
- is blocked by
-
MGDSTRM-10045 Rationalise CPU utilisation in order to allow for strimzipodset enablement
- Closed
- is related to
-
MGDSTRM-10325 Update Fleetshard operator to remove old Lease resources
- Closed