-
Story
-
Resolution: Done
-
Critical
-
None
-
8
-
False
-
False
-
Undefined
-
-
Sprint 205, Sprint 206, Sprint 211, Sprint 216, Sprint 217, Sprint 218, Sprint 219
Story: As an administrator I want to rely on a default configuration that spreads image registry pods across topology zones so that I don't suffer from a long recovery time (>6 mins) in case of a complete zone failure if all pods are impacted.
Background: The image registry currently uses affinity/anti-affinity rules to spread registry pods across different hosts. However, this can leave all pods on hosts in a single zone, leading to a long recovery time for the registry if that zone is lost entirely. Because of past problems with adherence to the preferred anti-affinity setting, the configuration was switched to required, which turned the rules into hard constraints. With zones as hard constraints, the internal registry would no longer deploy in single-zone environments, e.g. the internal CI environment. Pod topology spread constraints are a newer API supported in OCP that can also relax constraints when they cannot be satisfied. Details here: https://docs.openshift.com/container-platform/4.7/nodes/scheduling/nodes-scheduler-pod-topology-spread-constraints.html
Acceptance criteria:
- by default the internal registry is deployed with at least two replicas
- by default the topology constraints should be on a zone basis, so that by default one registry pod is scheduled in each zone
- when constraints can't be satisfied the registry should deploy anyway
- we should not do this in SNO environments
- the registry should still work on SNO environments
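A minimal sketch of what the criteria above could translate to, assuming a soft zone-based spread constraint. The fields `maxSkew`, `topologyKey`, `whenUnsatisfiable` and `labelSelector` and the zone label `topology.kubernetes.io/zone` are standard Kubernetes names; the helper functions, the `docker-registry: default` pod label and the fragment layout are assumptions made for illustration, not the operator's actual implementation.

```python
import json

# Well-known Kubernetes node label that carries the zone name.
ZONE_KEY = "topology.kubernetes.io/zone"


def zone_spread_constraint(match_labels, max_skew=1):
    """Build one soft topologySpreadConstraints entry keyed on the zone label.

    "ScheduleAnyway" makes the constraint a preference: the scheduler tries to
    keep the per-zone pod count within max_skew, but still places the pod when
    that is impossible (for example in a single-zone cluster).
    """
    return {
        "maxSkew": max_skew,
        "topologyKey": ZONE_KEY,
        "whenUnsatisfiable": "ScheduleAnyway",
        "labelSelector": {"matchLabels": dict(match_labels)},
    }


def registry_deployment_fragment(replicas=2):
    """Hypothetical helper: the parts of a Deployment spec the story touches."""
    # Assumed pod label for the registry pods; adjust to the real selector.
    labels = {"docker-registry": "default"}
    return {
        "replicas": replicas,
        "template": {
            "spec": {
                "topologySpreadConstraints": [zone_spread_constraint(labels)],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(registry_deployment_fragment(), indent=2))
```

Using `ScheduleAnyway` rather than `DoNotSchedule` is what lets the registry deploy when the constraint cannot be satisfied; SNO handling (for instance keeping a single replica) would be a separate decision and is not covered by this sketch.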
Open Questions:
- what happens in environments where the storage is zone-dependent?