OpenShift Image Registry / IR-195

Image registry is resilient against zone failures

    • Sprint 205, Sprint 206, Sprint 211, Sprint 216, Sprint 217, Sprint 218, Sprint 219

      Story: As an administrator, I want to rely on a default configuration that spreads image registry pods across topology zones, so that a complete zone failure does not impact all registry pods at once and cause a long recovery time (>6 mins).

      Background: The image registry currently uses affinity/anti-affinity rules to spread registry pods across different hosts. This can still leave all pods on hosts in a single zone, which leads to a long recovery time for the registry if that zone is lost entirely. Because of past problems with adherence to the "preferred" anti-affinity setting, the configuration was changed to "required", which turned the rules into hard constraints. With zones as hard constraints, the internal registry would no longer deploy in environments with a single zone, e.g. the internal CI environment. Pod topology spread constraints are a newer API, supported in OCP, that can also relax constraints when they cannot be satisfied. Details here: https://docs.openshift.com/container-platform/4.7/nodes/scheduling/nodes-scheduler-pod-topology-spread-constraints.html
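
      To illustrate the problem described in the background, here is a minimal sketch in Python. It is not taken from the registry operator; the host names, zone names, and the zone_skew helper are hypothetical. It shows how two pods can satisfy host-level anti-affinity while still sharing one zone, using the skew measure (highest minus lowest pod count per zone) that topology spread constraints are built around.

```python
from collections import Counter


def zone_skew(pod_nodes, node_zone):
    """Return the difference between the highest and lowest pod count per zone.

    pod_nodes: list of node names, one entry per scheduled pod.
    node_zone: mapping of node name to zone name (hypothetical cluster layout).
    """
    zones = set(node_zone.values())
    counts = Counter(node_zone[node] for node in pod_nodes)
    per_zone = [counts.get(zone, 0) for zone in zones]
    return max(per_zone) - min(per_zone)


# Hypothetical cluster: three hosts spread over two zones.
node_zone = {"host-a": "zone-1", "host-b": "zone-1", "host-c": "zone-2"}

# Both placements use two different hosts, so host anti-affinity is satisfied.
same_zone = ["host-a", "host-b"]  # both pods land in zone-1
spread = ["host-a", "host-c"]     # one pod per zone

print(zone_skew(same_zone, node_zone))  # 2: losing zone-1 takes out every pod
print(zone_skew(spread, node_zone))     # 0: one pod survives a zone failure
```

      A zone-based constraint with a maximum skew of 1 would steer the scheduler toward the second placement.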

      Acceptance criteria:

      • by default, the internal registry is deployed with at least two replicas
      • by default, the topology constraints should be on a zone basis, so that one registry pod is scheduled in each zone
      • when the constraints can't be satisfied, the registry should deploy anyway
      • these defaults should not be applied in SNO (single-node OpenShift) environments
      • the registry should still work in SNO environments
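
      The acceptance criteria above could be sketched roughly as follows. This is an illustration in Python, not the operator's implementation: the registry_defaults helper, its parameters, and the docker-registry=default pod label are assumptions made for the example. The field names (maxSkew, topologyKey, whenUnsatisfiable, labelSelector) and the topology.kubernetes.io/zone node label come from the Kubernetes pod topology spread constraints API.

```python
# Well-known Kubernetes node label that carries the zone name.
ZONE_KEY = "topology.kubernetes.io/zone"


def registry_defaults(single_node, pod_labels):
    """Hypothetical helper: default replica count and spread constraints."""
    if single_node:
        # SNO: one replica and no spreading, so the registry still deploys.
        return {"replicas": 1, "topologySpreadConstraints": []}
    return {
        "replicas": 2,
        "topologySpreadConstraints": [
            {
                "maxSkew": 1,
                "topologyKey": ZONE_KEY,
                # Soft constraint: the scheduler prefers balanced zones but
                # still places the pod when balance cannot be achieved.
                "whenUnsatisfiable": "ScheduleAnyway",
                "labelSelector": {"matchLabels": dict(pod_labels)},
            }
        ],
    }


# Assumed pod label, used here only for illustration.
labels = {"docker-registry": "default"}

multi_zone = registry_defaults(single_node=False, pod_labels=labels)
sno = registry_defaults(single_node=True, pod_labels=labels)

print(multi_zone["replicas"], multi_zone["topologySpreadConstraints"][0]["whenUnsatisfiable"])
print(sno["replicas"], sno["topologySpreadConstraints"])
```

      Using ScheduleAnyway rather than DoNotSchedule is what lets the registry deploy in single-zone environments such as the internal CI environment mentioned in the background.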

      Open Questions:

      • what happens in environments where the storage is zone dependent?

            fmissi Flavian Missi
            DanielMesser Daniel Messer
            XiuJuan Wang XiuJuan Wang