      1. Proposed title of this feature request

      Geo-Redundancy for SNO

      2. What is the nature and description of the request?

      Single Node OpenShift (SNO) clusters running in multiple sites or availability zones to provide Geo-Redundancy: if one site fails, another site takes over.

      The following requirements should be met:

      1. Geo-Redundancy (GR) is needed to ensure service continuity (e.g. when a particular site goes down, the Partner Site takes over).
      2. The following GR models should be evaluated:
        • Active-Active
        • Active-Standby
      3. RPO (Recovery Point Objective) must be zero, i.e. there must be no data loss.
      4. RTO (Recovery Time Objective) must be 5 minutes or less.
      5. The redundancy concept should be transparent to applications and should take the redundancy burden off the application pods.
      6. The Active Site and the Partner Site will be commissioned individually. All applications on the Active Site are expected to be mirrored to the Partner Site, including any future updates of those applications.
      7. Application data must be in sync between both Sites at all times. Because both Sites serve traffic, data synchronization should take place in both directions between the two systems.
      8. Support for an N:1 GR model, where N (max. 2) is the number of Active Sites and 1 is the Partner Site. The Partner Site must be able to act as an Active Site at any point in time:
        • 1+1: 2 Sites, A/A or A/S
        • 2+1: 2 Sites, A/A or A/S, plus 1 Site available to replace either of the 2 Sites that handle commercial traffic
      9. Load balancing is required for seamless access to both Sites.
      10. The GR solution must be supported on bare-metal OCP clusters.
      11. Switchover and failover use cases should be supported. Switchover is when both Sites are running and the active role is moved from one to the other. Failover is when the Active Site is completely down and unrecoverable, and all traffic is directed to the Partner Site.
      12. Automatic failover should be possible when the Active Site goes down.
      13. When a Site recovers from failure, it should be synchronized with the Site currently acting as the Active Site before it is allowed to serve traffic again.
      14. The GR solution should still work as expected even if the two Sites run different OCP versions (minor version differences only):
        • Different OCP Z-stream releases, e.g. Site 1: 4.9.11, Site 2: 4.9.20
        • Different OCP Y-stream releases, e.g. Site 1: 4.9.11, Site 2: 4.10.4
      15. Backup and Recovery for both Sites must be possible.
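
A minimal sketch of how the version-skew condition in requirement 14 could be checked, for example as a pre-flight step before pairing two Sites. It assumes that "minor version differences only" means the same major version and at most one Y-stream (minor) release apart, with any Z-stream difference allowed. The function names `parse_ocp_version` and `is_supported_skew` are hypothetical and not part of any OpenShift API.

```python
from typing import Tuple


def parse_ocp_version(version: str) -> Tuple[int, int, int]:
    """Parse an OCP version string such as '4.9.11', 'v4.9.11' or 'v.4.9.11'
    into a (major, minor, patch) tuple, i.e. (X, Y-stream, Z-stream)."""
    # Strip an optional leading 'v' / 'v.' prefix before splitting on dots.
    major, minor, patch = (int(part) for part in version.lstrip("v.").split("."))
    return major, minor, patch


def is_supported_skew(site1: str, site2: str, max_minor_skew: int = 1) -> bool:
    """Return True if the two Sites' versions fall within the assumed allowed skew.

    Assumption: same major version required; Y-stream may differ by at most
    `max_minor_skew`; Z-stream differences are always allowed.
    """
    major1, minor1, _ = parse_ocp_version(site1)
    major2, minor2, _ = parse_ocp_version(site2)
    if major1 != major2:
        return False
    return abs(minor1 - minor2) <= max_minor_skew


if __name__ == "__main__":
    # Example pairs taken from requirement 14.
    print(is_supported_skew("4.9.11", "4.9.20"))  # Z-stream difference
    print(is_supported_skew("4.9.11", "4.10.4"))  # one Y-stream difference
```

Under this assumption both example pairs from requirement 14 pass, while a pair such as 4.9.11 and 4.11.0 would be rejected. The one-step limit is a parameter so it can be adjusted once the actually supported skew is defined.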

       

      3. Why does the customer need this? (List the business requirements here)

      Geo-Redundancy is required for applications running on SNO clusters, to reduce the impact of site outages. The redundancy concept must be agnostic of, and transparent to, the applications running on those nodes.

      4. List any affected packages or components.

      • Red Hat OpenShift Container Platform

            dfroehli42rh Daniel Fröhlich
            dvassili@redhat.com Demetris Vassiliades