OCPPLAN-8228

RFE - Routed Provider Networking support for Shift-on-Stack



    Description

      Use Case

      As a Shift-on-Stack cloud admin for a large enterprise, my RHOSP deployment is distributed across multiple network fabrics (AZs) in a single datacenter, using either a spine/leaf or a DCN design. An Availability Zone can range from tens to hundreds of compute/HCI nodes. Latency is not an issue, since the cluster is not distributed geographically.

      I run an OCP cluster stretched across multiple AZs over Routed Provider Networks to achieve maximum network throughput/performance and to remove the need for Kuryr, avoiding double encapsulation.
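
      To put the use case in concrete terms: a Neutron routed provider network is a single network backed by one segment per fabric, each segment carrying its own subnet, with routing between segments handled by the physical fabric rather than by Neutron. A minimal Heat sketch of that layout might look like the following (assuming the Neutron segments extension is enabled; all names, VLAN IDs and CIDRs below are illustrative only):

      heat_template_version: 2018-08-31

      description: Illustrative routed provider network, one VLAN segment per leaf.

      resources:
        routed_net:
          type: OS::Neutron::Net
          properties:
            name: routed-provider

        # One segment per leaf fabric. In practice the network's first
        # segment is often defined via provider attributes at creation time.
        segment_leaf1:
          type: OS::Neutron::Segment
          properties:
            network: { get_resource: routed_net }
            network_type: vlan
            physical_network: leaf1
            segmentation_id: 101

        segment_leaf2:
          type: OS::Neutron::Segment
          properties:
            network: { get_resource: routed_net }
            network_type: vlan
            physical_network: leaf2
            segmentation_id: 102

        # Each segment gets its own subnet; instances scheduled to a leaf
        # receive addresses from that leaf's subnet.
        subnet_leaf1:
          type: OS::Neutron::Subnet
          properties:
            network: { get_resource: routed_net }
            segment: { get_resource: segment_leaf1 }
            cidr: 192.168.101.0/24

        subnet_leaf2:
          type: OS::Neutron::Subnet
          properties:
            network: { get_resource: routed_net }
            segment: { get_resource: segment_leaf2 }
            cidr: 192.168.102.0/24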

       

      Goal

      As of today, OCP workloads aren't supported on stretched spine/leaf or DCN deployments and have never been tested by QE, so we want to design a Reference Architecture of what can be tested now, in the OSP 16.2 timeframe (and therefore supported at some point), and potentially delivered to customers.

      The next step will be to evolve this architecture based on the OSP and OCP roadmaps, finding the intersection of what customers need and what we'll be able to deliver in the OSP 17 timeframe.

      Why is this important

      Customers are looking at the OpenStack Platform either as a replacement for public clouds (AWS, GCP, and Azure) or as a way to augment their existing off-premise deployments, and they want to match the capability and resiliency of the public cloud offerings. Today, all the major public clouds support deploying OCP across multiple AZs. Here is an example from AWS:

      apiVersion: v1
      baseDomain: lab.uc2.io
      compute:
      - architecture: amd64
        hyperthreading: Enabled
        name: worker
        platform:
          aws:
            zones:
            - us-east-1a
            - us-east-1b
            rootVolume:
              iops: 4000
              size: 100
              type: io1
            type: m5.xlarge
        replicas: 3
      ...
      platform:
        aws:
          region: us-east-1
          subnets:
          - subnet-02dda9e8fd9317ea0
          - subnet-0b768178e3286ce5e
          - subnet-08b492b781df30773
          - subnet-0643f973ed471a20f
          - subnet-0c169245f0a58ad10
          - subnet-0272ab642a3086432 
      ...

       

      There are some caveats for the same type of deployment in AWS:

      • You need to enable DNS resolution in the VPC and add some AWS-specific endpoints (for S3, for instance).
      • If you are publishing externally, you need public and private subnets for each AZ.
      • Private subnets egress through a NAT gateway and public subnets through an IGW, with route tables updated accordingly.

      Example playbooks:

      https://github.com/nasx/openshift-vpc
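
      For illustration, the per-AZ VPC plumbing those caveats imply could be sketched in CloudFormation-style YAML roughly as follows (one AZ shown; names and CIDRs are placeholders, and the subnet-to-route-table associations are omitted for brevity):

      AWSTemplateFormatVersion: '2010-09-09'
      Resources:
        Vpc:
          Type: AWS::EC2::VPC
          Properties:
            CidrBlock: 10.0.0.0/16
            EnableDnsSupport: true       # DNS resolution must be enabled in the VPC
            EnableDnsHostnames: true
        Igw:
          Type: AWS::EC2::InternetGateway
        IgwAttach:
          Type: AWS::EC2::VPCGatewayAttachment
          Properties:
            VpcId: !Ref Vpc
            InternetGatewayId: !Ref Igw
        # Repeat a public/private subnet pair like this one per AZ
        PublicSubnetA:
          Type: AWS::EC2::Subnet
          Properties:
            VpcId: !Ref Vpc
            AvailabilityZone: us-east-1a
            CidrBlock: 10.0.0.0/20
        PrivateSubnetA:
          Type: AWS::EC2::Subnet
          Properties:
            VpcId: !Ref Vpc
            AvailabilityZone: us-east-1a
            CidrBlock: 10.0.128.0/20
        NatEipA:
          Type: AWS::EC2::EIP
          Properties:
            Domain: vpc
        NatGwA:
          Type: AWS::EC2::NatGateway
          Properties:
            SubnetId: !Ref PublicSubnetA
            AllocationId: !GetAtt NatEipA.AllocationId
        PublicRouteTable:
          Type: AWS::EC2::RouteTable
          Properties:
            VpcId: !Ref Vpc
        PublicDefaultRoute:
          Type: AWS::EC2::Route
          DependsOn: IgwAttach
          Properties:
            RouteTableId: !Ref PublicRouteTable
            DestinationCidrBlock: 0.0.0.0/0
            GatewayId: !Ref Igw
        PrivateRouteTableA:
          Type: AWS::EC2::RouteTable
          Properties:
            VpcId: !Ref Vpc
        PrivateDefaultRouteA:
          Type: AWS::EC2::Route
          Properties:
            RouteTableId: !Ref PrivateRouteTableA
            DestinationCidrBlock: 0.0.0.0/0
            NatGatewayId: !Ref NatGwA
        # AWS-specific gateway endpoint so private subnets can reach S3
        S3Endpoint:
          Type: AWS::EC2::VPCEndpoint
          Properties:
            VpcId: !Ref Vpc
            ServiceName: com.amazonaws.us-east-1.s3
            RouteTableIds:
            - !Ref PrivateRouteTableA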

       

       

      High Level Requirements

      • Plan would be Tech Preview in OCP 4.11 and GA in 4.12
      • Deliverables:
        • Reference architecture of OCP at the edge for enterprises, which we can test and support
        • Documentation
        • QE testing
      • Features involving code (TBD):
        • Ability to specify multiple subnets and/or Routed Provider Networks inside install-config.yaml for both master and worker nodes; today only a single subnet can be provided (see the hypothetical sketch after this list)
      • Networking:
        • Support for IPv6 provisioning
        • Provider networks
        • Dual stack (IPv4/IPv6) for VMs and pods
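
      To make the install-config requirement concrete, a hypothetical OpenStack excerpt could look like the following. The plural machinesSubnets field and the placeholder values are invented for illustration; today's installer accepts only a single machinesSubnet:

      apiVersion: v1
      baseDomain: example.com
      compute:
      - name: worker
        platform:
          openstack:
            zones:                  # existing field: Nova availability zones
            - az1
            - az2
        replicas: 3
      ...
      platform:
        openstack:
          cloud: openstack
          # hypothetical plural field; the installer currently supports
          # only a single machinesSubnet UUID
          machinesSubnets:
          - <subnet-uuid-leaf1>
          - <subnet-uuid-leaf2>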

       

      Enterprise and IoT/Retail Edge Requirements

      Potential customer(s):

      • ADP
      • Telus
      • One very large customer I cannot name at the moment

       
