  1. Container / Cluster Management (XCM) Strategy
  2. XCMSTRAT-219

workaround for NetApp SAN config for ROSA


    Description

      Feature Overview (aka. Goal Summary)  

      Strategic and general customers alike have described use cases that require specific storage backing for workload persistent volumes.

      Two offerings that are readily available to ROSA customers are:

      1. AWS FSx-N (NetApp ONTAP SAN)
      2. NetApp Trident (ONTAP) CSI

      These storage offerings require specific configuration on the cluster that currently cannot be set by customers. We should enable this configuration so customers are unblocked from using these storage plugins.

      The plugin offers several drivers that provide block-based storage instead of the already available file-based storage (NFS).
      The drivers we need to enable are:

      • ontap-san driver: multipath, over iSCSI
      • ontap-san-economy driver: multipath, over iSCSI
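
For illustration, the two drivers listed above could each be exposed to workloads through a StorageClass. The following is a minimal sketch, not a supported configuration: it assumes the Trident CSI provisioner name `csi.trident.netapp.io` and the `backendType` StorageClass parameter, and the StorageClass names are hypothetical.

```python
import json


def trident_storage_class(name: str, backend_type: str) -> dict:
    """Build a StorageClass manifest that selects a Trident backend by driver type."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},  # hypothetical name, chosen for this sketch
        "provisioner": "csi.trident.netapp.io",  # assumption: Trident CSI provisioner
        "parameters": {"backendType": backend_type},
    }


# One StorageClass per driver named in this feature request.
storage_classes = [
    trident_storage_class("ontap-san-block", "ontap-san"),
    trident_storage_class("ontap-san-economy-block", "ontap-san-economy"),
]

# Wrap both objects in a List so they form a single JSON document.
manifest_list = {"apiVersion": "v1", "kind": "List", "items": storage_classes}

print(json.dumps(manifest_list, indent=2))
```

Both StorageClasses only provision volumes once the worker nodes have iSCSI and multipath running, which is the configuration this feature asks for.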

       

      Goals (aka. expected user outcomes)

      The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

      Users will be able to set up the NetApp Trident CSI (ONTAP SAN driver) on their cluster and use it reliably, as the plugin was intended to be used.

      This is achieved by allowing customers to enable two services on worker nodes: multipath and iSCSI.
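
On self-managed OCP, enabling these services is typically done with a MachineConfig targeting the worker role. The sketch below builds such a manifest to show the kind of change this feature would need to permit on ROSA. It is an assumption, not the agreed implementation: the object name is hypothetical, and the `multipath.conf` values are placeholders that should be checked against the NetApp documentation in [1] and [2].

```python
import base64
import json

# Assumption: placeholder multipath settings; verify against NetApp guidance.
MULTIPATH_CONF = """defaults {
    user_friendly_names yes
    find_multipaths no
}
"""


def worker_san_machine_config(name: str = "99-worker-iscsi-multipath") -> dict:
    """Build a MachineConfig that enables iscsid and multipathd on worker nodes."""
    encoded = base64.b64encode(MULTIPATH_CONF.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,  # hypothetical name, chosen for this sketch
            "labels": {"machineconfiguration.openshift.io/role": "worker"},
        },
        "spec": {
            "config": {
                "ignition": {"version": "3.2.0"},
                "storage": {
                    "files": [
                        {
                            "path": "/etc/multipath.conf",
                            "mode": 0o644,  # serialised as decimal 420
                            "overwrite": True,
                            "contents": {
                                "source": "data:text/plain;charset=utf-8;base64,"
                                + encoded
                            },
                        }
                    ]
                },
                "systemd": {
                    "units": [
                        {"name": "iscsid.service", "enabled": True},
                        {"name": "multipathd.service", "enabled": True},
                    ]
                },
            }
        },
    }


print(json.dumps(worker_san_machine_config(), indent=2))
```

Under current ROSA and ARO policy customers cannot apply a manifest like this themselves, which is why the request is for a supported way to achieve the same result.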

      Requirements (aka. Acceptance Criteria):

      A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

      • ROSA (Classic and HCP) customers can enable the multipath and iSCSI services in their worker node machine configurations so that the storage plugins can function
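
One way this criterion could be verified is by checking the service state on a worker node. The helper below is a hypothetical sketch: it assumes the input is the output of `systemctl is-active iscsid multipathd` (one state per line, in the order the units were given), collected for example through an `oc debug node/<node>` session.

```python
from typing import List, Sequence


def inactive_units(
    is_active_output: str,
    units: Sequence[str] = ("iscsid", "multipathd"),
) -> List[str]:
    """Return the units that `systemctl is-active` did not report as active."""
    states = is_active_output.split()
    if len(states) != len(units):
        # The output does not line up with the units that were queried.
        raise ValueError(
            f"expected {len(units)} states, got {len(states)}: {states!r}"
        )
    return [unit for unit, state in zip(units, states) if state != "active"]


# Example: multipathd is not running, so the node does not meet the criterion.
print(inactive_units("active\ninactive\n"))
```

An empty result means both services are active on that node.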

      Use Cases (Optional):

      Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

       

      Questions to Answer (Optional):

      Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

      Should this configurability be offered as a cluster-wide configuration (affecting all machine pools), since customers will typically use this storage option for all worker nodes?

      Out of Scope

      High-level list of items that are out of scope.  Initial completion during Refinement status.

       

      Background

      Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

      What is the nature and description of the request?:

      The customer is implementing a multi-cloud strategy and is heavily invested in both ROSA (established) and ARO (in progress). Part of the vision sold to the customer is the ability to use OpenShift as their multi-cloud platform, and one aspect of building that is the ability to recover from failure irrespective of the cloud provider (as you would expect from OpenShift). The big picture includes having the same capability at the storage level, i.e. being able to replicate and restore data across clouds. The customer achieves this through their existing investment in, and implementation of, NetApp Astra for both file and block storage.

      In order to provide block storage devices to OpenShift via CSI (NetApp Trident ONTAP), worker nodes need to have iSCSI and multipath enabled (see [1] and [2]). The customer has requested that this change be implemented in ROSA as part of Support Case #03502175 [3]. The policy for both ARO and ROSA is that customers are not allowed to change machine configs, and that doing so would degrade supportability.

      Please implement this change as a feature in both Managed OpenShift offerings as a fully supported capability; or, alternatively, recommend a solution that addresses the bigger problem within the current constraints and capabilities of the customer.

      Why does the customer need this? (List the business requirements here):

      Because the customer cannot enable iSCSI on the worker nodes, their ability to operationalise Managed OpenShift as their multi-cloud platform for all banking and insurance workloads is compromised. Their ability to sync and restore clusters across clouds will be limited, which invalidates the greater promise of OpenShift being a true multi-cloud platform.
      It's worth noting that the same feature is supported in both AKS and EKS (and in self-managed OCP). While we know their features are not comparable to what ARO/ROSA offers, platform and SRE teams ultimately know that data is the most important asset of all, and that reliability, recoverability and stability will trump anything else. The main impact is that the platform will not host stateful applications, which will greatly reduce the "addressable market" and limit our ability to expand the number of OpenShift cores, ultimately impacting our revenue and possibly our reputation for delivering on the promise of a true multi-cloud platform.

       

      In addition to the above and similar customer demand, this feature would be a dependency for running OpenShift-Virt on ROSA as well.

      Documentation Considerations

      Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

      There will need to be Red Hat documentation changes to inform customers how to set the configurable parameters in order to successfully use the storage plugins.

      We should also inform the partner organisations about specifics that should change in their documentation, regarding implementation in ROSA clusters.

      Interoperability Considerations

      Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

       

      Additional information

      [1] https://netapp-trident.readthedocs.io/en/stable-v20.10/support/requirements.html
      [2] https://netapp-trident.readthedocs.io/en/stable-v20.10/kubernetes/operations/tasks/worker.html#worker-preparation
      [3] https://access.redhat.com/support/cases/#/case/03502175
      [4] https://issues.redhat.com/browse/OHSS-21791

            People

              Aaren de Jong (rh-ee-adejong)
              James Harrington
              Xue Li