XCMSTRAT-220 (project: Container / Cluster Management (XCM) Strategy)

[closed/dupe] NetApp Trident CSI and AWS FSx config for ROSA


    • Type: Feature
    • Resolution: Duplicate
    • Priority: Critical

      Feature Overview (aka. Goal Summary)  

      Strategic and general customers alike have described use cases in which they require specific storage backing for workload persistent volumes.

      Two offerings that are readily available to ROSA customers are:

      1. AWS FSx
      2. NetApp Trident (ONTAP) CSI

      These storage offerings require specific configuration on the cluster that currently cannot be set by customers. We should enable this configuration so customers are unblocked from using these storage plugins.

       

      Goals (aka. expected user outcomes)

      Users can set up the NetApp Trident CSI driver on their clusters and use it reliably, as the plugin was intended.

      Users can set up AWS FSx on their clusters and use it reliably, as the plugin was intended.

      Requirements (aka. Acceptance Criteria):

      • ROSA HCP customers can enable the multipath and iSCSI services in their worker node machine configurations so that the storage plugins can function.
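      On self-managed OpenShift, the worker preparation described in the Trident documentation [1][2] is usually expressed as a MachineConfig that enables the iscsid and multipathd services and writes /etc/multipath.conf. The Python sketch below generates such a manifest as JSON to show roughly what the requested capability would need to configure. It is a sketch under stated assumptions, not a supported procedure: the object name, the Ignition version 3.2.0 and the multipath.conf contents are illustrative assumptions to be checked against current Trident and OpenShift documentation, and ROSA customers cannot currently apply such an object, which is the gap this request describes.

```python
import base64
import json


def multipath_conf() -> str:
    # ASSUMPTION: a minimal /etc/multipath.conf for illustration only; verify the
    # exact settings against the current NetApp Trident worker-preparation docs.
    return (
        "defaults {\n"
        "    user_friendly_names yes\n"
        "    find_multipaths no\n"
        "}\n"
    )


def build_worker_machineconfig(name: str = "99-worker-trident-iscsi-multipath") -> dict:
    """Return a MachineConfig (as a dict) that enables iscsid and multipathd on workers.

    The default name is hypothetical; any name unique among MachineConfigs works.
    """
    # Ignition file contents are passed as a base64-encoded data URL.
    encoded = base64.b64encode(multipath_conf().encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            # Target the worker pool only; control-plane nodes are untouched.
            "labels": {"machineconfiguration.openshift.io/role": "worker"},
        },
        "spec": {
            "config": {
                # ASSUMPTION: Ignition spec 3.2.0; use the version your cluster supports.
                "ignition": {"version": "3.2.0"},
                "storage": {
                    "files": [
                        {
                            "path": "/etc/multipath.conf",
                            "mode": 0o644,  # serialised as 420 in JSON
                            "overwrite": True,
                            "contents": {
                                "source": "data:text/plain;charset=utf-8;base64," + encoded
                            },
                        }
                    ]
                },
                "systemd": {
                    # Enable the services the CSI driver needs for iSCSI block volumes.
                    "units": [
                        {"name": "iscsid.service", "enabled": True},
                        {"name": "multipathd.service", "enabled": True},
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be reviewed or applied like any manifest
    # on a cluster where creating MachineConfigs is permitted.
    print(json.dumps(build_worker_machineconfig(), indent=2))
```

      On a self-managed cluster the output could be piped to `oc apply -f -`; on ROSA HCP the equivalent settings would have to be exposed through a supported, SRE-approved mechanism instead.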

      Use Cases (Optional):

       

      Questions to Answer (Optional):

       

      Out of Scope

       

      Background

      What is the nature and description of the request?:

      The customer is implementing a multi-cloud strategy and is heavily invested in both ROSA (established) and ARO (in progress). Part of the vision sold to the customer is the ability to use OpenShift as their multi-cloud platform, and one aspect of building that platform is the ability to recover from failure irrespective of the cloud provider (as you would expect from OpenShift). The bigger picture also requires the same capability at the storage level, i.e. being able to replicate and restore data across clouds, which the customer achieves through their existing investment in, and implementation of, NetApp Astra for both file and block storage.

      To provide block storage devices to OpenShift via CSI (NetApp Trident ONTAP), worker nodes need to have iSCSI and multipath enabled (see [1] and [2]). The customer has requested that this change be implemented in ROSA in Support Case #03502175 [3]. However, the policy for both ARO and ROSA is that customers may not modify machine configs, because such changes would degrade supportability.

      Please implement this change as a fully supported capability in both Managed OpenShift offerings or, alternatively, recommend a solution that addresses the bigger problem within the customer's current constraints and capabilities.

      Why does the customer need this? (Business requirements):

      Because the customer cannot enable iSCSI on the worker nodes, they cannot fully operationalise Managed OpenShift as the multi-cloud platform for all of their banking and insurance workloads. Their ability to sync and restore clusters across clouds is limited, which undermines the broader promise of OpenShift as a true multi-cloud platform.
      It is worth noting that the same capability is supported in both AKS and EKS (and in self-managed OCP). While we know those services are not comparable to what ARO and ROSA offer, platform and SRE teams know that data is their most important asset, and that reliability, recoverability and stability will trump anything else. The main impact is that the platform will not host stateful applications, which greatly reduces the "addressable market" within this customer, limits our ability to expand the number of OpenShift cores, and ultimately affects our revenue and possibly our reputation for delivering a true multi-cloud platform.

      Customer Considerations

       

      Documentation Considerations

      Red Hat documentation will need to be updated to explain how customers set the configurable parameters required to use the storage plugins successfully with ROSA HCP (if this differs from the implementation for ROSA Classic).

      We should also inform the partner organisations of any specifics that should change in their documentation regarding implementation on ROSA HCP clusters (if this differs from the implementation for ROSA Classic).

      Interoperability Considerations

      Additional information

      [1] https://netapp-trident.readthedocs.io/en/stable-v20.10/support/requirements.html
      [2] https://netapp-trident.readthedocs.io/en/stable-v20.10/kubernetes/operations/tasks/worker.html#worker-preparation
      [3] https://access.redhat.com/support/cases/#/case/03502175
      [4] https://issues.redhat.com/browse/OHSS-21791

            Unassigned
            Aaren de Jong (rh-ee-adejong)
            Xue Li
            Aaren de Jong