
    • Enable multi-node LVMS
    • Product / Portfolio Work
    • OCPSTRAT-390 LVM Storage on multi-node clusters
    • 0% To Do, 0% In Progress, 100% Done
    • Green
      2023-12-15:
      Dev - Green - All 4.15 NVRs should now contain multi-node support for LVMS by default
      Docs - Green - All references to single-node in the LVMS docs that also apply to multi-node need to be removed; PRs in progress
      QE - Green - QE test cases created, test activity ongoing

       

    • M

      Epic Goal

      • Introduce a technically stable version of multi-node LVMS into our regular releases and make sure it can run outside of our SNO configuration

      Why is this important?

      • Multi-node clusters are the norm in Kubernetes, and supporting them opens the solution up to the majority of standard K8s topologies
      • Customers are already requesting Support Exceptions

      Scenarios

      1. LVMS is deployed on multiple nodes, where each node is a single point of failure and storage is not highly available by default. The administrator of the nodes is responsible for providing HA storage that can be consumed by LVM volume groups, or the application developer ensures HA by replicating data across volumes on different nodes.
      2. LVMS should run stably in a multi-node environment, with the LVMCluster object responsible for triggering a DaemonSet that creates the volume group on every node (see the sketch after this list).
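
      As a minimal sketch of what scenario 2 could look like from the API side, the snippet below creates an LVMCluster via the Kubernetes dynamic client. The group/version (lvm.topolvm.io/v1alpha1), the openshift-storage namespace, the device path, and the spec field names are assumptions for illustration and should be checked against the released CRD.

      {code:go}
      package main

      import (
          "context"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
          "k8s.io/apimachinery/pkg/runtime/schema"
          "k8s.io/client-go/dynamic"
          "k8s.io/client-go/tools/clientcmd"
      )

      func main() {
          // Assumption: kubeconfig in the default location; in-cluster config would work the same way.
          cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
          if err != nil {
              panic(err)
          }
          dyn, err := dynamic.NewForConfig(cfg)
          if err != nil {
              panic(err)
          }

          // Assumed GVR for the LVMCluster CRD shipped by LVMS.
          gvr := schema.GroupVersionResource{Group: "lvm.topolvm.io", Version: "v1alpha1", Resource: "lvmclusters"}

          // One device class backed by /dev/sdb on every node; the operator's
          // DaemonSet is expected to create the volume group per node from this spec.
          lvmCluster := &unstructured.Unstructured{Object: map[string]interface{}{
              "apiVersion": "lvm.topolvm.io/v1alpha1",
              "kind":       "LVMCluster",
              "metadata":   map[string]interface{}{"name": "my-lvmcluster", "namespace": "openshift-storage"},
              "spec": map[string]interface{}{
                  "storage": map[string]interface{}{
                      "deviceClasses": []interface{}{
                          map[string]interface{}{
                              "name":    "vg1",
                              "default": true,
                              "deviceSelector": map[string]interface{}{
                                  "paths": []interface{}{"/dev/sdb"},
                              },
                          },
                      },
                  },
              },
          }}

          if _, err := dyn.Resource(gvr).Namespace("openshift-storage").Create(context.TODO(), lvmCluster, metav1.CreateOptions{}); err != nil {
              panic(err)
          }
      }
      {code}

      The same spec can just as well be applied as a manifest with oc; the dynamic client is used here only to keep the sketch self-contained.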

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated - In particular, we will need a multi-node test pipeline that can verify various edge cases around nodes becoming unavailable.
      • We will have to define outage scenarios and how to properly recover from them within LVMS. In particular, we need to define what happens when an entire node falls out of the cluster and how the LVMCluster object is then recovered.
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • The deployed TopoLVM instance needs to use these VolumeGroups as deviceClasses and use Kubernetes storage capacity tracking to correctly determine where to set up bindings based on CSI topology (see the sketch after this list).
      • Once the LVMCluster is created, it should use the deviceSelector to correctly identify the devices that have to be initialized as physical volumes (PVs) on each node. This is the basis for our configuration.
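
      To make the capacity-tracking criterion concrete, here is a small sketch that lists the CSIStorageCapacity objects published per storage class and topology segment; for a TopoLVM-backed device class each segment corresponds to one node. The openshift-storage namespace is an assumption.

      {code:go}
      package main

      import (
          "context"
          "fmt"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/tools/clientcmd"
      )

      func main() {
          cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
          if err != nil {
              panic(err)
          }
          client, err := kubernetes.NewForConfig(cfg)
          if err != nil {
              panic(err)
          }

          // Assumption: the capacities are published in the operator namespace.
          caps, err := client.StorageV1().CSIStorageCapacities("openshift-storage").List(context.TODO(), metav1.ListOptions{})
          if err != nil {
              panic(err)
          }
          for _, c := range caps.Items {
              topology := map[string]string{}
              if c.NodeTopology != nil {
                  topology = c.NodeTopology.MatchLabels
              }
              capacity := "unknown"
              if c.Capacity != nil {
                  capacity = c.Capacity.String()
              }
              // The scheduler compares these per-segment capacities against a PVC's
              // request to decide where a binding (and hence the pod) can land.
              fmt.Printf("class=%s topology=%v capacity=%s\n", c.StorageClassName, topology, capacity)
          }
      }
      {code}

      On a healthy multi-node deployment we would expect one entry per node and device class, which is also a useful signal for the outage scenarios described above.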

      Dependencies (internal and external)

      1. E2E test pipeline setup for multi-node deployments as the basis for our test scaffolding

      Previous Work (Optional):

      Open questions:

      1. How do we cover all edge cases around multi-node failure scenarios?
      2. Are there any Status API changes that we should introduce to more clearly reflect the status of the VolumeGroups per node in the cluster? (One possible shape is sketched below.)
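
      Purely as a discussion aid for question 2, one entirely hypothetical shape for a per-node volume group status follows; the type and field names are invented and do not reflect the current LVMCluster status API.

      {code:go}
      // Hypothetical sketch only; names are invented for discussion and do not
      // reflect the current LVMCluster status API.
      package v1alpha1

      import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

      // NodeVolumeGroupStatus reports the state of one volume group on one node.
      type NodeVolumeGroupStatus struct {
          Node        string   `json:"node"`
          DeviceClass string   `json:"deviceClass"`
          Devices     []string `json:"devices,omitempty"`
          // Conditions could carry Ready/Degraded per node instead of only a
          // cluster-wide aggregate, which is what question 2 is asking about.
          Conditions []metav1.Condition `json:"conditions,omitempty"`
      }

      // LVMClusterStatusSketch aggregates the per-node status on the LVMCluster object.
      type LVMClusterStatusSketch struct {
          NodeVolumeGroups []NodeVolumeGroupStatus `json:"nodeVolumeGroups,omitempty"`
      }
      {code}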

      Done Checklist

              Jakob Moeller (rh-ee-jmoller, inactive)
              Chad Scribner (rhn-support-cscribne)
              Rahul Deore
              Daniel Macpherson
