OpenShift Storage / STOR-1265

Opt-in to CSI migration on vSphere



    • OCPSTRAT-505 - CSI Migration


      Epic Goal*

      Kubernetes upstream has chosen to allow users to opt-out from CSI volume migration in Kubernetes 1.26 (1.27 PR, 1.26 backport). It is still GA there, but allows opt-out due to non-trivial risk with late CSI driver availability.

      We want a similar capability in OCP - a cluster admin should be able to opt-in to CSI migration on vSphere in 4.13. Once they opt-in, they can't opt-out (at least in this epic).

      Why is this important? (mandatory)

      See the internal OCP doc that discusses whether and how we should allow a similar opt-in/opt-out in OCP.

      Scenarios (mandatory) 


      1. Admin upgrades 4.12 -> 4.13 as usual.
      2. Storage CR has CSI migration disabled (or nil); the in-tree volume plugin handles in-tree PVs.
      3. At the same time, the external CCM runs. However, because kubelet runs with --cloud-provider=vsphere, the CCM does not do kubelet’s job.
      4. Admin can opt in to CSI migration by editing the Storage CR. That enables the OPENSHIFT_DO_VSPHERE_MIGRATION env. var. everywhere + runs kubelet with --cloud-provider=external.
        1. If we have time, it should not be hard to allow opt-out: just remove the env. var + update the kubelet cmdline. The in-tree volume plugin will handle in-tree PVs again; the implications for the external CCM are not clear yet.
      5. Otherwise, once opted in, it’s not possible to opt out.
      6. Both with opt-in and without it, the cluster is Upgradeable=true. Admin can upgrade to 4.14; CSI migration will be forced there.
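
The upgrade scenario above can be summarized as a small decision function. This is only an illustrative sketch, not operator code: `node_settings`, its parameters, and the dictionary keys are hypothetical names. Only the `OPENSHIFT_DO_VSPHERE_MIGRATION` variable and the `--cloud-provider` values come from the scenario, and the sketch assumes that forced migration in 4.14 results in the same settings as an opt-in in 4.13.

```python
def node_settings(minor_version: int, opted_in: bool) -> dict:
    """Expected settings of an upgraded vSphere cluster (hypothetical helper).

    minor_version: the y in OCP 4.y (13 or 14).
    opted_in: whether the admin enabled CSI migration in the Storage CR.
    """
    if minor_version < 13:
        raise ValueError("the scenario only covers 4.13 and later")

    # Assumption: in 4.14 migration is forced, regardless of the admin's choice.
    migrated = opted_in or minor_version >= 14

    return {
        # Env. var. set on the components that need it when migration is on.
        "OPENSHIFT_DO_VSPHERE_MIGRATION": migrated,
        # Value of kubelet's --cloud-provider flag.
        "kubelet_cloud_provider": "external" if migrated else "vsphere",
        # The cluster stays upgradeable with or without opt-in.
        "upgradeable": True,
    }
```

For example, `node_settings(13, False)` describes a freshly upgraded 4.13 cluster that still uses the in-tree volume plugin, while `node_settings(13, True)` describes the same cluster after the admin opts in.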


      New install

      1. Admin installs a new 4.13 vSphere cluster, with UPI, IPI, Assisted Installer, or Agent-based Installer.
      2. During installation, the Storage CR is created with CSI migration enabled.
      3. (We want to have it enabled for a new cluster to enable the external CCM and to have zonal support. This prevents new clusters from having in-tree as the default and then having to go through migration later.)
      4. The resulting cluster has the OPENSHIFT_DO_VSPHERE_MIGRATION env. var set + kubelet with --cloud-provider=external + topology support.
      5. Admin cannot opt out after installation; we expect that they use CSI volumes for everything.
      6. If the admin really wants, they can opt out before installation by adding a Storage install manifest with CSI migration disabled.
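
The initial state of CSI migration in the two scenarios (upgrade vs. new install) can be sketched as follows. The function name and parameters are hypothetical and only restate the rules listed above; they do not correspond to an existing API.

```python
from typing import Optional


def initial_migration_enabled(new_install: bool,
                              manifest_override: Optional[bool] = None) -> bool:
    """Initial CSI migration state in the Storage CR (hypothetical helper).

    new_install: True for a fresh 4.13 install, False for a 4.12 -> 4.13 upgrade.
    manifest_override: value from an optional Storage install manifest supplied
        by the admin before installation; None when no manifest was added.
    """
    if not new_install:
        # Upgraded clusters start with migration disabled (or nil).
        return False
    if manifest_override is not None:
        # The admin explicitly chose a value before installation.
        return manifest_override
    # New clusters default to CSI migration enabled.
    return True
```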


      EUS to EUS (4.12 -> 4.14)

      • The cluster will have CSI migration enabled once it is on 4.14.
      • During the upgrade, the cluster will have 4.13 masters with CSI migration disabled (see the regular upgrade to 4.13 above) + 4.12 kubelets.
      • Once the masters are 4.14, CSI migration is force-enabled there. Still, the 4.14 KCM and its in-tree volume plugin will handle in-tree volume attachments required by kubelets that are still on 4.12 (that’s what kcm --external-cloud-volume-plugin=vsphere does).
      • Once both masters and kubelets are 4.14, CSI migration is force-enabled everywhere. The in-tree volume plugin + cloud provider in KCM is still enabled by --external-cloud-volume-plugin, but it’s not used.
      • Keep in-tree storage class by default
      • A CSI storage class is already available since 4.10
      • Recommend to switch default to CSI
      • Can’t opt out from migration
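
The phases of the EUS upgrade described above can be written down as a lookup table. This is a sketch of one reading of the bullets, with hypothetical names (`EUS_PHASES`, `attach_handler`); the last entry assumes that, with migration enabled everywhere, attachments go through the CSI driver.

```python
# (masters, kubelets) -> who handles attachment of in-tree vSphere volumes.
EUS_PHASES = {
    # 4.13 masters run with CSI migration disabled.
    ("4.13", "4.12"): "in-tree volume plugin",
    # Migration is forced on 4.14 masters, but KCM keeps the in-tree plugin
    # available for old kubelets via --external-cloud-volume-plugin=vsphere.
    ("4.14", "4.12"): "in-tree volume plugin in KCM",
    # Assumption: migration is on everywhere, the in-tree plugin is unused.
    ("4.14", "4.14"): "CSI driver",
}


def attach_handler(masters: str, kubelets: str) -> str:
    """Return the handler for a phase of the 4.12 -> 4.14 EUS upgrade."""
    try:
        return EUS_PHASES[(masters, kubelets)]
    except KeyError:
        raise ValueError(
            f"combination masters={masters}, kubelets={kubelets} "
            "is not part of the described EUS upgrade"
        ) from None
```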

      Dependencies (internal and external) (mandatory)

      • We need a new FeatureSet in openshift/api that disables the CSIMigrationvSphere feature gate.
      • kube-apiserver-operator, kube-controller-manager-operator, kube-scheduler-operator, and MCO must reconfigure their operands to use the in-tree vSphere cloud provider when they see the CSIMigrationvSphere FeatureGate disabled.
      • The cloud controller manager operator must disable its operand when it sees the CSIMigrationvSphere FeatureGate disabled.
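
As a rough summary of these dependencies, the expected reaction of each operator to the feature gate can be sketched as below. The function and the action strings are hypothetical and only paraphrase the bullets; the epic does not describe any special handling for the case where the gate is enabled, so the sketch returns nothing for it.

```python
# Operators that must switch their operands to the in-tree cloud provider.
IN_TREE_OPERATORS = (
    "kube-apiserver-operator",
    "kube-controller-manager-operator",
    "kube-scheduler-operator",
    "MCO",
)


def operator_actions(csi_migration_vsphere_enabled: bool) -> dict:
    """Map operator name -> required action for the given feature gate state."""
    if csi_migration_vsphere_enabled:
        # No special handling is described for the enabled case.
        return {}
    actions = {
        name: "use in-tree vSphere cloud provider" for name in IN_TREE_OPERATORS
    }
    # The external CCM must not run while the in-tree provider is in use.
    actions["cloud-controller-manager-operator"] = "disable operand"
    return actions
```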

      Contributing Teams (and contacts) (mandatory)

      • Development - 
      • Documentation -
      • QE - 
      • PX - 
      • Others -

      Acceptance Criteria (optional)

      Provide some (testable) examples of how we will know if we have achieved the epic goal.  

      Drawbacks or Risk (optional)

      Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

      Done - Checklist (mandatory)

      The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

      • CI Testing - Basic e2e automation tests are merged and completing successfully
      • Documentation - Content development is complete.
      • QE - Test scenarios are written and executed successfully.
      • Technical Enablement - Slides are complete (if requested by PLM)
      • Engineering Stories Merged
      • All associated work items with the Epic are closed
      • Epic status should be “Release Pending” 




            hekumar@redhat.com Hemant Kumar
            rhn-engineering-jsafrane Jan Safranek
            Rohit Patil Rohit Patil