OpenStack as Infra / OSASINFRA-3650

Add topology-awareness to Cinder CSI Driver


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major
    • openshift-4.19
    • add-cinder-csi-topology-smarts
    • Status: To Do
    • 33% To Do, 33% In Progress, 33% Done
    • S

      Goal

      The Cinder CSI driver reports the VOLUME_ACCESSIBILITY_CONSTRAINTS plugin capability, meaning it supports Topology-Aware Volume Provisioning, as described in the k8s CSI docs.

      Since OpenStack does not provide a mechanism to map compute nodes to block storage AZs, the Cinder CSI driver treats the compute AZ as a block storage AZ. This assumes that the operator has used the same naming convention across their deployment: if there are three compute AZs, az-0, az-1, and az-2, then there will always be at least three block storage AZs with the same names and the same semantic meaning (e.g. az-N implies a particular rack, room, or data center for both the compute and block storage services). This is a reasonable position and one the Nova project endorses, but it isn't always true. Where a deployment does not follow this convention and has divergent compute and block storage AZs, the Cinder CSI driver can end up requesting volumes in block storage AZs that don't exist.

      To date, we have worked around this by selectively enabling or disabling the topology feature flag provided to the external provisioner sidecar container, as deployed and managed by the Cinder CSI Driver Operator. This feature flag is being removed in a future release (when?), which means we can't rely on it long term. We should therefore port the logic for determining whether or not to enable the topology feature from the Cinder CSI Driver Operator to the Cinder CSI Driver itself. Once this is done, we should remove the logic from the Operator, since it should no longer be needed and will eventually not be supported.
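
      As an illustration only, the check described above could be sketched in Go roughly as follows. This assumes the rule is "enable topology only if every compute AZ has a block storage AZ with the same name"; the Operator's actual logic may differ. The function names unmatchedComputeAZs and topologyEnabled are hypothetical, and the AZ lists are passed in directly rather than fetched from Nova and Cinder.

```go
package main

import (
	"fmt"
	"sort"
)

// unmatchedComputeAZs returns the compute AZs that have no block storage AZ
// with the same name, sorted for stable output.
// Hypothetical helper: not part of the Cinder CSI driver or the Operator.
func unmatchedComputeAZs(computeAZs, volumeAZs []string) []string {
	known := make(map[string]struct{}, len(volumeAZs))
	for _, az := range volumeAZs {
		known[az] = struct{}{}
	}
	var missing []string
	for _, az := range computeAZs {
		if _, ok := known[az]; !ok {
			missing = append(missing, az)
		}
	}
	sort.Strings(missing)
	return missing
}

// topologyEnabled reports whether it looks safe to treat compute AZs as
// block storage AZs. Assumption: extra block storage AZs are harmless, but
// any compute AZ without a same-named block storage AZ disables the feature.
func topologyEnabled(computeAZs, volumeAZs []string) bool {
	return len(unmatchedComputeAZs(computeAZs, volumeAZs)) == 0
}

func main() {
	compute := []string{"az-0", "az-1", "az-2"}

	// Matching names (plus one extra block storage AZ): topology can stay on.
	fmt.Println(topologyEnabled(compute, []string{"az-0", "az-1", "az-2", "az-3"}))

	// Divergent names: every compute AZ is unmatched, so topology is turned off.
	fmt.Println(topologyEnabled(compute, []string{"nova"}))
	fmt.Println(unmatchedComputeAZs(compute, []string{"nova"}))
}
```

      In a real implementation the two lists would come from the compute and block storage APIs, and the list of unmatched AZs would be worth logging so that operators can see why topology was disabled.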

      This epic tracks the above work.

      Why is this important?

      If we don't do this, we would lose the ability to disable the topology feature in environments where it is not supported (due to mismatched compute and block storage AZ sets). This will affect a number of customers.

      Scenarios

      TODO.

      Acceptance Criteria

      • Upstream CI
      • Downstream CI

      Dependencies (internal and external)

      This work will mainly take place upstream. There should be no dependency on other teams, other than potentially reviews for the Operator changes.

      Previous Work (Optional):

      None.

      Testing

      Given this is a rework of an existing feature rather than a wholly new feature, we expect our existing tests (in openstack-test) to cover much of this and prevent regressions. We will need to manually test the negative case, where the set of Cinder AZs and the set of Nova AZs do not match, but this should be trivial to do.

      Open questions:

      None.

              sfinucan@redhat.com Stephen Finucane