OpenShift Bugs / OCPBUGS-49771

Scaling MachineSet from zero with CSI storage requests is failing in OpenShift Container Platform 4


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: 4.16
    • Component/s: Cluster Autoscaler
    • Quality / Stability / Reliability
    • Severity: Low

      Description of problem:

      The use case is autoscaling nodes with local disks, where a CSI driver can provide a StorageClass with capacity once the node is scaled up. An example of such a CSI driver is the LVM Storage Operator: when properly configured, a new node automatically adds its local disk to the StorageClass capacity.
      However, when a MachineSet is configured with a MachineAutoscaler that scales from 0, a new Pod that requests hyperconverged storage from such a CSI driver stays stuck in "Pending".
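
      For illustration, the sketch below shows roughly what a CSIStorageCapacity object published by such a driver looks like once a node with a local disk exists. While the MachineSet is still at zero replicas, no such object covers the future node's topology, so capacity-aware scheduling sees no node with enough free storage. The namespace, StorageClass name and topology key below are assumptions for the example, not values taken from the affected cluster.

      # Hypothetical example: a CSIStorageCapacity object as published by a
      # local-storage CSI driver (e.g. TopoLVM/LVMS) once the node exists.
      # Before scale-up, no such object exists for the future node, so the
      # scheduler and the autoscaler simulation find no node with enough storage.
      apiVersion: storage.k8s.io/v1
      kind: CSIStorageCapacity
      metadata:
        name: csisc-example                        # generated name in practice
        namespace: openshift-storage               # assumed driver namespace
      storageClassName: lvms-vg1                   # assumed LVMS StorageClass name
      nodeTopology:
        matchLabels:
          topology.topolvm.io/node: worker-lvm-0   # assumed topology key and node
      capacity: 100Gi                              # only populated after the node and its disk exist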

      Version-Release number of selected component (if applicable):

          4.16

      How reproducible:

          Most of the time, but not always.

      Steps to Reproduce:

          1. Create an LVM Storage StorageClass by way of an LVMCluster resource. The LVMCluster resource has a nodeSelector targeting nodes created by a MachineSet. The LVM Storage operator creates a StorageClass and the CSI-related objects (CSIStorageCapacity, CSIDriver), although CSIStorageCapacity.capacity is empty. (Example manifests for steps 1-3 are sketched after this list.)
          2. Create a MachineSet whose template.spec.metadata contains a label matching the nodeSelector defined in step 1. Also set the capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk annotation with a disk size on the MachineSet, as explained in the resolution of https://access.redhat.com/solutions/7041119
          3. Create a Pod with a PersistentVolumeClaim pointing to the StorageClass defined in step 1.
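
      The manifests below are a minimal sketch of the resources used in steps 1-3, assuming the LVM Storage operator runs in the openshift-storage namespace, a device class named vg1 (yielding a StorageClass named lvms-vg1), and a placeholder node label lvms-node=true. All names, the disk size and the container image are illustrative; only the PVC name and request size (my-lvm-claim, 1Gi) match the events in the Actual results section below. Cloud-specific MachineSet fields (providerSpec) are omitted.

      # Step 1: LVMCluster with a nodeSelector targeting the autoscaled nodes (sketch).
      apiVersion: lvm.topolvm.io/v1alpha1
      kind: LVMCluster
      metadata:
        name: my-lvmcluster
        namespace: openshift-storage
      spec:
        storage:
          deviceClasses:
          - name: vg1
            default: true
            fstype: xfs
            thinPoolConfig:
              name: thin-pool-1
              sizePercent: 90
              overprovisionRatio: 10
            nodeSelector:
              nodeSelectorTerms:
              - matchExpressions:
                - key: lvms-node
                  operator: In
                  values: ["true"]
      ---
      # Step 2: MachineSet labelling its nodes and carrying the scale-from-zero
      # disk-capacity annotation (see https://access.redhat.com/solutions/7041119).
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      metadata:
        name: worker-lvm
        namespace: openshift-machine-api
        annotations:
          capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk: "100Gi"
      spec:
        replicas: 0
        selector:
          matchLabels:
            machine.openshift.io/cluster-api-machineset: worker-lvm
        template:
          metadata:
            labels:
              machine.openshift.io/cluster-api-machineset: worker-lvm
          spec:
            metadata:
              labels:
                lvms-node: "true"   # matches the LVMCluster nodeSelector above
            # cloud-specific providerSpec omitted
      ---
      # MachineAutoscaler allowing the MachineSet to scale from zero.
      apiVersion: autoscaling.openshift.io/v1beta1
      kind: MachineAutoscaler
      metadata:
        name: worker-lvm
        namespace: openshift-machine-api
      spec:
        minReplicas: 0
        maxReplicas: 3
        scaleTargetRef:
          apiVersion: machine.openshift.io/v1beta1
          kind: MachineSet
          name: worker-lvm
      ---
      # Step 3: PVC against the LVMS StorageClass and a consumer Pod
      # (the report used an nginx Deployment; a bare Pod is shown for brevity).
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: my-lvm-claim
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: lvms-vg1
        resources:
          requests:
            storage: 1Gi
      ---
      apiVersion: v1
      kind: Pod
      metadata:
        name: nginx
      spec:
        nodeSelector:
          lvms-node: "true"
        containers:
        - name: nginx
          image: nginxinc/nginx-unprivileged   # placeholder image
          volumeMounts:
          - name: data
            mountPath: /data
        volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-lvm-claim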
          

      Actual results:

      Pod scheduling fails, and the cluster autoscaler does not create Machines to satisfy it.
      
      $ oc get pod nginx-cb7659cfd-2dlqj
      NAME                    READY   STATUS    RESTARTS   AGE
      nginx-cb7659cfd-2dlqj   0/1     Pending   0          42s
      
      $ oc get events
      21s         Normal    NotTriggerScaleUp      pod/nginx-cb7659cfd-2dlqj            pod didn't trigger scale-up: 3 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {kubernetes.io/arch: arm64}, 1 node(s) did not have enough free storage
      35s         Normal    WaitForPodScheduled    persistentvolumeclaim/my-lvm-claim   waiting for pod nginx-cb7659cfd-2dlqj to be scheduled
      44s         Normal    ScalingReplicaSet      deployment/nginx                     Scaled up replica set nginx-cb7659cfd to 1 from 0
      44s         Normal    SuccessfulCreate       replicaset/nginx-cb7659cfd           Created pod: nginx-cb7659cfd-2dlqj
      44s         Warning   FailedScheduling       pod/nginx-cb7659cfd-2dlqj            0/8 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
      1m33s       Warning   NotEnoughCapacity      persistentvolumeclaim/my-lvm-claim   Requested storage (1Gi) is greater than available capacity on any node ().
      5m          Normal    WaitForFirstConsumer   persistentvolumeclaim/my-lvm-claim   waiting for first consumer to be created before binding
      
      See cluster autoscaler logs below.

       

      Expected results:

      Machine creation is triggered, and the Pod is scheduled and running.

      Additional info:

      workaround
      ==========
      
      It is possible to trigger the autoscaling and correct Pod scheduling by setting CSIDriver spec.storageCapacity=false. However, doing this risks scheduling Pods with storage requests that are actually impossible to satisfy.
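
      For reference, a sketch of applying the workaround, assuming the LVM Storage CSIDriver object is named topolvm.io (verify with oc get csidriver). The operator managing the CSIDriver object may reconcile the field back, in which case the change would need to be re-applied or handled at the operator level.

      # Hypothetical illustration of the workaround; the CSIDriver name is an assumption.
      $ oc get csidriver
      # spec.storageCapacity is mutable in current Kubernetes versions.
      $ oc patch csidriver topolvm.io --type=merge -p '{"spec":{"storageCapacity":false}}'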

        Assignee: mimccune@redhat.com Michael McCune
        Reporter: rhn-support-ekasprzy Emmanuel Kasprzyk (Inactive)
        QA Contact: Paul Rozehnal