OCPBUGS-33532: "failed to find LV" or "No space left on device" on HostedControlPlane

      This is an LVMS Bug Report:

      Please create and attach a must-gather, as indicated by this guide, to collect LVMS-relevant data from the cluster (the guide is linked to the latest version; use older versions of the documentation for older OCP releases as applicable).
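      For reference, a typical invocation is sketched below; the image tag is an assumption based on the 4.15 documentation and should be matched to the installed LVMS release:

      ~~~
      # Collect LVMS-specific data from the cluster (adjust the image tag to the installed LVMS release)
      oc adm must-gather \
        --image=registry.redhat.io/lvms4/lvms-must-gather-rhel9:v4.15 \
        --dest-dir=./lvms-must-gather
      ~~~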

      Please make sure that you describe your storage configuration in detail. List all devices that you plan to use with LVMS, as well as any relevant machine configuration data, to make it easier for an engineer to help out.
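      A minimal sketch for capturing that information (the node name is a placeholder; LVMS is typically installed in the openshift-storage namespace, adjust if it differs):

      ~~~
      # Block devices visible on the node (from a node debug shell: oc debug node/<node-name>, then chroot /host)
      lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

      # LVMCluster CR that defines the deviceClasses / thin pools used by LVMS
      oc get lvmcluster -n openshift-storage -o yaml
      ~~~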

      Description of problem:

      The customer has a Hosted Control Plane environment in which each of the three etcd Pods requests one volume from the LVMS provisioner.
      
      This is configured in the *HostedControlPlane* CR owned by the Cluster CR:
      
      ~~~
      metadata:
        name: hcp-automation
        namespace: clusters-hcp-automation
      spec:
        autoscaling: {}
        clusterID: a2731297-bac3-4279-90f5-f0ba9fcf891e
        configuration: {}
        controllerAvailabilityPolicy: HighlyAvailable
        dns:
          baseDomain: <DOMAIN>
        etcd:
          managed:
            storage:
              persistentVolume:
                size: 8Gi
                storageClassName: lvms-hcp-etcd
              type: PersistentVolume
          managementType: Managed
      ~~~
      
      Two etcd Pods start fine; one fails to start with the error: "xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device"
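      For triage, a hedged sketch of checks that should show whether the backing volume group or thin pool is out of space and whether the TopoLVM LogicalVolume for the failing PVC was created (the node name is a placeholder; LVMS is built on TopoLVM):

      ~~~
      # From a node debug shell (oc debug node/<node-name>, then chroot /host):
      # free space in the volume group and thin pool usage
      vgs
      lvs -a -o lv_name,vg_name,lv_size,data_percent,metadata_percent

      # LogicalVolume CRs created by TopoLVM/LVMS for each PVC (cluster-scoped)
      oc get logicalvolumes.topolvm.io

      # StorageClass referenced by the etcd PVC
      oc get storageclass lvms-hcp-etcd -o yaml
      ~~~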

      Version-Release number of selected component (if applicable):

      - OCP 4.15.2 (bare metal)
      - LVM Storage 4.15.3

      Steps to Reproduce:

      - Not sure at the moment, but the customer mentioned this happens for multiple HCP clusters: "This is reproducible for any number of clusters. I have already attached the hypershift dump from 2 clusters having a similar issue, 'hcp-cluster-small' and 'hcp-automation'."
      

      Actual results:

      - The HCP cluster is not fully up; it is waiting for etcd to start.
          

      Expected results:

      - The etcd Pod should get its LogicalVolume provisioned and start, and the HCP cluster should complete its provisioning phase.
      

      Additional info:

       
