OCPBUGS-33532: "failed to find LV" or "No space left on device" on HostedControlPlane

      This is an LVMS Bug Report:

      Please create and attach a must-gather, as indicated by this guide, to collect LVMS-relevant data from the cluster (the guide is linked to the latest version; use older versions of the documentation for older OCP releases as applicable).
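      For reference, a typical invocation is sketched below; the image tag is an assumption based on the 4.15 documentation and should be matched to the installed LVMS release:

      ~~~
      # Collect LVMS-specific data from the cluster (adjust the image tag to the installed LVMS release)
      oc adm must-gather \
        --image=registry.redhat.io/lvms4/lvms-must-gather-rhel9:v4.15 \
        --dest-dir=./lvms-must-gather
      ~~~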

      Please make sure that you describe your storage configuration in detail. List all devices that you plan to use with LVMS, as well as any relevant machine configuration data, to make it easier for an engineer to help out.
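      A minimal sketch for capturing that information (the node name is a placeholder; LVMS is typically installed in the openshift-storage namespace, adjust if it differs):

      ~~~
      # Block devices visible on the node (from a node debug shell: oc debug node/<node-name>, then chroot /host)
      lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

      # LVMCluster CR that defines the deviceClasses / thin pools used by LVMS
      oc get lvmcluster -n openshift-storage -o yaml
      ~~~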

      Description of problem:

      The customer has a Hosted Control Plane environment in which each of the three etcd Pods requests one volume from the LVMS provisioner.
      
      This is configured in the *HostedControlPlane* CR owned by the Cluster CR:
      
      ~~~
      metadata:
        name: hcp-automation
        namespace: clusters-hcp-automation
      spec:
        autoscaling: {}
        clusterID: a2731297-bac3-4279-90f5-f0ba9fcf891e
        configuration: {}
        controllerAvailabilityPolicy: HighlyAvailable
        dns:
          baseDomain: <DOMAIN>
        etcd:
          managed:
            storage:
              persistentVolume:
                size: 8Gi
                storageClassName: lvms-hcp-etcd
              type: PersistentVolume
          managementType: Managed
      ~~~
      
      Two etcd Pods start fine; one fails to start with the error: "xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device"
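      For triage, a hedged sketch of checks that should show whether the backing volume group or thin pool is out of space and whether the TopoLVM LogicalVolume for the failing PVC was created (the node name is a placeholder; LVMS is built on TopoLVM):

      ~~~
      # From a node debug shell (oc debug node/<node-name>, then chroot /host):
      # free space in the volume group and thin pool usage
      vgs
      lvs -a -o lv_name,vg_name,lv_size,data_percent,metadata_percent

      # LogicalVolume CRs created by TopoLVM/LVMS for each PVC (cluster-scoped)
      oc get logicalvolumes.topolvm.io

      # StorageClass referenced by the etcd PVC
      oc get storageclass lvms-hcp-etcd -o yaml
      ~~~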

      Version-Release number of selected component (if applicable):

      - OCP 4.15.2 (bare metal)
      - LVM Storage 4.15.3

      Steps to Reproduce:

      - Not sure at the moment, but the customer mentioned this happens for multiple HCP clusters: "This is reproducible for any number of clusters. I have already attached the hypershift dump from 2 clusters having a similar issue, 'hcp-cluster-small' and 'hcp-automation'."
      

      Actual results:

      - The HCP cluster is not fully up; it is waiting for etcd to start.
          

      Expected results:

      - The etcd Pod should get its LogicalVolume provisioned and start, and the HCP cluster should complete its provisioning phase.
      

      Additional info:

       
