- Bug
- Resolution: Unresolved
- Critical
- odf-4.12
- None
Ceph, being a core component of the OCS suite, has its network requirements specifically outlined in the product documentation. The crucial part of these requirements concerns bandwidth:
--8<--
2.5. Network considerations
Carefully consider bandwidth requirements for the cluster network, be mindful of network link oversubscription, and segregate the intra-cluster traffic from the client-to-cluster traffic.
Important
Red Hat recommends using 10 GB Ethernet for Ceph production deployments. 1 GB Ethernet is not suitable for production storage clusters.
...
At a minimum, a single 10 GB Ethernet link should be used for storage hardware. If the Ceph nodes have many drives each, add additional 10 GB Ethernet links for connectivity and throughput.
-->8--
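To put the bandwidth recommendation into perspective (a back-of-envelope illustration, not a figure from the docs): replicating 1 TB of data over a fully saturated 1 Gb/s link takes at least (1 TB x 8) / 1 Gb/s = 8,000 s, i.e. about 2.2 hours of raw transfer time before any protocol or replication overhead, whereas the same transfer over a 10 Gb/s link finishes in roughly 13 minutes. During OSD recovery or backfill, that difference translates directly into hours of degraded redundancy.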
However, nothing is said about the required bandwidth in the ODF documentation: the only network-related topics covered there are IPv6 addressing and Multus support, the latter currently being a Technology Preview feature.
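For reference, the Multus support mentioned above is configured on the StorageCluster resource. The command below is only an illustrative sketch, not taken from the ODF docs: the NetworkAttachmentDefinition names ("public-net", "cluster-net") are assumed placeholders that would have to exist in the openshift-storage namespace beforehand.

# Illustrative sketch: attach the default StorageCluster to dedicated
# Multus networks ("public-net"/"cluster-net" are placeholder names).
oc patch storagecluster ocs-storagecluster -n openshift-storage \
  --type merge \
  -p '{"spec":{"network":{"provider":"multus","selectors":{"public":"openshift-storage/public-net","cluster":"openshift-storage/cluster-net"}}}}'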
I believe that the networking requirements for an OCS deployment should include those for a standalone Ceph installation, since a slow network between the worker nodes can become a bottleneck, causing communication failures between the Ceph components, which in turn leads to:
- poor performance of the storage cluster (under heavy workloads and/or during recovery operations)
- missed heartbeats from the osd/mon/mgr daemons, resulting in the respective pods crashing, or in constant monitor re-elections due to quorum changes
We have actually observed this behaviour in quite a few customer scenarios where OCS clusters were running on networks slower than 10 Gb/s and consequently hit the issues above.
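For illustration, a quick way to spot-check the effective bandwidth between two worker nodes is to run iperf3 between pods pinned to them. The sketch below is a minimal example under stated assumptions: IPERF_IMAGE is a placeholder for any image that ships iperf3, and the node names are placeholders as well.

# Placeholders: an image containing iperf3 and two worker node names.
IPERF_IMAGE=example.com/tools/iperf3:latest
NODE_A=worker-0
NODE_B=worker-1

# Start an iperf3 server pod pinned to the first node.
oc run iperf3-server --image="$IPERF_IMAGE" --restart=Never --command \
  --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"$NODE_A\"}}" \
  -- iperf3 -s

oc wait --for=condition=Ready pod/iperf3-server --timeout=120s
SERVER_IP=$(oc get pod iperf3-server -o jsonpath='{.status.podIP}')

# Measure throughput from the second node for 10 seconds; a result far
# below 10 Gbit/s means the inter-node link does not meet the Ceph
# production recommendation quoted above.
oc run iperf3-client --image="$IPERF_IMAGE" --restart=Never --rm -it --command \
  --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"$NODE_B\"}}" \
  -- iperf3 -c "$SERVER_IP" -t 10

# Clean up the server pod afterwards.
oc delete pod iperf3-server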
Based on the above, I suggest that we update the ODF docs to include the network specifications in the same way as we currently have them in the Ceph documentation.
Regards,
Sergii