- Bug
- Resolution: Unresolved
- Critical
- odf-4.12
- None
Ceph, being a core component of the OCS suite, has its network requirements specifically outlined in the product documentation. The crucial part of these requirements concerns bandwidth:
--8<--
2.5. Network considerations
Carefully consider bandwidth requirements for the cluster network, be mindful of network link oversubscription, and segregate the intra-cluster traffic from the client-to-cluster traffic.
Important
Red Hat recommends using 10 GB Ethernet for Ceph production deployments. 1 GB Ethernet is not suitable for production storage clusters.
...
At a minimum, a single 10 GB Ethernet link should be used for storage hardware. If the Ceph nodes have many drives each, add additional 10 GB Ethernet links for connectivity and throughput.
-->8--
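To put the bandwidth recommendation into perspective (a back-of-envelope illustration, not a figure from the docs): replicating 1 TB of data over a fully saturated 1 Gb/s link takes at least (1 TB x 8) / 1 Gb/s = 8,000 s, i.e. about 2.2 hours of raw transfer time before any protocol or replication overhead, whereas the same transfer over a 10 Gb/s link finishes in roughly 13 minutes. During OSD recovery or backfill, that difference translates directly into hours of degraded redundancy.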
However, nothing is said about the required bandwidth in the ODF documentation: the only network-related topics covered there are IPv6 addressing and Multus support, the latter currently being a Technology Preview feature.
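For reference, the Multus support mentioned above is configured on the StorageCluster resource. The command below is only an illustrative sketch, not taken from the ODF docs: the NetworkAttachmentDefinition names ("public-net", "cluster-net") are assumed placeholders that would have to exist in the openshift-storage namespace beforehand.

# Illustrative sketch: attach the default StorageCluster to dedicated
# Multus networks ("public-net"/"cluster-net" are placeholder names).
oc patch storagecluster ocs-storagecluster -n openshift-storage \
  --type merge \
  -p '{"spec":{"network":{"provider":"multus","selectors":{"public":"openshift-storage/public-net","cluster":"openshift-storage/cluster-net"}}}}'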
I believe that the networking requirements for an OCS deployment should include those for a standalone Ceph installation, since a slow network between the worker nodes can become a bottleneck, causing communication failures between the Ceph components, which in turn leads to:
- poor performance of the storage cluster (under heavy workloads and/or during recovery operations)
- missed heartbeats from the osd/mon/mgr daemons, resulting in the respective pods crashing, or in constant monitor re-elections due to quorum changes
We have actually observed this behaviour in quite a few customer scenarios where OCS clusters were running on networks slower than 10 Gb/s and consequently hit the issues above.
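For illustration, a quick way to spot-check the effective bandwidth between two worker nodes is to run iperf3 between pods pinned to them. The sketch below is a minimal example under stated assumptions: IPERF_IMAGE is a placeholder for any image that ships iperf3, and the node names are placeholders as well.

# Placeholders: an image containing iperf3 and two worker node names.
IPERF_IMAGE=example.com/tools/iperf3:latest
NODE_A=worker-0
NODE_B=worker-1

# Start an iperf3 server pod pinned to the first node.
oc run iperf3-server --image="$IPERF_IMAGE" --restart=Never --command \
  --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"$NODE_A\"}}" \
  -- iperf3 -s

oc wait --for=condition=Ready pod/iperf3-server --timeout=120s
SERVER_IP=$(oc get pod iperf3-server -o jsonpath='{.status.podIP}')

# Measure throughput from the second node for 10 seconds; a result far
# below 10 Gbit/s means the inter-node link does not meet the Ceph
# production recommendation quoted above.
oc run iperf3-client --image="$IPERF_IMAGE" --restart=Never --rm -it --command \
  --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"$NODE_B\"}}" \
  -- iperf3 -c "$SERVER_IP" -t 10

# Clean up the server pod afterwards.
oc delete pod iperf3-server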
Based on the above, I suggest that we update the ODF docs to include the network specifications in the same way as we currently have them in the Ceph documentation.
Regards,
Sergii