Data Foundation Bugs / DFBUGS-650

[2209942] Incomplete networking specifications in ODF documentation


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Affects Versions: odf-4.13, odf-4.12
    • Component: Documentation

      Ceph, as a core component of the OCS suite, has its network requirements specifically outlined in the product documentation. The crucial part of these requirements concerns bandwidth:

      https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/hardware_guide/index#network-considerations_hw

      --8<--
      2.5. Network considerations

      Carefully consider bandwidth requirements for the cluster network, be mindful of network link oversubscription, and segregate the intra-cluster traffic from the client-to-cluster traffic.

      Important
      Red Hat recommends using 10 GB Ethernet for Ceph production deployments. 1 GB Ethernet is not suitable for production storage clusters.
      ...
      At a minimum, a single 10 GB Ethernet link should be used for storage hardware. If the Ceph nodes have many drives each, add additional 10 GB Ethernet links for connectivity and throughput.
      -->8--

      However, nothing is said about the required bandwidth in the ODF documentation:

      https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/planning_your_deployment/index#network-requirements_rhodf

      The only things mentioned there are IPv6 addressing and Multus support, the latter of which is currently a Technology Preview feature.

      I believe that the networking requirements for an OCS deployment should include those for a standalone Ceph installation, since a slow network between the worker nodes can become a bottleneck, causing communication failures between the Ceph components, which in turn lead to:

      • poor performance of the storage cluster (under heavy workloads and/or during recovery operations)
      • missed heartbeats from the osd/mon/mgr daemons, resulting in the respective pods crashing, or in constant monitor re-elections due to quorum changes
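      To illustrate why link speed matters for the recovery case above, here is a rough back-of-the-envelope sketch. The 10 TB recovery set and the 70% usable-link efficiency are illustrative assumptions, not figures from the Ceph or ODF documentation:

```python
# Back-of-the-envelope estimate of how long Ceph re-replication takes
# when it is limited purely by the network link between worker nodes.
# The 10 TB of data to recover and the 70% link efficiency below are
# illustrative assumptions, not values from the Ceph or ODF docs.

def recovery_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move `data_tb` terabytes over a `link_gbps` link."""
    bits_to_move = data_tb * 1e12 * 8          # TB -> bits
    usable_bps = link_gbps * 1e9 * efficiency  # usable bits per second
    return bits_to_move / usable_bps / 3600

if __name__ == "__main__":
    for gbps in (1, 10):
        print(f"{gbps:>2} Gb/s link: ~{recovery_hours(10, gbps):.1f} h to recover 10 TB")
```

      Under these assumptions a 1 Gb/s link needs roughly ten times longer than a 10 Gb/s link to re-replicate the same data, which is exactly the window during which heartbeats are most likely to be missed.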

      We have actually observed this behaviour in quite a few customer scenarios where an OCS cluster was running on a network slower than 10 Gb/s, and the customers were consequently facing the above issues.

      Based on the above, I suggest updating the ODF docs to include the network specifications in the same way as we currently have them in the Ceph documentation.

      Regards,
      Sergii

              asriram@redhat.com Anjana Sriram
              rhn-support-smykhail Sergii Mykhailushko
              Neha Berry Neha Berry