Data Foundation Bugs / DFBUGS-565

[2292442] [RFE] Make mon out timeout configurable

      Rook Ceph allows setting cephcluster.spec.healthCheck.daemonHealth.mon.timeout to a custom value. Setting it to 0 disables mon failover entirely. We would like this value to be configurable in ODF, including the option to disable failover.
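As a minimal sketch of where the setting lives, the Python snippet below builds the nested spec fragment for the field path named above. The helper name mon_timeout_fragment and the sample value "3600s" are hypothetical; the snippet assumes the timeout is written as a duration string (as in Rook's default of 600s) and does not show how ODF would expose the option.

```python
import json


def mon_timeout_fragment(timeout: str) -> dict:
    """Build a CephCluster spec fragment for spec.healthCheck.daemonHealth.mon.timeout.

    Hypothetical helper: `timeout` is a duration string such as "3600s";
    "0" is the value the request describes for disabling mon failover.
    """
    return {"spec": {"healthCheck": {"daemonHealth": {"mon": {"timeout": timeout}}}}}


# Hypothetical values: a one-hour timeout, and mon failover disabled.
print(json.dumps(mon_timeout_fragment("3600s")))
print(json.dumps(mon_timeout_fragment("0")))
```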

      For mon failover, ODF currently uses a default timeout of 10 minutes, and it does not appear to be configurable. This value is too low for our use case. We deploy ODF on bare-metal clusters running OpenShift Virtualization. When a node is drained, its virtual machines are live-migrated to other nodes, which can take 40-60 minutes depending on how many virtual machines run on the node and how quickly their memory can be copied over the network to another cluster node. Because the mon on the node being drained stays down longer than the 10-minute timeout, all three monitors are failed over during every OpenShift upgrade.
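The arithmetic behind this can be sketched with a simplified model: a mon that stays down longer than the timeout is failed over, and a zero timeout means it never is. The function name mon_fails_over and the 50-minute drain are illustrative assumptions (the 40- and 60-minute values come from the range above); the model ignores Rook's health-check polling interval and any other failover conditions.

```python
from datetime import timedelta

# ODF default mon failover timeout described in this request.
DEFAULT_MON_TIMEOUT = timedelta(minutes=10)


def mon_fails_over(outage: timedelta, timeout: timedelta = DEFAULT_MON_TIMEOUT) -> bool:
    """Simplified model: True if a mon down for `outage` would be failed over.

    A zero timeout models the disabled case, so the mon is never failed over.
    """
    if timeout == timedelta(0):
        return False
    return outage > timeout


# Illustrative drain durations within the 40-60 minute range, one per mon node.
drains = [timedelta(minutes=m) for m in (40, 50, 60)]
print([mon_fails_over(d) for d in drains])                         # default 10-minute timeout
print([mon_fails_over(d, timedelta(minutes=90)) for d in drains])  # hypothetical 90-minute timeout
print([mon_fails_over(d, timedelta(0)) for d in drains])           # failover disabled
```

With the default timeout every drain in this range exceeds the limit, which matches the reported failover of all three monitors; a longer timeout or a zero value avoids it in this model.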

      We would also like the option to disable mon failover. We recently hit a scenario (https://bugzilla.redhat.com/show_bug.cgi?id=2292435) in which a mon failover likely caused a Ceph mon outage. Until that issue is confirmed and fixed, we would like to disable mon failover as an interim measure.

              Nitin Goyal (nigoyal)
              Ales Nosek (anosek@redhat.com)
              Elad Ben Aharon