  1. OpenShift Virtualization
  2. CNV-15008

[2025401] [TEST ONLY] [CNV+OCS/ODF] Virtualization poison pill implementation


    • CNV Infra 221
    • Important
    • None

      Description of problem:
      In an HCI environment (CNV + ODF), could the automated poison pill approach cause ODF mon quorum loss or VM outages for OCP Virt? Or has it already been tested with these products?

      NHC/PP only kicks in once there has been a failure, so ODF would already have lost quorum (because the mon had already failed or was unreachable); we are trying to get it back by recovering the node. The same applies to the VMs: either the VM is already dead or unreachable, and we are giving you the chance to bring it up somewhere else.

      The most common reason that fencing might "create downtime" is that admins set overly aggressive timeouts (causing NHC/PP to react to every little blip) and under-spec the machines (causing the process that keeps the watchdog alive to be starved of CPU).
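
      To illustrate the timeout point, here is a minimal sketch, not NHC/PP code. It assumes a node's health is observed as a list of NotReady periods and that remediation fires only when a period lasts at least a configured duration. The `Outage` type, the function name, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One contiguous period during which a node reported NotReady (hypothetical model)."""
    start_s: float  # seconds since an arbitrary reference point
    end_s: float


def outages_triggering_remediation(outages, unhealthy_duration_s):
    """Return the outages that last at least the configured duration.

    Shorter outages are treated as transient blips and are ignored.
    """
    return [o for o in outages if (o.end_s - o.start_s) >= unhealthy_duration_s]


# Hypothetical observation window: two short blips (8 s, 15 s) and one sustained failure (600 s).
observed = [Outage(0, 8), Outage(100, 115), Outage(300, 900)]

# An overly aggressive timeout reacts to every blip.
aggressive = outages_triggering_remediation(observed, unhealthy_duration_s=5)

# A more conservative timeout reacts only to the sustained failure.
conservative = outages_triggering_remediation(observed, unhealthy_duration_s=300)

print(len(aggressive), len(conservative))
```

      With the aggressive setting, all three periods would lead to fencing, so the two blips become avoidable restarts; with the conservative setting, only the sustained failure does.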

      We can't control the hardware, but we have some rules in place for configuring timeouts, and have more planned for future releases.

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

      Expected results:

      Additional info:

              bodnopoz@redhat.com Boris Odnopozov
              godas@redhat.com Gobinda Das
              Geetika Kapoor Geetika Kapoor

                Created:
                Updated:
                Resolved: