OpenShift Request For Enhancement / RFE-7760

Improve bare metal nodes failure detection

    • Type: Feature Request
    • Resolution: Unresolved

      1. Proposed title of this feature request

      Improve bare metal nodes failure detection

      2. What is the nature and description of the request?

      Currently, the Node Health Check Operator can trigger remediation when a node loses connectivity to the Kubernetes API or when the node crashes. However, several other failure scenarios should also be considered, mainly for the OpenShift Virtualization use case. Some examples, assuming API connectivity is still available:

      • Secondary networks not communicating because of link failure.
      • Access to storage is lost (SAN, NAS or SDS).
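
To make the two scenarios above concrete, here is a minimal sketch of the kind of node-local check that could feed such detection. It is an illustration only, not how Node Health Check Operator or any OpenShift component works today. The function names, the condition names `SecondaryNetworkDown` and `StorageUnavailable`, and the idea of probing a mount path are all assumptions made for this example. It assumes a Linux node, where the kernel exposes link state under `/sys/class/net/<iface>/operstate`.

```python
import os
from pathlib import Path


def link_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if the kernel reports the interface as operationally up."""
    state_file = Path(sysfs_root) / iface / "operstate"
    try:
        return state_file.read_text().strip() == "up"
    except OSError:
        # Missing interface or unreadable state is treated as a failure.
        return False


def storage_is_reachable(path: str) -> bool:
    """Return True if the filesystem backing `path` answers a statvfs call.

    Caveat: on a hung NFS or SAN mount this call may block instead of
    failing, so a real check would need to run it with a timeout.
    """
    try:
        os.statvfs(path)
        return True
    except OSError:
        return False


def detect_failures(ifaces, storage_paths, sysfs_root="/sys/class/net"):
    """Map hypothetical condition names to the items that failed their check."""
    return {
        "SecondaryNetworkDown": [
            i for i in ifaces if not link_is_up(i, sysfs_root)
        ],
        "StorageUnavailable": [
            p for p in storage_paths if not storage_is_reachable(p)
        ],
    }
```

A non-empty list under either key would mean the node is degraded even though it still reports "Ready". How that result would reach a remediation controller (for example as a custom node condition) is outside the scope of this sketch.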

      3. Why does the customer need this? (List the business requirements here)

      Improve the availability of virtual machine based workloads by preventing them from continuing to run on a node that isn't fully functional despite being "Ready" from the Kubernetes API standpoint.

      4. List any affected packages or components.

      Node Health Check Operator?

      5. Previous work

      Node Problem Detector seems to cover the use cases above, but it isn't a supported component in OpenShift.

              racedoro@redhat.com Ramon Acedo
              vagnerfarias Vagner Farias
              Votes: 9
              Watchers: 12
