OpenShift Virtualization / CNV-60410

Faster remediation start with baremetal events

    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major
    • CNV Infrastructure
    • remediation-with-baremetal-events
    • Product / Portfolio Work
    • In Progress
    • VIRTSTRAT-77 - Fencing: Additional out-of-band health checks for faster remediation
    • 11% To Do, 0% In Progress, 89% Done

      Kubernetes / OCP needs some time to detect unhealthy nodes [1], which significantly delays the start of remediation. Decreasing the relevant timeouts and intervals could introduce risks and is not wanted [2].

      An alternative could be to use the Baremetal Event Relay [3] to watch for relevant Redfish events. Knative Eventing [4] might be useful for converting those events into node conditions that Node Health Check (NHC) can act on.

      [1] https://docs.google.com/document/d/1NKZBTu4UFaCR-tJgkrPR7-DHLabe4MtSL1L8Qfws0sU/edit#heading=h.x3fr6xu7zkbj
      [2] https://issues.redhat.com/browse/RFE-3727
      [3] https://docs.openshift.com/container-platform/4.13/monitoring/using-rfhe.html
      [4] https://docs.openshift.com/serverless/1.29/about/about-knative-eventing.html
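
The conversion step described above (Redfish event in, node condition out) could look roughly like the following. This is only a minimal sketch under stated assumptions, not the planned implementation: the condition type `BaremetalEventCritical`, the reason string, and the choice to react only to `Critical` severity are hypothetical. The field names `Events`, `Severity`, `MessageId` and `Message` follow the Redfish Event schema, and the output mirrors the shape of a Kubernetes node condition. Receiving the event from the relay and patching the condition onto the Node status are left out.

```python
from datetime import datetime, timezone

# Hypothetical condition type; NHC would need a matching unhealthy-condition entry.
CONDITION_TYPE = "BaremetalEventCritical"

# Assumption: only these Redfish severities should mark a node as unhealthy.
UNHEALTHY_SEVERITIES = {"Critical"}


def redfish_event_to_conditions(payload, now=None):
    """Map a Redfish Event payload (dict) to a list of node-condition dicts."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    conditions = []
    for record in payload.get("Events", []):
        # Skip informational and warning records.
        if record.get("Severity") not in UNHEALTHY_SEVERITIES:
            continue
        conditions.append({
            "type": CONDITION_TYPE,
            "status": "True",
            "reason": "RedfishCriticalEvent",  # hypothetical reason string
            "message": "{}: {}".format(
                record.get("MessageId", "unknown"), record.get("Message", "")
            ),
            "lastHeartbeatTime": stamp,
            "lastTransitionTime": stamp,
        })
    return conditions


# Made-up sample payload for illustration only.
sample_event = {
    "Events": [
        {"Severity": "OK", "MessageId": "Example.1.0.FanOk", "Message": "Fan restored"},
        {"Severity": "Critical", "MessageId": "Example.1.0.PowerLost",
         "Message": "Power supply failure"},
    ]
}

if __name__ == "__main__":
    for condition in redfish_event_to_conditions(sample_event):
        print(condition)
```

With a condition like this set on the node, NHC could start remediation as soon as the hardware reports a failure, without waiting for the kubelet-based detection timeouts.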

              fmatousc@redhat.com Felix Matouschek
              slintes Marc Sluiter
              Geetika Kapoor