OpenShift Windows Containers / WINC-657

Health management for Windows nodes


    • Type: Epic
    • Resolution: Done
    • Priority: Normal
    • WMCO 8.0.0
    • Health management
    • Green
    • Done
    • 0% To Do, 0% In Progress, 100% Done
    • XL
    • On track

      Epic Goal

      • Ensure that the expected number of Windows nodes in a cluster are present and in a usable state.

      Why is this important?

      • As an OpenShift cluster administrator, I expect the number of usable Windows nodes to always equal the number I have specified. I do not want to track the state of the nodes myself or check whether any of them have entered an unusable state.
      • Windows nodes should be resilient and not require manual intervention to fix small issues.

      Scenarios

      1. A Kubernetes node binary (for example, kubelet) crashes on a Windows node. A controller running on the node recognizes this and works to return the node to a working state. If this is not possible, it generates an event alerting the cluster administrator to the issue.
      2. A Windows node configured from a Machine enters an unrecoverable state. Remediation of the node can be left to user-defined MachineHealthChecks.

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Engineering Details

      • This should take the form of a Windows service that replaces WMCB (Windows Machine Config Bootstrapper).
      • The expected state of Windows services on a node should be given by a ConfigMap managed by WMCO.
      • This epic will move much of the work WMCO currently does onto the nodes it configures. This lets WMCO scale better as the number of Windows nodes in a cluster grows.
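      To illustrate the ConfigMap-driven design, the sketch below decodes a list of expected services and compares it with what is observed on the node. It is a hypothetical Go example: the `services` data key, the `ServiceSpec` fields, and the `Create`/`Start`/`NoOp` actions are assumptions for illustration, not WMCO's actual ConfigMap schema or API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServiceSpec is a hypothetical, simplified description of one Windows service
// as it might appear in a WMCO-managed ConfigMap. The field names are
// assumptions, not the real WMCO schema.
type ServiceSpec struct {
	Name    string `json:"name"`
	Command string `json:"command"`
}

// ServiceState is the observed state of a service on the node (assumed values).
type ServiceState string

const (
	Running ServiceState = "Running"
	Stopped ServiceState = "Stopped"
)

// Action is what the node-side controller would do to converge a service.
type Action string

const (
	Create Action = "create" // service is missing: register it
	Start  Action = "start"  // service exists but is not running: start it
	NoOp   Action = "none"   // service already matches the expected state
)

// ParseExpected decodes the JSON list stored under a hypothetical "services"
// key of the ConfigMap's data map. A missing key yields a decoding error.
func ParseExpected(data map[string]string) ([]ServiceSpec, error) {
	var specs []ServiceSpec
	if err := json.Unmarshal([]byte(data["services"]), &specs); err != nil {
		return nil, fmt.Errorf("decoding services: %w", err)
	}
	return specs, nil
}

// Plan compares the expected services with the observed ones and returns
// one action per expected service name.
func Plan(expected []ServiceSpec, observed map[string]ServiceState) map[string]Action {
	plan := make(map[string]Action, len(expected))
	for _, svc := range expected {
		state, exists := observed[svc.Name]
		switch {
		case !exists:
			plan[svc.Name] = Create
		case state != Running:
			plan[svc.Name] = Start
		default:
			plan[svc.Name] = NoOp
		}
	}
	return plan
}

func main() {
	cm := map[string]string{
		"services": `[{"name":"kubelet","command":"kubelet.exe"},{"name":"kube-proxy","command":"kube-proxy.exe"}]`,
	}
	expected, err := ParseExpected(cm)
	if err != nil {
		panic(err)
	}
	observed := map[string]ServiceState{"kubelet": Stopped}
	plan := Plan(expected, observed)
	fmt.Println(plan["kubelet"], plan["kube-proxy"]) // prints: start create
}
```

      Because `Plan` is a pure function of the expected and observed state, each node can re-run it on every reconcile and converge on its own, which is what allows the per-node work to move off WMCO.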

      Dependencies (internal and external)

      1. ...

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

          1. QE Tracker: Sub-task, Closed, Undefined, Unassigned
          2. Docs Tracker: Sub-task, Closed, Undefined, Michael Burke
          3. TE Tracker: Sub-task, Closed, Undefined, Unassigned

              Assignee: Sebastian Soto (rh-ee-ssoto)
              Reporter: Sebastian Soto (rh-ee-ssoto)
              Votes: 0
              Watchers: 5
