OpenShift Request For Enhancement / RFE-5039

RHOCP control plane machineset for machine healthcheck on bare metal clusters

    • Feature Request
    • Resolution: Unresolved
    • Major
    • openshift-4.11, openshift-4.12, openshift-4.13, openshift-4.14
    • MCO

      1. Proposed title of this feature request
      "Control node failure handling"

      2. What is the nature and description of the request?
      It should be possible to recover a crashed control plane node without user intervention. In one customer trial, a node suffered a kernel crash caused by a storage driver, which left a StatefulSet (STS) pod stuck on that node. Because Kubernetes by design does not relocate STS pods from an unreachable node, the node had to be rebooted manually. Recognizing and recovering from the failure took one day, during which the application was unavailable. The STS pods moved to other healthy nodes only after the node was rebooted. In effect, the Kubernetes control plane had lost connectivity to the kubelet on that node.

      3. Why does the customer need this? (List the business requirements here)
      We run master nodes as schedulable, and their workloads include STS pods. We already use the Self Node Remediation operator to handle worker node failures. We need a similar solution for STS pods running on master nodes.

      4. List any affected packages or components.
      Not sure which packages are affected.

              rhn-support-mrussell Mark Russell
              rhn-support-akanekar Ankita Kanekar
              Mark Russell