  1. OpenShift Request For Enhancement
  2. RFE-5039

RHOCP control plane machineset for machine healthcheck on bare metal clusters


    • Feature Request
    • Resolution: Unresolved
    • Major
    • openshift-4.11, openshift-4.12, openshift-4.13, openshift-4.14
    • MCO

      1. Proposed title of this feature request
      "Control node failure handling"

      2. What is the nature and description of the request?
      It should be possible to recover a crashed control plane node without user intervention. In one customer trial, a node suffered a kernel crash caused by a storage driver, which left a StatefulSet (STS) pod stuck on that node. Because Kubernetes by design does not relocate StatefulSet pods away from an unreachable node, a user had to reboot the node manually. Recognizing the problem and recovering from it took one day, and the application was down for that time. The STS pods moved to other healthy nodes only after the node was rebooted. (In this situation the Kubernetes control plane had lost connectivity to the kubelet.)

      3. Why does the customer need this? (List the business requirements here)
      We run master nodes as schedulable, and the workloads on them include STS pods. We already use the Self Node Remediation operator to handle worker node failures. We need a similar solution for STS pods running on master nodes.

      4. List any affected packages or components.
      Not sure which packages are affected.

            rhn-support-mrussell Mark Russell
            rhn-support-akanekar Ankita Kanekar