OpenShift Request For Enhancement · RFE-3923

Make the bare-metal node fencing process faster


    • Type: Feature Request
    • Resolution: Done
    • Priority: Critical
    • Component: API

      1. Proposed title of this feature request

      Fast baremetal node fencing.

      2. What is the nature and description of the request?

      Currently the metal3 components running in the openshift-machine-api namespace are configured to run with a single replica and with tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable set to 120 seconds.
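
      As an illustration, the toleration configuration being described corresponds roughly to the following pod spec fragment (a minimal sketch; the exact operator-rendered manifests may differ):

      tolerations:
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 120
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 120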

       

      This makes node fencing unnecessarily long when the outage affects the node where these resources were running: they have to be recreated on the remaining nodes, with the extra 120 seconds of delay, before the affected node can be fenced.

       

      3. Why does the customer need this? (List the business requirements here)

      During the PoC we were running a bare-metal compact cluster with 3 master nodes, installed with the Agent-Based Installer. The customer has defined one of the use cases as ensuring that OCP-V based VMs are rescheduled quickly in case of a single master node failure. To fence the failed node and allow the VMs to restart on the remaining nodes I am using a MachineHealthCheck as described here:
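
      For context, a MachineHealthCheck of this kind looks roughly as follows; the name, selector labels and timeout values are illustrative assumptions, not the exact object used in the PoC:

      apiVersion: machine.openshift.io/v1beta1
      kind: MachineHealthCheck
      metadata:
        name: control-plane-fencing          # illustrative name
        namespace: openshift-machine-api
      spec:
        selector:
          matchLabels:
            machine.openshift.io/cluster-api-machine-role: master   # illustrative selector
        unhealthyConditions:
        - type: Ready
          status: Unknown
          timeout: 300s
        - type: Ready
          status: "False"
          timeout: 300s
        maxUnhealthy: 1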

      The problem I observe is the time required to fence the node when that node is hosting pods that are crucial to the fencing process, for instance:

      $ oc get pods -n openshift-machine-api -o wide
      NAME                                                  READY   STATUS    RESTARTS         AGE     IP             NODE    NOMINATED NODE   READINESS GATES
      cluster-autoscaler-operator-74f9b6b57c-f9r7t          2/2     Running   10 (2d19h ago)   3d      10.132.0.89    node2   <none>           <none>
      cluster-baremetal-operator-79c464cc4d-v7hdl           2/2     Running   4                3d      10.132.0.96    node2   <none>           <none>
      control-plane-machine-set-operator-5f94b56df6-lhlhx   1/1     Running   8 (2d19h ago)    3d      10.132.0.125   node2   <none>           <none>
      ironic-proxy-b6qbh                                    1/1     Running   0                2d21h   10.90.26.21    node1   <none>           <none>
      ironic-proxy-lfmq8                                    1/1     Running   0                2d19h   10.90.26.23    node3   <none>           <none>
      ironic-proxy-sdtfc                                    1/1     Running   2                3d1h    10.90.26.22    node2   <none>           <none>
      machine-api-controllers-7f65fc86-npn98                7/7     Running   43 (2d19h ago)   3d1h    10.132.0.8     node2   <none>           <none>
      machine-api-operator-768598bf68-gqs7d                 2/2     Running   6 (2d19h ago)    3d1h    10.132.0.49    node2   <none>           <none>
      metal3-55fc56d99b-tcbk6                               5/5     Running   10               3d1h    10.90.26.22    node2   <none>           <none>
      metal3-image-customization-dc549f75b-wwb9n            1/1     Running   2                3d1h    10.132.0.45    node2   <none>           <none>

      If a crash affects node2, these pods first have to be rescheduled onto the other nodes before they can fence node2.

      Couldn't we run all the resources necessary for the fencing process on more nodes simultaneously? This would speed up the fencing process a lot. If that is difficult to achieve due to the initial design, would it be possible to minimise the delay before the metal3 pods are rescheduled once the node hosting them becomes not-ready or unreachable?
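
      For reference, the tolerations currently applied can be inspected directly on the deployment (assuming the deployment is named metal3, as the pod listing above suggests):

      $ oc get deployment metal3 -n openshift-machine-api \
          -o jsonpath='{.spec.template.spec.tolerations}{"\n"}'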

       

      4. List any affected packages or components.

      openshift-machine-api 

            wcabanba@redhat.com William Caban
            rszmigie Rafal Szmigiel