Bug
Resolution: Unresolved
Major
4.14.z
Quality / Stability / Reliability
Description of problem:
The documentation for machines on Ephemeral OS disks should recommend using MachineHealthChecks for VMs with Ephemeral OS disks: https://docs.openshift.com/container-platform/4.14/machine_management/creating_machinesets/creating-machineset-azure.html#machineset-creating-azure-ephemeral-os_creating-machineset-azure
Version-Release number of selected component (if applicable):
OCP 4.14.23
How reproducible:
Occurs when running a large number of VMs with Ephemeral OS disks on Azure.
Steps to Reproduce:
1. Wait for the Azure infrastructure to fail.
2. See the error: "title": "Redeploying due to host failure".
3. See the machine in the "Running" state while the node is "NotReady" and its pending CSR is not approved.
Actual results:
The VM is redeployed, but the CSR is not approved. When OVN-Kubernetes is the CNI, approving the CSR manually does not make the node Ready, because the node's chassis-id has changed in the meantime.
Expected results:
The machine is rebuilt automatically, without user intervention.
Additional info:
I would like the documentation to be updated so that using a MachineHealthCheck is recommended for VMs that use Ephemeral OS disks.
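For illustration, here is a minimal sketch of the kind of MachineHealthCheck the documentation could recommend. It builds the manifest in Python and prints it as JSON. The resource name `ephemeral-os-mhc`, the machine set name `mycluster-worker-ephemeral`, the `300s` timeout, and the `40%` limit are all hypothetical values chosen for the example, not values taken from the affected cluster.

```python
import json


def build_machine_health_check(name, machineset_name,
                               namespace="openshift-machine-api",
                               not_ready_timeout="300s",
                               max_unhealthy="40%"):
    """Return a MachineHealthCheck manifest as a plain dict.

    The timeout and maxUnhealthy defaults are illustrative assumptions.
    """
    return {
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "MachineHealthCheck",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Target only the machines created by the given machine set.
            "selector": {
                "matchLabels": {
                    "machine.openshift.io/cluster-api-machineset": machineset_name
                }
            },
            # Treat a node as unhealthy when Ready is False or Unknown
            # for longer than the timeout.
            "unhealthyConditions": [
                {"type": "Ready", "status": "False", "timeout": not_ready_timeout},
                {"type": "Ready", "status": "Unknown", "timeout": not_ready_timeout},
            ],
            # Stop remediating if too many targeted machines are unhealthy.
            "maxUnhealthy": max_unhealthy,
        },
    }


# Hypothetical names; replace with the real machine set name.
manifest = build_machine_health_check("ephemeral-os-mhc",
                                      "mycluster-worker-ephemeral")
print(json.dumps(manifest, indent=2))
```

The JSON output can be saved to a file and applied with `oc apply -f <file>`. Whether these thresholds suit a given cluster would need to be checked against its size and failure patterns.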