-
Bug
-
Resolution: Done-Errata
-
Major
-
4.13, 4.12.0
-
None
-
Rejected
-
False
-
-
Release Note Not Required
-
In Progress
Description of problem:
A Nutanix machine that requests more memory than any host has available is stuck in Provisioning, and machineset scale and delete operations do not work.
Version-Release number of selected component (if applicable):
Server Version: 4.12.0 4.13.0-0.nightly-2023-01-17-152326
How reproducible:
Always
Steps to Reproduce:
1. Install a Nutanix cluster. Template: https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/tree/master/functionality-testing/aos-4_12/ipi-on-nutanix//versioned-installer
   master_num_memory: 32768
   worker_num_memory: 16384
   networkType: "OVNKubernetes"
   installer_payload_image: quay.io/openshift-release-dev/ocp-release:4.12.0-x86_64
2. Scale up the cluster worker machineset from 2 replicas to 40 replicas.
3. Create an infra machineset with 3 replicas and a workload machineset with 1 replica. Refer to https://docs.openshift.com/container-platform/4.11/machine_management/creating-infrastructure-machinesets.html#machineset-yaml-nutanix_creating-infrastructure-machinesets and configure the following resources:
   VCPU=16
   MEMORYMB=65536
   MEMORYSIZE=64Gi
Actual results:
1. The new infra machines have been stuck in 'Provisioning' status for about 3 hours.
   % oc get machines -A | grep Prov
   openshift-machine-api   qili-nut-big-jh468-infra-48mdt   Provisioning   175m
   openshift-machine-api   qili-nut-big-jh468-infra-jnznv   Provisioning   175m
   openshift-machine-api   qili-nut-big-jh468-infra-xp7xb   Provisioning   175m
2. In the Nutanix web console, infra machine 'qili-nut-big-jh468-infra-jnznv' had the following message:
   "No host has enough available memory for VM qili-nut-big-jh468-infra-48mdt (8d7eb6d6-a71e-4943-943a-397596f30db2) that uses 4 vCPUs and 65536MB of memory. You could try downsizing the VM, increasing host memory, power off some VMs, or moving the VM to a different host. Maximum allowable VM size is approximately 17921 MB"
   infra machine 'qili-nut-big-jh468-infra-jnznv' is not round
   infra machine 'qili-nut-big-jh468-infra-xp7xb' is in green without warning. But in the must-gather I found this error:
   03:23:49   openshift-machine-api   nutanixcontroller   qili-nut-big-jh468-infra-xp7xb   FailedCreate   qili-nut-big-jh468-infra-xp7xb: reconciler failed to Create machine: failed to update machine with vm state: qili-nut-big-jh468-infra-xp7xb: failed to get node qili-nut-big-jh468-infra-xp7xb: Node "qili-nut-big-jh468-infra-xp7xb" not found
3. Scaling down the worker machineset from 40 replicas to 30 replicas does not work. There are still 40 Running worker machines and 40 Ready nodes after about 3 hours.
   % oc get machinesets -A
   NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
   openshift-machine-api   qili-nut-big-jh468-infra      3         3                             176m
   openshift-machine-api   qili-nut-big-jh468-worker     30        30        30      30          5h1m
   openshift-machine-api   qili-nut-big-jh468-workload   1         1                             176m
   % oc get machines -A | grep worker| grep Running -c
   40
   % oc get nodes | grep worker | grep Ready -c
   40
4. I deleted the infra machineset, but the machines are still in Provisioning status and are not deleted.
   % oc delete machineset -n openshift-machine-api qili-nut-big-jh468-infra
   machineset.machine.openshift.io "qili-nut-big-jh468-infra" deleted
   % oc get machinesets -A
   NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
   openshift-machine-api   qili-nut-big-jh468-worker     30        30        30      30          5h26m
   openshift-machine-api   qili-nut-big-jh468-workload   1         1                             3h21m
   % oc get machines -A | grep -v Running
   NAMESPACE               NAME                                PHASE          TYPE   REGION   ZONE   AGE
   openshift-machine-api   qili-nut-big-jh468-infra-48mdt      Provisioning                          3h22m
   openshift-machine-api   qili-nut-big-jh468-infra-jnznv      Provisioning                          3h22m
   openshift-machine-api   qili-nut-big-jh468-infra-xp7xb      Provisioning                          3h22m
   openshift-machine-api   qili-nut-big-jh468-workload-qdkvd                                         3h22m
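For triage, the check done by eye in the output above can be sketched in a few lines of Python. This is only an illustration: the helper names `age_to_minutes` and `stuck_machines` and the 60-minute threshold are assumptions, not part of any OpenShift tooling, and the sample text is the `oc get machines -A | grep -v Running` output from this report.

```python
import re

# Sample output copied from this report (oc get machines -A | grep -v Running).
SAMPLE = """\
NAMESPACE NAME PHASE TYPE REGION ZONE AGE
openshift-machine-api qili-nut-big-jh468-infra-48mdt Provisioning 3h22m
openshift-machine-api qili-nut-big-jh468-infra-jnznv Provisioning 3h22m
openshift-machine-api qili-nut-big-jh468-infra-xp7xb Provisioning 3h22m
openshift-machine-api qili-nut-big-jh468-workload-qdkvd 3h22m
"""

# Minutes per unit for ages such as "175m", "3h22m" or "2d3h".
_UNIT_MINUTES = {"d": 24 * 60, "h": 60, "m": 1, "s": 1 / 60}


def age_to_minutes(age):
    """Convert an AGE column value such as '3h22m' to minutes."""
    parts = re.findall(r"(\d+)([dhms])", age)
    return sum(int(number) * _UNIT_MINUTES[unit] for number, unit in parts)


def stuck_machines(text, phase="Provisioning", min_minutes=60):
    """Return names of machines in `phase` for at least `min_minutes` (hypothetical threshold)."""
    stuck = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the header, and rows with an empty PHASE column.
        if len(tokens) < 4 or tokens[0] == "NAMESPACE":
            continue
        name, machine_phase, age = tokens[1], tokens[2], tokens[-1]
        if machine_phase == phase and age_to_minutes(age) >= min_minutes:
            stuck.append(name)
    return stuck


print(stuck_machines(SAMPLE))
```

The workload machine is skipped because its PHASE column is empty, which matches how it appears in the report.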
Expected results:
The new infra machines should be either Running or Failed. Worker machineset scale-up and scale-down should not be impacted.
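One possible reading of this expectation, sketched in Python: when the platform reports that the requested memory exceeds the maximum allowable VM size, the machine would move to Failed instead of staying in Provisioning. The function `expected_phase` is hypothetical and is not the machine-api provider's actual logic; the message text is the one quoted from the Nutanix web console in this report.

```python
import re

# Message copied from the Nutanix web console output quoted in this report.
MESSAGE = (
    "No host has enough available memory for VM qili-nut-big-jh468-infra-48mdt "
    "(8d7eb6d6-a71e-4943-943a-397596f30db2) that uses 4 vCPUs and 65536MB of memory. "
    "You could try downsizing the VM, increasing host memory, power off some VMs, "
    "or moving the VM to a different host. "
    "Maximum allowable VM size is approximately 17921 MB"
)


def expected_phase(message):
    """Hypothetical rule: 'Failed' when the request exceeds the reported maximum VM size."""
    requested = re.search(r"(\d+)\s*MB of memory", message)
    maximum = re.search(r"Maximum allowable VM size is approximately (\d+)\s*MB", message)
    if not requested or not maximum:
        return None  # message does not carry both figures; no decision made
    if int(requested.group(1)) > int(maximum.group(1)):
        return "Failed"
    return "Provisioning"


print(expected_phase(MESSAGE))
```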
Additional info:
must-gather download url will be added to the comment.
- blocks
-
OCPBUGS-19731 [Nutanix]No host has enough available memory for VM, machine stuck in Provisioning and machineset scale/delete cannot delete machines
- Closed
- is cloned by
-
OCPBUGS-19731 [Nutanix]No host has enough available memory for VM, machine stuck in Provisioning and machineset scale/delete cannot delete machines
- Closed
- links to
-
RHEA-2023:7198 rpm