-
Bug
-
Resolution: Unresolved
-
Undefined
-
None
-
4.17, 4.18
-
Moderate
-
None
-
Installer Sprint 260, Installer Sprint 261, Installer Sprint 262
-
3
-
False
-
-
-
Known Issue
-
Done
Description of problem:
Creating a cluster with control-plane instance type Standard_M8-4ms fails: the installer cannot provision the machines.

install-config:
================
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure:
      type: Standard_M8-4ms

Create cluster:
=====================
$ ./openshift-install create cluster --dir ipi3
INFO Waiting up to 15m0s (until 2:31AM UTC) for machines [jimainstance01-h45wv-bootstrap jimainstance01-h45wv-master-0 jimainstance01-h45wv-master-1 jimainstance01-h45wv-master-2] to provision...
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: control-plane machines were not provisioned within 15m0s: client rate limiter Wait returned an error: context deadline exceeded
INFO Shutting down local Cluster API controllers...
INFO Stopped controller: Cluster API
WARNING process cluster-api-provider-azure exited with error: signal: killed
INFO Stopped controller: azure infrastructure provider
INFO Stopped controller: azureaso infrastructure provider
INFO Shutting down local Cluster API control plane...
INFO Local Cluster API system has completed operation

In openshift-install.log, creation of every machine failed with the error below:
=================
time="2024-09-20T02:17:07Z" level=debug msg="I0920 02:17:07.757980 1747698 recorder.go:104] \"failed to reconcile AzureMachine: failed to reconcile AzureMachine service virtualmachine: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string '218.75' as int64: strconv.ParseInt: parsing \\\"218.75\\\": invalid syntax. Object will not be requeued\" logger=\"events\" type=\"Warning\" object={\"kind\":\"AzureMachine\",\"namespace\":\"openshift-cluster-api-guests\",\"name\":\"jimainstance01-h45wv-bootstrap\",\"uid\":\"d67a2010-f489-44b4-9be9-88d7b136a45b\",\"apiVersion\":\"infrastructure.cluster.x-k8s.io/v1beta1\",\"resourceVersion\":\"1530\"} reason=\"ReconcileError\""
...
time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-bootstrap has provisioned..."
time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-bootstrap has not yet provisioned: Failed"
time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-0 has provisioned..."
time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-0 has not yet provisioned: Failed"
time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-1 has provisioned..."
time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-1 has not yet provisioned: Failed"
time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-2 has provisioned..."
time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-2 has not yet provisioned: Failed"
...

The same error also appears in .clusterapi_output/Machine-openshift-cluster-api-guests-jimainstance01-h45wv-bootstrap.yaml
===================
$ yq-go r Machine-openshift-cluster-api-guests-jimainstance01-h45wv-bootstrap.yaml 'status'
noderef: null
nodeinfo: null
lastupdated: "2024-09-20T02:17:07Z"
failurereason: CreateError
failuremessage: 'Failure detected from referenced resource infrastructure.cluster.x-k8s.io/v1beta1, Kind=AzureMachine with name "jimainstance01-h45wv-bootstrap": failed to reconcile AzureMachine service virtualmachine: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string ''218.75'' as int64: strconv.ParseInt: parsing "218.75": invalid syntax. Object will not be requeued'
addresses: []
phase: Failed
certificatesexpirydate: null
bootstrapready: false
infrastructureready: false
observedgeneration: 1
conditions:
- type: Ready
  status: "False"
  severity: Error
  lasttransitiontime: "2024-09-20T02:17:07Z"
  reason: Failed
  message: 0 of 2 completed
- type: InfrastructureReady
  status: "False"
  severity: Error
  lasttransitiontime: "2024-09-20T02:17:07Z"
  reason: Failed
  message: 'virtualmachine failed to create or update. err: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string ''218.75'' as int64: strconv.ParseInt: parsing "218.75": invalid syntax. Object will not be requeued'
- type: NodeHealthy
  status: "False"
  severity: Info
  lasttransitiontime: "2024-09-20T02:16:27Z"
  reason: WaitingForNodeRef
  message: ""

Based on the error above, the memory capability validation appears unable to parse the memory size of instance type Standard_M8-4ms, which Azure reports as a decimal value rather than an integer:
$ az vm list-skus --size Standard_M8-4ms --location southcentralus | jq -r '.[].capabilities[] | select(.name=="MemoryGB")'
{
  "name": "MemoryGB",
  "value": "218.75"
}
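
The parse failure can be reproduced outside the installer. The sketch below is illustrative only and is not the provider's actual code or fix: `parseMemoryGB` is a hypothetical helper name, and parsing the value as a float is just one assumed way to handle fractional MemoryGB values. It shows that `strconv.ParseInt` rejects the string "218.75" with the same message seen in the logs, while `strconv.ParseFloat` accepts it.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseMemoryGB is a hypothetical helper (not taken from the provider's source).
// It parses a MemoryGB capability value as a float so that fractional sizes
// such as "218.75" are accepted.
func parseMemoryGB(value string) (float64, error) {
	mem, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return 0, fmt.Errorf("failed to parse string '%s' as float64: %w", value, err)
	}
	return mem, nil
}

func main() {
	// Value reported by `az vm list-skus` for Standard_M8-4ms.
	const memoryGB = "218.75"

	// Integer parsing fails on a decimal string, as in the reported error.
	if _, err := strconv.ParseInt(memoryGB, 10, 64); err != nil {
		fmt.Println("ParseInt failed:", err)
	}

	// Float parsing accepts the same string.
	mem, err := parseMemoryGB(memoryGB)
	if err != nil {
		fmt.Println("ParseFloat failed:", err)
		return
	}
	fmt.Printf("ParseFloat succeeded: %.2f GB\n", mem)
}
```

Whether a float comparison is the right approach for the real memory validation depends on how the provider uses the value afterwards; that is an open question for the fix, not something this sketch settles.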
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-09-16-082730
How reproducible:
Always
Steps to Reproduce:
1. Set the controlPlane type to Standard_M8-4ms in install-config.
2. Create the cluster.
Actual results:
Installation failed
Expected results:
Installation succeeded
Additional info: