OCPBUGS-42241

[CAPI Azure] Failed to provision machines when setting controlPlane instance type as Standard_M8-4ms


    • Quality / Stability / Reliability
    • Moderate
    • Installer Sprint 260, Installer Sprint 261, Installer Sprint 262, Installer (PB) Sprint 263, Installer (PB) Sprint 265, Installer Sprint 267, Installer Sprint 268
    • 7
    • Done
    • Bug Fix
      * Previously, when installing a cluster on {azure-full}, specifying the `Standard_M8-4ms` instance type resulted in an error due to that instance type specifying its memory in decimal format instead of integer format. With this update, the installation program correctly parses the memory value. (link:https://issues.redhat.com/browse/OCPBUGS-42241[OCPBUGS-42241])

      Description of problem:

      When creating a cluster with the instance type Standard_M8-4ms, the installer fails to provision machines.
      
      install-config:
      ================
      controlPlane:
        architecture: amd64
        hyperthreading: Enabled
        name: master
        platform:
          azure:
            type: Standard_M8-4ms
      
      Create cluster:
      =====================
      $ ./openshift-install create cluster --dir ipi3
      INFO Waiting up to 15m0s (until 2:31AM UTC) for machines [jimainstance01-h45wv-bootstrap jimainstance01-h45wv-master-0 jimainstance01-h45wv-master-1 jimainstance01-h45wv-master-2] to provision... 
      ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: control-plane machines were not provisioned within 15m0s: client rate limiter Wait returned an error: context deadline exceeded 
      INFO Shutting down local Cluster API controllers... 
      INFO Stopped controller: Cluster API              
      WARNING process cluster-api-provider-azure exited with error: signal: killed 
      INFO Stopped controller: azure infrastructure provider 
      INFO Stopped controller: azureaso infrastructure provider 
      INFO Shutting down local Cluster API control plane... 
      INFO Local Cluster API system has completed operation
      
      In openshift-install.log, creation of every machine failed with the following error:
      =================
      time="2024-09-20T02:17:07Z" level=debug msg="I0920 02:17:07.757980 1747698 recorder.go:104] \"failed to reconcile AzureMachine: failed to reconcile AzureMachine service virtualmachine: failed to get desired parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine): reconcile error that cannot be recovered occurred: failed to validate the memory capability: failed to parse string '218.75' as int64: strconv.ParseInt: parsing \\\"218.75\\\": invalid syntax. Object will not be requeued\" logger=\"events\" type=\"Warning\" object={\"kind\":\"AzureMachine\",\"namespace\":\"openshift-cluster-api-guests\",\"name\":\"jimainstance01-h45wv-bootstrap\",\"uid\":\"d67a2010-f489-44b4-9be9-88d7b136a45b\",\"apiVersion\":\"infrastructure.cluster.x-k8s.io/v1beta1\",\"resourceVersion\":\"1530\"} reason=\"ReconcileError\""
      ...
      time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-bootstrap has provisioned..."
      time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-bootstrap has not yet provisioned: Failed"
      time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-0 has provisioned..."
      time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-0 has not yet provisioned: Failed"
      time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-1 has provisioned..."
      time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-1 has not yet provisioned: Failed"
      time="2024-09-20T02:17:12Z" level=debug msg="Checking that machine jimainstance01-h45wv-master-2 has provisioned..."
      time="2024-09-20T02:17:12Z" level=debug msg="Machine jimainstance01-h45wv-master-2 has not yet provisioned: Failed"
      ... 
      
      The same error also appears in .clusterapi_output/Machine-openshift-cluster-api-guests-jimainstance01-h45wv-bootstrap.yaml
      ===================
      $ yq-go r Machine-openshift-cluster-api-guests-jimainstance01-h45wv-bootstrap.yaml 'status'
      noderef: null
      nodeinfo: null
      lastupdated: "2024-09-20T02:17:07Z"
      failurereason: CreateError
      failuremessage: 'Failure detected from referenced resource infrastructure.cluster.x-k8s.io/v1beta1,
        Kind=AzureMachine with name "jimainstance01-h45wv-bootstrap": failed to reconcile
        AzureMachine service virtualmachine: failed to get desired parameters for resource
        jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap (service: virtualmachine):
        reconcile error that cannot be recovered occurred: failed to validate the memory
        capability: failed to parse string ''218.75'' as int64: strconv.ParseInt: parsing
        "218.75": invalid syntax. Object will not be requeued'
      addresses: []
      phase: Failed
      certificatesexpirydate: null
      bootstrapready: false
      infrastructureready: false
      observedgeneration: 1
      conditions:
      - type: Ready
        status: "False"
        severity: Error
        lasttransitiontime: "2024-09-20T02:17:07Z"
        reason: Failed
        message: 0 of 2 completed
      - type: InfrastructureReady
        status: "False"
        severity: Error
        lasttransitiontime: "2024-09-20T02:17:07Z"
        reason: Failed
        message: 'virtualmachine failed to create or update. err: failed to get desired
          parameters for resource jimainstance01-h45wv-rg/jimainstance01-h45wv-bootstrap
          (service: virtualmachine): reconcile error that cannot be recovered occurred:
          failed to validate the memory capability: failed to parse string ''218.75'' as
          int64: strconv.ParseInt: parsing "218.75": invalid syntax. Object will not be
          requeued'
      - type: NodeHealthy
        status: "False"
        severity: Info
        lasttransitiontime: "2024-09-20T02:16:27Z"
        reason: WaitingForNodeRef
        message: ""
      
      
      From the error above, it appears that the memory size of instance type Standard_M8-4ms cannot be parsed because it is a decimal, not an integer.
      
      $ az vm list-skus --size Standard_M8-4ms  --location southcentralus | jq -r '.[].capabilities[] | select(.name=="MemoryGB")'
      {
        "name": "MemoryGB",
        "value": "218.75"
      }

      Version-Release number of selected component (if applicable):

      4.17.0-0.nightly-2024-09-16-082730

       

      How reproducible:

       Always

      Steps to Reproduce:

          1. Set the controlPlane type to Standard_M8-4ms in install-config.
          2. Create the cluster.

      Actual results:

          Installation failed

      Expected results:

          Installation succeeded


              sdasu@redhat.com Sandhya Dasu
              jinyunma Jinyun Ma
              Jinyun Ma Jinyun Ma