- Bug
- Resolution: Done-Errata
- Normal
- 4.14
- None
- Low
- No
- CLOUD Sprint 234, CLOUD Sprint 235, CLOUD Sprint 236, CLOUD Sprint 237
- 4
- False
Description of problem:
When a machine is created with Azure Ultra Disks attached as data disks in an Arm cluster, the machine reaches the Provisioned phase, but the Azure web console shows that the instance failed with the error ZonalAllocationFailed.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-arm64-2023-03-22-204044
How reproducible:
Always
Steps to Reproduce:
/// Steps 1 to 5 are not needed to reproduce the issue ///
1. Make sure the StorageClass already exists:
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: ultra-disk-sc
    provisioner: disk.csi.azure.com  # replace with "kubernetes.io/azure-disk" if the AKS version is less than 1.21
    volumeBindingMode: WaitForFirstConsumer  # optional, but recommended if you want to wait until the pod that will use this disk is created
    parameters:
      skuname: UltraSSD_LRS
      kind: managed
      cachingMode: None
      diskIopsReadWrite: "2000"  # minimum value: 2 IOPS/GiB
      diskMbpsReadWrite: "320"   # minimum value: 0.032/GiB
2. Create a new custom secret based on the worker-user-data secret. First, extract its userData:
    $ oc -n openshift-machine-api get secret worker-user-data --template='{{index .data.userData | base64decode}}' | jq . > userData.txt
3. Edit userData.txt: add a comma after the last top-level entry, then insert the following just before the final '}':
    "storage": {
      "disks": [
        {
          "device": "/dev/disk/azure/scsi1/lun0",
          "partitions": [
            {
              "label": "lun0p1",
              "sizeMiB": 1024,
              "startMiB": 0
            }
          ]
        }
      ],
      "filesystems": [
        {
          "device": "/dev/disk/by-partlabel/lun0p1",
          "format": "xfs",
          "path": "/var/lib/lun0p1"
        }
      ]
    },
    "systemd": {
      "units": [
        {
          "contents": "[Unit]\nBefore=local-fs.target\n[Mount]\nWhere=/var/lib/lun0p1\nWhat=/dev/disk/by-partlabel/lun0p1\nOptions=defaults,pquota\n[Install]\nWantedBy=local-fs.target\n",
          "enabled": true,
          "name": "var-lib-lun0p1.mount"
        }
      ]
    }
4. Extract the disableTemplating value:
    $ oc -n openshift-machine-api get secret worker-user-data --template='{{index .data.disableTemplating | base64decode}}' | jq . > disableTemplating.txt
5. Combine the two files into a new user-data secret:
    $ oc -n openshift-machine-api create secret generic worker-user-data-x5 --from-file=userData=userData.txt --from-file=disableTemplating=disableTemplating.txt
/// End of the steps that are not needed ///
6. Modify the new MachineSet YAML to add the data disk below, as a field separate from osDisk:
    dataDisks:
    - nameSuffix: ultrassd
      lun: 0
      diskSizeGB: 4  # The same issue in the Machine status fields can be reproduced on x86_64 by setting 65535, which exceeds the maximum limits of the Azure accounts we use.
      cachingType: None
      deletionPolicy: Delete
      managedDisk:
        storageAccountType: UltraSSD_LRS
7. Scale up the MachineSet, or delete an existing machine to force reprovisioning.
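Step 3 can also be scripted instead of edited by hand. The following is a minimal sketch, assuming jq 1.5 or later (for `--argjson`) and assuming userData.txt from step 2 has no top-level "storage" or "systemd" keys of its own, because jq's `+` on objects replaces a colliding key rather than merging into it. The function name `add_lun0_mount` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an Ignition config (the decoded userData from
# step 2) on stdin and prints it with the lun0 partition, filesystem and
# mount unit from step 3 added as top-level "storage" and "systemd" keys.
# Assumes the input has no "storage"/"systemd" keys (they would be replaced).
add_lun0_mount() {
  local extra
  # Same content as the snippet in step 3; "\n" stays a JSON escape here.
  extra='{
    "storage": {
      "disks": [
        {
          "device": "/dev/disk/azure/scsi1/lun0",
          "partitions": [
            { "label": "lun0p1", "sizeMiB": 1024, "startMiB": 0 }
          ]
        }
      ],
      "filesystems": [
        {
          "device": "/dev/disk/by-partlabel/lun0p1",
          "format": "xfs",
          "path": "/var/lib/lun0p1"
        }
      ]
    },
    "systemd": {
      "units": [
        {
          "contents": "[Unit]\nBefore=local-fs.target\n[Mount]\nWhere=/var/lib/lun0p1\nWhat=/dev/disk/by-partlabel/lun0p1\nOptions=defaults,pquota\n[Install]\nWantedBy=local-fs.target\n",
          "enabled": true,
          "name": "var-lib-lun0p1.mount"
        }
      ]
    }
  }'
  # Shallow object merge: keys from $extra are added at the top level.
  jq --argjson extra "$extra" '. + $extra'
}

# Example usage:
#   add_lun0_mount < userData.txt > userData.merged.txt && mv userData.merged.txt userData.txt
```

Doing the merge with jq keeps the comma placement and brace nesting correct, which is the error-prone part of editing the file manually.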
Actual results:
The machine is stuck in the Provisioned phase, but Azure shows that the VM failed:
    $ oc get machine -o wide
    NAME                                        PHASE         TYPE               REGION      ZONE   AGE     NODE                         PROVIDERID                                                                                                                                                                              STATE
    zhsunaz3231-lds8h-master-0                  Running       Standard_D8ps_v5   centralus   1      4h15m   zhsunaz3231-lds8h-master-0   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-0                  Running
    zhsunaz3231-lds8h-master-1                  Running       Standard_D8ps_v5   centralus   2      4h15m   zhsunaz3231-lds8h-master-1   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-1                  Running
    zhsunaz3231-lds8h-master-2                  Running       Standard_D8ps_v5   centralus   3      4h15m   zhsunaz3231-lds8h-master-2   azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-2                  Running
    zhsunaz3231-lds8h-worker-centralus1-sfhs7   Provisioned   Standard_D4ps_v5   centralus   1      3m23s                                azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-worker-centralus1-sfhs7   Creating
    $ oc get machine zhsunaz3231-lds8h-worker-centralus1-sfhs7 -o yaml
    - lastTransitionTime: "2023-03-23T06:07:32Z"
      message: 'Failed to check if machine exists: vm for machine zhsunaz3231-lds8h-worker-centralus1-sfhs7 exists, but has unexpected ''Failed'' provisioning state'
      reason: ErrorCheckingProvider
      status: Unknown
      type: InstanceExists
    - lastTransitionTime: "2023-03-23T06:07:05Z"
      status: "True"
      type: Terminable
    lastUpdated: "2023-03-23T06:07:32Z"
    phase: Provisioned
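To find other machines in the same state, the InstanceExists condition quoted in the output above can be matched with jq. This is a minimal sketch, assuming the input is the JSON from `oc -n openshift-machine-api get machines -o json` and jq 1.5 or later (for `any/2`); the function name `machines_with_provider_error` is hypothetical, and it only checks the type/reason pair reported here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a MachineList as JSON on stdin and prints the name
# of every Machine whose InstanceExists condition has reason ErrorCheckingProvider,
# which is how the stuck worker machine above reports the failed VM.
machines_with_provider_error() {
  # .status.conditions[]? yields nothing (instead of an error) when a Machine
  # has no status or no conditions yet.
  jq -r '
    .items[]
    | select(any(.status.conditions[]?;
                 .type == "InstanceExists" and .reason == "ErrorCheckingProvider"))
    | .metadata.name'
}

# Example usage:
#   oc -n openshift-machine-api get machines -o json | machines_with_provider_error
```

Each reported VM can then be cross-checked on the Azure side with `az vm show -g <resource-group> -n <vm-name> --query provisioningState -o tsv`.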
Expected results:
The machine should move to the Failed phase when its VM has failed in Azure.
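The expected result can be expressed as a simple consistency check between the two sides. This is a minimal sketch, assuming the Azure state comes from `az vm show … --query provisioningState -o tsv` and the Machine phase from `oc get machine … -o jsonpath='{.status.phase}'`; the function name `phase_matches_azure_state` is hypothetical and the check covers only the Failed case described in this report.

```shell
#!/usr/bin/env bash
# Hypothetical consistency check for the expected result:
# a VM whose Azure provisioningState is "Failed" should be paired
# with a Machine in the "Failed" phase.
# Returns 0 if the pair matches that expectation, 1 otherwise.
phase_matches_azure_state() {
  local azure_state=$1 machine_phase=$2
  if [ "$azure_state" = "Failed" ] && [ "$machine_phase" != "Failed" ]; then
    return 1
  fi
  return 0
}

# Example usage against a live cluster (resource group and VM name from this report):
#   state=$(az vm show -g zhsunaz3231-lds8h-rg -n zhsunaz3231-lds8h-worker-centralus1-sfhs7 \
#             --query provisioningState -o tsv)
#   phase=$(oc -n openshift-machine-api get machine zhsunaz3231-lds8h-worker-centralus1-sfhs7 \
#             -o jsonpath='{.status.phase}')
#   phase_matches_azure_state "$state" "$phase" || echo "mismatch: Azure=$state, Machine=$phase"
```

With the state captured in the actual results (Azure VM failed, Machine in Provisioned), this check reports a mismatch.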
Additional info:
must-gather: https://drive.google.com/file/d/1z1gyJg4NBT8JK2-aGvQCruJidDHs0DV6/view?usp=sharing