-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.16.0
-
Important
-
No
-
Sprint 255, Installer Sprint 256
-
2
-
Rejected
-
False
-
-
N/A
-
Release Note Not Required
-
Done
Description of problem:
For a cluster with one worker machine of an A3 instance type, "destroy cluster" keeps reporting the failure below until I stop the instance manually via "gcloud".
WARNING failed to stop instance jiwei-0530b-q9t8w-worker-c-ck6s8 in zone us-central1-c: googleapi: Error 400: VM has a Local SSD attached but an undefined value for `discard-local-ssd`. If using gcloud, please add `--discard-local-ssd=false` or `--discard-local-ssd=true` to your command., badRequest
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-multi-2024-05-29-143245
How reproducible:
Always
Steps to Reproduce:
1. "create install-config" and then "create manifests"
2. edit a worker machineset YAML to specify "machineType: a3-highgpu-8g" along with "onHostMaintenance: Terminate"
3. "create cluster", and make sure it succeeds
4. "destroy cluster"
Actual results:
Uninstalling the cluster keeps reporting the error about failing to stop the instance.
Expected results:
"destroy cluster" should proceed without any warning or error, and eventually delete everything.
Additional info:
FYI the .openshift-install.log is available at https://drive.google.com/file/d/15xIwzi0swDk84wqg32tC_4KfUahCalrL/view?usp=drive_link
FYI stopping the A3 instance via "gcloud" with "--discard-local-ssd=false" does succeed.

$ gcloud compute instances list --format="table(creationTimestamp.date('%Y-%m-%d %H:%M:%S'):sort=1,zone,status,name,machineType,tags.items)" --filter="name~jiwei" 2>/dev/null
CREATION_TIMESTAMP   ZONE           STATUS      NAME                              MACHINE_TYPE   ITEMS
2024-05-29 20:55:52  us-central1-a  TERMINATED  jiwei-0530b-q9t8w-master-0        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 20:55:52  us-central1-b  TERMINATED  jiwei-0530b-q9t8w-master-1        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 20:55:52  us-central1-c  TERMINATED  jiwei-0530b-q9t8w-master-2        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 21:10:08  us-central1-a  TERMINATED  jiwei-0530b-q9t8w-worker-a-rkxkk  n2-standard-4  ['jiwei-0530b-q9t8w-worker']
2024-05-29 21:10:19  us-central1-b  TERMINATED  jiwei-0530b-q9t8w-worker-b-qg6jv  n2-standard-4  ['jiwei-0530b-q9t8w-worker']
2024-05-29 21:10:31  us-central1-c  RUNNING     jiwei-0530b-q9t8w-worker-c-ck6s8  a3-highgpu-8g  ['jiwei-0530b-q9t8w-worker']

$ gcloud compute instances stop jiwei-0530b-q9t8w-worker-c-ck6s8 --zone us-central1-c
ERROR: (gcloud.compute.instances.stop) HTTPError 400: VM has a Local SSD attached but an undefined value for `discard-local-ssd`. If using gcloud, please add `--discard-local-ssd=false` or `--discard-local-ssd=true` to your command.

$ gcloud compute instances stop jiwei-0530b-q9t8w-worker-c-ck6s8 --zone us-central1-c --discard-local-ssd=false
Stopping instance(s) jiwei-0530b-q9t8w-worker-c-ck6s8...done.
Updated [https://compute.googleapis.com/compute/v1/projects/openshift-qe/zones/us-central1-c/instances/jiwei-0530b-q9t8w-worker-c-ck6s8].

$ gcloud compute instances list --format="table(creationTimestamp.date('%Y-%m-%d %H:%M:%S'):sort=1,zone,status,name,machineType,tags.items)" --filter="name~jiwei" 2>/dev/null
CREATION_TIMESTAMP   ZONE           STATUS      NAME                              MACHINE_TYPE   ITEMS
2024-05-29 20:55:52  us-central1-a  TERMINATED  jiwei-0530b-q9t8w-master-0        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 20:55:52  us-central1-b  TERMINATED  jiwei-0530b-q9t8w-master-1        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 20:55:52  us-central1-c  TERMINATED  jiwei-0530b-q9t8w-master-2        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 21:10:08  us-central1-a  TERMINATED  jiwei-0530b-q9t8w-worker-a-rkxkk  n2-standard-4  ['jiwei-0530b-q9t8w-worker']
2024-05-29 21:10:19  us-central1-b  TERMINATED  jiwei-0530b-q9t8w-worker-b-qg6jv  n2-standard-4  ['jiwei-0530b-q9t8w-worker']
2024-05-29 21:10:31  us-central1-c  TERMINATED  jiwei-0530b-q9t8w-worker-c-ck6s8  a3-highgpu-8g  ['jiwei-0530b-q9t8w-worker']

$ gcloud compute instances delete -q jiwei-0530b-q9t8w-worker-c-ck6s8 --zone us-central1-c
Deleted [https://www.googleapis.com/compute/v1/projects/openshift-qe/zones/us-central1-c/instances/jiwei-0530b-q9t8w-worker-c-ck6s8].
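
The manual workaround above could be scripted roughly as follows. This is only a sketch of the workaround, not the installer's actual fix: the function name stop_instance and its third argument are hypothetical, and it assumes "gcloud" is on the PATH and already authenticated. It tries a plain stop first and retries with an explicit "--discard-local-ssd" value only when the error output mentions that flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-install): stop a GCE instance,
# retrying with an explicit --discard-local-ssd value when the plain stop
# request is rejected for a VM with Local SSD attached.
stop_instance() {
  local name="$1" zone="$2" discard="${3:-false}"
  local out

  # First attempt: plain stop, which is enough for VMs without Local SSD.
  if out="$(gcloud compute instances stop "$name" --zone "$zone" 2>&1)"; then
    printf '%s\n' "$out"
    return 0
  fi

  # Retry only for the specific Local SSD error seen in this report.
  case "$out" in
    *discard-local-ssd*)
      gcloud compute instances stop "$name" --zone "$zone" \
        "--discard-local-ssd=${discard}"
      return $?
      ;;
  esac

  # Any other failure is passed through unchanged.
  printf '%s\n' "$out" >&2
  return 1
}
```

For example, "stop_instance jiwei-0530b-q9t8w-worker-c-ck6s8 us-central1-c false" would mirror the successful command shown above. During a cluster teardown the Local SSD contents are presumably not needed, so passing "true" may be the more natural value; the report only verified "false".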
- blocks
-
OCPBUGS-36965 [GCP NVIDIA H100] "destroy cluster" will hang at "VM has a Local SSD attached but an undefined value for 'discard-local-ssd'" when trying to stop the A3 instance
- Closed
- is cloned by
-
OCPBUGS-36965 [GCP NVIDIA H100] "destroy cluster" will hang at "VM has a Local SSD attached but an undefined value for 'discard-local-ssd'" when trying to stop the A3 instance
- Closed
- relates to
-
CORS-3287 List GCP's NVIDIA H100 instances as tested instance type
- Closed
- links to
-
RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update