Description of problem: VMs are receiving `410 Gone` errors and failing to provision.
According to Microsoft's documentation, and their direct recommendation during our outage bridge call, the request must be retried after 70s to succeed.
Version-Release number of selected component (if applicable): 4.12
Steps to Reproduce:
1. Attempt to provision an ARO cluster in any of "eastus", "australiaeast", "japaneast", "uswest"
2. Monitor node provisioning for 410 Gone errors
3. Node(s) should fail to provision
Actual results:
In jmilhau-test3, master-0 failed to download its Ignition config after receiving a 410 on the second attempt (extracted from serial logs. Full serial logs here):

Feb 08 12:36:15 ignition[1013]: GET error: Get "http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-01-01&format=text": dial tcp 169.254.169.254:80: connect: network is unreachable
Feb 08 12:36:15 ignition[1013]: GET http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-01-01&format=text: attempt #2
Feb 08 12:36:15 ignition[1013]: GET result: Gone

master-1 is able to GET the same resource on the 3rd attempt (Full serial logs):

[ 6.644027] ignition[979]: GET http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-01-01&format=text: attempt #3
[ 6.729304] ignition[979]: GET result: OK

MSFT pointed us to their docs, which specify that after receiving a 410 the request can be retried after 70s: Azure Instance Metadata Service for virtual machines - Azure Virtual Machines | Microsoft Learn

They insisted that, even though a 410 under standard HTTP semantics means the request should not be retried, we should/must retry for this specific use case.

Ignition, however, stops retrying after receiving a "410: Gone" response, in line with the HTTP specs (it does retry on other errors).
Expected results:
Node OSs to provision successfully.
Additional info:
- is cloned by: RHEL-24950 OS Provisioning Timeout Getting Azure Instance Metadata (Closed)
- relates to: OCPBUGS-29441 [4.16] Bootimage bump tracker (Closed)
- relates to: OCPBUGS-29442 [4.15] Bootimage bump tracker (Closed)
- relates to: OCPBUGS-29626 [4.14] Bootimage bump tracker (Closed)
- relates to: OCPBUGS-29627 [4.13] Bootimage bump tracker (Closed)
- relates to: OCPBUGS-30768 [4.12] Bootimage bump tracker (Closed)