-
Bug
-
Resolution: Done-Errata
-
Critical
-
CentOS Stream 9
-
None
-
ignition-2.17.0-2.el9
-
None
-
None
-
rhel-sst-rhcos
-
24
-
24
-
None
-
QE ack, Dev ack
-
False
-
-
No
-
None
-
Bug Fix
-
-
Proposed
-
None
Description of problem: VMs are receiving `410 Gone` errors and failing to provision.
According to Microsoft's published guidance, and their direct recommendation during our outage bridge call, the request must be retried after 70s to succeed.
Version-Release number of selected component (if applicable): 4.12
Steps to Reproduce:
1. Attempt to provision an ARO cluster in any of "eastus", "australiaeast", "japaneast", or "uswest".
2. Monitor node provisioning for 410 Gone errors.
3. Node(s) should fail to provision.
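For step 2, a small script can pick the failing requests out of a node's serial log. This is only a sketch: it assumes the log has one entry per line in the `ignition[PID]: GET <url>: attempt #N` / `ignition[PID]: GET result: <status>` shape seen in the excerpts in this report, and the function name `find_gone_results` is hypothetical.

```python
import re

# Patterns modelled on the serial-log excerpts in this report (assumed format).
ATTEMPT_RE = re.compile(r"ignition\[\d+\]: GET (\S+): attempt #(\d+)")
RESULT_RE = re.compile(r"ignition\[\d+\]: GET result: (.+?)\s*$")


def find_gone_results(log_text):
    """Return (url, attempt) pairs whose GET result was 'Gone'."""
    gone = []
    last_attempt = None  # most recent attempt line still awaiting its result
    for line in log_text.splitlines():
        attempt = ATTEMPT_RE.search(line)
        if attempt:
            last_attempt = (attempt.group(1), int(attempt.group(2)))
            continue
        result = RESULT_RE.search(line)
        if result and last_attempt is not None:
            if result.group(1) == "Gone":
                gone.append(last_attempt)
            last_attempt = None  # this attempt now has its result
    return gone
```

Running it over each node's serial log gives the URL and attempt number of every request that ended in `Gone`.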
Actual results:
In jmilhau-test3, master-0 failed to download ignition after receiving a 410 on the second attempt (extracted from serial logs. Full serial logs here):
Feb 08 12:36:15 ignition[1013]: GET error: Get "http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-01-01&format=text": dial tcp 169.254.169.254:80: connect: network is unreachable
Feb 08 12:36:15 ignition[1013]: GET http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-01-01&format=text: attempt #2
Feb 08 12:36:15 ignition[1013]: GET result: Gone
master-1 is able to GET the same resource on the 3rd attempt (Full serial logs):
[ 6.644027] ignition[979]: GET http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-01-01&format=text: attempt #3
[ 6.729304] ignition[979]: GET result: OK
MSFT pointed us to their docs, which specify that after receiving a 410, the request can be retried after 70s: Azure Instance Metadata Service for virtual machines - Azure Virtual Machines | Microsoft Learn
They insisted that even though the standard HTTP specs say a 410 should not be retried, we should/must retry for this specific use case.
The Ignition service, however, stops retrying after receiving a "410: Gone" error, in line with the HTTP specs (it retries on other errors).
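The retry behaviour MSFT asked for can be sketched as follows. This is an illustration only, not Ignition's code or the fix that shipped. The names `http_get` and `fetch_with_gone_retry` and the `max_attempts` limit are assumptions; the 70s delay is the figure quoted in this report. The sketch treats every status other than 200 and 410 as fatal, which is simpler than Ignition's real handling of other errors.

```python
import time
import urllib.error
import urllib.request

GONE_RETRY_DELAY_S = 70  # delay quoted in this report for retrying after a 410


def http_get(url):
    """Fetch url and return (status, body); HTTP errors come back as a status."""
    # IMDS expects the "Metadata: true" request header.
    req = urllib.request.Request(url, headers={"Metadata": "true"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def fetch_with_gone_retry(url, fetch=http_get, sleep=time.sleep,
                          max_attempts=5, delay_s=GONE_RETRY_DELAY_S):
    """Return the body on 200; on 410 wait delay_s and retry (hypothetical policy)."""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        if status == 410 and attempt < max_attempts:
            sleep(delay_s)  # treat 410 as transient for this endpoint only
            continue
        raise RuntimeError(
            f"GET {url} failed with HTTP {status} on attempt {attempt}"
        )
```

The `fetch` and `sleep` parameters are there so the policy can be tested without a real metadata endpoint or a real 70s wait.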
Expected results:
Node OSs to provision successfully.
Additional info:
- clones: OCPBUGS-29252 OS Provisioning Timeout Getting Azure Instance Metadata (Closed)
- links to: RHBA-2024:126042 ignition update