Bug
Resolution: Unresolved
Critical
4.14.0
Quality / Stability / Reliability
Rejected
Description of problem:
Based on the analysis in https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2023-05-16%2023%3A59%3A59&baseRelease=4.13&baseStartTime=2023-04-18%2000%3A00%3A00&capability=operator-conditions&component=Cloud%20Compute%20%2F%20Other%20Provider&confidence=95&environment=ovn%20no-upgrade%20amd64%20aws%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Calibaba%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=aws&sampleEndTime=2023-09-12%2023%3A59%3A59&sampleRelease=4.14&sampleStartTime=2023-09-05%2000%3A00%3A00&testId=Operator%20results%3Aff3f4ce2ada4b853ece12306b1ef3eaf&testName=operator%20conditions%20machine-api&upgrade=no-upgrade&variant=standard (see the job list towards the bottom of the page), a number of jobs that passed on 4.13 are now permafailing on 4.14, which is why this is considered a blocker/regression. In most of the failures reviewed, the worker machines do not successfully join the cluster: some clusters had one worker join, others had none. The Machine API appears to be working correctly and is creating the EC2 instances. The MCS (Machine Config Server) logs show that the nodes requested and were served their ignition data. However, the next expected clue, the CSRs for client credential bootstrapping, is missing. The boot process therefore appears to fail somewhere between fetching the ignition data and requesting client credentials.
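The triage described above (Machines exist, but no Node joins and no client bootstrap CSR appears) could be approximated with a small script. This is only a sketch under stated assumptions, not the procedure actually used for this report: the helper names (`oc_json`, `machines_without_nodes`, `client_bootstrap_csrs`, `summarize`) are hypothetical, and it assumes the `oc` CLI is logged in to the affected cluster. It relies on the Machine `status.nodeRef` field and the CSR `spec.signerName` field, and it does not decode the CSR request, so it cannot tie a given CSR to a specific machine.

```python
import json
import subprocess

# Signer used for kubelet client certificates requested during node bootstrap.
CLIENT_SIGNER = "kubernetes.io/kube-apiserver-client-kubelet"


def oc_json(*args):
    """Hypothetical helper: run `oc get ... -o json` and parse the output."""
    out = subprocess.run(
        ["oc", "get", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def machines_without_nodes(machine_list):
    """Names of Machines with no status.nodeRef, i.e. no Node has joined for them."""
    return [
        m["metadata"]["name"]
        for m in machine_list.get("items", [])
        if not (m.get("status") or {}).get("nodeRef")
    ]


def client_bootstrap_csrs(csr_list):
    """Names of CSRs that request a kubelet client certificate."""
    return [
        c["metadata"]["name"]
        for c in csr_list.get("items", [])
        if (c.get("spec") or {}).get("signerName") == CLIENT_SIGNER
    ]


def summarize(machine_list, csr_list):
    """Collect both views so they can be compared by hand."""
    return {
        "machines_without_node": machines_without_nodes(machine_list),
        "client_bootstrap_csrs": client_bootstrap_csrs(csr_list),
    }
```

Against a live cluster this might be called as `summarize(oc_json("machines.machine.openshift.io", "-n", "openshift-machine-api"), oc_json("csr"))`. Machines listed without a node, combined with fewer client CSRs than expected, would be consistent with the failure window described above; the MCS logs still need to be checked separately to confirm that ignition was served.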
Version-Release number of selected component (if applicable):
How reproducible:
Currently showing as permanent on the disruptive installer jobs
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This was brought to my attention in https://redhat-internal.slack.com/archives/C01CQA76KMX/p1694547905992139