-
Story
-
Resolution: Done
In OSD, we want a mode where Hive attempts only one installation unless it detects a recoverable error worth retrying, such as AWS throttling. Hive mindlessly retrying 3 times mostly just wastes time and money.
Stories:
As a user of OpenShift Dedicated / ROSA, I want to be notified as early as possible when I have a quota problem blocking install so that I can fix it faster and get my cluster faster (no need to install 3 times and fail all 3 times the same way when for example there aren't enough free S3 buckets).
As a user of OpenShift Dedicated / ROSA, I want my install to seamlessly retry (silently, without me knowing about it) when there is a transient recoverable failure, like AWS throttling, where an immediate retry is likely to succeed.
Slack discussion: https://coreos.slack.com/archives/CE3ETN3J8/p1633107844016800?thread_ts=1633105109.010900&cid=CE3ETN3J8
Proposed API for such a feature:
HiveConfig
  ProvisionMode: RetryOnRecoverableErrorOnly | RetryAlways
  MaxInstallAttempts: 3
- and/or -
ClusterDeployment
  ProvisionMode: RetryOnRecoverableErrorOnly | RetryAlways
  MaxInstallAttempts: 3
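The retry decision implied by the proposed fields could look roughly like the sketch below. Only the names ProvisionMode, RetryOnRecoverableErrorOnly, RetryAlways and MaxInstallAttempts come from the proposal; the ProvisionPolicy struct, the shouldRetry helper and the recoverable flag are hypothetical, and this is not Hive's actual implementation. How Hive would classify a failure as recoverable (for example, AWS throttling) is left out and assumed to be decided elsewhere.

```go
package main

import "fmt"

// ProvisionMode mirrors the field name in the proposal.
type ProvisionMode string

const (
	RetryOnRecoverableErrorOnly ProvisionMode = "RetryOnRecoverableErrorOnly"
	RetryAlways                 ProvisionMode = "RetryAlways"
)

// ProvisionPolicy is a hypothetical container for the two proposed fields.
type ProvisionPolicy struct {
	ProvisionMode      ProvisionMode
	MaxInstallAttempts int
}

// shouldRetry is a hypothetical helper. attemptsSoFar counts install attempts
// already made; recoverable is assumed to be set by some failure classifier.
func shouldRetry(p ProvisionPolicy, attemptsSoFar int, recoverable bool) bool {
	// Never exceed the configured attempt limit, whatever the mode.
	if attemptsSoFar >= p.MaxInstallAttempts {
		return false
	}
	// In the restrictive mode, retry only after a recoverable failure.
	if p.ProvisionMode == RetryOnRecoverableErrorOnly {
		return recoverable
	}
	// RetryAlways: retry after any failure while attempts remain.
	return true
}

func main() {
	policy := ProvisionPolicy{
		ProvisionMode:      RetryOnRecoverableErrorOnly,
		MaxInstallAttempts: 3,
	}
	fmt.Println(shouldRetry(policy, 1, false)) // non-recoverable (e.g. quota): false
	fmt.Println(shouldRetry(policy, 1, true))  // recoverable (e.g. throttling): true
	fmt.Println(shouldRetry(policy, 3, true))  // attempts exhausted: false
}
```

With this shape, a quota failure surfaces after the first attempt, while a throttling failure is retried silently up to MaxInstallAttempts. If both HiveConfig and ClusterDeployment carried the fields, the precedence between them would still need to be defined; the ticket does not specify it.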