  1. OpenShift Hive
  2. HIVE-1662

Add RetryOnRecoverableErrorOnly mode


    • Type: Story
    • Resolution: Done

      In OSD, we want a mode where Hive attempts only one installation unless it detects a recoverable error worth retrying, such as AWS throttling. Hive mindlessly retrying 3 times mostly just wastes time and money.

      Stories:

      As a user of OpenShift Dedicated / ROSA, I want to be notified as early as possible when a quota problem is blocking my install, so that I can fix it and get my cluster sooner. There is no need to install 3 times and fail the same way all 3 times when, for example, there aren't enough free S3 buckets.

      As a user of OpenShift Dedicated / ROSA, I want my install to seamlessly retry (silently, without me knowing about it) when there is a transient recoverable failure, like AWS throttling, where an immediate retry is likely to succeed.

      Slack discussion: https://coreos.slack.com/archives/CE3ETN3J8/p1633107844016800?thread_ts=1633105109.010900&cid=CE3ETN3J8

      Proposed API for such a feature:

      HiveConfig
      
      ProvisionMode: RetryOnRecoverableErrorOnly | RetryAlways
      MaxInstallAttempts: 3 
      
      - and/or -
      
      ClusterDeployment
      
      ProvisionMode: RetryOnRecoverableErrorOnly | RetryAlways
      MaxInstallAttempts: 3
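      A minimal Go sketch of how the two proposed fields might drive the retry decision. Only the names ProvisionMode, RetryOnRecoverableErrorOnly, RetryAlways and MaxInstallAttempts come from the proposal above; the RetryPolicy struct, the ShouldRetry function, and the assumption that an unset mode behaves like RetryAlways are hypothetical and are not Hive's actual API. How a failure gets classified as recoverable is out of scope here and is passed in as a plain boolean.

```go
package main

import "fmt"

// ProvisionMode mirrors the two values named in the proposal.
type ProvisionMode string

const (
	RetryAlways                 ProvisionMode = "RetryAlways"
	RetryOnRecoverableErrorOnly ProvisionMode = "RetryOnRecoverableErrorOnly"
)

// RetryPolicy groups the two proposed fields.
// The struct name is hypothetical; it is not part of the proposal.
type RetryPolicy struct {
	ProvisionMode      ProvisionMode
	MaxInstallAttempts int
}

// ShouldRetry reports whether another install attempt should be made.
// attemptsSoFar counts attempts already made, including the one that just failed.
// recoverable says whether that failure was classified as transient
// (for example AWS throttling); the classification itself is not shown.
func ShouldRetry(p RetryPolicy, attemptsSoFar int, recoverable bool) bool {
	// The attempt limit applies in every mode.
	if attemptsSoFar >= p.MaxInstallAttempts {
		return false
	}
	switch p.ProvisionMode {
	case RetryOnRecoverableErrorOnly:
		return recoverable
	default:
		// RetryAlways, or unset (assumed here to keep retry-always behavior).
		return true
	}
}

func main() {
	p := RetryPolicy{ProvisionMode: RetryOnRecoverableErrorOnly, MaxInstallAttempts: 3}
	fmt.Println(ShouldRetry(p, 1, false)) // quota-style failure: false
	fmt.Println(ShouldRetry(p, 1, true))  // throttling-style failure: true
	fmt.Println(ShouldRetry(p, 3, true))  // attempt limit reached: false
}
```

      With RetryOnRecoverableErrorOnly, a quota-style failure surfaces after the first attempt, while a throttling-style failure is retried until MaxInstallAttempts is reached.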
      
      

       

              efried.openshift Eric Fried
              rhn-engineering-gshereme Greg Sheremeta (Inactive)
              Jianping Shu Jianping Shu