OpenShift Hive / HIVE-2281

Hive: retry installation on GeneralOperatorDegraded


    • Type: Story
    • Resolution: Unresolved
    • Priority: Undefined

      Hive should retry the installation when it fails with GeneralOperatorDegraded, so that clusters do not end up in an error state, which is a bad customer experience.

      For example, if OCPBUGS-17062 occurs, we can repair the cluster by replacing a worker node.
      The cluster's state in OCM, however, cannot be moved out of error, so we currently have to send customers a generic service log (SL) asking them to retry: "Your install failed due to an intermittent issue. Please retry installation."

      Note: Ideally, we should be able to re-initiate installs (restart them from where they stopped) even if a cluster is already in an error state. Is that feasible?

      Done:

      • Have Hive attempt a re-install for GeneralOperatorDegraded failures while we pursue the issue upstream
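
      The requested behavior can be illustrated with a minimal sketch of the retry decision. This is not Hive's actual implementation: `RETRYABLE_REASONS`, `ProvisionResult`, `should_retry`, and the `attempts_limit` parameter are hypothetical names chosen for illustration, and the sketch assumes each failed install attempt reports a single failure reason string.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of failure reasons treated as transient and worth retrying.
RETRYABLE_REASONS = frozenset({"GeneralOperatorDegraded"})


@dataclass(frozen=True)
class ProvisionResult:
    """Outcome of one install attempt (illustrative type, not a Hive API)."""
    failure_reason: Optional[str]  # None means the install succeeded
    attempt: int                   # 1-based number of the attempt that just ended


def should_retry(result: ProvisionResult, attempts_limit: int) -> bool:
    """Return True if another install attempt should be started."""
    if result.failure_reason is None:
        return False  # install succeeded, nothing to retry
    if result.attempt >= attempts_limit:
        return False  # retry budget exhausted, cluster goes to error state
    # Retry only for reasons explicitly listed as transient.
    return result.failure_reason in RETRYABLE_REASONS
```

      With a limit of 3, a first attempt that fails with GeneralOperatorDegraded would be retried, while a third failed attempt or a failure with an unlisted reason would not.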

            Assignee: Unassigned
            Reporter: Claudio Busse (cbusse.openshift)