-
Story
-
Resolution: Unresolved
-
Hive should re-attempt the install when it fails with GeneralOperatorDegraded, so that clusters do not end up in an error state, which is a bad customer experience.
For example, if OCPBUGS-17062 occurs, we can repair the cluster by replacing a worker node.
The OCM state, however, cannot be moved out of error once it is set, so we currently have to send customers a generic SL asking them to retry: "Your install failed due to an intermittent issue. Please retry installation."
Note: Ideally, we would be able to re-initiate installs (resume them where they stopped) even when a cluster is already in an error state. Is that feasible?
Done:
- Have Hive attempt a re-install for GeneralOperatorDegraded failures while we pursue the issue upstream
- is caused by: OCPBUGS-17062 Static Pods & Guard Pods causing cluster install failures (Closed)
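The retry behaviour requested above could be sketched roughly as follows. This is an illustration of the decision only, not Hive's actual code or API: `InstallAttempt`, `should_retry`, `RETRYABLE_REASONS`, and the default limit of 3 attempts are all hypothetical names and values chosen for the example. Only the failure reason GeneralOperatorDegraded comes from this ticket.

```python
from dataclasses import dataclass

# Hypothetical set of failure reasons treated as transient.
# Only GeneralOperatorDegraded is taken from the ticket.
RETRYABLE_REASONS = frozenset({"GeneralOperatorDegraded"})


@dataclass
class InstallAttempt:
    """Outcome of one failed install attempt (hypothetical structure)."""

    failure_reason: str
    attempt_number: int  # 1-based count of attempts made so far


def should_retry(attempt: InstallAttempt, max_attempts: int = 3) -> bool:
    """Return True if another install attempt should be made.

    A retry happens only when the failure reason is considered transient
    and the attempt limit (an assumed value) has not been reached yet.
    """
    if attempt.failure_reason not in RETRYABLE_REASONS:
        return False
    return attempt.attempt_number < max_attempts
```

With such a check, a first failure with GeneralOperatorDegraded would trigger another attempt instead of putting the cluster into the error state, while other failure reasons and exhausted attempts would still surface as errors.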