Type: Story
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels:
- shift-improvement

Activity Type: None
Blocked: False
Blocked Reason: None
Ready: False
Epic Link: None
Story Points: None
Target Version: None
Release Blocker: None
Sprint: None

Hive should retry the install when it fails with GeneralOperatorDegraded, so that clusters do not end up in an error state (a bad customer experience).

For example, if ~~OCPBUGS-17062~~ occurs, we can repair the cluster by replacing a worker node.
The OCM state, however, cannot be moved out of error once it is set, so we currently have to send customers a generic SL asking them to retry: "Your install failed due to an intermittent issue. Please retry installation."

Note: Ideally, we should be able to re-initiate installs (resume them where they stopped) even if a cluster is already in an error state. Is that feasible?

Done:

Have Hive attempt a re-install for GeneralOperatorDegraded failures while we pursue the issue upstream.
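
To make the requested behaviour concrete, here is a minimal sketch of the retry decision, not Hive's actual implementation. The function name, the set of retryable reasons, and the attempt-limit parameter are all hypothetical and chosen only for illustration; the only name taken from this story is the failure reason GeneralOperatorDegraded.

```python
# Hypothetical sketch: none of these names come from Hive's codebase.
# The only value taken from this story is the failure reason itself.
RETRYABLE_REASONS = {"GeneralOperatorDegraded"}


def should_retry_install(failure_reason: str, attempts_made: int, attempt_limit: int) -> bool:
    """Return True if another install attempt should be started."""
    # Never exceed the configured number of attempts (assumed parameter).
    if attempts_made >= attempt_limit:
        return False
    # Retry only for failure reasons considered intermittent.
    return failure_reason in RETRYABLE_REASONS
```

With a policy like this, the cluster would only be reported as errored after the retries are exhausted, rather than on the first GeneralOperatorDegraded failure.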

is caused by: OCPBUGS-17062 Static Pods & Guard Pods causing cluster install failures (Closed)

Assignee: Unassigned
Reporter: Claudio Busse
Need Info From: None
Votes: 0
Watchers: 3
Created: 2023/07/31 7:08 AM
Updated: 2023/08/02 7:40 PM
