Type: Spike
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

Work Type: Improvement
Blocked: False
Blocked Reason: None
Ready: False

The goal of this spike is to create an epic to improve user visibility into the installation process after infrastructure creation, particularly to improve feedback in the bootkube process.

At the moment, we simply wrap bootkube in timeouts and fail if we exceed the timeout. This presents problems in both directions:

In some cases progress is being made but takes longer than the timeouts allow. This comes up in baremetal installs, where reboots can take a significant amount of time. In these cases we would ideally find a way to monitor progress without resorting to a hard timeout.
On the other hand, an error such as a failure to pull the release image represents the opposite problem: it is unrecoverable and happens immediately, yet we spend X minutes waiting for the timeout before returning. It would be better to report the failure immediately.
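
The two failure modes above could be handled by a wait loop that tracks progress instead of total elapsed time. The following is a minimal sketch of that idea, not the installer's implementation: `Status`, `wait_for_bootstrap`, the `stage` marker, and the `poll` callback are all hypothetical names, and how bootkube progress or fatal errors would actually be detected is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class FatalInstallError(Exception):
    """Unrecoverable failure; waiting longer cannot help."""


class StalledInstallError(Exception):
    """No progress was observed within the stall window."""


@dataclass
class Status:
    # Hypothetical progress marker, e.g. "pulling-release-image".
    stage: str
    done: bool = False
    # Set by the caller's poll function when it detects an unrecoverable error.
    fatal: Optional[str] = None


def wait_for_bootstrap(
    poll: Callable[[], Status],
    stall_timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait until poll() reports completion.

    The stall window restarts whenever the reported stage changes, so a slow
    but advancing install is not killed. A fatal status raises immediately.
    """
    last_stage = None
    last_progress = clock()
    while True:
        status = poll()
        if status.fatal is not None:
            # Fail fast instead of waiting out a timeout.
            raise FatalInstallError(status.fatal)
        if status.done:
            return status.stage
        now = clock()
        if status.stage != last_stage:
            # Progress was made: restart the stall window.
            last_stage = status.stage
            last_progress = now
        elif now - last_progress >= stall_timeout:
            raise StalledInstallError(f"no progress since stage {last_stage!r}")
        sleep(interval)
```

With this shape, a long reboot only fails the wait if no new stage is reported for `stall_timeout` seconds, while a release image pull failure surfaces on the first poll that reports it.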

We have also seen another class of problem: manifests that fail to apply. These may be user-provided or, in some cases, may have slipped in from OpenShift components. These failures can be hard to identify and require inspecting the bootkube logs.

is related to: OCPBUGS-6604 Invalid extra manifest only caught by the agent installer timeout @60 minutes (Closed)

relates to: CORS-2087 Installer improvements to provide user controls to extend bootstrap timeouts

Assignee: Unassigned

Reporter: Patrick Dillon

Votes: 1

Watchers: 7

Created: 2023/02/10 7:39 PM

Updated: 2024/11/14 5:59 PM
