Spike
Resolution: Unresolved
Improvement
The goal of this spike is to create an epic to improve user visibility into the installation process after infrastructure creation, particularly to improve feedback in the bootkube process.
At the moment, we simply wrap bootkube in timeouts and fail if we exceed the timeout. This presents problems in both directions:
- In some cases progress is being made, but the install takes longer than the timeouts allow. This comes up in baremetal installs, where reboots can take a significant amount of time. In these cases we would ideally find a way to monitor progress without resorting to a hard timeout.
- On the other hand, an error such as a failure to pull the release image presents the opposite problem: it is unrecoverable and happens immediately, but we still wait X minutes for the timeout to expire before returning. It would be better to report the failure immediately.
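To make the two cases above concrete, here is a minimal sketch of a wait loop that replaces a single hard timeout with an idle timeout (reset whenever progress is observed) and returns immediately on an unrecoverable error. It is an illustration only, not how the installer works today: `Status`, `FatalBootstrapError`, `wait_for_bootstrap`, and the stage names are all hypothetical, and how bootkube would actually report stages or fatal errors is exactly what this spike needs to determine.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Status:
    """Hypothetical snapshot of bootstrap state; not an installer type."""
    stage: str                   # label for the most recently reached step
    done: bool = False
    fatal: Optional[str] = None  # set when the error cannot be retried


class FatalBootstrapError(Exception):
    """Raised as soon as an unrecoverable error is reported."""


def wait_for_bootstrap(
    poll: Callable[[], Status],
    idle_timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait until bootstrap completes and return the final stage.

    Fails fast on a fatal status, and times out only when no new stage
    has been observed for `idle_timeout` seconds.
    """
    last_stage = None
    last_progress = clock()
    while True:
        status = poll()
        if status.fatal is not None:
            # Unrecoverable: report now instead of waiting out a timer.
            raise FatalBootstrapError(status.fatal)
        if status.done:
            return status.stage
        now = clock()
        if status.stage != last_stage:
            # Progress was made, so restart the idle timer.
            last_stage = status.stage
            last_progress = now
        elif now - last_progress >= idle_timeout:
            raise TimeoutError(f"no progress since stage {last_stage!r}")
        sleep(interval)
```

With this shape, a slow baremetal install that keeps reaching new stages is never cut off, while a stalled install still fails after `idle_timeout` seconds without progress. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.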
We have also seen another class of problem: manifests that fail to apply. These may be user-provided or, in some cases, may have snuck in from OpenShift components. These failures can be hard to identify and require inspecting the bootkube logs.
- is related to: OCPBUGS-6604 Invalid extra manifest only caught by the agent installer timeout @60 minutes (Closed)
- relates to: CORS-2087 Installer improvements to provide user controls to extend bootstrap timeouts (New)