-
Story
-
Resolution: Obsolete
-
Undefined
-
None
-
None
-
None
-
False
-
False
-
-
Sprint 215
For OSD SREs, there has been a large amount of toil around the troubleshooting of bootstrap failures during customer installs. Usually our first attempt at diagnosing the problem is to look at Hive's cluster installer logs.
As an SRE-P on-call engineer diagnosing failed cluster installation, I would like more verbose logging around the bootstrap phase of the install (between terraform completion and waiting 20m for the apiserver).
Done Criteria
- More checkpoints and associated logs added to installer logging during bootstrap phase.
- Suggestions include: bootstrap's machine-config-server check, instance state (running/aws-health-status), attempts to fetch machine configs on <internal-api>:22623/config/master, network-egress checks, and anything else to assist SREPs in isolating issues during the install process.