-
Epic
-
Resolution: Done
-
Critical
-
None
-
Gather bootstrap should capture serial console logs
-
False
-
False
-
Done
-
Impediment
-
0% To Do, 0% In Progress, 100% Done
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Epic Goal
- Capture serial console logs from bootstrap and master hosts whenever a cluster fails to bootstrap
Why is this important?
- This is a gap in our bootstrap log bundle which would help illustrate why masters fail to join the cluster or the bootstrap host fails to boot and start requisite services, if for instance S3 permissions prevent retrieval of bootstrap.ign.
- Any future improvement to RHCOS pre-boot diagnostics would likely come in the form of serial console output, though no specific improvements have been identified as of yet.
Scenarios
- Cluster fails to bootstrap
-
- Assuming platform support, automatically attempt to capture serial console logs from relevant machines, placed into standard bootstrap log bundle. If Installer credentials for some reason are not sufficient for console retrieval fail gracefully and emit an error message.
- Cluster bootstraps but bootstrap host was preserved and someone runs `openshift-install gather bootstrap`
- Assuming platform support, capture serial console logs from relevant machines, placed into standard bootstrap log bundle. If Installer credentials for some reason are not sufficient for console retrieval fail gracefully and emit an error message.
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - This is a rather minor incremental change to the gathered bootstrap log bundle, I'm not sure additional TE is warranted.
- Documented permissions requirements are amended if gathering this debug information requires permissions not previously documented
- Once fully tested please backport to 4.N-2
Dependencies (internal and external)
- Platform SDKs must support console retrieval, aws-sdk-go does, see docs reference below
Previous Work (Optional):
- I believe some of the CI jobs have console gathering methods, once we implement this across supported platforms we should probably reduce that duplication.
Open questions::
- Given the primary goal here is for managed services, we need for them to ensure permissions both in STS and non-STS installation modes as well as SSH access to the bootstrap host from Hive which is currently known not to work.
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- is duplicated by
-
CORS-1904 Gather bootstrap captures serial console logs
- Closed
-
CORS-1905 Gather bootstrap captures serial console logs
- Closed
- is related to
-
CORS-2067 Gather bootstrap should capture serial console logs for Azure and GCP
- Closed
-
CORS-1899 Logging Improvements During Bootstrap Phase
- Closed
- relates to
-
CORS-1899 Logging Improvements During Bootstrap Phase
- Closed
- links to
(1 links to)
1.
|
QE Tracker | Closed | Yunfei Jiang | ||
2.
|
TE Tracker | Closed | Eric Rich | ||
3.
|
Docs Tracker | Closed | Unassigned |