-
Feature
-
Resolution: Unresolved
-
Major
Feature Overview (aka. Goal Summary)
An elevator pitch (value statement) that describes the Feature in a clear,
concise way.
This feature introduces an RHDH-specific must-gather image to consistently and efficiently collect diagnostics from RHDH installations. By building upon the existing "must-gather" framework, this tailored approach ensures that support and engineering teams receive all the relevant data needed for troubleshooting customer issues. This should improve the speed and quality of the resolution process by reducing the number of back-and-forth exchanges needed to obtain this data.
NOTE: A POC has been implemented and shared with the Support team: https://github.com/rm3l/rhdh-must-gather. This allowed us to capture their requirements.
Goals (aka. expected user outcomes)
The observable functionality that the user now has as a result of receiving
this feature. Include the anticipated primary user type/persona and which
existing features, if any, will be expanded.
The primary personas for this feature are Support and Engineering, who will both gain a more efficient and reliable method for acquiring diagnostic data. Support Engineers will use the tool to investigate customer cases, while Engineering will use the collected data for further troubleshooting and to better assist with RHDHSUPP support requests. We may also be able to extract useful usage information from the collected data. The Customer will also benefit from a consistent process for providing this kind of information.
As a result of this feature, the user will:
- Have a single, application-specific tool to collect comprehensive diagnostic data relevant only to RHDH.
- Have a reliable method for gathering data even in disconnected environments.
- Have a clear understanding of what data is being collected from the cluster.
This feature expands upon the existing capabilities of OpenShift's must-gather tool by creating an RHDH-specific implementation.
Requirements (aka. Acceptance Criteria):
A list of specific needs or objectives that a feature must deliver in order
to be considered complete. If the feature spans across releases then good
to have scope for each release with acceptance criteria. Be sure to
include nonfunctional requirements such as security, reliability,
performance, maintainability, scalability, usability, etc.
- The tool should be an RHDH-specific image that builds upon "oc adm must-gather" to focus data collection on relevant application information, thereby improving performance and scope. See the POC in https://github.com/rm3l/rhdh-must-gather
- The RHDH-specific must-gather image should be officially released (i.e., in registry.redhat.io). Release should follow the same release process as RHDH, ensuring that a specific must-gather version is tied to the RHDH version it is designed to collect data for.
- The tool should be compatible and interface well with existing support tooling and the support team's existing workflow and parsing tools. More details in https://access.redhat.com/solutions/6962483 and this blog post
- It should embed "oc adm inspect" as part of the data collection process. The Support team is already familiar with tools like OMC (formerly OMG), which can handle the output of "oc adm inspect". Having the must-gather include the output of "oc adm inspect" in addition to RHDH-specific data would therefore meet both Support and Engineering requirements.
- It should not require any extra external download/installation from the user's standpoint. The user should not need to download must-gather from the Customer Portal.
- The tool should function properly in disconnected environments
- An idea discussed with Support is to list the accompanying must-gather image in the operator's related images when productizing the Operator, as well as in the Helm Charts, so it can be mirrored accordingly.
- The tool should work on all supported platforms, including OpenShift, AKS, EKS, and GKE. OpenShift users would be able to call "oc adm must-gather --image=...", but we should make it possible for non-OCP users to use the tool as well
- The tool should be able to collect data from both Helm and Operator-based installations of RHDH. Note that a single cluster may have multiple RHDH instances running.
- The tool should also collect data about the RHDH Operator itself, if installed in the cluster.
- It must primarily collect RHDH-specific data, with minimal collection of general cluster data.
- The tool should return minimal platform information like the type of cluster (OCP, K8s), underlying platform (AWS, AKS, ...), cluster version, and (if possible) whether it is running in a disconnected setup
- The tool should be able to collect key information for both Helm and Operator-based RHDH instances, such as:
- RHDH and Backstage versions (e.g., from packages/app/src/build-metadata.json).
- Node.js version.
- The list of installed and enabled dynamic plugins and their versions.
- Container logs. "oc adm must-gather" supports passing "since" or "since-time", so the tool should handle that as well
- RHDH configuration, including app-config(s) and Backstage CRs.
- For Operator-based instances, it should collect Operator data (version, OLM information, Backstage CRD, default configuration, and logs) as well as Operands data
- For Helm-based instances, it should collect Helm-specific information (release status, user-provided values file, history, and chart version deployed).
- The tool should be able to redact or otherwise handle sensitive information, such as secrets, by hashing or redacting the data
- In the first phase, the tool should skip collecting sensitive Secrets from the cluster (or make their collection opt-in). If the user opts in, the tool should still redact such data.
- The collected files must be in a format that would allow extra tools to analyze them (like to enhance our telemetry). For example, it should return data in both structured (JSON, YAML) and unstructured (human-readable text) formats.
- NOTE: Support advised caution when collecting data for telemetry purposes, especially from government agencies, to avoid over-collection or breaching implied confidentiality. They stressed the importance of being mindful of the type of data extracted from clusters, as some information can be highly sensitive and should not be inadvertently exposed
- The tool should support specifying a namespace (or a list of namespaces) to analyze, for further scoping. By default, it should analyze all namespaces and collect information from the ones that contain RHDH installations.
- The tool's behavior, overhead, and performance must be well-understood.
- The tool should ideally run in a timely manner. As it is scoped to only RHDH-specific data, it should not take too long (roughly 5 minutes at most) to return.
- Any changes potentially impacting the support team's current workflow must be discussed and verified with them before implementation.
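The requirement above to redact or hash sensitive data could be sketched as follows. This is a minimal illustration only, assuming collected resources are available as parsed manifests (Python dictionaries); the function names `redact_secret` and `redact_resources` and the `<redacted>` placeholder are hypothetical and are not taken from the POC.

```python
import copy
import hashlib

# Hypothetical placeholder written in place of secret values.
REDACTED = "<redacted>"


def redact_secret(manifest: dict, use_hash: bool = False) -> dict:
    """Return a copy of a Secret manifest with all data values masked.

    With use_hash=True, each value is replaced by a truncated SHA-256 digest,
    which lets two dumps be compared without exposing the value itself.
    """
    cleaned = copy.deepcopy(manifest)
    for field in ("data", "stringData"):
        values = cleaned.get(field) or {}
        for key, value in values.items():
            if use_hash:
                digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
                values[key] = f"sha256:{digest[:12]}"
            else:
                values[key] = REDACTED
    return cleaned


def redact_resources(resources: list[dict], use_hash: bool = False) -> list[dict]:
    """Redact every Secret in a list of manifests; copy other kinds unchanged."""
    return [
        redact_secret(r, use_hash) if r.get("kind") == "Secret" else copy.deepcopy(r)
        for r in resources
    ]
```

Note that hashing short or low-entropy values can still allow them to be guessed, so plain redaction is the safer default in this sketch.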
Out of Scope (Optional)
High-level list of items that are out of scope.
- Collecting broad, general cluster data beyond what is relevant for RHDH.
- Showing this information to users / admins in the UI
- Potential future enhancements:
- An initial user interface (UI) to trigger diagnostics data collection from within a web console (RHDH UI?)
- Expand the existing support dashboard to include one or more metrics on how many cases have this data attached (this could be extended to show whether resolution time decreases when the extra data is attached).
Customer Considerations (Optional)
Provide any additional customer-specific considerations that must be made
when designing and delivering the Feature. Initial completion during
Refinement status.
- The tool should provide customers with a clear understanding of what data is being collected.
- Customers must have the option to sanitize or exclude sensitive data from the final output before sharing it.
- The process should be as simple as possible for customers, avoiding the need for manual downloads from a customer portal or complex installation steps.
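One way to give customers a clear view of what was collected is to ship a summary alongside the data, in both a structured and a human-readable form. The sketch below is an assumption about how such a summary might be built; the function name `build_summary`, the category names, and the output layout are hypothetical.

```python
import json


def build_summary(collected: dict[str, list[str]]) -> tuple[str, str]:
    """Build a JSON summary and a plain-text summary of collected files.

    `collected` maps a category (for example "helm" or "operator") to the
    relative paths of the files gathered for that category.
    """
    total = sum(len(paths) for paths in collected.values())
    categories = {name: sorted(paths) for name, paths in sorted(collected.items())}

    # Structured form, intended for tooling.
    structured = json.dumps({"total_files": total, "categories": categories}, indent=2)

    # Human-readable form, intended for review before sharing.
    lines = [f"Collected {total} file(s):"]
    for name, paths in categories.items():
        lines.append(f"  {name} ({len(paths)})")
        lines.extend(f"    - {path}" for path in paths)
    return structured, "\n".join(lines)
```

A customer could read the text form to decide whether anything should be excluded before attaching the archive to a case.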
Documentation Considerations
Provide information that needs to be considered and planned so that
documentation will meet customer needs. If the feature extends existing
functionality, provide a link to its current documentation.
- The documentation must provide clear instructions on how to run the new support tool.
- It should explain how the tool handles sensitive data and how a customer can further sanitize the collected information.
- Information on how the tool functions in disconnected environments should be included.
- Consider providing a training session to the Support team
PREVIOUS DESCRIPTION FOR MORE CONTEXT
h1. Feature Overview (aka. Goal Summary)
Provide a consistent way to collect and share information from a cluster, especially for support cases.
h3. Goals (aka. expected user outcomes)
A script or tool to collect important/necessary information from an RHDH installation to simplify and streamline support cases.
h3. Requirements (aka. Acceptance Criteria):
A script or tool should collect at least this information:
# Versions
## RHDH / Backstage version
##* might be the information from packages/app/src/build-metadata.json
##* RHIDP-4788 might introduce a second resource to override this information!
## Platform RHDH is running on (OpenShift, AWS, GKE, etc.) if that's possible
## Perhaps underlying OS is of interest (RHEL I presume)?
## List of installed and enabled plugins with their versions
## NodeJS version
# Configuration
## Maybe the complete Helm, Backstage CR or app-config configuration values in a way that we hide secrets?
## Or:
### Techdocs builder?
### VCS integrated (GitLab, GitHub, etc.)?
### Authentication provider?
### RBAC enabled/disabled?
# Logs
#* Full log or maybe just the first 1k and last 1k lines of the container?
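If the "first 1k and last 1k lines" option were chosen, the truncation could look like the sketch below. This is only an illustration; the function name `truncate_log` and the format of the omission marker are hypothetical.

```python
def truncate_log(text: str, head: int = 1000, tail: int = 1000) -> str:
    """Keep the first `head` and last `tail` lines of a log.

    Logs that already fit within head + tail lines are returned unchanged.
    A marker line records how many lines were dropped in between.
    """
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    # Slice from an explicit index so that tail=0 keeps no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
```

For example, `truncate_log(log_text, head=2, tail=3)` on a ten-line log keeps lines 1 to 2 and 8 to 10, with a marker noting that 5 lines were omitted.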
h3. Out of Scope (Optional)
Showing this information to users / admins in the UI
h3. Customer Considerations (Optional)
# Customers should be aware of what kind of information we share.
# They should have the option to not share some information like logs?
# Sensitive information should be an opt-in? Like full app-config?
h3. Documentation Considerations
# This feature should be documented
# This feature might influence support as well
- is related to
-- RHIDP-6568 [Docs] Document how users can override the RHDH Metadata card with custom information (Closed)
-- RHIDP-3625 Add troubleshooting guide to product documentation (Refinement)
- relates to
-- RHIDP-4788 Clean up the RHDH Metadata info in the Settings page (Closed)