Feature Request
Resolution: Unresolved
Product / Portfolio Work
1. Proposed title of this feature request
Produce an error code when an OCP IPI install fails
Corollary: maintain a system of error codes for known failures
2. What is the nature and description of the request?
In OSD/ROSA, we wrap openshift-installer with several services that extend the installation process. One of these enhancements is showing users clean, concise, and actionable error messages when an installation fails. We don't want users to have to dig through huge installation logs full of Terraform output to figure out what went wrong. Instead, we want to tell them with a one-line message and a corresponding error code, such as "OCM3019 - NAT gateway limit exceeded. Clean unused NAT gateways or increase quota and try again."
This RFE requests that the mechanism we've built be moved into the installer itself, so that all users of the installer can benefit from a common set of error codes and messages.
3. Why does the customer need this? (List the business requirements here)
All users of the installer, not only OSD/ROSA customers, would benefit from a common set of error codes and messages, which would improve the user experience for everyone. It would also reduce the burden on Service Delivery of maintaining our own custom system.
4. List any affected packages or components.
openshift-install
Background
Our current system is fairly naive: it maps regular expressions found in the install log to error codes. Our full list of regular expressions can be found here: https://github.com/openshift/hive/blob/master/config/configmaps/install-log-regexes-configmap.yaml
For example, when we find:
    searchRegexStrings:
    - "NatGatewayLimitExceeded"
we map it to:
    "OCM3019 - NAT gateway limit exceeded. Clean unused NAT gateways or increase quota and try again."
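The lookup described above can be sketched in Go, the language of both openshift-install and Hive. This is a minimal illustration that assumes a simple in-memory rule table. The `failureRule` type and `classify` function are hypothetical names, not Hive's actual implementation or ConfigMap schema. The log line in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// failureRule is a hypothetical type pairing search patterns with a coded,
// actionable message. It is an illustration, not Hive's actual schema.
type failureRule struct {
	Code     string
	Patterns []*regexp.Regexp
	Message  string
}

// rules holds the single example from this request; a real table would be
// loaded from configuration such as Hive's ConfigMap.
var rules = []failureRule{
	{
		Code:     "OCM3019",
		Patterns: []*regexp.Regexp{regexp.MustCompile(`NatGatewayLimitExceeded`)},
		Message:  "NAT gateway limit exceeded. Clean unused NAT gateways or increase quota and try again.",
	},
}

// classify scans install log text and returns the first matching coded
// message, or false if no rule matches.
func classify(logText string) (string, bool) {
	for _, r := range rules {
		for _, p := range r.Patterns {
			if p.MatchString(logText) {
				return fmt.Sprintf("%s - %s", r.Code, r.Message), true
			}
		}
	}
	return "", false
}

func main() {
	// Made-up log line for illustration only.
	logText := "error creating NAT gateway: NatGatewayLimitExceeded"
	if msg, ok := classify(logText); ok {
		fmt.Println(msg)
	}
}
```

Because the matching runs after the fact on log text, a change in upstream error wording silently breaks the rule. This fragility is one argument for letting the installer own the mapping.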
We want the installer itself to be the source of truth for this mapping and to perform it, so that when the installer sees "NatGatewayLimitExceeded", it outputs "INSTALLER-AWS-1234: NAT gateway limit exceeded. Clean unused NAT gateways or increase quota and try again." or similar.