-
Feature
-
Resolution: Done
-
Major
-
XCMSTRAT-225 Managed Services Operational Excellence
-
0% To Do, 33% In Progress, 67% Done
Feature Overview (aka. Goal Summary)
SRE, CEE, and various automated processes currently lack centralized insight into, or context
about, support exceptions granted to clusters or organizations.
As a result, Limited Support reasons are added to clusters where they should not apply, alerting
is silenced where it should not be, Service Log messages are sent where they should not be,
CEE makes incorrect statements in support cases, and so on.
The goal of this feature is to centralize our support exceptions, primarily to prevent
automated processes from incorrectly making changes to a cluster because of a Limited Support
reason.
A cluster can be in Limited Support for a variety of reasons, such as the OpenShift version being EOL
or failing network validation. Not all of these Limited Support reasons are currently automated,
but the number of automated ones is expected to increase. Separately, a customer
can request and be approved for a support exception that overrides the Limited Support
reason or makes it no longer applicable. An exception can be reason-specific or generic, in which case it
applies to any and all reasons for the cluster being in Limited Support. It can also be cluster-specific
or org-specific, in which case it applies to all clusters belonging to that organization.
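The scoping rules above can be sketched as a small matching function. This is only an illustration: the `SupportException` record, its field names, and the convention that `None` means "unscoped" are assumptions made for this example, not an existing OCM or Jira schema. It also assumes every exception record carries the owning organization's ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SupportException:
    """Hypothetical record of one support exception (illustrative only)."""
    org_id: str                       # assumed: every exception names its org
    cluster_id: Optional[str] = None  # None means org-specific (all clusters)
    reason: Optional[str] = None      # None means generic (any reason)


def exception_matches(exc: SupportException, org_id: str,
                      cluster_id: str, reason: str) -> bool:
    """Return True if the exception covers this org, cluster, and reason."""
    if exc.org_id != org_id:
        return False
    # A cluster-specific exception only covers the named cluster.
    if exc.cluster_id is not None and exc.cluster_id != cluster_id:
        return False
    # A reason-specific exception only covers the named reason.
    if exc.reason is not None and exc.reason != reason:
        return False
    return True
```

With this shape, an org-specific generic exception is `SupportException(org_id="org-1")`, while `SupportException(org_id="org-1", cluster_id="c-1", reason="version-eol")` is the narrowest form. The identifiers and reason strings here are placeholders.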
There must be a mechanism that prevents clusters covered by a support exception from being automatically
displayed as in Limited Support and from having alerts automatically disabled/silenced, while
still keeping the context that the cluster was in Limited Support and why.
Requirements (aka. Acceptance Criteria):
- Customers should not be aware of any internal records or references to a SUPPORTEX Jira number, or of any other indicator
that they may have a "Support Exception".
- Only fully approved Support Exceptions should be considered. Individual SUPPORTEXes typically have subtasks assigned
to different teams (e.g., CEE/SRE/PM); a SUPPORTEX is fully approved only when all subtasks have been approved.
- The Limited Support automations (in the UI and in PagerDuty, and possibly other automations/integrations) should
not apply when the Limited Support reason/cluster/org has a matching, approved SUPPORTEX.
- The Limited Support reason (if it exists) should continue to exist even with a matching SUPPORTEX;
only the automatic actions should be ignored/disabled. Otherwise there is no history/context as to why the Limited Support
reason is missing, and it may be assumed that the automation is broken.
- SUPPORTEXes have expiration dates. When a SUPPORTEX expires, the automated Limited Support actions
should be applied.
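The approval and expiration criteria above could be evaluated roughly as follows. This is a minimal sketch under stated assumptions: the `SupportEx` and `Decision` types, the `"Approved"` status string, and the idea that the caller passes in only exceptions already matched to the cluster/org/reason are all hypothetical, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SupportEx:
    """Hypothetical view of one SUPPORTEX ticket (illustrative only)."""
    key: str                        # internal ticket key, never customer-facing
    subtask_status: Dict[str, str]  # team -> status, e.g. {"SRE": "Approved"}
    expires_at: datetime


@dataclass(frozen=True)
class Decision:
    keep_reason: bool               # the Limited Support reason record stays
    apply_automated_actions: bool   # UI display, alert silencing, etc.
    internal_exception_key: Optional[str]  # for internal context only


def is_in_effect(exc: SupportEx, now: datetime) -> bool:
    """Fully approved (every subtask approved) and not yet expired."""
    # "Approved" is an assumed status value; an exception with no
    # subtasks is treated as not approved.
    fully_approved = bool(exc.subtask_status) and all(
        status == "Approved" for status in exc.subtask_status.values()
    )
    return fully_approved and now < exc.expires_at


def decide(matching_exceptions: List[SupportEx], now: datetime) -> Decision:
    """Decide what to do with an existing Limited Support reason."""
    for exc in matching_exceptions:
        if is_in_effect(exc, now):
            # Keep the reason for history, but suppress automated actions.
            return Decision(True, False, exc.key)
    # No exception in effect (none, partially approved, or expired).
    return Decision(True, True, None)
```

In both outcomes the Limited Support reason is retained; only the automated actions are toggled. Any customer-facing output would need to omit `internal_exception_key` to satisfy the first requirement.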
- is blocked by: OCMUI-2010 Hide Limited Support Reasons once cluster is supported (Closed)