Type: Feature Request
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: OLM
Labels:
None

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:
None
Products:
None
Hierarchy Progress Bar:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
None
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

1. Proposed title of this feature request
OLM now has a feature called `UnsafeFailForward` that lets the cluster admin opt in to a fail-forward mechanism. It moves to the next available version when the upgrade of an OLM-managed application gets stuck for any reason:
https://olm.operatorframework.io/docs/advanced-tasks/unsafe-fail-forward-upgrades/

This is basically a safety net for when something goes wrong.
The feature is fairly hidden because it is documented only upstream. The idea is that our support team suggests it and guides customers through it.

On the other hand, operator authors and other engineers on the support team would find it really useful to know:

how many clusters in the field had to enable this feature to get past a specific buggy release
whether a cluster has used it in the past, with possible future implications (e.g. leftovers)

So the idea is to have a metric that counts when and how `UnsafeFailForward` is used.
The metric would let us track it with Telemetry and the Insights tool.

More technical details are tracked here: https://docs.google.com/document/d/1KVEyQqg9Kwq93rfX9dOPwE98M_Hs33uRsNJXboLnxHY/edit#heading=h.of662m97fj1v

2. What is the nature and description of the request?
Expose a new metric that lets us detect, via Telemetry and Insights, whether and when `UnsafeFailForward` was used.

3. Why does the customer need this? (List the business requirements here)
Customers are not really expected to consume the metric directly, but:

operator authors will be able to see (with Telemetry) how many customers had to skip a specific release because its upgrade was buggy
the support team will be able to easily detect (with Insights) that a cluster used `UnsafeFailForward` in the past, with possible current implications.

4. List any affected packages or components.
OLM

Assignee:: Tony Wu

Reporter:: Simone Tiraboschi

Need Info From:: None

Votes:: 0

Watchers:: 2

Created:: 2023/08/10 8:35 AM

Updated:: 2025/07/05 1:26 PM

Target start:: None

Target end:: None
