Type: Story
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Storage Platform
Labels:
- grooming

Activity Type:
Quality / Stability / Reliability
Story Points:
5
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
CNV-12960
Acceptance Criteria:
- CDI CR aggregates progressing and degraded conditions from its operands
- Upgrade flow is ready for the fail-forward mechanism
Intelligence Requested:
Market:

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Today, the CDI CR never reaches "Deployed" unless all of its operands are
healthy.
While this is a useful indication when debugging an install, it would be
a significant improvement to also aggregate fatal conditions from the CDI operands onto the CDI CR. For example, the cdi-deployment operand can report:
$ oc get deployments -n openshift-cnv cdi-deployment -o json | jq .status.conditions
[
  { "lastTransitionTime": "2022-09-21T21:21:28Z", "lastUpdateTime": "2022-09-21T21:21:28Z", "message": "Deployment does not have minimum availability.", "reason": "MinimumReplicasUnavailable", "status": "False", "type": "Available" },
  { "lastTransitionTime": "2022-09-21T21:21:28Z", "lastUpdateTime": "2022-09-21T21:21:28Z", "message": "pods \"cdi-deployment-6f4888b5cb-r9f5h\" is forbidden: violates PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (container \"cdi-controller\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container \"cdi-controller\" must set securityContext.capabilities.drop=[\"ALL\"]), seccompProfile (pod or container \"cdi-controller\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")", "reason": "FailedCreate", "status": "True", "type": "ReplicaFailure" },
  { "lastTransitionTime": "2022-09-21T21:31:29Z", "lastUpdateTime": "2022-09-21T21:31:29Z", "message": "ReplicaSet \"cdi-deployment-6f4888b5cb\" has timed out progressing.", "reason": "ProgressDeadlineExceeded", "status": "False", "type": "Progressing" }
]
Today, these operand conditions translate to the following conditions on the CDI CR:

- lastHeartbeatTime: "2022-09-21T21:21:25Z"
  lastTransitionTime: "2022-09-21T21:21:25Z"
  message: Started Deployment
  reason: DeployStarted
  status: "True"
  type: Progressing
- lastHeartbeatTime: "2022-09-21T21:21:25Z"
  lastTransitionTime: "2022-09-21T21:21:25Z"
  status: "False"
  type: Degraded

The CDI CR should instead report Progressing=False, Degraded=True.
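
The aggregation described above could look roughly like the following Go sketch. It is not taken from the CDI code base: the Condition and CRStatus types are simplified stand-ins for the real Kubernetes condition types, the function names and the "OperandFailed" reason are hypothetical, and the choice of which operand conditions count as fatal (ReplicaFailure=True, or Progressing=False with reason ProgressDeadlineExceeded) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// CRStatus holds the two CR-level conditions discussed in this issue.
type CRStatus struct {
	Progressing Condition
	Degraded    Condition
}

// isFatal reports whether an operand condition is treated as fatal.
// Assumption: ReplicaFailure=True and ProgressDeadlineExceeded are fatal.
func isFatal(c Condition) bool {
	if c.Type == "ReplicaFailure" && c.Status == "True" {
		return true
	}
	return c.Type == "Progressing" && c.Status == "False" &&
		c.Reason == "ProgressDeadlineExceeded"
}

// AggregateOperandConditions folds one operand's conditions into the CR
// status. If no fatal condition is found, current is returned unchanged.
func AggregateOperandConditions(current CRStatus, operand string, conds []Condition) CRStatus {
	var failures []string
	for _, c := range conds {
		if isFatal(c) {
			failures = append(failures, fmt.Sprintf("%s: %s", c.Reason, c.Message))
		}
	}
	if len(failures) == 0 {
		return current
	}
	msg := fmt.Sprintf("operand %s failed: %s", operand, strings.Join(failures, "; "))
	// "OperandFailed" is a hypothetical reason, not an existing CDI constant.
	return CRStatus{
		Progressing: Condition{Type: "Progressing", Status: "False", Reason: "OperandFailed", Message: msg},
		Degraded:    Condition{Type: "Degraded", Status: "True", Reason: "OperandFailed", Message: msg},
	}
}

func main() {
	// CR conditions as reported today.
	current := CRStatus{
		Progressing: Condition{Type: "Progressing", Status: "True", Reason: "DeployStarted", Message: "Started Deployment"},
		Degraded:    Condition{Type: "Degraded", Status: "False"},
	}
	// Operand conditions modeled on the output above (one message abbreviated).
	operand := []Condition{
		{Type: "Available", Status: "False", Reason: "MinimumReplicasUnavailable", Message: "Deployment does not have minimum availability."},
		{Type: "ReplicaFailure", Status: "True", Reason: "FailedCreate", Message: "pods forbidden: violates PodSecurity (abbreviated)"},
		{Type: "Progressing", Status: "False", Reason: "ProgressDeadlineExceeded", Message: `ReplicaSet "cdi-deployment-6f4888b5cb" has timed out progressing.`},
	}
	got := AggregateOperandConditions(current, "cdi-deployment", operand)
	fmt.Printf("Progressing=%s Degraded=%s\n", got.Progressing.Status, got.Degraded.Status)
	// prints "Progressing=False Degraded=True"
}
```

Copying the operand's reason and message into the CR-level message keeps the root cause (here, the PodSecurity violation) visible from the CDI CR without inspecting the deployment directly.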

Note that correctly reporting a failed install or upgrade is a prerequisite for the OLM unsafe fail-forward upgrades feature:
https://olm.operatorframework.io/docs/advanced-tasks/unsafe-fail-forward-upgrades/

We have not opted in yet, but that feature will eventually let customers try to recover from stuck upgrades.
Source:
https://bugzilla.redhat.com/show_bug.cgi?id=2128906

Assignee: Adam Litke

Reporter: Alex Kalenyuk

QA Contact: Natalie Gavrielov

Votes: 0

Watchers: 2

Created: 2023/07/18 1:51 PM

Updated: 2025/09/06 3:36 AM
