Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: 4.15.z
Affects Version/s: 4.14, 4.15
Component/s: OLM
Labels:
- pre-merge-tested
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
None
Regression:
Yes

Target Backport Versions:

4.15
Target Version:

4.15.z
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Priority Data:
PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:

* Previously, default Operator Lifecycle Manager (OLM) catalog pods remained in a termination state when there was an outage of the node that was being used. With this release, the OLM catalog pods that are backed by a `CatalogSource` correctly recover from planned and unplanned node maintenance. (link:https://issues.redhat.com/browse/OCPBUGS-35305[*OCPBUGS-35305*]).


Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Observed behavior: Default OpenShift OLM catalog pods do not survive an outage of the node they are currently running on. The pods remain in the Terminating state, even though their tolerations should cause them to be moved away from an unresponsive node after 5 minutes at the latest.

Impact: Operators can no longer be installed or updated from catalogs whose pods were running on a node that has gone down.

Expected behavior: The catalog pods get automatically rescheduled on remaining nodes and their gRPC API endpoint recovers as a result.
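The condition described above can be illustrated with a minimal sketch. It is not the code from the linked fix; it only shows one way to recognize a pod that is stuck in Terminating because its node is unreachable. It assumes pods and nodes are available as plain dictionaries shaped like Kubernetes API JSON, and that an unreachable node carries the `node.kubernetes.io/unreachable` taint. The function names `is_stuck_terminating` and `stuck_pods` are hypothetical.

```python
# Illustrative sketch only: detects pods that are marked for deletion
# while their node is unreachable. Not the actual catalog-operator logic.

UNREACHABLE_TAINT = "node.kubernetes.io/unreachable"


def node_is_unreachable(node):
    """Return True if the node object carries the unreachable taint."""
    taints = node.get("spec", {}).get("taints") or []
    return any(t.get("key") == UNREACHABLE_TAINT for t in taints)


def is_stuck_terminating(pod, node):
    """A pod counts as stuck when it has a deletionTimestamp (deletion was
    requested) but its node is unreachable, so the kubelet on that node
    cannot confirm that the containers have stopped."""
    if pod.get("metadata", {}).get("deletionTimestamp") is None:
        return False  # pod is not being deleted at all
    return node_is_unreachable(node)


def stuck_pods(pods, nodes_by_name):
    """Return the names of all pods that are stuck on an unreachable node.
    Pods whose node is missing from nodes_by_name are skipped (assumption)."""
    result = []
    for pod in pods:
        node = nodes_by_name.get(pod.get("spec", {}).get("nodeName"))
        if node is not None and is_stuck_terminating(pod, node):
            result.append(pod["metadata"]["name"])
    return result
```

A controller that finds such pods could then delete them with a zero grace period so that a replacement can be scheduled on a healthy node; whether and how the fix does this is described in the linked pull request, not in this sketch.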

clones

OCPBUGS-32183 OLM catalog pods do not recover from node failure

Closed

depends on

OCPBUGS-32183 OLM catalog pods do not recover from node failure

Closed

links to

openshift/operator-framework-olm#779: OCPBUGS-35305: [release-4.15] catalog-operator: delete catalog pods stuck in Terminating state due to unreachable node

RHBA-2024:4151 OpenShift Container Platform 4.15.z bug fix update

Assignee: Per Goncalves da Silva

Reporter: Daniel Messer

QA Contact: Jian Zhang

Need Info From: None

Votes: 1

Watchers: 6

Created: 2024/06/11 7:38 PM

Updated: 2025/09/13 1:59 PM

Resolved: 2024/07/02 7:33 PM
