OpenShift Bugs / OCPBUGS-43113

SNO Snapshot Controller excessive restarts


    • Important
    • OCPEDGE Sprint 261, OCPEDGE Sprint 263
      [sig-architecture] platform pods in ns/openshift-cluster-storage-operator should not exit an excessive amount of times

      The snapshot controller on SNO restarts excessively while the kube-apiserver operator is progressing; the failures occur because the controller cannot pull volume snapshots from the API server during startup.

      After some investigation, the best approach appears to be adjusting the interval the snapshot controller waits before resuming its operation. We can't rely on health-check or startup probes for this deployment, because the restart mechanism is part of the operand itself; it is not Kubernetes restarting the pod. It may be best to use the --retry-crd-interval-max flag for SNO deployments of the operand, so the controller tolerates the API server being unreachable during rollouts. The operator applies the operand with these args, and the deployment is run through a template processor that we should be able to hook into to update this behavior. (template replace logic)
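      A minimal sketch of the template-replace idea described above, in Go. The placeholder name, flag value, and topology check are all assumptions for illustration; they are not the operator's actual template variables or chosen intervals:

      ```go
      package main

      import (
      	"fmt"
      	"strings"
      )

      // deploymentTemplate is a hypothetical fragment of the snapshot-controller
      // operand manifest; ${RETRY_CRD_INTERVAL_MAX} is an assumed placeholder,
      // not the operator's real template variable.
      const deploymentTemplate = `args:
  - --leader-election=false
  - --retry-crd-interval-max=${RETRY_CRD_INTERVAL_MAX}
`

      // renderTemplate substitutes the retry interval based on topology: on
      // single-node (SNO) clusters, a longer maximum retry interval tolerates
      // the API server being unreachable during rollouts.
      func renderTemplate(singleNode bool) string {
      	interval := "30s" // assumed default, for illustration only
      	if singleNode {
      		interval = "5m" // illustrative SNO value, not a confirmed choice
      	}
      	return strings.ReplaceAll(deploymentTemplate, "${RETRY_CRD_INTERVAL_MAX}", interval)
      }

      func main() {
      	fmt.Print(renderTemplate(true))
      }
      ```

      The point of doing the substitution in the template processor (rather than via probes) is that the flag changes the operand's own retry behavior, which is where the restart loop originates.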

      Note: This error appears to be present in the 4.17 branches as well.

      Ex run: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29183/pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade/1844749579753361408

              sakbas@redhat.com Suleyman Akbas
              ehila@redhat.com Egli Hila
              Neil Hamza
              Votes: 0
              Watchers: 2
