Bug
Resolution: Unresolved
Major
None
4.16.z
Description of problem:
Pods can be evicted for various reasons. Once a Pod is evicted, its owner should recreate it so that it is scheduled again.
With OLM, evicted Pods owned by CatalogSources are not rescheduled. As a result, all Subscriptions report a "ResolutionFailed=True" condition, which hinders upgrades of the operators. Specifically, the customer sees this for the CatalogSource "multicluster-engine-CENSORED_NAME-redhat-operator-index" in the openshift-marketplace namespace (Pod name: "multicluster-engine-CENSORED_NAME-redhat-operator-index-5ng9j").
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.16.21
How reproducible:
Intermittently, when Pods on the cluster are evicted
Steps to Reproduce:
1. Set up an OpenShift Container Platform 4.16 cluster and install various Operators.
2. Create a condition that causes a Node to evict Pods (for example, by inducing DiskPressure on the Node).
3. Observe whether any Pods owned by CatalogSources are evicted.
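Step 3 can be checked with a short script. The sketch below is a hypothetical helper (the function name evicted_catalog_pods is illustrative, not part of OLM or oc). It assumes the input is a PodList parsed from "oc get pods -n openshift-marketplace -o json", that an evicted Pod is left with status.phase "Failed" and status.reason "Evicted", and that a CatalogSource-owned Pod carries an ownerReference of kind "CatalogSource".

```python
from typing import Any, Dict, List


def evicted_catalog_pods(pod_list: Dict[str, Any]) -> List[str]:
    """Return the names of evicted Pods that are owned by a CatalogSource.

    Hypothetical helper: assumes pod_list has the shape of a Kubernetes
    PodList, e.g. the parsed JSON output of
    `oc get pods -n openshift-marketplace -o json`.
    """
    names: List[str] = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Assumption: an evicted Pod is left in phase "Failed" with reason "Evicted".
        if status.get("phase") != "Failed" or status.get("reason") != "Evicted":
            continue
        metadata = pod.get("metadata", {})
        owners = metadata.get("ownerReferences", [])
        # Keep only Pods that list an owner of kind "CatalogSource".
        if any(ref.get("kind") == "CatalogSource" for ref in owners):
            names.append(metadata.get("name", ""))
    return names
```

For example, load the oc output with json.load and pass the resulting dict to the function. A non-empty result that persists over time (no replacement Pod appears for that CatalogSource) matches the actual results described below.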
Actual results:
When Pods owned by CatalogSources are evicted, they are not recreated or rescheduled.
Expected results:
When Pods owned by CatalogSources are evicted, they are recreated and rescheduled.
Additional info:
- Discussion: https://redhat-internal.slack.com/archives/C3VS0LV41/p1726170881413389?thread_ts=1726126461.479019&cid=C3VS0LV41
- Support Case with "must-gather": 04003784
- blocks: OCPBUGS-46598 Evicted Pods owned by Catalogsource are not rescheduled (ON_QA)
- depends on: OCPBUGS-45490 Evicted Pods owned by Catalogsource are not rescheduled (Verified)
- is cloned by: OCPBUGS-46598 Evicted Pods owned by Catalogsource are not rescheduled (ON_QA)
- links to