Description of problem:
Failures while OLM processes a Subscription are hard to debug.
Version-Release number of selected component (if applicable):
How reproducible:
Always: whenever a failure of this type occurs, it is hard to debug.
Steps to Reproduce:
1. Have a slow cluster.
2. Create a CatalogSource, an OperatorGroup and a Subscription to install an operator.
3. Wait for the OLM bundle unpack Job to time out.
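For reference, the three objects from step 2 can be sketched as a single List that kubectl accepts. This is a minimal sketch, not the manifests used in this report: only the namespace stackrox-operator and the CatalogSource name stackrox-operator-test-index come from the conditions quoted further down. The index image, package name, channel and OperatorGroup name are hypothetical placeholders, and the empty OperatorGroup spec (all-namespaces mode) is an assumption.

```python
import json

# From the report: namespace and CatalogSource name.
NAMESPACE = "stackrox-operator"
CATALOG = "stackrox-operator-test-index"
# Hypothetical placeholders, not taken from the report.
INDEX_IMAGE = "example.com/placeholder/index:test"
PACKAGE = "example-operator"
CHANNEL = "stable"


def repro_manifests():
    """Return a v1 List holding the CatalogSource, OperatorGroup and Subscription."""
    catalog_source = {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": CATALOG, "namespace": NAMESPACE},
        "spec": {"sourceType": "grpc", "image": INDEX_IMAGE},
    }
    operator_group = {
        "apiVersion": "operators.coreos.com/v1",
        "kind": "OperatorGroup",
        # Hypothetical name; empty spec is an assumption (all namespaces).
        "metadata": {"name": "example-operator-group", "namespace": NAMESPACE},
        "spec": {},
    }
    subscription = {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": PACKAGE, "namespace": NAMESPACE},
        "spec": {
            "name": PACKAGE,             # package to install from the catalog
            "channel": CHANNEL,
            "source": CATALOG,           # must match the CatalogSource name
            "sourceNamespace": NAMESPACE,
        },
    }
    return {"apiVersion": "v1", "kind": "List",
            "items": [catalog_source, operator_group, subscription]}


if __name__ == "__main__":
    print(json.dumps(repro_manifests(), indent=2))
```

Saved as a script (any name), its output can be piped into kubectl apply -f - once the placeholders are replaced.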
Actual results:
1. The hex-string-named pod is gone, so at this point it is not possible to figure out what it was stuck on for 10 minutes.
2. The Conditions shown by kubectl describe subscription are a mess. They are all lumped together, so it is hard to see which field belongs to which condition, and it is not possible to tell which ones reflect the current state and which ones are stale.
3. It is not possible to extend the Job's deadline, nor to have it retain the pod for inspection.
Expected results:
1. Clear, actionable status information about the cause of the failure, down to the root cause.
2. The ability to tweak things: extend the deadline, retain pods, etc.
Additional info:
In the case I'm facing, the operator index pod already looks healthy 2 minutes after start, and the CatalogSource status agrees.
Conditions on the subscription resource are... unclear. Stale? How can all the sources be healthy and one of them unreachable at the same time?
Conditions:
  Last Transition Time:  2025-07-24T09:29:28Z
  Message:               all available catalogsources are healthy
  Reason:                AllCatalogSourcesHealthy
  Status:                False
  Type:                  CatalogSourcesUnhealthy
  Message:               error using catalogsource stackrox-operator/stackrox-operator-test-index: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 172.30.4.208:50051: connect: connection refused"
  Reason:                ErrorPreventedResolution
  Status:                True
  Type:                  ResolutionFailed
  Reason:                UnpackingInProgress
  Status:                True
  Type:                  BundleUnpacking
  Message:               bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
  Reason:                BundleUnpackFailed
  Status:                True
  Type:                  BundleUnpackFailed
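Ordering the conditions by lastTransitionTime is one way to separate current state from stale leftovers. A minimal sketch, assuming the list comes from status.conditions in the JSON output of kubectl get subscription and that entries carry a lastTransitionTime in the usual RFC 3339 "Z" form; in the paste above only the first condition shows one, so entries without a timestamp are listed last. summarize_conditions is a hypothetical helper, not part of OLM or kubectl.

```python
from datetime import datetime


def summarize_conditions(conditions):
    """Return one line per condition, newest lastTransitionTime first."""

    def transition_time(cond):
        raw = cond.get("lastTransitionTime")
        # Conditions without a timestamp sort as oldest, so they end up last.
        if raw is None:
            return datetime.min
        return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")

    lines = []
    # sorted() is stable, so untimestamped entries keep their original order.
    for cond in sorted(conditions, key=transition_time, reverse=True):
        when = cond.get("lastTransitionTime", "<no timestamp>")
        lines.append(f"{when}  {cond['type']}={cond['status']}  ({cond.get('reason', '')})")
    return lines


# Types, statuses and reasons as quoted in this report.
REPORTED = [
    {"type": "CatalogSourcesUnhealthy", "status": "False",
     "reason": "AllCatalogSourcesHealthy", "lastTransitionTime": "2025-07-24T09:29:28Z"},
    {"type": "ResolutionFailed", "status": "True", "reason": "ErrorPreventedResolution"},
    {"type": "BundleUnpacking", "status": "True", "reason": "UnpackingInProgress"},
    {"type": "BundleUnpackFailed", "status": "True", "reason": "BundleUnpackFailed"},
]

for line in summarize_conditions(REPORTED):
    print(line)
```

Against a live cluster, the input would be json.load(sys.stdin)["status"]["conditions"] with the output of kubectl get subscription <name> -n <namespace> -o json piped in; with real timestamps on every entry, the ones that transitioned long before the latest change are the likely stale ones.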
The Job deleted the pod after 10 minutes:
Type     Reason            Age   From            Message
----     ------            ----  ----            -------
Normal   SuccessfulCreate  14m   job-controller  Created pod: 9ac57f1c7f9b5705fa3ec9f16aed4e3b7ed23d28d5729f0bea6aeb146fqhtmd
Normal   SuccessfulDelete  4m7s  job-controller  Deleted pod: 9ac57f1c7f9b5705fa3ec9f16aed4e3b7ed23d28d5729f0bea6aeb146fqhtmd
Warning  DeadlineExceeded  4m7s  job-controller  Job was active longer than specified deadline
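The event ages are enough to bound how long the pod was alive. A minimal sketch, assuming kubectl prints ages under 10 minutes with seconds (4m7s) but truncates longer ones to whole minutes (14m), so the creation age may be up to 59 s larger than shown. age_seconds and active_window are hypothetical helpers written for this check.

```python
import re


def age_seconds(age):
    """Convert a kubectl-style age such as '14m' or '4m7s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", age)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported age: {age!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def active_window(created_age, deleted_age, created_slack=59):
    """Lower and upper bounds, in seconds, between pod creation and deletion.

    created_slack covers a creation age that kubectl truncated to whole minutes.
    """
    low = age_seconds(created_age) - age_seconds(deleted_age)
    return low, low + created_slack


# Ages from the SuccessfulCreate and SuccessfulDelete events.
print(active_window("14m", "4m7s"))  # (593, 652)
```

The window of 593 to 652 seconds (9m53s to 10m52s) is consistent with the pod being removed when a deadline of roughly 10 minutes expired.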
Slack thread.