Description of problem:
Subscription processing issues are hard to debug
Version-Release number of selected component (if applicable):
How reproducible:
Always. Whenever an issue of this type occurs, it is hard to debug.
Steps to Reproduce:
1. Have a slow cluster.
2. Create a CatalogSource, OperatorGroup and Subscription to install an operator.
3. Wait for the OLM Job to time out.
Actual results:
1. The hex-string-named pod is gone, so at this point it is not possible to figure out what it was stuck on for 10 minutes.
2. The Conditions in kubectl describe subscription are a mess. They are all lumped together, so it is hard to see which field belongs to which condition, and it is not possible to tell which conditions reflect the current state and which are stale.
3. It is not possible to extend the deadline of the job or to have it retain the pod for inspection.
Expected results:
1. Clear, actionable status information about the cause for failure, down to the root cause.
2. Ability to tweak things to extend deadline, retain pods, etc.
Additional info:
In the case I'm facing, the operator index pod looks healthy as early as 2 minutes after start, and the CatalogSource status agrees.

Conditions on the subscription resource are... unclear. Stale? How can the sources be all healthy and one of them unreachable at the same time?

    Conditions:
      Last Transition Time:  2025-07-24T09:29:28Z
      Message:               all available catalogsources are healthy
      Reason:                AllCatalogSourcesHealthy
      Status:                False
      Type:                  CatalogSourcesUnhealthy
      Message:               error using catalogsource stackrox-operator/stackrox-operator-test-index: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 172.30.4.208:50051: connect: connection refused"
      Reason:                ErrorPreventedResolution
      Status:                True
      Type:                  ResolutionFailed
      Reason:                UnpackingInProgress
      Status:                True
      Type:                  BundleUnpacking
      Message:               bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
      Reason:                BundleUnpackFailed
      Status:                True
      Type:                  BundleUnpackFailed

The job decided to remove the pod after 10 minutes:

    Type     Reason            Age   From            Message
    ----     ------            ----  ----            -------
    Normal   SuccessfulCreate  14m   job-controller  Created pod: 9ac57f1c7f9b5705fa3ec9f16aed4e3b7ed23d28d5729f0bea6aeb146fqhtmd
    Normal   SuccessfulDelete  4m7s  job-controller  Deleted pod: 9ac57f1c7f9b5705fa3ec9f16aed4e3b7ed23d28d5729f0bea6aeb146fqhtmd
    Warning  DeadlineExceeded  4m7s  job-controller  Job was active longer than specified deadline
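
As a stopgap for reading the conditions quoted above, it may help to print each condition on its own line so that every field stays next to its type. The following is only a rough sketch, not part of OLM: the helper name summarize_conditions is made up for illustration, and it assumes the Subscription has been fetched as JSON (for example with kubectl get subscription -o json) with conditions under status.conditions using the usual type, status, reason, message and lastTransitionTime keys. The sample data is trimmed from this report, and attributing the single timestamp to the first condition is a guess based on the order of the describe output.

```python
def summarize_conditions(subscription):
    """Return one line per condition so each field stays with its type.

    Hypothetical helper for illustration; not part of OLM or kubectl.
    """
    conditions = subscription.get("status", {}).get("conditions", [])
    lines = []
    for cond in conditions:
        # Missing fields are shown as "-" instead of being silently dropped,
        # which makes it visible when a condition carries no timestamp.
        lines.append(
            "{}={} reason={} since={} message={}".format(
                cond.get("type", "?"),
                cond.get("status", "?"),
                cond.get("reason", "-"),
                cond.get("lastTransitionTime", "-"),
                cond.get("message", "-"),
            )
        )
    return lines


# Sample trimmed from the conditions quoted in this report.
# Assumption: the only timestamp shown belongs to the first condition.
sample = {
    "status": {
        "conditions": [
            {
                "type": "CatalogSourcesUnhealthy",
                "status": "False",
                "reason": "AllCatalogSourcesHealthy",
                "message": "all available catalogsources are healthy",
                "lastTransitionTime": "2025-07-24T09:29:28Z",
            },
            {
                "type": "BundleUnpacking",
                "status": "True",
                "reason": "UnpackingInProgress",
            },
        ]
    }
}

for line in summarize_conditions(sample):
    print(line)
```

With real data, the dictionary passed in would be the parsed JSON of the Subscription rather than the hard-coded sample. A condition printed with since=- has no transition time recorded, so its staleness cannot be judged from the resource alone.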
Slack thread.