-
Bug
-
Resolution: Won't Do
-
Major
-
None
-
3
-
False
-
None
-
False
-
This fix addresses the lack of retries on transient Kubernetes errors during remote resolution of Tasks and Pipelines.
-
Bug Fix
-
-
-
Pipelines Sprint Pioneers 2, Pipelines Sprint Pioneers 3, Pipelines Sprint Pioneers 4, Pipelines Sprint Pioneers 5, Pipelines Sprint Pioneers 6, Pipelines Sprint Pioneers 7, Pipelines Sprint Pioneers 8
-
Important
A few Slack threads exist; the most active is https://redhat-internal.slack.com/archives/C04PZ7H0VA8/p1713252235282299
On both sides of remote resolution (the core controller and the resolver), typically transient Kubernetes errors were being treated as permanent Knative errors, so no further reconcile attempts were made. This led to failures that could have been avoided by retrying.
I've been collaborating with rh-ee-kbaig and sashture from OpenShift Pipelines.
We have upstream PRs https://github.com/tektoncd/pipeline/pull/7894 and https://github.com/tektoncd/pipeline/pull/7893 open for this.
The core server-side logging also does not handle bundle-based Task names correctly. If we can fix that as part of these changes, we will; otherwise, we'll open a separate issue for it.
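One possible shape for the logging fix, sketched under assumptions: for a resolver-based reference the TaskRef `Name` is empty, so a log message built from it prints `""`. The helper below falls back to the resolver params, assuming the bundles resolver's `name` and `bundle` params. `Param`, `TaskRef` and `displayName` are hypothetical, simplified mirrors (Tekton's real `Param` value is a `ParamValue`, not a string), and the bundle reference in `main` is a made-up example value.

```go
package main

import "fmt"

// Param and TaskRef are simplified, hypothetical mirrors of Tekton's types:
// a ref carries either a Name or a Resolver plus Params.
type Param struct {
	Name  string
	Value string
}

type TaskRef struct {
	Name     string
	Resolver string
	Params   []Param
}

// displayName returns a name suitable for log messages. When Name is empty
// (as with a bundle-resolved ref) it falls back to the resolver's "name" and
// "bundle" params instead of printing an empty string.
func displayName(ref TaskRef) string {
	if ref.Name != "" {
		return ref.Name
	}
	var name, bundle string
	for _, p := range ref.Params {
		switch p.Name {
		case "name":
			name = p.Value
		case "bundle":
			bundle = p.Value
		}
	}
	switch {
	case name != "" && bundle != "":
		return fmt.Sprintf("%s (bundle %s)", name, bundle)
	case name != "":
		return name
	case ref.Resolver != "":
		return fmt.Sprintf("<%s-resolved task>", ref.Resolver)
	default:
		return "<unnamed task>"
	}
}

func main() {
	ref := TaskRef{
		Resolver: "bundles",
		Params: []Param{
			{Name: "bundle", Value: "quay.io/example/task-bundle:v1"},
			{Name: "name", Value: "source-build"},
			{Name: "kind", Value: "task"},
		},
	}
	fmt.Println(displayName(ref)) // source-build (bundle quay.io/example/task-bundle:v1)
}
```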
An example log snippet from the core controller:
Pipeline rh-acs-tenant/operator-on-pull-request-bwqxj can't be Run; it contains Tasks that don't exist: Couldn't retrieve Task "": retryable error validating referenced object source-build: Internal error occurred: failed calling webhook "validation.webhook.pipeline.tekton.dev": failed to call webhook: Post "https://tekton-pipelines-webhook.openshift-pipelines.svc:443/resource-validation?timeout=10s": context deadline exceeded
Accompanying log snippet from the resolver:
{{
}}
- is related to
-
SRVKP-4442 need to tune tekton webhooks from TektonConfig
- Closed
- links to