Type: Bug
Resolution: Won't Do
Priority: Major
Fix Version/s: Pipelines 5.0.5, Pipelines 1.16.0
Affects Version/s: None
Component/s: Tekton Pipelines
Labels:
- 1.16.0-rc2
- customer
- konflux
- resolver

Story Points:
3
Blocked:
False
Blocked Reason:
None
Ready:
False
Release Note Text:
This fix addresses the lack of retries on transient Kubernetes errors during remote resolution of tasks and pipelines.
Release Note Type:
Bug Fix
Git Pull Request:
https://github.com/tektoncd/pipeline/pull/7894
Intelligence Requested:
Market:

Sprint:
Pipelines Sprint Pioneers 2, Pipelines Sprint Pioneers 3, Pipelines Sprint Pioneers 4, Pipelines Sprint Pioneers 5, Pipelines Sprint Pioneers 6, Pipelines Sprint Pioneers 7, Pipelines Sprint Pioneers 8
Severity:
Important

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

A few Slack threads exist, but the most active is https://redhat-internal.slack.com/archives/C04PZ7H0VA8/p1713252235282299

On both sides of remote resolution (the core controller and the resolver), Kubernetes errors that are typically transient were treated as permanent Knative errors. As a result, no further reconcile attempts were made, which led to avoidable failures.

I've been collaborating with rh-ee-kbaig and sashture from OpenShift Pipelines.

We have upstream PRs https://github.com/tektoncd/pipeline/pull/7894 and https://github.com/tektoncd/pipeline/pull/7893 up for this.

The core server-side logging also does not report bundle-based task names correctly. If we can fix that as part of these changes, we will; otherwise, we'll open a separate issue for it.
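The controller log below reports `Couldn't retrieve Task ""`, which is consistent with the message reading the reference's name field when a bundle-based task is actually named through resolver parameters. A minimal Go sketch of a logging-friendly fallback, assuming simplified, hypothetical `Param` and `TaskRef` types (Tekton's real types differ; for example, parameter values are not plain strings) and the bundles resolver's `bundle` and `name` parameters. `displayName` is a hypothetical helper, not an existing Tekton function.

```go
package main

// Param and TaskRef are simplified, hypothetical stand-ins for Tekton's
// reference types; only the fields this sketch needs are modeled.
type Param struct {
	Name  string
	Value string
}

type TaskRef struct {
	Name     string  // set for plain in-cluster references
	Resolver string  // e.g. "bundles" for remote resolution
	Params   []Param // resolver parameters
}

// displayName returns a human-readable task name for log messages. For a
// bundles-resolved reference Name is typically empty, so it falls back to the
// "name" parameter and appends the "bundle" parameter when present.
func displayName(ref TaskRef) string {
	if ref.Name != "" {
		return ref.Name
	}
	var name, bundle string
	for _, p := range ref.Params {
		switch p.Name {
		case "name":
			name = p.Value
		case "bundle":
			bundle = p.Value
		}
	}
	switch {
	case name != "" && bundle != "":
		return name + " (bundle " + bundle + ")"
	case name != "":
		return name
	default:
		// Nothing usable: say so explicitly instead of logging "".
		return "<unnamed " + ref.Resolver + " reference>"
	}
}
```

For a bundle reference such as the one behind source-build in the log, a helper like this would print the task name and its bundle instead of an empty string.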

An example log snippet from the core controller:
{{

Pipeline rh-acs-tenant/operator-on-pull-request-bwqxj can't be Run; it contains Tasks that don't exist: Couldn't retrieve Task "": retryable error validating referenced object source-build: Internal error occurred: failed calling webhook "validation.webhook.pipeline.tekton.dev": failed to call webhook: Post "https://tekton-pipelines-webhook.openshift-pipelines.svc:443/resource-validation?timeout=10s": context deadline exceeded

}}

Accompanying log snippet from the resolver:
{{

{"level":"error","ts":"2024-04-17T10:50:05.866Z","logger":"controller","caller":"controller/controller.go:566","msg":"Reconcile error","commit":"f0a1d64","knative.dev/traceid":"b893d6a6-2eb7-4a53-b502-1348803a7085","knative.dev/key":"rh-acs-tenant/bundles-780a1fe396cb0f8c702b34e9289fc770","duration":"10.3628985s","error":"error updating resource request \"rh-acs-tenant/bundles-780a1fe396cb0f8c702b34e9289fc770\" with data: Internal error occurred: failed calling webhook \"webhook.pipeline.tekton.dev\": failed to call webhook: Post \"https://tekton-pipelines-webhook.openshift-pipelines.svc:443/defaulting?timeout=10s\": context deadline exceeded","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/go/src/github.com/tektoncd/pipeline/vendor/knative.dev/pkg/controller/controller.go:491"}

}}

Issue links:
- is related to: SRVKP-4442 need to tune tekton webhooks from TektonConfig (Closed)
- links to: upstream PR

Assignee:: Gabe Montero

Reporter:: Danny Baez

Contributors:: Khurram Baig, Savita .

Votes:: 0

Watchers:: 4

Created:: 2024/04/18 6:41 PM

Updated:: 2024/11/07 4:56 PM

Resolved:: 2024/11/07 4:56 PM
