Bug
Resolution: Done
Major
Transient Kubernetes errors during validation of a PipelineRun's TaskRuns, such as a Kubernetes API timeout, no longer cause the PipelineRun to be marked as failed. The reconciliation is now retried instead.
Bug Fix
Done
Pipelines Sprint Tekshift 28, Pipelines Sprint Tekshift 29
Customer Reported
Description of problem:
If a transient k8s error occurs during most steps of PipelineRun reconciliation, it is handled so that the PipelineRun is not marked as failed. However, this is not the case during dry-run task validation, because the error is not wrapped in several places: (1) (2). While the PipelineRun may get retried in this event, the brief Failed state triggers finalizers such as Tekton Chains to run before the PipelineRun is actually executed.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
With a cloned copy of `tektoncd/pipeline`, apply the following patch:
diff --git i/pkg/reconciler/apiserver/apiserver.go w/pkg/reconciler/apiserver/apiserver.go
index 336774a57..180c280c0 100644
--- i/pkg/reconciler/apiserver/apiserver.go
+++ w/pkg/reconciler/apiserver/apiserver.go
@@ -48,6 +48,8 @@ func DryRunValidate(ctx context.Context, namespace string, obj runtime.Object, t
 		}
 		return mutatedObj, nil
 	case *v1.Task:
+		return nil, handleDryRunCreateErr(apierrors.NewTimeoutError("timeout err", 5), obj.Name)
+
 		dryRunObj := obj.DeepCopy()
 		dryRunObj.Name = dryRunObjName
 		dryRunObj.Namespace = namespace // Make sure the namespace is the same as the TaskRun
Run the following test:
go test -v ./pkg/reconciler/pipelinerun/... -run TestReconcileWithTaskResolver
Actual results:
The tests fail with a PipelineRun status diff indicating that the PipelineRun has been marked as failed:
logger.go:146: 2025-05-02T10:45:02.901-0400 DEBUG TestReconcileWithTaskResolver pipelinerun/reconciler.go:325 Updating status with: v1.PipelineRunStatus{
  	Status: v1.Status{
  		ObservedGeneration: 0,
  		Conditions: v1.Conditions{
- 			{
- 				Type:               "Succeeded",
- 				Status:             "Unknown",
- 				LastTransitionTime: apis.VolatileTime{Inner: s"2025-05-02 10:45:02.88523549 -0400 EDT m=+0.166698213"},
- 				Reason:             "ResolvingTaskRef",
- 				Message:            "PipelineRun default/pr awaiting remote resource",
- 			},
+ 			{
+ 				Type:               "Succeeded",
+ 				Status:             "False",
+ 				LastTransitionTime: apis.VolatileTime{Inner: s"2025-05-02 10:45:02.899741337 -0400 EDT m=+0.181204082"},
+ 				Reason:             "CouldntGetTask",
+ 				Message:            `Pipeline default/pr can't be Run; it contains Tasks that don't exist: Couldn't retrieve Task "resolver type foobar\n": retryable`...,
+ 			},
  		},
Expected results:
The tests should fail because a retryable error is returned, without the PipelineRun being marked as failed:
logger.go:146: 2025-05-02T11:41:39.709-0400 ERROR TestReconcileWithTaskResolver pipelinerun/pipelinerun.go:267 Reconcile error: retryable error validating referenced object foo: Timeout: timeout err
logger.go:146: 2025-05-02T11:41:39.709-0400 ERROR TestReconcileWithTaskResolver pipelinerun/reconciler.go:295 Returned an error {"targetMethod": "ReconcileKind", "error": "1 error occurred:\n\t* retryable error validating referenced object foo: Timeout: timeout err\n\n"}
pipelinerun_test.go:9286: Error reconciling: 1 error occurred:
	* retryable error validating referenced object foo: Timeout: timeout err
Reproducibility (Always/Intermittent/Only Once):
Acceptance criteria:
Definition of Done:
Build Details:
Additional info (Such as Logs, Screenshots, etc):