OpenShift Pipelines / SRVKP-7593

Retryable Errors During TaskRun Dry-Run Validation Cause PLR to Fail

    • 2
    • False
    • False
    • Transient Kubernetes errors during validation of a PipelineRun's TaskRuns, such as a Kubernetes API timeout, no longer result in the PipelineRun being marked as failed. The reconciliation is now retried instead.
    • Bug Fix
    • Done
    • Pipelines Sprint Tekshift 28, Pipelines Sprint Tekshift 29
    • Customer Reported

      Description of problem:

      If a transient k8s error occurs during most steps of PipelineRun reconciliation, it is handled such that the PipelineRun is not marked as failed. However, this is not the case during dry-run task validation, because the error is not wrapped in several places: (1) (2). While the PipelineRun may get retried in this event, the brief Failed state triggers finalizers such as Tekton Chains to run before the PipelineRun is actually executed.
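
      To illustrate why the missing wrapping matters, here is a minimal, standard-library-only sketch of the pattern. It is not the Tekton code: `errRetryable`, `errTimeout`, `errInvalid`, `wrapDryRunErr`, `reconcile`, and the `outcome` values are all hypothetical names chosen for this example. The idea is that a transient error is wrapped with a sentinel the reconciler can recognize, so the run is requeued instead of being marked failed.

```go
package main

import (
	"errors"
	"fmt"
)

// errRetryable is a hypothetical sentinel marking errors that should lead
// to a requeue rather than a failed PipelineRun.
var errRetryable = errors.New("retryable")

// errTimeout stands in for a transient Kubernetes API error (assumption).
var errTimeout = errors.New("Timeout: timeout err")

// errInvalid stands in for a genuine validation failure (assumption).
var errInvalid = errors.New("invalid task spec")

// wrapDryRunErr wraps transient errors with errRetryable and returns all
// other errors unchanged.
func wrapDryRunErr(err error, name string) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, errTimeout) {
		return fmt.Errorf("%w error validating referenced object %s: %v", errRetryable, name, err)
	}
	return err
}

// outcome is what the hypothetical reconciler decides to do with the run.
type outcome string

const (
	proceed    outcome = "proceed"
	requeue    outcome = "requeue"
	markFailed outcome = "markFailed"
)

// reconcile runs the validation step and only marks the run as failed when
// the returned error does not carry the retryable sentinel.
func reconcile(validate func() error) (outcome, error) {
	err := validate()
	switch {
	case err == nil:
		return proceed, nil
	case errors.Is(err, errRetryable):
		return requeue, err
	default:
		return markFailed, err
	}
}

func main() {
	got, err := reconcile(func() error { return wrapDryRunErr(errTimeout, "foo") })
	fmt.Println(got, err)

	got, err = reconcile(func() error { return wrapDryRunErr(errInvalid, "foo") })
	fmt.Println(got, err)
}
```

      In this sketch, if the wrapping in `wrapDryRunErr` is skipped, the timeout falls through to the `default` branch and the run is marked failed, which corresponds to the behavior described in this report.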

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce

      With a cloned copy of `tektoncd/pipeline`, apply the following patch:

       

      diff --git i/pkg/reconciler/apiserver/apiserver.go w/pkg/reconciler/apiserver/apiserver.go
      index 336774a57..180c280c0 100644
      --- i/pkg/reconciler/apiserver/apiserver.go
      +++ w/pkg/reconciler/apiserver/apiserver.go
      @@ -48,6 +48,8 @@ func DryRunValidate(ctx context.Context, namespace string, obj runtime.Object, t
                      }
                      return mutatedObj, nil
              case *v1.Task:
      +               return nil, handleDryRunCreateErr(apierrors.NewTimeoutError("timeout err", 5), obj.Name)
      +
                      dryRunObj := obj.DeepCopy()
                      dryRunObj.Name = dryRunObjName
                      dryRunObj.Namespace = namespace // Make sure the namespace is the same as the TaskRun 

       

       

      Run the following test:

      go test -v ./pkg/reconciler/pipelinerun/... -run TestReconcileWithTaskResolver 

       

      Actual results:

      The tests fail with a PipelineRun YAML diff indicating that the PipelineRun status is failed:

          logger.go:146: 2025-05-02T10:45:02.901-0400 DEBUG   TestReconcileWithTaskResolver   pipelinerun/reconciler.go:325   Updating status with:   v1.PipelineRunStatus{
                      Status: v1.Status{
                              ObservedGeneration: 0,
                              Conditions: v1.Conditions{
              -                       {
              -                               Type:               "Succeeded",
              -                               Status:             "Unknown",
              -                               LastTransitionTime: apis.VolatileTime{Inner: s"2025-05-02 10:45:02.88523549 -0400 EDT m=+0.166698213"},
              -                               Reason:             "ResolvingTaskRef",
              -                               Message:            "PipelineRun default/pr awaiting remote resource",
              -                       },
              +                       {
              +                               Type:               "Succeeded",
              +                               Status:             "False",
              +                               LastTransitionTime: apis.VolatileTime{Inner: s"2025-05-02 10:45:02.899741337 -0400 EDT m=+0.181204082"},
              +                               Reason:             "CouldntGetTask",
              +                               Message:            `Pipeline default/pr can't be Run; it contains Tasks that don't exist: Couldn't retrieve Task "resolver type foobar\n": retryable`...,
              +                       },
                              }, 

      Expected results:

      The tests should fail because a retryable error is returned:

          logger.go:146: 2025-05-02T11:41:39.709-0400 ERROR   TestReconcileWithTaskResolver   pipelinerun/pipelinerun.go:267  Reconcile error: retryable error validating referenced object foo: Timeout: timeout err
          logger.go:146: 2025-05-02T11:41:39.709-0400 ERROR   TestReconcileWithTaskResolver   pipelinerun/reconciler.go:295   Returned an error  {"targetMethod": "ReconcileKind", "error": "1 error occurred:\n\t* retryable error validating referenced object foo: Timeout: timeout err\n\n"}
          pipelinerun_test.go:9286: Error reconciling: 1 error occurred:
                      * retryable error validating referenced object foo: Timeout: timeout err
       

      Reproducibility (Always/Intermittent/Only Once):

      Acceptance criteria: 

       

      Definition of Done:

      Build Details:

      Additional info (Such as Logs, Screenshots, etc):

       

              rh-ee-athorp Andrew Thorp