OpenShift Pipelines / SRVKP-8550

QA Timeout during webhook validation causes taskrun to fail


    • 2
    • Customer Reported

      Description of problem:

      When Pod creation fails because of a timeout during webhook validation, the TaskRun is marked as failed. This is a transient error, and the TaskRun should not be marked as failed when it occurs. Upstream, a new feature flag, enable-wait-exponential-backoff, retries pod creation when a webhook timeout is encountered, but the operator does not currently support this feature flag.

        conditions:
        - lastTransitionTime: "2025-08-20T04:25:45Z"
          message: 'failed to create task run pod "update-graph-on-pull-request-vlv5s-sast-snyk-check":
            Internal error occurred: failed calling webhook "proxy.operator.tekton.dev":
            failed to call webhook: Post "https://tekton-operator-proxy-webhook.openshift-pipelines.svc:443/defaulting?timeout=10s":
            context deadline exceeded. Maybe missing or invalid Task ocp-virt-images-tenant/'
          reason: PodCreationFailed
          status: "False"
          type: Succeeded
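
      To make "transient" concrete, here is a minimal sketch of how the condition message above could be recognized as a webhook timeout rather than a permanent failure. The function name is_webhook_timeout and the choice of marker substrings are assumptions made for illustration; they are taken from the message in this ticket and are not the upstream Tekton implementation, which is written in Go.

```python
# Substrings copied from the PodCreationFailed message in this ticket.
# Treating their combination as the signature of a transient webhook
# timeout is an assumption of this sketch, not Tekton's actual check.
WEBHOOK_CALL_MARKER = "failed calling webhook"
TIMEOUT_MARKER = "context deadline exceeded"


def is_webhook_timeout(message: str) -> bool:
    """Return True if a pod-creation error message looks like a webhook timeout."""
    # The message is wrapped across several YAML lines, so collapse
    # whitespace before matching and compare case-insensitively.
    normalized = " ".join(message.split()).lower()
    return WEBHOOK_CALL_MARKER in normalized and TIMEOUT_MARKER in normalized
```

      A check like this only decides whether a failure is a candidate for retry; any other pod-creation error would still fail the TaskRun as it does today.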
       

      Workaround

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce

       # <steps>

       

      Actual results:

      Expected results:

      Reproducibility (Always/Intermittent/Only Once):

      Acceptance criteria: 

      • When a TaskRun's pod creation fails due to a timeout during webhook validation, the TaskRun is not marked as failed; instead, pod creation is retried
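
      The retry behavior in the acceptance criterion can be sketched as a bounded exponential backoff loop. This is an illustration only: WebhookTimeoutError, create_with_backoff, and the default values for steps, initial_delay, and factor are hypothetical names and numbers chosen for the example, not the parameters used by the upstream enable-wait-exponential-backoff flag.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class WebhookTimeoutError(Exception):
    """Hypothetical error type standing in for a webhook timeout on pod creation."""


def create_with_backoff(
    create: Callable[[], T],
    *,
    steps: int = 5,
    initial_delay: float = 1.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call create() up to `steps` times, retrying only on WebhookTimeoutError."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    delay = initial_delay
    for attempt in range(1, steps + 1):
        try:
            return create()
        except WebhookTimeoutError:
            # Out of attempts: surface the error so the caller can mark
            # the TaskRun as failed.
            if attempt == steps:
                raise
            sleep(delay)
            delay *= factor  # wait longer before each subsequent attempt
    raise AssertionError("unreachable")
```

      Only the timeout error is retried; any other exception propagates immediately, so permanent failures are still reported right away. The sleep parameter is injectable so the loop can be exercised without real waiting.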

      Definition of Done:

      Build Details:

      Additional info (Such as Logs, Screenshots, etc):

      Example log

       

      {
        "level": "error",
        "ts": "2025-08-20T17:20:24.404Z",
        "logger": "webhook-operator",
        "caller": "proxy/proxy.go:265",
        "msg": "Failed the resource specific defaulter",
        "commit": "e118477487a99b1f0b54abe5b01707e2c9cfc80e",
        "knative.dev/kind": "/v1, Kind=Pod",
        "knative.dev/namespace": "rhel-ai-tenant",
        "knative.dev/name": "nvidia-gcp-bootc-on-pull-re3f0784c0b38b38894cb62cf7e3ebb27b-pod",
        "knative.dev/operation": "CREATE",
        "knative.dev/resource": "/v1, Resource=pods",
        "knative.dev/subresource": "",
        "knative.dev/userinfo": "system:serviceaccount:openshift-pipelines:tekton-pipelines-controller",
        "error": "Get \"https://172.30.0.1:443/api/v1/namespaces/rhel-ai-tenant/configmaps/config-service-cabundle\": context canceled",
        "stacktrace": "github.com/tektoncd/operator/pkg/reconciler/proxy.(*reconciler).mutate\n\t/go/src/github.com/tektoncd/operator/pkg/reconciler/proxy/proxy.go:265\ngithub.com/tektoncd/operator/pkg/reconciler/proxy.(*reconciler).Admit\n\t/go/src/github.com/tektoncd/operator/pkg/reconciler/proxy/proxy.go:119\nknative.dev/pkg/webhook.New.admissionHandler.func4\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/webhook/admission.go:123\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2294\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2822\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/webhook/webhook.go:337\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/network/handlers/drain.go:113\nnet/http.serverHandler.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:3301\nnet/http.(*conn).serve\n\t/usr/lib/golang/src/net/http/server.go:2102"
      } 


              jkhelil abdeljawed khelil
              rh-ee-athorp Andrew Thorp
              Sri Vignesh Selvan Sri Vignesh Selvan