  1. OpenShift Pipelines
  2. SRVKP-8377

Timeout during webhook validation causes taskrun to fail


    • 2
    • False
    • False
    • Eliminated proxy webhook timeout issues during high-concurrency load tests by removing synchronous API calls that checked ConfigMap existence during pod admission. The webhook now uses optional ConfigMap volumes that gracefully handle missing CA bundles without blocking pod creation, improving scalability while maintaining all existing SSL certificate functionality.
    • Enhancement
    • Done
    • 2
    • Pipelines Sprint Pioneers 36
    • Customer Reported

      Description of problem:

      When a Pod cannot be created because webhook validation times out, the TaskRun is marked as failed. This is a transient error, so the TaskRun should not be marked as failed when it happens. Upstream has a new feature flag, enable-wait-exponential-backoff, which retries pod creation when a webhook timeout is encountered, but the operator does not currently support this flag.

        conditions:
        - lastTransitionTime: "2025-08-20T04:25:45Z"
          message: 'failed to create task run pod "update-graph-on-pull-request-vlv5s-sast-snyk-check":
            Internal error occurred: failed calling webhook "proxy.operator.tekton.dev":
            failed to call webhook: Post "https://tekton-operator-proxy-webhook.openshift-pipelines.svc:443/defaulting?timeout=10s":
            context deadline exceeded. Maybe missing or invalid Task ocp-virt-images-tenant/'
          reason: PodCreationFailed
          status: "False"
          type: Succeeded
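
To make the "transient error" claim concrete, the following Go sketch classifies a failure message like the one in the condition above. The function name isWebhookTimeout is hypothetical, and plain substring matching is an assumption made for illustration; the upstream Tekton check may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// isWebhookTimeout reports whether a pod-creation failure message looks like
// an admission webhook call that ran out of time. Matching on substrings is a
// simplification for illustration, not the upstream check.
func isWebhookTimeout(msg string) bool {
	calledWebhook := strings.Contains(msg, "failed calling webhook")
	timedOut := strings.Contains(msg, "context deadline exceeded")
	return calledWebhook && timedOut
}

func main() {
	// Message text taken from the TaskRun condition in this issue.
	msg := `Internal error occurred: failed calling webhook "proxy.operator.tekton.dev": ` +
		`failed to call webhook: Post "https://tekton-operator-proxy-webhook.openshift-pipelines.svc:443/defaulting?timeout=10s": ` +
		`context deadline exceeded`
	fmt.Println(isWebhookTimeout(msg))

	// An unrelated failure should not be treated as a webhook timeout.
	fmt.Println(isWebhookTimeout(`pods "example" is forbidden: exceeded quota`))
}
```

Requiring both substrings keeps other webhook failures (for example a rejected request) and other timeouts from being classified as retryable.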
       

      Workaround

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce


      Actual results:

      Expected results:

      Reproducibility (Always/Intermittent/Only Once):

      Acceptance criteria: 

      • When a TaskRun's pod creation fails due to a timeout during webhook validation, the TaskRun is not marked as failed; instead, pod creation is retried.

      Definition of Done:

      Build Details:

      Additional info (Such as Logs, Screenshots, etc):

      Example log

       

      {
        "level": "error",
        "ts": "2025-08-20T17:20:24.404Z",
        "logger": "webhook-operator",
        "caller": "proxy/proxy.go:265",
        "msg": "Failed the resource specific defaulter",
        "commit": "e118477487a99b1f0b54abe5b01707e2c9cfc80e",
        "knative.dev/kind": "/v1, Kind=Pod",
        "knative.dev/namespace": "rhel-ai-tenant",
        "knative.dev/name": "nvidia-gcp-bootc-on-pull-re3f0784c0b38b38894cb62cf7e3ebb27b-pod",
        "knative.dev/operation": "CREATE",
        "knative.dev/resource": "/v1, Resource=pods",
        "knative.dev/subresource": "",
        "knative.dev/userinfo": "system:serviceaccount:openshift-pipelines:tekton-pipelines-controller",
        "error": "Get \"https://172.30.0.1:443/api/v1/namespaces/rhel-ai-tenant/configmaps/config-service-cabundle\": context canceled",
        "stacktrace": "github.com/tektoncd/operator/pkg/reconciler/proxy.(*reconciler).mutate\n\t/go/src/github.com/tektoncd/operator/pkg/reconciler/proxy/proxy.go:265\ngithub.com/tektoncd/operator/pkg/reconciler/proxy.(*reconciler).Admit\n\t/go/src/github.com/tektoncd/operator/pkg/reconciler/proxy/proxy.go:119\nknative.dev/pkg/webhook.New.admissionHandler.func4\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/webhook/admission.go:123\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2294\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2822\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/webhook/webhook.go:337\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/network/handlers/drain.go:113\nnet/http.serverHandler.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:3301\nnet/http.(*conn).serve\n\t/usr/lib/golang/src/net/http/server.go:2102"
      } 
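
The log above shows the webhook issuing a GET for the config-service-cabundle ConfigMap while admitting the Pod. The release note for this issue describes replacing that lookup with optional ConfigMap volumes. A minimal Go sketch of that idea follows; it uses local stand-in types that mirror the Kubernetes name, configMap, and optional fields so that it runs without dependencies (real code would use k8s.io/api/core/v1). The helper optionalCABundleVolume and the volume name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins for the Kubernetes Volume and ConfigMapVolumeSource types,
// trimmed to the fields used here. Real code would use k8s.io/api/core/v1.
type configMapVolumeSource struct {
	Name     string `json:"name"`
	Optional *bool  `json:"optional,omitempty"`
}

type volume struct {
	Name      string                 `json:"name"`
	ConfigMap *configMapVolumeSource `json:"configMap,omitempty"`
}

// optionalCABundleVolume builds a volume that references a ConfigMap without
// requiring it to exist, so no API lookup is needed at admission time.
func optionalCABundleVolume(volumeName, configMapName string) volume {
	optional := true
	return volume{
		Name:      volumeName,
		ConfigMap: &configMapVolumeSource{Name: configMapName, Optional: &optional},
	}
}

func main() {
	// The volume name is made up; the ConfigMap name comes from the log above.
	v := optionalCABundleVolume("service-ca-bundle", "config-service-cabundle")
	out, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With optional set to true, Kubernetes starts the Pod even when the referenced ConfigMap is absent, which is why the webhook no longer has to check for it synchronously.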


              jkhelil abdeljawed khelil
              rh-ee-athorp Andrew Thorp