Bug
Resolution: Done
Major
Pipelines 1.19.3
Enhancement
Pipelines Sprint Pioneers 36
Customer Reported
Description of problem:
When a Pod fails to be created because webhook validation times out, the TaskRun is marked as failed. This is a transient error, so the TaskRun should not be marked as failed when it occurs. Upstream has a new feature flag, enable-wait-exponential-backoff, which retries pod creation when a webhook timeout is encountered, but the operator does not currently support this flag.
conditions:
- lastTransitionTime: "2025-08-20T04:25:45Z"
  message: 'failed to create task run pod "update-graph-on-pull-request-vlv5s-sast-snyk-check": Internal error occurred: failed calling webhook "proxy.operator.tekton.dev": failed to call webhook: Post "https://tekton-operator-proxy-webhook.openshift-pipelines.svc:443/defaulting?timeout=10s": context deadline exceeded. Maybe missing or invalid Task ocp-virt-images-tenant/'
  reason: PodCreationFailed
  status: "False"
  type: Succeeded
Workaround
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
# <steps>
Actual results:
Expected results:
Reproducibility (Always/Intermittent/Only Once):
Acceptance criteria:
- When a TaskRun's pod creation fails due to a timeout during webhook validation, the TaskRun is not marked as failed; instead, pod creation is retried
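The acceptance criterion above can be illustrated with a minimal Go sketch of "retry pod creation with exponential backoff, but only for webhook timeouts". This is not the upstream Tekton implementation: the names isWebhookTimeout, createWithBackoff and flakyCreate are hypothetical, and classifying the error by matching the message text from the condition in this ticket is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// isWebhookTimeout reports whether err looks like the transient admission
// webhook timeout described in this ticket. Matching on the message text is
// an assumption for this sketch, not how Tekton is known to classify errors.
func isWebhookTimeout(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "failed calling webhook") &&
		strings.Contains(msg, "context deadline exceeded")
}

// createWithBackoff calls create at most steps times (at least once).
// It retries only on webhook timeouts, multiplying delay by factor after
// each failed attempt. Any other error is returned immediately.
// It returns the number of attempts made and the last error.
func createWithBackoff(create func() error, steps int, delay time.Duration, factor float64) (int, error) {
	if steps < 1 {
		steps = 1
	}
	var err error
	for attempt := 1; attempt <= steps; attempt++ {
		err = create()
		if err == nil || !isWebhookTimeout(err) {
			return attempt, err
		}
		if attempt < steps {
			time.Sleep(delay)
			delay = time.Duration(float64(delay) * factor)
		}
	}
	return steps, err
}

// flakyCreate returns a fake pod-creation function that fails with a
// webhook timeout the first `failures` times it is called, then succeeds.
func flakyCreate(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New(`Internal error occurred: failed calling webhook "proxy.operator.tekton.dev": failed to call webhook: context deadline exceeded`)
		}
		return nil
	}
}

func main() {
	// Two simulated timeouts, then success on a later attempt.
	attempts, err := createWithBackoff(flakyCreate(2), 5, 10*time.Millisecond, 2.0)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

In this sketch the TaskRun would be marked as failed only if createWithBackoff still returns an error, that is, after the retries are exhausted or when the error is not a webhook timeout.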
Definition of Done:
Build Details:
Additional info (Such as Logs, Screenshots, etc):
Example log
{ "level": "error", "ts": "2025-08-20T17:20:24.404Z", "logger": "webhook-operator", "caller": "proxy/proxy.go:265", "msg": "Failed the resource specific defaulter", "commit": "e118477487a99b1f0b54abe5b01707e2c9cfc80e", "knative.dev/kind": "/v1, Kind=Pod", "knative.dev/namespace": "rhel-ai-tenant", "knative.dev/name": "nvidia-gcp-bootc-on-pull-re3f0784c0b38b38894cb62cf7e3ebb27b-pod", "knative.dev/operation": "CREATE", "knative.dev/resource": "/v1, Resource=pods", "knative.dev/subresource": "", "knative.dev/userinfo": "system:serviceaccount:openshift-pipelines:tekton-pipelines-controller", "error": "Get \"https://172.30.0.1:443/api/v1/namespaces/rhel-ai-tenant/configmaps/config-service-cabundle\": context canceled", "stacktrace": "github.com/tektoncd/operator/pkg/reconciler/proxy.(*reconciler).mutate\n\t/go/src/github.com/tektoncd/operator/pkg/reconciler/proxy/proxy.go:265\ngithub.com/tektoncd/operator/pkg/reconciler/proxy.(*reconciler).Admit\n\t/go/src/github.com/tektoncd/operator/pkg/reconciler/proxy/proxy.go:119\nknative.dev/pkg/webhook.New.admissionHandler.func4\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/webhook/admission.go:123\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2294\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2822\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/webhook/webhook.go:337\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\t/go/src/github.com/tektoncd/operator/vendor/knative.dev/pkg/network/handlers/drain.go:113\nnet/http.serverHandler.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:3301\nnet/http.(*conn).serve\n\t/usr/lib/golang/src/net/http/server.go:2102" }
is cloned by: SRVKP-8550 QA Timeout during webhook validation causes taskrun to fail (To Do)