Bug
Resolution: Unresolved
Bug Fix
Done
Pipelines Sprint CrookShank 43
Problem Statement: When the Pipelines-as-Code (PaC) controller successfully creates a PipelineRun, it immediately attempts to patch that PipelineRun (e.g., to add labels or annotations). If this subsequent patch fails because of a transient Kubernetes API server issue (e.g., a temporary network error, API server unavailability, or an admission webhook failure), the controller incorrectly treats the patch failure as a fatal CI failure.
This causes PaC to immediately post a failed check run (e.g., on GitHub) or a failed commit status (e.g., on GitLab, as seen in the "Konflux Production Internal" check).
This behavior is incorrect. A failure to patch metadata is a controller-side problem and should not be reported to the user as a failure of their CI job, especially since the PipelineRun object was created successfully and is likely already running (or about to run) in the cluster.
Steps to Reproduce
- A user triggers a PipelineRun via a pull request or push event.
- The PaC controller successfully creates the PipelineRun resource in the cluster.
- The PaC controller immediately attempts to patch the newly created PipelineRun to add metadata (e.g., pipelinesascode.tekton.dev/state: "started").
- This patch call fails for a transient reason (e.g., a momentary API server disconnection, a webhook timeout, or another server-side Kubernetes issue).
- Observe the commit status on the Git provider.
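The reproduction steps above can be simulated in a unit test with a fake patch client that fails transiently. A minimal sketch, assuming a hypothetical `flakyPatcher` test double and a `buggyStart` function that mirrors the reported behavior (any post-create error becomes a failed check run); neither name comes from the PaC codebase, and `"in_progress"`/`"failure"` are used only as illustrative status strings.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient stands in for a momentary API server or webhook failure.
var errTransient = errors.New("transient: connection reset by peer")

// flakyPatcher is a hypothetical test double: it fails the first failuresLeft
// Patch calls, then succeeds, mimicking a short-lived API server outage.
type flakyPatcher struct {
	failuresLeft int
	calls        int
}

func (p *flakyPatcher) Patch(name string, metadata map[string]string) error {
	p.calls++
	if p.failuresLeft > 0 {
		p.failuresLeft--
		return errTransient
	}
	return nil
}

// buggyStart mirrors the reported behavior: the PipelineRun was already
// created, yet a patch error is turned into a failed check run.
func buggyStart(p *flakyPatcher, post func(status string)) {
	meta := map[string]string{"pipelinesascode.tekton.dev/state": "started"}
	if err := p.Patch("pr-example", meta); err != nil {
		post("failure") // the false failure described in "Actual Result"
		return
	}
	post("in_progress")
}

func main() {
	var posted []string
	p := &flakyPatcher{failuresLeft: 1}
	buggyStart(p, func(s string) { posted = append(posted, s) })
	fmt.Println(posted) // [failure]
}
```

With a single transient failure, the buggy path posts `failure` even though a retry would have succeeded.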
Actual Result
- A failed check run is created on the Git provider for the commit.
- This gives the user a "false negative," making them believe their PipelineRun or code is broken.
Expected Result
- The controller should log the patch failure as an error (e.g., level=error msg="failed to patch pipelinerun XYZ: ...").
- The controller should re-enqueue the PipelineRun and retry the patch operation according to its standard reconciliation loop.
- No failed check run should be created. The PipelineRun should be allowed to run, and its actual outcome (success or failure) should be the only thing reported as a check run.
Out of Scope
- This ticket is not to fix the underlying Kubernetes API server issues that may be causing the patch to fail. The fix is to make the PaC controller resilient to those failures.
- This ticket does not involve changing the controller's retry or backoff logic, only ensuring that a patch failure correctly uses the existing retry mechanism instead of being treated as a fatal error.
- This ticket does not change the logic for how PaC reports PipelineRuns that run and then genuinely fail (e.g., a test step fails). That remains unchanged.
Acceptance Criteria
- When a PipelineRun is successfully created, a subsequent failure to patch it must not result in a "failed" check run being sent to the Git provider.
- The final check run status (e.g., "success" or "failure") reported to the Git provider must reflect the actual terminal state of the PipelineRun itself, not the transient patch error.
- Creating a "failed" check run is still the correct behavior if the PipelineRun fails to be created in the first place (e.g., a validation error). This functionality must not be broken.
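The acceptance criteria can be written down as a table-driven check before a real test is wired up. A rough outline, assuming a hypothetical `scenario` model and `finalStatus` function that encode the criteria themselves rather than any existing PaC function; a real test would exercise the controller with a fake client instead.

```go
package main

import "fmt"

// scenario is a hypothetical model of one acceptance-criteria case.
type scenario struct {
	name         string
	createFails  bool
	patchFails   bool // transient patch error after a successful create
	runSucceeded bool // terminal state of the PipelineRun, if it was created
	wantFinal    string
}

// finalStatus encodes the criteria: a creation failure is reported as a
// failure; otherwise only the PipelineRun's terminal state matters.
// patchFails is deliberately never consulted.
func finalStatus(s scenario) string {
	if s.createFails {
		return "failure"
	}
	if s.runSucceeded {
		return "success"
	}
	return "failure"
}

var cases = []scenario{
	{name: "create fails", createFails: true, wantFinal: "failure"},
	{name: "patch fails, run succeeds", patchFails: true, runSucceeded: true, wantFinal: "success"},
	{name: "patch fails, run fails", patchFails: true, runSucceeded: false, wantFinal: "failure"},
	{name: "patch ok, run succeeds", runSucceeded: true, wantFinal: "success"},
}

func main() {
	for _, c := range cases {
		got := finalStatus(c)
		fmt.Printf("%-26s got=%-7s want=%s\n", c.name, got, c.wantFinal)
	}
}
```

Each row corresponds to one criterion; the two "patch fails" rows check that a transient patch error never changes the reported outcome.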