Story
Resolution: Unresolved
Story (Required)
As a developer using Pipelines-as-Code (PaC) to monitor my repository for events (such as webhooks) without necessarily running a pipeline on every event (such as pull requests), I want PaC to handle underlying Kubernetes API errors gracefully, so that it does not create a blocking (failed) status check on my pull request when no PipelineRun was supposed to execute for that event.
This story addresses an issue where Pipelines-as-Code (PaC) encounters an internal error while interacting with the Kubernetes API (e.g., trying to list Repository CRs) upon receiving a webhook event (like a pull request). Currently, PaC reports this internal error as a "failed" status check on the source control platform (e.g., GitHub), even if no PipelineRun was configured to run for that specific event type (like pull requests). This incorrectly blocks pull requests due to platform-level issues unrelated to the PR's code or configured pipelines. This change aims to improve the user experience by preventing PaC from blocking workflows due to internal processing errors when no user-defined pipeline was matched.
Background (Required)
A user reported an issue with the `terraform-redhat/terraform-provider-rhcs` repository integrated with Pipelines-as-Code.
- PaC is receiving webhook events correctly from this repository.
- The repository's PaC configuration (`Repository` CR) does not define any PipelineRuns (PLRs) triggered by pull request events.
- Despite having no matching PLRs for pull requests, PaC is creating a blocking (failed) status check on all pull requests in this repository.
- The error message associated with the failed status check indicates a Kubernetes API communication failure: `There was an issue validating the commit: "Get \"https://172.30.0.1:443/apis/pipelinesascode.tekton.dev/v1alpha1/repositories\": http2: client connection lost"`
- This occurs before PaC attempts to find or trigger a matching PipelineRun.
- Current PaC behavior assumes a working Kubernetes API and interprets such communication errors as failures relevant to the specific commit/PR, reporting them back to the SCM.
- Suggestion from discussion (chmouel): PaC could potentially differentiate K8s infrastructure errors from user/configuration errors and report them differently (e.g., as "skipped") rather than "failed", especially when no specific PipelineRun was matched.
Out of scope
- Debugging or fixing the underlying Kubernetes API connectivity issue (`http2: client connection lost` error accessing the API server at `172.30.0.1`). This is assumed to be a platform/infrastructure problem separate from PaC's logic.
- Adding or modifying PipelineRun definitions for the affected repository.
- Changing the webhook receiving mechanism itself.
Approach (Required)
1. Modify the PaC controller logic that handles incoming SCM events (e.g., pull requests).
2. Before creating a status check related to initial processing (like fetching Repository CRs or evaluating policies), introduce error handling that specifically identifies potential Kubernetes API communication errors (e.g., connection issues, network errors, API server unavailability). This might involve checking for specific error types or patterns, potentially using Go's `errors.Is` or `errors.As` for better categorization.
3. If such a K8s API error occurs and no specific PipelineRun has been matched yet for the triggering event:
   - PaC should not create a "failed" or "error" status check that is blocking by default.
   - Consider creating a "skipped" status check (as suggested in the discussion) or potentially no status check at all for this phase.
   - The internal K8s API error should still be logged clearly within the PaC controller logs for platform administrators to debug.
4. If a K8s API error occurs during the execution or monitoring of a matched PipelineRun, the existing error reporting mechanism might still be appropriate (TBD: needs confirmation whether this behaviour should also change). The primary focus here is errors occurring before PLR execution, when no PLR has been matched for the event.
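The classification in step 2 could be sketched roughly as follows. This is a minimal illustration using only the Go standard library, not PaC's actual code: `isInfraError`, `simulatedLostConnection`, and `errInvalidConfig` are hypothetical names introduced here. It assumes the `http2: client connection lost` error is not available as a matchable sentinel, so it falls back to substring matching for that case. A real implementation could additionally consult the helpers in `k8s.io/apimachinery/pkg/api/errors` (for example `IsServerTimeout` or `IsServiceUnavailable`) for errors returned by the API server itself.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// errInvalidConfig is a hypothetical user/configuration error, used only
// to contrast with infrastructure errors.
var errInvalidConfig = errors.New("invalid .tekton configuration")

// isInfraError reports whether err looks like a Kubernetes API
// communication problem rather than a user or configuration error.
func isInfraError(err error) bool {
	if err == nil {
		return false
	}
	// The request deadline expired while waiting on the API server.
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	// DNS resolution of the API server failed.
	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		return true
	}
	// Any network-level error that reports itself as a timeout.
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true
	}
	// Assumption: no sentinel is available for this error, so match on text.
	return strings.Contains(err.Error(), "http2: client connection lost")
}

// simulatedLostConnection builds an error shaped like the one in the report.
func simulatedLostConnection() error {
	return &url.Error{
		Op:  "Get",
		URL: "https://172.30.0.1:443/apis/pipelinesascode.tekton.dev/v1alpha1/repositories",
		Err: errors.New("http2: client connection lost"),
	}
}

func main() {
	// Wrapping with %w keeps the cause reachable for errors.Is and errors.As.
	wrapped := fmt.Errorf("listing repositories: %w", simulatedLostConnection())
	fmt.Println(isInfraError(wrapped))          // true
	fmt.Println(isInfraError(errInvalidConfig)) // false
}
```

Substring matching is brittle, so it is kept as the last resort after the typed checks. Whichever checks are used, the original error should still be written to the controller logs unchanged.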
Dependencies
- Depends on the core PaC controller logic.
- Relies on Go's error handling capabilities.
Acceptance Criteria (Mandatory)
- *AC1:* Given a repository configured in PaC.
  - And the PaC configuration for this repository does not include any PipelineRuns triggered by pull request events.
  - When a pull request event is received for this repository.
  - And the PaC controller encounters a Kubernetes API communication error (e.g., `http2: client connection lost`, timeout, DNS error) while performing initial checks (like fetching `Repository` CRs) before looking for matching PLRs.
  - Then PaC must not create a blocking status check (e.g., "failure" or "error") on the pull request in the SCM (e.g., GitHub).
- *AC2:* Given the same scenario as AC1.
  - Then the internal Kubernetes API communication error must be logged appropriately in the PaC controller logs.
- *AC3:* (Optional/Alternative to AC1's negative requirement) Given the same scenario as AC1.
  - Then PaC may create a non-blocking status check (e.g., "skipped" or a neutral status) on the pull request, indicating an internal issue occurred but did not pertain to a specific pipeline execution failure.
- *AC4:* Given a repository configured in PaC with a PipelineRun defined for pull request events.
  - When a pull request event is received.
  - And PaC encounters a K8s API error before matching the PLR (as in AC1).
  - Then the behavior defined in AC1/AC3 applies (no blocking status check yet).
- *AC5:* Given a repository configured in PaC with a PipelineRun defined for pull request events.
  - When a pull request event triggers a PipelineRun.
  - And a K8s API error occurs during the process of creating or monitoring the Tekton PipelineRun object itself.
  - Then PaC should still report a relevant failure status check for that PipelineRun execution (current behavior likely remains correct here, subject to review).
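The criteria above reduce to a small decision table, sketched below so it can double as a table-driven test fixture. All names (`phase`, `conclusion`, `decideConclusion`) are hypothetical and do not come from the PaC code base. The sketch picks the AC3 option ("skipped") for pre-match infrastructure errors, and it assumes that pre-match errors which are not infrastructure errors keep today's failing behaviour, which the story does not state explicitly.

```go
package main

import "fmt"

// conclusion is the status PaC would report to the SCM.
type conclusion string

const (
	conclusionSkipped conclusion = "skipped" // non-blocking (AC1, AC3, AC4)
	conclusionFailure conclusion = "failure" // blocking (AC5, user errors)
)

// phase says where in event processing the error occurred.
type phase int

const (
	phasePreMatch  phase = iota // validating the event, fetching Repository CRs
	phaseExecution              // creating or monitoring a matched PipelineRun
)

// decideConclusion maps an error's classification and phase to a status.
func decideConclusion(infraErr bool, p phase) conclusion {
	// AC5: once a PipelineRun has been matched, keep reporting failures.
	if p == phaseExecution {
		return conclusionFailure
	}
	// AC1/AC3/AC4: infrastructure errors before matching must not block.
	if infraErr {
		return conclusionSkipped
	}
	// Assumption: other pre-match errors keep the current failing behaviour.
	return conclusionFailure
}

func main() {
	cases := []struct {
		name     string
		infraErr bool
		p        phase
	}{
		{"AC1/AC4: API error before matching", true, phasePreMatch},
		{"AC5: API error during PLR execution", true, phaseExecution},
		{"user error before matching", false, phasePreMatch},
	}
	for _, c := range cases {
		fmt.Printf("%s -> %q\n", c.name, decideConclusion(c.infraErr, c.p))
	}
}
```

AC1 and AC4 share a row because the error occurs before matching, so the outcome does not depend on whether the repository defines a PipelineRun for pull requests. AC2 (logging) is independent of this table and applies in every row.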
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Legend
Unknown
Verified
Unsatisfied
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) is able to proceed with new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met
- is duplicated by: SRVKP-7505 PaC creates failed Pull Request check on errors during event validation (Closed)