OpenShift Pipelines / SRVKP-9044

[Konflux/PaC] GitLab Pipeline Status is Incorrect/Stale After KubeAPI 'PipelineRun Already Exists' Error


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Pipelines as Code

      Slack Thread: https://redhat-internal.slack.com/archives/C04PZ7H0VA8/p1759167562126729

      As a Konflux tenant user checking the status of my Merge Request pipeline, I want the GitLab pipeline status and link to accurately reflect the latest PipelineRun's status and URL, even when the initial run fails due to a platform issue.

      As a Konflux user, when a Merge Request triggers a pipeline, I want the status reported back to GitLab to be accurate. Currently, when the initial PipelineRun creation fails with a KubeAPI 'already exists' error, the GitLab pipeline job keeps pointing at the failed/stale PipelineRun, so both the reported status and the link it redirects to are incorrect.

      Background (Required)

      Users (specifically in the rhel-ai-tenant) have reported that when a PipelineRun is triggered for a Merge Request, the GitLab pipeline status shows a generic failure like "Konflux Production Internal" instead of the actual triggered pipeline status.

      Analysis of logs (Splunk) confirmed a two-part issue:

      1. A KubeAPI error occurs during the initial PipelineRun creation, logging the error: pipelineruns.tekton.dev "..." already exists, the server was not able to generate a unique name for the object. This is an underlying platform issue (suspected to be related to a pending OCP bug fix release).
      2. Following this creation failure, the PAC (Pipelines as Code) component does not properly update the GitLab status/tracking. The GitLab pipeline job retains the status and URL of the failed/stale PipelineRun and does not track the new attempt, which results in a stale/incorrect status and redirection.
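The first part of the failure can be recognized from the logged message itself. The following is a minimal sketch, assuming the error reaches the handler as a plain string; the function name and the regular expression are illustrative and are not taken from PAC.

```python
import re

# Matches the message quoted above. The resource name inside the quotes is
# elided in the report, so the pattern accepts any quoted name.
_COLLISION = re.compile(
    r'pipelineruns\.tekton\.dev "[^"]*" already exists, '
    r"the server was not able to generate a unique name for the object"
)


def is_name_collision(message: str) -> bool:
    """Return True when the message is the generated-name collision error."""
    return _COLLISION.search(message) is not None
```

Matching on message text is brittle; where the structured API error is available, checking its reason would likely be the more robust option.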

      Out of scope

      • Fixing the underlying KubeAPI race condition/issue that causes the "already exists" error for the PipelineRun object. (This is a platform/OCP issue).
      • Investigating the high GitLab API rate limit consumption (unless it is directly identified as the root cause for PAC failing to update status).

      Approach (Required)

      1. Investigate PAC's behavior upon receiving a PipelineRun creation failure (specifically the already exists error).
      2. Ensure PAC handles the failure gracefully:
        • Implement logic to correctly identify the current/newest PipelineRun attempt and update the GitLab Pipeline status and redirection URL to point to that PipelineRun, or to a meaningful error page/log.
      3. Enhance error visibility:
        • Check whether this specific KubeAPI failure can be displayed explicitly in the Konflux/GitLab status instead of a generic "Konflux Production Internal" failure.
      4. If necessary, look into mechanisms to ensure the failed PipelineRun is immediately cleaned up/cancelled by PAC.
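One possible shape for steps 2 and 3 is sketched below. It is an illustration under stated assumptions, not PAC's implementation: `NameCollisionError`, `StatusUpdate`, `create_and_report`, and the two callables are hypothetical names, and the retry limit of three is arbitrary. The fields of `StatusUpdate` loosely mirror the state, target URL, and description that a GitLab commit status carries.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class NameCollisionError(Exception):
    """Hypothetical stand-in for the KubeAPI 'already exists' failure."""


@dataclass
class StatusUpdate:
    state: str                  # e.g. "running" or "failed"
    target_url: Optional[str]   # link shown next to the GitLab status
    description: str            # human-readable detail


def create_and_report(
    create: Callable[[], str],
    run_url: Callable[[str], str],
    max_attempts: int = 3,
) -> StatusUpdate:
    """Try to create a PipelineRun and build the status for the newest attempt."""
    last_error: Optional[NameCollisionError] = None
    for attempt in range(1, max_attempts + 1):
        try:
            name = create()
        except NameCollisionError as err:
            # Remember the failure and try again with a fresh generated name.
            last_error = err
            continue
        # Success: the status points at the run that actually exists.
        return StatusUpdate(
            "running", run_url(name), f"PipelineRun {name} started (attempt {attempt})"
        )
    # Permanent failure: surface the specific error instead of a generic one.
    return StatusUpdate(
        "failed",
        None,
        f"PipelineRun creation failed after {max_attempts} attempts: {last_error}",
    )
```

The point of the design is that the status is always derived from the most recent attempt, so a stale URL from an earlier failure never reaches GitLab.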

      Dependencies

      • Potential dependency on an OCP bug fix related to the KubeAPI 'already exists' issue, but the PAC fix should proceed to handle the symptom (incorrect GitLab status and redirection) regardless of the OCP fix timeline.
      • Requires input/clarification from PAC SMEs on how cancelled/new PipelineRuns are tracked and how the GitLab job redirects to the Konflux UI PR.

      Acceptance Criteria (Mandatory)

      • Given a Merge Request triggers a Konflux pipeline.
      • And the initial PipelineRun creation attempt fails due to the KubeAPI error pipelineruns.tekton.dev "..." already exists.
      • When the process attempts to re-run or recover.
      • Then the GitLab Pipeline status for the MR must be updated to reflect the status of the latest/active PipelineRun attempt.
      • And the Konflux PipelineRun link in GitLab must redirect to the correct and active PipelineRun details, or to a clear error/log if the process has permanently failed.
      • And the GitLab status should display a more informative error message than a generic "Konflux Production Internal" failure when the KubeAPI failure occurs.
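The criteria above could be exercised against a small in-memory model before wiring anything to GitLab. The sketch below is hypothetical throughout: `Attempt`, `MergeRequestStatus`, the commit SHA, and the URL are invented for illustration. It shows one way to make sure a late update from a stale attempt cannot overwrite the status of the newest one.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Attempt:
    number: int            # 1 for the initial attempt, 2 for the first retry, ...
    state: str
    url: Optional[str]


class MergeRequestStatus:
    """Keeps only the newest attempt per commit SHA (hypothetical model)."""

    def __init__(self) -> None:
        self._latest: Dict[str, Attempt] = {}

    def record(self, sha: str, attempt: Attempt) -> None:
        current = self._latest.get(sha)
        # Ignore updates from an attempt older than the one already tracked.
        if current is None or attempt.number >= current.number:
            self._latest[sha] = attempt

    def reported(self, sha: str) -> Optional[Attempt]:
        return self._latest.get(sha)


# Given an MR whose initial attempt failed with the "already exists" error
status = MergeRequestStatus()
status.record("abc123", Attempt(1, "failed", None))
# When a recovery attempt starts
status.record("abc123", Attempt(2, "running", "https://konflux.example.invalid/pipelinerun/2"))
# And a late update for the stale attempt arrives
status.record("abc123", Attempt(1, "failed", None))
# Then the reported status is that of the latest attempt
assert status.reported("abc123").number == 2
```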

              rh-ee-zashaikh Zaki Shaikh
              rh-ee-anataraj Anitha Natarajan