OpenShift Pipelines / SRVKP-10092

E2E test TestGithubPushRequestGitOpsCommentCancel intermittently fails due to race condition

      Description of problem:

      The E2E test TestGithubPushRequestGitOpsCommentCancel, which covers the GitHub push/cancel flow, failed intermittently because of timing-dependent race conditions.

      Root Cause:

      1. Repository status was checked after CR was deleted during test teardown
      2. Overly broad regex pattern could match incorrect PipelineRuns
      3. Race condition when PipelineRun completed before cancel comment was processed

      Workaround: Already fixed in commit cfdd3428e

      Prerequisites (if any, like setup, operators/versions):

      • E2E test environment with GitHub integration
      • Pipelines-as-Code controller running
      • GitHub App configured for E2E tests

      Steps to Reproduce

      1. Run make test-e2e TEST_ARGS="-run TestGithubPushRequestGitOpsCommentCancel"
      2. Observe intermittent failures
      3. Error message: "neither a cancelled pipelinerun in repo status or a request to skip the cancellation in the controller log was found"

      Actual results:

      Test fails intermittently with error about not finding cancelled PipelineRun status or skip message in logs. Failure rate depends on timing of PipelineRun completion vs cancellation comment processing.

      Expected results:

      Test should pass consistently regardless of race condition timing, properly handling both scenarios:

      • PipelineRun gets cancelled successfully
      • PipelineRun completes before cancel is processed (skip message logged)
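
The two acceptable outcomes can be sketched as a single check. This is a minimal illustration, not the code from the actual test: the function name validateCancel and its two boolean inputs are hypothetical, and only the error text is taken from this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// validateCancel is a hypothetical helper: it accepts either outcome of the
// race and fails only when neither was observed.
//   - cancelledInStatus: the Repository status shows the run as cancelled
//   - skipLogged: the controller logged that cancellation was skipped
func validateCancel(cancelledInStatus, skipLogged bool) error {
	if cancelledInStatus || skipLogged {
		return nil
	}
	return errors.New("neither a cancelled pipelinerun in repo status or a request to skip the cancellation in the controller log was found")
}

func main() {
	fmt.Println(validateCancel(true, false) == nil)  // run was cancelled in time
	fmt.Println(validateCancel(false, true) == nil)  // run finished first, skip logged
	fmt.Println(validateCancel(false, false) == nil) // neither observed: test fails
}
```

Either input alone is enough to pass, so the result no longer depends on which side of the race wins.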

      Reproducibility (Always/Intermittent/Only Once):

      Intermittent - Depends on race condition timing

      Acceptance criteria:

      • Test waits for specific PipelineRun (not generic match)
      • Repository status verified before test teardown
      • Regex pattern specific to actual PipelineRun name/namespace
      • Test handles both fast and normal cancellation scenarios
      • Enhanced logging for debugging

      Definition of Done:

      ✓ Fix committed in cfdd3428e
      ✓ Test now tracks specific PipelineRun via annotations
      ✓ Uses UntilPipelineRunHasReason for precise waiting
      ✓ Regex pattern includes actual namespace/PR name
      ✓ Repository status checked before cleanup
      ✓ Logging enhanced with original PipelineRun names

      Build Details:

      • Branch: investigaste-e2e-failure
      • Commit: cfdd3428e
      • Files modified:
        • test/github_push_retest_test.go (+56/-26 lines)
        • pkg/pipelineascode/cancel_pipelineruns.go (+3/-1 lines)

      Additional info (Such as Logs, Screenshots, etc):

      Investigation Summary:
      The test had two validation paths: check Repository status for Cancelled condition OR find skip message in logs. However:

      1. Repository CR was deleted during NSTearDown before status check
      2. Regex pattern .*pipelinerun.*skipping cancelling pipelinerun.*on-push.*already done.* matched ANY pipelinerun with "on-push"
      3. No verification that the correct PipelineRun was being checked
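
To illustrate the false-positive risk described above, here is a small sketch using a simplified form of the broad pattern. The log lines and namespace names are hypothetical; the real controller log format may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// broad is a simplified form of the original pattern: it never names a
// specific PipelineRun, only the "on-push" substring.
var broad = regexp.MustCompile(`skipping cancelling pipelinerun.*on-push.*already done`)

// matchesBroad reports whether a log line satisfies the broad pattern.
func matchesBroad(line string) bool {
	return broad.MatchString(line)
}

func main() {
	// Hypothetical log lines for two different PipelineRuns.
	mine := "skipping cancelling pipelinerun test-ns-a/pipelinerun-on-push-x1, already done"
	other := "skipping cancelling pipelinerun test-ns-b/pipelinerun-on-push-y2, already done"

	fmt.Println(matchesBroad(mine))  // true
	fmt.Println(matchesBroad(other)) // true: an unrelated run also satisfies the check
}
```

Because both lines match, a skip message from a different PipelineRun could make the test pass (or mask the real outcome) for the wrong reason.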

      Fix Details:

      1. Identify specific PipelineRun using keys.OriginalPRName and keys.EventType annotations
      2. Wait for cancellation using existing UntilPipelineRunHasReason helper
      3. Use precise regex: .*skipping cancelling pipelinerun %s/%s.*already done.* with actual namespace/name
      4. Verify status BEFORE teardown
      5. Increase log capture from 20 to 100 lines
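
The selection and matching steps above can be sketched roughly as follows. This is a self-contained approximation, not the code from commit cfdd3428e: the run type, findRun, and skipPattern are hypothetical, the annotation key strings are assumed stand-ins for keys.OriginalPRName and keys.EventType, and the sample values are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed stand-ins for keys.OriginalPRName and keys.EventType.
const (
	originalPRNameKey = "pipelinesascode.tekton.dev/original-prname"
	eventTypeKey      = "pipelinesascode.tekton.dev/event-type"
)

// run is a simplified, hypothetical stand-in for a Tekton PipelineRun.
type run struct {
	Namespace   string
	Name        string
	Annotations map[string]string
}

// findRun returns the first run whose annotations match both values.
func findRun(runs []run, originalName, eventType string) (run, bool) {
	for _, r := range runs {
		if r.Annotations[originalPRNameKey] == originalName &&
			r.Annotations[eventTypeKey] == eventType {
			return r, true
		}
	}
	return run{}, false
}

// skipPattern builds a pattern scoped to one namespace/name pair.
// QuoteMeta keeps any regex metacharacters in the names literal.
func skipPattern(namespace, name string) *regexp.Regexp {
	return regexp.MustCompile(fmt.Sprintf(
		`skipping cancelling pipelinerun %s/%s.*already done`,
		regexp.QuoteMeta(namespace), regexp.QuoteMeta(name)))
}

func main() {
	// Invented sample data.
	runs := []run{
		{Namespace: "ns-a", Name: "pipelinerun-on-pr-z9", Annotations: map[string]string{
			originalPRNameKey: "pipelinerun-on-pr", eventTypeKey: "pull_request"}},
		{Namespace: "ns-a", Name: "pipelinerun-on-push-x1", Annotations: map[string]string{
			originalPRNameKey: "pipelinerun-on-push", eventTypeKey: "push"}},
	}

	target, ok := findRun(runs, "pipelinerun-on-push", "push")
	if !ok {
		fmt.Println("target PipelineRun not found")
		return
	}

	re := skipPattern(target.Namespace, target.Name)
	fmt.Println(re.MatchString("skipping cancelling pipelinerun ns-a/pipelinerun-on-push-x1, already done")) // true
	fmt.Println(re.MatchString("skipping cancelling pipelinerun ns-b/pipelinerun-on-push-y2, already done")) // false
}
```

Scoping the pattern to the namespace and name found through the annotations means a skip message for any other PipelineRun no longer satisfies the check.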

      Related commit: cfdd3428e - "test: Fix flaky TestGithubPushRequestGitOpsCommentCancel E2E test"

              cboudjna@redhat.com Chmouel Boudjnah