OpenShift Pipelines / SRVKP-6740

Concurrency Failure Handling During Pipeline Run Execution


      Enhanced the pipeline's concurrency management to handle high-load scenarios more effectively. Previously, when the cluster was busy, certain pipeline runs could fail to release locks, leading to deadlocks that stalled the progress of subsequent runs. With this fix, pipeline runs encountering concurrency failures will release locks correctly, allowing the next run to proceed without interruption.

      Story:

      As a developer,

      I want the pipeline to handle concurrency failures gracefully when the cluster becomes busy,
      so that pipeline runs continue to progress without encountering deadlocks.

      Description of problem:

      When a pipeline run hits a concurrency failure under high load, the reconciler does not release the lock it holds, so the semaphore keeps the slot occupied and subsequent runs deadlock. This happens specifically when the underlying cluster is slow to update the pipeline run's "in progress" status.
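      As a rough illustration of the failure mode (a toy model, not the actual Pipelines-as-Code code), a one-slot semaphore plus a run that errors out without giving its slot back behaves exactly like the stalled test-3. Running the sketch below ends in a Go runtime deadlock when test-3 tries to acquire the slot:

      package main

      import (
        "errors"
        "fmt"
      )

      // Toy model: a one-slot semaphore stands in for a repository with a
      // concurrency limit of 1, and three names stand in for test-1, test-2
      // and test-3.
      func main() {
        semaphore := make(chan struct{}, 1)
        for _, name := range []string{"test-1", "test-2", "test-3"} {
          semaphore <- struct{}{} // acquire the concurrency slot (blocks while it is held)
          if err := updateToInProgress(name); err != nil {
            fmt.Printf("%s failed: %v\n", name, err)
            // Buggy flow: the slot is never released here, so the next
            // acquire above blocks forever and test-3 stays stalled.
            continue
          }
          fmt.Printf("%s is in progress\n", name)
          <-semaphore // a run that completes releases the slot
        }
      }

      // updateToInProgress mimics updatePipelineRunToInProgress failing for test-2.
      func updateToInProgress(name string) error {
        if name == "test-2" {
          return errors.New("simulated concurrency failure")
        }
        return nil
      }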

      To reproduce this issue, modify the updatePipelineRunToInProgress function to simulate concurrency failures with the following test setup:

      1. Set up a repository with three pipeline runs: test-1, test-2, and test-3, all matching a pull request.
      2. Configure a concurrency limit of 1 in the repository specification.
      3. When the pull request is executed:
        • test-1 should run successfully.
        • test-2 should encounter an error.
        • test-3 should be triggered and start running.

      In a high-load scenario, test-2 fails and test-3 should still start; however, because test-2's lock is never released, test-3 remains stalled behind the deadlock.

      Prerequisites (if any, like setup, operators/versions):

      • The pipeline should gracefully handle concurrency failures without causing a deadlock.
      • Pipeline runs should continue to the next available run when one run encounters a failure.
      • The semaphore should release locks appropriately to prevent deadlocks.
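      In terms of the toy model sketched above under the problem description, the expected behavior corresponds to giving the slot back on the error path so the next queued run can take it:

        if err := updateToInProgress(name); err != nil {
          <-semaphore // release the slot so the next run (test-3) can acquire it
          fmt.Printf("%s failed: %v\n", name, err)
          continue
        }

      With that single release in place, the loop in the sketch runs all three names to completion instead of deadlocking.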

      Steps to Reproduce

      1. Implement the following patch in reconciler/reconciler.go:

      func randomError(prn string) error {
        if strings.HasPrefix(prn, "test-2") {
          return fmt.Errorf("DEBUG: 😈 randomly failing this PipelineRun: %s", prn)
        }
        return nil
      }
       
      Add this at the beginning of the updatePipelineRunToInProgress function:
      if err := randomError(pr.GetName()); err != nil {
        return err
      }

      2. Trigger a pull request to initiate test-1, test-2, and test-3 with the concurrency limit set to 1.
      3. Verify that when test-2 fails, test-3 starts without encountering a deadlock.

        Notes

      • The issue occurs only under high load, so the error injected into updatePipelineRunToInProgress is needed to reproduce it reliably.
      • The patch introduces a simulated error for the test PipelineRuns so the concurrency handling can be stress-tested as if the cluster were under load.
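      If the patched controller also needs to handle normal, non-test events, one option is to gate the injected failure behind an environment variable so it only fires when explicitly requested. The variable name below is made up for this example, and the helper needs the fmt, os, and strings imports:

      func randomError(prn string) error {
        // Only inject the failure when the debug switch is set, e.g. on a test cluster.
        if os.Getenv("PAC_DEBUG_FAIL_CONCURRENCY") == "" {
          return nil
        }
        if strings.HasPrefix(prn, "test-2") {
          return fmt.Errorf("DEBUG: 😈 randomly failing this PipelineRun: %s", prn)
        }
        return nil
      }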

      Expected results:
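
      test-1 runs to completion, test-2 fails with the injected error, and test-3 acquires the freed concurrency slot and starts running; no pipeline run is left stuck in the queue waiting on a lock that will never be released.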

      Reproducibility (Always/Intermittent/Only Once):

      Acceptance criteria: 

       

      Definition of Done:

      Build Details:

      Additional info (Such as Logs, Screenshots, etc):

       


              Assignee: Chmouel Boudjnah (cboudjna@redhat.com)
              Reporter: Chmouel Boudjnah (cboudjna@redhat.com)
