Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: Pipelines 1.17.0
Affects Version/s: Pipelines 1.17.0
Component/s: Pipelines as Code
Labels:
- Ford

Story Points:
24
Blocked:
False
Blocked Reason:
None
Ready:
False
Release Note Text:
Enhanced the pipeline's concurrency management to handle high-load scenarios more effectively. Previously, when the cluster was busy, certain pipeline runs could fail to release locks, leading to deadlocks that stalled the progress of subsequent runs. With this fix, pipeline runs encountering concurrency failures will release locks correctly, allowing the next run to proceed without interruption.
Git Pull Request:
https://github.com/openshift-pipelines/pipelines-as-code/pull/1810
Intelligence Requested:
Market:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Story:

As a developer,

I want the pipeline to handle concurrency failures gracefully when the cluster becomes busy,
so that pipeline runs continue to progress without encountering deadlocks.

Description of problem:

When a pipeline run hits a concurrency failure under high load, it does not release its lock: the semaphore never removes the lock entry, so the runs queued behind it deadlock. The issue arises specifically when the underlying cluster is slow to update the pipeline run's "in progress" status.

To reproduce this issue, modify the updatePipelineRunToInProgress function to simulate concurrency failures with the following test setup:

Set up a repository with three pipeline runs: test-1, test-2, and test-3, all matching a pull request.
Configure a concurrency limit of 1 in the repository specification.
When the pull request is executed:
- test-1 should run successfully.
- test-2 should encounter an error.
- test-3 should be triggered and start running.

Under high load, test-2 fails and test-3 should still start. Instead, test-2 retains its lock after failing, so test-3 stays stalled in a deadlock.
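The scenario above can be sketched as a small standalone simulation. This is only an illustration under stated assumptions, not the Pipelines as Code reconciler: Simulate, injectedError, and the single "held" slot are hypothetical names that model a concurrency limit of 1, and the failure injection mirrors the test-2 prefix check used in the reproduction patch.

```go
package main

import (
	"fmt"
	"strings"
)

// injectedError is a hypothetical stand-in for the ticket's failure injection:
// it fails any run whose name starts with "test-2".
func injectedError(prn string) error {
	if strings.HasPrefix(prn, "test-2") {
		return fmt.Errorf("injected failure for %s", prn)
	}
	return nil
}

// Simulate walks the queue in order with a concurrency limit of 1 and
// returns the names of the runs that actually started.
// releaseOnFailure toggles between the buggy and the fixed behavior.
func Simulate(queue []string, releaseOnFailure bool) (started []string) {
	held := "" // name of the run currently holding the single slot
	for _, prn := range queue {
		if held != "" {
			continue // slot still taken: this run stays queued forever
		}
		held = prn // acquire the slot before marking the run in progress
		if err := injectedError(prn); err != nil {
			if releaseOnFailure {
				held = "" // fixed behavior: free the slot on failure
			}
			continue
		}
		started = append(started, prn)
		held = "" // the run completed and released its slot
	}
	return started
}

func main() {
	queue := []string{"test-1", "test-2", "test-3"}
	fmt.Println(Simulate(queue, false)) // [test-1]
	fmt.Println(Simulate(queue, true))  // [test-1 test-3]
}
```

With releaseOnFailure set to false, test-2 keeps the slot and test-3 never starts, which matches the stall described in this ticket. With it set to true, test-3 proceeds.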

Prerequisites (if any, like setup, operators/versions):

The pipeline should gracefully handle concurrency failures without causing a deadlock.
Pipeline runs should continue to the next available run when one run encounters a failure.
The semaphore should release locks appropriately to prevent deadlocks.
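One way to picture the third expectation is a semaphore whose failure path always gives the slot back. The following is a minimal sketch under assumptions: Semaphore, TryAcquire, Release, StartRun, and failStart are hypothetical names chosen for illustration and are not taken from the Pipelines as Code source or from the linked pull request.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNoSlot is returned when the concurrency limit is already reached.
var ErrNoSlot = errors.New("no free concurrency slot")

// Semaphore is a hypothetical per-repository lock that tracks which
// pipeline runs currently hold a slot.
type Semaphore struct {
	mu      sync.Mutex
	limit   int
	holders map[string]bool
}

// NewSemaphore returns a semaphore that allows up to limit holders.
func NewSemaphore(limit int) *Semaphore {
	return &Semaphore{limit: limit, holders: map[string]bool{}}
}

// TryAcquire records name as a holder if a slot is free.
// Acquiring a slot that name already holds succeeds without using another.
func (s *Semaphore) TryAcquire(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.holders[name] {
		return true
	}
	if len(s.holders) >= s.limit {
		return false
	}
	s.holders[name] = true
	return true
}

// Release removes name from the holders; releasing twice is harmless.
func (s *Semaphore) Release(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.holders, name)
}

// StartRun acquires a slot and then calls markInProgress. If that call
// fails, the slot is released so that the next queued run can proceed.
func StartRun(s *Semaphore, name string, markInProgress func(string) error) error {
	if !s.TryAcquire(name) {
		return ErrNoSlot
	}
	if err := markInProgress(name); err != nil {
		s.Release(name) // the step whose absence causes the deadlock
		return fmt.Errorf("starting %s: %w", name, err)
	}
	return nil
}

// failStart simulates a status update that always fails.
func failStart(name string) error {
	return errors.New("simulated status update failure")
}

func main() {
	sem := NewSemaphore(1)
	fmt.Println("test-2 error:", StartRun(sem, "test-2", failStart))
	fmt.Println("test-3 can acquire:", sem.TryAcquire("test-3"))
}
```

The design choice shown here is to release inside the same function that acquired, so no caller has to remember the cleanup on the error path.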

Steps to Reproduce

Implement the following patch in reconciler/reconciler.go (it requires the fmt and strings imports):

    // randomError fails any PipelineRun whose name starts with "test-2".
    func randomError(prn string) error {
        if strings.HasPrefix(prn, "test-2") {
            return fmt.Errorf("DEBUG: 😈 randomly failing this PipelineRun: %s", prn)
        }
        return nil
    }

Add this at the beginning of the updatePipelineRunToInProgress function:

    if err := randomError(pr.GetName()); err != nil {
        return err
    }

Trigger a pull request to initiate test-1, test-2, and test-3 with the concurrency limit set to 1.

Verify that when test-2 fails, test-3 starts without encountering a deadlock.

Notes

The issue occurs only under high load, so the patch to updatePipelineRunToInProgress is needed to simulate it reliably.
Despite its name, randomError is deterministic: it fails every pipeline run whose name starts with test-2, which exercises the concurrency failure path without requiring real load.

Expected results:

Reproducibility (Always/Intermittent/Only Once):

Acceptance criteria:

Definition of Done:

Build Details:

Additional info (Such as Logs, Screenshots, etc):


Assignee: Chmouel Boudjnah

Reporter: Chmouel Boudjnah

QA Contact: Zaki Shaikh

Votes: 0

Watchers: 2

Created: 2024/11/07 1:27 PM

Updated: 2024/12/02 7:37 AM

Resolved: 2024/11/18 5:08 PM
