Bug
Resolution: Unresolved
Normal
1.7
Complete Findings with Screenshots and Technical Details
Error Reproduction
Reproduced the error (a 500 Internal Server Error wrapping an internal 404 Not Found) with automated Playwright tests. Observed results:
1. Initial workflow abort: ✅ 200 OK - Successfully aborted
2. "Run again" workflow abort: ❌ 500 Internal Server Error with 404 Not Found
Test Execution Details
- Test File: ui_tests/parallel/orchestrator_tests/userOnBoardingWorkflowAbortComprehensive.spec.ts
- Environment: OCP Edge environment (https://backstage-backstage-rhdh-operator.apps.ocp-edge73-0.qe.lab.redhat.com/)
- Workflow Type: User Onboarding workflow
- Test Strategy: Loop through multiple "run again and abort" attempts with incrementally increasing wait times
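The looping strategy could be sketched roughly as follows. This is not the code from the spec file: `buildWaitSchedule`, `runAbortAttempts`, the injected `startAndAbort` callback, and the 2000 ms / 1000 ms values are hypothetical, chosen only so that the first abort is immediate and the second follows a 2-second wait, as in the run described here.

```typescript
// Hypothetical sketch of a "run again and abort" loop with increasing waits.
// None of these names come from the actual spec file.

type AttemptResult = { attempt: number; waitMs: number; status: number };

// Attempt 1 aborts immediately; later attempts wait firstWaitMs,
// then firstWaitMs + stepMs, and so on.
function buildWaitSchedule(attempts: number, firstWaitMs: number, stepMs: number): number[] {
  return Array.from({ length: attempts }, (_, i) =>
    i === 0 ? 0 : firstWaitMs + (i - 1) * stepMs,
  );
}

// `startAndAbort` stands in for the UI steps of one attempt (start the
// workflow or click "Run again", wait waitMs, abort) and resolves to the
// HTTP status of the intercepted abort call.
async function runAbortAttempts(
  schedule: number[],
  startAndAbort: (attempt: number, waitMs: number) => Promise<number>,
): Promise<AttemptResult[]> {
  const results: AttemptResult[] = [];
  for (let i = 0; i < schedule.length; i++) {
    const status = await startAndAbort(i + 1, schedule[i]);
    results.push({ attempt: i + 1, waitMs: schedule[i], status });
  }
  return results;
}
```

Keeping the UI steps behind a callback lets the schedule and result collection be checked without a browser; in a real test the callback would drive the Playwright page.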
API Interception Results
The test intercepted and validated all abort API calls:
Attempt 1 (Initial abort): ✅ 200 OK
- Response: "Workflow instance {id} successfully aborted"
Attempt 2 (After "Run again"): ❌ 500 Internal Server Error
- Root cause: 404 Not Found from internal service
- Internal service call: http://user-onboarding.rhdh-operator/management/processes/user-onboarding/instances/{id}
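One way the abort calls could be captured is sketched below. It is an illustration, not the test's actual code: `recordAbortCall`, `AbortCall`, and the `ResponseLike` types are hypothetical names, and the structural types only mirror the `url()`, `status()`, and `request().method()` methods that Playwright's `Response` provides, so the sketch compiles without Playwright installed.

```typescript
// Minimal structural stand-ins for the parts of a Playwright Response used here.
interface RequestLike { method(): string }
interface ResponseLike { url(): string; status(): number; request(): RequestLike }

// One recorded abort call (hypothetical shape).
interface AbortCall { instanceId: string; status: number }

// Matches the abort route seen in this report:
// .../api/orchestrator/v2/workflows/instances/{id}/abort
const ABORT_ROUTE = /\/v2\/workflows\/instances\/([^/]+)\/abort$/;

// Appends an entry to `log` when the response belongs to a DELETE abort call;
// all other responses are ignored.
function recordAbortCall(response: ResponseLike, log: AbortCall[]): void {
  if (response.request().method() !== "DELETE") return;
  const path = response.url().split("?")[0]; // drop any query string
  const match = ABORT_ROUTE.exec(path);
  if (match) log.push({ instanceId: match[1], status: response.status() });
}
```

In a Playwright test this could be registered with `page.on("response", (r) => recordAbortCall(r, calls))`, after which `calls` holds one entry per abort attempt for later assertions.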
Key Technical Insights
1. Timing Factor: In the test runs, the error appeared on the second attempt (after a 2-second wait) but not on subsequent attempts with longer waits
2. State Management Issue: The internal service appears to lose track of workflow instances when "Run again" is used
3. Intermittent Nature: The error is reproducible but does not occur on every run, suggesting a race condition or timing dependency
API Response Data from Test Execution
Successful Abort Response (200 OK):
Request URL: https://backstage-backstage-rhdh-operator.apps.ocp-edge73-0.qe.lab.redhat.com/api/orchestrator/v2/workflows/instances/{id}/abort
Request Method: DELETE
Status Code: 200 OK
Response Body: "Workflow instance {id} successfully aborted"
Failed Abort Response (500 Internal Server Error):
{
  "error": {
    "name": "Error",
    "message": "HTTP DELETE request to http://user-onboarding.rhdh-operator/management/processes/user-onboarding/instances/df1eacdc-cba3-48f8-adc9-8ac20e65eeb7 failed.\nStatus Code: 404\nStatus Text: Not Found"
  },
  "request": {
    "method": "DELETE",
    "url": "/v2/workflows/instances/df1eacdc-cba3-48f8-adc9-8ac20e65eeb7/abort"
  },
  "response": {
    "statusCode": 500
  }
}
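Because the 500 response embeds the downstream failure in its `error.message` string, a test could pull the inner status out for assertions. The sketch below assumes the message keeps the "request to <url> failed" and "Status Code: <n>" wording seen in this response; `summarizeAbortFailure` and both interfaces are hypothetical names, not part of the orchestrator API.

```typescript
// Shape of the failed abort response body as captured in this report.
interface AbortErrorBody {
  error: { name: string; message: string };
  request: { method: string; url: string };
  response: { statusCode: number };
}

// Hypothetical summary type for assertions and log output.
interface AbortFailure {
  outerStatus: number;       // status returned by the orchestrator endpoint
  downstreamStatus?: number; // status reported by the internal service, if present
  downstreamUrl?: string;    // internal URL named in the message, if present
}

// Extracts the downstream status and URL from the free-text error message.
// Fields stay undefined when the message does not follow the assumed wording.
function summarizeAbortFailure(body: AbortErrorBody): AbortFailure {
  const status = /Status Code:\s*(\d{3})/.exec(body.error.message);
  const url = /request to (\S+) failed/.exec(body.error.message);
  return {
    outerStatus: body.response.statusCode,
    downstreamStatus: status ? Number(status[1]) : undefined,
    downstreamUrl: url ? url[1] : undefined,
  };
}
```

Applied to the body above, this separates the 500 seen by the browser from the 404 raised by the internal service, which makes it easier to assert on the actual cause rather than the wrapper.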
Network Request Details
Abort API Call Pattern:
- Endpoint (internal service): http://user-onboarding.rhdh-operator/management/processes/user-onboarding/instances/df1eacdc-cba3-48f8-adc9-8ac20e65eeb7
- Method: DELETE
- Headers: Standard authentication and content-type headers
- Outcome by wait time: a 2-second wait before the abort produced the 404 error; shorter or longer waits returned 200 OK
Error Reproduction Pattern
Consistent Failure Point:
- Attempt 1: ✅ 200 OK (immediate abort)
- Attempt 2: ❌ 500 Internal Server Error (after 2s wait + "Run again")
This pattern confirms the issue is specifically related to the "Run again" functionality and timing of the abort attempt.
Test Code Highlights
The test successfully:
- Navigates to User Onboarding workflow
- Fills out workflow form fields
- Starts workflow execution
- Aborts running workflow
- Validates abort success
- Uses "Run again" button
- Repeats the entire flow multiple times
- Intercepts and validates all API responses
Screenshots and Evidence
The test captured multiple screenshots during execution:
- Workflow form completion
- Workflow running state
- Abort confirmation dialogs
- Aborted workflow status
- "Run again" button visibility
- Error states and confirmations
Root Cause Analysis
The 404 error suggests that when using "Run again", the internal service user-onboarding.rhdh-operator cannot locate the workflow instance, possibly due to:
- Instance ID mismatch between orchestrator and internal service
- State synchronization issues between services
- Timing-related race conditions in instance registration
- Database or cache inconsistencies
Impact Assessment
- User Experience: Users may be unable to abort workflows started via "Run again"
- Workflow Management: Breaks the expected workflow lifecycle
- Reliability: Creates inconsistent behavior that undermines user trust
- Support: Generates support tickets and user confusion