Bug
Resolution: Unresolved
Normal
Quality / Stability / Reliability
Access & Management Sprint 120, Access & Management Sprint 121
Cost Management is seeing intermittent 403 responses in the Stage environment when retrieving user entitlements through the Gateway. This issue has been observed in both Cost Management API test automation and UI automation, and it causes test setup failures and inconsistent results.
The error occurs randomly: most requests succeed, but a subset fails with:
HTTP response body: {"errors":[{"detail":"Unable to retrieve user entitlements in the stage environment","meta":{"response_by":"gateway"},"status":403}]}
UI test automation uses an org admin user (a team member's own user) that should have access to everything. API test automation uses a service account with the "Cloud administrator", "Cost administrator", and "User Access administrator" roles. The entitlements failures are observed in both API and UI runs, that is, with both the service account and the org admin user.
Based on Kibana logs and analysis, all failing requests show "entitlements_time_taken": 10, which suggests that the Gateway hits a 10-second timeout when calling the Entitlements service. Successful requests typically show either "entitlements_cache_hit": true or "entitlements_time_taken": < 1.
Example log excerpt:
"status":"403", "response_by":"gateway", "entitlements_cache_hit":"false", "entitlements_time_taken":10, "authorization_forwarded":"false", "request":"GET /api/cost-management/v1/organizations/aws/"
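To help quantify how often this pattern appears, the failure signature described above could be checked with a small filter like the one below. This is only a sketch: it assumes the Kibana records have been exported as one JSON object per line, and the function names (is_entitlements_timeout, count_timeouts) are hypothetical. Only the field names and values come from the log excerpt.

```python
import json


def is_entitlements_timeout(record, timeout_s=10):
    """Return True if a Gateway log record matches the failure signature:
    a gateway-generated 403, no cache hit, and the full timeout elapsed."""
    try:
        time_taken = float(record.get("entitlements_time_taken", 0))
    except (TypeError, ValueError):
        # Missing or non-numeric timing: cannot be classified as a timeout.
        return False
    return (
        str(record.get("status")) == "403"
        and record.get("response_by") == "gateway"
        # The excerpt logs booleans as strings; str() also covers real booleans.
        and str(record.get("entitlements_cache_hit")).lower() == "false"
        and time_taken >= timeout_s
    )


def count_timeouts(lines, timeout_s=10):
    """Count matching records in an iterable of JSON lines (assumed export format)."""
    hits = 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict) and is_entitlements_timeout(record, timeout_s):
            hits += 1
    return hits
```

Comparing this count against the total number of requests in the same time window would give a rough error rate for the Stage Gateway.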
Impact:
- Causes automated test jobs (API + UI) to fail intermittently.
- Impacts Cost Management QE using Stage for validation.
- Error rate is low overall but disruptive to automation reliability.
Findings so far:
- Affected users are org admins with valid entitlements.
- Not related to subscription changes or insights-qa org accounts.
- Confirmed by multiple users and visible in Kibana logs.
- Gateway timeout to Entitlements service is set to 10s.
- Appears to occur randomly, not tied to request frequency or caching.
References:
- Kibana examples:
- Glitchtip alert that might be related: https://glitchtip.devshift.net/insights/issues/3957064
- Example job failure

Requested Action:
Investigate intermittent timeouts between Gateway and Entitlements service in Stage. Determine if the 10s timeout threshold is too low or if there are performance or caching issues on the Entitlements side.
Notes:
- Cost Management team is adding retries as a temporary workaround.
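The retry workaround could look roughly like the sketch below. The Cost Management team's actual implementation is not described in this issue, so everything here is an assumption: the helper names, the default of 3 attempts, and the exponential backoff are illustrative only. The sketch retries only when the response is the gateway-generated entitlements 403 shown above, so a genuine permission error is still returned immediately.

```python
import json
import time


def is_gateway_entitlements_403(status_code, body_text):
    """Return True only for the gateway-generated entitlements 403 from this issue."""
    if status_code != 403:
        return False
    try:
        body = json.loads(body_text)
    except (TypeError, ValueError):
        return False
    if not isinstance(body, dict) or not isinstance(body.get("errors"), list):
        return False
    return any(
        isinstance(err, dict)
        and isinstance(err.get("meta"), dict)
        and err["meta"].get("response_by") == "gateway"
        and "entitlements" in str(err.get("detail", "")).lower()
        for err in body["errors"]
    )


def call_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns something other than the entitlements 403.

    send is a hypothetical zero-argument callable returning
    (status_code, body_text); sleep is injectable so tests need not wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if not is_gateway_entitlements_403(status, body):
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return status, body  # still failing after the last attempt
```

Because each failing request already takes about 10 seconds to time out, the worst-case wait per call is roughly attempts x 10 seconds plus the backoff delays, which is worth keeping in mind when sizing test timeouts.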