
RHCLOUD-32866: [Spike] Investigate changes to the upstream apicast base image


      Spike on changes to the upstream apicast base image to verify there wasn't a change that increased CPU consumption or other resource usage.

      UPDATE 07-04-2024

      The root cause of the high latency has been determined. This PR in our APIcast fork, https://github.com/RedHatInsights/APIcast/pull/11, synced the fork with the upstream repo. It pulled in an upstream change that broke the functionality for disabling auto refresh of the apicast configuration.

      Right now we configure apicast with the following:

      APICAST_CONFIGURATION_CACHE: -1
      APICAST_CONFIGURATION_LOADER: boot
      

      Per the apicast docs, this is a valid combination: https://github.com/3scale/APIcast/blob/master/doc/parameters.md
      It means: load the configuration once on boot, and never auto refresh it.

      This bit of code: https://github.com/RedHatInsights/APIcast/pull/11/files#diff-69edea98a4b41fba6e3d5f4fdcb9867158d5bd38eee86b79c2497f9944f482e9R207-R221 broke that behavior by calling the "schedule" method (line 218) once on startup, passing the "handler" method as the scheduled callback. The "handler" method then calls itself recursively at the specified "interval", which is set from APICAST_CONFIGURATION_CACHE in our env: https://github.com/RedHatInsights/APIcast/pull/11/files#diff-69edea98a4b41fba6e3d5f4fdcb9867158d5bd38eee86b79c2497f9944f482e9R204
      With the interval set to -1, apicast calls openresty's "ngx.timer.at" method, which schedules the task to run after <interval> seconds. It appears to interpret -1 as 0, i.e. no delay. ngx.timer.at docs: https://github.com/openresty/lua-nginx-module?tab=readme-ov-file#ngxtimerat
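
      A minimal Lua sketch of the pattern (simplified; the "schedule" and "handler" names follow the PR, but the bodies here are illustrative, not the actual fork code):

      local _M = {}

      -- Schedule "handler" to run after "interval" seconds on a fresh timer.
      -- ngx.timer.at spawns a new Lua coroutine for every timer it fires.
      local function schedule(interval, handler, ...)
        local ok, err = ngx.timer.at(interval, handler, ...)
        if not ok then
          ngx.log(ngx.ERR, "failed to schedule config refresh: ", err)
        end
      end

      -- Timer callback: reload config, then re-arm itself at the same interval.
      local function handler(premature, interval)
        if premature then return end

        -- ... reload the apicast configuration here ...

        -- With interval == -1 (observed to behave like a 0-second delay),
        -- this re-arms immediately, spawning a new coroutine on every pass.
        schedule(interval, handler, interval)
      end

      function _M.init(interval)
        -- Called once on startup; kicks off the recursive chain.
        schedule(interval, handler, interval)
      end

      return _M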

      This results in high latency because ngx.timer.at spawns a new Lua coroutine on every call. This appears to consume our CPU up to its limit even when the gateway is completely idle (shown by deploying in ephemeral and monitoring metrics). That likely starves all other coroutines (like the ones spawned to service requests), which eventually time out since there is no CPU left to handle them. We think this is why calls to the cloudwatch aggregator hung indefinitely and crashed the service, since it probably couldn't respond to k8s health checks.

      Immediate mitigation: we can set "APICAST_CONFIGURATION_CACHE" to a very high value, 30 days (2592000 seconds), to prevent the refresh loop from running nonstop with no delay; the change is shown below. Since we don't actually use the auto-refreshed configuration, it doesn't matter how frequently it runs.
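
      With the mitigation applied, the env config from above becomes:

      APICAST_CONFIGURATION_CACHE: 2592000
      APICAST_CONFIGURATION_LOADER: boot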

      Root cause fix: what we need is for apicast to do its boot-time magic once only, then schedule auto refresh completely separately. As it stands, these conditions are never actually checked: https://github.com/RedHatInsights/APIcast/pull/11/files#diff-69edea98a4b41fba6e3d5f4fdcb9867158d5bd38eee86b79c2497f9944f482e9R204 A sketch of one possible fix follows.
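
      One possible shape for the fix, as a Lua sketch (assumes the same "schedule"/"handler" names as the sketch above; "load_configuration" is a hypothetical stand-in for the boot-time load, not the actual fork code):

      function _M.init(interval)
        -- Always perform the one-time load on boot.
        load_configuration()

        -- Only arm the auto refresh loop for a positive interval;
        -- -1 means "never refresh", so no timer is scheduled at all.
        if interval and interval > 0 then
          schedule(interval, handler, interval)
        end
      end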
