Story
Resolution: Done
Major
Plat-Ex Services Sprint 25, Plat-Ex Services Sprint 26, Plat-Ex Services Sprint 27, Plat-Ex Services Sprint 28
We have spent the last several days debugging 503 "application not available" errors on console stage and prod. These errors seem to spike whenever a UI is deployed to either environment.
During our investigation we found that Akamai is caching these 503 error pages served from OpenShift. This makes the issue more widespread than it needs to be: any user whose request hits the Akamai child node that cached the error may see the error page until the cached copy expires.
While debugging, we completely disabled caching of 204-503 responses on stage. We should re-enable that caching to guard against possible attacks (even though stage is behind our VPN), but lower the max age from 10 minutes to something more reasonable.
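To make the max-age trade-off concrete, here is a toy model of a single edge (child) node. It is not Akamai configuration, and every name in it (`EdgeNodeCache`, `seconds_serving_stale_error`) is hypothetical. It assumes one request per second, that only responses in this ticket's 204-503 range are cached, and that the origin returns 503 until it recovers and 200 afterwards. It counts how many seconds after recovery the node keeps serving the cached 503.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    status: int
    stored_at: int  # second at which the response was cached


class EdgeNodeCache:
    """Hypothetical model of one edge node caching a single path.

    Responses whose status falls in [cache_min, cache_max] are cached for
    max_age seconds; anything outside that range (e.g. 200) is not cached.
    """

    def __init__(self, max_age: int, cache_min: int = 204, cache_max: int = 503):
        self.max_age = max_age
        self.cache_min = cache_min
        self.cache_max = cache_max
        self.entry: Optional[CachedResponse] = None

    def get(self, now: int, origin_status: int) -> int:
        # Serve the cached response while it is still fresh.
        if self.entry is not None and now - self.entry.stored_at < self.max_age:
            return self.entry.status
        # Cache miss or expired entry: go to origin, cache only in-range statuses.
        if self.cache_min <= origin_status <= self.cache_max:
            self.entry = CachedResponse(origin_status, now)
        else:
            self.entry = None
        return origin_status


def seconds_serving_stale_error(max_age: int, outage_end: int, horizon: int) -> int:
    """Count seconds after the origin recovers (at outage_end) that still get a 503."""
    cache = EdgeNodeCache(max_age)
    stale = 0
    for t in range(horizon):
        origin_status = 503 if t < outage_end else 200
        served = cache.get(t, origin_status)
        if t >= outage_end and served == 503:
            stale += 1
    return stale


if __name__ == "__main__":
    # Compare the old 10-minute max age with the proposed 10 seconds.
    for max_age in (600, 10):
        print(max_age, seconds_serving_stale_error(max_age, outage_end=25, horizon=1200))
    # prints "600 575" then "10 5"
```

In this model the stale window after recovery is bounded by the max age, so dropping it from 600 to 10 seconds shrinks the worst case from minutes to seconds while still absorbing repeated requests during the outage itself.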
AC:
- Unify stage and prod error caching at a 10-second max age
- Update stage Akamai: re-enable caching for 204-503 responses with the max age lowered to 10 seconds
- Update prod Akamai: lower the max age for 204-503 responses to 10 seconds