-
Story
-
Resolution: Done
-
Normal
-
Incidents & Support
Status-board production has been failing intermittently. The failures usually occur when the service's RDS instance is overwhelmed and falls over.
The logs in AWS suggest this:
`The RDS Multi-AZ primary instance is busy and unresponsive.`
App-SRE suggested increasing the instance size from db.t4g.micro, noting that it is not a suitable size for a production DB.
As a first step, we can bump the instance to db.t4g.small and see whether that provides enough resources to stop the issue from recurring.
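
For illustration only, here is a minimal sketch of what the resize could look like if it were applied directly through boto3's `modify_db_instance` call. The instance identifier `status-board-prod` is a hypothetical placeholder, not the real name, and the helper functions are made up for this example. If the instance is managed through infrastructure-as-code, the change belongs there instead.

```python
def build_resize_request(instance_id, target_class, apply_immediately=False):
    """Build the keyword arguments for boto3's RDS modify_db_instance call."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": target_class,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


def resize_instance(instance_id, target_class, apply_immediately=False):
    """Send the resize request to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported here so the request builder works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_instance(
        **build_resize_request(instance_id, target_class, apply_immediately)
    )


# Hypothetical identifier; substitute the real instance name before use.
request = build_resize_request("status-board-prod", "db.t4g.small")
```

Changing the instance class typically causes a short interruption while the instance is modified, so leaving `ApplyImmediately` as `False` pushes the change to the maintenance window.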