-
Sub-task
-
Resolution: Unresolved
-
Normal
When the database in AWS goes down, status-board handles the outage gracefully, but the resulting error messages do not make the cause clear.
Here is a snippet of the errors logged during a DB outage:
I0616 19:09:18.017207 1 logger.go:100] [opid=2ybQjdY9i5WUw4FT1VXu09R6LZ9] {"response_status":500,"elapsed":"29.972646005s"}
I0616 19:09:18.018625 1 logger.go:100] [opid=2ybQnKhFT9ZLj0NPcfyZMxGXwuo] {"request_method":"POST","request_url":"/api/status-board/v1/alertmanager-receiver","request_remote_ip":"127.0.0.1:49688"}
E0616 19:09:18.042650 1 logger.go:121] [opid=2ybQjbNaqBUy3sOZ1JjtNMu2D2P] OCM-SB-9: Unable to find Service with fullname='OSDv4/rosa-hcp-fleet-wide/ROSAHCPNodepoolUpgradeSuccess': context canceled
One improvement would be to detect Postgres-related errors explicitly, so that the error message can be replaced with something clearer for a developer.
For example, the message could read "DB connection error: context canceled".