-
Epic
-
Resolution: Unresolved
-
Improve stability and observability of Sensor message stream
-
Future Sustainability
-
To Do
We have observed multiple incidents that caused catastrophic interruptions to the Sensor/Central message stream. Examples include messages that exceeded the gRPC message size limit and messages that contained non-UTF-8 characters. In both cases, a single "rogue" message took out the entire Sensor without any clear indication of what had happened, and extensive debugging was needed to find the root cause.
This epic is about making the message stream both more robust and more observable, so that similar incidents are either prevented or can be diagnosed quickly. Possible solutions are still unclear, since we have limited control over the generated gRPC code. Nevertheless, we may investigate the following options:
- Dropping of harmful messages or content.
- Enhanced logging when the message stream is terminated.
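
To make the two options above more concrete, here is a minimal sketch, written in Python for brevity rather than taken from the Sensor code base. Every name in it (`OutgoingMessage`, `rejection_reason`, `filter_sendable`, `send_all`, the `sensor.stream` logger) is hypothetical. The 4 MiB limit is gRPC's default maximum receive size and is only an assumption about the deployed configuration. The sketch checks each message before it reaches the stream, drops and logs the ones that would be rejected, and logs context if the stream still fails.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional

logger = logging.getLogger("sensor.stream")  # hypothetical logger name

# Assumption: 4 MiB is gRPC's default maximum receive size; the real
# deployment may configure a different limit.
MAX_MESSAGE_BYTES = 4 * 1024 * 1024


@dataclass
class OutgoingMessage:
    """Hypothetical stand-in for a serialized Sensor message."""
    msg_id: str
    payload: bytes  # serialized message body
    # Raw bytes of the fields that must be valid UTF-8 on the wire.
    text_fields: List[bytes] = field(default_factory=list)


def rejection_reason(msg: OutgoingMessage,
                     max_bytes: int = MAX_MESSAGE_BYTES) -> Optional[str]:
    """Return why the message must not be sent, or None if it is fine."""
    if len(msg.payload) > max_bytes:
        return f"payload is {len(msg.payload)} bytes, limit is {max_bytes}"
    for index, raw in enumerate(msg.text_fields):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            return f"text field {index} is not valid UTF-8: {err.reason}"
    return None


def filter_sendable(messages: Iterable[OutgoingMessage],
                    max_bytes: int = MAX_MESSAGE_BYTES
                    ) -> Iterator[OutgoingMessage]:
    """Yield acceptable messages; drop and log the harmful ones."""
    for msg in messages:
        reason = rejection_reason(msg, max_bytes)
        if reason is None:
            yield msg
        else:
            logger.warning("dropping message %s: %s", msg.msg_id, reason)


def send_all(messages: Iterable[OutgoingMessage],
             send: Callable[[OutgoingMessage], None],
             max_bytes: int = MAX_MESSAGE_BYTES) -> int:
    """Send acceptable messages; log context if the stream terminates."""
    sent = 0
    for msg in filter_sendable(messages, max_bytes):
        try:
            send(msg)
        except Exception:
            # Record which message was in flight before re-raising.
            logger.exception(
                "stream terminated while sending message %s (%d bytes) "
                "after %d successful sends",
                msg.msg_id, len(msg.payload), sent)
            raise
        sent += 1
    return sent
```

Dropping a message trades completeness for availability, so the warning log is what keeps the loss visible. Whether a check like this can be hooked in ahead of the generated gRPC code is exactly the open question of this epic.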