-
Sub-task
-
Resolution: Unresolved
-
Blocker
PIDONE Board, PIDONE 18.0.4
By default, Eventlet prevents multiple greenlets (that is, multiple readers) from reading from the same socket. This protection exists to avoid race conditions and to keep readers from consuming partial or truncated results.
However, Eventlet also comes with a debug convenience that allows this protection to be disabled, for debugging purposes only.
This debug convenience should never be used in a production context, and even less in runtime code.
Unfortunately, some people found a way to hijack this debug feature to implement specific thread hacks:
https://opendev.org/openstack/oslo.log/src/branch/master/oslo_log/pipe_mutex.py#L46-L60
See the full related commit: https://opendev.org/openstack/oslo.log/commit/94b9dc32ec1f52a582adbd97fe2847f7c87d6c17
The problem is that this behavior is not allowed by the newly implemented Asyncio hub:
To begin with, multiple readers is a concept tightly coupled to greenlets. The Asyncio hub is meant to help migrate away from Eventlet, and therefore from greenlets, to native Python code implemented with the Asyncio module. So such a feature has no place in the Asyncio context. The two concepts are not compatible because the underlying technologies differ in nature. Greenlets rely on pseudo-threads, whereas Asyncio does not rely on threads at all. Asyncio, by its nature, is a greenlet killer.
In addition, after several discussions, we also concluded that this kind of multiple-readers feature is an open door to bugs and race conditions. Using such a feature reflects a bad design that should not be encouraged.
One should not expect such a feature to be implemented on the Asyncio hub.
For further details about the Eventlet internal discussions, please take a look at:
- https://github.com/eventlet/eventlet/issues/874
- https://github.com/eventlet/eventlet/issues/868
- https://github.com/eventlet/eventlet/issues/432
This hack seems to have been introduced in oslo.log to bypass or manage logging-related deadlocks.
For this reason, parts of oslo.log should be redesigned to handle this problem in a different manner.
We have a couple of potential solutions that should be explored to solve this bad design.
The first one could be to duplicate file descriptors (sockets), rather than sharing one file descriptor (one socket) among many threads. Each thread would own one duplicated file descriptor.
The problem with the duplication approach is that we could end up with a lot of open files and, potentially, leaks.
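As a rough illustration of the duplication idea, here is a minimal sketch using plain threads and a pipe instead of Eventlet sockets. The function name `read_tokens_with_private_fds` is hypothetical and is not part of oslo.log. Each reader calls `os.dup` to get its own descriptor and closes it in a `finally` block, which is one way to limit the leak risk mentioned above.

```python
import os
import threading


def read_tokens_with_private_fds(tokens=b"abc"):
    """Hypothetical helper: one duplicated descriptor per reader thread."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, tokens)  # one byte per reader is available up front
    os.close(write_fd)

    results = []
    results_lock = threading.Lock()

    def reader():
        # Each thread owns a private duplicate of the read end.
        own_fd = os.dup(read_fd)
        try:
            token = os.read(own_fd, 1)
        finally:
            # Always close the duplicate so descriptors do not leak.
            os.close(own_fd)
        with results_lock:
            results.append(token)

    threads = [threading.Thread(target=reader) for _ in range(len(tokens))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    os.close(read_fd)  # the original descriptor is closed by its owner
    return sorted(results)
```

Note that the duplicates still refer to the same underlying pipe, so this only changes descriptor ownership; whether that is enough for the oslo.log use case is something the design work would need to confirm.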
The second option would be to use something like a log feeder. One or more threads, each owning its own socket, would receive the data to log and populate a queue that passes the data to the writer, or something along those lines.
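One possible shape for such a feeder, sketched with the standard library only, is the `QueueHandler` / `QueueListener` pair from `logging.handlers`. This is an assumption about how the idea could look, not the oslo.log design. The function name `run_feeder_demo` and the logger name `feeder_demo` are hypothetical. Producers only put records on a queue, and a single listener thread owns the writer.

```python
import io
import logging
import logging.handlers
import queue
import threading


def run_feeder_demo(producer_count=3):
    """Hypothetical demo: many producers, one writer fed through a queue."""
    log_queue = queue.Queue()
    sink = io.StringIO()  # stands in for the real log destination

    # The single writer: only the listener thread touches this handler.
    writer = logging.StreamHandler(sink)
    writer.setFormatter(logging.Formatter("%(message)s"))
    listener = logging.handlers.QueueListener(log_queue, writer)

    logger = logging.getLogger("feeder_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    queue_handler = logging.handlers.QueueHandler(log_queue)
    logger.addHandler(queue_handler)

    listener.start()
    try:
        threads = [
            threading.Thread(
                target=logger.info, args=("message from producer %d", i)
            )
            for i in range(producer_count)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        listener.stop()  # drains the queue before returning
        logger.removeHandler(queue_handler)

    return sorted(sink.getvalue().splitlines())
```

With this layout no producer ever shares the writer's file descriptor, so the multiple-readers protection would not need to be disabled.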
A third option, if not yet implemented in oslo.log, would be to implement the chain-of-responsibility pattern to handle incoming data.
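For readers unfamiliar with the pattern, a minimal sketch could look like the following. All class names (`LogStep`, `ErrorStep`, `DefaultStep`), the `build_chain` helper, and the dictionary-based record are hypothetical and chosen for illustration only. Each step either processes the incoming record or passes it to its successor.

```python
class LogStep:
    """Hypothetical base class for one link in the chain."""

    def __init__(self, successor=None):
        self.successor = successor

    def handle(self, record):
        # Process the record here, or pass it down the chain.
        if self.accepts(record):
            return self.process(record)
        if self.successor is not None:
            return self.successor.handle(record)
        return None  # nobody in the chain accepted the record

    def accepts(self, record):
        raise NotImplementedError

    def process(self, record):
        raise NotImplementedError


class ErrorStep(LogStep):
    def accepts(self, record):
        return record["level"] == "ERROR"

    def process(self, record):
        return "stderr: " + record["msg"]


class DefaultStep(LogStep):
    def accepts(self, record):
        return True

    def process(self, record):
        return "file: " + record["msg"]


def build_chain():
    # Errors are tried first; everything else falls through to the default.
    return ErrorStep(successor=DefaultStep())
```

For example, `build_chain().handle({"level": "ERROR", "msg": "boom"})` is routed to `ErrorStep`, while an `INFO` record falls through to `DefaultStep`.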
See the following links for further details:
- https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern
- https://www.man7.org/linux/man-pages/man2/dup.2.html
- https://docs.python.org/3/library/os.html#os.dup
- https://docs.python.org/3/library/socket.html#socket.socket.dup
In all cases, using this debug convenience to bypass protections reflects a poor design in oslo.log and in OpenStack in general. Indeed, this hack is also present in swift and in various OpenStack deliverables. The root cause is a design problem that should be solved with an elegant design solution.
The person in charge of this ticket should propose a solution based on a design refactor. They can investigate the leads proposed above and then propose the best approach. The solution could be submitted in the blueprint/specs format https://specs.openstack.org/openstack/oslo-specs/ . Following this process would allow the proposal to be discussed with other oslo core team members.
Some OpenStack-related discussions have been started on IRC, in the OFTC channel ##eventlet-to-asyncio . The person in charge of this ticket is invited to join this channel.
The community goal related to the Eventlet migration has recently been updated to reflect this problem and to draw attention to this blocking point.
Again, this is a blocking topic. We won't be able to start the migration on the OpenStack side until this blocking point is removed. Indeed, we won't be able to switch devstack over to the new Asyncio hub of Eventlet, and we won't be able to start the performance evaluation on the preprod infra until we are able to switch to the Asyncio hub.
https://review.opendev.org/c/openstack/devstack/+/914108
See patch set 18 of the migration goal:
https://review.opendev.org/c/openstack/governance/+/902585/17..18