-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
5
-
False
-
-
False
-
?
-
rhos-ops-platform-services-security
-
None
-
Moderate
With TLS-e configured, if a controller is rebooted, none of the APIs, Horizon, etc. work properly. The issue appears to come from the dogpile.cache.pymemcache module, which is used when MemcachedTLS is enabled. If TLS is not used, the configuration defaults to the dogpile.cache.memcached module, which seems to operate as expected.
2025-09-24 12:00:49.082 32 ERROR keystone.server.flask.request_processing.middleware.auth_context super().connect(addr)
2025-09-24 12:00:49.082 32 ERROR keystone.server.flask.request_processing.middleware.auth_context ConnectionRefusedError: [Errno 111] Connection refused
2025-09-24 12:00:49.082 32 ERROR keystone.server.flask.request_processing.middleware.auth_context
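To narrow down whether the `[Errno 111] Connection refused` above reflects a memcached endpoint that is not listening after the reboot, a probe along these lines could be run from the affected controller. This is only a minimal sketch using the Python standard library: the function name `probe_memcached` and the address in the usage block are hypothetical and not taken from this deployment. It does a bare TCP connect, with no TLS handshake and no memcached command, so it only distinguishes "refused/unreachable" from "port accepting connections".

```python
import socket


def probe_memcached(servers, timeout=2.0):
    """Return {(host, port): status}; status is "open" or the error text.

    Hypothetical helper for illustration, not part of any OpenStack tooling.
    """
    results = {}
    for host, port in servers:
        try:
            # A plain TCP connect is enough to tell "refused" from "listening".
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = "open"
        except OSError as exc:
            # ConnectionRefusedError (Errno 111 on Linux) is a subclass of OSError.
            results[(host, port)] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical address; replace with the memcached servers from the service config.
    for server, status in probe_memcached([("127.0.0.1", 11211)]).items():
        print(server, status)
```

If every configured server reports "open" while the services still fail, that would point toward the client side (stale connections or dead-server handling in the cache backend) rather than memcached itself.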
Latest findings: https://docs.google.com/document/d/1nRnrgpoy2DShUYC-8XlUjbHlgQpYOW2nRI0SXbkCQ9U/edit?tab=t.0
Summary: The issue doesn't manifest on OSP18 because it uses oslo_cache.memcache_pool and bmemcached. OSP17 uses pymemcache directly, so it's the only version currently affected.
Side note: The OSP18 architecture is different: because it runs on OpenShift, the memcached servers respawn in a shorter amount of time. If one of the memcached servers is permanently down, all services that depend on it should, in theory, be reconfigured and restarted.