Red Hat OpenStack Services on OpenShift
OSPRH-26844

Region2 session/token cache expiration via Keystone notifications


    • Type: Epic
    • Priority: Major
    • Resolution: Unresolved
    • Status: To Do
    • Epic Name: Region2 session/token cache expiration via Keystone notifications
    • Parent: RHOSSTRAT-1210 Further SKMO improvements and CI enablement
    • Team: rhos-ops-platform-services-security
    • Sprint: DFG Security: Sprint 22

      Problem Statement:

      In a multi-region SKMO setup, Keystone runs only in regionOne (central). When a session-invalidating event occurs (user disabled, password changed, role revoked, trust deleted, etc.), the following happens:

      • In regionOne: Keystone's internal callbacks (_drop_token_cache in keystone/token/provider.py) invalidate the TOKENS_REGION cache in that Keystone process. The revocation list in the database is also updated, so subsequent token validations catch revoked tokens.
      • In regionTwo: There is no Keystone process. Each service (Nova, Glance, Cinder, etc.) uses keystonemiddleware to validate tokens. The middleware caches the validation result in memcached for token_cache_time (default 300s). During this window, a revoked/invalidated token can still be used to access services in regionTwo.
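
      For reference, the cache window in question is controlled by keystonemiddleware's [keystone_authtoken] options in each regionTwo service (the memcached hostname below is a placeholder, not an actual deployment value):

```ini
[keystone_authtoken]
# Validation results are cached for this many seconds (default 300); a
# revoked token stays usable in regionTwo until the cached entry expires.
token_cache_time = 300
memcached_servers = memcached-region2.example.com:11211
```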

      Solution:

      Implement a new Keystone entry point / command upstream in the Keystone project that runs as a notification listener in non-central regions. This service shares Keystone's codebase – reusing TOKENS_REGION, _drop_token_cache, _register_callback_listeners, and oslo.cache configuration. It is essentially a Keystone process that does not serve the API, but listens for notification events and invalidates the local token cache accordingly (the same way a full Keystone process does).
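
      A minimal sketch of the listener's dispatch logic follows. The class and callback names are illustrative, not the eventual upstream ones; the real service would reuse Keystone's TOKENS_REGION and _drop_token_cache rather than the stand-in callback used here.

```python
# Illustrative sketch only: class and function names are hypothetical.

# Event types (enumerated in the analysis below) that must drop the cache.
INVALIDATING_EVENTS = frozenset([
    'identity.user.deleted',
    'identity.user.disabled',
    'identity.user.updated',
    'identity.domain.deleted',
    'identity.domain.disabled',
    'identity.project.disabled',
    'identity.project.deleted',
    'identity.OS-TRUST:trust.deleted',
    'identity.role_assignment.created',
    'identity.role_assignment.deleted',
    'identity.identity_provider.deleted',
    'identity.identity_provider.disabled',
])


class TokenCacheEndpoint:
    """Notification endpoint that invalidates the local token cache."""

    def __init__(self, invalidate):
        # 'invalidate' stands in for Keystone's _drop_token_cache, which
        # calls TOKENS_REGION.invalidate() on the local dogpile region.
        self._invalidate = invalidate

    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        # Same method signature oslo.messaging expects on endpoints.
        if event_type in INVALIDATING_EVENTS:
            self._invalidate()
```

      With oslo.messaging, this endpoint would be registered roughly as `oslo_messaging.get_notification_listener(transport, [Target(topic='notifications')], [TokenCacheEndpoint(drop_cache)], pool='region2-listener')`; the unique per-region `pool` name is what lets every region receive its own copy of each notification instead of competing for messages.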

      Analysis: How Keystone handles cache invalidation

      Keystone's token/provider.py registers the following internal callbacks that trigger _drop_token_cache (which calls TOKENS_REGION.invalidate()):

      Action    Resource type            Public oslo.messaging notification?
      deleted   OS-TRUST:trust           Yes (identity.OS-TRUST:trust.deleted)
      deleted   user                     Yes (identity.user.deleted)
      deleted   domain                   Yes (identity.domain.deleted)
      disabled  user                     Yes (identity.user.disabled)
      disabled  domain                   Yes (identity.domain.disabled)
      disabled  project                  Yes (identity.project.disabled)
      internal  INVALIDATE_TOKEN_CACHE   No (public=False, internal only)

      The INVALIDATE_TOKEN_CACHE internal event is triggered by:

      • Role assignment create/delete (assignment/core.py) - but identity.role_assignment.created/deleted IS sent publicly
      • User password change (api/users.py, identity/core.py) - identity.user.updated IS sent publicly
      • OAuth token authorization (api/os_oauth1.py)
      • Federation IdP delete/disable (federation/core.py) - identity.identity_provider.deleted/disabled IS sent publicly
      • Project deletion (resource/core.py) - identity.project.deleted IS sent publicly

      Conclusion: All relevant events that should trigger token cache invalidation ARE available as public oslo.messaging notifications. The new service can consume them.

      Events to handle

      The new service should listen for the following oslo.messaging event types:

      • identity.user.deleted
      • identity.user.disabled
      • identity.user.updated (covers password changes)
      • identity.domain.deleted
      • identity.domain.disabled
      • identity.project.disabled
      • identity.project.deleted
      • identity.OS-TRUST:trust.deleted
      • identity.role_assignment.created
      • identity.role_assignment.deleted
      • identity.identity_provider.deleted
      • identity.identity_provider.disabled

      Cache Invalidation Approach

      Keystone calls TOKENS_REGION.invalidate() on these events, which invalidates the entire token cache region (all users, not per-user) in that process. The next token validation cache-misses and re-validates from scratch.

      Since the new service shares Keystone's codebase, it uses the same TOKENS_REGION dogpile.cache region backed by regionTwo's memcached via the [cache] config. When it receives relevant notifications, it calls _drop_token_cache -> TOKENS_REGION.invalidate() – exactly the same codepath as a full Keystone.

      Each region has its own memcached server(s), so the invalidation only affects the local region.

      Open question: TOKENS_REGION.invalidate() is process-local by default in dogpile.cache. Need to evaluate whether a distributed RegionInvalidationStrategy (storing the invalidation timestamp in memcached) is needed so that all processes sharing the same memcached see the invalidation, or whether the single-process listener model is sufficient.
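
      One possible answer, sketched under the assumption that storing an invalidation timestamp in the shared memcached is acceptable: a strategy object with the same interface as dogpile.cache's RegionInvalidationStrategy, keeping the timestamp in the shared backend instead of process memory so every process on the same memcached observes it. The class name and key are hypothetical, and a plain dict stands in for the memcached client here.

```python
import time

# Hypothetical sketch; a real version would subclass
# dogpile.cache.region.RegionInvalidationStrategy and use the region's
# memcached client as the shared store.
INVALIDATION_KEY = 'keystone.tokens.invalidated_at'


class SharedInvalidationStrategy:
    def __init__(self, store):
        self._store = store  # stand-in for a memcached client (get/set)

    def invalidate(self, hard=True):
        # Record "everything cached before now is stale" in the shared store.
        self._store[INVALIDATION_KEY] = time.time()

    def is_invalidated(self, timestamp):
        # dogpile calls this with each cached value's creation time.
        stamp = self._store.get(INVALIDATION_KEY)
        return stamp is not None and timestamp < stamp

    # dogpile also requires the hard/soft variants; with hard-only
    # invalidation they can delegate to the methods above.
    def was_hard_invalidated(self):
        return self._store.get(INVALIDATION_KEY) is not None

    def is_hard_invalidated(self, timestamp):
        return self.is_invalidated(timestamp)

    def was_soft_invalidated(self):
        return False

    def is_soft_invalidated(self, timestamp):
        return False
```

      With this approach the listener process calls invalidate(), and every keystonemiddleware worker sharing the same memcached sees cached validations older than the stored timestamp as stale on its next lookup.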

      Tasks

      1. Upstream Keystone implementation

      • [ ] Propose a new entry point / command in the Keystone project (e.g., keystone-cache-invalidator or keystone-listener)
      • [ ] Reuse the existing _register_callback_listeners / _drop_token_cache infrastructure
      • [ ] The service should consume oslo.messaging notifications and fire the same internal callbacks
      • [ ] Determine the pool_name strategy (each region's listener should have a unique pool_name so all regions receive all messages)
      • [ ] No database connection needed – the service only needs oslo.messaging (RabbitMQ) and oslo.cache (memcached)

      2. Keystone notification topic configuration

      • [ ] Currently Keystone is configured with topics = barbican_notifications – need to add a topic for session invalidation or reuse notifications (the default oslo.messaging topic)
      • [ ] Ensure notification format (basic vs cadf) includes all required event fields
      • [ ] Verify that the events listed above are actually emitted with the current notification_format setting
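
      As an illustration only (the topic choice and format are assumptions that these tasks would confirm), the keystone.conf changes might look like:

```ini
[DEFAULT]
# 'basic' emits identity.<resource>.<action> events; whether every event
# listed above is emitted with this format still needs to be verified.
notification_format = basic

[oslo_messaging_notifications]
driver = messagingv2
# Keep the existing Barbican topic and add the default 'notifications'
# topic for the region cache-invalidation listeners.
topics = barbican_notifications,notifications
```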

      3. Operator integration (keystone-operator)

      • [ ] Add support for deploying the new Keystone listener in non-central regions
      • [ ] Configure the service's transport_url to point to regionOne's RabbitMQ (via Skupper listener or direct connection)
      • [ ] Configure [cache] to point to regionTwo's local memcached
      • [ ] Mount appropriate TLS certificates (memcached certs, RabbitMQ CA)
      • [ ] Add RBAC, ServiceAccount, health checks
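
      A hypothetical configuration for the listener in regionTwo (hostnames, credentials, and the cache backend choice are placeholders, not values the operator renders today):

```ini
[DEFAULT]
# RabbitMQ in regionOne, reached via a Skupper listener or directly.
transport_url = rabbit://keystone:PASSWORD@rabbitmq-region1.example.com:5671/?ssl=1

[cache]
enabled = true
backend = dogpile.cache.pymemcache
# regionTwo's local memcached, shared with keystonemiddleware.
memcache_servers = memcached-region2.example.com:11211
tls_enabled = true
```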

      4. Skupper / cross-region connectivity

      • [ ] Ensure the Skupper connector/listener setup from OSPRH-25296 can be reused (or extended) for this service's RabbitMQ connectivity
      • [ ] OR configure direct connection to regionOne RabbitMQ if Skupper is not available

      5. Testing

      • [ ] Test: disable user in regionOne -> verify token rejected in regionTwo immediately (not after 300s TTL)
      • [ ] Test: change user password in regionOne -> verify old token rejected in regionTwo
      • [ ] Test: remove role assignment -> verify token with that role is rejected
      • [ ] Test: delete project -> verify project-scoped tokens are rejected
      • [ ] Test: multiple listeners (regionTwo, regionThree) all receive and process events
      • [ ] Test: listener restart / reconnection behavior

      6. Documentation

      • [ ] Document the architecture and data flow
      • [ ] Document configuration options for the new service
      • [ ] Update multi-region deployment guide

      Dependencies:

      • OSPRH-25296 (cross-region Keystone notification delivery via Skupper)
      • keystone-operator
      • Upstream Keystone spec / review

      Acceptance Criteria:

      • A new Keystone service (sharing the Keystone codebase) runs in non-central regions (regionTwo, regionN)
      • When a token-invalidating event occurs in regionOne (user disabled, password changed, role removed, etc.), the token cache in regionTwo's memcached is invalidated within seconds
      • No stale sessions persist beyond the invalidation event
      • The service is deployed and managed via the keystone-operator

              Grzegorz Grasza (ggrasza@redhat.com)
              Douglas Mendizabal (dmendiza)
              rhos-dfg-security
              Votes: 0
              Watchers: 3