Red Hat build of Keycloak / RHBK-3736

Race condition in authorization service leads to NullPointerException when evaluating permissions during concurrent resource deletion [GHI#42907]

      Before reporting an issue

      [x] I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

      Area

      authorization-services

      Describe the bug

      When running Keycloak in a clustered environment with multiple pods, deleting an Authorization Resource on one pod can trigger a NullPointerException on another pod that is simultaneously evaluating permissions for the same resource.

      This appears to be caused by a race condition between resource deletion, cache invalidation, and permission evaluation.
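
The suspected interleaving can be replayed deterministically in a single thread. The following is only a sketch of that hypothesis, not Keycloak code: `RaceResource`, `RaceSketch`, the in-memory `sharedStore`, and the "stale cached ID list" are all assumptions made for illustration. The actual caching and lookup path inside the authorization service may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an authorization resource; not Keycloak's real class.
final class RaceResource {
    private final List<String> scopes;

    RaceResource(List<String> scopes) {
        this.scopes = scopes;
    }

    List<String> getScopes() {
        return scopes;
    }
}

final class RaceSketch {
    // Plays the role of the database shared by both pods (assumption).
    static final Map<String, RaceResource> sharedStore = new ConcurrentHashMap<>();

    // Replays the suspected ordering step by step and reports whether it ends in an NPE.
    static boolean staleLookupThrowsNpe() {
        sharedStore.put("res-1", new RaceResource(List.of("view")));

        // Pod B: resource IDs collected before the deletion (assumed stale cache entry).
        List<String> cachedIds = List.of("res-1");

        // Pod A: deletes the resource; pod B has not processed the invalidation yet.
        sharedStore.remove("res-1");

        try {
            for (String id : cachedIds) {
                RaceResource resource = sharedStore.get(id); // null, the entry is gone
                resource.getScopes();                        // unguarded dereference
            }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on stale lookup: " + staleLookupThrowsNpe());
    }
}
```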

      Impact: Users experience intermittent login failures (500 errors) when resource deletion overlaps with client login/permission evaluation. This issue is especially visible in multi-pod deployments with concurrent resource management operations.

      Version

      26.0.7

      Regression

      [ ] The issue is a regression

      Expected behavior

      Permission evaluation should handle concurrent deletion gracefully. If a resource no longer exists, Keycloak should either: 1) Skip evaluation of that resource, or 2) Fail with a controlled error (not an uncaught NPE).
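
Both options can be sketched with a null check at the point where the resource is resolved. This is a minimal illustration under stated assumptions, not a proposed patch: `SketchResource`, `SketchResourceStore`, `ResourceGoneException`, and `NullSafeEvaluation` are hypothetical names that do not exist in Keycloak, and the store is a plain in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; these are not Keycloak's real classes.
final class SketchResource {
    final String id;
    final List<String> scopes;

    SketchResource(String id, List<String> scopes) {
        this.id = id;
        this.scopes = scopes;
    }
}

final class SketchResourceStore {
    private final Map<String, SketchResource> byId = new ConcurrentHashMap<>();

    void put(SketchResource resource) {
        byId.put(resource.id, resource);
    }

    void delete(String id) {
        byId.remove(id);
    }

    // Returns null once the resource has been deleted.
    SketchResource findById(String id) {
        return byId.get(id);
    }
}

// Controlled error for option 2 (hypothetical exception type).
final class ResourceGoneException extends RuntimeException {
    ResourceGoneException(String id) {
        super("Resource no longer exists: " + id);
    }
}

final class NullSafeEvaluation {
    // Option 1: skip resources that disappeared between listing and lookup.
    static List<String> collectScopesSkippingMissing(SketchResourceStore store, List<String> ids) {
        List<String> scopes = new ArrayList<>();
        for (String id : ids) {
            SketchResource resource = store.findById(id);
            if (resource == null) {
                continue; // deleted concurrently, nothing to evaluate
            }
            scopes.addAll(resource.scopes);
        }
        return scopes;
    }

    // Option 2: fail with a controlled, descriptive error instead of an NPE.
    static List<String> collectScopesOrFail(SketchResourceStore store, List<String> ids) {
        List<String> scopes = new ArrayList<>();
        for (String id : ids) {
            SketchResource resource = store.findById(id);
            if (resource == null) {
                throw new ResourceGoneException(id);
            }
            scopes.addAll(resource.scopes);
        }
        return scopes;
    }
}
```

Option 1 keeps the login flow working when an unrelated resource is removed, while option 2 lets the caller map the failure to a specific error response rather than a generic 500.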

      Actual behavior

      The pod evaluating permissions throws a NullPointerException:

      Unexpected error while evaluating permissions: java.lang.RuntimeException: Failed to evaluate permissions
      Caused by: java.lang.NullPointerException: Cannot invoke "org.keycloak.authorization.model.Resource.getScopes()" because "resource" is null
      

      This corresponds to a 500 Internal Server Error response during login.

      How to Reproduce?

      Steps to Reproduce

      1. Run Keycloak with at least 2 pods behind a load balancer, sharing the same database.
      2. Create a client with fine-grained Authorization enabled and some resources with policies.
      3. Trigger a login flow for the client (which causes AuthorizationTokenService to evaluate permissions).
      4. At the same time, issue a DELETE request for one of the Authorization Resources in that client.
      5. Example: DELETE /{realm}/clients/{client-uuid}/authz/resource-server/resource/{resource-uuid}

      6. Observe the logs on the pod handling the login flow.
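
Because the failure depends on timing, it may help to release the login and the DELETE from a shared start signal instead of triggering them by hand. The helper below is a rough sketch of that idea: `ConcurrentTrigger` and `runTogether` are hypothetical names, and the two `Runnable` arguments are placeholders for whatever code performs the login and the DELETE request in a given test setup.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical test helper; not part of Keycloak.
final class ConcurrentTrigger {
    // Starts both tasks from a common latch so they begin as close together as possible.
    // Returns true if both tasks finished within the timeout.
    static boolean runTogether(Runnable login, Runnable delete) {
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (Runnable task : new Runnable[] {login, delete}) {
                pool.submit(() -> {
                    try {
                        start.await(); // wait until both workers are ready
                        task.run();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            start.countDown(); // release both tasks at once
            pool.shutdown();
            return pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // no effect if the pool has already terminated
        }
    }
}
```

A single run will not necessarily hit the race window, so repeating the call in a loop while watching the logs of the pod handling the login is likely to be needed.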

      Anything else?

      Keycloak version: 26.0.7

      Deployment: Kubernetes, multiple pods (stateless)

              Unassigned Unassigned
              pvlha Pavel Vlha
              Keycloak Core IAM