RHIDP-4734: When scaling the deployment to 3 pods + redis cache, RBAC roles are not synced


    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • Affects Version: 1.4
    • Fix Version: 1.4
    • Component: RBAC Plugin
    • Story Points: 5
    • Linked Issue: RHIDP-3055 - Support High Availability
    • Release Note Text:

      Fix RBAC API consistency issue when scaling deployment replicas above 1 pod.

      Before this update, when scaling the deployment to above 1 pod, RBAC roles were not synced: only the pod that created the resource would be able to serve it afterwards.

      With this update, RBAC roles are properly synced across all pods, with Redis cache and traffic routing configured to ensure consistency across the deployment.
    • Release Note Type: Bug Fix
    • Release Note Status: Done
    • Sprint: RHDH Plugins 3265, RHDH Plugins 3266
    • Severity: Critical

      Description of problem:

      When scaling the deployment to 3 pods + Redis cache, RBAC roles are not synced: only the pod that actually creates the resource will serve it afterwards. This happens when creating the role via both the UI and the REST API.
      The issue was found as part of manual testing of the HA scenario for 1.4.

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce

      While testing the HA scenario with these steps (a minimal command sketch follows the list):

      1. Installed an RHDH instance with RBAC enabled, Keycloak authentication, user ingestion, a few catalog resources registered, and Redis cache enabled.
         • Scaled the deployment to 3 pods.
         • Changed the OpenShift route traffic policy to round-robin (the scenario needed to be forced, ensuring that all pods would receive incoming requests).
         • Added a static token to the app-config.
      2. Sent batches of 10, 30, 50, and 100 REST API calls to add a new location, with a random GUID as the User-Agent header.
         • Verified the pod logs to ensure all pods were responding to those requests.
         • Observed behaviour: pods replied with a 201 Created status code only once, and with 500 or 409 Conflict for all the other requests, meaning Backstage is correctly handling the conflicts.
      3. Performed a batch of 50 requests to get the location just created; all pods responded with the correct resource, meaning all pods are acting in sync.
      4. Edited the OpenShift service to serve traffic from one pod at a time (by changing the label selectors): created a new location again, and the UI showed consistent results across all pods.
      5. Performed steps 1-4 again, but creating an RBAC role instead (see the role example and request sketch below). The results were not consistent and the pods were not aligned.
         • When creating the role via the UI, only the pod that actually created the resource served it afterwards. The other pods did not show the new role.
         • When creating the role via the REST API, only the first pod completing the request would return it afterwards; the other pods returned a 404 error.
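
      A minimal sketch of steps 1-2 above, assuming the deployment is named rhdh-developer-hub, the route is named rhdh and exposed at rhdh.example.com, and a static token (configured in the app-config) is exported as RHDH_TOKEN; all of these names are hypothetical. The location is created through the standard Backstage /api/catalog/locations endpoint:

          # Scale the RHDH deployment to 3 replicas (deployment name is an assumption)
          oc scale deployment/rhdh-developer-hub --replicas=3

          # Force round-robin balancing on the route so all pods receive incoming requests
          oc annotate route/rhdh haproxy.router.openshift.io/balance=roundrobin --overwrite

          # Send a batch of location-creation requests, each tagged with a random GUID User-Agent
          for i in $(seq 1 50); do
            curl -s -o /dev/null -w "%{http_code}\n" \
              -X POST "https://rhdh.example.com/api/catalog/locations" \
              -H "Authorization: Bearer ${RHDH_TOKEN}" \
              -H "Content-Type: application/json" \
              -H "User-Agent: $(uuidgen)" \
              -d '{"type": "url", "target": "https://github.com/example/repo/blob/main/catalog-info.yaml"}'
          done

      Only the first request is expected to return 201 Created; the rest should return 409 Conflict (or 500), matching the behaviour described in step 2.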

      Example of the role created:

      {
          "memberReferences": [
              "user:default/guest"
          ],
          "name": "role:default/somerole",
          "metadata": {
              "description": ""
          }
      }
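
      A hedged example of creating the same role through the RBAC backend REST API, assuming the payload above is saved as role.json, the static token is exported as RHDH_TOKEN, and the roles endpoint is /api/permission/roles (endpoint path and host are assumptions):

          # Create the role via the RBAC REST API; in the failing scenario only the pod
          # that handles this request will serve the role afterwards
          curl -s -w "%{http_code}\n" \
            -X POST "https://rhdh.example.com/api/permission/roles" \
            -H "Authorization: Bearer ${RHDH_TOKEN}" \
            -H "Content-Type: application/json" \
            -d @role.json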

      The same role was also created via the UI.


      The role is always created successfully. Only the pod that completed the creation request (via either the UI or the API) will serve the role back afterwards.
      This was verified by targeting the pods one by one (editing the OpenShift service selector) and sending a batch of API requests, which returned different results depending on the pod that was hit. The logs of each pod showed 200/404 responses aligned with the UI/API behaviour (see the per-pod check sketch below).
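
      A minimal sketch of the per-pod check, assuming the service is named rhdh, a unique label (e.g. pod-under-test=pod-1, hypothetical) has been added to the pod being targeted, and the role is served at /api/permission/roles/role/default/somerole (path assumed from the role name above):

          # Point the service at a single pod by narrowing its selector (label is hypothetical)
          oc patch service/rhdh --type merge \
            -p '{"spec": {"selector": {"pod-under-test": "pod-1"}}}'

          # Fetch the role created earlier; only the pod that handled the creation
          # returns 200, while the other pods return 404
          for i in $(seq 1 10); do
            curl -s -o /dev/null -w "%{http_code}\n" \
              "https://rhdh.example.com/api/permission/roles/role/default/somerole" \
              -H "Authorization: Bearer ${RHDH_TOKEN}"
          done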

       

      Actual results:

      Roles are not served consistently across all pods.

      Expected results:

      Roles are served consistently across all pods.

      Reproducibility (Always/Intermittent/Only Once):

      Always. Tested with 1.3.1.

      Build Details:

      Additional info (Such as Logs, Screenshots, etc):

              Assignee: Aleksander Andriienko (oandriie)
              Reporter: Alessandro Barbarossa (rh-ee-abarbaro)
              Team: RHIDP - Plugins