Red Hat build of Keycloak / RHBK-2958

Cluster is not correctly formed with JDBC_PING2 [GHI#38550]


      Before reporting an issue

      [x] I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

      Area

      infinispan

      Describe the bug

      When running the clustering tests, Keycloak nodes intermittently (very rarely, roughly 1 in 200 runs) fail to form a cluster correctly. The result is a split-brain, which means invalidation messages are not delivered.

      This seems to be an issue in JDBC_PING2, see JGRP-2870 for more details.

      Version

      main

      Regression

      [ ] The issue is a regression

      Expected behavior

      Cluster must be formed with no excuses 🤣

      Actual behavior

      Cluster is sometimes not formed

      How to Reproduce?

      Add a test like this to cluster tests:

      
      

      @Test
      public void testClusterFormed() {
          // Repeatedly kill and restart a node, then check the cluster size on every backend node.
          for (int i = 0; i < 100; i++) {
              log.infof("Iteration %d start", i);
              log.info("---------- Killing node");
              failure();
              log.info("---------- Node killed");
              WaitUtils.pause(100);
              log.info("---------- Starting node");
              failback();
              log.info("---------- Node started");
              WaitUtils.pause(100);

              // Run the cluster size check on each backend node.
              for (ContainerInfo node : suiteContext.getAuthServerBackendsInfo()) {
                  KeycloakTestingClient testingClientFor = getTestingClientFor(node);

                  testingClientFor.server().run(CheckClusterSize::clusterSize);
              }
              log.infof("Iteration %d end", i);
          }
      }
      
      and run a workflow like this:
      

      name: Stability - Clustering

      on:
        workflow_dispatch:

      env:
        MAVEN_ARGS: "-B -nsu -Daether.connector.http.connectionMaxTtl=25"
        SUREFIRE_RERUN_FAILING_COUNT: 0
        SUREFIRE_RETRY: "-Dsurefire.rerunFailingTestsCount=0"

      defaults:
        run:
          shell: bash

      jobs:
        build:
          name: Build
          runs-on: ubuntu-latest
          steps:
            - uses: actions/checkout@v4

            - name: Build Keycloak
              uses: ./.github/actions/build-keycloak

        clustering-integration-tests:
          name: Clustering IT
          needs: build
          runs-on: ubuntu-latest
          timeout-minutes: 35
          env:
            MAVEN_OPTS: -Xmx1024m
          strategy:
            matrix:
              group1: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ]
              group2: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ]
            fail-fast: false
          steps:
            - uses: actions/checkout@v4

            - id: integration-test-setup
              name: Integration test setup
              uses: ./.github/actions/integration-test-setup

            - name: Run cluster tests
              run: |
                ./mvnw test ${{ env.SUREFIRE_RETRY }} -Pauth-server-cluster-quarkus,db-postgres "-Dwebdriver.chrome.driver=$CHROMEWEBDRIVER/chromedriver" -Dsession.cache.owners=2 -Dtest=RoleInvalidationClusterTest#testClusterFormed -pl testsuite/integration-arquillian/tests/base
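
The body of CheckClusterSize::clusterSize is not included in this report. As a rough illustration only, the kind of condition such a check needs to verify could look like the sketch below. It assumes each node can report the set of members it currently sees. The class name ClusterViewCheck, the method isFullyFormed and the node names are hypothetical and are not part of Keycloak, Infinispan or JGroups.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not Keycloak, Infinispan or JGroups code. */
public class ClusterViewCheck {

    /**
     * Returns true when every node reports the same member set and that set
     * has the expected size. viewsByNode maps a node name to the members that
     * node currently sees (an assumed input format).
     */
    public static boolean isFullyFormed(Map<String, Set<String>> viewsByNode, int expectedSize) {
        if (viewsByNode.size() != expectedSize) {
            return false; // at least one node did not report a view
        }
        Set<String> reference = null;
        for (Set<String> view : viewsByNode.values()) {
            if (view.size() != expectedSize) {
                return false; // this node sees too few (or too many) members
            }
            if (reference == null) {
                reference = view;
            } else if (!reference.equals(view)) {
                return false; // nodes disagree on membership
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Both nodes see each other: the cluster is formed.
        Map<String, Set<String>> healthy = Map.of(
                "node1", Set.of("node1", "node2"),
                "node2", Set.of("node1", "node2"));
        // Each node only sees itself: the split-brain described in this issue.
        Map<String, Set<String>> splitBrain = Map.of(
                "node1", Set.of("node1"),
                "node2", Set.of("node2"));

        System.out.println(isFullyFormed(healthy, 2));    // true
        System.out.println(isFullyFormed(splitBrain, 2)); // false
    }
}
```

In the split-brain case each node forms its own single-member cluster, so a per-node size check fails even though every node is up and responding.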
      
      
      

      Anything else?

      No response

              Unassigned Unassigned
              pvlha Pavel Vlha
              Keycloak SRE