• Task
    • Resolution: Done
    • Undefined
    • 1.5.0
    • None
    • Performance, Quality
    • 3
    • RHDH Security 3264, RHDH Security 3265, RHDH Security 3266, RHDH Security 3267/3268, RHDH Security 3270

      In order to test the High Availability scenario, we need to understand how to set up the infrastructure and which test cases are required to cover it.
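For reference, the Redis cache mentioned in the test setup below is enabled through the Backstage backend cache settings in app-config; a minimal sketch (the connection host and password are placeholders, not values from this issue):

```yaml
backend:
  cache:
    store: redis
    connection: redis://:example-password@redis.example.svc.cluster.local:6379
```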

            [RHIDP-4475] [QE] Test the High Availability scenario

            Alessandro Barbarossa added a comment - - edited

            Reproduced the same tests in 1.5. All issues are solved. The setup is returning consistent results.


            Kim Tsao added a comment -

            Moving this to TODO until we can pick up BS 1.34.0


            Alessandro Barbarossa added a comment - - edited

            How was this tested?

            • Installed an RHDH instance with RBAC enabled, Keycloak authentication, user ingestion, a few catalog resources registered, and Redis cache enabled.
              Scaled the deployments to 3 pods.
              Changed the OpenShift route traffic policy to round-robin (the scenario needed to be forced, ensuring that all pods would receive incoming requests).
              Added a static token to the app-config to perform API calls in an automated way.
            • Sent batches of 10, 30, 50, and 100 REST API calls to add a new location, with a random GUID as the User-Agent header.
              Verified the pod logs to ensure all the pods were responding to those requests.
              Observed behaviour: the pods replied with a '201 Created' status code only once, and with 500 or 409 Conflict for all the other requests, meaning the Backstage backend is correctly handling the conflicts.
            • Performed a batch of 50 requests to get the location just created; all pods responded with the correct resource, meaning all pods are acting in sync.
            • Edited the OpenShift service to serve traffic from one pod at a time (by changing the label selectors): created a new location again, and the UI showed consistent results across all pods.
            • Performed steps 1-4 again, but creating an RBAC role instead. The results were not consistent and the pods were not aligned.
              When creating the role via the UI, only the pod that actually created the resource served it afterwards. The other pods didn't show the new role.
              When creating the role via the REST API, only the pod that actually created the role would return it afterwards; the other pods would return a 404 error.
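The batch steps above can be sketched in shell as follows. This is a hedged sketch, not the exact commands used: `RHDH_URL`, `RHDH_TOKEN`, and the `redhat-developer-hub` deployment/route names are assumptions, while the round-robin route annotation and the catalog locations endpoint are standard OpenShift/Backstage features.

```shell
# Force round-robin across the 3 pods (setup as described in the comment above):
# oc scale deployment/redhat-developer-hub --replicas=3
# oc annotate route redhat-developer-hub \
#   haproxy.router.openshift.io/balance=roundrobin --overwrite

# Send N location-creation requests, each with a random GUID User-Agent,
# printing only the HTTP status code of each response.
send_batch() {
  n="$1"
  for _ in $(seq 1 "$n"); do
    curl -s -o /dev/null -w '%{http_code}\n' \
      -X POST "$RHDH_URL/api/catalog/locations" \
      -H "Authorization: Bearer $RHDH_TOKEN" \
      -H "User-Agent: $(uuidgen)" \
      -H 'Content-Type: application/json' \
      -d '{"type":"url","target":"https://example.com/catalog-info.yaml"}'
  done
}

# Tally the status codes: a healthy run shows exactly one 201,
# with 409/500 for all the other requests.
tally() {
  sort | uniq -c | sort -rn
}

# Usage: send_batch 50 | tally
```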

            The RBAC sync behaviour should have been fixed by this PR (https://github.com/janus-idp/backstage-plugins/pull/1757), but it seems it's not working with RHDH 1.3.1. Opened a bug to track it: https://issues.redhat.com/browse/RHIDP-4734
            As already commented by Kim, DB failover is out of scope for automated testing since it may be difficult to achieve, and performance testing is already covered.
            Cache tests are already covered by https://github.com/janus-idp/backstage-showcase/pull/1480
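The per-pod divergence described above (some pods returning the role, others returning 404) can be checked by querying each pod directly instead of going through the route. A hedged sketch, assuming the Backstage backend's default port 7007; the pod label selector and the role path are hypothetical examples:

```shell
# Hypothetical role reference for illustration only.
ROLE_PATH="/api/permission/roles/role/default/test-role"

# Succeeds only when every input line is identical,
# i.e. all pods returned the same response body.
all_equal() {
  [ "$(sort -u | wc -l | tr -d ' ')" -le 1 ]
}

# Query each pod directly, bypassing the route:
# for pod in $(oc get pods -l app.kubernetes.io/name=backstage -o name); do
#   oc exec "$pod" -- curl -s "http://localhost:7007$ROLE_PATH" \
#     -H "Authorization: Bearer $RHDH_TOKEN"
# done | all_equal && echo "pods in sync" || echo "pods diverged"
```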


            Kim Tsao added a comment - - edited

            As discussed in our office hours today, DB failover is out of scope for automated testing since it may be difficult to achieve. We can consider it for manual testing in the future, but we'll treat it as a lower priority for now.

            As far as perf testing is concerned, it should have been covered as part of https://issues.redhat.com/browse/RHIDP-641


              rh-ee-abarbaro Alessandro Barbarossa
              RHIDP - Security