Sub-task
Resolution: Done
Best Practices and Examples
Fault tests for a service should create fault situations at the Cloud Provider, OpenShift, and service levels to verify that the service under test is resilient to these conditions. The tests should also verify that the service generates appropriate alerts and metrics so that SRE can respond to fault situations.
For example documentation, see:
Epic Brief - Red Hat Managed Streams - Data Plane Fault Testing
For example code, see:
The RHOSAK Managed Service fault tests produce faults and outages on a provisioned OSD cluster, focusing on Kafka components. Each test waits for the system to recover and then checks for alerts. The Krkn tool is used to recreate some fault scenarios: https://github.com/chaos-kubox/krkn
Repo: https://gitlab.cee.redhat.com/mk-ci-cd/kas-fault-tests
Pipelines: https://main-jenkins-csb-mas.apps.ocp-c1.prod.psi.redhat.com/job/faults/
Test Framework: bats, a bash scripting framework (https://github.com/bats-core/bats-core)
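The flow described above (inject a fault, wait for recovery, check for alerts) can be sketched in plain bash. This is a minimal illustration under stated assumptions, not code from the kas-fault-tests repo. All function names (wait_until, inject_fault, service_is_healthy, alert_was_fired, run_fault_test) are hypothetical, and the "service" is simulated with marker files so the script runs without a cluster. It uses plain bash functions rather than bats test syntax so that it can run standalone.

```shell
#!/usr/bin/env bash
# Sketch of the inject -> wait for recovery -> check alert flow.
# Every name below is hypothetical; the service is simulated with
# marker files in a temporary directory.

STATE_DIR="$(mktemp -d)"
touch "$STATE_DIR/healthy"

# Poll a command until it succeeds or the timeout (in seconds) expires.
# Usage: wait_until <timeout_s> <interval_s> <command> [args...]
wait_until() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  local deadline=$(( SECONDS + timeout_s ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}

# Simulated fault: the service becomes unhealthy, an alert fires, and the
# service recovers about one second later. A real test would trigger the
# fault with Krkn or a cluster command instead (assumption).
inject_fault() {
  rm -f "$STATE_DIR/healthy"
  touch "$STATE_DIR/alert_fired"
  ( sleep 1; touch "$STATE_DIR/healthy" ) &
}

# Simulated checks; a real test would query the cluster and the
# alerting stack instead (assumption).
service_is_healthy() { [[ -f "$STATE_DIR/healthy" ]]; }
alert_was_fired()    { [[ -f "$STATE_DIR/alert_fired" ]]; }

# One fault test: fail if the service does not recover in time or if
# no alert was raised while the fault was active.
run_fault_test() {
  inject_fault
  if ! wait_until 10 1 service_is_healthy; then
    echo "FAIL: service did not recover"
    return 1
  fi
  if ! alert_was_fired; then
    echo "FAIL: no alert was raised"
    return 1
  fi
  echo "PASS"
}
```

Calling run_fault_test in this simulation prints PASS. The bounded polling loop is the key design choice: a fault test should fail with a clear message when recovery takes too long, rather than hang the pipeline.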
Steps to Follow to Contribute a New Automated Test:
https://gitlab.cee.redhat.com/mk-ci-cd/kas-fault-tests/-/blob/main/README.md
Results:
Test results are available here: https://reportportal-cloud-services.apps.ocp-c1.prod.psi.redhat.com/ui/#rhosak/dashboard/69
Where to Get Help
QE works closely with the Chaos Engineering team (Naga Ravi Chaitanya Elluri, Lead) to develop fault tests for managed services, especially in the use of the Kraken (Krkn) fault test tool (https://github.com/cloud-bulldozer/kraken), and follows that team's defined best practices.
Chaos Engineering Best Practices
https://github.com/chaos-kubox/krkn/blob/main/docs/index.md
Chaos Engineering Team
https://docs.google.com/document/d/1AnmEVBR48rKJSR1s2c-hOPgnLrWCkhggNNlwO6Trfkk/edit?usp=sharing