Managed Service - Streams / MGDSTRM-8846 RHOSAK Product Quality Requirements (Planning, Pipelines, Monitoring) / MGDSTRM-8848

Planning: Resiliency and fault tests to ensure that the service is resilient to cloud provider, OpenShift, and application errors


    • Issue type: Sub-task
    • Resolution: Done

      Best Practices and Examples

      Fault tests for a service should create fault conditions at the cloud provider, OpenShift, and service levels to verify that the service under test is resilient to them. The tests should also verify that the service generates the appropriate alerts and metrics so that SRE can respond to fault situations.
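One way to automate the alert check is sketched below. This is a minimal sketch, not code from the RHOSAK fault tests. It assumes the cluster's Prometheus exposes the standard `GET /api/v1/alerts` HTTP API and that the test has already fetched the response body. The helpers parse that JSON and report which expected alerts are not firing. The helper names and the alert names in the example (`KafkaBrokerDown`, `KafkaUnderReplicatedPartitions`) are hypothetical and are not taken from the RHOSAK alerting rules.

```python
import json
from typing import Iterable, Set


def firing_alerts(payload: str) -> Set[str]:
    """Return the names of alerts currently in the 'firing' state.

    `payload` is assumed to be the JSON body of Prometheus' GET /api/v1/alerts,
    shaped like {"status": "success", "data": {"alerts": [...]}}.
    """
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {body.get('error', 'unknown error')}")
    return {
        alert["labels"]["alertname"]
        for alert in body["data"]["alerts"]
        # 'pending' alerts have matched their expression but not yet for long enough.
        if alert.get("state") == "firing"
    }


def missing_alerts(payload: str, expected: Iterable[str]) -> Set[str]:
    """Return the expected alert names that are NOT currently firing."""
    return set(expected) - firing_alerts(payload)


if __name__ == "__main__":
    # Hypothetical response captured after injecting a broker fault.
    sample = json.dumps({
        "status": "success",
        "data": {"alerts": [
            {"labels": {"alertname": "KafkaBrokerDown", "severity": "critical"},
             "state": "firing"},
            {"labels": {"alertname": "KafkaUnderReplicatedPartitions"},
             "state": "pending"},
        ]},
    })
    # Prints ['KafkaUnderReplicatedPartitions']: that alert is still pending.
    print(sorted(missing_alerts(sample, ["KafkaBrokerDown",
                                         "KafkaUnderReplicatedPartitions"])))
```

A test would typically fail when `missing_alerts` returns a non-empty set after the alert's `for` duration has elapsed. Checking the state explicitly, rather than only checking that an alert exists, avoids counting alerts that are still pending.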

      For an example of fault-test documentation, see:

      Epic Brief - Red Hat Managed Streams - Data Plane Fault Testing 

      For an example of fault-test code, see:

      The RHOSAK Managed Service fault tests inject faults and outages on a provisioned OpenShift Dedicated (OSD) cluster, focusing on Kafka components. The tests then wait for the system to recover and check for the expected alerts. The Krkn tool is used to reproduce some of the fault scenarios: https://github.com/chaos-kubox/krkn
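The "wait for the system to recover" step is essentially a polling loop with a deadline, sketched below. This is a minimal sketch under stated assumptions, not the kas-fault-tests implementation, which is written in bats. `wait_for_recovery` and its parameters are hypothetical. `is_healthy` stands for whatever probe a test uses, for example a check that all Kafka broker pods are Ready. The timeout and interval values are illustrative only.

```python
import time
from typing import Callable


def wait_for_recovery(
    is_healthy: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `is_healthy` until it returns True or `timeout_s` elapses.

    Returns the seconds taken to recover, so a test can also assert on
    recovery time. Raises TimeoutError if the deadline passes first.
    `clock` and `sleep` are injectable so the loop can be tested instantly.
    """
    start = clock()
    deadline = start + timeout_s
    while True:
        if is_healthy():
            return clock() - start
        if clock() >= deadline:
            raise TimeoutError(f"system did not recover within {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated clock so the example runs instantly; the hypothetical
    # probe reports healthy on its third call.
    now = [0.0]
    calls = [0]

    def fake_sleep(seconds: float) -> None:
        now[0] += seconds

    def probe() -> bool:
        calls[0] += 1
        return calls[0] >= 3

    elapsed = wait_for_recovery(probe, timeout_s=60, interval_s=10,
                                clock=lambda: now[0], sleep=fake_sleep)
    print(f"recovered after {elapsed:.0f}s")  # recovered after 20s
```

Returning the elapsed time, rather than a plain boolean, lets a test compare recovery time against an agreed target as well as check that recovery happened at all. Once the loop returns, the alert check can run.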

      Repo: https://gitlab.cee.redhat.com/mk-ci-cd/kas-fault-tests 

      Pipelines: https://main-jenkins-csb-mas.apps.ocp-c1.prod.psi.redhat.com/job/faults/ 

      Test Framework: bats (Bash Automated Testing System), a testing framework for Bash scripts (https://github.com/bats-core/bats-core)

      Steps to Follow to Contribute a New Automated Test:
      https://gitlab.cee.redhat.com/mk-ci-cd/kas-fault-tests/-/blob/main/README.md

      Results:

      Test results are available here: https://reportportal-cloud-services.apps.ocp-c1.prod.psi.redhat.com/ui/#rhosak/dashboard/69 

      Where to Get Help

      QE works closely with the Chaos Engineering team (lead: Naga Ravi Chaitanya Elluri) to develop fault tests for managed services. This collaboration covers, in particular, the use of the Kraken fault test tool (https://github.com/cloud-bulldozer/kraken), and the tests follow the best practices that team has defined.

      Chaos Engineering Best Practices
      https://github.com/chaos-kubox/krkn/blob/main/docs/index.md 

      Chaos Engineering Team
      https://docs.google.com/document/d/1AnmEVBR48rKJSR1s2c-hOPgnLrWCkhggNNlwO6Trfkk/edit?usp=sharing 

       

              Alejandro Gullón (agullon)
              Len DiMaggio (ldimaggi@redhat.com)
              MK - Super Awesome BU