  1. OpenShift Virtualization
  2. CNV-73848

Downstream Quarantine and Dequarantine Management


    • downstream-quarantine-management
    • Quality / Stability / Reliability
    • 77
    • False
    • None
    • False
    • The quarantine management system must automatically identify and quarantine downstream tests (both Tier1 and Tier2) based on configurable failure thresholds from scheduled job results, storing quarantine records with full metadata (test name, timestamp, reason, owner, tier, linked Jira ticket) in the database with complete audit trails.

      The system must generate and send weekly automated reports containing quarantine counts by tier, newly quarantined tests, overdue cases (>30 days), and ownership status to designated team channels, while also notifying individual owners of their assignments.

      It must automatically monitor quarantined tests for stability improvements and recommend de-quarantine when criteria are met (configurable consecutive passes or pass rate thresholds), with both automated and manual de-quarantine workflows requiring justification and logging.

      CI/CD pipelines must integrate with the system to query quarantine status and skip quarantined tests without blocking runs, adding <5 seconds of overhead and including fail-safe fallback behavior.

      A dashboard must display current quarantine status with filtering, historical trends (90+ days retention), and metrics including average time in quarantine, de-quarantine success rate, and most frequently quarantined tests, all accessible via API endpoints.

      The system must include comprehensive documentation for users and technical operations, achieve >80% unit test coverage with end-to-end integration tests, implement proper access control for manual interventions, and demonstrate a >90% appropriate quarantine action rate after 2 weeks in production.
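
As a rough illustration of the record metadata and weekly report contents listed above, the following Python sketch models a quarantine record and aggregates the report figures. The names `QuarantineRecord` and `weekly_summary`, the 7-day window for "newly quarantined", and the reading of "overdue" as more than 30 days in quarantine are assumptions made for illustration; the epic does not define them.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class QuarantineRecord:
    """One quarantined test; fields mirror the metadata named in the criteria."""
    test_name: str
    quarantined_at: datetime
    reason: str
    tier: str                           # "Tier1" or "Tier2"
    owner: Optional[str] = None         # None means no owner assigned yet
    jira_ticket: Optional[str] = None   # linked Jira issue key, if any


def weekly_summary(
    records: Iterable[QuarantineRecord],
    now: datetime,
    overdue_after: timedelta = timedelta(days=30),  # from the ">30 days" criterion
    new_within: timedelta = timedelta(days=7),      # assumed report period
) -> dict:
    """Aggregate the figures the weekly report is expected to contain."""
    records = list(records)
    return {
        "count_by_tier": dict(Counter(r.tier for r in records)),
        "newly_quarantined": sorted(
            r.test_name for r in records if now - r.quarantined_at <= new_within
        ),
        "overdue": sorted(
            r.test_name for r in records if now - r.quarantined_at > overdue_after
        ),
        "unowned": sorted(r.test_name for r in records if r.owner is None),
    }
```

Delivery to team channels and owner notifications are left out on purpose, since the epic does not name the messaging mechanism.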

    • None
    • To Do
    • qe-ack, doc-ack, px-ack
    • dev-ready, doc-ready, po-ready, prodsec-ready, px-ready, qe-ready, ux-ready

      Goal

      Implement a comprehensive quarantine and de-quarantine management system for downstream Tier1 and Tier2 tests, establishing clear rules and an automated isolation mechanism. Quarantine decisions will be driven by scheduled job test data.
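
One possible shape for the quarantine and de-quarantine rules is sketched below in Python. It assumes each test has an ordered history of pass/fail results from scheduled jobs. The `Rules` class, its field names, and every default value are hypothetical placeholders; the actual thresholds are meant to be configurable and are not specified in this epic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rules:
    """Hypothetical configurable thresholds; the defaults are placeholders."""
    window: int = 10                 # number of most recent scheduled runs considered
    max_failure_rate: float = 0.3    # quarantine at or above this failure rate
    min_consecutive_passes: int = 5  # de-quarantine after this many passes in a row
    min_pass_rate: float = 0.95      # or at this pass rate over the window


def should_quarantine(results: Sequence[bool], rules: Rules = Rules()) -> bool:
    """results holds one entry per scheduled run, oldest first (True = pass)."""
    recent = results[-rules.window:]
    if len(recent) < rules.window:
        return False  # not enough scheduled runs yet to judge
    failures = sum(1 for passed in recent if not passed)
    return failures / len(recent) >= rules.max_failure_rate


def should_recommend_dequarantine(
    results: Sequence[bool], rules: Rules = Rules()
) -> bool:
    """True when either stability criterion is met for a quarantined test."""
    # Criterion 1: enough consecutive passes at the end of the history.
    streak = 0
    for passed in reversed(results):
        if not passed:
            break
        streak += 1
    if streak >= rules.min_consecutive_passes:
        return True
    # Criterion 2: pass rate over a full window of recent runs.
    recent = results[-rules.window:]
    if len(recent) < rules.window:
        return False
    return sum(recent) / len(recent) >= rules.min_pass_rate
```

Requiring a full window before quarantining is a design choice in this sketch, intended to avoid isolating a test on the strength of one or two early failures.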

      User Stories

      • Define downstream-specific quarantine and de-quarantine rules aligned with upstream principles, adapted to use scheduled job test results.
      • Develop an automated mechanism to identify tests meeting quarantine criteria based on scheduled job data.
      • Establish procedures for placing tests into quarantine, assigning ownership, and managing their lifecycle in the downstream context.
      • Count the quarantined cases and send the team a weekly reminder report on them.
      • Create processes for removing tests from quarantine once stability criteria are met, using scheduled job data for validation.
      • Ensure integration with downstream CI/CD pipelines to enforce quarantine status and maintain test suite stability. 
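
The CI/CD integration story above could look roughly like the Python sketch below. `fetch_quarantined` stands in for a hypothetical client call to the quarantine service; no such API is defined in this epic. The fallback direction (run every test when the lookup fails) is also an assumption, since the requirement only asks for fail-safe behavior without blocking runs.

```python
import logging
from typing import Callable, Iterable, List, Set

log = logging.getLogger("quarantine")


def select_tests_to_run(
    all_tests: Iterable[str],
    fetch_quarantined: Callable[[float], Set[str]],
    timeout_s: float = 5.0,  # mirrors the "<5 seconds overhead" requirement
) -> List[str]:
    """Return the tests to run, skipping quarantined ones.

    fetch_quarantined is a hypothetical callable that takes a timeout in
    seconds and returns the names of quarantined tests. If it raises for
    any reason, every test is returned so the pipeline is never blocked.
    """
    tests = list(all_tests)
    try:
        quarantined = set(fetch_quarantined(timeout_s))
    except Exception as exc:  # fail-safe: a failed lookup must not block the run
        log.warning("quarantine lookup failed, running all tests: %s", exc)
        return tests
    return [name for name in tests if name not in quarantined]
```

Passing the lookup in as a callable keeps the selection logic independent of the transport, so it can be exercised in unit tests without a running service.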

      Non-Requirements

      • List of things not included in this epic, to alleviate any doubt raised during the grooming process.

      Notes

      • Any additional details or decisions made/needed

              dkeler@redhat.com Daniel Keler
              ycui@redhat.com Ying Cui
              Daniel Hiller, Ruth Netser
              Ruth Netser