RH Developer Hub Planning / RHDHPLAN-840

Improve RHDH and Backstage performance

    • RHDHPLAN-839: Improve RHDH and Backstage performance
    • 100% To Do, 0% In Progress, 0% Done
    • XL

      Feature Overview (aka. Goal Summary)

      Large enterprise customers are experiencing significant performance degradation in Red Hat Developer Hub (RHDH) as their Software Catalog scales. These issues manifest as unacceptable 10-second page load times, frequent pod restarts (OOMKilled), and database crashes. 

      This feature focuses on defining a rigorous performance testing baseline and implementing critical architectural and database improvements to reduce latency and improve system stability for environments supporting up to 30,000 developers and 70,000 catalog applications.

      Goals (aka. expected user outcomes)

      • Reduced Latency: Large customers should see page load times decrease from 10 seconds to a second or less.
      • System Stability: Elimination of frequent pod restarts and database crashes caused by resource exhaustion at scale.
      • Enterprise Scalability: RHDH must reliably support environments with 30,000+ developers, 500,000+ groups, and 70,000 catalog applications.
      • Orchestrator Scalability: The Orchestrator should be able to run 50,000 workflows per day on an RHDH instance.
      • Optimized Resource Utilization: Provide clear, validated documentation on CPU, memory, database sizing, and cache requirements for high-load scenarios.
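      To put the Orchestrator goal in perspective, the short sketch below converts 50,000 workflows per day into an average start rate. It assumes workflows are spread evenly over 24 hours. Real traffic is likely to be bursty, so the `peak_factor` parameter (a hypothetical multiplier, not a measured value) shows how the required throughput grows if the same daily volume is compressed into a shorter window.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def workflows_per_second(daily_workflows: int, peak_factor: float = 1.0) -> float:
    """Average workflow start rate, scaled by a hypothetical peak factor.

    peak_factor=1.0 assumes an even spread over 24 hours; peak_factor=3.0
    models the same daily volume arriving within an 8-hour window.
    """
    if daily_workflows < 0 or peak_factor <= 0:
        raise ValueError("daily_workflows must be >= 0 and peak_factor > 0")
    return daily_workflows / SECONDS_PER_DAY * peak_factor


if __name__ == "__main__":
    average = workflows_per_second(50_000)
    # Assumption: all 50,000 workflows start within an 8-hour business day.
    business_hours = workflows_per_second(50_000, peak_factor=3.0)
    print(f"average: {average:.2f} workflows/s")
    print(f"8-hour window: {business_hours:.2f} workflows/s")
```

      The average works out to roughly 0.58 workflow starts per second, or about 1.74 per second if the load is concentrated in 8 hours. Sizing should target the peak rather than the daily average.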

      Requirements (aka. Acceptance Criteria)

      • As a Platform Engineer, I want a validated performance testing baseline (30k developers, 500k groups, 70k apps, 300 concurrent users) so that I can reliably reproduce and measure performance improvements.
      • As a Developer, I want RHDH pages to load in under a second, even with a large catalog, so that I can quickly access my developer tools.
      • As a Platform Engineer, I want the RHDH database schema to be optimized with proper indexing and caching so that the database remains stable under high concurrent load.
      • As a Platform Engineer, I want architectural flexibility to deploy RHDH in active-active configurations with multiple replicas so that the system is redundant and performant.
      • As a Security Lead, I want all architectural changes (e.g., increased pods or clusters) to be validated for security implications.
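      A reproducible baseline needs a synthetic catalog of the stated size. The sketch below generates Backstage-style `User`, `Group`, and `Component` entity descriptors as Python dictionaries, following the `backstage.io/v1alpha1` entity shape. The function name `synthetic_catalog`, the naming scheme (`user-N`, `group-N`, `app-N`), the round-robin membership and ownership, and the flat group hierarchy are all illustrative assumptions; a realistic baseline should mirror customer data shapes such as nested groups and multiple memberships per user.

```python
from collections import Counter
from typing import Iterator

API_VERSION = "backstage.io/v1alpha1"


def synthetic_catalog(users: int, groups: int, apps: int) -> Iterator[dict]:
    """Yield hypothetical User, Group and Component entities for load testing.

    Assumption: membership and ownership are assigned round-robin, so each
    user belongs to exactly one group and each component has one owner group.
    """
    if groups < 1:
        raise ValueError("at least one group is required")
    for g in range(groups):
        yield {
            "apiVersion": API_VERSION,
            "kind": "Group",
            "metadata": {"name": f"group-{g}"},
            "spec": {"type": "team", "children": []},  # flat hierarchy
        }
    for u in range(users):
        yield {
            "apiVersion": API_VERSION,
            "kind": "User",
            "metadata": {"name": f"user-{u}"},
            "spec": {"memberOf": [f"group-{u % groups}"]},
        }
    for a in range(apps):
        yield {
            "apiVersion": API_VERSION,
            "kind": "Component",
            "metadata": {"name": f"app-{a}"},
            "spec": {
                "type": "service",
                "lifecycle": "production",
                "owner": f"group:default/group-{a % groups}",
            },
        }


if __name__ == "__main__":
    # Baseline sizes from the requirement: 30k developers, 500k groups, 70k apps.
    counts = Counter(e["kind"] for e in synthetic_catalog(30_000, 500_000, 70_000))
    print(dict(counts))
```

      Using a generator keeps memory flat while entities are streamed to a file or an ingestion endpoint. The 300 concurrent users in the baseline are a load-generator setting rather than catalog data, so they are not modeled here.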

      Customer Considerations (Optional)

      • Large Environment Replication: Some customers currently use ~20 custom plugins and a large number of dynamic plugins, which significantly increase restart times and memory usage. Reproduction efforts must account for this plugin overhead.
      • Networking Latency: Some customers run on-premises application instances against external cloud databases; network latency between these layers must be considered during troubleshooting.
      • Feature Disabling: Customers have had to disable features like "adoption" and "orchestrator" to maintain stability; the goal is to re-enable these without crashing the system.

      Documentation Considerations

      • Sizing Guidelines: Update documentation to reflect higher CPU, memory, and storage requirements for enterprise-scale deployments.
      • Architecture Reference: Provide validated reference architectures for multi-replica and active-active setups.
      • Plugin Management: Include best practices for managing large numbers of dynamic and custom plugins to minimize their performance impact.

              rh-ee-abarbaro Alessandro Barbarossa
              jfargett@redhat.com Christophe Fargette
              RHDH Security
              Votes: 0
              Watchers: 3
