Type: Bug
Resolution: Unresolved
Component: rhel-kernel-rts-time
Status: CK Parent Issues In Progress
Description:
Runs on Top of Application Profiles
system-runner is designed to be application-agnostic. This enables both extending additional test-suite stacks into the orchestration layer and defining custom profiles for end-user application stacks. With this approach, multiple test platforms can be deployed, and users can deploy their applications on the same topology stack that was measured and characterized.
rteval-runner will serve as the initial profile for developing system-runner.
Background:
System-runner is being implemented to enable a powerful orchestration layer for modern real-time, isolation, and scalability studies across bare metal, Podman, and Kubernetes/OCP environments. The tool should provide advanced options for CPU/core partitioning, container topology, and dynamic system scaling to support a broad range of experimental workloads.
Objective:
Build system-runner with support for:
- Advanced partitioning and topology control (per-container/per-core assignments)
- “Partitioned” vs. “non-partitioned” workload definitions
- A percent-based system scaling option (“scaling knob”) for dynamic control of system resource allocation
- Application-profile-based deployment for advanced topology management, for both simulations and real applications
- Support for measuring and comparing both weak (intra-pod) isolation and strict (inter-pod or standalone) isolation (see the illustrative commands after this list):
  - The tool should allow users to deploy and benchmark scenarios where multiple containers share a pod (and thus a cgroup), as well as scenarios where each workload is isolated in its own cgroup (a standalone container, or one workload per pod).
  - This enables direct measurement of, and research on, the effects of resource sharing and noisy-neighbor phenomena within pods versus strong partitioning and cgroup isolation.
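For illustration, the two isolation modes map naturally onto podman primitives. The commands below are a hedged sketch (the image name is hypothetical), not system-runner's actual invocation:

```shell
# Weak (intra-pod) isolation: two workload containers share one pod,
# and therefore one cgroup subtree.
podman pod create --name shared-pod
podman run -d --pod shared-pod quay.io/example/workload:latest
podman run -d --pod shared-pod quay.io/example/workload:latest

# Strict (standalone) isolation: each workload gets its own container,
# its own cgroup, and a disjoint CPU set.
podman run -d --name iso-a --cpuset-cpus 0-1 quay.io/example/workload:latest
podman run -d --name iso-b --cpuset-cpus 2-3 quay.io/example/workload:latest
```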
Requirements:
- Partitioning and Topology Options:
  - Allow configuration for both:
    - Single-container runs on a specific CPU range
    - Multiple containers, each pinned to a defined number of cores
  - Support “partitioned” (dedicated CPUs for load and measurement) and “non-partitioned” modes
  - Let users specify (see the config sketch below):
    - container_run_type: single or all
    - cpu_range
    - cores_per_container
    - partitioning: partitioned or nonpartitioned
    - use_tuna: true/false (for CPU isolation/affinity)
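For example, a minimal config sketch using the keys above might look as follows (the file layout and values are illustrative assumptions, not the finalized schema):

```yaml
# example-config.yaml (illustrative only; values are hypothetical)
container_run_type: all          # single | all
cpu_range: 4-19                  # CPUs available to the experiment
cores_per_container: 4           # CPUs pinned to each container
partitioning: partitioned        # partitioned | nonpartitioned
use_tuna: true                   # use tuna for CPU isolation/affinity
```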
- Percent-Based System Scaling:
- Implement a config/CLI option (e.g., system_scale_percent) to control what percentage of the host’s total CPUs/cores are used for the experiment.
- All resource allocation (container count, cores per container, affinity) should respect this scaling parameter.
- Example: On a 40-core host, system_scale_percent: 25 uses 10 cores for the workload.
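A minimal sketch of that arithmetic, assuming system_scale_percent is an integer percentage and that fractional results are floored (the function name and rounding policy are assumptions, not the tool's actual API):

```python
import os

def scaled_core_count(system_scale_percent: int, total_cores: int | None = None) -> int:
    """Return how many cores the experiment may use.

    Illustrative only: floors the result and never returns fewer than 1 core.
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    return max(1, (total_cores * system_scale_percent) // 100)

# On a 40-core host, 25% yields 10 cores, matching the example above.
assert scaled_core_count(25, total_cores=40) == 10
```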
- Automated Run Directory and Iteration Management:
- Each experiment run should auto-generate a directory reflecting the run mode, topology, scaling, and iteration count.
- Store config, logs, and all results for traceability and reproducibility.
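One possible naming scheme, shown as a Python sketch (the directory pattern and helper are purely illustrative; the real layout is up to system-runner):

```python
from datetime import datetime
from pathlib import Path

def make_run_dir(base: Path, mode: str, partitioning: str,
                 scale_percent: int, iteration: int) -> Path:
    """Create a per-run directory encoding mode, topology, scaling, and iteration."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    name = f"{mode}_{partitioning}_scale{scale_percent}_iter{iteration}_{stamp}"
    run_dir = base / name
    run_dir.mkdir(parents=True, exist_ok=False)
    # Config, logs, and results would be written here for traceability.
    return run_dir

# e.g. results/all_partitioned_scale25_iter3_20250101-120000
```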
- Dependency and Resource Checks:
- On startup, check for required host tools (podman, tuna, etc.) and clearly report or fail if missing.
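A minimal dependency-check sketch (the tool list and error format are assumptions):

```python
import shutil
import sys

REQUIRED_TOOLS = ("podman", "tuna")  # illustrative set; the real list may differ

def check_dependencies() -> None:
    """Fail fast with a clear message if any required host tool is missing."""
    missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    if missing:
        sys.exit(f"missing required host tools: {', '.join(missing)}")
```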
- Dynamic Orchestration Logic:
- Dynamically determine core sets, container counts, and launch containers or processes with correct CPU and memory assignments based on the config and scaling.
- Partition CPU for measurement and load as specified.
- Support both partitioned and non-partitioned measurement, with or without tuna.
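A hedged sketch of the core-partitioning logic, assuming a podman backend and a contiguous cpu_range (the helper name and image are hypothetical; --cpuset-cpus is a standard podman flag):

```python
import subprocess

def launch_containers(cpu_range: str, cores_per_container: int, image: str) -> None:
    """Partition the given CPU range into per-container core sets and
    launch one pinned podman container per set (illustrative logic only)."""
    start, end = (int(x) for x in cpu_range.split("-"))
    cpus = list(range(start, end + 1))
    for i in range(0, len(cpus) - cores_per_container + 1, cores_per_container):
        core_set = ",".join(str(c) for c in cpus[i:i + cores_per_container])
        subprocess.run(
            ["podman", "run", "-d", "--cpuset-cpus", core_set, image],
            check=True,
        )

# launch_containers("4-19", 4, "quay.io/example/workload:latest")  # hypothetical image
```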
- Extensible Config Schema & CLI:
- Update config schema to include all partitioning and scaling options.
- CLI flags should allow overrides or direct specification at runtime.
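An illustrative CLI sketch using Python's argparse (flag names mirror the config keys above but are assumptions, not the shipped interface):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI flags mirroring the config keys above (illustrative flag names)."""
    p = argparse.ArgumentParser(prog="system-runner")
    p.add_argument("--config", help="path to a YAML config file")
    p.add_argument("--container-run-type", choices=["single", "all"])
    p.add_argument("--cpu-range")
    p.add_argument("--cores-per-container", type=int)
    p.add_argument("--partitioning", choices=["partitioned", "nonpartitioned"])
    p.add_argument("--use-tuna", action="store_true")
    p.add_argument("--system-scale-percent", type=int)
    return p

# CLI values would override the corresponding config-file settings at runtime.
```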
- Documentation & Examples:
- Document all new features and provide clear example configs and result directory structures for each major mode.
Acceptance Criteria:
- User can fully specify partitioning, scaling, and core assignments in a config file or via CLI.
- system-runner creates the correct run directories and manages resource allocation per the chosen options.
- All artifacts and configs are stored per-run.
- Required dependencies are checked at runtime with clear messaging.
- At least one complete doc/example set is included.
Notes:
This will enable flexible, dynamic, and reproducible experiment orchestration for real-time and partitioning studies, closing a key gap in the research and testing toolchain. This tool is not intended to replace tuned or the Node Tuning Operator; it is meant to operate alongside them.
Split to:
- RHEL-101817 Improve how configs are passed to containers in rteval-runner, making non-pod container workflows more user-friendly. (Closed)
- RHEL-101818 Refactor rteval-runner to Separate Runner Framework from rteval Logic (Closed)
- RHEL-102633 Implement System Runner Backends: taskset, podman, kubeplay (Closed)
- RHEL-104932 Framework for Run Sequence YAML Generation from Base Template (Closed)
- RHEL-104936 Implement Basic Command-Line Interface (CLI) Framework (Closed)
- RHEL-107127 [Upstream] Fix Bugs Related to Runner refactor and add Build Upstream (Closed)
- RHEL-107612 [Design Work] Refactor Topology Management and Per-CPU Allocation in rteval-runner for Scalable, Profile-Based Simulations (Closed)