ACS currently lacks metrics for runtime data, particularly for process
information. We need to close this gap:
- To give users more confidence in, and a better understanding of, the
runtime data.
- To support the "simulation" mode, where conclusions about resource
consumption during an experiment are drawn from the collected metrics.
Add the following metrics per cluster (since we know there is huge variance
between clusters):
- Current number of active processes.
- Number of processes deleted by pruning.
- Number of processes deleted for other reasons (e.g. after pod deletion or
by the indicator filter).
- Histogram of command line argument size.
- Histogram of lineage size.
For all metrics, verify the expected "working range" on a long-running
cluster under load.
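
To make the shape of the proposed metrics concrete, here is a minimal, library-free sketch in Python. It is illustrative only and is not the ACS implementation: the class and method names, the deletion reason labels, and the bucket boundaries are all hypothetical. It also assumes that "command line argument size" means the length of the argument string and that "lineage size" means the number of ancestor processes.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


class Histogram:
    """Fixed-bucket histogram: a bucket counts observations <= its upper bound."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # The extra last slot is the overflow bucket for values above all bounds.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def observe(self, value):
        # bisect_left returns the index of the first bound >= value.
        self.counts[bisect_left(self.upper_bounds, value)] += 1


class ClusterProcessMetrics:
    """Hypothetical container for the per-cluster metrics listed above."""

    def __init__(self):
        self.active = 0           # current number of active processes
        self.deleted = Counter()  # deletions keyed by reason (labels are assumptions)
        # Bucket boundaries are placeholders; the real ones should come from
        # the "working range" observed on a long-running cluster.
        self.args_size = Histogram([64, 256, 1024, 4096])
        self.lineage_size = Histogram([1, 2, 4, 8, 16])

    def on_added(self, args, lineage):
        self.active += 1
        self.args_size.observe(len(args))
        self.lineage_size.observe(len(lineage))

    def on_deleted(self, reason):
        self.active -= 1
        self.deleted[reason] += 1


# One metrics object per cluster, created on first use.
metrics_by_cluster = defaultdict(ClusterProcessMetrics)

m = metrics_by_cluster["cluster-a"]
m.on_added("/bin/sh -c 'sleep 10'", ["/usr/bin/containerd-shim"])
m.on_added("nginx -g 'daemon off;'", [])
m.on_deleted("pruning")
```

Keeping the deletion reason as a key on a single counter, rather than one counter per reason, means a new reason can be added later without introducing a new metric.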