-
Epic
-
Resolution: Unresolved
-
Major
-
Metrics Collection Optimization
-
To Do
-
OSPRH-8213 - OVN observability support for Openstack
-
Networking; Neutron
Requirement 1.1: Resource footprint (compute, storage, network) for monitoring collection must be minimal, and the solution design must strive for minimal or no impact on workload performance.
This is one of the most important design requirements and will impact the adoption of the observability solution.
Requirement 1.2: Aggregating monitoring data as close as possible to its source should be considered.
Requirement 1.3: The sampling interval for monitoring must be configurable, with appropriate bounds on its value. Per-flow sampling must be supported.
Requirement 1.4: Metric categories for data collection should be configurable to optimize the amount of data that is exported.
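As an illustration of Requirements 1.3 and 1.4, the sketch below shows one way a collector configuration could enforce bounds on the sampling interval, allow per-flow overrides, and filter exported metric categories. Everything in it is an assumption made for illustration: the bound values, the category names, the `CollectionConfig` class, and the reading of "per-flow sampling" as a per-flow interval override are not defined by this epic.

```python
from dataclasses import dataclass, field

# Hypothetical bounds; the requirement does not specify actual values.
MIN_INTERVAL_S = 1
MAX_INTERVAL_S = 3600

# Hypothetical category names, for illustration only.
KNOWN_CATEGORIES = frozenset({"interface", "datapath", "flow", "resource"})


def _check_interval(seconds):
    """Reject sampling intervals outside the configured bounds."""
    if not MIN_INTERVAL_S <= seconds <= MAX_INTERVAL_S:
        raise ValueError(
            f"sampling interval {seconds}s outside "
            f"[{MIN_INTERVAL_S}, {MAX_INTERVAL_S}]"
        )


@dataclass
class CollectionConfig:
    sampling_interval_s: int = 30
    enabled_categories: frozenset = KNOWN_CATEGORIES
    # Per-flow interval overrides, keyed by a flow identifier (assumed form).
    flow_intervals_s: dict = field(default_factory=dict)

    def __post_init__(self):
        # Validate the default interval and every per-flow override.
        _check_interval(self.sampling_interval_s)
        for interval in self.flow_intervals_s.values():
            _check_interval(interval)
        unknown = set(self.enabled_categories) - KNOWN_CATEGORIES
        if unknown:
            raise ValueError(f"unknown metric categories: {sorted(unknown)}")

    def interval_for(self, flow_id):
        """Return the per-flow interval if set, else the default interval."""
        return self.flow_intervals_s.get(flow_id, self.sampling_interval_s)

    def should_export(self, category):
        """Only metrics in an enabled category are exported."""
        return category in self.enabled_categories
```

Validating at construction time keeps out-of-range values from ever reaching the collection loop, which is one way to keep the configuration surface predictable.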
Requirement 1.5: The data collection and data exporting framework must align with the goals of the OpenTelemetry framework (vendor agnostic, data model that is compatible with other vendors that support OpenTelemetry, supported data
Requirement 1.6: Data aggregation or processing at the source (OVS / Compute) is to be supported. Processing may take the form of min, max, or avg for the metrics under consideration. The processors at the source may be activated on demand for troubleshooting scenarios.
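A minimal sketch of the kind of source-side processor Requirement 1.6 describes: it folds raw samples into min, max, and avg per metric and can be switched on only when needed. The class names, the `active` flag, and the `flush` method are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class WindowSummary:
    """Running min/max/avg for one metric over one export window."""
    count: int = 0
    minimum: float = float("inf")
    maximum: float = float("-inf")
    total: float = 0.0

    def add(self, value):
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.total += value

    def as_dict(self):
        return {
            "count": self.count,
            "min": self.minimum,
            "max": self.maximum,
            "avg": self.total / self.count,
        }


class SourceAggregator:
    """Aggregates samples at the source; inactive until enabled on demand."""

    def __init__(self):
        self.active = False  # assumed default: processor is off
        self._windows = {}

    def observe(self, metric, value):
        # Samples are ignored while the processor is inactive.
        if not self.active:
            return
        self._windows.setdefault(metric, WindowSummary()).add(value)

    def flush(self):
        """Return summaries for the current window and start a new one."""
        summaries = {name: w.as_dict() for name, w in self._windows.items()}
        self._windows = {}
        return summaries
```

Keeping only a count, a running total, and the extremes means memory per metric stays constant regardless of the number of samples, in line with the minimal-footprint goal of Requirement 1.1.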
Requirement 1.7: Flow-level metrics are to be collected periodically only for the kernel data path. Telco customers are typically the consumers of the fast data path, and telco deployments are typically dimensioned with little room for unused CPU or memory. As a result, allocating compute resources for observability is a challenge in such deployments. Flow-level metrics collection and correlation require non-trivial compute resources, and such collection is expected to impact forwarding performance. However, flow-level metrics may be desired for RCA and troubleshooting, e.g. to understand which flows are impacted, and where, by performance degradation due to rogue actors or DDoS. Hence, flow-level metrics for the fast data path must be supported in on-demand mode, and the ability to disable periodic flow statistics for the fast data path must be supported.
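The collection rules in Requirement 1.7 can be sketched as a small policy object: periodic flow metrics for the kernel data path, and on-demand collection for the fast data path with periodic collection disabled. The names `Datapath` and `FlowMetricsPolicy`, and the choice to default fast-path periodic collection to off, are assumptions made for illustration only.

```python
from enum import Enum


class Datapath(Enum):
    KERNEL = "kernel"
    FAST = "fast"


class FlowMetricsPolicy:
    """Decides when flow-level metrics are collected for each data path."""

    def __init__(self, fast_path_periodic=False):
        # Assumed default: periodic flow statistics are disabled on the
        # fast data path to protect forwarding performance.
        self.fast_path_periodic = fast_path_periodic
        self._on_demand = set()

    def request_on_demand(self, datapath):
        """Enable collection for a troubleshooting session."""
        self._on_demand.add(datapath)

    def clear_on_demand(self, datapath):
        """End the troubleshooting session for this data path."""
        self._on_demand.discard(datapath)

    def collect_periodically(self, datapath):
        if datapath is Datapath.KERNEL:
            return True
        return self.fast_path_periodic

    def collect_now(self, datapath):
        """True if either periodic or on-demand collection applies."""
        return self.collect_periodically(datapath) or datapath in self._on_demand
```

For example, an operator investigating a suspected DDoS could call `request_on_demand(Datapath.FAST)` for the duration of the investigation and `clear_on_demand(Datapath.FAST)` afterwards, so the fast data path pays the collection cost only while troubleshooting.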