Type: Story
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: Operator
Labels:
None

Work Type:
BU Product Work
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
easier-config
Feature Link:
OCPSTRAT-156 - Netobserv operator: Make configuration simpler
Release Note Text:

The FlowCollector API is modified as follows:
- the setting "processor.metrics.ignoreTags" is deprecated and will be removed in FlowCollector v1beta2
- it is replaced with a new setting "processor.metrics.includeList", which takes the opposite approach: it is an inclusion list instead of an exclusion list.

This change allows smoother transitions in future releases: when new metrics are added, they will not be generated unintentionally and destabilize cluster monitoring with too many metrics.

This change also moves away from the metrics tagging system: instead of relying on tags to include/exclude metrics, which could end up being quite complex, the desired metric names need to be provided directly. The list of available metrics is documented.

If "ignoreTags" is explicitly set in your FlowCollector configuration, it is recommended to remove it and define "includeList" instead, or to go back to the default values. Otherwise, new metrics might be generated on upgrades, and you should make sure they don't increase Prometheus memory consumption too much.

If "ignoreTags" isn't explicitly set and you don't set "includeList", the Operator will keep using the default metrics, which have a more modest impact on Prometheus.

Intelligence Requested:
Market:

Sprint:
NetObserv - Sprint 242, NetObserv - Sprint 243, NetObserv - Sprint 244

Target Version:

netobserv-1.5

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Currently, metrics configuration uses a black-listing approach with a tags system. Since enabling more and more metrics increases cluster resource usage, it would be better to switch to a white-listing approach, where users select only what they need.

This is also safer during upgrades. When users already have this setting configured explicitly, the new default doesn't apply; with black-listing, new metrics could then be enabled automatically without the user noticing.
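
To illustrate the upgrade concern, here is a minimal Python sketch comparing the two selection strategies. It is not the operator's implementation: the metric names, their tags, and both helper functions are hypothetical and only model the behavior described above.

```python
# Hypothetical metric catalogs before and after an operator upgrade.
# Names and tags are illustrative only, not the operator's actual list.
METRICS_BEFORE = {
    "namespace_flows_total": {"namespaces", "flows"},
    "node_ingress_bytes_total": {"nodes", "ingress", "bytes"},
    "workload_egress_packets_total": {"workloads", "egress", "packets"},
}
# The upgrade adds one new metric (assumed name and tags).
METRICS_AFTER = {
    **METRICS_BEFORE,
    "namespace_rtt_seconds": {"namespaces", "rtt"},
}


def select_by_ignore_tags(catalog, ignore_tags):
    """Exclusion list: keep every metric that carries none of the ignored tags."""
    ignored = set(ignore_tags)
    return sorted(name for name, tags in catalog.items() if not tags & ignored)


def select_by_include_list(catalog, include_list):
    """Inclusion list: keep only the metrics that are named explicitly."""
    return sorted(name for name in include_list if name in catalog)


if __name__ == "__main__":
    ignore_tags = ["workloads", "packets"]
    include_list = ["namespace_flows_total", "node_ingress_bytes_total"]

    # With an explicit exclusion list, the new metric appears after the upgrade.
    print(select_by_ignore_tags(METRICS_BEFORE, ignore_tags))
    print(select_by_ignore_tags(METRICS_AFTER, ignore_tags))

    # With an explicit inclusion list, the selection is unchanged.
    print(select_by_include_list(METRICS_BEFORE, include_list))
    print(select_by_include_list(METRICS_AFTER, include_list))
```

In this model, the exclusion list silently picks up the new metric because none of its tags were ignored, while the inclusion list yields the same result before and after the upgrade.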

On top of that, it's confusing to have overlap between tags.

We should think about more explicit tags (~~including 'all' mention like 'all_namespaces', or~~ forcing fully qualified names like 'ingress_namespaces_packets')

NOTE FOR QE

You can read the release note text for the user-facing changes. One special thing to test will be the upgrade scenario, especially after we add new metrics (such as RTT, drops... e.g. https://github.com/netobserv/network-observability-operator/pull/408) => we need to make sure that no unintended metric is generated beyond the defaults. This is a chicken-and-egg problem, as these PRs are blocked by this one, so this particular scenario will have to be tested after both are merged.
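
As a rough aid for the upgrade check, the comparison could be sketched in Python as below. This is only an assumption about how such a check might be scripted: the `unexpected_metrics` helper, the `netobserv_` prefix, and the sample metric names are hypothetical, and the observed names would have to be collected from Prometheus by other means.

```python
def unexpected_metrics(observed, expected, prefix="netobserv_"):
    """Return prefixed metric names that were observed but are not expected.

    observed: metric names seen after the upgrade (full names).
    expected: default metric names, without the prefix.
    prefix:   assumed common prefix of the operator's metrics.
    """
    expected_full = {prefix + name for name in expected}
    return sorted(
        name
        for name in set(observed)
        if name.startswith(prefix) and name not in expected_full
    )


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    observed = [
        "netobserv_namespace_flows_total",
        "netobserv_namespace_rtt_seconds",
        "up",
    ]
    defaults = ["namespace_flows_total"]
    # Any name listed here was generated beyond the expected defaults.
    print(unexpected_metrics(observed, defaults))
```

An empty result after the upgrade would suggest that nothing beyond the defaults was generated; metrics without the assumed prefix are ignored.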

blocks

NETOBSERV-1286 Metrics and dashboard enhancements for Lokiless usage