Type: Story
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Ready:
False
Epic Link:
[GA] Ease of alerting and network health display for Network Observability
Story Points:
5

Target Version:
None
Release Blocker:
None
Sprint:
NetObserv - Sprint 276, NetObserv - Sprint 277, NetObserv - Sprint 282, NetObserv - Sprint 283

Run various tests using the QE setups (perf-scale NDH/CD, reliability cluster...) and collect alerting data in order to refine the default thresholds, the PromQL queries, etc.

Areas to improve:

~~Impact of sampling: e.g. drops or DNS alerts can be impacted by sampling, see also slack thread here: https://redhat-internal.slack.com/archives/C02939DP5L5/p1756911383230359~~
- BUG created at: https://issues.redhat.com/browse/NETOBSERV-2613
Some DNS errors happen frequently but aren't a real problem (at most, a performance issue): with k8s resolution, a domain like `myservice.svc` is looked up under several candidate names in turn (as is, as `myservice.svc.cluster.local`, etc.), which regularly triggers "DNS domain not found" (NXDomain) responses. Not sure how to tackle that. See also: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/, https://medium.com/@GiteshWadhwa/optimizing-dns-resolution-in-kubernetes-best-practices-for-coredns-performance-e3f6ed041bbb
- ~~Maybe create 2 different alert templates: one for NXDomain with "info" severity only and a message telling how to optimize; and another for all other codes~~
- DONE
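
To illustrate why NXDomain responses are so frequent, here is a minimal sketch of how a resolv.conf-style stub resolver expands a short name. It assumes the usual pod defaults (`ndots:5` and a three-entry search list for a pod in the `default` namespace) and the ordering described in resolv.conf(5); the function name `candidate_names` is hypothetical, and real resolvers (glibc, musl, Go) may differ in details.

```python
def candidate_names(name, search, ndots):
    """Return the names a resolv.conf-style stub resolver would try, in order.

    Assumption: ordering follows resolv.conf(5). A name with fewer than
    `ndots` dots goes through the search list first, then is tried as is.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


# Assumed search list for a pod running in the "default" namespace.
K8S_SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("myservice.svc", K8S_SEARCH, ndots=5):
    print(candidate)
```

A real resolver stops at the first candidate that resolves; every candidate tried before that one is answered with NXDomain, which is the kind of response the alert ends up counting.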
I think the score can be improved by changing how severity impacts it: for instance, critical alerts could map to the range [0, 6], warning to [4, 8] and info to [6, 10] (as an example).
- DONE by https://github.com/netobserv/network-observability-console-plugin/pull/1204
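
The example ranges above can be sketched as follows. This is only an illustration of the idea, not the implementation from the linked PR: the names `SEVERITY_RANGES` and `alert_score`, the `intensity` input (0 = barely over the threshold, 1 = worst case) and the linear interpolation are all assumptions, as is the convention that a higher score means a healthier state.

```python
# Example ranges from the description: (lowest score, highest score).
SEVERITY_RANGES = {
    "critical": (0.0, 6.0),
    "warning": (4.0, 8.0),
    "info": (6.0, 10.0),
}


def alert_score(severity, intensity):
    """Map an alert to a score within the range of its severity.

    `intensity` is a hypothetical value in [0, 1]; it is clamped, then used
    to interpolate linearly from the top of the range down to the bottom.
    """
    low, high = SEVERITY_RANGES[severity]
    intensity = min(max(intensity, 0.0), 1.0)
    return high - intensity * (high - low)


print(alert_score("critical", 1.0))
print(alert_score("warning", 0.5))
print(alert_score("info", 0.0))
```

Because the ranges overlap, a severe warning (intensity 1.0, score 4.0) can score lower than a mild critical alert (intensity 0.0, score 6.0), which is the effect the overlapping ranges are meant to allow.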

links to

netobserv/network-observability-console-plugin#1204: NETOBSERV-2330 NETOBSERV-2436: Update scoring and explain scoring

Assignee: Leandro Beretta

Reporter: Joel Takvorian

QA Contact: Oliver Smakal

Need Info From: None

Votes: 0

Watchers: 2

Created: 2025/07/16 6:53 AM

Updated: 2026/02/06 10:50 AM

Resolved: 2026/02/06 10:50 AM
