- Story
- Resolution: Done
- Normal
- NetObserv - Sprint 223, NetObserv - Sprint 224, NetObserv - Sprint 225, NetObserv - Sprint 226, NetObserv - Sprint 227, NetObserv - Sprint 228
NETOBSERV-224, on performance and scalability, mentions 126 nodes (75th percentile) under Hardware and Software Consideration. That size was intentionally left out of the testbeds for the first release.
This story adds it as a new testbed. The testbed in use and the tests being run are detailed below. Note that these are the initial configurations; see the comments below and the linked document for updates.
- Cluster and Hardware information
- 120 worker nodes x 64 cores/400Gi memory
- 30 PVs with a max size of 3Ti per PV backed by nvme
- Workers are Dell FC640/R640/R650 models
- NetObserv Operator Information
- 0.2.0 release
- Kafka Deployment Model: 30 replicas and 60 Kafka Topic partitions.
- Testing Information
- Traffic Generation
- Will be done with the projects.sh script
- No sampling
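As a rough sanity check on the sizing above, the aggregate capacity of the testbed can be worked out from the listed figures. The sketch below only multiplies the numbers stated in this story. The function name `summarize_testbed` is hypothetical, and the partitions-per-replica figure assumes the 60 topic partitions are spread evenly across the 30 replicas, which this story does not confirm.

```python
# Figures copied from the testbed description in this story.
WORKER_NODES = 120
CORES_PER_NODE = 64
MEMORY_GI_PER_NODE = 400
PV_COUNT = 30
PV_MAX_SIZE_TI = 3
KAFKA_REPLICAS = 30
KAFKA_TOPIC_PARTITIONS = 60


def summarize_testbed():
    """Return aggregate capacity figures for the testbed (hypothetical helper)."""
    # Assumption: partitions are distributed evenly across replicas.
    per_replica, leftover = divmod(KAFKA_TOPIC_PARTITIONS, KAFKA_REPLICAS)
    return {
        "total_cores": WORKER_NODES * CORES_PER_NODE,
        "total_memory_gi": WORKER_NODES * MEMORY_GI_PER_NODE,
        # Upper bound only: 3Ti is the maximum size per PV, not the provisioned size.
        "max_pv_capacity_ti": PV_COUNT * PV_MAX_SIZE_TI,
        "partitions_per_replica": per_replica,
        "unassigned_partitions": leftover,
    }


if __name__ == "__main__":
    for name, value in summarize_testbed().items():
        print(f"{name}: {value}")
```

Under the even-distribution assumption, each replica would handle two partitions with none left over, so the partition count should not be the limiting factor for parallelism at 30 replicas.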
- causes
- NETOBSERV-677 Need to come up with Kafka logs retention/clean up values. (Closed)
- NETOBSERV-691 High CPU utilization for certain ebpf agent pods (doc) (Closed)
- NETOBSERV-674 Alert when loki ingestion limits are hit (Closed)
- NETOBSERV-710 Investigate: Flow processing drops due to loki ingestion issues (To Do)
- NETOBSERV-713 Loki out-of-order writes (To Do)
- NETOBSERV-712 Add a prometheus metric to measure number of flows that are not netobserv internal (Closed)