-
Story
-
Resolution: Unresolved
-
Major
-
Quality / Stability / Reliability
The fixed test seeding change was reverted because it was believed to amplify test failures, for example in [suite:k8s] tests that had flakes removed.
With branch cut and GA approaching, we backed the change out so that we could measure its impact on failures and regroup on the approach.
Slack thread discussing the approach to bring the changes back in:
- Build tests often show up in the high-CPU list. We can split them out into their own group.
- High-CPU tests could receive a new annotation so that they are separated out as well. Consider a query like
  topk(10, sum by (namespace) (rate(container_cpu_usage_seconds_total{container!="",pod!="",namespace=~"^e2e-.*"}[5m])))
  to identify the namespaces consuming the most CPU. From there, review the OTE extension_test_result artifact and search for the namespace to identify the test.
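As a rough illustration of the query step, the sketch below builds the same PromQL expression and ranks namespaces from a Prometheus instant-query response. It assumes the standard JSON response shape of the Prometheus HTTP API (`/api/v1/query`); the function names `top_cpu_query` and `rank_namespaces` are hypothetical helpers, not part of any existing tooling, and the sample namespaces are made up.

```python
import json


def top_cpu_query(k=10, namespace_regex="^e2e-.*", window="5m"):
    """Build the topk-by-namespace CPU query described above."""
    selector = f'container!="",pod!="",namespace=~"{namespace_regex}"'
    return (
        f"topk({k}, sum by (namespace) ("
        f"rate(container_cpu_usage_seconds_total{{{selector}}}[{window}])))"
    )


def rank_namespaces(response_text):
    """Return (namespace, cpu_cores) pairs, highest CPU first.

    Assumes a Prometheus instant-query response with resultType "vector".
    """
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    rows = []
    for sample in body["data"]["result"]:
        namespace = sample["metric"].get("namespace", "")
        # Instant vectors carry [timestamp, "value-as-string"].
        rows.append((namespace, float(sample["value"][1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Made-up response purely for illustration.
    sample_response = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"namespace": "e2e-test-a"}, "value": [0, "0.5"]},
                {"metric": {"namespace": "e2e-test-b"}, "value": [0, "2.25"]},
            ],
        },
    })
    print(top_cpu_query())
    for namespace, cores in rank_namespaces(sample_response):
        print(namespace, cores)
```

The namespaces returned this way are the strings to search for in the extension_test_result artifact.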
The goal is to see whether we can identify the tests that were 'disruptive' when the seeding was fixed and isolate them during the existing job execution.
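One way to picture the isolation step is to partition test names by a tag embedded in the name. This is only a sketch: the `[sig-builds]` tag is assumed to mark build tests, `[HighCPU]` is a hypothetical annotation that does not exist yet, and `partition_tests` is an illustrative helper rather than anything in the test runner.

```python
def partition_tests(test_names, isolate_tags=("[sig-builds]", "[HighCPU]")):
    """Group tests by the first isolation tag found in their name.

    Tests matching no tag land in the "default" group. The tag names are
    assumptions for illustration only.
    """
    groups = {tag: [] for tag in isolate_tags}
    groups["default"] = []
    for name in test_names:
        for tag in isolate_tags:
            if tag in name:
                groups[tag].append(name)
                break
        else:
            # No isolation tag matched this test.
            groups["default"].append(name)
    return groups


if __name__ == "__main__":
    names = [
        "[sig-builds] example build test",
        "[sig-network][HighCPU] example heavy test",
        "[sig-cli] example light test",
    ]
    for group, members in partition_tests(names).items():
        print(group, members)
```

Each group could then be scheduled on its own within the existing job, so that a disruptive group does not run alongside the default one.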
Using autodl, we need to record the seed value so that we can go back later and compare results across different seeds.
This work should target 4.21 and must not impact 4.20 stability. Capturing the seed value lets us introduce other seeds for comparison in the future, but supporting only a single fixed seed remains the goal for this story.
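A minimal sketch of the idea, assuming the test order is produced by a seeded shuffle: the same seed always yields the same order, and the seed is written into a small record that can be stored alongside the job results. The seed value, the record fields, and the helper names are all hypothetical; the actual autodl schema is not specified here.

```python
import json
import random

FIXED_SEED = 42  # hypothetical value; the real fixed seed is not stated here


def ordered_tests(test_names, seed=FIXED_SEED):
    """Return a deterministic test order for a given seed."""
    # Sort first so that only the seed, not the input order, decides the result.
    ordered = sorted(test_names)
    random.Random(seed).shuffle(ordered)
    return ordered


def seed_record(job_name, seed=FIXED_SEED):
    """Serialize the seed used by a job run (assumed record shape)."""
    return json.dumps({"job": job_name, "seed": seed}, sort_keys=True)


if __name__ == "__main__":
    tests = ["test-a", "test-b", "test-c", "test-d"]
    # Same seed, same order, regardless of how the input was ordered.
    assert ordered_tests(tests) == ordered_tests(list(reversed(tests)))
    print(seed_record("example-job"))
```

Recording the seed per run is what makes a later comparison between seeds possible, even while only one fixed seed is in use.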