- Bug
- Resolution: Unresolved
- Major
- None
- 4.13.z, 4.12.z, 4.14.z, 4.15.z, 4.16.0
- Important
- None
- Rejected
- False
- Release Note Not Required
- In Progress
Description of problem:
Weekly NROP regression runs sometimes show several tests failing with TAE (TopologyAffinityError), which is precisely the failure that TAS (topology-aware scheduling) is meant to prevent.
Version-Release number of selected component (if applicable):
Can happen in any version.
How reproducible:
Intermittently. It is still unclear whether this is a test automation issue or a product issue, so it needs more investigation.
Steps to Reproduce:
1. Install the operator and create the CRs for NROP and the scheduler.
2. Run the regressions. Some tests are more likely to hit the error than others, for example the podburst tests (https://github.com/openshift-kni/numaresources-operator/blob/main/test/e2e/serial/tests/scheduler_cache.go#L119).
Actual results:
Pods hit TAE.
Expected results:
Pods should not hit TAE. If no single NUMA node has enough resources for a pod, the pod should stay Pending.
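The expected behavior can be illustrated with a minimal Go sketch of a per-NUMA-node fit check. This is not the scheduler plugin's actual code: the `Resources` type and the `fitsOnNode` and `firstFittingNUMA` helpers are hypothetical names introduced here for illustration, and quantities are simplified to plain integers.

```go
package main

import "fmt"

// Resources maps a resource name (e.g. "cpu", "memory") to a quantity.
// Hypothetical, simplified stand-in for a Kubernetes resource list.
type Resources map[string]int64

// fitsOnNode reports whether every requested resource is available
// on one NUMA node.
func fitsOnNode(request, available Resources) bool {
	for name, want := range request {
		if available[name] < want {
			return false
		}
	}
	return true
}

// firstFittingNUMA returns the index of the first NUMA node that can hold
// the whole request, or -1 if no single NUMA node can.
func firstFittingNUMA(request Resources, numaNodes []Resources) int {
	for i, available := range numaNodes {
		if fitsOnNode(request, available) {
			return i
		}
	}
	return -1
}

func main() {
	// Assumed example worker: two NUMA nodes with 4 free CPUs each.
	numaNodes := []Resources{
		{"cpu": 4, "memory": 8 << 30},
		{"cpu": 4, "memory": 8 << 30},
	}
	// The worker has 8 free CPUs in total, but no single NUMA node has 6.
	pod := Resources{"cpu": 6, "memory": 4 << 30}

	if firstFittingNUMA(pod, numaNodes) < 0 {
		fmt.Println("keep pod Pending: no single NUMA node fits the request")
	} else {
		fmt.Println("pod is schedulable on this worker")
	}
}
```

In this example a check against the worker's total free CPUs (8) would accept the pod, while the per-NUMA check rejects it. That gap between node-level and NUMA-level accounting is the kind of situation in which a pod can be placed on a worker and then fail admission with TAE instead of staying Pending.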
Additional info:
Test run example: https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-nrop-tests/825/
Must-gather is attached here: https://drive.google.com/file/d/1Xg_rBZOQ-p_ozkLM9mG22Yic8AL1ivn9/view?usp=sharing