Story
Resolution: Done
Critical
5
OpenShift SPLAT - Sprint 268
Story:
- As an OpenShift engineer, I would like a mechanism that prevents e2e tests from failing when they are randomly scheduled onto the OPCT dedicated node, avoiding false positives in the final conformance results, so that we can improve visibility and reduce maintenance friction in the OPCT architecture for running the conformance workflow in-cluster.
- As a Partner, I would like more visibility into tests that fail exclusively in my test environment, so that I can self-review and isolate failures easily.
- As a Cert-Ops Engineer, I would like to review a partner's results quickly, without OPCT environmental failures causing false positives, so that we can give the partner feedback quickly.
Goal:
- Create a controller that watches for pods that fail to schedule because of the OPCT dedicated node (a worker node with NoSchedule taints) and mutates them by adding a toleration that allows them to be scheduled on the dedicated node.
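
The core decision in the goal above can be sketched in Go as follows. This is a minimal illustration, not the controller from the spike PR: the `Taint`, `Toleration`, and `Pod` types are simplified stand-ins for the Kubernetes `core/v1` types, the `Unschedulable` field stands in for a `PodScheduled=False` condition, and the taint key `example.com/opct-dedicated` is a hypothetical placeholder, since this ticket does not name the real taint.

```go
package main

import "fmt"

// Taint and Toleration mirror only the fields needed for this sketch.
// A real controller would use the k8s.io/api/core/v1 types instead.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// Pod is a simplified stand-in for a core/v1 Pod.
type Pod struct {
	Name          string
	Unschedulable bool // assumption: derived from the PodScheduled condition
	Tolerations   []Toleration
}

// dedicatedTaint uses a hypothetical key; the real taint is not given here.
var dedicatedTaint = Taint{Key: "example.com/opct-dedicated", Effect: "NoSchedule"}

// tolerates reports whether t matches taint (simplified matching rules).
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Operator == "Exists" {
		return t.Key == "" || t.Key == taint.Key
	}
	// An empty operator is treated as "Equal".
	return t.Key == taint.Key && t.Value == taint.Value
}

// ensureToleration adds a toleration for taint to an unschedulable pod
// that does not already tolerate it. It returns true if the pod changed.
func ensureToleration(p *Pod, taint Taint) bool {
	if !p.Unschedulable {
		return false
	}
	for _, t := range p.Tolerations {
		if tolerates(t, taint) {
			return false
		}
	}
	p.Tolerations = append(p.Tolerations, Toleration{
		Key:      taint.Key,
		Operator: "Exists",
		Effect:   taint.Effect,
	})
	return true
}

func main() {
	pod := Pod{Name: "e2e-test-pod", Unschedulable: true}
	fmt.Println("mutated:", ensureToleration(&pod, dedicatedTaint))
	fmt.Println("mutated again:", ensureToleration(&pod, dedicatedTaint))
}
```

The function is idempotent, so a reconcile loop could call it repeatedly on the same pod without stacking duplicate tolerations.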
Why:
- The dedicated node was created to prevent disruptions in the test environment caused by the scheduler evicting the test services (Sonobuoy aggregator, workflow jobs, etc.).
Disclaimer: At least in the current architecture, we cannot deliver a stable solution without a dedicated node; the alternative is to change the architecture to run the test environment off-cluster (envtest?). That would require more research time, as well as stable connectivity between the client and the cluster, which is sometimes a challenge (it was in the early partner applications).
Acceptance Criteria:
- Controller created under the OPCT project repo
- The controller entry point command must be 'opct adm e2e-dedicated controller' (or anything under 'opct adm e2e-dedicated'; see also OPCT-346)
- The version must be validated with e2e tests (there is currently no periodic job; you can push your image and run a rehearsal job in the release repo; see also OPCT-348)
- The PR in the OPCT project must be merged
References:
- Spike: https://github.com/redhat-openshift-ecosystem/opct/pull/160/files
- Related issues (false-positive failed tests) are attached to the epic OPCT-332