OPCT - OpenShift Provider Compatibility Tool / OPCT-347

Create a controller to handle pods failing to schedule in the dedicated node

    • Type: Story
    • Resolution: Done
    • Priority: Critical
    • CLI
    • 5
    • OpenShift SPLAT - Sprint 268

      Story:

      • As an OpenShift engineer, I would like a mechanism that prevents e2e tests from failing when they randomly try to schedule on the OPCT dedicated node, avoiding false positives in the final conformance results, so that we can improve visibility and reduce the maintenance friction of running the conformance workflow in-cluster in the OPCT architecture.
      • As a Partner, I would like more visibility into tests that fail exclusively in my test environment, so that I can self-review and isolate failures easily.
      • As a Cert-Ops Engineer, I would like to review partner results quickly, without failures induced by the OPCT environment causing false positives, so that we can give feedback to the partner quickly.

      Goal:

      • Create a controller that watches for pods that fail to schedule because of the OPCT dedicated node (a worker node with NoSchedule taints) and mutates them, adding a toleration that allows them to be scheduled on the dedicated node.
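
      The watch-and-mutate flow described in the goal could look roughly like the sketch below. It is not the OPCT implementation: the types are simplified, hypothetical stand-ins for the Kubernetes core/v1 types, the taint key is an assumed placeholder, and the function names (isUnschedulable, ensureToleration, reconcile) are invented for illustration.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Kubernetes core/v1 types;
// only the fields needed for this sketch are modeled.
type Taint struct{ Key, Value, Effect string }

type Toleration struct{ Key, Operator, Value, Effect string }

type PodCondition struct{ Type, Status, Reason string }

type Pod struct {
	Name        string
	Conditions  []PodCondition
	Tolerations []Toleration
}

// Assumed taint on the dedicated node; the key is illustrative only.
var dedicatedTaint = Taint{Key: "node-role.kubernetes.io/tests", Effect: "NoSchedule"}

// isUnschedulable reports whether the pod carries the condition
// PodScheduled=False with reason Unschedulable.
func isUnschedulable(p Pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
			return true
		}
	}
	return false
}

// tolerates applies the usual matching rules: an empty effect or key on
// the toleration matches any, "Exists" ignores the value, "Equal" compares it.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return t.Value == taint.Value
	}
	return false
}

// ensureToleration returns the pod's tolerations with one appended for the
// taint if none already matches, plus a flag saying whether it changed.
func ensureToleration(p Pod, taint Taint) ([]Toleration, bool) {
	for _, t := range p.Tolerations {
		if tolerates(t, taint) {
			return p.Tolerations, false
		}
	}
	out := append([]Toleration{}, p.Tolerations...)
	out = append(out, Toleration{Key: taint.Key, Operator: "Exists", Effect: taint.Effect})
	return out, true
}

// reconcile mutates an unschedulable pod so it tolerates the dedicated
// node's taint; it returns true when the pod was changed.
func reconcile(p *Pod) bool {
	if !isUnschedulable(*p) {
		return false
	}
	tols, changed := ensureToleration(*p, dedicatedTaint)
	p.Tolerations = tols
	return changed
}

func main() {
	pod := Pod{
		Name:       "e2e-test-pod",
		Conditions: []PodCondition{{Type: "PodScheduled", Status: "False", Reason: "Unschedulable"}},
	}
	fmt.Println(reconcile(&pod), len(pod.Tolerations)) // true 1
	fmt.Println(reconcile(&pod), len(pod.Tolerations)) // false 1 (already tolerated)
}
```

      The sketch treats every unschedulable pod as a candidate. A real controller would presumably also need to confirm that the dedicated node's taint is the actual cause before mutating the pod, and would watch pods through the Kubernetes API rather than operate on in-memory structs.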

       

      Why:

      • The dedicated node was created to prevent the test services (Sonobuoy aggregator, workflow jobs, etc.) from being evicted by the scheduler, which disrupts the test environment.

       

      Disclaimer: We can't deliver a stable solution without a dedicated node, at least in the current architecture. The alternative is to change the architecture to run the test environment off-cluster (envtest?), but that would require more time for research, as well as stable connectivity between the client and the cluster, which is sometimes a challenge (it was in the early partner applications).

       

      Acceptance Criteria:

      • The controller is created under the OPCT project repo
      • The controller entry point command must be 'opct adm e2e-dedicated controller' (or anything under 'opct adm e2e-dedicated', see also OPCT-346)
      • The version must be validated with e2e tests (there is currently no periodic job; you can push your image and run a rehearsal job in the release repo, see also OPCT-348)
      • The PR in the OPCT project must be merged

       

              rhn-support-mrbraga Marco Braga
                  Original estimate: Not specified
                  Remaining estimate: 0m
                  Time spent: 1d 4h