- Story
- Resolution: Done
- Major
- BU Product Work
- 3
- OCPSTRAT-384 - Control Plane Machine Set Testing
- CLOUD Sprint 227
Background
We wish to have a periodic test that exercises the rolling update strategy end to end.
This test should be executed only as a periodic because it is expensive to run and takes a long time.
Motivation
We want to be able to prove that we can replace the whole Control Plane without interruption and without degrading the cluster. This is not run as a presubmit because it is too long running: the test by itself takes around 1 hour to execute.
We can also simulate this with an integration test to test the surging behaviour of CPMS.
Steps
- Create a test that:
  - Checks the cluster operators are all stable/waits for them to stabilise
  - Edits the CPMS spec to increase the control plane instance size
  - Monitors the number of control plane machines to ensure it never goes above 4 (checking surge)
  - Checks that the CPMS first replaces index 0, then 1, then 2
  - Checks naming of new machines
  - Checks old machines aren't marked for deletion while the new Machine's phase is not Running
  - Checks new machines report the correct/updated instance size
  - Waits until all replacements are complete, i.e. CPMS status reports replicas == updatedReplicas
  - Waits until cluster operators stabilise again
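Two of the checks above (the surge limit and the replacement order) reduce to simple assertions over values the test has observed. The following is a minimal sketch only: `checkSurge`, `checkOrder`, and the plain integer inputs are hypothetical stand-ins for this illustration, not part of the CPMS API or the real test suite, which would read these values from Machine and CPMS objects.

```go
package main

import "fmt"

// maxMachines is 3 control plane replicas plus one surged Machine.
const maxMachines = 4

// checkSurge returns an error if any observed Machine count exceeds maxMachines.
// counts is a hypothetical series of Machine counts, one per poll.
func checkSurge(counts []int) error {
	for i, n := range counts {
		if n > maxMachines {
			return fmt.Errorf("poll %d: saw %d machines, want at most %d", i, n, maxMachines)
		}
	}
	return nil
}

// checkOrder returns an error unless the indexes were replaced in the
// order 0, 1, 2, ... replaced lists indexes in the order their
// replacements were observed.
func checkOrder(replaced []int) error {
	for pos, idx := range replaced {
		if idx != pos {
			return fmt.Errorf("replacement %d was index %d, want index %d", pos, idx, pos)
		}
	}
	return nil
}

func main() {
	fmt.Println("surge:", checkSurge([]int{3, 4, 3, 4, 3, 4, 3}))
	fmt.Println("order:", checkOrder([]int{0, 1, 2}))
}
```

Keeping these as pure functions means they can be unit tested without a cluster, and the periodic only has to collect the observations.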
Note, a number of these checks will want to be done in parallel, e.g. a poll could wait until replicas == updatedReplicas while another goroutine checks the surge, and another goroutine watches the Machines and checks their state
Stakeholders
- Cluster Infra
Definition of Done
- Periodic test is included in the repository and is proven to work
- Docs
  - N/A
- Testing
  - N/A
- is blocked by OCPCLOUD-1735 Bootstrap E2E test suite (Closed)