OpenShift Cloud / OCPCLOUD-2167

CPMS tests should cover unhealthy node cases


    • Type: Story
    • Resolution: Unresolved
    • Priority: Major
    • Sprint: CLOUD Sprint 241, CLOUD Sprint 242, CLOUD Sprint 243, CLOUD Sprint 244, CLOUD Sprint 253

      User Story

      As a developer of CPMS, I want to ensure unhealthy nodes can be replaced so that we can recommend CPMS to users.

      Background

      QE have some manual test cases covering a couple of unhappy-path scenarios for the CPMS, each of which should result in automatic recovery.

      I would like to see these automated as part of the periodic suite for CPMS.

      The behaviour itself isn't really dependent on CPMS, but the whole workflow is.
      It is driven primarily by other components and how they react; when those components misbehave, they block CPMS from operating as expected.

      The two cases I would like to see added are:

      • Terminate an instance on the cloud provider
        • Once terminated, the node object should get removed
        • Once the node object is removed, the machine should enter a failed state
        • Terminate the Machine
        • Eventually a new Machine comes up
        • Eventually the old Machine goes away
        • Eventually the cluster stabilises
      • Terminate the kubelet on the node
        • SSH to the node and terminate kubelet
        • Eventually the node will go unready (its Ready condition is no longer True)
        • Delete the Machine object (MHC would do this in the real world)
        • Eventually a new Machine becomes ready
        • Eventually the old Machine goes away
        • Eventually the cluster stabilises

      Steps

      • Review the previous bug and Daniel's work to understand what got broken
      • Understand how to terminate an instance in the cloud from a test
      • Understand how to stop kubelet from a test (oc debug?, SSH creds in cluster?)
      • Write the tests as described above
      • Ensure the tests actually pass

      Stakeholders

      • Cluster Infra
      • TRT
      • etcd

      Definition of Done

      • Tests are added to the existing periodic suite

            joelspeed Joel Speed