Project: Container / Cluster Management (XCM) Strategy
Issue: XCMSTRAT-143

ROSA/OSD: e2e path for AWS Maintenance events

    • Type: Feature
    • Resolution: Unresolved
    • Priority: Major
    • XCMSTRAT-225 Managed Services Operational Excellence
    • 100% To Do, 0% In Progress, 0% Done

      User Story

      As a customer of managed OpenShift clusters, I am affected by AWS maintenance events, but I have little confidence in, or understanding of, whether and how my cluster is affected.

      As Red Hat, the provider of this customer's cluster, we should use AWS maintenance notification information to navigate the cluster through AWS maintenance events, including:
       - notifications (service logs)
       - tools for worker node management, if applicable
       - ensuring the control plane is unaffected
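A minimal sketch of how the three items above could fit together, assuming a maintenance event has already been fetched and matched to a cluster node. Event codes such as `system-reboot` and `instance-retirement` follow the EC2 scheduled event codes; the `MaintenanceEvent` type, the `node_role` labels, the `manage_workers` opt-in flag, the action names, and the message wording are all hypothetical and only illustrate the decision, not an agreed design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MaintenanceEvent:
    """Hypothetical, simplified view of one AWS scheduled event for one node."""
    instance_id: str      # EC2 instance ID of the affected node
    code: str             # EC2 event code, e.g. "system-reboot"
    not_before: datetime  # earliest start of the maintenance window (assumed UTC)
    node_role: str        # hypothetical label: "control-plane" or "worker"


def plan_action(event: MaintenanceEvent, manage_workers: bool) -> str:
    """Decide who handles the node; the action names are illustrative only."""
    if event.node_role == "control-plane":
        # The control plane is always managed by Red Hat.
        return "sre-replace-node"
    if manage_workers:
        # Assumption: the customer opted in to worker management.
        return "sre-replace-node"
    return "notify-customer"


def service_log_message(event: MaintenanceEvent, manage_workers: bool) -> str:
    """Build the text of a service log entry for the customer."""
    action = plan_action(event, manage_workers)
    when = event.not_before.strftime("%Y-%m-%d %H:%M UTC")
    if action == "notify-customer":
        followup = "Please drain and replace this node before the window opens."
    else:
        followup = "Red Hat SRE will handle this node; no action is required."
    return (
        f"AWS scheduled {event.code} for {event.instance_id} "
        f"({event.node_role}) starting {when}. {followup}"
    )


if __name__ == "__main__":
    example = MaintenanceEvent(
        instance_id="i-0123456789abcdef0",
        code="system-reboot",
        not_before=datetime(2030, 1, 2, 3, 4, tzinfo=timezone.utc),
        node_role="worker",
    )
    print(service_log_message(example, manage_workers=False))
```

The `manage_workers` flag stands in for the opt-in discussed in the open question that follows; if worker management is left entirely to customers, that branch would simply go away.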

      Open Question: Do we offer to manage worker nodes around AWS maintenance events, or leave that for customers?
      (I'd argue we should offer a checkbox for customers who want us to manage workers around maintenance for them.)

      Acceptance Criteria

      • Customers are made aware of AWS maintenance events and whether their clusters are affected.
      • Customers are informed enough to feel confident that AWS maintenance events will not affect their workloads.
      • The customer's cluster control plane is managed around AWS maintenance events.

      Default Done Criteria

      • All existing/affected SOPs have been updated.
      • New SOPs have been written.
      • Internal training has been developed and delivered.
      • The feature has both unit and end-to-end tests passing in all test
        pipelines and through upgrades.
      • If the feature requires QE involvement, QE has signed off.
      • The feature exposes metrics necessary to manage it (VALET/RED).
      • The feature has had a security review.
      • Contract impact assessment.
      • Service Definition is updated if needed.
      • Documentation is complete.
      • Product Manager signed off on staging/beta implementation.

      Dates

      Integration Testing:
      Beta:
      GA:

      Current Status

      GREEN | YELLOW | RED
      GREEN = On track, minimal risk to target date.
      YELLOW = Moderate risk to target date.
      RED = High risk to target date, or blocked and need to highlight potential
      risk to stakeholders.

      References

      Links to Gdocs, GitHub, and any other relevant information about this epic.

            Assignee: Unassigned
            Reporter: Aaren de Jong (rh-ee-adejong)