Red Hat OpenShift Data Science / RHODS-2536

[Spike] Investigate how to handle exceeded resource limit in AWS


    • Type: Task
    • Resolution: Done
    • Priority: Normal
    • Sprint: RHOSi Sprint 11

      https://docs.google.com/document/d/1acQOZLae907lVFwNQsKnrexDPXJHQab0NqUxOwKzWPg/edit?pli=1#heading=h.c8ocg5xgr33u

       

      It may happen that dangling resources are left on AWS, which leads to quota limit / NAT limit errors while creating new clusters. We should investigate how to recover from this situation (existing scripts vs. new scripts to be created by us).

       

      Some reasons why this situation can happen are:

      1. Too many clusters created
      2. Some clusters have been deleted without uninstalling add-ons like RHODS

       

      Some initial items to start the investigation:

      • https://github.com/integr8ly/cluster-service (useful for point 2 in the description)
      • Evaluate usage of the awslimitchecker script. It could be used to check the relevant quotas at the beginning of the Jenkins pipelines, so the job can be cancelled right away if the quotas are already exhausted (see the first sketch after this list).
      • Another option could be to use the Service Quotas API (e.g. aws service-quotas get-service-quota) or per-limit CloudWatch alarms to get notified when approaching a limit (see the second sketch below).
      • Investigate the OSD feature to re-use VPCs for new cluster provisions. This may allow checking for known VPCs and re-using them for new cluster provisions.
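
      As a starting point for the awslimitchecker idea, below is a minimal sketch of a pre-flight gate that a Jenkins pipeline could run before provisioning. It assumes the awslimitchecker Python API (AwsLimitChecker.check_thresholds()); the region and the exit-code convention are our own choices, not part of the tool.

      #!/usr/bin/env python3
      """Hypothetical pre-flight quota gate for the Jenkins pipelines (sketch)."""
      import sys

      from awslimitchecker.checker import AwsLimitChecker

      REGION = "us-east-1"  # assumption: region where the pipelines provision clusters


      def main() -> int:
          checker = AwsLimitChecker(region=REGION)
          # check_thresholds() returns only the limits whose current usage crossed
          # the warning/critical thresholds; an empty dict means we can proceed.
          crossed = checker.check_thresholds()
          if not crossed:
              print("All AWS limits below thresholds; continuing pipeline.")
              return 0
          for service, limits in crossed.items():
              for name, limit in limits.items():
                  for usage in limit.get_criticals() + limit.get_warnings():
                      print(f"{service}/{name}: usage {usage.get_value()} "
                            f"of limit {limit.get_limit()}", file=sys.stderr)
          # Non-zero exit lets the Jenkins job fail fast before creating anything.
          return 1


      if __name__ == "__main__":
          sys.exit(main())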

       
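      For the Service Quotas alternative, here is a sketch using boto3's Service Quotas and EC2 clients to compare one quota against current usage. The quota code (L-F678F1CE, "VPCs per Region") and the 80% threshold are illustrative assumptions and should be verified against the account before relying on them.

      """Sketch: query a single quota and compare it with live usage."""
      import boto3

      REGION = "us-east-1"           # assumption
      VPC_QUOTA_CODE = "L-F678F1CE"  # assumption: quota code for "VPCs per Region"

      quotas = boto3.client("service-quotas", region_name=REGION)
      ec2 = boto3.client("ec2", region_name=REGION)

      # Applied quota value for VPCs in this region.
      quota = quotas.get_service_quota(ServiceCode="vpc", QuotaCode=VPC_QUOTA_CODE)
      limit = quota["Quota"]["Value"]

      # Current usage: count the VPCs that exist right now.
      vpc_count = len(ec2.describe_vpcs()["Vpcs"])

      print(f"VPCs: {vpc_count}/{int(limit)} used")
      if vpc_count >= 0.8 * limit:
          print("Approaching the VPC quota; new cluster provisions are likely to fail.")

      The same check could be repeated for other quotas (Elastic IPs, NAT gateways), or the thresholds could instead be wired into per-limit CloudWatch alarms as the bullet above suggests.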

      Sub-team: pablo-rhods rhn-support-bdattoma 

              pablo-rhods Pablo Felix (Inactive)
              sanandpa@redhat.com Sweta Anandpara
              Votes: 0
              Watchers: 2
