  1. Database as a Service
  2. DBAAS-291

Mitigate race conditions among different OpenShift clusters/namespaces for the same Atlas cluster

      The Atlas Operator uses the MongoDBAtlasInstance custom resource (CR) for cluster provisioning. Its controller uses the AtlasCluster CR to actually manage the cluster in Atlas: the operator creates, updates, or deletes the cluster in Atlas according to the AtlasCluster CR's spec. In other words, the AtlasCluster CR becomes the source of truth for the cluster, even if the cluster was already created in Atlas directly.

      This opens the door to race conditions, because multiple AtlasCluster CRs, located in different namespaces or in different OpenShift clusters, can target the same remote cluster in Atlas. When these CRs have different spec settings, each operator reconciles the remote cluster toward its own CR's spec, so the operators repeatedly overwrite each other's changes. The problem also occurs when one of the CRs is deleted: that CR's deletion removes the remote cluster, and the other CRs then try to recreate it. The deletion case can be mitigated by configuring the CRs so that the remote cluster is kept in Atlas when a CR is deleted, but users then need to know which CR should be configured to allow the cluster deletion.
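
      The conflict described above can be illustrated with a small, self-contained Go simulation. It is only a sketch: the types, the `InstanceSize` field, and the `reconcile` and `simulate` functions are hypothetical stand-ins and not the operator's real types or API. Two CRs with different specs take turns reconciling one shared remote cluster, and every reconcile ends up writing to it.

```go
package main

import "fmt"

// remoteCluster stands in for the single cluster that lives in Atlas.
type remoteCluster struct {
	Exists       bool
	InstanceSize string
}

// atlasClusterCR is a simplified, hypothetical view of an AtlasCluster CR.
type atlasClusterCR struct {
	Namespace    string
	InstanceSize string // illustrative spec field
}

// reconcile drives the remote cluster toward the CR's spec and
// reports whether it had to write a change.
func reconcile(cr atlasClusterCR, remote *remoteCluster) bool {
	if remote.Exists && remote.InstanceSize == cr.InstanceSize {
		return false // remote already matches this CR
	}
	remote.Exists = true
	remote.InstanceSize = cr.InstanceSize
	return true
}

// simulate lets two CRs with conflicting specs reconcile in turn
// and returns how many writes hit the shared remote cluster.
func simulate(rounds int) int {
	remote := &remoteCluster{}
	crs := []atlasClusterCR{
		{Namespace: "team-a", InstanceSize: "M10"},
		{Namespace: "team-b", InstanceSize: "M30"},
	}
	writes := 0
	for i := 0; i < rounds; i++ {
		for _, cr := range crs {
			if reconcile(cr, remote) {
				writes++
			}
		}
	}
	return writes
}

func main() {
	fmt.Println("writes to the shared cluster:", simulate(3))
}
```

      Because neither CR ever observes the remote cluster in its own desired state, the reconcile loops never settle.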

      A complete solution for this issue will likely require enhancements to the Atlas API. This task is to implement a simple approach that mitigates such race conditions: they can still happen, but occurrences will be rare.

      The quick fix is to have the MongoDBAtlasInstance controller check whether the cluster already exists in Atlas before it creates the AtlasCluster CR. If the cluster already exists, the provisioning request is abandoned and the MongoDBAtlasInstance CR status is updated with Phase = "Failed".
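
      A minimal Go sketch of this pre-check follows. The `ClusterLookup` interface, the `ErrClusterNotFound` error, the `precheck` function, and the `fakeAtlas` type are all hypothetical names introduced for illustration; the real controller would call the Atlas API through whatever client it already uses. Only the `"Failed"` phase comes from this ticket.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrClusterNotFound is a hypothetical sentinel for "no such cluster in Atlas".
var ErrClusterNotFound = errors.New("cluster not found in Atlas")

// ClusterLookup is a hypothetical abstraction over the Atlas API lookup.
type ClusterLookup interface {
	// GetCluster returns nil if the cluster exists,
	// ErrClusterNotFound if it does not, or another error on failure.
	GetCluster(projectID, name string) error
}

// precheck decides whether the controller may create the AtlasCluster CR.
// It returns the phase to record on the MongoDBAtlasInstance status when
// the request must be abandoned; the phase is empty otherwise.
func precheck(api ClusterLookup, projectID, name string) (proceed bool, phase string, err error) {
	lookupErr := api.GetCluster(projectID, name)
	switch {
	case lookupErr == nil:
		// Cluster already exists: abandon provisioning.
		return false, "Failed", nil
	case errors.Is(lookupErr, ErrClusterNotFound):
		// Safe to create the AtlasCluster CR.
		return true, "", nil
	default:
		// Lookup itself failed: do not proceed, let the caller retry.
		return false, "", lookupErr
	}
}

// fakeAtlas is an in-memory stand-in used only for this example.
type fakeAtlas map[string]bool

func (f fakeAtlas) GetCluster(projectID, name string) error {
	if f[projectID+"/"+name] {
		return nil
	}
	return ErrClusterNotFound
}

func main() {
	api := fakeAtlas{"p1/shared": true}
	for _, name := range []string{"shared", "fresh"} {
		proceed, phase, err := precheck(api, "p1", name)
		fmt.Printf("%s: proceed=%v phase=%q err=%v\n", name, proceed, phase, err)
	}
}
```

      The lookup and the later creation are not atomic, so two controllers can still both pass the check at nearly the same time; this is why the approach mitigates the race rather than eliminating it.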

      A similar solution should be implemented in other provider operators as well.

              jianrzha@redhat.com Jianrong Zhang