  AI Platform Core Components / AIPCC-11400

MongoDB as a 3-member ReplicaSet with AZ anti-affinity


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major
    • AIPCC Productization

      MongoDB HA Migration – Replace Single Pod with ReplicaSet

      Context

      MongoDB currently runs as a single Deployment pod, which creates a single point of failure.

      During the upcoming ITUPIAD2 resilience test (March 9–13, 2026), the loss of an availability zone (AZ) will be simulated. If MongoDB remains a single instance and that instance is in the affected zone, the application will lose its database during the test.

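      To ensure service continuity, MongoDB must be migrated to a 3-member replica set deployed via a StatefulSet, so that it can fail over automatically if a zone becomes unavailable. In this ticket, "ReplicaSet" means the MongoDB replica set, not the Kubernetes ReplicaSet object. With one member per zone, losing any single zone still leaves two of three voting members. Two is a majority, so the surviving members can elect a new primary.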
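      The quorum arithmetic behind the three-member choice can be checked with a short Python sketch. The zone names are placeholders, not the cluster's actual zones. The worst case is losing the zone that hosts the most members.

```python
from collections import Counter


def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: strictly more than half of the voting members."""
    return voting_members // 2 + 1


def survives_any_zone_loss(member_zones: list[str]) -> bool:
    """True if losing any single zone still leaves a voting majority."""
    needed = majority(len(member_zones))
    per_zone = Counter(member_zones)
    # Worst case: the zone hosting the most members goes down.
    return len(member_zones) - max(per_zone.values()) >= needed


# Target layout, one member per zone: 2 of 3 remain and the majority is 2.
print(majority(3), survives_any_zone_loss(["zone-a", "zone-b", "zone-c"]))  # 2 True
# Two members sharing a zone: losing that zone leaves 1 of 3.
print(survives_any_zone_loss(["zone-a", "zone-a", "zone-b"]))  # False
```

      The same check explains why two members are not enough: majority(2) is 2, so losing either member blocks elections.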

      Reference:
      https://source.redhat.com/departments/it/datacenter_infrastructure/itcloudservices/itocp/it_paas_blog/itupiad2_resilience_quarterly_test_cy26_q1_will_take_place_on_march_9th_march_13th_2026

      Goal

      Replace the existing single-pod MongoDB Deployment with a StatefulSet running a 3-member ReplicaSet that:

      • survives AZ loss
      • provides automatic primary election
      • maintains data persistence
      • is reachable through a ReplicaSet connection string

      Implementation Plan

      1. Headless Service

      Create a headless Service (clusterIP: None) named mongodb-svc to give each replica a stable DNS name.

      Example pod DNS names:

      {{mongodb-0.mongodb-svc}}
      {{mongodb-1.mongodb-svc}}
      {{mongodb-2.mongodb-svc}}

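      These short names resolve from within the same namespace. The fully qualified form, assuming the default cluster domain, is mongodb-0.mongodb-svc.<namespace>.svc.cluster.local. This lets MongoDB members discover each other reliably. The StatefulSet's serviceName must be set to mongodb-svc for these per-pod DNS records to exist.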
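      A minimal sketch of the headless Service, rendered as JSON (which kubectl accepts). The pod label app: mongodb and the namespace my-namespace are hypothetical. The label must match whatever the StatefulSet pod template uses. publishNotReadyAddresses is an optional design choice: it lets members resolve each other before they pass readiness, which helps during replica set bootstrap.

```python
import json
from typing import Optional

SERVICE = "mongodb-svc"
POD_LABELS = {"app": "mongodb"}  # hypothetical; must match the StatefulSet pod template labels


def headless_service(namespace: str) -> dict:
    """Headless Service: clusterIP None gives every StatefulSet pod its own DNS record."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": SERVICE, "namespace": namespace},
        "spec": {
            "clusterIP": "None",
            # Publish pods before they are Ready so members can resolve each
            # other while the replica set is still being initiated.
            "publishNotReadyAddresses": True,
            "selector": POD_LABELS,
            "ports": [{"name": "mongodb", "port": 27017, "targetPort": 27017}],
        },
    }


def pod_dns_names(statefulset: str, replicas: int,
                  namespace: Optional[str] = None) -> list[str]:
    """Short names work inside the namespace; pass a namespace for the FQDN form."""
    suffix = f".{namespace}.svc.cluster.local" if namespace else ""  # assumes default cluster domain
    return [f"{statefulset}-{i}.{SERVICE}{suffix}" for i in range(replicas)]


if __name__ == "__main__":
    # JSON is valid kubectl input, e.g. `python svc.py | kubectl apply -f -` (svc.py is hypothetical)
    print(json.dumps(headless_service("my-namespace"), indent=2))
```

      pod_dns_names("mongodb", 3) yields the three short names listed above.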

      2. StatefulSet

      Replace the current Deployment with a StatefulSet.

      Key characteristics:

      • Replicas: 3
      • Stable network identity
      • Persistent storage per member
      • Ordered startup

      Each pod will get its own persistent volume via volumeClaimTemplates.

      Example identity:

      {{mongodb-0}}
      {{mongodb-1}}
      {{mongodb-2}}
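      A minimal StatefulSet skeleton with the characteristics listed above, rendered as JSON. Several values are assumptions, not decisions from this ticket:

      • the image tag mongo:7.0
      • the 10Gi volume size
      • the label app: mongodb
      • the volume name data

      Authentication is omitted. A replica set with access control enabled also needs internal authentication (for example, a keyFile), which this sketch does not set up.

```python
import json


def mongodb_statefulset(namespace: str, storage: str = "10Gi") -> dict:
    """Minimal StatefulSet for a 3-member replica set named rs0 (sketch only)."""
    labels = {"app": "mongodb"}  # hypothetical; must match the headless Service selector
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "mongodb", "namespace": namespace},
        "spec": {
            "serviceName": "mongodb-svc",  # headless Service providing mongodb-N.mongodb-svc
            "replicas": 3,
            "podManagementPolicy": "OrderedReady",  # the default: pods start one at a time, in order
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "mongodb",
                        "image": "mongo:7.0",  # assumption: pin to the version in use today
                        # The official image's entrypoint prepends mongod to flag-only args.
                        "args": ["--replSet", "rs0", "--bind_ip_all"],
                        "ports": [{"name": "mongodb", "containerPort": 27017}],
                        "volumeMounts": [{"name": "data", "mountPath": "/data/db"}],
                        "readinessProbe": {
                            "exec": {"command": ["mongosh", "--quiet", "--eval",
                                                 "db.adminCommand('ping')"]},
                            "periodSeconds": 10,
                            "timeoutSeconds": 5,
                        },
                    }],
                },
            },
            # One PVC per pod, named data-mongodb-0, data-mongodb-1, data-mongodb-2.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


print(json.dumps(mongodb_statefulset("my-namespace"), indent=2))
```

      The readiness probe only checks that mongod answers a ping, not replica set state. A probe that required PRIMARY or SECONDARY status would stall OrderedReady startup: mongodb-1 and mongodb-2 would never be created, so rs.initiate() could not complete.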

      3. ReplicaSet Initialization

      A ConfigMap will contain a bootstrap script executed during first startup.

      The script will:

      1. Detect if the replica set already exists.
      2. If not, run:

      {{rs.initiate({
        _id: "rs0",
        members: [
          { _id: 0, host: "mongodb-0.mongodb-svc:27017" },
          { _id: 1, host: "mongodb-1.mongodb-svc:27017" },
          { _id: 2, host: "mongodb-2.mongodb-svc:27017" }
        ]
      })}}

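      Initialization must run only once, from a single member. Calling rs.initiate() on an already-initiated replica set returns an error. The existence check in step 1 is what makes the script safe to run on every pod restart.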
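      The bootstrap decision can be sketched in Python; the ConfigMap script itself would more likely be mongosh. The sketch assumes that replSetGetStatus (what rs.status() runs) fails with codeName NotYetInitialized on a member started with --replSet but never initiated. Verify this against the deployed MongoDB version. The function names are hypothetical. replica_set_config() reproduces the document passed to rs.initiate() above.

```python
import json


def replica_set_config(name: str = "rs0", replicas: int = 3,
                       service: str = "mongodb-svc", port: int = 27017) -> dict:
    """Build the same document the ticket passes to rs.initiate()."""
    return {
        "_id": name,
        "members": [
            {"_id": i, "host": f"mongodb-{i}.{service}:{port}"}
            for i in range(replicas)
        ],
    }


def should_initiate(status_reply: dict) -> bool:
    """Decide from a replSetGetStatus reply whether rs.initiate() is still needed.

    ok: 1 means a config already exists, so initiation is skipped.
    codeName NotYetInitialized means this member has never been initiated.
    Any other error is raised so the init step fails loudly instead of guessing.
    """
    if status_reply.get("ok") == 1:
        return False
    if status_reply.get("codeName") == "NotYetInitialized":
        return True
    raise RuntimeError(f"unexpected replSetGetStatus reply: {status_reply}")


# JSON is also a valid JavaScript object literal, e.g. for mongosh --eval "rs.initiate(<json>)"
print(json.dumps(replica_set_config()))
```

      Raising on unexpected replies is deliberate. An authentication or connectivity error should not be mistaken for "not initiated", because that could trigger a second rs.initiate() attempt.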

      4. Pod Anti-Affinity

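      To ensure resilience during AZ failures, use pod anti-affinity with:

      {{requiredDuringSchedulingIgnoredDuringExecution}}
      {{topologyKey: topology.kubernetes.io/zone}}

      This forces Kubernetes to place each MongoDB member in a different availability zone. When a zone is lost, its member cannot be rescheduled into the remaining zones, because they already host a member. That member stays Pending while the other two members keep a majority.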

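      If the cluster cannot guarantee 3 zones, fall back to:

      {{preferredDuringSchedulingIgnoredDuringExecution}}

      Add a second preferred term on {{topologyKey: kubernetes.io/hostname}} so that members are at least spread across different nodes. This weakens the AZ guarantee: if two members end up in the same zone, losing that zone also loses the majority.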
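      Both variants can be sketched as the affinity block of the pod template (spec.template.spec.affinity). The label app: mongodb and the weights 100 and 50 are assumptions. The weights only express that zone separation matters more than node separation.

```python
import json

LABELS = {"app": "mongodb"}  # hypothetical pod label, as in the StatefulSet template


def anti_affinity(strict_zones: bool = True) -> dict:
    """Pod anti-affinity for MongoDB members: hard per-zone rule, or soft fallback."""
    zone_term = {
        "labelSelector": {"matchLabels": LABELS},
        "topologyKey": "topology.kubernetes.io/zone",
    }
    if strict_zones:
        # Hard rule: a member that cannot get a zone of its own stays Pending.
        return {"podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [zone_term],
        }}
    host_term = {
        "labelSelector": {"matchLabels": LABELS},
        "topologyKey": "kubernetes.io/hostname",
    }
    # Fallback: spreading becomes a scheduler preference; zones are weighted above hosts.
    return {"podAntiAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {"weight": 100, "podAffinityTerm": zone_term},
            {"weight": 50, "podAffinityTerm": host_term},
        ],
    }}


print(json.dumps(anti_affinity(strict_zones=False), indent=2))
```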

      5. Application Connection Update

      Update the MongoDB connection string across all components:

      • API
      • collector
      • agent

      Current (standalone):

      mongodb://user:pass@mongodb:27017

      New ReplicaSet URI:

      mongodb://user:pass@mongodb-0.mongodb-svc:27017,mongodb-1.mongodb-svc:27017,mongodb-2.mongodb-svc:27017/?replicaSet=rs0

      ReplicaSet URIs allow drivers to automatically:

      • detect the primary
      • reconnect after failover
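      A small helper can build the new connection string so that all three components get the same value. The function name is hypothetical. Percent-encoding the credentials with quote_plus matters if the real password contains characters such as @, : or /, which would otherwise break URI parsing.

```python
from urllib.parse import quote_plus


def replica_set_uri(user: str, password: str, replicas: int = 3,
                    service: str = "mongodb-svc", port: int = 27017,
                    replica_set: str = "rs0") -> str:
    """Build a seed-list URI naming every member plus the replicaSet option."""
    # Special characters in credentials must be percent-encoded in a MongoDB URI.
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    hosts = ",".join(f"mongodb-{i}.{service}:{port}" for i in range(replicas))
    return f"mongodb://{creds}@{hosts}/?replicaSet={replica_set}"


print(replica_set_uri("user", "pass"))
# mongodb://user:pass@mongodb-0.mongodb-svc:27017,mongodb-1.mongodb-svc:27017,mongodb-2.mongodb-svc:27017/?replicaSet=rs0
```

      Listing all three hosts means a driver can still bootstrap if one member is down when the application starts. The driver then discovers the current primary from any member it reaches.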

      6. Data Migration

      Existing data from the single MongoDB instance must be migrated.

      Proposed method:

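        1. Deploy the new ReplicaSet alongside the existing standalone instance.
        2. Point the collectors at the new ReplicaSet and let them run a few cycles to populate it.
        3. Verify the data, e.g. by comparing collections and document counts between the standalone and the ReplicaSet.
        4. Switch the API and agent to the new connection string.
        5. Decommission the old standalone pod.

      This approach assumes that everything in the standalone instance is regenerated by the collectors. Any data they do not produce (for example, anything written through the API or agent) has to be copied explicitly, e.g. with mongodump/mongorestore, as part of the switch in step 4.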
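      The verification in step 3 can be made concrete by diffing per-collection document counts. This sketch is a pure function over counts that would be gathered beforehand, for example with pymongo's count_documents({}) on each collection. The collection names and numbers are hypothetical.

```python
def compare_counts(old: dict[str, int], new: dict[str, int]) -> dict[str, tuple]:
    """Return {collection: (old_count, new_count)} for every collection that differs.

    A collection missing on one side shows up with None for that side.
    """
    diffs = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            diffs[name] = (before, after)
    return diffs


# Hypothetical counts from the standalone and the new ReplicaSet.
old = {"runs": 1200, "metrics": 50000, "settings": 4}
new = {"runs": 1200, "metrics": 49800}
print(compare_counts(old, new))
# {'metrics': (50000, 49800), 'settings': (4, None)}
```

      Because the collectors keep writing, counts on collector-owned collections may legitimately drift between snapshots. A collection that is entirely missing on the new side, like settings here, is the stronger signal that the data was not regenerated.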

      Required Manifests

      The following Kubernetes manifests are required:

      1. Headless Service
        • mongodb-svc
      2. ConfigMap
        • ReplicaSet initialization script
      3. StatefulSet
        • 3 replicas
        • PVC templates
        • anti-affinity rules
        • readiness/liveness probes
      4. Application Updates
        • API deployment
        • collector deployment
        • agent deployment
        • update MONGODB_URI

      Risks

      AZ capacity

      If the cluster cannot schedule pods across 3 zones, the strict anti-affinity rule may prevent scheduling.

      Mitigation:

      • switch to preferredDuringSchedulingIgnoredDuringExecution

      Resource quotas

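      Namespace quotas must allow:

      • 3 MongoDB pods
      • 3 persistent volumes
      • the existing standalone pod and its volume on top of these, since both run side by side until the standalone is decommissioned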
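      The quota check reduces to simple arithmetic per ResourceQuota key. pods and persistentvolumeclaims are real ResourceQuota keys, but the limits and usage figures below are hypothetical. Storage quantities such as requests.storage: 10Gi would need unit parsing, so this sketch covers object counts only.

```python
def quota_shortfall(hard: dict[str, int], used: dict[str, int],
                    needed: dict[str, int]) -> dict[str, int]:
    """Return how far each ResourceQuota key would be exceeded (empty dict means it fits)."""
    short = {}
    for key, extra in needed.items():
        if key not in hard:
            continue  # this quota does not limit the key
        free = hard[key] - used.get(key, 0)
        if extra > free:
            short[key] = extra - free
    return short


# Hypothetical numbers: "used" already includes the standalone pod and its PVC,
# which keep running alongside the ReplicaSet until they are decommissioned.
hard = {"pods": 10, "persistentvolumeclaims": 5}
used = {"pods": 7, "persistentvolumeclaims": 3}
needed = {"pods": 3, "persistentvolumeclaims": 3}
print(quota_shortfall(hard, used, needed))
# {'persistentvolumeclaims': 1}
```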

      Storage class compatibility

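      The storage class must support:

      • dynamic PVC provisioning
      • StatefulSet volumeClaimTemplates
      • volumeBindingMode: WaitForFirstConsumer, so that each volume is provisioned in the zone where its pod is actually scheduled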
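      A quick sketch for checking a StorageClass against these requirements. It inspects the dict from kubectl get storageclass <name> -o json, parsed with json.loads. The provisioner in the example is illustrative and not necessarily what this cluster uses. When volumeBindingMode is unset, Kubernetes defaults to Immediate. In that mode a volume can be created in a zone that the anti-affinity rules then prevent its pod from using.

```python
def storage_class_issues(sc: dict) -> list[str]:
    """List problems that would break zone-spread StatefulSet volumes."""
    issues = []
    if sc.get("provisioner") in (None, "", "kubernetes.io/no-provisioner"):
        issues.append("no dynamic provisioner: volumeClaimTemplates PVCs will stay Pending")
    # Immediate is the Kubernetes default when the field is unset.
    if sc.get("volumeBindingMode", "Immediate") != "WaitForFirstConsumer":
        issues.append("Immediate binding: a volume may be created in a zone the pod cannot use")
    return issues


# Illustrative StorageClass fields (provisioner chosen only as an example).
example = {"provisioner": "ebs.csi.aws.com", "volumeBindingMode": "WaitForFirstConsumer"}
print(storage_class_issues(example))  # []
```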

      Acceptance Criteria

      The migration will be considered successful when:

      • MongoDB runs as a 3-member ReplicaSet
      • Each member runs in a separate availability zone
      • Applications connect via the ReplicaSet URI
      • Automatic failover works
        • deleting the primary pod results in a new primary being elected within ~10–12 seconds
      • Data persists across pod restarts
      • Existing data is successfully migrated
      • Deployment is completed before March 9, 2026 (start of the ITUPIAD2 test)
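      The failover criterion can be measured by polling during the test and timing the gap between deleting the primary and observing a different primary. Such polling could read the primary field of each hello reply. This sketch only evaluates collected samples; the sample values and the function name are hypothetical.

```python
from typing import Optional


def election_seconds(samples: list[tuple[float, Optional[str]]],
                     old_primary: str, deleted_at: float) -> Optional[float]:
    """Seconds from deleting old_primary until a different member is seen as primary.

    samples: (timestamp, primary host or None) pairs collected by polling.
    Returns None if no new primary appears in the samples.
    """
    # Sort by timestamp only; comparing None with str would raise TypeError.
    for ts, primary in sorted(samples, key=lambda s: s[0]):
        if ts >= deleted_at and primary is not None and primary != old_primary:
            return ts - deleted_at
    return None


samples = [(0.0, "mongodb-0.mongodb-svc:27017"), (2.0, None),
           (6.0, None), (11.5, "mongodb-2.mongodb-svc:27017")]
elapsed = election_seconds(samples, "mongodb-0.mongodb-svc:27017", deleted_at=0.5)
print(elapsed, elapsed is not None and elapsed <= 12)  # 11.0 True
```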

       

              Assignee: Unassigned
              Reporter: Jose Angel Morena (rhit_jmorenas)
              Klara's Team