OpenShift Bugs / OCPBUGS-43632

router-default pods in openshift-ingress namespace don't rebalance after zone outage


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Version: 4.16
    • Component: kube-scheduler

      Description of problem:

      During disaster recovery testing with a stretch cluster, both router-default pods were scheduled to the surviving zone (datacenter 1) during the outage. When the failed zone (datacenter 2) was restored, the pods were NOT rebalanced, so a subsequent test in which datacenter 1 was taken down caused an unexpected outage.

      Version-Release number of selected component (if applicable):

        4.16   

      How reproducible:

      Always, when following the steps to reproduce below.

      Steps to Reproduce:

          1. Set up a stretch cluster as described here: https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/4.16/html/configuring_openshift_data_foundation_disaster_recovery_for_openshift_workloads/disaster-recovery-subscriptions_common#disaster-recovery-subscriptions_common
          2. Simulate an outage by taking down datacenter 2.
          3. Wait approximately 8 minutes for eviction to occur, and note that both router-default pods are now running in datacenter 1.
          4. Bring datacenter 2 back up.
          5. Take down datacenter 1.
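
The check in step 3 amounts to counting router pods per zone. A minimal Python sketch, assuming the pod-to-node and node-to-zone mappings have already been collected (for example from `oc get pods -n openshift-ingress -o wide` and the nodes' `topology.kubernetes.io/zone` label). The node, pod, and zone names below are hypothetical and are not taken from the affected cluster.

```python
from collections import Counter


def pods_per_zone(pod_to_node, node_to_zone, zones):
    """Count pods in each zone; zones without any pod are reported as 0."""
    counts = Counter({zone: 0 for zone in zones})
    for node in pod_to_node.values():
        counts[node_to_zone[node]] += 1
    return dict(counts)


# Hypothetical names illustrating the state observed after step 3.
node_to_zone = {
    "worker-a": "datacenter1",
    "worker-b": "datacenter1",
    "worker-c": "datacenter2",
}
pod_to_node = {
    "router-default-1": "worker-a",
    "router-default-2": "worker-b",
}

placement = pods_per_zone(pod_to_node, node_to_zone, ["datacenter1", "datacenter2"])
print(placement)  # {'datacenter1': 2, 'datacenter2': 0}
```

A zone with a count of 0 after step 4 means that losing the other zone leaves no router pod running.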

      Actual results:

          An outage occurs, because both router-default pods are still running in datacenter 1.

      Expected results:

          HA applications remain available after a minimal (if any) outage, because datacenter 2 is up.

      Additional info:

       The topologySpreadConstraints for deployment/router-default in the openshift-ingress namespace set whenUnsatisfiable to ScheduleAnyway, and pods are NOT rebalanced after zone outages.
      
        topologySpreadConstraints:
        - labelSelector:
            matchExpressions:
            - key: ingresscontroller.operator.openshift.io/hash
              operator: In
              values:
              - 7d6bdccc5
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
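
To make the constraint above concrete, here is a simplified Python sketch of the skew calculation. It assumes skew is the largest per-zone pod count minus the smallest, and it ignores node eligibility and other scheduler details. The zone names and pod counts are hypothetical, chosen to mirror the state described in this report. Topology spread constraints are evaluated only when a pod is scheduled, so a violation like the one below does not cause running pods to be moved.

```python
def zone_skew(pods_per_zone):
    """Simplified skew: most-loaded zone count minus least-loaded zone count."""
    counts = list(pods_per_zone.values())
    return max(counts) - min(counts)


def violates_max_skew(pods_per_zone, max_skew=1):
    """True if the current placement exceeds the allowed skew."""
    return zone_skew(pods_per_zone) > max_skew


# Hypothetical placements: after datacenter 2 is restored, and the desired state.
after_recovery = {"datacenter1": 2, "datacenter2": 0}
balanced = {"datacenter1": 1, "datacenter2": 1}

print(zone_skew(after_recovery), violates_max_skew(after_recovery))  # 2 True
print(zone_skew(balanced), violates_max_skew(balanced))              # 0 False
```

With ScheduleAnyway, a skew above maxSkew only lowers a node's score during scheduling. It does not block placement, which is consistent with both pods landing in datacenter 1 while datacenter 2 was down.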
         

              aos-workloads-staff Workloads Team Bot Account
              morstad Nancy Heinz