Red Hat OpenStack Services on OpenShift / OSPRH-7626

[ovn-operator][kuttl][ovn-db-delete] Random test failures


    • Bug
    • Resolution: Unresolved
    • Critical
    • ovn-operator
    • Important

      Related Slack thread,

      Log link: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openstack-k8s-operators_ovn-operator/304/pull-ci-openstack-k8s-operators-ovn-operator-main-ovn-operator-build-deploy-kuttl/1800263500052828160

      The issue is that the non-master pods get stuck in the Terminating state:
      pod/ovsdbserver-nb-1 1/1 Terminating 0 4m31s
      pod/ovsdbserver-nb-2 1/1 Terminating 0 4m31s
      pod/ovsdbserver-sb-1 1/1 Terminating 0 4m31s
      They will likely be removed once the termination grace period, currently set to 5 minutes, is over. These are just symptoms, not the actual issue.
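To spot this symptom quickly in a CI run, a small filter over the pod listing can help. This is a minimal sketch, not part of the test suite: the function name `stuck_terminating` is hypothetical, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE) shown in the lines above.

```shell
# Print the names of pods whose STATUS column reads "Terminating".
# Expects pod listing lines on stdin, e.g.:
#   pod/ovsdbserver-nb-1 1/1 Terminating 0 4m31s
stuck_terminating() {
    # Column 3 is STATUS in the default listing layout (assumption).
    awk '$3 == "Terminating" { print $1 }'
}
```

It could be used as `oc get pods -n "$NAMESPACE" --no-headers | stuck_terminating`. The configured grace period of a given pod can be read with `oc get pod <name> -n "$NAMESPACE" -o jsonpath='{.spec.terminationGracePeriodSeconds}'`.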

      As part of the test, we delete pods using "oc delete pods -n $NAMESPACE -l service=ovsdbserver-nb".

      It could be that ovsdbserver-nb-0 and ovsdbserver-sb-0 are deleted first, leaving the other pods no time to run the cluster leave command, so they get stuck in the Terminating state.
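One way to probe this hypothesis would be to delete the pods one at a time with pod-0 last, and check whether the other pods still get stuck. The sketch below is only an experiment aid under that assumption; the helper names `by_ordinal_desc` and `delete_highest_first` are hypothetical and are not part of the kuttl test.

```shell
# Sort pod names (one per line on stdin) by their trailing ordinal,
# highest ordinal first, so that pod-0 comes last.
by_ordinal_desc() {
    awk -F- '{ print $NF "\t" $0 }' | sort -k1,1nr | cut -f2-
}

# Hypothetical helper: delete matching pods sequentially, pod-0 last.
# $1 = namespace, $2 = label selector (e.g. service=ovsdbserver-nb)
delete_highest_first() {
    ns="$1"
    selector="$2"
    oc get pods -n "$ns" -l "$selector" -o name | by_ordinal_desc |
        while read -r pod; do
            # --wait=true blocks until the pod object is actually gone.
            oc delete -n "$ns" "$pod" --wait=true
        done
}
```

For example, `delete_highest_first "$NAMESPACE" service=ovsdbserver-nb`. Note that the StatefulSet recreates each pod as soon as it is gone, so this changes the shape of the test and is only useful for confirming whether deletion order matters.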

      Some warning events were seen:
      4m12s Warning RecreatingFailedPod statefulset/ovsdbserver-nb StatefulSet ovn-kuttl-tests/ovsdbserver-nb is recreating failed Pod ovsdbserver-nb-0
       
      6m46s Warning FailedUpdate statefulset/ovsdbserver-nb update Pod ovsdbserver-nb-0 in StatefulSet ovsdbserver-nb failed error: Could not update claim ovndbcluster-nb-sample-etc-ovn-ovsdbserver-nb-0 for delete policy ownerRefs: Operation cannot be fulfilled on persistentvolumeclaims "ovndbcluster-nb-sample-etc-ovn-ovsdbserver-nb-0": the object has been modified; please apply your changes to the latest version and try again
      6m45s Warning FailedUpdate statefulset/ovsdbserver-sb update Pod ovsdbserver-sb-0 in StatefulSet ovsdbserver-sb failed error: Could not update claim ovndbcluster-sb-sample-etc-ovn-ovsdbserver-sb-0 for delete policy ownerRefs: Operation cannot be fulfilled on persistentvolumeclaims "ovndbcluster-sb-sample-etc-ovn-ovsdbserver-sb-0": the object has been modified; please apply your changes to the latest version and try again
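When comparing several failed runs, it may help to condense such events into counts per reason and object. The following is a rough sketch; `summarize_warnings` is a hypothetical name, and it assumes the default `oc get events` column order (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE), which matches the events quoted above.

```shell
# Count warning events per (REASON, OBJECT) pair.
# Expects event listing lines on stdin, e.g.:
#   4m12s Warning RecreatingFailedPod statefulset/ovsdbserver-nb ...
summarize_warnings() {
    awk '$2 == "Warning" { count[$3 " " $4]++ }
         END { for (k in count) print count[k], k }' | sort
}
```

A possible invocation is `oc get events -n "$NAMESPACE" --field-selector type=Warning --no-headers | summarize_warnings`.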

      The goal of this ticket is to identify the cause and fix it. One option may be to add a dummy preStop hook (perhaps just a sleep) for pod-0 as well, so that it does not terminate immediately.
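The preStop idea could look roughly like the script below. This is a sketch under stated assumptions, not the operator's actual hook: the function names and the `PRESTOP_SLEEP` variable are hypothetical, the 10-second default is arbitrary (the ticket does not propose a value), and the existing cluster leave step for the other replicas is deliberately not reproduced here.

```shell
# Decide what the preStop hook should do for a given pod name.
# Pod-0 only delays; other replicas keep their existing leave step.
prestop_action() {
    pod_name="$1"
    # Strip everything up to the last "-" to get the StatefulSet ordinal.
    case "${pod_name##*-}" in
        0) echo "sleep" ;;
        *) echo "leave" ;;
    esac
}

# Assumed delay in seconds; must stay below the termination grace period.
PRESTOP_SLEEP="${PRESTOP_SLEEP:-10}"

run_prestop() {
    if [ "$(prestop_action "$1")" = "sleep" ]; then
        # Give the other replicas time to leave the cluster first.
        sleep "$PRESTOP_SLEEP"
    fi
    # The existing cluster leave logic for other replicas would run here.
}
```

Inside a StatefulSet pod the hostname matches the pod name, so the hook could call `run_prestop "$(hostname)"`. Whether a fixed sleep is enough, or whether pod-0 should instead wait until the other members have left, is an open design question for the fix.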

       

      UPD: Note that a workaround that bumped the kuttl timeout has landed (https://github.com/openstack-k8s-operators/ovn-operator/pull/356). We'll need to revert it in the scope of this issue before closing it (after confirming the gate is stable).

        1. recreate.sh
          0.9 kB
          Terry Wilson

              twilson@redhat.com Terry Wilson
              ykarel@redhat.com Yatin Karel
              rhos-dfg-networking-squad-neutron