Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: ovn-operator
Labels:
None

Blocked:
False
Blocked Reason:

None
Ready:
False
Docs Approval:
?
AssignedTeam:
rhos-connectivity-neutron
Regression:
None
Intelligence Requested:
Market:

Severity:
Important

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Related slack thread ,

Log link https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openstack-k8s-operators_ovn-operator/304/pull-ci-openstack-k8s-operators-ovn-operator-main-ovn-operator-build-deploy-kuttl/1800263500052828160

The issue is that non-master pods get stuck in the Terminating state:
pod/ovsdbserver-nb-1 1/1 Terminating 0 4m31s
pod/ovsdbserver-nb-2 1/1 Terminating 0 4m31s
pod/ovsdbserver-sb-1 1/1 Terminating 0 4m31s
They will likely be removed once the termination grace period, currently set to 5 minutes, is over. This is only a symptom, not the underlying issue.

As part of the test, we delete pods using "oc delete pods -n $NAMESPACE -l service=ovsdbserver-nb".

It could be that ovsdbserver-nb-0 and ovsdbserver-sb-0 are deleted first, leaving the other pods no time to run the cluster leave command, so they get stuck in the Terminating state.
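
To check which pods are stuck while reproducing this, the pod listing can be filtered on its STATUS column. The following is only a minimal sketch: the helper name terminating_pods is hypothetical (it is not part of the test suite or of recreate.sh), and it assumes the five-column NAME READY STATUS RESTARTS AGE layout shown in the listing above.

```shell
# Hypothetical helper (not from the ovn-operator repo): print the names of
# pods whose STATUS column is "Terminating".
# Expects lines on stdin in the layout shown in this ticket:
#   NAME READY STATUS RESTARTS AGE
terminating_pods() {
    awk '$3 == "Terminating" { print $1 }'
}
```

It could be fed from the cluster with something like `oc get pods -n "$NAMESPACE" -l service=ovsdbserver-nb --no-headers | terminating_pods`, run right after the delete command.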

Some warning events were seen:
4m12s Warning RecreatingFailedPod statefulset/ovsdbserver-nb StatefulSet ovn-kuttl-tests/ovsdbserver-nb is recreating failed Pod ovsdbserver-nb-0

6m46s Warning FailedUpdate statefulset/ovsdbserver-nb update Pod ovsdbserver-nb-0 in StatefulSet ovsdbserver-nb failed error: Could not update claim ovndbcluster-nb-sample-etc-ovn-ovsdbserver-nb-0 for delete policy ownerRefs: Operation cannot be fulfilled on persistentvolumeclaims "ovndbcluster-nb-sample-etc-ovn-ovsdbserver-nb-0": the object has been modified; please apply your changes to the latest version and try again

6m45s Warning FailedUpdate statefulset/ovsdbserver-sb update Pod ovsdbserver-sb-0 in StatefulSet ovsdbserver-sb failed error: Could not update claim ovndbcluster-sb-sample-etc-ovn-ovsdbserver-sb-0 for delete policy ownerRefs: Operation cannot be fulfilled on persistentvolumeclaims "ovndbcluster-sb-sample-etc-ovn-ovsdbserver-sb-0": the object has been modified; please apply your changes to the latest version and try again

The goal of this ticket is to identify the cause and fix it. One option may be to add a dummy preStop hook (perhaps a sleep) for pod-0 as well, so that it does not terminate immediately.
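
The dummy preStop idea could look roughly like the sketch below. This is an illustration, not the operator's actual hook: the function name prestop_delay and the 10-second default are assumptions, and how the script would be wired into the StatefulSet is not shown in this ticket. It relies only on the fact that a StatefulSet pod's hostname matches its pod name, so the ordinal can be read from the suffix.

```shell
# Hypothetical sketch: decide how long a pod should wait in preStop.
# $1: pod name (in a StatefulSet pod this equals the hostname)
# $2: delay in seconds for pod-0 (assumed default: 10, not a tested value)
prestop_delay() {
    case "$1" in
        *-0) echo "${2:-10}" ;;  # pod-0: wait so the other members can leave first
        *)   echo 0 ;;           # other pods: no extra delay in this sketch
    esac
}

# Inside a preStop hook this might then be used as:
#   sleep "$(prestop_delay "$(hostname)")"
```

Whether a fixed sleep is enough, or pod-0 should instead wait until the other members have left the cluster, is exactly what this ticket needs to determine.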

UPD: Note that a workaround that bumped the kuttl timeout has landed: https://github.com/openstack-k8s-operators/ovn-operator/pull/356 We'll need to revert it within the scope of this issue before closing it (after confirming the gate is stable).

recreate.sh
0.9 kB
2024/06/13 9:00 PM

is caused by

FDP-662 Multiple cluster/leave calls can result in a leaderless cluster after a downed member returns

Closed

links to

openstack-k8s-operators/ovn-operator#249: Do not call "bash -c" unnecessarily as entry points

openstack-k8s-operators/ovn-operator#356: Bump kuttl tests timeout to 6 minutes

Assignee: Terry Wilson

Reporter: Yatin Karel

Team: rhos-dfg-networking-squad-neutron

Votes: 0

Watchers: 4

Created: 2024/06/12 6:12 AM

Updated: 2025/10/24 8:34 PM
