Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: rhos-18.0 Feature Release 1 (Nov 2024)
Affects Version/s: rhos-18.0.0
Component/s: ovn-operator
Labels:
- triaged

Story Points:
3
Blocked:
False
Blocked Reason:
None
Ready:
False
Dev Approval:
?
Docs Approval:
?
Fixed in Build:
ovn-operator-container-1.0.4-4
PM Approval:
?
QE Approval:
?
Regression:
None
Intelligence Requested:
Market:
Errata Link:
https://errata.engineering.redhat.com/advisory/140345

Severity:
Moderate

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

The process is executed as follows: dumb-init --single-child -> start.sh -> ovsdb-server.

When the pod exits, k8s sends SIGTERM to dumb-init, which proxies it to start.sh but does not send it to ovsdb-server. start.sh then exits without sending SIGTERM to ovsdb-server either. In the end, k8s detects that the main process (start.sh) has exited and sends SIGKILL to the remaining processes, including ovsdb-server.
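To illustrate the missing link in that chain, here is a minimal sketch of a generic trap-based wrapper that passes SIGTERM on to its child. This is not the contents of start.sh and not the fix proposed in this issue; `run_with_term_forwarding` is a hypothetical helper name, and the sketch only shows what "forwarding the signal" would look like in a shell script.

```shell
#!/bin/bash
# Hypothetical helper, not part of ovn-operator: run a command as a child,
# forward SIGTERM to it, and wait until it has exited.
run_with_term_forwarding() {
    "$@" &
    local child=$!
    # On SIGTERM, pass the signal on to the child instead of just exiting.
    trap 'kill -TERM "$child" 2>/dev/null' TERM
    # `wait` returns early (status > 128) when a trapped signal arrives,
    # so keep waiting while the child is still alive.
    local status=0
    wait "$child"
    status=$?
    while kill -0 "$child" 2>/dev/null; do
        wait "$child"
        status=$?
    done
    trap - TERM
    # Best effort: normally the child's exit status.
    return "$status"
}
```

With a wrapper like this, the child gets a chance to run its own shutdown logic before the parent exits, instead of being left for the later SIGKILL.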

We should give the process a chance to exit cleanly. I think this can be done with something like:

diff --git a/templates/ovndbcluster/bin/cleanup.sh b/templates/ovndbcluster/bin/cleanup.sh
index bd3588e..fcd82df 100755
--- a/templates/ovndbcluster/bin/cleanup.sh
+++ b/templates/ovndbcluster/bin/cleanup.sh
@@ -44,3 +44,5 @@ if [[ "$(hostname)" != "{{ .SERVICE_NAME }}-0" ]]; then
     # now that we left, the database file is no longer valid
     rm -f /etc/ovn/ovn${DB_TYPE}_db.db
 fi
+
+/usr/share/ovn/scripts/ovn-ctl stop_nb_ovsdb
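
Note that the diff above hardcodes `stop_nb_ovsdb`, while the surrounding script is parameterized by `${DB_TYPE}`. A minimal sketch of picking the matching `ovn-ctl` stop command follows, assuming `DB_TYPE` is either `nb` or `sb` (as the `ovn${DB_TYPE}_db.db` file name suggests). `ovn_stop_command` is a hypothetical helper name, not something taken from the operator.

```shell
#!/bin/bash
# Hypothetical helper: map the DB type to the matching ovn-ctl stop command.
# Assumption: DB_TYPE is "nb" or "sb"; anything else is rejected.
ovn_stop_command() {
    case "$1" in
        nb|sb) echo "stop_${1}_ovsdb" ;;
        *) echo "unknown DB type: $1" >&2; return 1 ;;
    esac
}

# Possible use at the end of cleanup.sh (left commented out so the sketch
# runs on machines without OVN installed):
# /usr/share/ovn/scripts/ovn-ctl "$(ovn_stop_command "$DB_TYPE")"
```

This keeps the same cleanup.sh template usable for both the northbound and southbound database pods.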

I don't think this is a critical problem, since the process will exit anyway. It may affect RAFT leadership transition somewhat, since pod0 will not have a chance to let the others know that it's exiting (I don't know whether it communicates this, though). But I don't have data to suggest it would, e.g., corrupt the db file. (I am told that invalid entries in the RAFT log are gracefully ignored by ovsdb-server.)

links to

openstack-k8s-operators/ovn-operator#320: Stop the OVN database during the cleanup script

RHSA-2024:140345 RHOSO OpenStack Podified operator containers security update

Assignee: Rodolfo Alonso

Reporter: Ihar Hrachyshka

QA Contact: Bharath M V

Team: rhos-dfg-networking-squad-neutron

Votes: 0

Watchers: 5

Created: 2024/06/28 8:06 PM

Updated: 2024/11/13 1:17 PM

Resolved: 2024/11/13 1:17 PM
