Red Hat OpenStack Services on OpenShift / OSPRH-8118

OVNDbCluster: ovsdb-server process does not exit gracefully


    • ovn-operator-container-1.0.4-4
    • Moderate

      The process is executed as follows: dumb-init --single-child -> start.sh -> ovsdb-server.

       

      When the pod exits, k8s sends SIGTERM to dumb-init, which proxies it to start.sh but does not send TERM to ovsdb-server. start.sh then exits without sending SIGTERM to ovsdb-server either. In the end, k8s detects that the main process (start.sh) exited and sends SIGKILL to the remaining processes, including ovsdb-server.
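
To illustrate the signal chain described above, here is a minimal bash sketch of a wrapper that relays SIGTERM to its child and waits for it to finish. This is a different technique from the cleanup.sh change proposed in this ticket, and it is not the actual start.sh: `run_with_forwarding`, `forward_term`, `child_pid`, and `child_status` are hypothetical names made up for the example.

```shell
#!/bin/bash
# Sketch only: relay SIGTERM from a wrapper shell to its background child.
# All function and variable names here are hypothetical, not taken from start.sh.

forward_term() {
    # Pass the signal on so the child can shut down on its own terms.
    kill -TERM "$child_pid" 2>/dev/null
}

run_with_forwarding() {
    # Start the given command in the background and remember its PID.
    "$@" &
    child_pid=$!
    trap forward_term TERM

    # `wait` returns early (status > 128) when a trapped signal arrives,
    # so keep waiting until the child has really exited and been reaped.
    wait "$child_pid"
    child_status=$?
    while kill -0 "$child_pid" 2>/dev/null; do
        wait "$child_pid"
        child_status=$?
    done

    trap - TERM
    return "$child_status"
}
```

A wrapper script would call `run_with_forwarding` with the real server command line. Without something like this (or an `exec` of the server from the wrapper), the shell exits on SIGTERM and leaves the child behind, which matches the behaviour reported here.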

       

      We should give the process a chance to exit cleanly. I think this can be done with something like:

       

      diff --git a/templates/ovndbcluster/bin/cleanup.sh b/templates/ovndbcluster/bin/cleanup.sh
      index bd3588e..fcd82df 100755
      --- a/templates/ovndbcluster/bin/cleanup.sh
      +++ b/templates/ovndbcluster/bin/cleanup.sh
      @@ -44,3 +44,5 @@ if [[ "$(hostname)" != "{{ .SERVICE_NAME }}-0" ]]; then
           # now that we left, the database file is no longer valid
           rm -f /etc/ovn/ovn${DB_TYPE}_db.db
       fi
      +
      +/usr/share/ovn/scripts/ovn-ctl stop_nb_ovsdb
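
The diff above always calls `stop_nb_ovsdb`, while the same script already uses `${DB_TYPE}` to pick the database file. A hedged sketch of choosing the stop command the same way follows. It assumes `DB_TYPE` is either `nb` or `sb` (inferred from the `ovn${DB_TYPE}_db.db` file name, not confirmed in this ticket). `stop_ovsdb` and the `OVN_CTL` override are hypothetical names introduced for the example.

```shell
#!/bin/bash
# Sketch only: pick the ovn-ctl stop command from DB_TYPE.
# Assumption: DB_TYPE is "nb" or "sb", as the ovn${DB_TYPE}_db.db file name suggests.
# OVN_CTL is a hypothetical override so the function can be tried without OVN installed.
OVN_CTL="${OVN_CTL:-/usr/share/ovn/scripts/ovn-ctl}"

stop_ovsdb() {
    local db_type="$1"
    case "$db_type" in
        nb|sb)
            # Resolves to stop_nb_ovsdb or stop_sb_ovsdb.
            "$OVN_CTL" "stop_${db_type}_ovsdb"
            ;;
        *)
            echo "unknown DB_TYPE: ${db_type}" >&2
            return 1
            ;;
    esac
}
```

In cleanup.sh this would be invoked as `stop_ovsdb "${DB_TYPE}"` in place of the hard-coded command, so that the southbound pods stop their own server rather than the northbound one.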

      I don't think this is a critical problem, since the process will exit anyway. It may affect RAFT leadership transition somewhat, since pod0 will not have a chance to let the others know that it's exiting (I don't know whether it communicates this, though). But I have no data to suggest it would, e.g., corrupt the db file. (I am told that invalid entries in the RAFT log are gracefully ignored by ovsdb-server.)

              rodolfo_alonso Rodolfo Alonso
              ihrachys Ihar Hrachyshka
              Bharath M V Bharath M V
              rhos-dfg-networking-squad-neutron