Red Hat OpenStack Services on OpenShift / OSPRH-8212

Improve TERM signal handling in OVNDbCluster startup scripts

    • Important

      OSPCIX-342 revealed that the service fails to start if the db file is present on disk but empty. The empty file was left on disk because the previous start of the pod was interrupted in the middle of db file initialization. The interrupt was triggered by a configuration change.
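
      The failure condition above (a db file that exists but is empty) can be checked from a shell script. The following is only a minimal sketch: the helper name remove_db_if_empty is hypothetical, and it assumes that "empty" means a zero-length file.

```shell
#!/bin/bash
# Hypothetical helper (not part of the existing start scripts):
# remove a database file that exists but has zero length, so that a
# later initialization step can recreate it from scratch.
remove_db_if_empty() {
    local db_file=$1
    # -e: the path exists; ! -s: its size is not greater than zero.
    if [ -e "$db_file" ] && [ ! -s "$db_file" ]; then
        echo "Removing empty database file: $db_file" >&2
        rm -f -- "$db_file"
    fi
}
```

      A non-empty file and a missing file are both left untouched, so the check should be safe to run on every start.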

      The empty file is left behind because we don't propagate SIGTERM to ovsdb-tool, which creates the file. Instead, SIGTERM is received only by the shell script, which exits before ovsdb-tool is complete. This in turn results in SIGKILL being sent to the tool, leaving the file in an inconsistent state.

       

      There are several issues to resolve here:

       

      • When an empty db file is present, we should be able to detect it and remove it before proceeding with configuration. (This could also be handled in ovs-lib.in in OVS, but that would require patching the Open vSwitch package.) This will be handled in https://issues.redhat.com/browse/OSPRH-8117
      • We run dumb-init with --single-child, which doesn't send SIGTERM to children of children, nor do we have a SIGTERM handler in the shell start script. The fix should involve sending SIGTERM to ovsdb-tool, and removing --single-child achieves this.
      • Removing --single-child should help, but it is not enough, because dumb-init does not wait for children of children to exit (only for the main child). So the script should also wait for ovsdb-tool to complete.
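
      The "wait for the child" part of the last point could look roughly like the sketch below. It is an illustration only, not the actual start script: the function name run_and_forward_term is hypothetical. It also forwards SIGTERM from the script itself, which covers the case where only the script receives the signal.

```shell
#!/bin/bash
# Hypothetical helper (not taken from the existing start scripts):
# run a command in the background, forward SIGTERM to it, and do not
# return until the command has actually exited.
run_and_forward_term() {
    "$@" &
    local child_pid=$!
    local status=0
    # On SIGTERM, pass the signal on to the child instead of exiting.
    # The PID is expanded now, so the handler does not depend on scope.
    trap "kill -TERM $child_pid 2>/dev/null" TERM
    # 'wait' returns early (status > 128) when a trapped signal arrives,
    # so keep waiting for as long as the child is still around.
    while :; do
        wait "$child_pid"
        status=$?
        kill -0 "$child_pid" 2>/dev/null || break
    done
    trap - TERM
    return "$status"
}
```

      In a start script this would wrap the db creation step, for example run_and_forward_term ovsdb-tool create "$DB_FILE" "$SCHEMA_FILE", where both variable names are placeholders. The script then exits only after ovsdb-tool has finished or has handled the signal itself.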

       

      A long discussion of this case is here: https://redhat-internal.slack.com/archives/C046JULBVJ7/p1719396554432979

      Some background on how children are handled in containers (for docker but should apply elsewhere): https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

            rodolfo_alonso Rodolfo Alonso
            ihrachys Ihar Hrachyshka
            Maor Blaustein Maor Blaustein
            rhos-dfg-networking-squad-neutron