-
Bug
-
Resolution: Done-Errata
-
Blocker
-
No Docs Impact
-
Release Note Not Required
-
Approved
-
Important
OSPCIX-342 revealed that the service fails to start if the db file is present on disk but empty. The empty file is on disk because the previous start of the pod was interrupted in the middle of db file initialization. The interrupt was triggered by a configuration change.
The empty file is left behind because we don't propagate SIGTERM to ovsdb-tool, which creates the file. Instead, SIGTERM reaches only the shell script, which exits before ovsdb-tool is complete. This in turn results in SIGKILL being sent to the tool, leaving the file in an inconsistent state.
There are several issues to resolve here:
- When an empty db file is present, we should be able to detect it and remove it before proceeding with configuration. (This could also be handled in ovs-lib.in in OVS but would require patching the Open vSwitch package.)
- We run dumb-init with --single-child, which doesn't send SIGTERM to children of children, nor do we have a SIGTERM handler in the shell start script. The fix should involve sending SIGTERM to ovsdb-tool, and removing --single-child achieves this. This will be addressed in https://issues.redhat.com/browse/OSPRH-8212
- Removing --single-child should probably help, but is not enough, because dumb-init does not wait for children-of-children to exit (only for the main child). So the script should wait for ovsdb-tool to complete. This will be addressed in https://issues.redhat.com/browse/OSPRH-8212
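The items above could be sketched roughly as follows. This is only an illustration, not the actual fix that landed: the helper names remove_empty_db and run_and_wait, and the DB_FILE and SCHEMA_FILE variables in the usage comment, are hypothetical. The first helper drops a zero-length db file; the second runs a command in the background, forwards SIGTERM to it, and keeps waiting until it has really exited.

```shell
#!/bin/sh
# Hypothetical helpers; names and usage are assumptions for illustration.

# Remove a db file that exists but is empty (left by an interrupted init).
remove_empty_db() {
    db_file=$1
    # -e: file exists; ! -s: file has zero size
    if [ -e "$db_file" ] && [ ! -s "$db_file" ]; then
        echo "removing empty db file: $db_file" >&2
        rm -f -- "$db_file"
    fi
}

# Run a command, forward SIGTERM to it, and return only after it exits.
run_and_wait() {
    "$@" &
    child_pid=$!
    trap 'kill -TERM "$child_pid" 2>/dev/null' TERM
    while :; do
        status=0
        # wait returns early when a trapped signal arrives,
        # so loop until the child is actually gone.
        wait "$child_pid" || status=$?
        kill -0 "$child_pid" 2>/dev/null || break
    done
    trap - TERM
    return "$status"
}

# Possible usage in a start script (DB_FILE and SCHEMA_FILE are assumed):
#   remove_empty_db "$DB_FILE"
#   run_and_wait ovsdb-tool create "$DB_FILE" "$SCHEMA_FILE"
```

Waiting in a loop matters because the shell's wait is interrupted when the trap fires; without the loop the script could still exit before the tool finishes.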
A long discussion of this case is here: https://redhat-internal.slack.com/archives/C046JULBVJ7/p1719396554432979
Some background on how children are handled in containers (for docker but should apply elsewhere): https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
- blocks
-
OSPRH-691 BZ#2092485 [RFE] [P1] Podified Control Plane : Neutron
- Closed
- is cloned by
-
OSPRH-8212 Improve TERM signal handling in OVNDbCluster startup scripts
- Closed
- is related to
-
OSPRH-10688 ovn-controller-ovs in CrashLoopBackOff due to empty conf.db
- Dev Complete
- links to
-
RHBA-2024:135531 OpenStack Operators
- mentioned on