- Story
- Resolution: Done
As an OCP user, I want storage operators to restart quickly and the newly started operator to start leading immediately, without the ~3 minute wait.
This means that the old operator should release its leadership after it receives SIGTERM and before it exits. Right now, storage operators fail to release the leadership in ~50% of cases.
Steps to reproduce:
1. Delete an operator Pod (`oc delete pod xyz`).
2. Wait for a replacement Pod to be created.
3. Check the logs of the replacement Pod. They should contain "successfully acquired lease XYZ" shortly after the Pod starts (within roughly 1 second?).
4. Go to step 1 and retry a few times.
This is hack'n'hustle work, not tied to any Epic; I'm using it only to get proper QE and to track which operators are being updated (see the linked GitHub PRs).