There have been various situations where bugs in WF or in user code have prevented management operations from completing, leaving the ModelController locked permanently unless the process is stopped. There should be a mechanism by which timeouts can be set.
My proposal is to allow a timeout to be set via a management operation header. In addition, a standard timeout will apply if no header is present.
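As a rough illustration of how the header and the standard timeout could interact, the following Java sketch uses the header value when it is present and falls back to the standard timeout otherwise. The header name `blocking-timeout`, the use of seconds as the unit, the five-minute standard value and the class name are all hypothetical assumptions for illustration, not part of this proposal.

```java
import java.time.Duration;
import java.util.Map;

/** Hypothetical sketch: choose the blocking timeout for a single management operation. */
final class BlockingTimeoutResolver {
    // Hypothetical header name; the proposal does not fix one.
    static final String TIMEOUT_HEADER = "blocking-timeout";
    // Hypothetical standard value: "quite lengthy, probably several minutes".
    static final Duration STANDARD_TIMEOUT = Duration.ofMinutes(5);

    private BlockingTimeoutResolver() {}

    /**
     * Returns the header value (interpreted as seconds) if the header is present,
     * otherwise the standard timeout. Malformed or non-positive values are rejected.
     */
    static Duration resolve(Map<String, String> operationHeaders) {
        String raw = operationHeaders.get(TIMEOUT_HEADER);
        if (raw == null) {
            return STANDARD_TIMEOUT;
        }
        long seconds;
        try {
            seconds = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid " + TIMEOUT_HEADER + " header: " + raw, e);
        }
        if (seconds <= 0) {
            throw new IllegalArgumentException(TIMEOUT_HEADER + " must be positive: " + raw);
        }
        return Duration.ofSeconds(seconds);
    }
}
```

Rejecting a malformed header, rather than silently using the standard timeout, keeps a typo from turning a requested short timeout into a several-minute one.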
The standard timeout will be quite lengthy, probably several minutes. The goal is to allow a management process to eventually auto-recover, not to detect failures promptly. Users who want prompt failure detection can set the header. A short standard timeout runs the risk of false positives, particularly with large deployments.
This timeout is not meant to be an overall maximum time for operation execution. There would be no valid default for such a timeout in a managed domain, where the time it takes to roll out a change depends on the size of the domain and the rollout plan.
Rather, this timeout is meant to be the maximum period an operation execution thread can block at any one of the various points where it may block. Blocking for longer than the timeout would result in operation failure.
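A minimal Java sketch of these per-point semantics follows, assuming a `ReentrantLock` stands in for the ModelController lock and a `CountDownLatch` stands in for any other wait an operation thread might perform. `BlockingPoints` and `OperationTimeoutException` are hypothetical names, not existing WF classes. Each wait is bounded separately by the same timeout, so an operation that passes through many blocking points can run longer than the timeout in total; it fails only when a single wait exceeds it.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical: thrown when an operation thread blocks at one point for longer than the timeout. */
final class OperationTimeoutException extends RuntimeException {
    OperationTimeoutException(String point, Duration timeout) {
        super("Operation blocked at '" + point + "' for longer than " + timeout.toMillis() + " ms");
    }
}

/** Hypothetical sketch: the same timeout bounds each individual blocking point, not the whole operation. */
final class BlockingPoints {
    private final Duration timeout;

    BlockingPoints(Duration timeout) {
        this.timeout = timeout;
    }

    /** Acquires the (stand-in) controller lock, failing the operation instead of waiting forever. */
    void acquire(ReentrantLock controllerLock) throws InterruptedException {
        if (!controllerLock.tryLock(timeout.toNanos(), TimeUnit.NANOSECONDS)) {
            throw new OperationTimeoutException("controller lock", timeout);
        }
    }

    /** Waits for another step to signal completion, bounded by the same per-point timeout. */
    void await(CountDownLatch done, String point) throws InterruptedException {
        if (!done.await(timeout.toNanos(), TimeUnit.NANOSECONDS)) {
            throw new OperationTimeoutException(point, timeout);
        }
    }
}
```

A caller that gets `OperationTimeoutException` would roll back and report the operation as failed, which is what lets the process recover instead of staying locked.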