Feature Request
Resolution: Done
Normal
2.5
What is the nature and description of the request?
Assuming:
- there is a system generating events
- the events relate to some real piece of infrastructure (a running process, a physical server, a load balancer, a DNS record, a user account, or any other "resource")
- EDA is being used to take action based on those events
In many such systems, the action taken by EDA relates directly to the resource that spawned the event. For example:
- a fan in a server fails
- a hardware management tool fires an event about the failed fan
- EDA receives the event and spawns a job that begins moving work off of that server
Depending on the nature of the hardware management tool, numerous events may be fired in relation to the failed fan: 1) the fan failed, 2) server temperatures are rising, 3) the other fan is compensating and has exceeded its normal operating RPM range, and so on. Each event relates to the same server.
The response to all of those events is the same: move work off of that server. But we would not want EDA to launch multiple concurrent jobs that all try to take the same remediating action on the same resource. Thus the rulebook needs a way to avoid launching multiple concurrent jobs for the same resource.
A "concurrency key" would allow the rulebook to group events by resource. The rulebook should allow the key to be composed from information in the event payload, so that the rulebook author can create a key with the right level of uniqueness for the circumstance.
EDA would also need to track the state of running jobs to know whether a job with a given concurrency key is already running.
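As an illustration of the idea only, the Python sketch below composes a key from selected payload fields and refuses to start a second job while one with the same key is running. The dotted-path field syntax, the `/` separator, and the names `resolve_path`, `concurrency_key`, and `RunningJobs` are hypothetical; they are not part of any existing rulebook schema or ansible-rulebook API.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def resolve_path(payload: Mapping[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'server.hostname' through nested mappings."""
    value: Any = payload
    for part in path.split("."):
        if not isinstance(value, Mapping) or part not in value:
            raise KeyError(f"event payload has no field {path!r}")
        value = value[part]
    return value


def concurrency_key(payload: Mapping[str, Any], fields: Sequence[str]) -> str:
    """Join the selected payload fields into one key; more fields = finer grouping."""
    return "/".join(str(resolve_path(payload, field)) for field in fields)


class RunningJobs:
    """Tracks which concurrency keys currently have a job in flight."""

    def __init__(self) -> None:
        self._running: set[str] = set()

    def try_start(self, key: str) -> bool:
        # Refuse to start a second job while one with the same key is running.
        if key in self._running:
            return False
        self._running.add(key)
        return True

    def finish(self, key: str) -> None:
        # Called when the job completes, freeing the key for the next job.
        self._running.discard(key)


# Two different events about the same failing server produce the same key.
fan_failed = {"type": "fan_failure", "server": {"hostname": "rack4-node12"}}
temp_high = {"type": "temperature_high", "server": {"hostname": "rack4-node12"}}

jobs = RunningJobs()
print(jobs.try_start(concurrency_key(fan_failed, ["server.hostname"])))  # True
print(jobs.try_start(concurrency_key(temp_high, ["server.hostname"])))   # False
```

Adding fields, for example `["tenant", "server.hostname"]`, makes the key more specific; using fewer fields groups more events under one key.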
Why does the customer need this? (List the business requirements here)
We are using EDA to implement a self-service, multi-tenant cloud environment with OpenShift and ACM. The resulting solution will be targeted at numerous local, private, and sovereign cloud providers. That involves a large volume of infrastructure events, and we must be able to ensure that spawned jobs do not conflict with each other.
I expect that this feature will likewise be valuable to many other use cases that involve managing infrastructure with EDA.
How would you like to achieve this? (List the functional requirements here)
A concurrency key can be specified in a rulebook and composed from information in the body of the event payload.
When it is determined that a job should not be launched because one with the same concurrency key is already running, the event should remain in the queue rather than being discarded.
When multiple events are queued for the same concurrency key, they should be de-duplicated. This avoids an ever-growing backlog when events arrive faster than the resulting jobs can complete.
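The queueing and de-duplication requirements above can be sketched as a small gate that allows one running job per key and holds at most one pending event per key. This is a hypothetical Python illustration, not EDA's implementation; in particular, keeping the most recent pending event (rather than the first) is an assumption that the feature design would need to settle.

```python
from typing import Any, Optional

Event = dict[str, Any]


class ConcurrencyGate:
    """Hypothetical gate: one running job per key, at most one pending event per key."""

    def __init__(self) -> None:
        self.running: set[str] = set()
        self.pending: dict[str, Event] = {}

    def submit(self, key: str, event: Event) -> Optional[Event]:
        """Return the event if a job may start for it now; otherwise hold it."""
        if key not in self.running:
            self.running.add(key)
            return event
        # A job for this key is already running: keep the event rather than
        # discard it, but overwrite any earlier pending event for the same key
        # so the backlog never exceeds one event per key.
        self.pending[key] = event
        return None

    def finish(self, key: str) -> Optional[Event]:
        """Mark the key's job as done; return the pending event to launch next."""
        self.running.discard(key)
        if key in self.pending:
            self.running.add(key)  # the held event's job starts immediately
            return self.pending.pop(key)
        return None


gate = ConcurrencyGate()
key = "rack4-node12"
print(gate.submit(key, {"type": "fan_failure"}))       # {'type': 'fan_failure'}
print(gate.submit(key, {"type": "temperature_high"}))  # None (queued)
print(gate.submit(key, {"type": "fan_rpm_exceeded"}))  # None (replaces the queued event)
print(gate.finish(key))                                # {'type': 'fan_rpm_exceeded'}
print(gate.finish(key))                                # None
```

Because each key holds at most one pending event, the backlog is bounded by the number of distinct keys rather than by the event rate.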
When the action is run as a Job or similar, the concurrency key should show up as a label on the Kubernetes Job resource so that independent tooling can observe the state of jobs related to defined resources.
List any affected known dependencies: Doc, UI, etc.
- The rulebook schema will need to be augmented with this new field
- Documentation will need to describe it
- Release notes will need to mention it