Epic
Resolution: Done
Major
Target Release: Service Preview
Context
As applications grow more complex, the need for reactive monitoring is growing faster than ever. The value of data diminishes sharply the longer it takes to act on it. The ability to trigger Ansible remediation by processing streams of events brings immediate value to customers who want to automate the remediation of their top IT issues, both to save money and to provide a better experience to their own customers.
Stakeholders
- ENG
- BU
- PM
- UX
Job Stories
User Narrative
Acme Corporation has a heterogeneous infrastructure spanning multiple clouds and on-premises environments. After looking into their operating costs, they realized that the company spends a large amount of money manually remediating simple issues for which automation already exists, such as restarting a server or rotating logs. They want to build an event-driven application that ingests the alerts coming from their monitoring systems and, when an issue can be remediated automatically, does the following:
1- Notify the relevant stakeholders through channels such as Slack for SREs and email for managers.
2- Open a Jira ticket to track the automated work.
3- Send the event and the relevant information (application data, event type, usage metrics, status, etc.) to Ansible Tower so it can trigger the appropriate playbook.
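Step 3 could look roughly like the following minimal sketch, which assumes the action calls the Ansible Tower REST endpoint `POST /api/v2/job_templates/<id>/launch/` with an OAuth2 bearer token and forwards the event as `extra_vars`. The `MonitoringEvent` type, the `rhose_*` variable names, the host, the template ID and the token are all hypothetical. Note that Tower only honours `extra_vars` passed at launch if the job template prompts for variables on launch (`ask_variables_on_launch`) or declares them in a survey.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MonitoringEvent:
    # Hypothetical shape of an alert ingested from a monitoring system.
    application: str
    event_type: str
    status: str
    usage_metrics: Dict[str, float] = field(default_factory=dict)


def build_tower_launch_request(
    tower_url: str, job_template_id: int, token: str, event: MonitoringEvent
) -> urllib.request.Request:
    """Build (but do not send) a POST to Tower's job template launch endpoint."""
    url = f"{tower_url.rstrip('/')}/api/v2/job_templates/{job_template_id}/launch/"
    body = {
        # The event travels as extra_vars so the playbook can decide what to do.
        # These variable names are illustrative, not a Tower convention.
        "extra_vars": {
            "rhose_application": event.application,
            "rhose_event_type": event.event_type,
            "rhose_status": event.status,
            "rhose_usage_metrics": event.usage_metrics,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage with hypothetical values; sending would be urllib.request.urlopen(req).
event = MonitoringEvent("billing-api", "disk_full", "firing", {"disk_used_pct": 97.5})
req = build_tower_launch_request("https://tower.example.com/", 42, "s3cr3t", event)
```

Keeping request construction separate from sending makes the payload easy to inspect and lets the retry and TLS concerns live in a separate layer.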
Because Red Hat OpenShift Smart Events (RHOSE) is a cloud service and alerts may contain sensitive information, the integration must support SSO for access control, encrypt both the data and the communication channel, and support mTLS certificate authentication and verification against an external PKI.
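On the outbound side, the mTLS and external PKI requirements could translate into a client TLS context like the minimal sketch below. It assumes the customer endpoint (Ansible Tower or a proxy in front of it) requests a client certificate, and that the customer supplies a PEM bundle of their own CA certificates. The function name, the file paths and the TLS 1.2 floor are assumptions for illustration, not a decided design.

```python
import ssl
from typing import Optional


def build_mtls_context(
    ca_bundle: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Client-side TLS context for calls from RHOSE to a customer endpoint.

    ca_bundle:   PEM file with the customer's external PKI CA certificates;
                 if None, the system trust store is used instead.
    client_cert: PEM certificate presented to the server for mTLS.
    client_key:  private key for client_cert (None if it is inside client_cert).
    """
    # Verifies the server certificate against ca_bundle (or system CAs).
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_bundle)
    # Refuse legacy protocol versions on channels that carry alert data.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables these; set explicitly for clarity.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    if client_cert is not None:
        # Present a client certificate so the server can authenticate RHOSE.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

With hypothetical paths, the context would be passed to the HTTP client, e.g. `urllib.request.urlopen(req, context=build_mtls_context("customer-ca.pem", "client.crt", "client.key"))`.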
Once the event has been sent to Ansible Tower, RHOSE should reply to the customer with a notification confirming that everything went well, or with an error message if it did not.
Goals
- Connect Ansible Tower as an Action to Red Hat OpenShift Smart Events, following the same usage pattern as the other actions.
- Retry failed attempts to communicate with Ansible Tower.
- Handle errors and send meaningful error messages to the customer when they occur.
- Include Ansible Tower in the catalog of available actions in RHOSE.
- Support clustered Ansible Tower environments.
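The retry and error-handling goals, together with the success/error reply to the customer, could be combined along the lines of this minimal sketch. It uses capped exponential backoff with "full jitter" (a random delay between zero and the current backoff cap). The `send` callable, the `Outcome` type, the set of retryable status codes and the default delays are hypothetical choices, not a specified policy.

```python
import random
import time
import urllib.error
from dataclasses import dataclass
from typing import Callable, Optional

# Status codes treated as transient; other 4xx responses are not retried.
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}


@dataclass
class Outcome:
    ok: bool
    attempts: int
    message: str  # text RHOSE would send back to the customer


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    """Call `send` (one request, returns an HTTP status) with backoff and jitter."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        status: Optional[int] = None
        try:
            status = send()
        except urllib.error.HTTPError as exc:
            # urlopen raises HTTPError for non-2xx responses; treat it as a status.
            status = exc.code
        except OSError as exc:
            # Network-level failure: refused, reset, timed out, DNS, ...
            last_error = f"connection error: {exc}"
        if status is not None:
            if 200 <= status < 300:
                return Outcome(True, attempt, f"Ansible Tower accepted the event (HTTP {status}).")
            if status not in RETRYABLE_STATUSES:
                return Outcome(False, attempt, f"Ansible Tower rejected the event (HTTP {status}); not retrying.")
            last_error = f"HTTP {status}"
        if attempt < max_attempts:
            # Full jitter: random delay in [0, min(max_delay, base_delay * 2^(attempt-1))].
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1))))
    return Outcome(False, max_attempts, f"Gave up after {max_attempts} attempts; last error: {last_error}.")
```

Injecting `sleep` keeps the policy testable without real delays, and jitter spreads out retries so that many failing actions do not hit a recovering Tower instance at the same moment.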
Non-goals
- Replay
- Durable events support
Current Approach
Proposed Solution
…
….
Architecture
The source diagram for the proposed architecture is available here:
- link
Rejected Proposals
….
….