Type: Feature Request
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: rhacs, rhacs-observability
Labels:
None

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:
None
Products:
None
Hierarchy Progress Bar:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
None
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

1. Proposed title of this feature request

RHACS: Expose Administrative Events as Prometheus Metrics for External Alerting

2. What is the nature and description of the request?

Red Hat Advanced Cluster Security (RHACS) generates "Administrative Events" to log the health and status of the RHACS system itself. These events include critical errors such as image scan errors, database issues, or sensor communication failures.

Currently, these events are only visible within the RHACS UI and are not exposed as metrics. This RFE requests that RHACS expose these administrative events, particularly those with a level of ERROR or WARN, as Prometheus metrics.

A new metric could be introduced, for example rox_central_administrative_events_total. This metric should use labels to distinguish the events, such as:

type: (e.g., image_scan_error, db_error, sensor_comm_failure)

level: (e.g., info, warn, error)
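A minimal sketch of the proposed counter, assuming the example name rox_central_administrative_events_total and the type/level labels listed above. It uses only the Python standard library and renders the Prometheus text exposition format by hand. The class AdminEventCounter, its methods, and the HELP text are hypothetical illustrations, not existing RHACS code; a real implementation in Central would more likely register a labelled counter with a Prometheus client library.

```python
from collections import Counter

# Hypothetical metric name, taken from the example in this request.
METRIC = "rox_central_administrative_events_total"


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and line feed.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class AdminEventCounter:
    """Hypothetical counter of administrative events per (type, level) pair."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, event_type: str, level: str) -> None:
        # Counters only ever increase: each event adds 1 to its label pair.
        self._counts[(event_type, level)] += 1

    def expose(self) -> str:
        """Render all series in the Prometheus text exposition format."""
        lines = [
            f"# HELP {METRIC} Administrative events emitted by Central.",
            f"# TYPE {METRIC} counter",
        ]
        for (event_type, level), value in sorted(self._counts.items()):
            lines.append(
                f'{METRIC}{{type="{_escape(event_type)}",level="{_escape(level)}"}} {value}'
            )
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    counter = AdminEventCounter()
    counter.record("image_scan_error", "error")
    counter.record("image_scan_error", "error")
    counter.record("sensor_comm_failure", "warn")
    print(counter.expose(), end="")
```

Run as a script, this prints the HELP and TYPE lines followed by one sample per label pair, e.g. rox_central_administrative_events_total{type="image_scan_error",level="error"} 2, which is the shape an external Prometheus would scrape from a metrics endpoint.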

This would allow external Prometheus instances to scrape these metrics and build alerts based on them.
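Because the metric would be a counter, an alert would typically fire when the counter increases, for example with a rule expression such as increase(rox_central_administrative_events_total{level="error"}[10m]) > 0 (the metric name and label values are the hypothetical examples from this request). The helper below, increased_error_series, is a hypothetical, simplified sketch of that check between two scrapes: it treats a drop in value as a counter reset (for example after a Central restart) and does not reproduce Prometheus's extrapolation.

```python
from typing import Dict, List, Tuple

Series = Tuple[str, str]  # hypothetical (type, level) label pair


def increased_error_series(
    previous: Dict[Series, float],
    current: Dict[Series, float],
    alert_levels: Tuple[str, ...] = ("error",),
) -> List[Series]:
    """Return the alert-level series whose counter grew between two scrapes."""
    firing = []
    for series, value in sorted(current.items()):
        _, level = series
        if level not in alert_levels:
            continue  # e.g. ignore info/warn unless explicitly requested
        before = previous.get(series, 0.0)
        # A lower value means the counter was reset; count the post-reset value.
        delta = value - before if value >= before else value
        if delta > 0:
            firing.append(series)
    return firing


if __name__ == "__main__":
    prev = {("image_scan_error", "error"): 2.0, ("sensor_comm_failure", "warn"): 1.0}
    curr = {
        ("image_scan_error", "error"): 3.0,
        ("sensor_comm_failure", "warn"): 4.0,
        ("db_error", "error"): 0.0,
    }
    print(increased_error_series(prev, curr))
```

Each returned series corresponds to one alert that could, for instance, open an investigation ticket; passing alert_levels=("error", "warn") would also cover warnings.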

An alternative solution would be a notifier integration that sends alerts for administrative events, similar to the existing integrations for policy violations.

3. Why does the customer need this? (List the business requirements here)

Proactive Incident Management: The customer's primary requirement is to automatically generate investigation tickets when critical system errors occur. They cannot rely on operators manually monitoring the "Administrative Events" UI page.

Ensure Platform Health and Reliability: Failures logged in Administrative Events (like scan errors) represent a silent failure of the security platform. If new images are not being scanned, the organization is exposed to unknown vulnerabilities. Proactive alerting on these failures is critical to maintaining the platform's reliability and security posture.

Reduce Mean Time to Detection (MTTD): Without metrics, a critical error might go unnoticed for hours or days. Automated alerts based on metrics reduce the detection time to minutes, allowing the platform team to investigate and remediate issues before they cause a significant security gap.

4. List any affected packages or components

Metrics Endpoint
RHACS Central

Assignee: Sabina Aledort

Reporter: Steffen Lützenkirchen

Need Info From: None

Votes: 0

Watchers: 5

Created: 2025/10/30 3:35 PM

Updated: 2026/01/05 8:20 AM

Target start: None

Target end: None
