Feature Request
Resolution: Unresolved
Quay needs better control over, and visibility into, the health of the containers that act as mirror workers. Currently, no metrics or health endpoints expose their operational status, and there is no way to generate alerts when issues arise.
Customers would benefit from metrics that track the state of the sync processes, whether the mirror workers are functioning correctly, how many mirrors are actively running, and other key performance indicators. A dedicated health endpoint for querying the status of these mirrors would also be valuable. Together, these would enable proactive monitoring and corrective action when errors are detected.
This enhancement would improve operational efficiency, allow for better resource management, and provide early warning of issues related to repository mirroring.
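To make the health-endpoint request more concrete, here is a minimal sketch of the check such an endpoint might perform. It is not Quay's implementation: the function name `mirror_health`, the heartbeat mapping, the 300-second staleness threshold, and the JSON response shape are all assumptions made for illustration.

```python
import json
import time


def mirror_health(heartbeats, now=None, max_age_seconds=300):
    """Return (http_status, json_body) for a hypothetical mirror health endpoint.

    heartbeats maps a worker id to the Unix timestamp of its last heartbeat.
    The threshold and the response fields are illustrative assumptions.
    """
    now = time.time() if now is None else now
    # A worker is stale when its last heartbeat is older than the threshold.
    stale = sorted(w for w, ts in heartbeats.items() if now - ts > max_age_seconds)
    # Healthy only if at least one worker exists and none of them are stale.
    healthy = bool(heartbeats) and not stale
    body = {
        "status": "ok" if healthy else "degraded",
        "workers": len(heartbeats),
        "stale_workers": stale,
    }
    return (200 if healthy else 503), json.dumps(body)
```

Returning 503 when any worker is stale, or when no workers are registered, would let a standard HTTP probe or uptime check drive alerting without parsing the body.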
------------
Updated Information:
The customer is requesting four new metrics to monitor repository mirroring, because the customer considers the existing "quay_repository_rows_unmirrored" metric insufficient.
The new metrics required are:
- Tags pending synchronization: Total number of tags not yet synchronized for each mirrored repository.
- Status of the last synchronization: An indicator (e.g., success/fail/in-progress) of the latest synchronization attempt per repository.
- Complete synchronization per repository: A boolean (0/1) metric indicating whether a specific mirrored repository has had all its tags successfully synchronized since the last run.
- Synchronization failure counter: A counter that accumulates the total number of mirroring failures for a repository; a key metric for alerting.
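The four requested metrics could be sketched as follows. This is an illustration under stated assumptions, not an existing Quay API: the metric names (`quay_repository_mirror_*`), the `MirrorState` class, and the `render_metrics` helper are hypothetical, and the output is written by hand in the Prometheus text exposition format using only the standard library. A real implementation would more likely use a Prometheus client library.

```python
from dataclasses import dataclass

# Hypothetical status values, taken from the "success/fail/in-progress" example.
STATUS_VALUES = ("success", "fail", "in_progress")


@dataclass
class MirrorState:
    """Per-repository mirroring state (hypothetical shape)."""
    namespace: str
    repository: str
    tags_total: int = 0
    tags_synced: int = 0
    last_status: str = "in_progress"
    failures_total: int = 0  # only ever increases, like a Prometheus counter

    def record_sync(self, tags_total, tags_synced):
        """Record the outcome of one finished synchronization run."""
        self.tags_total = tags_total
        self.tags_synced = tags_synced
        if tags_synced == tags_total:
            self.last_status = "success"
        else:
            self.last_status = "fail"
            self.failures_total += 1

    @property
    def tags_pending(self):
        return self.tags_total - self.tags_synced

    @property
    def sync_complete(self):
        done = self.tags_pending == 0 and self.last_status == "success"
        return 1 if done else 0


def render_metrics(states):
    """Render the four metrics in the Prometheus text exposition format."""
    lines = []
    for s in states:
        labels = f'namespace="{s.namespace}",repository="{s.repository}"'
        lines.append(f"quay_repository_mirror_tags_pending{{{labels}}} {s.tags_pending}")
        # One series per possible status; exactly one of them is set to 1.
        for status in STATUS_VALUES:
            value = 1 if s.last_status == status else 0
            lines.append(
                f'quay_repository_mirror_last_sync_status{{{labels},status="{status}"}} {value}'
            )
        lines.append(f"quay_repository_mirror_sync_complete{{{labels}}} {s.sync_complete}")
        lines.append(f"quay_repository_mirror_sync_failures_total{{{labels}}} {s.failures_total}")
    return "\n".join(lines) + "\n"
```

Because the failure counter never resets, an alert can fire on its rate of increase per repository, while the pending-tags gauge and the 0/1 completion flag show the current state at a glance.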
Issue Links:
- is incorporated by: RFE-7439 Additional Prometheus metrics for repo-level push/pull errors, latency tracking, and system availability monitoring (Refinement)
- relates to: RFE-7439 Additional Prometheus metrics for repo-level push/pull errors, latency tracking, and system availability monitoring (Refinement)