Improve Pipeline Infrastructure Stability

      Identify what leads users to retry jobs in GitLab pipelines, and resolve the underlying causes to reduce user-visible failures.

      AC:

      • Provide metrics on how often jobs are retried by non-CKI users rather than by pipeline-herder
      • Alert when users retry jobs
      • Improve alerts about failing jobs (recognize jobs failing for similar reasons and escalate to Sentry/Alertmanager)
      • Update the documentation to describe how to convert the new alerts into pipeline-herder rules
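
      The first criterion could be prototyped roughly as follows. This is a minimal sketch, not the CKI implementation: the job record shape (`id`, `pipeline_id`, `name`, `user`) and the account names in `BOT_USERS` are assumptions made for illustration, and a job is treated as a retry when an earlier job with the same name already exists in the same pipeline.

```python
from collections import Counter

# Hypothetical service accounts; the real CKI bot/herder account names
# are not given in this issue.
BOT_USERS = {"cki-bot", "pipeline-herder"}


def count_user_retries(jobs, bot_users=BOT_USERS):
    """Count retried jobs per non-bot user.

    `jobs` is an iterable of dicts with the assumed keys `id`,
    `pipeline_id`, `name`, and `user`. A job counts as a retry when an
    earlier job (lower id) with the same name exists in the same pipeline.
    """
    seen = set()
    retries = Counter()
    for job in sorted(jobs, key=lambda j: j["id"]):
        key = (job["pipeline_id"], job["name"])
        if key in seen and job["user"] not in bot_users:
            retries[job["user"]] += 1
        seen.add(key)
    return retries


# Example with made-up data.
example_jobs = [
    {"id": 1, "pipeline_id": 10, "name": "build", "user": "cki-bot"},
    {"id": 2, "pipeline_id": 10, "name": "build", "user": "alice"},
    {"id": 3, "pipeline_id": 10, "name": "test", "user": "cki-bot"},
    {"id": 4, "pipeline_id": 10, "name": "test", "user": "pipeline-herder"},
    {"id": 5, "pipeline_id": 11, "name": "build", "user": "alice"},
    {"id": 6, "pipeline_id": 10, "name": "build", "user": "alice"},
]
user_retries = count_user_retries(example_jobs)
```

      The resulting per-user counts could then be exported as a metric and used as the basis for the retry alert.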

      Jira: CKI-7126

              Assignee: Unassigned
              Reporter: Tales Lelo da Aparecida (rh-ee-tdaapare)