  1. Red Hat 3scale API Management
  2. THREESCALE-5460

Review Zync Prometheus metrics on Que jobs

Details

    • Task
    • Resolution: Done
    • Major
    • 2.9 ER1
    • None
    • Zync
    • None
    • Not Started
    • Invalid Sprint

    Description

      Ops have asked us to review some of Zync's new Prometheus metrics so that Que jobs are not counted twice under different types of que_jobs_scheduled_total.

      See: https://github.com/3scale/platform/issues/230

      Summary of the improvements requested:

      1. Make "scheduled" exclude failed jobs that are scheduled for retry.
      2. Make "failed" count failed jobs that are scheduled for retry, excluding expired ones.
      3. Introduce a new type "expired" for failed jobs that have run out of retry attempts and therefore won't be retried again.
      4. Remove the type "retried", since it does not seem to be updated by Que, which handles retries based only on the value of the error_count column.
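
      The requested types can be read as a mutually exclusive classification, which is what prevents double counting. The following is a minimal Python sketch of that reading, not Zync's actual implementation (Zync itself is not written in Python). The Job class, the job_type and count_by_type helpers, and the expired_at field are hypothetical names introduced for illustration; only error_count is named in this issue.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified view of a pending row in the Que jobs table."""
    error_count: int                  # column named in the issue
    expired_at: Optional[str] = None  # assumed marker: set once retries are exhausted


def job_type(job: Job) -> str:
    """Return exactly one type per job, so no job is counted twice."""
    if job.expired_at is not None:
        return "expired"    # ran out of retry attempts, will not be retried
    if job.error_count > 0:
        return "failed"     # failed at least once, still scheduled for retry
    return "scheduled"      # has not failed, waiting to run


def count_by_type(jobs: Iterable[Job]) -> Counter:
    """Tally jobs per type; the totals always add up to the number of jobs."""
    return Counter(job_type(job) for job in jobs)


if __name__ == "__main__":
    jobs = [Job(error_count=0), Job(error_count=2), Job(error_count=5, expired_at="expired")]
    print(count_by_type(jobs))
```

      Because each job maps to a single type, the per-type counts sum to the total number of pending jobs. There is deliberately no "retried" type in the sketch, in line with item 4.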

          People

            Unassigned
            mcassola Guilherme Cassolato
            Guilherme Cassolato
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: