
IndexManagement Crons can fail with 500 response from ES

Details

    • Logging (Explore) - Sprint 198
    • Passed
    • NEW
    • NEW
      * Previously, while under load, Elasticsearch responded to some requests with an HTTP 500 error even though nothing was wrong with the cluster, and retrying the request succeeded. This release makes the index management cron jobs more resilient to temporary HTTP 500 errors: they now retry a request several times before failing.
      (LOG-1215)
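
The retry behavior described in this note can be illustrated with a minimal sketch. It is not the actual cron job code: the function names, the attempt count, and the linear backoff are all assumptions made for illustration, and the issue does not say how many retries the fix performs or how long it waits between them.

```python
import time
import urllib.error
import urllib.request


def request_with_retry(send, max_attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call send() until it returns a non-500 status or attempts run out.

    send is any zero-argument callable that returns an HTTP status code.
    max_attempts and delay_seconds are illustrative values, not the ones
    used by the real cron jobs. Returns the last status code seen.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status != 500:
            return status  # success, or an error that a retry will not fix
        if attempt < max_attempts:
            sleep(delay_seconds * attempt)  # linear backoff between attempts
    return status  # still 500 after every attempt: the caller should fail


def es_status(url, method="GET"):
    """Return the HTTP status code for one request (hypothetical helper)."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; keep the code instead
```

A caller would wrap a single request, for example `request_with_retry(lambda: es_status(url))`, and treat a final 500 as a failed run. Only 500 is retried here because that is the only status this issue describes as temporary.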

    Description

      While the ES cluster is under load, the cron jobs can fail because they receive an HTTP 500 response from Elasticsearch.

       

      To reproduce this locally, I did the following:

      1) Spin up a 3-node ES cluster (I used 1G memory and 500m CPU).

      2) Configure the cron jobs to run every minute, roll over after 1 minute, and delete after 5 minutes.

      3) Put a small load on the server. I used a script that creates x indices, each with a small message, and could usually see the cron jobs fail with a 500 while it was creating 800 indices.

      4) Watch for the cron jobs to error out, and confirm that the failure was due to a 500.
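
The load script from step 3 is not attached to this issue. The following is a hypothetical stand-in, assuming an Elasticsearch endpoint reachable over plain HTTP without authentication (for example through a local port-forward). A secured cluster would also need TLS and credentials, which this sketch omits. The `loadtest` index prefix and the message body are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def http_send(method, url, body=None):
    """Send one JSON request and return the HTTP status code."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # keep 4xx/5xx codes instead of raising


def create_indices(base_url, count, send=http_send, prefix="loadtest"):
    """Index one small document into each of `count` new indices.

    Returns a dict mapping each status code to how often it was seen,
    so any 500 responses under load are easy to spot.
    """
    seen = {}
    for i in range(count):
        # Indexing into a missing index makes Elasticsearch create it,
        # provided automatic index creation is enabled (the default).
        url = f"{base_url}/{prefix}-{i:04d}/_doc"
        status = send("POST", url, {"message": "small test message"})
        seen[status] = seen.get(status, 0) + 1
    return seen
```

Running `create_indices("http://localhost:9200", 800)` would approximate the 800-index load mentioned in step 3. The `send` parameter exists so the loop can be exercised without a live cluster.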

          People

            ewolinet@redhat.com Eric Wolinetz (Inactive)
            ewolinet@redhat.com Eric Wolinetz (Inactive)
            Giriyamma K R (Inactive)
            Votes: 0
            Watchers: 4
