  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-11935

OpenStack APIs take a long time to recover when one of the memcached pods goes down


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • rhos-18.0.6
    • rhos-18.0 FR 1 (Nov 2024)
    • infra-operator
    • PIDONE 18.0.4, PIDONE 18.0.5, PIDONE 18.0.6, PIDONE 18.0.7
    • 4
    • Critical

      Whenever one of the memcached pods disappears (because of a rolling restart during a minor update or as a result of a failure), the APIs take a long time to detect that the pod went away and keep trying to reconnect to it.

      In a quick round of tests, the API downtime was roughly 150 s.

      Looking at the config files of the services (mainly keystone and nova-api), we saw a few parameters that could be useful:

      enable_retry_client=true
      retry_attempts=X
      retry_delay=Y

      We tested with retry_attempts=2 and retry_delay=0 and the APIs recovered much faster.
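
      For reference, the tested combination could look roughly like this in a service config file. This is a sketch only: it assumes the options sit in the [cache] section, as they do for oslo.cache, and it does not claim that these exact values are what the eventual fix ships.

      ```ini
      # Sketch of the tested values; the [cache] section name is an assumption
      # based on oslo.cache and is not stated in this report.
      [cache]
      # Wrap the memcached client in the retry mechanism
      enable_retry_client = true
      # Give up on a failed call after 2 attempts
      retry_attempts = 2
      # Do not wait between attempts (value in seconds)
      retry_delay = 0
      ```

      With retry_delay=0 and a small retry_attempts, a request that hits the missing pod should fail over quickly instead of waiting on repeated reconnects, which matches the faster recovery seen in the test.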

              rhn-support-lmiccini Luca Miccini
              rhos-dfg-pidone