OpenShift Bugs: OCPBUGS-48560

[DCM] 503 error on persistent connection after service update with idle-close-on-response enabled

    • Quality / Stability / Reliability
    • 2
    • Important
    • Rejected
    • NI&D Sprint 268
    • 1

      Description of problem:

      When a service is updated on a route, the router processes the change as two separate events: a route removal followed by a route addition (https://github.com/openshift/router/blob/7a688b0eab5a27fe13988a21022c774cdeb964b2/pkg/router/template/router.go#L1090-L1108).
      In a DCM-enabled router, the route removal is handled by disabling all associated servers, while the route addition cannot be applied dynamically and results in a router reload. If the idle-close-on-response option is enabled, this can lead to a 503 error when a client attempts to reuse a persistent connection.
      The issue occurs because idle connections are not closed immediately on reload: closing is deferred until one more request has been handled on each connection (as per HAProxy's idle-close-on-response behavior). The old HAProxy process therefore remains active with all servers for the affected route in maintenance mode. As a result, the next request sent on the persistent connection to this route receives a 503 error, while all later requests correctly reach the new service endpoint(s).
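
      The sequence described above can be illustrated with a small toy model. This is only a sketch of the reasoning, not router or HAProxy code: the `Process` class, the `simulate_service_update` function, and the service names are all hypothetical. The `disable_old_servers` flag models the difference between the current DCM handling (servers disabled before the reload) and a reload-only handling of the update.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical stand-in for one HAProxy process serving a single route."""
    servers: dict          # server name -> True if the server accepts traffic
    stopping: bool = False  # True once a reload has replaced this process

    def handle(self):
        # Pick the first enabled server; with none enabled the route returns 503.
        up = [name for name, enabled in self.servers.items() if enabled]
        return (200, up[0]) if up else (503, None)


def simulate_service_update(disable_old_servers=True, requests_after=3):
    """Return (status, server) pairs seen by one client with a persistent connection."""
    old = Process({"old-svc": True})
    new = Process({"new-svc": True})

    owner = old               # the client's connection is held by the old process
    log = [owner.handle()]    # request sent before the update

    # Route removal: DCM disables the servers in the running (old) process.
    if disable_old_servers:
        for name in old.servers:
            old.servers[name] = False
    # Route addition: cannot be applied dynamically, so a reload starts a new process.
    old.stopping = True

    for _ in range(requests_after):
        log.append(owner.handle())
        if owner.stopping:
            # idle-close-on-response: the stopping process answers one more request,
            # then closes the connection; the client reconnects to the new process.
            owner = new
    return log
```

      In this model, `simulate_service_update()` yields one 503 right after the update, while `simulate_service_update(disable_old_servers=False)` lets the old process answer that request from the old endpoint.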

      Version-Release number of selected component (if applicable):

      4.18+ (when TechPreview featureset is used)  

      How reproducible:

      Always

      Steps to Reproduce:

      1. Make sure to use the router with Dynamic Configuration Manager enabled.
      2. Make sure idle-close-on-response option is set in the router's configuration.
      3. Make sure to use an HTTP client which reuses connections.
      4. Send an HTTP request to a route.
      5. Change the route's "to" service to another one with a ready endpoint.
      6. Send another HTTP request to the same route.
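
      Steps 3 to 6 could be scripted roughly as follows. This is a minimal sketch using Python's standard `http.client`, which keeps the connection open between requests when the server allows it. The function name `probe_keepalive` and the `between_requests` hook are illustrative and not part of the original report; plain HTTP on port 80 is assumed.

```python
import http.client


def probe_keepalive(host, port=80, path="/", between_requests=None):
    """Send two GET requests on one connection.

    Returns ([status1, status2], reused), where `reused` is True only if the
    second request went out on the same socket as the first.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        first = conn.getresponse()
        first.read()  # drain the body so the socket can carry another request
        sock_after_first = conn.sock  # None if the server asked to close

        if between_requests is not None:
            between_requests()  # step 5: update the route's service here

        conn.request("GET", path)  # silently reconnects if the socket was closed
        sock_for_second = conn.sock
        second = conn.getresponse()
        second.read()

        reused = sock_after_first is not None and sock_for_second is sock_after_first
        return [first.status, second.status], reused
    finally:
        conn.close()
```

      For example, `probe_keepalive("myroute.apps.example.com", between_requests=lambda: input("Update the route, then press Enter"))` pauses between the two requests so the route can be changed by hand (the hostname is a placeholder). Based on the behavior described in this report, an affected router would be expected to answer the second request with a 503 on the reused connection.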
          

      Actual results:

      503 error for the first request after the update. All subsequent requests succeed and are served by the new endpoint.

      Expected results:

      The first request after the update should be served by the old endpoint. All subsequent requests should succeed and be served by the new endpoint.

      Additional info:

      Potential fix: fall back to a router reload for route update events.

              alebedev@redhat.com Andrey Lebedev
              Shudi Li