[OSSM-6009] Request to fetch traces times out after 30s even though I set a higher timeout in external_services.tracing.query_timeout

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • Fix Version: OSSM 2.6.0
    • Affects Version: OSSM 2.5.0
    • Component: Kiali
    • Release Notes Text:
      The default timeout for tracing queries in Kiali is 30 seconds. To increase it, set the following in the Kiali CR:

      `.spec.external_services.tracing.query_timeout: 40`
      `.spec.server.write_timeout: 40`

      OpenShift routes also have a default timeout of 30 seconds, so the Kiali route needs the annotation `haproxy.router.openshift.io/timeout=50s` (https://docs.openshift.com/container-platform/4.16/networking/routes/route-configuration.html#nw-configuring-route-timeouts_route-configuration).
    • Steps to Reproduce:
      1. Set up OSSM 2.5 with Tempo tracing
      2. Set up bookinfo with the Kiali traffic generator
      3. Wait at least 15 minutes to generate traffic
      4. Open the Kiali UI
      5. Go to Application->httpbin->httpbin->Outbound Metrics or Traces

      The request to fetch traces in Kiali (with Tempo) times out after 30 seconds even though I set a higher timeout with `external_services.tracing.query_timeout: 60`. It looks like the request between the Kiali UI and the Kiali API is what times out. Can I also set a timeout for that?

      For example, when I don't set `external_services.tracing.query_timeout`, the default (5s) is used. In that case, the connection from the Kiali UI to the Kiali API works, but the request from the Kiali API to the Jaeger API times out.
      When I look in the browser console, there is a request to the Kiali API

      GET https://kiali-istio-system.apps.user-rhos-d-2.servicemesh.rhqeaws.com/api/namespaces/mkralik/apps/httpbin/spans?startMicros=1709125095129000
      

      which returns

      error	'Get "http://tempo-sample-query-frontend.tracing-system.svc.cluster.local:16686/api/traces?end=1709128675000000&limit=100&service=httpbin.mkralik&start=1709125095000000": net/http: request canceled (Client.Timeout exceeded while awaiting headers)'
      

      To work around that, I set `external_services.tracing.query_timeout: 60`, which helps when the request takes less than 30s. But it doesn't work for requests that take longer.
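
A minimal sketch of how the three settings named in this issue might fit together. It assumes a Kiali CR and a Kiali route that are both named `kiali` and live in the `istio-system` namespace; those names are inferred from the route host in the request above and are not confirmed by the issue. The values are illustrative only, chosen so that `write_timeout` and the route timeout both exceed `query_timeout`, as the comments on this issue recommend.

```yaml
# Kiali CR: raise the tracing query timeout and the Kiali server write timeout.
# metadata.name and metadata.namespace are assumptions; adjust to your install.
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  external_services:
    tracing:
      query_timeout: 40   # seconds; Kiali API -> tracing backend
  server:
    write_timeout: 45     # seconds; kept above query_timeout
---
# Kiali Route: only the metadata relevant here is shown, so this is a
# fragment to merge into the existing route, not a complete manifest.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: kiali             # assumed route name
  namespace: istio-system # assumed namespace
  annotations:
    haproxy.router.openshift.io/timeout: 50s   # kept above query_timeout
```

With settings like these, the tracing query has the shortest of the three timeouts, so a slow query reaches its own limit before the Kiali server or the router cuts the request off.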


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Moderate: Red Hat OpenShift Service Mesh Containers for 2.6.0 security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:5094


            Matej Kralik added a comment -

            Yes gmonahan, that is a good place for this information. Maybe also add a note there that the other timeouts must be higher than `query_timeout`:

            The default `query_timeout` is 30. If you set it higher than 30, you need to update `.spec.server.write_timeout` in the Kiali CR and add the annotation `haproxy.router.openshift.io/timeout=50s` to the Kiali route. Note that `.spec.server.write_timeout` and `haproxy.router.openshift.io/timeout=` must be higher than `query_timeout`.

            Gwynne Monahan added a comment -

            OK. The "Type" is "Bug", so I wanted to check.

            It might make more sense to add a callout to `query_timeout` in https://docs.openshift.com/container-platform/4.15/service_mesh/v2x/ossm-observability.html#ossm-configuring-distr-tracing-tempo_observability so users see it.

            Something like this:

            apiVersion: kiali.io/v1alpha1
            kind: Kiali
            # ...
            spec:
              external_services:
                tracing:
                  query_timeout: 30 <1>
                  enabled: true
                  in_cluster_url: 'http://tempo-sample-query-frontend.tracing-system.svc.cluster.local:16685'
                  url: '[Tempo query frontend Route url]'
                  use_grpc: true <2>

            <1> The default `query_timeout` is 30. If you set it higher than 30, you need to update `.spec.server.write_timeout` and add the annotation `haproxy.router.openshift.io/timeout=50s` to the Kiali route.
            <2> If you are not using the default HTTP or gRPC port for Jaeger or Tempo, replace the `in_cluster_url:` port with your custom port.

            Josune Cordoba Torrecilla added a comment -

            I think it would be more accurate to consider this issue an improvement rather than a bug, since this is a workaround for when the tracing service is very slow; there is not really any bug fix in Kiali. What do you think, mkralik@redhat.com?

            Gwynne Monahan added a comment -

            Hey mkralik@redhat.com,

            Is this a bug to be included in 2.6 Rel Notes? If so, the "Release Notes Text" field must follow the bug description template: "Previously, <X problem> caused <Y situation>  Now, <fix> resolves the issue  <agent> can <perform operation> successfully."

            Thanks!

            Josune Cordoba Torrecilla added a comment - edited

            This is related to this warning:

            As there is already a warning here, we could add a note to indicate that it is also possible to increase the server timeout.

            rhn-engineering-jshaughn, updating the description in kiali.io sounds good. Is this specific page updated with a Kiali operator update? I've updated the operator description: https://github.com/kiali/kiali-operator/pull/794 (From what I understand, this is copied by the kiali.io release bot.)

            Gwynne Monahan added a comment -

            So...does this need to be added to 2.6 Rel Notes as a "Bug fix" or as a "Kiali known issue"? I do agree with Jay that it will get lost in Rel Notes, esp. with the way 2.x Rel Notes are structured.

            Also, does the Note for the updated OSSM 2.6 Tempo instructions in this PR, https://github.com/openshift/openshift-docs/pull/77583, need to be updated? The Kiali Note is line 149.

            Or is this just being documented upstream?

            Jay Shaughnessy added a comment -

            rh-ee-jcordoba, I'm not opposed to adding a Jira release note for this issue. But relnotes can get lost and are for a targeted audience. I think it might be nice to add an OpenShift-specific note here, about the route annotation:

            https://kiali.io/docs/configuration/kialis.kiali.io/#.spec.server.write_timeout

            That way anyone setting that config would know to also update the route.

            Josune Cordoba Torrecilla added a comment -

            rhn-engineering-jshaughn, as this is in OpenShift, maybe it can be included in the Red Hat Kiali documentation as a note?

            Jay Shaughnessy added a comment -

            Thanks mkralik@redhat.com, that seems like really important information. We should document this. rh-ee-jcordoba, where do you think we should put this information?

              Assignee: Josune Cordoba Torrecilla (rh-ee-jcordoba)
              Reporter: Matej Kralik (mkralik@redhat.com)