OpenShift Logging / LOG-2884

LokiRequestErrors alert not firing on 10 '502' query_range responses for 15 mins


Details

    • Before this update, when requests to an unavailable pod were sent through the gateway, no alert would fire to warn of the disruption. With this update, individual alerts fire if the gateway has issues completing a write or read request, respectively.
    • Log Storage - Sprint 223, Log Storage - Sprint 224, Log Storage - Sprint 225

    Description


      The LokiRequestErrors alert does not fire after the querier is taken down with the LokiStack in unmanaged mode. query/query_range requests have to be issued to generate the 5xx errors.

      Steps to Reproduce:
      1) Deploy the Cluster Logging Operator (CLO) and the Loki Operator
      2) Create a LokiStack CR and forward logs to the gateway
      3) Edit the LokiStack and set it to 'unmanaged' (see the patch sketch after the deployment check below)
      4) Delete the querier deployment (the querier should not be running at this point)

      $ oc get deployments lokistack-dev-querier -n openshift-logging
      Error from server (NotFound): deployments.apps "lokistack-dev-querier" not found
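
      For steps 3 and 4, a minimal sketch of the commands involved (resource and namespace names are taken from the commands in this report; spec.managementState is assumed to be the field that switches the stack to unmanaged):

      # Step 3: take the LokiStack out of operator management so the operator
      # does not recreate the querier once it is deleted.
      $ oc patch lokistack lokistack-dev -n openshift-logging --type=merge -p '{"spec":{"managementState":"Unmanaged"}}'

      # Step 4: remove the querier deployment.
      $ oc delete deployment lokistack-dev-querier -n openshift-logging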
      

      5) Fire some queries

      logcli -o raw --tls-skip-verify --bearer-token="$(oc whoami -t)" --addr "https://lokistack-dev-openshift-logging.apps.kbharti-410-100.qe.devcluster.openshift.com/api/logs/v1/application" query '{log_type="application"}'
      logcli -o raw --tls-skip-verify --bearer-token="$(oc whoami -t)" --addr "https://lokistack-dev-openshift-logging.apps.kbharti-410-100.qe.devcluster.openshift.com/api/logs/v1/audit" query '{log_type="audit"}'
      logcli -o raw --tls-skip-verify --bearer-token="$(oc whoami -t)" --addr "https://lokistack-dev-openshift-logging.apps.kbharti-410-100.qe.devcluster.openshift.com/api/logs/v1/infrastructure" query '{log_type="infrastructure"}'

      The queries are answered with a timeout error:

      Error response from server: <html><body><h1>504 Gateway Time-out</h1>
      The server didn't respond in time.
      </body></html>
       (<nil>) attempts remaining: 0
      Query failed: Run out of attempts while querying the server
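
      Since the alert (going by the title) only fires once the errors persist for roughly 15 minutes, the failing query has to be repeated over that window. A simple loop along these lines keeps the 502s coming (a sketch reusing the application tenant query above; the 30-second interval is arbitrary):

      $ while true; do
          logcli -o raw --tls-skip-verify --bearer-token="$(oc whoami -t)" --addr "https://lokistack-dev-openshift-logging.apps.kbharti-410-100.qe.devcluster.openshift.com/api/logs/v1/application" query '{log_type="application"}'
          sleep 30
        done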
      

      Logs on Gateway:

      level=warn name=lokistack-gateway ts=2022-08-04T17:38:53.784246849Z caller=stdlib.go:105 caller=reverseproxy.go:489 msg="http: proxy error: context canceled"
      level=warn name=lokistack-gateway ts=2022-08-04T17:38:53.784304735Z caller=instrumentation.go:33 request=lokistack-dev-gateway-84996dbb9-6dxwl/2jSiMJWyj5-010955 proto=HTTP/1.1 method=GET status=502 content= path=/api/logs/v1/infrastructure/loki/api/v1/query_range duration=30.001070673s bytes=0
      level=warn name=lokistack-gateway ts=2022-08-04T17:41:27.417207863Z caller=stdlib.go:105 caller=reverseproxy.go:489 msg="http: proxy error: context canceled"
      level=warn name=lokistack-gateway ts=2022-08-04T17:41:27.41727024Z caller=instrumentation.go:33 request=lokistack-dev-gateway-84996dbb9-6dxwl/2jSiMJWyj5-012258 proto=HTTP/1.1 method=GET status=502 content= path=/api/logs/v1/infrastructure/loki/api/v1/query_range duration=30.001531582s bytes=0
      level=warn name=lokistack-gateway ts=2022-08-04T22:28:43.749167657Z caller=stdlib.go:105 caller=reverseproxy.go:489 msg="http: proxy error: context canceled"
      level=warn name=lokistack-gateway ts=2022-08-04T22:28:43.749228224Z caller=instrumentation.go:33 request=lokistack-dev-gateway-84996dbb9-6dxwl/2jSiMJWyj5-186110 proto=HTTP/1.1 method=GET status=502 content= path=/api/logs/v1/infrastructure/loki/api/v1/query_range duration=30.001512461s bytes=0
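
      For reference, these entries can be pulled from the gateway with something like the following (the deployment name is inferred from the pod name in the request IDs above; the container name 'gateway' is an assumption):

      $ oc logs -n openshift-logging deployment/lokistack-dev-gateway -c gateway | grep status=502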

      OCP Version: 4.10

      How reproducible: Always

      Actual Result:
      The alert does not fire.

      Expected Result:
      The LokiRequestErrors alert should fire when most requests are answered with 5xx errors.
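
      To confirm that the rule exists at all, the PrometheusRule objects created for the LokiStack can be inspected along these lines (the object names vary per install):

      $ oc get prometheusrules -n openshift-logging
      $ oc get prometheusrules -n openshift-logging -o yaml | grep -B 2 -A 10 'alert: LokiRequestErrors'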

            People

              gvanloo Gerard Vanloo (Inactive)
              rhn-support-kbharti Kabir Bharti
              Kabir Bharti Kabir Bharti
