OpenShift Logging / LOG-1867

OCP 4.7 ES failing with Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Undefined
    • Component/s: Log Storage

    Description

      ES fails to ingest logs from Fluentd. The ES node logs show:

      ~~~
      2021-09-29T17:25:42.313046562Z at java.lang.Thread.run(Thread.java:829) ~[?:?]
      2021-09-29T17:25:43.981581201Z [2021-09-29T17:25:43,981][INFO ][o.e.m.j.JvmGcMonitorService] [gc][16008] overhead, spent [383ms] collecting in the last [1s]
      2021-09-29T17:25:48.846574017Z [2021-09-29T17:25:48,845][DEBUG][o.e.a.s.TransportSearchAction] [app-000172][1], node[EWwKdOjYSlWlE0RugF1AnQ], [P], s[STARTED], a[id=aFnyBYUxT6ec9YnhcQpJYQ]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[audit-000009, audit-000008, audit-000007, audit-000006, infra-000018, infra-000019, audit-000005, infra-000014, infra-000015, infra-000016, infra-000017, infra-000021, infra-000022, infra-000023, infra-000024, infra-000020, app-000170, app-000171, app-000176, app-000172, app-000173, app-000174, app-000175, app-000160, app-000165, audit-000023, app-000166, audit-000022, app-000167, audit-000021, app-000168, audit-000020, app-000161, app-000162, app-000163, app-000164, audit-000024, app-000169, audit-000019, audit-000018, audit-000017, infra-000007, audit-000012, infra-000008, audit-000011, infra-000009, audit-000010, app-000157, audit-000016, audit-000015, audit-000014, infra-000005, audit-000013, infra-000006, infra-000010, infra-000011, infra-000012, infra-000013, app-000158, app-000159], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=false, scroll=null, maxConcurrentShardRequests=25, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, source={"size":0,"query":{"range":{"@timestamp":{"from":"now-24h","to":"now","include_lower":true,"include_upper":true,"boost":1.0}}},"aggregations":{"Histogram":{"date_histogram":{"field":"@timestamp","interval":"hour","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0},"aggregations":{"top_namespaces":{"terms":{"field":"kubernetes.namespace_name","size":1000,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"namespace_count":{"cardinality":{"field":"kubernetes.namespace_name"}}}}}}] lastShard [true]
      2021-09-29T17:25:48.846574017Z org.elasticsearch.transport.RemoteTransportException: [elasticsearch-cd-yw22zn1v-1][10.131.8.111:9300][indices:data/read/search[phase/query]]
      2021-09-29T17:25:48.846574017Z Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@21f9aa10 on QueueResizingEsThreadPoolExecutor[name = elasticsearch-cd-yw22zn1v-1/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 554.1ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@7052fb9[Running, pool size = 2, active threads = 2, queued tasks = 1298, completed tasks = 245348]]
      2021-09-29T17:25:48.846574017Z at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.8.1.jar:6.8.1.redhat-00007]
      ~~~
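
      The rejection itself names the bottleneck: the search pool on elasticsearch-cd-yw22zn1v-1 has only 2 threads and a fixed queue capacity of 1000, yet 1298 tasks are already queued, so ES aborts new search tasks instead of accepting them. The configured pool sizes and queue capacities can be checked per node; a minimal sketch, assuming the es_util query helper shipped in the elasticsearch container and the component=elasticsearch pod label (both assumptions, adjust for your install):

      ~~~
      # Pick any ES pod (label assumed); es_util runs the query against the local node.
      $ ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch -o name | head -1)
      $ oc -n openshift-logging exec $ES_POD -c elasticsearch -- \
          es_util --query="_nodes/thread_pool?pretty"
      ~~~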

      ~~~
      > cat es/cluster-elasticsearch/health
      epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
      1632936727 17:32:07  elasticsearch green  5          5         620    310 0    0    0        0             -                  100.0%
      ~~~
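
      Cluster health is green with no pending tasks, so this is not a shard-allocation problem; the pressure is on the search thread pool. The same health line can be pulled live rather than from the must-gather, reusing $ES_POD from the sketch above:

      ~~~
      $ oc -n openshift-logging exec $ES_POD -c elasticsearch -- \
          es_util --query="_cat/health?v"
      ~~~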

      ~~~
      logStore:
        elasticsearch:
          nodeCount: 5
          proxy:
            resources:
              limits:
                memory: 256Mi
              requests:
                memory: 256Mi
          redundancyPolicy: SingleRedundancy
          resources:
            limits:
              memory: 36G
            requests:
              memory: 36G
          storage:
            size: 500G
            storageClassName: perf-no-snap
      ~~~
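
      The stanza above is the logStore section of the ClusterLogging custom resource. To confirm what is actually deployed, dump the CR (assuming the default name "instance") and the Elasticsearch CR the operator renders from it:

      ~~~
      $ oc -n openshift-logging get clusterlogging instance -o yaml
      $ oc -n openshift-logging get elasticsearch elasticsearch -o yaml
      ~~~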

      Fluentd Queue
      ~~~
      $ for i in $(oc get pods -l component=fluentd | awk '/fluentd/ { print $1 }') ; do oc exec $i -- du -sh /var/lib/fluentd ; done

      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-2kwkc -n openshift-logging' to see all of the containers in this pod.
      40G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-57htx -n openshift-logging' to see all of the containers in this pod.
      25G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-7gc7c -n openshift-logging' to see all of the containers in this pod.
      17G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-8kbh2 -n openshift-logging' to see all of the containers in this pod.
      17G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-bqp2g -n openshift-logging' to see all of the containers in this pod.
      40G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-c94br -n openshift-logging' to see all of the containers in this pod.
      23G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-hstqg -n openshift-logging' to see all of the containers in this pod.
      21G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-j4rcx -n openshift-logging' to see all of the containers in this pod.
      23G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-jxhq4 -n openshift-logging' to see all of the containers in this pod.
      15G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-mmtdr -n openshift-logging' to see all of the containers in this pod.
      20G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-mql67 -n openshift-logging' to see all of the containers in this pod.
      23G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-ph4vz -n openshift-logging' to see all of the containers in this pod.
      22G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-rqcp2 -n openshift-logging' to see all of the containers in this pod.
      21G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-vknsb -n openshift-logging' to see all of the containers in this pod.
      16G /var/lib/fluentd
      Defaulting container name to fluentd.
      Use 'oc describe pod/fluentd-wqvqh -n openshift-logging' to see all of the containers in this pod.
      22G /var/lib/fluentd
      ~~~
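
      The "Defaulting container name" warnings appear because the collector pods run more than one container; naming the container explicitly and summing the per-pod sizes shows the total backlog at a glance. A sketch under the same assumptions as the loop above (du -s reports KiB blocks by default):

      ~~~
      # Sum the fluentd buffer directories across all collector pods.
      $ for i in $(oc -n openshift-logging get pods -l component=fluentd -o name); do
          oc -n openshift-logging exec "$i" -c fluentd -- du -s /var/lib/fluentd
        done | awk '{kb += $1} END {printf "total buffered: %.1f GiB\n", kb / 1024 / 1024}'
      ~~~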

      Version-Release number of selected component (if applicable):

      installedCSV: cluster-logging.5.2.1-5
      installedCSV: elasticsearch-operator.5.2.1-5

      OCP 4.7
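
      The installed operator versions can be cross-checked directly on the cluster:

      ~~~
      $ oc -n openshift-logging get csv
      ~~~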

      How reproducible:

      Steps to Reproduce:

      Actual results:

      Fluentd buffer queues grow to tens of gigabytes because ES is not ingesting the logs (see the error above), and Kibana shows no logs.

      Expected results:
      Kibana shows the logs.

      Additional info:

      Thread pool
      ~~~
      node_name                    name   active queue rejected
      elasticsearch-cdm-o72himb8-2 search 2      206   103124
      elasticsearch-cd-yw22zn1v-2  search 2      1155  85794
      elasticsearch-cdm-o72himb8-3 search 2      1056  520289
      ~~~
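
      Each of these nodes has already rejected tens to hundreds of thousands of search tasks, consistent with the exception in the Description. A table like this can be produced from the _cat thread pool API; a sketch reusing $ES_POD and es_util as above:

      ~~~
      $ oc -n openshift-logging exec $ES_POD -c elasticsearch -- \
          es_util --query="_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected"
      ~~~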

          People

            Assignee: Unassigned
            Reporter: rhn-support-hgomes (Hevellyn Gomes)