Type: Bug
Priority: Major
Status: NEW
Resolution: Not a Bug
After the 5.4.4 release, multiple clusters in the same environment began logging "Failed to execute [SearchRequest]" messages:
2022-08-10T10:42:35.406811400Z [2022-08-10T10:42:35,404][DEBUG][o.e.a.s.TransportSearchAction] [elasticsearch-cdm-isje6t8a-2] [infra-001144][1], node[4Ey6F_9tQOedMTFU8vdfpA], [P], s[STARTED], a[id=qz-dShohRL65LnxY9JsVZw]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[app-000781, app-000782, app-000783, app-000784, app-000780, app-000789, app-000785, app-000786, app-000787, app-000788, audit-001167, audit-001166, app-000772, audit-001165, audit-001164, app-000773, audit-001169, audit-001168, app-000778, infra-001166, app-000779, infra-001167, infra-001168, audit-001163, app-000774, app-000775, audit-001162, app-000776, audit-001161, app-000777, audit-001160, infra-001169, audit-001159, audit-001158, infra-001134, infra-001135, infra-001170, app-000792, audit-001145, app-000793, audit-001144, audit-001143, audit-001142, infra-001136, infra-001137, app-000790, infra-001138, infra-001139, audit-001146, app-000791, infra-001143, infra-001144, infra-001145, audit-001141, infra-001140, audit-001140, infra-001141, infra-001142], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=15, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, source={"size":0,"query":{"range":{"@timestamp":{"from":"now-24h","to":"now","include_lower":true,"include_upper":true,"boost":1.0}}},"aggregations":{"top_namespaces":{"terms":{"field":"kubernetes.namespace_name","size":1000,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"namespace_count":{"cardinality":{"field":"kubernetes.namespace_name"}}}}}] lastShard [true]
2022-08-10T10:42:35.406811400Z org.elasticsearch.transport.RemoteTransportException: [elasticsearch-cdm-isje6t8a-3][172.20.8.45:9300][indices:data/read/search[phase/query]]
2022-08-10T10:42:35.406811400Z Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@5661a4e8 on QueueResizingEsThreadPoolExecutor[name = elasticsearch-cdm-isje6t8a-3/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 1.6s, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@3ca2ad2f[Running, pool size = 5, active threads = 5, queued tasks = 1103, completed tasks = 6307660]]
2022-08-10T10:42:35.406811400Z at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.8.1.jar:6.8.1.redhat-00020]
2022-08-10T10:42:35.406811400Z at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
2022-08-10T10:42:35.406811400Z at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355) ~[?:?]
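The source= field above is the size:0 "top namespaces" aggregation over the last 24 hours (a terms plus a cardinality aggregation on kubernetes.namespace_name). A minimal sketch of re-running that query by hand from inside one of the Elasticsearch pods, assuming the usual openshift-logging namespace, the admin client certificates mounted under /etc/elasticsearch/secret/, and $ES_POD as a placeholder for one of the elasticsearch-cdm-* pod names; the range is written in the equivalent gte/lte form:

# Re-run the namespace aggregation that is being rejected by the search pool.
oc exec -n openshift-logging -c elasticsearch $ES_POD -- \
  curl -s --cacert /etc/elasticsearch/secret/admin-ca \
       --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       -H 'Content-Type: application/json' \
       -XGET 'https://localhost:9200/app-*,infra-*,audit-*/_search?pretty' -d '{
         "size": 0,
         "query": { "range": { "@timestamp": { "gte": "now-24h", "lte": "now" } } },
         "aggregations": {
           "top_namespaces": { "terms": { "field": "kubernetes.namespace_name", "size": 1000 } },
           "namespace_count": { "cardinality": { "field": "kubernetes.namespace_name" } }
         }
       }'

Getting the same EsRejectedExecutionException here would point at a saturated search queue rather than a problem with the query itself.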
There are also a large number of rejections in the search thread pools:
elasticsearch-cdm-isje6t8a-2 search 5 683 48364
elasticsearch-cdm-isje6t8a-3 search 5 782 51023
elasticsearch-cdm-isje6t8a-1 search 5 863 80009
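The rows above appear to be _cat/thread_pool output (node_name, name, active, queue, rejected); the queue depths are close to the 1000-slot capacity reported in the exception. A hedged sketch of pulling the same counters, assuming the es_util wrapper shipped in the Elasticsearch image and the same $ES_POD placeholder:

# Per-node search thread pool activity, queue depth, and rejection counters.
oc exec -n openshift-logging -c elasticsearch $ES_POD -- \
  es_util --query="_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected"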
Very high load is seen on the nodes:
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.20.6.40   27           97          99  59.34   40.26   37.92    mdi       *      elasticsearch-cdm-isje6t8a-2
172.20.8.45   47           87          99  69.38   39.96   35.58    mdi       -      elasticsearch-cdm-isje6t8a-3
172.20.11.139 25           90          100 13.39   19.14   20.83    mdi       -      elasticsearch-cdm-isje6t8a-1
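That header matches the default _cat/nodes view, so the load can be re-checked the same way (same assumptions as above):

# Heap, RAM, CPU, and load averages per node; '*' marks the elected master.
oc exec -n openshift-logging -c elasticsearch $ES_POD -- es_util --query="_cat/nodes?v"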
This behavior seems very similar to https://issues.redhat.com/browse/LOG-1867, which was addressed in https://issues.redhat.com/browse/LOG-1897.
It looks like the issue began when the operators auto-updated to 5.4.4.
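To pin down the timing, the installed operator versions can be checked; a minimal sketch, assuming the default namespaces (cluster-logging in openshift-logging, elasticsearch-operator in openshift-operators-redhat):

# Show the ClusterServiceVersions (installed operator versions).
oc get csv -n openshift-logging
oc get csv -n openshift-operators-redhat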