Type: Bug
Priority: Major
Status: NEW
Resolution: Not a Bug
After the 5.4.4 release, multiple clusters in the same environment began logging "Failed to execute [SearchRequest]" messages:
2022-08-10T10:42:35.406811400Z [2022-08-10T10:42:35,404][DEBUG][o.e.a.s.TransportSearchAction] [elasticsearch-cdm-isje6t8a-2] [infra-001144][1], node[4Ey6F_9tQOedMTFU8vdfpA], [P], s[STARTED], a[id=qz-dShohRL65LnxY9JsVZw]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[app-000781, app-000782, app-000783, app-000784, app-000780, app-000789, app-000785, app-000786, app-000787, app-000788, audit-001167, audit-001166, app-000772, audit-001165, audit-001164, app-000773, audit-001169, audit-001168, app-000778, infra-001166, app-000779, infra-001167, infra-001168, audit-001163, app-000774, app-000775, audit-001162, app-000776, audit-001161, app-000777, audit-001160, infra-001169, audit-001159, audit-001158, infra-001134, infra-001135, infra-001170, app-000792, audit-001145, app-000793, audit-001144, audit-001143, audit-001142, infra-001136, infra-001137, app-000790, infra-001138, infra-001139, audit-001146, app-000791, infra-001143, infra-001144, infra-001145, audit-001141, infra-001140, audit-001140, infra-001141, infra-001142], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=15, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, source={"size":0,"query":{"range":{"@timestamp":{"from":"now-24h","to":"now","include_lower":true,"include_upper":true,"boost":1.0}}},"aggregations":{"top_namespaces":{"terms":{"field":"kubernetes.namespace_name","size":1000,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"namespace_count":{"cardinality":{"field":"kubernetes.namespace_name"}}}}}] lastShard [true]
2022-08-10T10:42:35.406811400Z org.elasticsearch.transport.RemoteTransportException: [elasticsearch-cdm-isje6t8a-3][172.20.8.45:9300][indices:data/read/search[phase/query]]
2022-08-10T10:42:35.406811400Z Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@5661a4e8 on QueueResizingEsThreadPoolExecutor[name = elasticsearch-cdm-isje6t8a-3/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 1.6s, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@3ca2ad2f[Running, pool size = 5, active threads = 5, queued tasks = 1103, completed tasks = 6307660]]
2022-08-10T10:42:35.406811400Z at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.8.1.jar:6.8.1.redhat-00020]
2022-08-10T10:42:35.406811400Z at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
2022-08-10T10:42:35.406811400Z at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355) ~[?:?]
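The source= field above is the size:0 "top namespaces" aggregation over the last 24 hours (a terms plus a cardinality aggregation on kubernetes.namespace_name). A minimal sketch of re-running that query by hand from inside one of the Elasticsearch pods, assuming the usual openshift-logging namespace, the admin client certificates mounted under /etc/elasticsearch/secret/, and $ES_POD as a placeholder for one of the elasticsearch-cdm-* pod names; the range is written in the equivalent gte/lte form:

# Re-run the namespace aggregation that is being rejected by the search pool.
oc exec -n openshift-logging -c elasticsearch $ES_POD -- \
  curl -s --cacert /etc/elasticsearch/secret/admin-ca \
       --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       -H 'Content-Type: application/json' \
       -XGET 'https://localhost:9200/app-*,infra-*,audit-*/_search?pretty' -d '{
         "size": 0,
         "query": { "range": { "@timestamp": { "gte": "now-24h", "lte": "now" } } },
         "aggregations": {
           "top_namespaces": { "terms": { "field": "kubernetes.namespace_name", "size": 1000 } },
           "namespace_count": { "cardinality": { "field": "kubernetes.namespace_name" } }
         }
       }'

Getting the same EsRejectedExecutionException here would point at a saturated search queue rather than a problem with the query itself.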
There are also a large number of rejections in the search thread pools:
elasticsearch-cdm-isje6t8a-2 search 5 683 48364
elasticsearch-cdm-isje6t8a-3 search 5 782 51023
elasticsearch-cdm-isje6t8a-1 search 5 863 80009
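The rows above appear to be _cat/thread_pool output (node_name, name, active, queue, rejected); the queue depths are close to the 1000-slot capacity reported in the exception. A hedged sketch of pulling the same counters, assuming the es_util wrapper shipped in the Elasticsearch image and the same $ES_POD placeholder:

# Per-node search thread pool activity, queue depth, and rejection counters.
oc exec -n openshift-logging -c elasticsearch $ES_POD -- \
  es_util --query="_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected"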
Very high load is seen on the nodes:
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.20.6.40   27           97          99  59.34   40.26   37.92    mdi       *      elasticsearch-cdm-isje6t8a-2
172.20.8.45   47           87          99  69.38   39.96   35.58    mdi       -      elasticsearch-cdm-isje6t8a-3
172.20.11.139 25           90          100 13.39   19.14   20.83    mdi       -      elasticsearch-cdm-isje6t8a-1
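That header matches the default _cat/nodes view, so the load can be re-checked the same way (same assumptions as above):

# Heap, RAM, CPU, and load averages per node; '*' marks the elected master.
oc exec -n openshift-logging -c elasticsearch $ES_POD -- es_util --query="_cat/nodes?v"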
This behavior seems very similar to https://issues.redhat.com/browse/LOG-1867, which was addressed in https://issues.redhat.com/browse/LOG-1897.
It looks like the issue began when the operators auto-updated to 5.4.4.
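To pin down the timing, the installed operator versions can be checked; a minimal sketch, assuming the default namespaces (cluster-logging in openshift-logging, elasticsearch-operator in openshift-operators-redhat):

# Show the ClusterServiceVersions (installed operator versions).
oc get csv -n openshift-logging
oc get csv -n openshift-operators-redhat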