-
Bug
-
Resolution: Done
-
Critical
Sentry Issue:
Breadcrumbs:
trino.exceptions.TrinoExternalError: TrinoExternalError(type=EXTERNAL, name=HIVE_CANNOT_OPEN_SPLIT, message="Error opening Hive split s3a://hccm-prod-s3/data/parquet/daily/7281533/AWS/raw/source=065de94b-66ff-4f38-9e76-2083f82af801/year=2023/month=08/2023-08-17_29_4_daily_0.parquet (offset=0, length=41426): Incorrect file size (41426) for file (end of stream not reached): s3a://hccm-prod-s3/data/parquet/daily/7281533/AWS/raw/source=065de94b-66ff-4f38-9e76-2083f82af801/year=2023/month=08/2023-08-17_29_4_daily_0.parquet", query_id=20230818_180748_86270_dwhwd)
Initial Research: Amazon Docs
Key Point:
"""
This message can occur when a file has changed between query planning and query execution. It usually occurs when a file on Amazon S3 is replaced in-place (for example, a PUT is performed on a key where an object already exists). Athena does not support deleting or replacing the contents of a file when a query is running. To avoid this error, schedule jobs that overwrite or delete files at times when queries do not run, or only write data to new files or partitions.
"""
My findings:
This occurs when an attempt is made to replace a file while a query is executing. I do think this means that the file wasn't replaced, though.
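One way to confirm whether the file actually changed is to pull the path and the size the query expected out of the error text, then compare that size with the object's current size. The sketch below is a minimal, stdlib-only illustration; the function names `parse_split_error` and `size_changed` are hypothetical, and the regular expression only assumes the message wording shown in the breadcrumbs above. The current size would have to come from a separate lookup (for example, the `ContentLength` returned by an S3 HEAD request), which is not shown here.

```python
import re

# Matches the "Incorrect file size" portion of the HIVE_CANNOT_OPEN_SPLIT
# message as it appears in this ticket. Other Trino versions may word it
# differently, so treat this pattern as an assumption.
SIZE_ERROR = re.compile(
    r"Incorrect file size \((?P<size>\d+)\) for file "
    r"\(end of stream not reached\): (?P<path>s3a://[^\s\"]+)"
)


def parse_split_error(message):
    """Return (path, expected_size) from the error text, or None if no match."""
    match = SIZE_ERROR.search(message)
    if match is None:
        return None
    return match["path"], int(match["size"])


def size_changed(expected_size, current_size):
    """True when the object's size now differs from the size the query expected."""
    return expected_size != current_size
```

If `size_changed` reports a difference for the path in the error, the file was rewritten after planning; if the sizes match, the overwrite either did not complete or the file was replaced again with content of the same length.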
Additional Notes:
The error appears to have started on August 18th, but every occurrence we saw today involves the same account. The timing could indicate that it is related to the daily archive; however, if that were the cause, I would expect it to affect more than one account. It could also simply be a timing issue. Further investigation is needed, including watching whether this error keeps recurring.
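To check whether the errors really are confined to one account, the error messages collected from Sentry could be tallied by the identifiers in the S3 path. This is a rough sketch only: it assumes the path segment after `daily/` identifies the account and that `source=` carries the source UUID, which matches the one path in this ticket but has not been verified against the bucket layout. The name `tally_by_account` is hypothetical.

```python
import re
from collections import Counter

# Assumed layout: .../data/parquet/daily/<account>/.../source=<uuid>/...
PATH_PARTS = re.compile(
    r"s3a://[^/\s]+/data/parquet/daily/(?P<account>[^/\s]+)/"
    r".*?source=(?P<source>[0-9a-f-]+)"
)


def tally_by_account(messages):
    """Count error messages per (account, source) pair found in the S3 path."""
    counts = Counter()
    for message in messages:
        match = PATH_PARTS.search(message)
        if match:
            counts[(match["account"], match["source"])] += 1
    return counts
```

A single key in the result would support the single-account observation; several keys would point back toward a broader cause such as the daily archive job.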