-
Bug
-
Resolution: Done
-
Normal
-
None
-
None
-
None
-
5
-
False
-
None
-
False
-
-
A new bug has started popping up in the alert channel, which I have been able to replicate locally by running the make delete-test-customer-data command. A snippet of the error from my local run:
koku-worker-1 | [2024-08-22 14:19:30,160] ERROR be8c8b88-3eac-42e1-b257-088d7a4b3f9a 43 {'message': 'failed trino sql execution', 'tracing_id': '', 'log_ref': 'delete_hive_partitions_by_source for d4d4a361-71a1-4220-88af-4f379bbb5ae4'}
koku-worker-1 | Traceback (most recent call last):
koku-worker-1 |   File "/koku/koku/masu/database/report_db_accessor_base.py", line 147, in _execute_trino_raw_sql_query_with_description
koku-worker-1 |     results = trino_cur.fetchall()
koku-worker-1 |               ^^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/opt/koku/.venv/lib/python3.11/site-packages/trino/dbapi.py", line 689, in fetchall
koku-worker-1 |     return list(iter(self.fetchone, None))
koku-worker-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/opt/koku/.venv/lib/python3.11/site-packages/trino/dbapi.py", line 631, in fetchone
koku-worker-1 |     return next(self._iterator)
koku-worker-1 |            ^^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/opt/koku/.venv/lib/python3.11/site-packages/trino/client.py", line 716, in __iter__
koku-worker-1 |     next_rows = self._query.fetch() if not self._query.finished else None
koku-worker-1 |                 ^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/opt/koku/.venv/lib/python3.11/site-packages/trino/client.py", line 839, in fetch
koku-worker-1 |     status = self._request.process(response)
koku-worker-1 |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/opt/koku/.venv/lib/python3.11/site-packages/trino/client.py", line 611, in process
koku-worker-1 |     raise self._process_error(response["error"], response.get("id"))
koku-worker-1 |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/opt/koku/.venv/lib/python3.11/site-packages/trino/client.py", line 581, in _process_error
koku-worker-1 |     raise exceptions.TrinoExternalError(error, query_id)
koku-worker-1 | trino.exceptions.TrinoExternalError: TrinoExternalError(type=EXTERNAL, name=HIVE_METASTORE_ERROR, message="The transaction didn't commit cleanly. All operations other than the following delete operations were completed:
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 27];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 28];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 16];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 17];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 15];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 25];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 29];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 23];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 14];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 10];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 21];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 24];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 19];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 18];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 11];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 12];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 22];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 26];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 13];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 20];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 1];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 2];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 8, 7];
    drop partition org1234567.reporting_ocpusagelineitem_daily_summary [d4d4a361-71a1-4220-88af-4f379bbb5ae4, 2024, 7, 16]", query_id=20240822_141701_00871_viktv)
The log ref points us to delete_hive_partitions_by_source, which is only called during a source deletion. The first appearance of the issue was on August 19th, the same day we did a release.
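The metastore error says every partition drop in a single transaction failed together, so one workaround worth trying is issuing the deletes one partition at a time instead of in one multi-partition transaction. The sketch below is an assumption, not the actual koku implementation: the DELETE-per-partition approach, the column names (source, year, month, day), and all connection details are hypothetical, with only the schema/table/source taken from the error above.

```python
def partition_drop_statements(schema, table, source_uuid, partitions):
    """Build one DELETE statement per (year, month, day) partition.

    Issuing these individually means a single flaky metastore commit
    only loses one partition drop, which can then be retried alone.
    """
    stmts = []
    for year, month, day in partitions:
        stmts.append(
            f"DELETE FROM {schema}.{table} "
            f"WHERE source = '{source_uuid}' "
            f"AND year = '{year}' AND month = '{month}' AND day = '{day}'"
        )
    return stmts


def drop_partitions(host, port, user, schema, table, source_uuid, partitions):
    """Execute the per-partition deletes against Trino (hypothetical wiring)."""
    from trino.dbapi import connect  # lazy import; requires the trino client

    conn = connect(host=host, port=port, user=user, catalog="hive", schema=schema)
    cur = conn.cursor()
    for stmt in partition_drop_statements(schema, table, source_uuid, partitions):
        cur.execute(stmt)
        cur.fetchall()  # drain the result set so each query runs to completion
```

If per-partition deletes still hit HIVE_METASTORE_ERROR intermittently, wrapping each statement in a small retry loop would be the next step, since each delete is now independently retryable.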
Release notes from the August 19th release: