-
Bug
-
Resolution: Done
When we check required columns in the Azure customer filtered flow, we expect only the column names specified in INGRESS_REQUIRED_COLUMNS (v1 reports) or INGRESS_ALT_COLUMNS (v2 reports):
def check_ingress_required_columns(self, col_names):
    """
    Checks the required columns for ingress.
    """
    if not set(col_names).issuperset(INGRESS_REQUIRED_COLUMNS):
        if not set(col_names).issuperset(INGRESS_ALT_COLUMNS):
            missing_columns = [x for x in INGRESS_REQUIRED_COLUMNS if x not in col_names]
            return missing_columns
    return None
However, different versions of Azure v2 exports can use different column names, and the first letter may or may not be capitalized (BillingAccountId vs. billingAccountId). Consequently, the check_ingress_required_columns step fails with the following error. Note that the error message always shows the diff against INGRESS_REQUIRED_COLUMNS, so it is not very informative when we need to see which INGRESS_ALT_COLUMNS are missing (you can change this in the code for debugging):
WARNING 94011ada-1805-45a3-a3c6-53aad9529a13 1333 {'message': 'could not write parquet to temp file', 'tracing_id': '0ff56174-443f-4047-9c84-f6c0d1a7cc67', 'account': 'org1234567', 'provider_uuid': '0d430bde-19d1-46ef-82e6-484bf8960db3', 'provider_type': 'Azure', 'file_name': PosixPath('/testing/data/processing/org1234567/azure/exports/2024-08-01_manifestid-17_basefile-costreport_1f0e71d6-cc81-458d-8ee6-a32a310cd13b_batch-0.csv')}
koku-worker-1 | Traceback (most recent call last):
koku-worker-1 |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 549, in convert_csv_to_parquet
koku-worker-1 |     self.check_required_columns_for_ingress_reports(col_names)
koku-worker-1 |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 531, in check_required_columns_for_ingress_reports
koku-worker-1 |     raise ValidationError(message, code="Missing_columns")
koku-worker-1 | rest_framework.exceptions.ValidationError: [ErrorDetail(string="Unable to process file(s) due to missing required columns: ['MeterSubcategory', 'Currency', 'PreTaxCost', 'ResourceGroup', 'UsageQuantity', 'ResourceType', 'ResourceRate', 'UsageDateTime', 'InstanceId', 'ServiceTier', 'SubscriptionGuid', 'ServiceName].", code='Missing_columns')]
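To make the error above more useful for debugging, the check could report what is missing from each accepted column set rather than only from INGRESS_REQUIRED_COLUMNS. The following is a minimal sketch, not the koku implementation: the function name missing_columns_report and the small column sets in the usage lines are hypothetical stand-ins for the real constants.

```python
def missing_columns_report(col_names, required, alternative):
    """Return None if either column set is satisfied, else the gaps per set.

    `required` and `alternative` stand in for INGRESS_REQUIRED_COLUMNS and
    INGRESS_ALT_COLUMNS; this helper is an illustration only.
    """
    present = set(col_names)
    if present.issuperset(required) or present.issuperset(alternative):
        return None
    # Report the diff against both sets so a v2 export is not judged
    # only against the v1 column names.
    return {
        "INGRESS_REQUIRED_COLUMNS": sorted(set(required) - present),
        "INGRESS_ALT_COLUMNS": sorted(set(alternative) - present),
    }


# Hypothetical, shortened column sets for demonstration.
v1_columns = {"SubscriptionGuid", "Currency", "PreTaxCost"}
v2_columns = {"subscriptionid", "billingcurrencycode", "costinbillingcurrency"}

report = missing_columns_report(
    ["subscriptionid", "costinbillingcurrency"], v1_columns, v2_columns
)
print(report)
```

With such a report, the log would show that only billingcurrencycode is missing from the v2 set, instead of listing every v1 column.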
We need to allow additional alternative column names (see below) in check_ingress_required_columns and also make the check case-insensitive.
additional v2 alternative : INGRESS_ALT_COLUMNS (both lowercased)
-------------------------------------------------------------------
"subscriptionguid": "subscriptionid"
"billingcurrency": "billingcurrencycode"
"resourcegroupname": "resourcegroup"
"instancename": "resourceid"
"product": "productname"
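One possible way to apply the mapping above is to lowercase every incoming column name and translate the additional alternatives to their INGRESS_ALT_COLUMNS names before comparing. This is only a sketch under assumptions: ALT_COLUMN_ALIASES, normalize_columns and missing_alt_columns are hypothetical names, and the INGRESS_ALT_COLUMNS set here is a shortened stand-in for the real constant in koku.

```python
# Hypothetical, shortened stand-in for the real (lowercased) constant.
INGRESS_ALT_COLUMNS = {
    "subscriptionid",
    "billingcurrencycode",
    "resourcegroup",
    "resourceid",
    "productname",
}

# Additional v2 alternative -> INGRESS_ALT_COLUMNS name (both lowercased),
# taken from the mapping in this ticket.
ALT_COLUMN_ALIASES = {
    "subscriptionguid": "subscriptionid",
    "billingcurrency": "billingcurrencycode",
    "resourcegroupname": "resourcegroup",
    "instancename": "resourceid",
    "product": "productname",
}


def normalize_columns(col_names):
    """Lowercase column names and map known alternatives to one spelling."""
    normalized = set()
    for name in col_names:
        lowered = name.strip().lower()
        normalized.add(ALT_COLUMN_ALIASES.get(lowered, lowered))
    return normalized


def missing_alt_columns(col_names):
    """Return the sorted INGRESS_ALT_COLUMNS entries absent from col_names."""
    return sorted(INGRESS_ALT_COLUMNS - normalize_columns(col_names))


# Mixed-case export that uses only the additional alternative names.
export_header = [
    "SubscriptionGuid",
    "billingCurrency",
    "ResourceGroupName",
    "InstanceName",
    "Product",
]
print(missing_alt_columns(export_header))
```

Normalizing once up front keeps the comparison itself a plain set difference, so the same helper could be reused for the v1 column set if that check also needs to be case-insensitive.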
Steps to reproduce:
- make sure you use the latest Nise version (3.6.5). This version is used by the latest iqe plugin, so if you use it, you should be fine. (Note that you may need to pip install -e . the latest plugin in your local env to update the Nise version.)
- for local env, set local_sources: False in iqe_cost_management/conf/cost_management.default.yaml
- run test_api_azure_hcs_customer_filtered_report with the vault enabled (if you have issues with vault access, you can run the test in the eph env, where the vault is enabled by default)
DYNACONF_IQE_VAULT_LOADER_ENABLED=true ENV_FOR_DYNACONF=local iqe tests plugin cost_management -k test_api_azure_hcs_customer -vv --pdb
- once the test fails, check the koku logs; you will see an error similar to this:
koku-worker-1 | Traceback (most recent call last):
koku-worker-1 |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 549, in convert_csv_to_parquet
koku-worker-1 |     self.check_required_columns_for_ingress_reports(col_names)
koku-worker-1 |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 531, in check_required_columns_for_ingress_reports
koku-worker-1 |     raise ValidationError(message, code="Missing_columns")
koku-worker-1 | rest_framework.exceptions.ValidationError: [ErrorDetail(string="Unable to process file(s) due to missing required columns: ['billingcurrencycode'].", code='Missing_columns')]
koku-worker-1 | [2024-08-02 08:47:54,080] WARNING 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'account': 'org1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'failed_file': PosixPath('/testing/data/processing/org1234567/azure/exports/2024-07-01_manifestid-20_basefile-costreport_ee06a580-21a5-45cd-9089-e1b92f13e577_batch-0.csv')}
koku-worker-1 | [2024-08-02 08:47:54,128] ERROR 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'Report processing error: Unknown processor error: failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'schema': 'org1234567', 'org_id': '1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'file': '/testing/data/processing/org1234567/azure/exports/costreport_ee06a580-21a5-45cd-9089-e1b92f13e577.csv', 'invoice_month': None}
koku-worker-1 | [2024-08-02 08:47:55,154] WARNING 4fab4274-3421-49ed-864d-f739a493c67b 1545 Unable to get celery inspect instance.
koku-worker-1 | [2024-08-02 08:47:55,154] INFO 4fab4274-3421-49ed-864d-f739a493c67b 1545 Removing old worker: koku-worker-1
koku-worker-1 | [2024-08-02 08:47:55,179] ERROR 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'Unknown downloader exception: Unknown processor error: failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'schema': 'org1234567', 'org_id': '1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'file': '/testing/data/processing/org1234567/azure/exports/costreport_ee06a580-21a5-45cd-9089-e1b92f13e577.csv', 'invoice_month': None}
koku-worker-1 | Traceback (most recent call last):
koku-worker-1 |   File "/koku/koku/masu/processor/report_processor.py", line 119, in process
koku-worker-1 |     parquet_base_filename, daily_data_frames = self._processor.process()
koku-worker-1 |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 705, in process
koku-worker-1 |     parquet_base_filename, daily_data_frames = self.convert_to_parquet()
koku-worker-1 |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 505, in convert_to_parquet
koku-worker-1 |     raise ParquetReportProcessorError(msg)
koku-worker-1 | masu.processor.parquet.parquet_report_processor.ParquetReportProcessorError: failed to convert files to parquet
koku-worker-1 |
koku-worker-1 | The above exception was the direct cause of the following exception:
koku-worker-1 |
koku-worker-1 | Traceback (most recent call last):
koku-worker-1 |   File "/koku/koku/masu/processor/tasks.py", line 268, in get_report_files
koku-worker-1 |     raise processing_error
koku-worker-1 |   File "/koku/koku/masu/processor/tasks.py", line 257, in get_report_files
koku-worker-1 |     result = _process_report_file(
koku-worker-1 |              ^^^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/koku/koku/masu/processor/_tasks/process.py", line 80, in _process_report_file
koku-worker-1 |     raise processing_error
koku-worker-1 |   File "/koku/koku/masu/processor/_tasks/process.py", line 75, in _process_report_file
koku-worker-1 |     result = processor.process()
koku-worker-1 |              ^^^^^^^^^^^^^^^^^^^
koku-worker-1 |   File "/koku/koku/masu/processor/report_processor.py", line 137, in process
koku-worker-1 |     raise ReportProcessorError(f"Unknown processor error: {err}") from err
koku-worker-1 | masu.processor.report_processor.ReportProcessorError: Unknown processor error: failed to convert files to parquet