
Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: 2024-Aug-23
Affects Version/s: None
Component/s: Data Pipeline
Labels:
None

Story Points:
1
Blocked:
False
Blocked Reason:
None
Ready:
False
Intelligence Requested:
Market:

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

When we check the required columns in the Azure customer-filtered flow, we accept only the column names specified in INGRESS_REQUIRED_COLUMNS (v1 reports) or INGRESS_ALT_COLUMNS (v2 reports):

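def check_ingress_required_columns(self, col_names):
    """
    Checks the required columns for ingress.
    """
    # Passes if every v1 column (INGRESS_REQUIRED_COLUMNS) is present...
    if not set(col_names).issuperset(INGRESS_REQUIRED_COLUMNS):
        # ...or every v2 column (INGRESS_ALT_COLUMNS). Both comparisons are case-sensitive.
        if not set(col_names).issuperset(INGRESS_ALT_COLUMNS):
            # Missing columns are always reported against the v1 list, even for v2 reports.
            missing_columns = [x for x in INGRESS_REQUIRED_COLUMNS if x not in col_names]
            return missing_columns
    return None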

However, different versions of Azure v2 exports can use different column names, and the first letter of a column name may or may not be capitalized (BillingAccountId vs. billingAccountId). As a result, the check_ingress_required_columns step fails with the error below. Note that the error message always shows the difference against INGRESS_REQUIRED_COLUMNS, which is not very informative for v2 reports, where we need to see the missing INGRESS_ALT_COLUMNS instead (you can change this in the code for debugging):

WARNING 94011ada-1805-45a3-a3c6-53aad9529a13 1333 {'message': 'could not write parquet to temp file', 'tracing_id': '0ff56174-443f-4047-9c84-f6c0d1a7cc67', 'account': 'org1234567', 'provider_uuid': '0d430bde-19d1-46ef-82e6-484bf8960db3', 'provider_type': 'Azure', 'file_name': PosixPath('/testing/data/processing/org1234567/azure/exports/2024-08-01_manifestid-17_basefile-costreport_1f0e71d6-cc81-458d-8ee6-a32a310cd13b_batch-0.csv')}
koku-worker-1  | Traceback (most recent call last):
koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 549, in convert_csv_to_parquet
koku-worker-1  |     self.check_required_columns_for_ingress_reports(col_names)
koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 531, in check_required_columns_for_ingress_reports
koku-worker-1  |     raise ValidationError(message, code="Missing_columns")
koku-worker-1  | rest_framework.exceptions.ValidationError: [ErrorDetail(string="Unable to process file(s) due to missing required columns: ['MeterSubcategory', 'Currency', 'PreTaxCost', 'ResourceGroup', 'UsageQuantity', 'ResourceType', 'ResourceRate', 'UsageDateTime', 'InstanceId', 'ServiceTier', 'SubscriptionGuid', 'ServiceName].", code='Missing_columns')]
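The failure can be reproduced in isolation by lifting the check's logic out of the class (with self dropped). This is a minimal sketch: the column lists are hypothetical, shortened stand-ins rather than the real koku constants, and the camelCase header is an assumed example of a v2 export. Because set.issuperset compares strings exactly, the camelCase header matches neither list, so the function falls through and reports v1 names, as in the log above:

```python
# Hypothetical, shortened stand-ins for the koku constants (the real lists are longer).
INGRESS_REQUIRED_COLUMNS = ["SubscriptionGuid", "Currency", "PreTaxCost"]  # v1 names
INGRESS_ALT_COLUMNS = ["SubscriptionId", "BillingCurrencyCode", "CostInBillingCurrency"]  # v2 names


def check_ingress_required_columns(col_names):
    """Same logic as the method quoted in this ticket, without `self`."""
    if not set(col_names).issuperset(INGRESS_REQUIRED_COLUMNS):
        if not set(col_names).issuperset(INGRESS_ALT_COLUMNS):
            # Always diffed against the v1 list, even when the file is a v2 export.
            return [x for x in INGRESS_REQUIRED_COLUMNS if x not in col_names]
    return None


# The same v2 columns, but with a lowercase first letter.
v2_header = ["subscriptionId", "billingCurrencyCode", "costInBillingCurrency"]
print(check_ingress_required_columns(v2_header))
# → ['SubscriptionGuid', 'Currency', 'PreTaxCost']
```

The identical header with capitalized names (SubscriptionId, BillingCurrencyCode, CostInBillingCurrency) passes and returns None, so the only difference between passing and failing here is letter case.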

We need to allow the additional alternative column names listed below in check_ingress_required_columns, and also make the check case-insensitive.

Additional v2 alternative name: existing INGRESS_ALT_COLUMNS name (both lowercased)
-------------------------------------------------------------------
"subscriptionguid": "subscriptionid"
"billingcurrency": "billingcurrencycode"
"resourcegroupname": "resourcegroup"
"instancename": "resourceid"
"product": "productname"
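One possible shape for the fix is sketched below; it is not the koku implementation. It lowercases every header, maps the alternative names above onto their INGRESS_ALT_COLUMNS names for the v2 comparison only, and reports whichever missing-column list is shorter, so a v2 file gets a v2 diff. The constant subsets, the V2_COLUMN_ALIASES name, and the "shorter list wins" rule are all assumptions made for illustration:

```python
# Hypothetical, shortened stand-ins for the koku constants (the real lists are longer).
INGRESS_REQUIRED_COLUMNS = ["SubscriptionGuid", "Currency", "PreTaxCost"]  # v1 names
INGRESS_ALT_COLUMNS = ["SubscriptionId", "BillingCurrencyCode", "ResourceGroup"]  # v2 names

# Hypothetical name for the alias table in this ticket:
# additional v2 alternative -> existing INGRESS_ALT_COLUMNS name (both lowercased).
V2_COLUMN_ALIASES = {
    "subscriptionguid": "subscriptionid",
    "billingcurrency": "billingcurrencycode",
    "resourcegroupname": "resourcegroup",
    "instancename": "resourceid",
    "product": "productname",
}


def check_ingress_required_columns(col_names):
    """Return None if a full v1 or v2 column set is present, else the missing columns.

    Sketch only: comparisons are case-insensitive, and aliases apply to the v2 check.
    """
    lowered = {name.lower() for name in col_names}

    # v1 check: case-insensitive, no aliases.
    missing_v1 = [c for c in INGRESS_REQUIRED_COLUMNS if c.lower() not in lowered]
    if not missing_v1:
        return None

    # v2 check: map known alternative names onto their INGRESS_ALT_COLUMNS name first.
    v2_names = {V2_COLUMN_ALIASES.get(name, name) for name in lowered}
    missing_v2 = [c for c in INGRESS_ALT_COLUMNS if c.lower() not in v2_names]
    if not missing_v2:
        return None

    # Report the shorter list, i.e. the report version the file most likely matches.
    return missing_v1 if len(missing_v1) <= len(missing_v2) else missing_v2
```

With these stand-ins, a header such as ["subscriptionGuid", "billingCurrency", "resourceGroupName"] passes the v2 check, and ["subscriptionId", "billingCurrency"] now reports ['ResourceGroup'] instead of the full v1 list. Aliases are applied only in the v2 branch because subscriptionguid is also a v1 column name (SubscriptionGuid); mapping it before the v1 check would break v1 files.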

Steps to reproduce:

1. Make sure you use the latest Nise version (3.6.5). This is the version used by the latest iqe plugin, so if you use that plugin you should be fine. Note that in a local environment you may need to run pip install -e . on the latest plugin to update the Nise version.
2. For a local environment, set local_sources: False in iqe_cost_management/conf/cost_management.default.yaml.
3. Run test_api_azure_hcs_customer_filtered_report with the vault enabled (if you have issues with vault access, you can run the test in an ephemeral (eph) environment, where the vault is enabled by default):

DYNACONF_IQE_VAULT_LOADER_ENABLED=true ENV_FOR_DYNACONF=local iqe tests plugin cost_management -k test_api_azure_hcs_customer -vv --pdb

Once the test fails, check the koku logs. You will see an error similar to this:

koku-worker-1  | Traceback (most recent call last):
koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 549, in convert_csv_to_parquet
koku-worker-1  |     self.check_required_columns_for_ingress_reports(col_names)
koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 531, in check_required_columns_for_ingress_reports
koku-worker-1  |     raise ValidationError(message, code="Missing_columns")
koku-worker-1  | rest_framework.exceptions.ValidationError: [ErrorDetail(string="Unable to process file(s) due to missing required columns: ['billingcurrencycode'].", code='Missing_columns')]
koku-worker-1  | [2024-08-02 08:47:54,080] WARNING 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'account': 'org1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'failed_file': PosixPath('/testing/data/processing/org1234567/azure/exports/2024-07-01_manifestid-20_basefile-costreport_ee06a580-21a5-45cd-9089-e1b92f13e577_batch-0.csv')}
koku-worker-1  | [2024-08-02 08:47:54,128] ERROR 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'Report processing error: Unknown processor error: failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'schema': 'org1234567', 'org_id': '1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'file': '/testing/data/processing/org1234567/azure/exports/costreport_ee06a580-21a5-45cd-9089-e1b92f13e577.csv', 'invoice_month': None}
koku-worker-1  | [2024-08-02 08:47:55,154] WARNING 4fab4274-3421-49ed-864d-f739a493c67b 1545 Unable to get celery inspect instance.
koku-worker-1  | [2024-08-02 08:47:55,154] INFO 4fab4274-3421-49ed-864d-f739a493c67b 1545 Removing old worker: koku-worker-1
koku-worker-1  | [2024-08-02 08:47:55,179] ERROR 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'Unknown downloader exception: Unknown processor error: failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'schema': 'org1234567', 'org_id': '1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'file': '/testing/data/processing/org1234567/azure/exports/costreport_ee06a580-21a5-45cd-9089-e1b92f13e577.csv', 'invoice_month': None}
koku-worker-1  | Traceback (most recent call last):
koku-worker-1  |   File "/koku/koku/masu/processor/report_processor.py", line 119, in process
koku-worker-1  |     parquet_base_filename, daily_data_frames = self._processor.process()
koku-worker-1  |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 705, in process
koku-worker-1  |     parquet_base_filename, daily_data_frames = self.convert_to_parquet()
koku-worker-1  |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 505, in convert_to_parquet
koku-worker-1  |     raise ParquetReportProcessorError(msg)
koku-worker-1  | masu.processor.parquet.parquet_report_processor.ParquetReportProcessorError: failed to convert files to parquet
koku-worker-1  | 
koku-worker-1  | The above exception was the direct cause of the following exception:
koku-worker-1  | 
koku-worker-1  | Traceback (most recent call last):
koku-worker-1  |   File "/koku/koku/masu/processor/tasks.py", line 268, in get_report_files
koku-worker-1  |     raise processing_error
koku-worker-1  |   File "/koku/koku/masu/processor/tasks.py", line 257, in get_report_files
koku-worker-1  |     result = _process_report_file(
koku-worker-1  |              ^^^^^^^^^^^^^^^^^^^^^
koku-worker-1  |   File "/koku/koku/masu/processor/_tasks/process.py", line 80, in _process_report_file
koku-worker-1  |     raise processing_error
koku-worker-1  |   File "/koku/koku/masu/processor/_tasks/process.py", line 75, in _process_report_file
koku-worker-1  |     result = processor.process()
koku-worker-1  |              ^^^^^^^^^^^^^^^^^^^
koku-worker-1  |   File "/koku/koku/masu/processor/report_processor.py", line 137, in process
koku-worker-1  |     raise ReportProcessorError(f"Unknown processor error: {err}") from err
koku-worker-1  | masu.processor.report_processor.ReportProcessorError: Unknown processor error: failed to convert files to parquet