Cost Management / COST-5367

Azure customer filtered flow doesn't handle all possible v2 column alternatives


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • 2024-Aug-23
    • Data Pipeline

      When we check the required columns in the Azure customer filtered flow, we accept only the column names specified in INGRESS_REQUIRED_COLUMNS (v1 reports) or INGRESS_ALT_COLUMNS (v2 reports):

      def check_ingress_required_columns(self, col_names):
          """
          Checks the required columns for ingress.
          """
          if not set(col_names).issuperset(INGRESS_REQUIRED_COLUMNS):
              if not set(col_names).issuperset(INGRESS_ALT_COLUMNS):
                  missing_columns = [x for x in INGRESS_REQUIRED_COLUMNS if x not in col_names]
                  return missing_columns
          return None

      However, different versions of Azure v2 exports can use different column names, and the first letter of a name may or may not be capitalized (BillingAccountId vs. billingAccountId). Consequently, the check_ingress_required_columns step fails with the following error. Note that the error message always shows the diff against INGRESS_REQUIRED_COLUMNS, so it is not very informative: for v2 reports we need to see which INGRESS_ALT_COLUMNS are missing (you can change this in the code for debugging):

      WARNING 94011ada-1805-45a3-a3c6-53aad9529a13 1333 {'message': 'could not write parquet to temp file', 'tracing_id': '0ff56174-443f-4047-9c84-f6c0d1a7cc67', 'account': 'org1234567', 'provider_uuid': '0d430bde-19d1-46ef-82e6-484bf8960db3', 'provider_type': 'Azure', 'file_name': PosixPath('/testing/data/processing/org1234567/azure/exports/2024-08-01_manifestid-17_basefile-costreport_1f0e71d6-cc81-458d-8ee6-a32a310cd13b_batch-0.csv')}
      koku-worker-1  | Traceback (most recent call last):
      koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 549, in convert_csv_to_parquet
      koku-worker-1  |     self.check_required_columns_for_ingress_reports(col_names)
      koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 531, in check_required_columns_for_ingress_reports
      koku-worker-1  |     raise ValidationError(message, code="Missing_columns")
      koku-worker-1  | rest_framework.exceptions.ValidationError: [ErrorDetail(string="Unable to process file(s) due to missing required columns: ['MeterSubcategory', 'Currency', 'PreTaxCost', 'ResourceGroup', 'UsageQuantity', 'ResourceType', 'ResourceRate', 'UsageDateTime', 'InstanceId', 'ServiceTier', 'SubscriptionGuid', 'ServiceName].", code='Missing_columns')]
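
To illustrate why this message is misleading for v2 files, here is a minimal sketch. The constants are hypothetical, shortened stand-ins for INGRESS_REQUIRED_COLUMNS and INGRESS_ALT_COLUMNS, and debug_check is an assumed debugging helper, not code from koku. It reports the diff against whichever set is closer to matching:

```python
# Hypothetical, shortened stand-ins for the real constants (assumption).
REQUIRED_V1 = {"SubscriptionGuid", "Currency"}
ALT_V2 = {"SubscriptionId", "BillingCurrencyCode"}


def current_check(col_names):
    """Mirrors the logic quoted above: the diff is always taken against the v1 set."""
    cols = set(col_names)
    if not cols.issuperset(REQUIRED_V1) and not cols.issuperset(ALT_V2):
        return sorted(REQUIRED_V1 - cols)
    return None


def debug_check(col_names):
    """Debugging variant: report the diff against the set that is closer to matching."""
    cols = set(col_names)
    missing_v1 = sorted(REQUIRED_V1 - cols)
    missing_v2 = sorted(ALT_V2 - cols)
    if not missing_v1 or not missing_v2:
        return None  # one of the two sets is fully present
    return min(missing_v1, missing_v2, key=len)


# A v2 file whose currency column starts with a lowercase letter.
v2_cols = ["SubscriptionId", "billingCurrencyCode"]
print(current_check(v2_cols))  # ['Currency', 'SubscriptionGuid']
print(debug_check(v2_cols))    # ['BillingCurrencyCode']
```

With the v1 diff, the message lists columns that a v2 file is never expected to contain; the shorter diff points at the one column that actually blocks processing.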
      

       

      We need to allow additional alternative column names (see below) in check_ingress_required_columns and also make the check case-insensitive.

       

      additional v2 alternative: INGRESS_ALT_COLUMNS (both lowercased)
      ------------------------------------------------------------------- 
      "subscriptionguid": "subscriptionid"
      "billingcurrency": "billingcurrencycode"
      "resourcegroupname": "resourcegroup"
      "instancename": "resourceid"
      "product": "productname"

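The mapping above could be applied roughly as follows. This is only a sketch under stated assumptions: ALT_COLUMNS is a hypothetical, shortened stand-in for the lowercased INGRESS_ALT_COLUMNS, and missing_alt_columns is an illustrative helper name, not the function that exists in koku.

```python
# Hypothetical, shortened stand-in for the lowercased INGRESS_ALT_COLUMNS (assumption).
ALT_COLUMNS = {
    "subscriptionid",
    "billingcurrencycode",
    "resourcegroup",
    "resourceid",
    "productname",
}

# Additional v2 alternatives from the table above: alternative name -> expected name.
ADDITIONAL_ALTERNATIVES = {
    "subscriptionguid": "subscriptionid",
    "billingcurrency": "billingcurrencycode",
    "resourcegroupname": "resourcegroup",
    "instancename": "resourceid",
    "product": "productname",
}


def missing_alt_columns(col_names):
    """Return the expected columns that are missing, ignoring case and accepting alternatives."""
    present = set()
    for name in col_names:
        lowered = name.lower()
        # Map an alternative name to its expected name; keep other names unchanged.
        present.add(ADDITIONAL_ALTERNATIVES.get(lowered, lowered))
    return sorted(ALT_COLUMNS - present)


mixed = ["SubscriptionGuid", "billingCurrency", "ResourceGroupName", "instanceName", "ProductName"]
print(missing_alt_columns(mixed))               # []
print(missing_alt_columns(["subscriptionId"]))  # ['billingcurrencycode', 'productname', 'resourcegroup', 'resourceid']
```

Lowercasing both sides before comparing covers the BillingAccountId vs. billingAccountId case, and the dictionary lookup covers the renamed columns, so a single set difference is enough to produce an informative list of missing columns.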
       

      Steps to reproduce:

      • Make sure you use the latest Nise version (3.6.5). The latest iqe plugin uses this version, so if you use the plugin you should be fine (note that you may need to run pip install -e . on the latest plugin in your local env to update the Nise version).
      • For a local env, set local_sources: False in iqe_cost_management/conf/cost_management.default.yaml.
      • Run test_api_azure_hcs_customer_filtered_report with the vault enabled (if you have issues with vault access, you can run the test in an ephemeral (eph) env, where the vault is enabled by default):
      DYNACONF_IQE_VAULT_LOADER_ENABLED=true ENV_FOR_DYNACONF=local iqe tests plugin cost_management -k test_api_azure_hcs_customer -vv --pdb
      
      • Once the test fails, check the koku logs; you will see an error similar to this:
      koku-worker-1  | Traceback (most recent call last):
      koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 549, in convert_csv_to_parquet
      koku-worker-1  |     self.check_required_columns_for_ingress_reports(col_names)
      koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 531, in check_required_columns_for_ingress_reports
      koku-worker-1  |     raise ValidationError(message, code="Missing_columns")
      koku-worker-1  | rest_framework.exceptions.ValidationError: [ErrorDetail(string="Unable to process file(s) due to missing required columns: ['billingcurrencycode'].", code='Missing_columns')]
      koku-worker-1  | [2024-08-02 08:47:54,080] WARNING 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'account': 'org1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'failed_file': PosixPath('/testing/data/processing/org1234567/azure/exports/2024-07-01_manifestid-20_basefile-costreport_ee06a580-21a5-45cd-9089-e1b92f13e577_batch-0.csv')}
      koku-worker-1  | [2024-08-02 08:47:54,128] ERROR 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'Report processing error: Unknown processor error: failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'schema': 'org1234567', 'org_id': '1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'file': '/testing/data/processing/org1234567/azure/exports/costreport_ee06a580-21a5-45cd-9089-e1b92f13e577.csv', 'invoice_month': None}
      koku-worker-1  | [2024-08-02 08:47:55,154] WARNING 4fab4274-3421-49ed-864d-f739a493c67b 1545 Unable to get celery inspect instance.
      koku-worker-1  | [2024-08-02 08:47:55,154] INFO 4fab4274-3421-49ed-864d-f739a493c67b 1545 Removing old worker: koku-worker-1
      koku-worker-1  | [2024-08-02 08:47:55,179] ERROR 4fab4274-3421-49ed-864d-f739a493c67b 1545 {'message': 'Unknown downloader exception: Unknown processor error: failed to convert files to parquet', 'tracing_id': 'bf6815e7-9e14-4a86-9ca0-11b82765807e', 'schema': 'org1234567', 'org_id': '1234567', 'provider_uuid': '0bd62547-8d70-464b-8e93-e699c28f5b86', 'provider_type': 'Azure', 'file': '/testing/data/processing/org1234567/azure/exports/costreport_ee06a580-21a5-45cd-9089-e1b92f13e577.csv', 'invoice_month': None}
      koku-worker-1  | Traceback (most recent call last):
      koku-worker-1  |   File "/koku/koku/masu/processor/report_processor.py", line 119, in process
      koku-worker-1  |     parquet_base_filename, daily_data_frames = self._processor.process()
      koku-worker-1  |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
      koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 705, in process
      koku-worker-1  |     parquet_base_filename, daily_data_frames = self.convert_to_parquet()
      koku-worker-1  |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
      koku-worker-1  |   File "/koku/koku/masu/processor/parquet/parquet_report_processor.py", line 505, in convert_to_parquet
      koku-worker-1  |     raise ParquetReportProcessorError(msg)
      koku-worker-1  | masu.processor.parquet.parquet_report_processor.ParquetReportProcessorError: failed to convert files to parquet
      koku-worker-1  | 
      koku-worker-1  | The above exception was the direct cause of the following exception:
      koku-worker-1  | 
      koku-worker-1  | Traceback (most recent call last):
      koku-worker-1  |   File "/koku/koku/masu/processor/tasks.py", line 268, in get_report_files
      koku-worker-1  |     raise processing_error
      koku-worker-1  |   File "/koku/koku/masu/processor/tasks.py", line 257, in get_report_files
      koku-worker-1  |     result = _process_report_file(
      koku-worker-1  |              ^^^^^^^^^^^^^^^^^^^^^
      koku-worker-1  |   File "/koku/koku/masu/processor/_tasks/process.py", line 80, in _process_report_file
      koku-worker-1  |     raise processing_error
      koku-worker-1  |   File "/koku/koku/masu/processor/_tasks/process.py", line 75, in _process_report_file
      koku-worker-1  |     result = processor.process()
      koku-worker-1  |              ^^^^^^^^^^^^^^^^^^^
      koku-worker-1  |   File "/koku/koku/masu/processor/report_processor.py", line 137, in process
      koku-worker-1  |     raise ReportProcessorError(f"Unknown processor error: {err}") from err
      koku-worker-1  | masu.processor.report_processor.ReportProcessorError: Unknown processor error: failed to convert files to parquet
      

       

              rhn-support-lcouzens Luke Couzens
              rhn-support-esebesto Eva Šebestová
              Eva Šebestová Eva Šebestová