FLPATH-2821: ROS-OCP Processor fails with "CSV file does not have all the required columns" error when processing cost management operator uploads


      Environment

      • Platform: OpenShift cluster (insights.qe.lab.redhat.com)
      • Namespace: ros-ocp
      • Deployment: insights-on-prem Resource Optimization Service
      • Components: Cost Management Operator → ROS Ingress → Kafka → ROS Processor → Kruize
      • Version: IOP-POC-0.1

      Issue Description

      The ROS-OCP processor service fails to process CSV files uploaded by the cost management operator, logging the error "CSV file does not have all the required columns" and rejecting the upload. This prevents recommendations from being generated for data from the cost management operator.

      Timeline of Events

      18:38:12 UTC - Cost Management Operator Upload:

      • Ingress service received upload (request_id: 133e2c40-bb28-4826-b57b-f0ed09ad2a5e)
      • 2 CSV files uploaded successfully to ODF S3 storage:
        ◦ 83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv
        ◦ 83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv
      • Kafka message published to the hccm.ros.events topic (see the sketch after this list)
      • Processor consumed the message from Kafka
        ◦ ERROR: Processor rejected the files with "CSV file does not have all the required columns"
      • No recommendations generated or stored in the database
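
      For orientation, a hedged sketch of the event the processor consumes from hccm.ros.events, inferred from the kafkaMsg.Files loop quoted under "Potential Root Cause Analysis" and from the identifiers attached to the processor log lines; every field name except Files is an assumption, not verified against the ros-ocp-backend source:

      // Hedged sketch only: Files appears in the processor loop quoted
      // later in this report; the other fields are guesses based on the
      // log attributes (request_id, org_id, cluster_uuid) and may not
      // match the real struct.
      type KafkaMsg struct {
          RequestID   string   `json:"request_id"`
          OrgID       string   `json:"org_id"`
          ClusterUUID string   `json:"cluster_uuid"`
          Files       []string `json:"files"` // URLs of the CSVs uploaded to ODF S3
      }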

      Error Logs

      Processor Log (ros-ocp-rosocp-processor):

      time="2025-10-26T18:38:14Z" level=info msg="Message received from kafka hccm.ros.events[1]@0: {...}"
      time="2025-10-26T18:38:14Z" level=info msg="DB initialization complete" 
        account=7890123 cluster_uuid=f7340e1e-7392-4e0b-ba1d-03cab55a1bbd 
        org_id=12345 request_id=133e2c40-bb28-4826-b57b-f0ed09ad2a5e
      time="2025-10-26T18:38:14Z" level=error msg="Error: CSV file does not have all the required columns" 
        func=github.com/redhatinsights/ros-ocp-backend/internal/services.ProcessReport 
        file="/go/src/app/internal/services/report_processor.go:80" 
        account=7890123 cluster_uuid=f7340e1e-7392-4e0b-ba1d-03cab55a1bbd 
        org_id=12345 request_id=133e2c40-bb28-4826-b57b-f0ed09ad2a5e
      

      Ingress Log (ros-ocp-ingress):

      time="2025-10-26T18:38:12.565Z" level=info msg="Received upload request" 
        request_id="133e2c40-bb28-4826-b57b-f0ed09ad2a5e"
      time="2025-10-26T18:38:12.640Z" level=info msg="Successfully identified ROS files" 
        ros_files_found=2
      time="2025-10-26T18:38:13.901Z" level=info msg="Successfully uploaded ROS file" 
        file_name="83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv" 
        size=879
      time="2025-10-26T18:38:14.051Z" level=info msg="Successfully uploaded ROS file" 
        file_name="83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv" 
        size=8705
      time="2025-10-26T18:38:14.205Z" level=info msg="Successfully sent ROS event message" 
        topic="hccm.ros.events" uploaded_files=2
      
      Steps to Reproduce

      • Deploy the insights-on-prem ROS stack on an OpenShift cluster
      • Deploy and configure the cost management operator to send optimization data to the ROS ingress endpoint
      • Configure the cost management operator with valid Keycloak JWT authentication
      • Wait for the cost management operator to collect metrics and upload data (typically every 15 minutes)
      • Monitor the processor logs: oc logs -n ros-ocp deploy/ros-ocp-rosocp-processor --follow
      • Observe the error: "CSV file does not have all the required columns"
      • Query the database for recommendations: oc exec -n ros-ocp ros-ocp-db-ros-0 -- psql -U postgres -d postgres -c "SELECT COUNT(*) FROM workloads WHERE cluster_uuid='[cluster_uuid]'"
      • Verify that no workloads or recommendations exist for the uploaded cluster

      Actual Result

      • Processor logs error and rejects upload
      • No workloads created in database
      • No recommendations generated
      • Cost management operator data is completely ignored

      Expected Result

      • CSV files should be processed successfully
      • Workload records should be created in database
      • Recommendations should be generated by Kruize
      • Recommendations should be stored and available via API

      Database Verification

      Query executed:

      SELECT c.cluster_uuid, w.workload_name, w.namespace, COUNT(rs.*) as rec_count 
      FROM workloads w 
      LEFT JOIN recommendation_sets rs ON w.id = rs.workload_id 
      LEFT JOIN clusters c ON w.cluster_id = c.id 
      GROUP BY c.cluster_uuid, w.workload_name, w.namespace 
      ORDER BY c.cluster_uuid;
      

      Result: 0 recommendations for cluster f7340e1e-7392-4e0b-ba1d-03cab55a1bbd (the cost management operator cluster)

      Workaround / Additional Evidence

      A test upload with a single CSV file succeeded completely, demonstrating that the end-to-end flow works:

      18:48:52 UTC - Test Upload (Single File):

      • File: openshift_usage_report.csv (37 columns)
      • Result: ✓ Successfully processed
      • Result: ✓ Sent to Kruize
      • Result: ✓ Recommendations generated
      • Result: ✓ 1 recommendation stored in database for cluster test-cluster-1761504529

      This proves:

      • The ingress service works correctly
      • The Kafka messaging works correctly
      • The processor CAN validate and process CSVs with correct format
      • Kruize integration works correctly
      • The recommendation-poller works correctly
      • The database storage works correctly

      Technical Details

      CSV File Analysis:

      • Namespace CSV (83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv):
        ◦ Contains 25 columns
        ◦ Columns: report_period_start, report_period_end, interval_start, interval_end, namespace, cpu_request_namespace_sum, cpu_limit_namespace_sum, cpu_usage_namespace_avg, memory_usage_namespace_avg, namespace_running_pods_max, etc.
        ◦ Appears to be namespace-level aggregated metrics
      • Container CSV (83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv):
        ◦ Contains 37 columns
        ◦ Columns: report_period_start, report_period_end, interval_start, interval_end, container_name, pod, owner_name, owner_kind, workload, workload_type, namespace, image_name, node, resource_id, cpu_request_container_avg, cpu_limit_container_avg, memory_request_container_avg, memory_rss_usage_container_avg, etc.
        ◦ Appears to match the expected schema defined in internal/types/csvColumnMapping.go

      Code Reference:

      • Error thrown from: ros-ocp-backend/internal/services/report_processor.go:80
      • Validation logic: ros-ocp-backend/internal/utils/aggregator.go:182-194 (sketched below)
      • Expected columns defined: ros-ocp-backend/internal/types/csvColumnMapping.go (37 columns)
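
      For illustration, a minimal sketch of what a required-column check of this shape could look like. This is an assumption about the structure of the validation in aggregator.go:182-194, not a copy of it; requiredColumns and hasRequiredColumns are hypothetical names:

      // Hypothetical sketch of the column validation referenced above; the
      // real logic lives in internal/utils/aggregator.go:182-194 and the
      // real column list in internal/types/csvColumnMapping.go.
      var requiredColumns = []string{
          "report_period_start", "report_period_end", "interval_start",
          "interval_end", "container_name", "pod", // ... remaining 31 columns
      }

      // hasRequiredColumns reports whether every required column is present
      // in the CSV header row.
      func hasRequiredColumns(header []string) bool {
          present := make(map[string]bool, len(header))
          for _, col := range header {
              present[col] = true
          }
          for _, col := range requiredColumns {
              if !present[col] {
                  return false
              }
          }
          return true
      }

      Under a check like this, the 25-column namespace CSV fails immediately (it lacks container-level columns such as container_name and pod), while the 37-column container CSV would pass.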

      Git History:

      • Relevant code last modified: February 12, 2024
      • Commit: bf619560770a73f2c8870fe9dbe2afc8a97ae6a1
      • Author: Suraj Patil
      • Issue: RHINENG-8204 - "Check for required columns in CSV"
      • This commit added CSV column validation with error handling

      Potential Root Cause Analysis

      Note: This is a hypothesis that requires confirmation through code review and additional testing.

      The cost management operator uploads 2 CSV files per report (namespace-level and container-level), while the test upload used only 1 CSV file (container-level). The processor code at report_processor.go:67-82 contains a loop that processes multiple files:

      for _, file := range kafkaMsg.Files {
          data, err := utils.ReadCSVFromUrl(file)
          // ...
          df, err = utils.Aggregate_data(df)
          if err != nil {
              log.Errorf("Error: %s", err)
              return  // Exits the entire function, skipping any remaining files
          }
          // ...
      }

      Under this hypothesis, the processor:

      • Processes the namespace CSV first (which has 25 columns)
      • Fails validation because it expects 37 columns
      • Executes return, which exits the entire function
      • Never processes the container CSV (which has the correct 37 columns)

      If this hypothesis is correct, the issue could be that the processor uses return instead of continue on line 81, causing it to stop processing all files when encountering any invalid file, rather than skipping the invalid file and continuing to process remaining valid files.

      This would explain why:

      • Multi-file uploads (2 CSVs from cost mgmt operator) fail completely
      • Single-file uploads (1 CSV from test) succeed
      • The container CSV has correct format but never gets processed
      • The error message doesn't indicate which file failed validation
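
      A minimal sketch of the continue-based alternative described above, assuming a per-file skip is the desired policy (whether it is needs confirmation from the maintainers):

      for _, file := range kafkaMsg.Files {
          data, err := utils.ReadCSVFromUrl(file)
          // ...
          df, err = utils.Aggregate_data(df)
          if err != nil {
              // Name the failing file so multi-file uploads are debuggable,
              // then move on to the next file instead of aborting them all.
              log.Errorf("Error processing %s: %s", file, err)
              continue
          }
          // ...
      }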

      Impact

      Critical/Blocker: Complete failure of cost management operator integration

      • No optimization recommendations can be generated for production workloads
      • Cost management operator data is silently ignored with only an error log entry
      • Affects all deployments using cost management operator for ROS data collection
