FLPATH-2821: ROS-OCP Processor fails with "CSV file does not have all the required columns" error when processing cost management operator uploads


      Environment

      • Platform: OpenShift cluster (insights.qe.lab.redhat.com)
      • Namespace: ros-ocp
      • Deployment: insights-on-prem Resource Optimization Service
      • Components: Cost Management Operator → ROS Ingress → Kafka → ROS Processor → Kruize
      • Version: IOP-POC-0.1

      Issue Description

      The ROS-OCP processor service fails to process CSV files uploaded by the cost management operator, logging the error "CSV file does not have all the required columns" and rejecting the upload. This prevents recommendations from being generated for data from the cost management operator.

      Timeline of Events

      18:38:12 UTC - Cost Management Operator Upload:

      • Ingress service received upload (request_id: 133e2c40-bb28-4826-b57b-f0ed09ad2a5e)
      • 2 CSV files uploaded successfully to ODF S3 storage:
        ◦ 83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv
        ◦ 83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv
      • Kafka message published to the hccm.ros.events topic (see the sketch after this list)
      • Processor consumed the message from Kafka
        ◦ ERROR: Processor rejected the files with "CSV file does not have all the required columns"
      • No recommendations generated or stored in the database
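
      For orientation, a hedged sketch of the event the processor consumes from hccm.ros.events, inferred from the kafkaMsg.Files loop quoted under "Potential Root Cause Analysis" and from the identifiers attached to the processor log lines; every field name except Files is an assumption, not verified against the ros-ocp-backend source:

      // Hedged sketch only: Files appears in the processor loop quoted
      // later in this report; the other fields are guesses based on the
      // log attributes (request_id, org_id, cluster_uuid) and may not
      // match the real struct.
      type KafkaMsg struct {
          RequestID   string   `json:"request_id"`
          OrgID       string   `json:"org_id"`
          ClusterUUID string   `json:"cluster_uuid"`
          Files       []string `json:"files"` // URLs of the CSVs uploaded to ODF S3
      }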

      Error Logs

      Processor Log (ros-ocp-rosocp-processor):

      time="2025-10-26T18:38:14Z" level=info msg="Message received from kafka hccm.ros.events[1]@0: {...}"
      time="2025-10-26T18:38:14Z" level=info msg="DB initialization complete" 
        account=7890123 cluster_uuid=f7340e1e-7392-4e0b-ba1d-03cab55a1bbd 
        org_id=12345 request_id=133e2c40-bb28-4826-b57b-f0ed09ad2a5e
      time="2025-10-26T18:38:14Z" level=error msg="Error: CSV file does not have all the required columns" 
        func=github.com/redhatinsights/ros-ocp-backend/internal/services.ProcessReport 
        file="/go/src/app/internal/services/report_processor.go:80" 
        account=7890123 cluster_uuid=f7340e1e-7392-4e0b-ba1d-03cab55a1bbd 
        org_id=12345 request_id=133e2c40-bb28-4826-b57b-f0ed09ad2a5e
      

      Ingress Log (ros-ocp-ingress):

      time="2025-10-26T18:38:12.565Z" level=info msg="Received upload request" 
        request_id="133e2c40-bb28-4826-b57b-f0ed09ad2a5e"
      time="2025-10-26T18:38:12.640Z" level=info msg="Successfully identified ROS files" 
        ros_files_found=2
      time="2025-10-26T18:38:13.901Z" level=info msg="Successfully uploaded ROS file" 
        file_name="83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv" 
        size=879
      time="2025-10-26T18:38:14.051Z" level=info msg="Successfully uploaded ROS file" 
        file_name="83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv" 
        size=8705
      time="2025-10-26T18:38:14.205Z" level=info msg="Successfully sent ROS event message" 
        topic="hccm.ros.events" uploaded_files=2
      
      Steps to Reproduce

      • Deploy the insights-on-prem ROS stack on an OpenShift cluster
      • Deploy and configure the cost management operator to send optimization data to the ROS ingress endpoint
      • Configure the cost management operator with valid Keycloak JWT authentication
      • Wait for the cost management operator to collect metrics and upload data (typically every 15 minutes)
      • Monitor the processor logs: oc logs -n ros-ocp deploy/ros-ocp-rosocp-processor --follow
      • Observe the error: "CSV file does not have all the required columns"
      • Query the database for recommendations: oc exec -n ros-ocp ros-ocp-db-ros-0 -- psql -U postgres -d postgres -c "SELECT COUNT(*) FROM workloads WHERE cluster_uuid='[cluster_uuid]'"
      • Verify that no workloads or recommendations exist for the uploaded cluster

      Actual Result

      • Processor logs error and rejects upload
      • No workloads created in database
      • No recommendations generated
      • Cost management operator data is completely ignored

      Expected Result

      • CSV files should be processed successfully
      • Workload records should be created in database
      • Recommendations should be generated by Kruize
      • Recommendations should be stored and available via API

      Database Verification

      Query executed:

      SELECT c.cluster_uuid, w.workload_name, w.namespace, COUNT(rs.*) as rec_count 
      FROM workloads w 
      LEFT JOIN recommendation_sets rs ON w.id = rs.workload_id 
      LEFT JOIN clusters c ON w.cluster_id = c.id 
      GROUP BY c.cluster_uuid, w.workload_name, w.namespace 
      ORDER BY c.cluster_uuid;
      

      Result: 0 recommendations for cluster f7340e1e-7392-4e0b-ba1d-03cab55a1bbd (the cost management operator cluster)

      Workaround / Additional Evidence

      A test upload with a single CSV file succeeded completely, demonstrating that the end-to-end flow works:

      18:48:52 UTC - Test Upload (Single File):

      • File: openshift_usage_report.csv (37 columns)
      • Result: ✓ Successfully processed
      • Result: ✓ Sent to Kruize
      • Result: ✓ Recommendations generated
      • Result: ✓ 1 recommendation stored in database for cluster test-cluster-1761504529

      This proves:

      • The ingress service works correctly
      • The Kafka messaging works correctly
      • The processor CAN validate and process CSVs with correct format
      • Kruize integration works correctly
      • The recommendation-poller works correctly
      • The database storage works correctly

      Technical Details

      CSV File Analysis:

      • Namespace CSV (83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv):
        ◦ Contains 25 columns
        ◦ Columns: report_period_start, report_period_end, interval_start, interval_end, namespace, cpu_request_namespace_sum, cpu_limit_namespace_sum, cpu_usage_namespace_avg, memory_usage_namespace_avg, namespace_running_pods_max, etc.
        ◦ Appears to be namespace-level aggregated metrics
      • Container CSV (83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv):
        ◦ Contains 37 columns
        ◦ Columns: report_period_start, report_period_end, interval_start, interval_end, container_name, pod, owner_name, owner_kind, workload, workload_type, namespace, image_name, node, resource_id, cpu_request_container_avg, cpu_limit_container_avg, memory_request_container_avg, memory_rss_usage_container_avg, etc.
        ◦ Appears to match the expected schema defined in internal/types/csvColumnMapping.go

      Code Reference:

      • Error thrown from: ros-ocp-backend/internal/services/report_processor.go:80
      • Validation logic: ros-ocp-backend/internal/utils/aggregator.go:182-194 (sketched below)
      • Expected columns defined: ros-ocp-backend/internal/types/csvColumnMapping.go (37 columns)
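
      For illustration, a minimal sketch of what a required-column check of this shape could look like. This is an assumption about the structure of the validation in aggregator.go:182-194, not a copy of it; requiredColumns and hasRequiredColumns are hypothetical names:

      // Hypothetical sketch of the column validation referenced above; the
      // real logic lives in internal/utils/aggregator.go:182-194 and the
      // real column list in internal/types/csvColumnMapping.go.
      var requiredColumns = []string{
          "report_period_start", "report_period_end", "interval_start",
          "interval_end", "container_name", "pod", // ... remaining 31 columns
      }

      // hasRequiredColumns reports whether every required column is present
      // in the CSV header row.
      func hasRequiredColumns(header []string) bool {
          present := make(map[string]bool, len(header))
          for _, col := range header {
              present[col] = true
          }
          for _, col := range requiredColumns {
              if !present[col] {
                  return false
              }
          }
          return true
      }

      Under a check like this, the 25-column namespace CSV fails immediately (it lacks container-level columns such as container_name and pod), while the 37-column container CSV would pass.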

      Git History:

      • Relevant code last modified: February 12, 2024
      • Commit: bf619560770a73f2c8870fe9dbe2afc8a97ae6a1
      • Author: Suraj Patil
      • Issue: RHINENG-8204 - "Check for required columns in CSV"
      • This commit added CSV column validation with error handling

      Potential Root Cause Analysis

      Note: This is a hypothesis that requires confirmation through code review and additional testing.

      The cost management operator uploads 2 CSV files per report (namespace-level and container-level), while the test upload used only 1 CSV file (container-level). The processor code at report_processor.go:67-82 contains a loop that processes multiple files:

      for _, file := range kafkaMsg.Files {
          data, err := utils.ReadCSVFromUrl(file)
          // ...
          df, err = utils.Aggregate_data(df)
          if err != nil {
              log.Errorf("Error: %s", err)
              return  // Exits the entire function, skipping any remaining files
          }
          // ...
      }

      Under this hypothesis, the processor:

      • Processes the namespace CSV first (which has 25 columns)
      • Fails validation because it expects 37 columns
      • Executes return, which exits the entire function
      • Never processes the container CSV (which has the correct 37 columns)

      If this hypothesis is correct, the issue could be that the processor uses return instead of continue on line 81, causing it to stop processing all files when encountering any invalid file, rather than skipping the invalid file and continuing to process remaining valid files.

      This would explain why:

      • Multi-file uploads (2 CSVs from cost mgmt operator) fail completely
      • Single-file uploads (1 CSV from test) succeed
      • The container CSV has correct format but never gets processed
      • The error message doesn't indicate which file failed validation
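
      A minimal sketch of the continue-based alternative described above, assuming a per-file skip is the desired policy (whether it is needs confirmation from the maintainers):

      for _, file := range kafkaMsg.Files {
          data, err := utils.ReadCSVFromUrl(file)
          // ...
          df, err = utils.Aggregate_data(df)
          if err != nil {
              // Name the failing file so multi-file uploads are debuggable,
              // then move on to the next file instead of aborting them all.
              log.Errorf("Error processing %s: %s", file, err)
              continue
          }
          // ...
      }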

      Impact

      Critical/Blocker: Complete failure of cost management operator integration

      • No optimization recommendations can be generated for production workloads
      • Cost management operator data is silently ignored with only an error log entry
      • Affects all deployments using cost management operator for ROS data collection
