- Type: Bug
- Resolution: Not a Bug
- Affects Version: v0.1.0
Environment
- Platform: OpenShift cluster (insights.qe.lab.redhat.com)
- Namespace: ros-ocp
- Deployment: insights-on-prem Resource Optimization Service
- Components: Cost Management Operator → ROS Ingress → Kafka → ROS Processor → Kruize
- Version: IOP-POC-0.1
Issue Description
The ROS-OCP processor service fails to process CSV files uploaded by the cost management operator, logging the error "CSV file does not have all the required columns" and rejecting the upload. This prevents recommendations from being generated for data from the cost management operator.
Timeline of Events
18:38:12 UTC - Cost Management Operator Upload:
- Ingress service received upload (request_id: 133e2c40-bb28-4826-b57b-f0ed09ad2a5e)
- 2 CSV files uploaded to ODF S3 storage successfully:
  - 83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv
  - 83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv
- Kafka message published to hccm.ros.events topic successfully
- Processor consumed message from Kafka
- **ERROR**: Processor rejected the files with an error message
- No recommendations generated or stored in database
Error Logs
Processor Log (ros-ocp-rosocp-processor):
    time="2025-10-26T18:38:14Z" level=info msg="Message received from kafka hccm.ros.events[1]@0: {...}"
    time="2025-10-26T18:38:14Z" level=info msg="DB initialization complete" account=7890123 cluster_uuid=f7340e1e-7392-4e0b-ba1d-03cab55a1bbd org_id=12345 request_id=133e2c40-bb28-4826-b57b-f0ed09ad2a5e
    time="2025-10-26T18:38:14Z" level=error msg="Error: CSV file does not have all the required columns" func=github.com/redhatinsights/ros-ocp-backend/internal/services.ProcessReport file="/go/src/app/internal/services/report_processor.go:80" account=7890123 cluster_uuid=f7340e1e-7392-4e0b-ba1d-03cab55a1bbd org_id=12345 request_id=133e2c40-bb28-4826-b57b-f0ed09ad2a5e
Ingress Log (ros-ocp-ingress):
    time="2025-10-26T18:38:12.565Z" level=info msg="Received upload request" request_id="133e2c40-bb28-4826-b57b-f0ed09ad2a5e"
    time="2025-10-26T18:38:12.640Z" level=info msg="Successfully identified ROS files" ros_files_found=2
    time="2025-10-26T18:38:13.901Z" level=info msg="Successfully uploaded ROS file" file_name="83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv" size=879
    time="2025-10-26T18:38:14.051Z" level=info msg="Successfully uploaded ROS file" file_name="83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv" size=8705
    time="2025-10-26T18:38:14.205Z" level=info msg="Successfully sent ROS event message" topic="hccm.ros.events" uploaded_files=2
Steps to Reproduce
- Deploy insights-on-prem ROS stack on OpenShift cluster
- Deploy and configure cost management operator to send optimization data to ROS ingress endpoint
- Configure cost management operator with valid Keycloak JWT authentication
- Wait for cost management operator to collect metrics and upload data (typically every 15 minutes)
- Monitor processor logs: oc logs -n ros-ocp deploy/ros-ocp-rosocp-processor --follow
- Observe error: "CSV file does not have all the required columns"
- Query database for recommendations: oc exec -n ros-ocp ros-ocp-db-ros-0 -- psql -U postgres -d postgres -c "SELECT COUNT(*) FROM workloads WHERE cluster_uuid='[cluster_uuid]'"
- Verify no workloads or recommendations exist for the uploaded cluster
Actual Result
- Processor logs error and rejects upload
- No workloads created in database
- No recommendations generated
- Cost management operator data is completely ignored
Expected Result
- CSV files should be processed successfully
- Workload records should be created in database
- Recommendations should be generated by Kruize
- Recommendations should be stored and available via API
Database Verification
Query executed:
    SELECT c.cluster_uuid, w.workload_name, w.namespace, COUNT(rs.*) AS rec_count
    FROM workloads w
    LEFT JOIN recommendation_sets rs ON w.id = rs.workload_id
    LEFT JOIN clusters c ON w.cluster_id = c.id
    GROUP BY c.cluster_uuid, w.workload_name, w.namespace
    ORDER BY c.cluster_uuid;
Result: 0 recommendations for cluster f7340e1e-7392-4e0b-ba1d-03cab55a1bbd (the cost management operator cluster)
Workaround / Additional Evidence
A test upload with a single CSV file succeeded completely, demonstrating the end-to-end flow works:
18:48:52 UTC - Test Upload (Single File):
- File: openshift_usage_report.csv (37 columns)
- Result: ✓ Successfully processed
- Result: ✓ Sent to Kruize
- Result: ✓ Recommendations generated
- Result: ✓ **1 recommendation stored in database** for cluster test-cluster-1761504529
This proves:
- The ingress service works correctly
- The Kafka messaging works correctly
- The processor CAN validate and process CSVs with correct format
- Kruize integration works correctly
- The recommendation-poller works correctly
- The database storage works correctly
Technical Details
CSV File Analysis:
- Namespace CSV (83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-namespace-202510.6.csv):
  - Contains 25 columns
  - Columns: report_period_start, report_period_end, interval_start, interval_end, namespace, cpu_request_namespace_sum, cpu_limit_namespace_sum, cpu_usage_namespace_avg, memory_usage_namespace_avg, namespace_running_pods_max, etc.
  - Appears to be namespace-level aggregated metrics
- Container CSV (83ca0485-92ec-4c56-9479-2e25e06c4039-ros-openshift-container-202510.5.csv):
  - Contains 37 columns
  - Columns: report_period_start, report_period_end, interval_start, interval_end, container_name, pod, owner_name, owner_kind, workload, workload_type, namespace, image_name, node, resource_id, cpu_request_container_avg, cpu_limit_container_avg, memory_request_container_avg, memory_rss_usage_container_avg, etc.
  - Appears to match the expected schema defined in internal/types/csvColumnMapping.go
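The header comparison above can be reproduced with a small standalone check. The following is a minimal sketch, not code from ros-ocp-backend: the `missingColumns` helper and the column list shown are illustrative (only a subset of the 37 expected container-level columns), chosen to mirror the mismatch between the namespace-level header and the container-level schema.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// missingColumns returns the required columns absent from a CSV header row.
func missingColumns(header []string, required []string) []string {
	have := make(map[string]bool, len(header))
	for _, col := range header {
		have[strings.TrimSpace(col)] = true
	}
	var missing []string
	for _, col := range required {
		if !have[col] {
			missing = append(missing, col)
		}
	}
	return missing
}

func main() {
	// Illustrative subset of the expected container-level columns.
	required := []string{"interval_start", "interval_end", "container_name", "pod", "namespace"}

	// Header resembling the namespace-level CSV (no container-level columns).
	namespaceHeader := "report_period_start,report_period_end,interval_start,interval_end,namespace"
	r := csv.NewReader(strings.NewReader(namespaceHeader))
	header, _ := r.Read()

	fmt.Println(missingColumns(header, required)) // [container_name pod]
}
```

Running this against the real file headers and the full list from internal/types/csvColumnMapping.go would confirm exactly which columns the namespace CSV lacks.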
Code Reference:
- Error thrown from: ros-ocp-backend/internal/services/report_processor.go:80
- Validation logic: ros-ocp-backend/internal/utils/aggregator.go:182-194
- Expected columns defined: ros-ocp-backend/internal/types/csvColumnMapping.go (37 columns)
Git History:
- Relevant code last modified: February 12, 2024
- Commit: bf619560770a73f2c8870fe9dbe2afc8a97ae6a1
- Author: Suraj Patil
- Issue: RHINENG-8204 - "Check for required columns in CSV"
- This commit added CSV column validation with error handling
Potential Root Cause Analysis
Note: This is a hypothesis that requires confirmation through code review and additional testing.
The cost management operator uploads 2 CSV files per report (namespace-level and container-level), while the test upload used only 1 CSV file (container-level). The processor code at report_processor.go:67-82 contains a loop that processes multiple files:
    for _, file := range kafkaMsg.Files {
        data, err := utils.ReadCSVFromUrl(file)
        // ...
        df, err = utils.Aggregate_data(df)
        if err != nil {
            log.Errorf("Error: %s", err)
            return // Exits entire function
        }
    }
This code potentially:
- Processes the namespace CSV first (which has 25 columns)
- Fails validation because it expects 37 columns
- Executes return, which exits the entire function
- Never processes the container CSV (which has the correct 37 columns)
If this hypothesis is correct, the issue could be that the processor uses return instead of continue on line 81, causing it to stop processing all files when encountering any invalid file, rather than skipping the invalid file and continuing to process remaining valid files.
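The behavioral difference can be illustrated in isolation. This is a simplified sketch, not the actual processor code: `processFiles`, the file map, and the column counts are hypothetical stand-ins that model each uploaded file by its column count and toggle between the `return` behavior and the proposed `continue` behavior.

```go
package main

import "fmt"

// processFiles mimics the hypothesized processor loop: each file is "validated"
// by comparing its column count to requiredCols. With skipOnError=false the
// first failure aborts the loop (like `return`); with skipOnError=true only the
// invalid file is skipped (like `continue`).
func processFiles(files map[string]int, order []string, requiredCols int, skipOnError bool) []string {
	var processed []string
	for _, name := range order {
		if files[name] != requiredCols {
			fmt.Printf("Error: CSV file does not have all the required columns (%s)\n", name)
			if !skipOnError {
				return processed // current behavior: remaining files never processed
			}
			continue // proposed behavior: skip only the invalid file
		}
		processed = append(processed, name)
	}
	return processed
}

func main() {
	order := []string{"namespace.csv", "container.csv"}
	files := map[string]int{"namespace.csv": 25, "container.csv": 37}

	fmt.Println(processFiles(files, order, 37, false)) // [] — the valid container.csv is never reached
	fmt.Println(processFiles(files, order, 37, true))  // [container.csv]
}
```

Under this model, the 25-column namespace CSV arriving first is enough to discard the entire upload, matching the observed multi-file failure and single-file success.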
This would explain why:
- Multi-file uploads (2 CSVs from cost mgmt operator) fail completely
- Single-file uploads (1 CSV from test) succeed
- The container CSV has correct format but never gets processed
- The error message doesn't indicate **which** file failed validation
Impact
- **Critical/Blocker**: Complete failure of cost management operator integration
- No optimization recommendations can be generated for production workloads
- Cost management operator data is silently ignored with only an error log entry
- Affects all deployments using cost management operator for ROS data collection