
Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: v0.2.1
Affects Version/s: v0.1.8
Component/s: insights-on-prem
Labels:
- cost-onprem-0.2
- qe

Blocked: False
Blocked Reason: None
Ready: False

Description

User Workload Monitoring is a critical prerequisite for the Cost Management On-Premise stack on OpenShift, but it is neither documented in the deployment guides nor enabled by the Helm chart. The result is a *silent failure*: the deployment appears successful, but the data pipeline produces no metrics or recommendations.

—

Impact

*Severity*: High - Blocks the entire ROS data pipeline from working
*User Experience*:
- Deployment completes successfully with all pods running
- ServiceMonitors are created but cannot scrape metrics
- ROS CSV files are empty or missing data
- No recommendations are generated
- Silent failure - everything looks healthy but produces no data
*Affected Documentation*:
- docs/force-operator-upload.md - Contains broken reference to installation.md
- docs/installation.md - Does not document this prerequisite
- Helm chart NOTES.txt - Does not warn about this requirement
*Installation/Testing Outcome*: Deployment succeeds but data pipeline is non-functional

—

Root Cause

OpenShift User Workload Monitoring must be explicitly enabled for Prometheus to scrape ServiceMonitors in user namespaces. The Helm chart successfully deploys ServiceMonitors, but without user workload monitoring enabled, no Prometheus instance exists to read them.

Current State

What exists and what is missing:
- ✅ ServiceMonitors are deployed by Helm chart (kruize, rosocp-api, processor, recommendation-poller)
- ✅ openshift-user-workload-monitoring namespace exists (created automatically by OpenShift)
- ❌ No prometheus-user-workload pods running in that namespace
- ❌ No documentation in deployment guides
- ❌ No automation in Helm chart to enable it

Broken Documentation Reference:
- docs/force-operator-upload.md line 58 says: "User-workload monitoring is enabled (see installation.md)"
- docs/installation.md does NOT document this prerequisite

—

Evidence

ServiceMonitors Created Successfully

$ oc get servicemonitors -n cost-onprem
NAME                                                                 AGE
cost-onprem-ros-ocp-kruize                                           22m
cost-onprem-ros-ocp-rosocp-api                                       22m
cost-onprem-ros-ocp-rosocp-processor                                 22m
cost-onprem-ros-ocp-rosocp-recommendation-poller                     22m

But No Prometheus Pods to Read Them

$ oc get pods -n openshift-user-workload-monitoring
No resources found in openshift-user-workload-monitoring namespace.

$ oc get configmap cluster-monitoring-config -n openshift-monitoring
Error from server (NotFound): configmaps "cluster-monitoring-config" not found

This proves:

- ServiceMonitors were deployed by the chart
- User workload monitoring was never enabled
- No Prometheus instance exists to scrape the ServiceMonitors
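
The checks above can be folded into a single diagnostic. The following is a minimal sketch, not part of the chart: the function names `uwm_enabled_in_config` and `check_uwm` are hypothetical, and it assumes `oc` is logged in with permission to read ConfigMaps in openshift-monitoring.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): reports whether user workload
# monitoring is enabled in cluster-monitoring-config.

# Reads the contents of config.yaml on stdin. Succeeds only if it contains an
# uncommented "enableUserWorkload: true" line.
uwm_enabled_in_config() {
  grep -Eq '^[[:space:]]*enableUserWorkload:[[:space:]]*true[[:space:]]*$'
}

check_uwm() {
  local config
  # A missing ConfigMap yields an empty string, which counts as "not enabled".
  config="$(oc get configmap cluster-monitoring-config -n openshift-monitoring \
    -o jsonpath='{.data.config\.yaml}' 2>/dev/null || true)"
  if printf '%s\n' "$config" | uwm_enabled_in_config; then
    echo "enabled"
  else
    echo "not enabled"
    return 1
  fi
}
```

Sourcing the file and running `check_uwm` prints `enabled` or `not enabled` and sets the exit status accordingly, so it could also serve as a pre-install gate. It inspects only the ConfigMap; the pod check shown above is still needed to confirm that Prometheus is actually running.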

—

Expected Behavior

Users should be guided to enable user workload monitoring as part of the deployment process, either through:

Option 1: Documentation (Minimum Fix)

- Update docs/installation.md to document this prerequisite
- Add clear instructions on how to enable it
- Include verification steps

Option 2: Helm Chart Automation (Recommended)

- The Helm chart could automatically create the cluster-monitoring-config ConfigMap
- Place the template in the cost-onprem/templates/monitoring/ directory
- Render it only when .Values.platform is OpenShift
- Mention in the Helm NOTES.txt output that user workload monitoring was enabled

Example Helm template:

{{- if eq (include "cost-onprem.platform.isOpenShift" .) "true" -}}
apiVersion: v1
kind: ConfigMap
metadata: 
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data: 
  config.yaml: |
    enableUserWorkload: true
{{- end }}

—

Current Workaround

Users must manually enable user workload monitoring:

cat > /tmp/enable-user-workload-monitoring.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF

oc apply -f /tmp/enable-user-workload-monitoring.yaml
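
One caveat with `oc apply` here: if a cluster-monitoring-config ConfigMap already exists with other settings, applying the file above replaces its config.yaml value and drops those settings. The sketch below guards against that. The function names `plan_uwm_change` and `enable_uwm_safely` are hypothetical, and the three-way decision is an assumption about how an operator might want to handle this, not something the chart or its scripts do today.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of the chart): only apply the workaround file
# when doing so cannot overwrite existing monitoring settings.

# Reads the current config.yaml contents on stdin and prints one of:
#   create          - no existing config, safe to apply
#   already-enabled - enableUserWorkload: true is already set
#   merge-manually  - other settings exist and must be preserved by hand
plan_uwm_change() {
  local current
  current="$(cat)"
  if [ -z "$current" ]; then
    echo "create"
  elif printf '%s\n' "$current" |
    grep -Eq '^[[:space:]]*enableUserWorkload:[[:space:]]*true[[:space:]]*$'; then
    echo "already-enabled"
  else
    echo "merge-manually"
  fi
}

enable_uwm_safely() {
  local current plan
  current="$(oc get configmap cluster-monitoring-config -n openshift-monitoring \
    -o jsonpath='{.data.config\.yaml}' 2>/dev/null || true)"
  plan="$(printf '%s' "$current" | plan_uwm_change)"
  case "$plan" in
    create)
      oc apply -f /tmp/enable-user-workload-monitoring.yaml
      ;;
    already-enabled)
      echo "user workload monitoring is already enabled"
      ;;
    merge-manually)
      echo "cluster-monitoring-config has other settings; add enableUserWorkload: true by hand:" >&2
      echo "  oc edit configmap cluster-monitoring-config -n openshift-monitoring" >&2
      return 1
      ;;
  esac
}
```

On a fresh cluster like the one in the Evidence section, where the ConfigMap is not found, `enable_uwm_safely` takes the `create` branch and behaves the same as the plain `oc apply`.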

Verify

oc get pods -n openshift-user-workload-monitoring

Expected: pods named prometheus-user-workload-0, prometheus-user-workload-1, etc. are listed.
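
The pods do not appear immediately after the ConfigMap is applied, so a one-off `oc get pods` can give a false negative. The following sketch polls until the Prometheus StatefulSet exists and then waits for it to roll out. The function name `wait_for_uwm` and the retry defaults are hypothetical; the StatefulSet name prometheus-user-workload is inferred from the pod names above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): wait for the user workload
# Prometheus to come up after enabling it.
# Usage: wait_for_uwm [max_tries] [delay_seconds]
wait_for_uwm() {
  local ns=openshift-user-workload-monitoring
  local tries="${1:-30}"
  local delay="${2:-10}"
  local i=0

  # The StatefulSet is created asynchronously, so poll until it exists.
  while ! oc get statefulset prometheus-user-workload -n "$ns" >/dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      echo "prometheus-user-workload StatefulSet did not appear in $ns" >&2
      return 1
    fi
    sleep "$delay"
  done

  # Block until all replicas are ready, or fail after five minutes.
  oc rollout status statefulset/prometheus-user-workload -n "$ns" --timeout=5m
}
```

Running `wait_for_uwm` after the `oc apply` step returns non-zero if Prometheus never starts, which makes the verification usable in an install script instead of relying on a manual check.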

—

Steps to Reproduce

Deploy Without User Workload Monitoring

1. Deploy RHBK
2. Deploy Strimzi
3. Deploy the cost-onprem chart with JWT_AUTH_ENABLED=true exported
4. Observe: All pods running, ServiceMonitors created
5. Observe: openshift-user-workload-monitoring namespace has no pods
6. Observe: ROS data pipeline produces no metrics

—

Proposed Fix

Documentation Updates

Update docs/installation.md:
Add a new section "Prerequisites for OpenShift" that includes:
- Enabling user workload monitoring
- Verification steps
- Expected resource creation

Update docs/force-operator-upload.md:
Fix the broken reference or provide inline instructions instead of referencing installation.md

Update Helm NOTES.txt:
Add a section for OpenShift deployments warning about this requirement

Helm Chart Enhancement (Alternative/Additional)

Create cost-onprem/templates/monitoring/cluster-monitoring-config.yaml that:

- Deploys on OpenShift only
- Creates the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace
- Sets enableUserWorkload: true
- Includes appropriate annotations and labels

—

Environment Details

*Repository*: https://github.com/insights-onprem/cost-onprem-chart
*Chart Version*: v0.2.0
*Git Commit*: 2ee0206
*OpenShift Version*: 4.18.26
*Kubernetes Version*: v1.31.13
*Deployment Method*: ./scripts/install-helm-chart.sh with JWT_AUTH_ENABLED=true
*ServiceMonitors Created*: Yes (4 ServiceMonitors deployed successfully)
*User Workload Monitoring Enabled*: No (missing ConfigMap and pods)
*Result*: Silent failure - deployment healthy but no data pipeline functionality
