Bug
Resolution: Unresolved
v0.1.8
Description
User Workload Monitoring is a critical prerequisite for the Cost Management On-Premise stack to function on OpenShift, but it is neither documented in the deployment guides nor enabled by the Helm chart. The result is a *silent failure*: the deployment appears successful, but the data pipeline produces no metrics or recommendations.
—
Impact
- *Severity*: High - Blocks the entire ROS data pipeline from working
- *User Experience*:
  - Deployment completes successfully with all pods running
  - ServiceMonitors are created but cannot scrape metrics
  - ROS CSV files are empty or missing data
  - No recommendations are generated
  - Silent failure - everything looks healthy but produces no data
- *Affected Documentation*:
  - docs/force-operator-upload.md - Contains broken reference to installation.md
  - docs/installation.md - Does not document this prerequisite
  - Helm chart NOTES.txt - Does not warn about this requirement
- *Installation/Testing Outcome*: Deployment succeeds but data pipeline is non-functional
—
Root Cause
OpenShift User Workload Monitoring must be explicitly enabled for Prometheus to scrape ServiceMonitors in user namespaces. The Helm chart successfully deploys ServiceMonitors, but without user workload monitoring enabled, no Prometheus instance exists to read them.
Current State
- What Exists:
  - ✅ ServiceMonitors are deployed by Helm chart (kruize, rosocp-api, processor, recommendation-poller)
  - ✅ openshift-user-workload-monitoring namespace exists (created automatically by OpenShift)
  - ❌ No prometheus-user-workload pods running in that namespace
  - ❌ No documentation in deployment guides
  - ❌ No automation in Helm chart to enable it
- Broken Documentation Reference:
  - docs/force-operator-upload.md line 58 says: "User-workload monitoring is enabled (see installation.md)"
  - docs/installation.md does NOT document this prerequisite
—
Evidence
ServiceMonitors Created Successfully
$ oc get servicemonitors -n cost-onprem
NAME                                               AGE
cost-onprem-ros-ocp-kruize                         22m
cost-onprem-ros-ocp-rosocp-api                     22m
cost-onprem-ros-ocp-rosocp-processor               22m
cost-onprem-ros-ocp-rosocp-recommendation-poller   22m
But No Prometheus Pods to Read Them
$ oc get pods -n openshift-user-workload-monitoring
No resources found in openshift-user-workload-monitoring namespace.
$ oc get configmap cluster-monitoring-config -n openshift-monitoring
Error from server (NotFound): configmaps "cluster-monitoring-config" not found
This proves:
- ServiceMonitors were deployed by the chart
- User workload monitoring was never enabled
- No Prometheus instance exists to scrape the ServiceMonitors
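The pod check in the evidence above can be scripted so that an empty namespace is reported as a failure instead of going unnoticed. The following is a minimal sketch, not part of the chart: the function name uwm_pods_ready is hypothetical, and it assumes the default column layout of `oc get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (not part of the chart): reads the output of
#   oc get pods -n openshift-user-workload-monitoring --no-headers
# on stdin and succeeds only if at least one prometheus-user-workload
# pod is in the Running state.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
uwm_pods_ready() {
  awk '
    $1 ~ /^prometheus-user-workload-/ && $3 == "Running" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example against a live cluster (requires oc and cluster access):
#   if oc get pods -n openshift-user-workload-monitoring --no-headers 2>/dev/null | uwm_pods_ready; then
#     echo "user workload monitoring is running"
#   else
#     echo "user workload monitoring is NOT running; ServiceMonitors will not be scraped"
#   fi
```

With the cluster state shown in the evidence, `oc` prints nothing on stdout, so the helper returns a non-zero status.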
—
Expected Behavior
Users should be guided to enable user workload monitoring as part of the deployment process, either through:
Option 1: Documentation (Minimum Fix)
- Update docs/installation.md to document this prerequisite
- Add clear instructions on how to enable it
- Include verification steps
Option 2: Helm Chart Automation (Recommended)
- Helm chart could automatically create the cluster-monitoring-config ConfigMap
- Template in cost-onprem/templates/monitoring/ directory
- Conditional on .Values.platform being OpenShift
- Include in Helm NOTES.txt output to inform users it was enabled
Example Helm template:
{{- if eq (include "cost-onprem.platform.isOpenShift" .) "true" -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
{{- end }}
—
Current Workaround
Users must manually enable user workload monitoring:
cat > /tmp/enable-user-workload-monitoring.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
oc apply -f /tmp/enable-user-workload-monitoring.yaml
Verify
oc get pods -n openshift-user-workload-monitoring
Should show: prometheus-user-workload-0, prometheus-user-workload-1, etc.
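On the cluster in this report the ConfigMap did not exist, so applying the workaround file is safe. On a cluster where cluster-monitoring-config already exists with other settings, applying the same file would likely replace the existing config.yaml contents. It may therefore be worth checking the current value first. The following is a minimal sketch; the function name uwm_enabled_in_config is hypothetical, and the check assumes enableUserWorkload appears as a top-level key on its own line.

```shell
# Hypothetical helper (not part of the chart): reads the contents of
# config.yaml on stdin and succeeds if it already contains a top-level
# "enableUserWorkload: true" entry.
uwm_enabled_in_config() {
  grep -Eq '^enableUserWorkload:[[:space:]]*true[[:space:]]*$'
}

# Example against a live cluster (requires oc and cluster access):
#   if oc get configmap cluster-monitoring-config -n openshift-monitoring \
#        -o jsonpath='{.data.config\.yaml}' 2>/dev/null | uwm_enabled_in_config; then
#     echo "user workload monitoring already enabled; nothing to do"
#   else
#     echo "not enabled; create the ConfigMap or edit the existing one"
#   fi
```

If the ConfigMap is missing, `oc` writes its error to stderr and the helper sees empty input, which it treats as "not enabled".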
—
Steps to Reproduce
Deploy Without User Workload Monitoring
- Deploy RHBK
- Deploy Strimzi
- Deploy the cost-onprem chart with JWT_AUTH_ENABLED=true exported in the environment
- Observe: All pods running, ServiceMonitors created
- Observe: openshift-user-workload-monitoring namespace has no pods
- Observe: ROS data pipeline produces no metrics
—
Proposed Fix
Documentation Updates
- Update docs/installation.md: add a new section "Prerequisites for OpenShift" that includes:
  - Enabling user workload monitoring
  - Verification steps
  - Expected resource creation
- Update docs/force-operator-upload.md: fix the broken reference or provide inline instructions instead of referencing installation.md
- Update Helm NOTES.txt: add a section for OpenShift deployments warning about this requirement
Helm Chart Enhancement (Alternative/Additional)
Create cost-onprem/templates/monitoring/cluster-monitoring-config.yaml that:
- Conditionally deploys on OpenShift only
- Creates the cluster-monitoring-config ConfigMap in openshift-monitoring namespace
- Sets enableUserWorkload: true
- Includes appropriate annotations and labels
—
Environment Details
- *Repository*: https://github.com/insights-onprem/cost-onprem-chart
- *Chart Version*: v0.2.0
- *Git Commit*: 2ee0206
- *OpenShift Version*: 4.18.26
- *Kubernetes Version*: v1.31.13
- *Deployment Method*: ./scripts/install-helm-chart.sh with JWT_AUTH_ENABLED=true
- *ServiceMonitors Created*: Yes (4 ServiceMonitors deployed successfully)
- *User Workload Monitoring Enabled*: No (missing ConfigMap and pods)
- *Result*: Silent failure - deployment healthy but no data pipeline functionality