Bug
Resolution: Done
v0.1.8
Summary of Issues
- ROS API Pod Label Mismatch (install-helm-chart.sh line 841)
- Kruize Service Name Mismatch (install-helm-chart.sh line 892)
- Service URL Construction - Missing ros-ocp infix (test-ocp-dataflow-jwt.sh lines 457-497)
- Database Pod Label Selector - Wrong label (test-ocp-dataflow-jwt.sh line 746, query-kruize.sh line 50)
- Database Credentials Secret Name - Missing ros-ocp infix (test-ocp-dataflow-jwt.sh line 755, query-kruize.sh line 12)
- Database Username Extraction - Trying to extract hardcoded value from secret (test-ocp-dataflow-jwt.sh lines 756-766, query-kruize.sh line 13)
Description
Multiple scripts in the cost-onprem-chart repository use incorrect resource naming patterns, causing both false-positive health check failures and complete test failures. These issues appear to be related to incomplete updates following the chart rename from "ros-helm-chart" to "cost-onprem-chart".
Impact
- Severity: High - Test scripts completely fail due to incorrect service names and database access
- User Experience:
  - Health checks report false failures but deployment actually succeeds
  - End-to-end test scripts fail with HTTP 000 errors (cannot reach services)
  - Database verification queries fail (cannot find database pod or credentials)
  - Confusing error messages during installation/testing
- Affected Scripts:
  - scripts/install-helm-chart.sh - Health check functions
  - scripts/test-ocp-dataflow-jwt.sh - End-to-end dataflow testing and database verification
  - scripts/query-kruize.sh - Kruize database query utility
- Installation/Testing Outcome: Deployments succeed but health checks fail; test scripts completely fail
Root Cause
The scripts use outdated naming patterns that no longer match the resource names the Helm chart actually generates after the rename. The Helm fullname template now produces names with a ros-ocp infix, but the scripts were not updated to match.
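One way to keep the infix in a single place is sketched below. `chart_resource_name` and `CHART_INFIX` are hypothetical names introduced for illustration; they do not exist in the repository, and the "ros-ocp" value is taken only from the resource names observed in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a chart resource name as <release>-<infix>-<component>.
# The default infix "ros-ocp" mirrors the names seen in `oc get svc` in this report.
CHART_INFIX="${CHART_INFIX:-ros-ocp}"

chart_resource_name() {
  local release="$1" component="$2"
  printf '%s-%s-%s\n' "$release" "$CHART_INFIX" "$component"
}

chart_resource_name cost-onprem kruize          # prints cost-onprem-ros-ocp-kruize
chart_resource_name cost-onprem db-credentials  # prints cost-onprem-ros-ocp-db-credentials
```

With a helper like this, a future change to the fullname template would need one edit rather than one per script.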
Affected Scripts and Issues
install-helm-chart.sh Issues
Issue 1: ROS API Pod Label Mismatch (Line 841)
- Script searches for: app.kubernetes.io/name=ros-api
- Actual pod label: app.kubernetes.io/name=rosocp-api
- Result: "ROS API pod not found" error
Issue 2: Kruize Service Name Mismatch (Line 892)
- Script searches for: svc/cost-onprem-kruize
- Actual service name: svc/cost-onprem-ros-ocp-kruize
- Result: "Kruize API service is not responding" error
test-ocp-dataflow-jwt.sh Issues
Service URL Construction Issues
Issue 3: Service URL Construction (get_service_url() function, lines 457-497)
- Function constructs: $HELM_RELEASE_NAME-$service_name
- Should construct: $HELM_RELEASE_NAME-ros-ocp-$service_name
- Examples:
  - Tries to find: cost-onprem-ingress
  - Actually exists: cost-onprem-ros-ocp-ingress
  - Tries to find: cost-onprem-rosocp-api
  - Actually exists: cost-onprem-ros-ocp-rosocp-api
- Result: HTTP 000 errors (cannot connect), complete test failure
Database Access Issues (Discovered during dataflow test)
Issue 4: Database Pod Label Selector (Line 746)
- Script searches for: app.kubernetes.io/name=database
- Actual pod label: app.kubernetes.io/name=db-kruize
- Result: "Database pod not found" error
Issue 5: Database Credentials Secret Name (Line 755)
- Script searches for: cost-onprem-db-credentials
- Actual secret name: cost-onprem-ros-ocp-db-credentials
- Result: "Unable to retrieve Kruize database credentials" error
Issue 6: Database Username Extraction (Lines 756-766)
- Script tries to extract: kruize-user from secret
- Reality: Secret only contains passwords (kruize-password, ros-password, sources-password), username is hardcoded in Helm chart as "postgres"
- Result: Username variable is empty, database connection fails
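Issue 6 fails quietly because a lookup of a key that is absent from the secret leaves the variable empty rather than stopping the script. A minimal sketch of a guard follows; `decode_required` is a hypothetical helper (not part of the scripts) that reads base64 text on stdin and refuses to return an empty value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode base64 from stdin and fail loudly when the
# result is empty, e.g. because the requested key is not in the secret.
decode_required() {
  local key="$1" value
  value=$(base64 -d 2>/dev/null) || value=""
  if [ -z "$value" ]; then
    echo "Error: secret key '$key' is missing or empty" >&2
    return 1
  fi
  printf '%s\n' "$value"
}

# Intended use against a cluster (not executed here):
#   kruize_password=$(oc get secret -n "$NAMESPACE" "$db_secret_name" \
#       -o jsonpath='{.data.kruize-password}' | decode_required kruize-password) || return 1
```

Applied to the original kruize-user lookup, a guard of this kind would have reported the missing key by name instead of passing an empty username to the database client.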
query-kruize.sh Issues
The same database access issues (Issues 4, 5, 6) exist in query-kruize.sh:
Issue 4: Database Pod Label Selector (Line 50)
- Script searches for: app.kubernetes.io/name=database
- Actual pod label: app.kubernetes.io/name=db-kruize
- Result: "Kruize database pod not found" error
Issue 5: Database Credentials Secret Name (Line 12)
- Script searches for: cost-onprem-db-credentials
- Actual secret name: cost-onprem-ros-ocp-db-credentials
- Result: "Unable to retrieve Kruize database credentials" error
Issue 6: Database Username Extraction (Line 13)
- Script tries to extract: kruize-user from secret
- Reality: Secret only contains passwords, username is hardcoded to "postgres"
- Result: Script fails with error about missing credentials
Evidence
Health Check False Positives (install-helm-chart.sh)
[INFO] Running health checks...
[INFO] Testing internal service connectivity...
[ERROR] ✗ ROS API pod not found
[INFO] Testing services via port-forwarding (OpenShift approach)...
[INFO] Testing Ingress API via port-forward...
[SUCCESS] ✓ Ingress API service is healthy (port-forward, internal port 8081)
[INFO] Testing Kruize API via port-forward...
[ERROR] ✗ Kruize API service is not responding (port-forward)
[INFO] Testing external route accessibility (informational)...
[SUCCESS] → ROS API externally accessible: http://cost-onprem-ros-ocp-main-cost-onprem.apps.ocp-test.qe.lab.redhat.com/status
[SUCCESS] → 1 route(s) externally accessible
[ERROR] 2 core service check(s) failed
Test Script Complete Failure (test-ocp-dataflow-jwt.sh)
[INFO] === Preflight: JWT Authentication Validation ===
[INFO] Testing ingress at: http://cost-onprem-ingress.cost-onprem.svc.cluster.local:8080
[INFO] Test 1: Request without JWT token
[WARNING] ⚠ Expected 401, got 000000 (may indicate route/service issue)
[INFO] === STEP 1: Upload Test Data with JWT Authentication ====
[INFO] Uploading to: http://cost-onprem-ingress.cost-onprem.svc.cluster.local:8080/v1/upload
[ERROR] Upload failed with HTTP 000
[ERROR] Response:
[ERROR] Upload with JWT authentication failed
Database Access Failure
[ERROR] Database pod not found
[ERROR] Unable to retrieve Kruize database credentials from secret 'cost-onprem-db-credentials'
Actual Service Names (What Really Exists)
$ oc get svc -n cost-onprem | grep -E "api|ingress|kruize"
cost-onprem-ros-ocp-db-kruize ClusterIP 172.30.14.216 [none] 5432/TCP 3h10m
cost-onprem-ros-ocp-ingress ClusterIP 172.30.82.140 [none] 8080/TCP,9901/TCP 3h10m
cost-onprem-ros-ocp-kruize ClusterIP 172.30.53.128 [none] 8080/TCP 3h10m
cost-onprem-ros-ocp-rosocp-api ClusterIP 172.30.105.65 [none] 8000/TCP,9901/TCP,9000/TCP 3h10m
Actual Route Names
$ oc get routes -n cost-onprem
NAME HOST/PORT
cost-onprem-ros-ocp-ingress cost-onprem-ros-ocp-ingress-cost-onprem.apps.ocp-test.qe.lab.redhat.com
cost-onprem-ros-ocp-main cost-onprem-ros-ocp-main-cost-onprem.apps.ocp-test.qe.lab.redhat.com
Actual Pod Status (All Healthy)
$ oc get pods -n cost-onprem
NAME READY STATUS RESTARTS AGE
cost-onprem-ros-ocp-rosocp-api-758c6c65c8-lk482 2/2 Running 0 3h12m
cost-onprem-ros-ocp-ingress-889b69c5c-59697 2/2 Running 0 3h12m
cost-onprem-ros-ocp-kruize-64644796fd-v7hck 1/1 Running 0 3h12m
[... all 12 pods running successfully ...]
Actual Pod Labels
ROS API Pod:
$ oc get pods -n cost-onprem cost-onprem-ros-ocp-rosocp-api-758c6c65c8-lk482 -o jsonpath='{.metadata.labels.app\.kubernetes\.io/name}'
rosocp-api
Database Pod:
$ oc get pods -n cost-onprem -l app.kubernetes.io/name=db-kruize -o jsonpath='{.items[0].metadata.name}'
cost-onprem-ros-ocp-db-kruize-0
Actual Database Secret
$ oc get secret cost-onprem-ros-ocp-db-credentials -n cost-onprem -o jsonpath='{.data}'
{"kruize-password":"...","ros-password":"...","sources-password":"..."}
Note: Secret contains only passwords, not usernames (usernames are hardcoded in Helm chart).
Verification that Services Actually Work
# ROS API works internally
$ oc exec -n cost-onprem cost-onprem-ros-ocp-rosocp-api-758c6c65c8-lk482 -c rosocp-api -- curl -s http://localhost:8000/status
{"api-server":"working"}
# Kruize works internally
$ oc exec -n cost-onprem cost-onprem-ros-ocp-kruize-64644796fd-v7hck -- curl -s http://localhost:8080/listPerformanceProfiles | head -5
[
  {
    "name": "resource-optimization-openshift",
    "profile_version": 1.0,
    "k8s_type": "openshift",
Steps to Reproduce
Health Check Issue
- Deploy on OpenShift cluster with JWT authentication enabled
- Run: export JWT_AUTH_ENABLED=true && ./install-helm-chart.sh
- Wait for installation to complete
- Observe health check output shows false errors despite successful deployment
Test Script Issue
- Complete successful deployment as above
- Create service account with admin access
- Login as service account: oc login --token=... --server=...
- Run: ./scripts/test-ocp-dataflow-jwt.sh
- Observe HTTP 000 errors when trying to connect to services
- Observe database access errors when trying to verify Kruize recommendations
- Verify actual services exist with correct ros-ocp naming
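The final verification step can be sketched as a small filter that flags any name lacking the expected prefix. `names_missing_infix` is a hypothetical helper; it assumes the release name contains no regular-expression metacharacters, and services in the namespace that do not belong to the chart would be listed as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read resource names on stdin (one per line) and print
# those that do not start with "<release>-ros-ocp-".
names_missing_infix() {
  local release="$1"
  # grep -v exits non-zero when nothing is printed; treat that as success.
  grep -v "^${release}-ros-ocp-" || true
}

# Intended use against a cluster (not executed here):
#   oc get svc -n cost-onprem --no-headers -o custom-columns=NAME:.metadata.name \
#     | names_missing_infix cost-onprem
```

An empty result means every listed name follows the naming pattern the fixed scripts expect.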
Current Buggy Code
install-helm-chart.sh
Line 841 - Wrong Pod Label:
local api_pod=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/name=ros-api -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
Line 892 - Wrong Service Name:
kubectl port-forward -n "$NAMESPACE" svc/cost-onprem-kruize 18081:8080 --request-timeout=90s >/dev/null 2>&1 &
test-ocp-dataflow-jwt.sh
Lines 462 & 494 - Wrong Service/Route Name Construction:
get_service_url() {
local service_name="$1"
local path="$2"
# Try to get OpenShift route first
local route_name="$HELM_RELEASE_NAME-$service_name" # WRONG
# ... route lookup ...
# Fallback to service (for port-forward or internal access)
local service_port=$(oc get svc "$HELM_RELEASE_NAME-$service_name" -n "$NAMESPACE" -o jsonpath='{.spec.ports[0].port}' 2>/dev/null || echo "8080") # WRONG
echo "http://$HELM_RELEASE_NAME-$service_name.$NAMESPACE.svc.cluster.local:${service_port}$path" # WRONG
}
Line 746 - Wrong Database Pod Label:
local db_pod=$(oc get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=database" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
Line 755 - Wrong Database Credentials Secret Name:
local db_secret_name="cost-onprem-db-credentials"
Lines 756-757 - Wrong Database Username Extraction:
local kruize_user=$(oc get secret -n "$NAMESPACE" "$db_secret_name" -o jsonpath='{.data.kruize-user}' 2>/dev/null | base64 -d)
local kruize_password=$(oc get secret -n "$NAMESPACE" "$db_secret_name" -o jsonpath='{.data.kruize-password}' 2>/dev/null | base64 -d)
query-kruize.sh
Line 50 - Wrong Database Pod Label:
local db_pod=$(oc get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=database" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
Line 12 - Wrong Database Credentials Secret Name:
DB_SECRET_NAME="${DB_SECRET_NAME:-cost-onprem-db-credentials}"
Lines 13-14 - Wrong Database Username Extraction:
KRUIZE_USER=$(oc get secret -n "$NAMESPACE" "$DB_SECRET_NAME" -o jsonpath='{.data.kruize-user}' 2>/dev/null | base64 -d)
KRUIZE_PASSWORD=$(oc get secret -n "$NAMESPACE" "$DB_SECRET_NAME" -o jsonpath='{.data.kruize-password}' 2>/dev/null | base64 -d)
Expected/Fixed Code
install-helm-chart.sh
Line 841 - Correct Pod Label:
local api_pod=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/name=rosocp-api -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
Line 892 - Correct Service Name:
kubectl port-forward -n "$NAMESPACE" svc/cost-onprem-ros-ocp-kruize 18081:8080 --request-timeout=90s >/dev/null 2>&1 &
test-ocp-dataflow-jwt.sh
Lines 462 & 494 - Correct Service/Route Name Construction:
get_service_url() {
local service_name="$1"
local path="$2"
# Try to get OpenShift route first
local route_name="$HELM_RELEASE_NAME-ros-ocp-$service_name" # FIXED
# ... route lookup ...
# Fallback to service (for port-forward or internal access)
local service_port=$(oc get svc "$HELM_RELEASE_NAME-ros-ocp-$service_name" -n "$NAMESPACE" -o jsonpath='{.spec.ports[0].port}' 2>/dev/null || echo "8080") # FIXED
echo "http://$HELM_RELEASE_NAME-ros-ocp-$service_name.$NAMESPACE.svc.cluster.local:${service_port}$path" # FIXED
}
Note: The special-case handling that maps "ros-api" to the "main" route (lines 464-476) needs the same update so that it includes the ros-ocp infix.
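The special case in the note above can be sketched as a single mapping function. `route_name_for` is a hypothetical name used for illustration; the "main" route for the ROS API is inferred from the route list and health check output in this report, not from the script source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a logical service name to its OpenShift route name.
route_name_for() {
  local release="$1" service="$2"
  case "$service" in
    # The ROS API is exposed through the "main" route in this deployment.
    ros-api) printf '%s-ros-ocp-main\n' "$release" ;;
    # Every other service is assumed to use its own name after the infix.
    *)       printf '%s-ros-ocp-%s\n' "$release" "$service" ;;
  esac
}

route_name_for cost-onprem ros-api   # prints cost-onprem-ros-ocp-main
route_name_for cost-onprem ingress   # prints cost-onprem-ros-ocp-ingress
```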
Line 746 - Correct Database Pod Label:
local db_pod=$(oc get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=db-kruize" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
Line 755 - Correct Database Credentials Secret Name:
local db_secret_name="${HELM_RELEASE_NAME}-ros-ocp-db-credentials"
Lines 756-766 - Correct Database Credentials Extraction:
# Extract Kruize database credentials from secret
local db_secret_name="${HELM_RELEASE_NAME}-ros-ocp-db-credentials"
local kruize_password=$(oc get secret -n "$NAMESPACE" "$db_secret_name" -o jsonpath='{.data.kruize-password}' 2>/dev/null | base64 -d)
# Username is hardcoded to "postgres" in the Helm chart
local kruize_user="postgres"
local kruize_db="kruize_db"
if [ -z "$kruize_password" ]; then
    echo_error "Unable to retrieve Kruize database password from secret '$db_secret_name'"
    echo_info "Use './query-kruize.sh --cluster $cluster_id' to check recommendations later"
    return 1
fi
query-kruize.sh
Line 50 - Correct Database Pod Label:
local db_pod=$(oc get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=db-kruize" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
Line 12 - Correct Database Credentials Secret Name:
DB_SECRET_NAME="${DB_SECRET_NAME:-cost-onprem-ros-ocp-db-credentials}"
Lines 13-22 - Correct Database Credentials Extraction:
# Username is hardcoded to "postgres" in the Helm chart
KRUIZE_USER="postgres"
KRUIZE_PASSWORD=$(oc get secret -n "$NAMESPACE" "$DB_SECRET_NAME" -o jsonpath='{.data.kruize-password}' 2>/dev/null | base64 -d)
KRUIZE_DB="kruize_db"
if [ -z "$KRUIZE_PASSWORD" ]; then
    echo "Error: Unable to retrieve Kruize database password from secret '$DB_SECRET_NAME'" >&2
    echo "Ensure the secret exists and contains 'kruize-password' key" >&2
    exit 1
fi
Workaround
install-helm-chart.sh
The health check failures are informational only and can be ignored. All services are actually working correctly despite the error messages. Users can verify deployment success by:
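# Check all pods are running
oc get pods -n cost-onprem
# Test ROS API (oc exec needs a pod name, not a label selector, so resolve the pod first)
API_POD=$(oc get pods -n cost-onprem -l app.kubernetes.io/name=rosocp-api -o jsonpath='{.items[0].metadata.name}')
oc exec -n cost-onprem "$API_POD" -c rosocp-api -- curl -s http://localhost:8000/status
# Test Kruize API
KRUIZE_POD=$(oc get pods -n cost-onprem -l app.kubernetes.io/name=kruize -o jsonpath='{.items[0].metadata.name}')
oc exec -n cost-onprem "$KRUIZE_POD" -- curl -s http://localhost:8080/listPerformanceProfiles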
test-ocp-dataflow-jwt.sh
Manually test the services using the correct names:
# Test ingress route directly
curl -k https://cost-onprem-ros-ocp-ingress-cost-onprem.apps.ocp-test.qe.lab.redhat.com/ready
# Or use port-forward to test upload
oc port-forward -n cost-onprem svc/cost-onprem-ros-ocp-ingress 8080:8080 &
curl -X POST -F "file=@test.tar.gz" -H "Authorization: Bearer $JWT_TOKEN" http://localhost:8080/v1/upload
# Verify Kruize database manually
oc exec -n cost-onprem cost-onprem-ros-ocp-db-kruize-0 -- \
    psql -U postgres -d kruize_db -c "SELECT COUNT(*) FROM kruize_experiments;"
History
Previous Partial Fix (Incomplete):
- Commit 7ca133e (Nov 10, 2025): "Fix wrong ros-api service name"
  - Fixed typo: ross-api → ros-api
  - Still Incorrect: Should have been changed to rosocp-api
This suggests the naming pattern was updated incompletely during the chart rename work.
Related Issues
- FLPATH-2891: Storage credentials secret naming regression (same root cause - chart rename)
- Commit 291b59f: "Rename chart name to cost-onprem"
- Commit 187108f: "Use cost-onprem as chart name and default namespace"
All bugs (FLPATH-2891, FLPATH-2892) stem from incomplete resource name updates following the chart rename from "ros-helm-chart" to "cost-onprem-chart".
Environment Details
- Repository: https://github.com/insights-onprem/cost-onprem-chart
- Git Commit: e5d6a2d1b82d0fcaf8594cfae217f61209c10cc2
- Chart Version: v0.1.8-41-ge5d6a2d
- OpenShift Version: 4.18.26
- Kubernetes Version: v1.31.13
- Helm Release Name: cost-onprem (default)
- Namespace: cost-onprem (default)
- Platform: OpenShift
- JWT Auth: Enabled
- Affected Scripts:
  - scripts/install-helm-chart.sh
  - scripts/test-ocp-dataflow-jwt.sh
  - scripts/query-kruize.sh
Proposed Fix
Update all three scripts to use the correct ros-ocp naming convention:
install-helm-chart.sh
# Line 841: Change pod label search
- local api_pod=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/name=ros-api -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
+ local api_pod=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/name=rosocp-api -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
# Line 892: Change service name
- kubectl port-forward -n "$NAMESPACE" svc/cost-onprem-kruize 18081:8080 --request-timeout=90s >/dev/null 2>&1 &
+ kubectl port-forward -n "$NAMESPACE" svc/cost-onprem-ros-ocp-kruize 18081:8080 --request-timeout=90s >/dev/null 2>&1 &
test-ocp-dataflow-jwt.sh
# Update get_service_url() function to add -ros-ocp- infix
# Line 462: Update route name construction
- local route_name="$HELM_RELEASE_NAME-$service_name"
+ local route_name="$HELM_RELEASE_NAME-ros-ocp-$service_name"
# Line 467: Update "main" route special case
- route_name="$HELM_RELEASE_NAME-main"
+ route_name="$HELM_RELEASE_NAME-ros-ocp-main"
# Line 474: Update ros-api route fallback
- route_name="$HELM_RELEASE_NAME-ros-api"
+ route_name="$HELM_RELEASE_NAME-ros-ocp-rosocp-api"
# Line 494: Update service name construction
- local service_port=$(oc get svc "$HELM_RELEASE_NAME-$service_name" -n "$NAMESPACE" -o jsonpath='{.spec.ports[0].port}' 2>/dev/null || echo "8080")
+ local service_port=$(oc get svc "$HELM_RELEASE_NAME-ros-ocp-$service_name" -n "$NAMESPACE" -o jsonpath='{.spec.ports[0].port}' 2>/dev/null || echo "8080")
# Line 495: Update service URL construction
- echo "http://$HELM_RELEASE_NAME-$service_name.$NAMESPACE.svc.cluster.local:${service_port}$path"
+ echo "http://$HELM_RELEASE_NAME-ros-ocp-$service_name.$NAMESPACE.svc.cluster.local:${service_port}$path"
# Line 746: Fix database pod label
-local db_pod=$(oc get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=database" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
+local db_pod=$(oc get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=db-kruize" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
# Line 755: Fix database credentials secret name
-local db_secret_name="cost-onprem-db-credentials"
+local db_secret_name="${HELM_RELEASE_NAME}-ros-ocp-db-credentials"
# Lines 756-766: Fix database credentials extraction
-local kruize_user=$(oc get secret -n "$NAMESPACE" "$db_secret_name" -o jsonpath='{.data.kruize-user}' 2>/dev/null | base64 -d)
-local kruize_password=$(oc get secret -n "$NAMESPACE" "$db_secret_name" -o jsonpath='{.data.kruize-password}' 2>/dev/null | base64 -d)
+# Extract Kruize database credentials from secret
+local db_secret_name="${HELM_RELEASE_NAME}-ros-ocp-db-credentials"
+local kruize_password=$(oc get secret -n "$NAMESPACE" "$db_secret_name" -o jsonpath='{.data.kruize-password}' 2>/dev/null | base64 -d)
+
+# Username is hardcoded to "postgres" in the Helm chart
+local kruize_user="postgres"
+local kruize_db="kruize_db"
+
+if [ -z "$kruize_password" ]; then
+    echo_error "Unable to retrieve Kruize database password from secret '$db_secret_name'"
+    echo_info "Use './query-kruize.sh --cluster $cluster_id' to check recommendations later"
+    return 1
+fi
query-kruize.sh
# Line 50: Fix database pod label
- local db_pod=$(oc get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=database" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
+ local db_pod=$(oc get pods -n "$NAMESPACE" -l "app.kubernetes.io/name=db-kruize" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
# Line 12: Fix database credentials secret name
-DB_SECRET_NAME="${DB_SECRET_NAME:-cost-onprem-db-credentials}"
+DB_SECRET_NAME="${DB_SECRET_NAME:-cost-onprem-ros-ocp-db-credentials}"
# Lines 13-22: Fix database credentials extraction
-KRUIZE_USER=$(oc get secret -n "$NAMESPACE" "$DB_SECRET_NAME" -o jsonpath='{.data.kruize-user}' 2>/dev/null | base64 -d)
-KRUIZE_PASSWORD=$(oc get secret -n "$NAMESPACE" "$DB_SECRET_NAME" -o jsonpath='{.data.kruize-password}' 2>/dev/null | base64 -d)
-KRUIZE_DB="kruize_db"
-
-if [ -z "$KRUIZE_USER" ] || [ -z "$KRUIZE_PASSWORD" ]; then
-    echo "Error: Unable to retrieve Kruize database credentials from secret '$DB_SECRET_NAME'" >&2
-    echo "Ensure the secret exists and contains 'kruize-user' and 'kruize-password' keys" >&2
-    exit 1
-fi
+# Username is hardcoded to "postgres" in the Helm chart
+KRUIZE_USER="postgres"
+KRUIZE_PASSWORD=$(oc get secret -n "$NAMESPACE" "$DB_SECRET_NAME" -o jsonpath='{.data.kruize-password}' 2>/dev/null | base64 -d)
+KRUIZE_DB="kruize_db"
+
+if [ -z "$KRUIZE_PASSWORD" ]; then
+    echo "Error: Unable to retrieve Kruize database password from secret '$DB_SECRET_NAME'" >&2
+    echo "Ensure the secret exists and contains 'kruize-password' key" >&2
+    exit 1
+fi
These changes align all three scripts with the actual Helm chart resource naming conventions established by the fullname template.
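To reduce the chance of the same drift recurring, a grep-based guard could be run over the scripts before release. The sketch below is an assumption about how such a check might look: `find_stale_names` is a hypothetical helper, and its patterns are simply the stale strings quoted in this report.

```shell
#!/usr/bin/env bash
# Hypothetical guard: print file:line matches for the stale names from this
# report. Exits 0 when a stale name is found and 1 when the files are clean.
find_stale_names() {
  grep -nF \
    -e 'app.kubernetes.io/name=ros-api' \
    -e 'app.kubernetes.io/name=database' \
    -e 'svc/cost-onprem-kruize' \
    -e 'cost-onprem-db-credentials' \
    -e '.data.kruize-user' \
    "$@"
}

# Intended use from the repository root (not executed here):
#   if find_stale_names scripts/*.sh; then
#     echo "Stale resource names found" >&2
#     exit 1
#   fi
```

Fixed-string matching (`-F`) is used so that the dots in the label keys are not treated as regular-expression wildcards, and none of the patterns is a substring of its corrected counterpart.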