Bug
Resolution: Unresolved
Resource Optimization plugin displays recommendations for containers/workloads that no longer exist on the cluster, causing workflow execution failures when users attempt to apply them.
Description
The Resource Optimization plugin UI displays recommendations from the Cost Management API without validating whether the recommended resources (deployments, containers) still exist on the target cluster. When users click "Apply recommendations" for a stale recommendation, the `patch-k8s-resource` workflow attempts to patch a non-existent Kubernetes resource, resulting in HTTP 404 (Not Found) or HTTP 403 (Forbidden) errors.
The Cost Management API returns recommendations based on historical data that may include resources that have been deleted from the cluster. The plugin does not perform client-side or server-side validation to check resource existence before:
- Displaying recommendations in the UI
- Allowing users to click the "Apply" button
- Executing the workflow
Environment
- RHDH Version: 1.8 STABLE-RC
- Resource Optimization Plugin Version: 1.2.1
- Orchestrator Plugin Version: 1.8.0-rc.3
- Workflow: patch-k8s-resource (quay.io/orchestrator/serverless-workflow-patch-k8s-resource:latest)
- Cluster: OpenShift (tested on ocp-edge73-0)
- Cluster ID: ocp-edge73-0-prq7c
- Namespace: rhdh-operator
Steps to Reproduce
- Deploy RHDH 1.8 STABLE-RC with Resource Optimization plugin
- Navigate to Optimizations tab in Backstage UI
- Observe recommendations displayed in the table
- Identify a recommendation for a resource that has been deleted from the cluster
  - Example: Recommendation shows namespace=ros-payloads, workload=http-client, container=client
  - Verification: `oc get deployment http-client -n ros-payloads` returns "NotFound"
- Click "Apply" button on the stale recommendation
- Result: Workflow execution fails with HTTP 404 or HTTP 403 error
Expected Behavior
- The plugin should validate resource existence before displaying recommendations
- Recommendations for non-existent resources should either:
  - Not be displayed in the UI, OR
  - Be displayed with a warning/disclaimer, OR
  - Have the "Apply" button disabled with a clear message
- If a user attempts to apply a recommendation for a non-existent resource:
  - The workflow should validate resource existence before attempting to patch
  - A clear error message should be returned: "Resource {namespace}/{workload} not found on cluster. This recommendation may be stale."
  - The error should be user-friendly and actionable
Actual Behavior
- Stale recommendations are displayed in the UI without any indication they are non-actionable
- The "Apply" button is enabled for all recommendations, including stale ones
- Clicking "Apply" on a stale recommendation triggers workflow execution
- The workflow attempts to PATCH the non-existent resource via Kubernetes API
- Workflow fails with HTTP 404 (resource not found) or HTTP 403 (forbidden)
- Error messages are technical and not user-friendly
Error Details
Workflow Pod Logs
2025-10-30 20:30:40,348 ERROR [org.jbp.wor.ins.imp.WorkflowProcessInstanceImpl]
Unexpected error while executing node patch in process instance 3010435b-3531-4bef-978f-afbc878534f5:
org.jbpm.workflow.instance.WorkflowRuntimeException: [patch-k8s-resource:3010435b-3531-4bef-978f-afbc878534f5 - patch:[uuid=10]] -- HTTP 403 Forbidden
Caused by: WorkItemExecutionError [errorCode=404]
at org.kie.kogito.serverless.workflow.openapi.OpenApiWorkItemHandler.internalExecute(OpenApiWorkItemHandler.java:76)
Workflow Parameters (from error log)
parameters{
Parameter={
"clusterName":"ocp-edge73-0-prq7c",
"resourceType":"deployment",
"resourceNamespace":"ros-payloads",
"resourceName":"http-client",
"containerName":"client",
"resourceApiVersion":"apis/apps/v1"
},
apiVersion=apis/apps/v1,
kind=deployments,
name=http-client,
namespace=ros-payloads
}
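To make the failing request concrete, the sketch below shows how the logged parameters would map onto a Kubernetes REST path, assuming they are joined in the standard `/{apiVersion}/namespaces/{namespace}/{kind}/{name}` layout for namespaced resources. The helper name `resource_path` is hypothetical, and the log does not confirm how the workflow actually builds its URL. Python is used only for illustration.

```python
def resource_path(api_version: str, kind: str, namespace: str, name: str) -> str:
    """Build the REST path for a namespaced Kubernetes resource.

    Assumes api_version already carries its prefix (e.g. "apis/apps/v1"),
    as in the logged workflow parameters, and that kind is the lowercase
    plural resource name (e.g. "deployments").
    """
    return f"/{api_version.strip('/')}/namespaces/{namespace}/{kind}/{name}"


# Values copied from the workflow parameters in the error log.
path = resource_path("apis/apps/v1", "deployments", "ros-payloads", "http-client")
print(path)  # /apis/apps/v1/namespaces/ros-payloads/deployments/http-client
```

A GET on this path is the same lookup that `oc get deployment http-client -n ros-payloads` performs.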
Cluster Verification
# Check if namespace exists
$ oc get namespace ros-payloads
NAME           STATUS   AGE
ros-payloads   Active   11m
# Check if deployment exists in target namespace
$ oc get deployment http-client -n ros-payloads
Error from server (NotFound): deployments.apps "http-client" not found
# Check if deployment exists anywhere on cluster
$ oc get deployments --all-namespaces | grep http-client
# Result: NOT FOUND
Root Cause Analysis
The issue stems from a data freshness problem between the Cost Management API and the actual cluster state:
- Stale Data Source: Cost Management API returns recommendations based on historical metrics/data that may include resources that no longer exist
- No Validation: The Resource Optimization plugin does not validate resource existence before:
  - Displaying recommendations
  - Allowing users to interact with recommendations
  - Executing workflows
- Workflow Assumes Existence: The `patch-k8s-resource` workflow attempts to patch resources without first verifying they exist
- Poor Error Handling: When resources don't exist, the workflow returns technical HTTP errors (404/403) instead of user-friendly messages
Impact
- Severity: Medium-High
- User Experience: Poor - Users see actionable recommendations that cannot be applied
- Workflow Reliability: Workflow executions fail unexpectedly for valid-seeming recommendations
- Data Quality: The plugin presents non-actionable data as actionable
- Frequency: Depends on how frequently resources are deleted and how often Cost Management API data is refreshed
Recommended Fixes
1. Frontend Validation (Quick Win)
Add client-side validation in the Resource Optimization plugin UI:
- Before displaying the "Apply" button, validate that the resource exists via the Kubernetes API
- If it does not, disable the "Apply" button and show a warning: "This resource no longer exists on the cluster"
- Add a visual indicator (e.g., an icon or warning badge) for potentially stale recommendations
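The button logic described above could look roughly like the following sketch. It is written in Python for illustration only, and the names `ApplyButtonState` and `apply_button_state` are hypothetical. The "could not verify" branch and its wording are an assumption about how an inconclusive check might be handled, not something the plugin does today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApplyButtonState:
    enabled: bool
    warning: Optional[str] = None


def apply_button_state(resource_exists: Optional[bool]) -> ApplyButtonState:
    """Derive the "Apply" button state from an existence check result.

    resource_exists is True/False when the check completed, or None when
    it could not be performed (e.g. the cluster was unreachable).
    """
    if resource_exists is True:
        return ApplyButtonState(enabled=True)
    if resource_exists is False:
        return ApplyButtonState(
            enabled=False,
            warning="This resource no longer exists on the cluster",
        )
    # Assumption: an inconclusive check keeps the action available
    # but flags the uncertainty to the user.
    return ApplyButtonState(
        enabled=True,
        warning="Could not verify that this resource still exists",
    )
```

Keeping the decision in one pure function makes it straightforward to unit-test independently of the Kubernetes lookup that feeds it.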
2. Backend Validation (Preferred)
Add server-side validation in the workflow execution:
- Before attempting PATCH, verify resource exists via Kubernetes API GET request
- Return clear error message if resource doesn't exist: "Resource {namespace}/{workload} not found. This recommendation may be stale."
- Handle 404/403 errors gracefully with user-friendly messages
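As a minimal sketch of the graceful handling suggested above, the function below translates the status codes seen in this bug into messages a user can act on. The function name `friendly_patch_error` is hypothetical, and the 403 and fallback wordings are assumptions; only the 404 message comes from this report.

```python
def friendly_patch_error(status: int, namespace: str, workload: str) -> str:
    """Translate an HTTP status from the Kubernetes API into a readable message."""
    target = f"{namespace}/{workload}"
    if status == 404:
        # Wording proposed in this report.
        return f"Resource {target} not found. This recommendation may be stale."
    if status == 403:
        # Assumed wording: a 403 can hide a missing resource or signal an RBAC gap.
        return (
            f"Access to {target} was denied. The resource may no longer exist, "
            "or the workflow may lack permission to patch it."
        )
    # Assumed fallback for any other failure.
    return f"Patching {target} failed with HTTP {status}."
```

For the failing case in this report, `friendly_patch_error(404, "ros-payloads", "http-client")` yields the stale-recommendation message instead of a raw `WorkItemExecutionError`.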
3. API-Level Filtering (Long-term)
Work with Cost Management API team to:
- Filter out recommendations for resources that no longer exist
- Add "resource_exists" or "resource_status" field to recommendation response
- Implement time-based expiration for recommendations (e.g., expire recommendations for resources deleted more than 7 days ago)
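A rough sketch of how a consumer could use such fields, should they be added. Both `resource_exists` and `deleted_at` are hypothetical fields that the Cost Management API does not return today, and the 7-day window is only the example value from this report.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping, Optional


def drop_missing(recommendations: Iterable[Mapping]) -> List[Mapping]:
    """Keep recommendations whose resource is reported as existing.

    A recommendation without the hypothetical "resource_exists" field is
    kept, so responses that lack the field behave as they do today.
    """
    return [r for r in recommendations if r.get("resource_exists", True)]


def is_expired(
    deleted_at: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
) -> bool:
    """True when the resource was deleted more than max_age before now."""
    return deleted_at is not None and now - deleted_at > max_age


now = datetime(2025, 10, 30, tzinfo=timezone.utc)
recs = [
    {"workload": "http-client", "resource_exists": False},
    {"workload": "api", "resource_exists": True},
]
print([r["workload"] for r in drop_missing(recs)])  # ['api']
```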
4. Workflow Enhancement
Update `patch-k8s-resource` workflow to:
- Add pre-flight validation step to check resource existence
- Handle missing resources gracefully with informative error messages
- Optionally auto-skip/disable recommendations for non-existent resources
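The pre-flight step could follow the control flow sketched below. This is not the workflow's actual definition: `patch_with_preflight` is a hypothetical name, and `get_status` and `patch` are injected stand-ins for the real Kubernetes API calls, each returning an HTTP status code.

```python
from typing import Callable, Dict


def patch_with_preflight(
    get_status: Callable[[], int],
    patch: Callable[[], int],
    namespace: str,
    workload: str,
) -> Dict[str, object]:
    """Run a GET before the PATCH and stop early if the resource is unavailable."""
    target = f"{namespace}/{workload}"
    status = get_status()
    if status != 200:
        # Missing (404) or unreadable (e.g. 403): do not attempt the PATCH.
        return {
            "patched": False,
            "message": f"Skipped {target}: pre-flight GET returned HTTP {status}.",
        }
    patch_status = patch()
    if not 200 <= patch_status < 300:
        # The resource can still disappear between the GET and the PATCH.
        return {
            "patched": False,
            "message": f"PATCH on {target} returned HTTP {patch_status}.",
        }
    return {"patched": True, "message": f"Patched {target}."}
```

The pre-flight check narrows the failure window but does not close it, so the PATCH error path still needs its own handling.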
Workaround
- Before applying recommendations, manually verify resources exist:
  - `oc get deployment {workload} -n {namespace}`
  - `oc get {resourceType} {workload} -n {namespace}`
- If resource doesn't exist, skip that recommendation
- Wait for Cost Management API data refresh cycle for stale recommendations to be filtered out
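The manual check can be scripted when many recommendations need verifying. The sketch below wraps the same `oc get` command; the function names are hypothetical, and it assumes `oc` is on the PATH and already logged in to the target cluster.

```python
import subprocess
from typing import Callable, List


def build_check_command(resource_type: str, workload: str, namespace: str) -> List[str]:
    """Return the `oc get` invocation used in the workaround."""
    return ["oc", "get", resource_type, workload, "-n", namespace]


def resource_exists(
    resource_type: str,
    workload: str,
    namespace: str,
    run: Callable = subprocess.run,
) -> bool:
    """Return True only when `oc get` exits successfully.

    A non-zero exit covers NotFound but also other failures (for example,
    not being logged in), so False means "could not confirm", not
    necessarily "deleted".
    """
    cmd = build_check_command(resource_type, workload, namespace)
    result = run(cmd, capture_output=True, text=True)
    return result.returncode == 0
```

For the example in this report, `resource_exists("deployment", "http-client", "ros-payloads")` would be expected to return False, and that recommendation should be skipped.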
Additional Notes
- This issue highlights a broader data quality/freshness problem between Cost Management API and cluster state
- Similar issues may affect other resource types (StatefulSets, DaemonSets, etc.)
- The workflow failure mode (404/403) may vary depending on:
  - Whether the namespace exists
  - Whether the resource type exists
  - RBAC permissions when checking non-existent resources
- Investigation showed that the API URL configuration issue was a separate problem (the configuration needed to point at the Kubernetes API server, not the Backstage URL)
- Investigation also showed that the Kie Flyway database migration issue was a separate problem (the `kie.flyway.enabled=true` property was missing)
Related Issues
- FLPATH-2832: Cost Management proxy timeout issues
- FLPATH-2833: Missing correlation_instances table (Kie Flyway database migration issue)