  1. FlightPath
  2. FLPATH-2833

Missing correlation_instances table - patch-k8s-resource workflow database migration failure


    • Type: Bug
    • Resolution: Done
    • optimization-plugin

      The patch-k8s-resource workflow fails when applying resource optimization recommendations, throwing PSQLException: ERROR: relation "correlation_instances" does not exist. The workflow schema contains only the flyway_schema_history table; the required Kogito persistence tables are missing because Kie Flyway initialization is disabled.

      Description

      When executing the resource optimization workflow (patch-k8s-resource), the workflow attempts to reconnect a process instance and queries the correlation_instances table. This table does not exist in the patch-k8s-resource schema, causing the workflow execution to fail with a database error.

      The workflow uses dbMigrationStrategy: service which relies on Flyway for schema management. While Flyway creates the flyway_schema_history table successfully, Kie Flyway (Kogito's database migration tool) is disabled during startup, preventing the creation of required Kogito persistence tables:

      • correlation_instances
      • process_instances
      • business_key_mapping
      • kie_flyway_history_runtime_persistence

      Environment

      • RHDH Version: 1.8 STABLE-RC
      • Orchestrator Plugin Version: 1.8.0-rc.3
      • Resource Optimization Plugin Version: 1.2.1
      • Workflow: patch-k8s-resource (latest from quay.io/orchestrator/serverless-workflow-patch-k8s-resource:latest)
      • Database: PostgreSQL 15.14
      • Database Name: backstage_plugin_orchestrator
      • Database Schema: patch-k8s-resource
      • Migration Strategy: service (dbMigrationStrategy: service)
      • Cluster: OpenShift (tested on ocp-edge73-0)
      • Namespace: rhdh-operator

      Steps to Reproduce

      1. Deploy RHDH 1.8 STABLE-RC with orchestrator plugins
      2. Deploy resource optimization plugin with valid ROS_CLIENT_ID and ROS_CLIENT_SECRET
      3. Deploy patch-k8s-resource workflow using deploy-resource-optimization.sh script
      4. Wait for workflow pod to be ready
      5. Navigate to Optimizations tab in Backstage UI
      6. Wait for optimization data to load
      7. Click "Apply" on a resource optimization recommendation
      8. Result: Workflow execution fails with database error

      API Calls When Clicking "Apply"

      When clicking "Apply" on a resource optimization recommendation, two API calls are made:

      API Call #1: Workflow Execute (POST)

      Endpoint: POST /api/orchestrator/v2/workflows/patch-k8s-resource/execute

      Request Body:

      {
        "inputData": {
          "clusterName": "ocp-edge73-0-prq7c",
          "resourceType": "deployment",
          "resourceNamespace": "ros-payloads",
          "resourceName": "http-client",
          "containerName": "client",
          "containerResources": {
            "limits": {"memory": 11167334},
            "requests": {"memory": 11167334}
          }
        }
      }
      

      HTTP Response: 500 Internal Server Error

      Error Flow:

      1. Orchestrator backend proxies request to workflow pod
      2. Workflow pod attempts to create process instance
      3. During creation, AbstractProcessInstance.reconnect() is called
      4. reconnect() queries correlation_instances table
      5. Table doesn't exist → PSQLException
      6. Error propagates as UnhandledException → HTTP 500

      Result: No workflow instance ID returned, request fails completely
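
      To reproduce API call #1 outside the Backstage UI, the request can be rebuilt by hand. The following is a minimal sketch, not the plugin's own code: BASE_URL is a hypothetical placeholder, the bearer-token header is an assumption about how the backend is secured, and build_execute_request is an illustrative helper name. Only the endpoint path and the request body come from this report.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the Backstage backend address of the cluster.
BASE_URL = "https://backstage.example.com"

# Request body taken from this report.
INPUT_DATA = {
    "clusterName": "ocp-edge73-0-prq7c",
    "resourceType": "deployment",
    "resourceNamespace": "ros-payloads",
    "resourceName": "http-client",
    "containerName": "client",
    "containerResources": {
        "limits": {"memory": 11167334},
        "requests": {"memory": 11167334},
    },
}


def build_execute_request(base_url, workflow_id, input_data, token=None):
    """Build (but do not send) the POST request that executes a workflow."""
    url = f"{base_url}/api/orchestrator/v2/workflows/{workflow_id}/execute"
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the backend accepts a bearer token for authentication.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"inputData": input_data}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_execute_request(BASE_URL, "patch-k8s-resource", INPUT_DATA)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

      Keeping the request construction separate from sending makes it possible to inspect the URL and payload before hitting a cluster where the call is known to return HTTP 500.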

      API Call #2: Workflow Instance Status (GET)

      Endpoint: GET /api/orchestrator/v2/workflows/instances/undefined

      HTTP Response: 400/404 (invalid instance ID)

      Error Flow:

      1. Frontend expects instance ID from first call
      2. First call failed, so instance ID is undefined
      3. Frontend tries to poll with undefined as instance ID
      4. Backend rejects invalid instance ID

      Root Cause: Cascading failure from API call #1
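
      The cascade described above comes from polling with an instance ID that was never returned. The sketch below illustrates, in Python rather than the frontend's own language, one way such a guard could look. The "id" response field and the status_path helper are assumptions made for illustration; the actual response shape of the execute endpoint is not shown in this report.

```python
from typing import Optional


def status_path(execute_response: Optional[dict]) -> Optional[str]:
    """Return the status path for a started instance, or None if no usable ID exists."""
    # Assumption: a successful execute response carries the instance ID under "id".
    instance_id = (execute_response or {}).get("id")
    if not isinstance(instance_id, str) or not instance_id or instance_id == "undefined":
        # No instance was created, so there is nothing to poll.
        return None
    return f"/api/orchestrator/v2/workflows/instances/{instance_id}"
```

      With a guard of this kind, a failed execute call would surface only the original HTTP 500 instead of an additional 400/404 from the status endpoint.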

      Key Findings

      • First API call fails with HTTP 500 due to missing correlation_instances table
      • No instance is created, so no valid instance ID exists
      • Second API call fails because frontend uses undefined instance ID
      • Both failures are symptoms of the same root cause: missing database tables from disabled Kie Flyway initialization

      Expected Behavior

      The workflow should execute successfully, applying the resource optimization recommendation to the target Kubernetes resource. The database schema should contain all required Kogito persistence tables:

      • correlation_instances
      • process_instances
      • business_key_mapping
      • kie_flyway_history_runtime_persistence

      Actual Behavior

      Workflow execution fails with:

      org.postgresql.util.PSQLException: ERROR: relation "correlation_instances" does not exist
      Position: 49
      

      The error occurs when the workflow attempts to reconnect a process instance:

      • JDBCCorrelationRepository.findByCorrelatedId() queries the correlation_instances table
      • AbstractProcessInstance.reconnect() is called during workflow execution
      • The table does not exist, causing the query to fail

      Error Logs

      2025-10-30 18:22:47,265 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-96) HTTP Request to /patch-k8s-resource failed, error id: d3ba99b9-62c2-4d2c-a95e-78fe789a2fad-1: org.jboss.resteasy.spi.UnhandledException: java.lang.RuntimeException: org.postgresql.util.PSQLException: ERROR: relation "correlation_instances" does not exist
        Position: 49
      
      Caused by: java.lang.RuntimeException: org.postgresql.util.PSQLException: ERROR: relation "correlation_instances" does not exist
        Position: 49
              at org.kie.kogito.persistence.jdbc.correlation.JDBCCorrelationRepository.findByCorrelatedId(JDBCCorrelationRepository.java:108)
              at org.kie.kogito.persistence.jdbc.correlation.JDBCCorrelationService.findByCorrelatedId(JDBCCorrelationService.java:55)
              at org.kie.kogito.process.impl.AbstractProcessInstance.reconnect(AbstractProcessInstance.java:220)
              at org.kie.kogito.process.impl.AbstractProcessInstance.internalLoadProcessInstanceState(AbstractProcessInstance.java:202)
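
      When triaging similar pods, the missing relation can be pulled out of the log text automatically. This is a small sketch that assumes the PostgreSQL error message keeps the wording shown above; missing_relations is an illustrative helper, not part of any Kogito or Quarkus tooling.

```python
import re

# Matches the PostgreSQL message format seen in the error log above.
MISSING_RELATION = re.compile(r'ERROR: relation "([^"]+)" does not exist')


def missing_relations(log_text):
    """Return the distinct relation names reported missing, in first-seen order."""
    seen = []
    for name in MISSING_RELATION.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "Caused by: java.lang.RuntimeException: org.postgresql.util.PSQLException: "
    'ERROR: relation "correlation_instances" does not exist'
)
```

      Calling missing_relations on the full pod log collapses the repeated occurrences of the same error into a single entry per table.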
      

      Startup Logs

      2025-10-30 16:35:15,257 INFO  [org.fly.cor.FlywayExecutor] (main) Database: jdbc:postgresql://backstage-psql-backstage.rhdh-operator:5432/backstage_plugin_orchestrator?currentSchema=patch-k8s-resource (PostgreSQL 15.14)
      2025-10-30 16:35:15,290 INFO  [org.fly.cor.int.dat.bas.Schema] (main) Creating schema "patch-k8s-resource" ...
      2025-10-30 16:35:15,301 INFO  [org.fly.cor.int.sch.JdbcTableSchemaHistory] (main) Creating Schema History table "patch-k8s-resource"."flyway_schema_history" ...
      2025-10-30 16:35:15,440 INFO  [org.fly.cor.int.com.DbMigrate] (main) Current version of schema "patch-k8s-resource": null
      2025-10-30 16:35:15,443 INFO  [org.fly.cor.int.com.DbMigrate] (main) Schema "patch-k8s-resource" is up to date. No migration necessary.
      2025-10-30 16:35:15,455 WARN  [org.kie.fly.int.KieFlywayRunner] (main) Kie Flyway is disabled, skipping initialization.
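
      The KieFlywayRunner warning in the last line is the earliest visible signal of the problem, so it can be checked right after pod startup. The sketch below assumes the warning text stays exactly as logged above; kie_flyway_disabled is an illustrative helper name.

```python
def kie_flyway_disabled(startup_log: str) -> bool:
    """Return True if the startup log contains the 'Kie Flyway is disabled' warning."""
    return any(
        "KieFlywayRunner" in line and "Kie Flyway is disabled" in line
        for line in startup_log.splitlines()
    )


startup_sample = (
    "2025-10-30 16:35:15,443 INFO  [org.fly.cor.int.com.DbMigrate] (main) "
    'Schema "patch-k8s-resource" is up to date. No migration necessary.\n'
    "2025-10-30 16:35:15,455 WARN  [org.kie.fly.int.KieFlywayRunner] (main) "
    "Kie Flyway is disabled, skipping initialization."
)
```

      A True result for a workflow pod suggests checking its schema for the Kogito persistence tables before any execution is attempted.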
      

      Root Cause Analysis

      1. Flyway vs Kie Flyway: The workflow uses two migration mechanisms:
        • Standard Flyway: Creates flyway_schema_history table (works correctly)
        • Kie Flyway: Should create Kogito persistence tables (disabled/failing)
      2. Migration Strategy: The workflow uses dbMigrationStrategy: service, which indicates schema migrations should be handled by a migration service. However, Kie Flyway should still run to initialize Kogito-specific tables.
      3. Disabled Kie Flyway: The log shows "Kie Flyway is disabled, skipping initialization", which prevents creation of required tables:
        • correlation_instances
        • process_instances
        • business_key_mapping
        • kie_flyway_history_runtime_persistence
      4. Comparison with Working Workflow: The greeting workflow (same migration strategy) successfully creates all required tables. This suggests:
        • The workflow image or configuration for patch-k8s-resource may have Kie Flyway disabled
        • There may be a configuration difference between working and failing workflows
        • The migration service may not be handling the patch-k8s-resource workflow correctly

      Database Schema Comparison

      • patch-k8s-resource schema (FAILING): Only contains flyway_schema_history. Missing: correlation_instances, process_instances, business_key_mapping, kie_flyway_history_runtime_persistence
      • greeting schema (WORKING): Contains all required tables: flyway_schema_history, correlation_instances, process_instances, business_key_mapping, kie_flyway_history_runtime_persistence
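
      The comparison above can be scripted so that any workflow schema is checked the same way. This is a sketch under assumptions: TABLES_QUERY uses the standard information_schema.tables view with a %s placeholder in the style of common Python PostgreSQL drivers, and running it against the database is left to whichever client is available. The missing_tables helper is illustrative.

```python
# Kogito persistence tables this report lists as required.
REQUIRED_TABLES = (
    "correlation_instances",
    "process_instances",
    "business_key_mapping",
    "kie_flyway_history_runtime_persistence",
)

# Lists the tables of one schema; the placeholder style depends on the driver used.
TABLES_QUERY = (
    "SELECT table_name FROM information_schema.tables WHERE table_schema = %s"
)


def missing_tables(existing):
    """Return the required tables that are absent from the given table names."""
    present = set(existing)
    return [table for table in REQUIRED_TABLES if table not in present]


# Table lists as reported for the two schemas in this issue.
patch_schema = ["flyway_schema_history"]
greeting_schema = ["flyway_schema_history", *REQUIRED_TABLES]
```

      For the two schemas described here, missing_tables(patch_schema) returns all four required tables, while missing_tables(greeting_schema) returns an empty list.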

      Workflow Configuration

      spec: 
        persistence: 
          dbMigrationStrategy: service
          postgresql: 
            secretRef: 
              name: backstage-psql-secret-backstage
              passwordKey: POSTGRES_PASSWORD
              userKey: POSTGRES_USER
            serviceRef: 
              databaseName: backstage_plugin_orchestrator
              databaseSchema: patch-k8s-resource
              name: backstage-psql-backstage
              port: 5432
      

      Workaround (attempted; did not resolve the overall issue of the workflow not being triggered)

      1. Manually create the missing tables by copying the schema from a working workflow (e.g., greeting)
      2. Or change dbMigrationStrategy from service to automated to force local migration execution
      3. Or enable Kie Flyway explicitly in the workflow configuration/environment

      Impact

      • Severity: High
      • User Experience: Broken - Users cannot apply resource optimization recommendations
      • Frequency: Consistent - Affects all attempts to execute the patch-k8s-resource workflow
      • Workaround Available: Attempted (manual table creation or configuration changes), but it did not resolve the issue

      Additional Notes

      • The standard Flyway migration works correctly (creates flyway_schema_history)
      • The issue is specific to Kie Flyway (Kogito persistence tables)
      • Other workflows using the same migration strategy (greeting) work correctly
      • The workflow image: quay.io/orchestrator/serverless-workflow-patch-k8s-resource:latest
      • Features listed as installed in the pod log: kie-flyway, kie-addon-persistence-jdbc-extension
      • However, Kie Flyway is still disabled at runtime

      Technical Details

      • Error Location: org.kie.kogito.persistence.jdbc.correlation.JDBCCorrelationRepository.findByCorrelatedId() at line 108
      • Migration Tool: Flyway (standard) + Kie Flyway (Kogito-specific)
      • Database: PostgreSQL 15.14
      • Schema: patch-k8s-resource (PostgreSQL schema, not database)
      • Required Tables: correlation_instances, process_instances, business_key_mapping, kie_flyway_history_runtime_persistence
      • Workflow Endpoint: /patch-k8s-resource
      • Migration Strategy: service (expects migration service to handle schema)

              ydayagi yaron dayagi
              gharden1 Gary Harden