FlightPath / FLPATH-3188

data-index-service enters crash loop after receiving workflow events due to version mismatch


      The sonataflow-platform-data-index-service enters a crash loop after receiving process instance events from the patch-k8s-resource workflow. The data-index service cannot deserialize the events, which causes its health checks to fail and Kubernetes to restart it repeatedly.

      Root cause

      There is a version mismatch between the workflow image and the data-index service image:

      • patch-k8s-resource workflow: serverless-workflow-project 1.0.0-SNAPSHOT powered by Quarkus 3.20.1
      • data-index-service: data-index-service-postgresql 9.103.0.redhat-00004 powered by Quarkus 3.15.4.redhat-00001

      The newer workflow produces event types that the older data-index service cannot recognize.
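The mismatch can be checked mechanically from the two banner strings quoted above. The following is a minimal sketch, not part of the product: the `quarkus_stream` helper and the regular expression are hypothetical, and treating a differing Quarkus major.minor stream as a warning sign is an assumption made for illustration, not a documented compatibility rule.

```python
import re

# Matches the "<artifact> <version> powered by Quarkus <quarkus-version>" shape
# of the two banner strings quoted in this ticket (assumed format).
BANNER = re.compile(
    r"^(?P<artifact>\S+) (?P<version>\S+) powered by Quarkus (?P<quarkus>\S+)$"
)


def quarkus_stream(banner: str) -> tuple[int, int]:
    """Return the (major, minor) Quarkus version named in a banner string."""
    match = BANNER.match(banner.strip())
    if match is None:
        raise ValueError(f"unrecognized banner: {banner!r}")
    major, minor = match.group("quarkus").split(".")[:2]
    return int(major), int(minor)


# Banner strings copied from the root-cause list above.
workflow = "serverless-workflow-project 1.0.0-SNAPSHOT powered by Quarkus 3.20.1"
data_index = (
    "data-index-service-postgresql 9.103.0.redhat-00004 "
    "powered by Quarkus 3.15.4.redhat-00001"
)

workflow_stream = quarkus_stream(workflow)
data_index_stream = quarkus_stream(data_index)

# Heuristic only: differing streams suggest the images were built from
# different platform releases and deserve a closer look.
mismatch = workflow_stream != data_index_stream
print(f"workflow={workflow_stream} data-index={data_index_stream} mismatch={mismatch}")
```

A check like this could run before deployment against the banners each image prints at startup, so the mismatch is caught before the first workflow event arrives.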

      Error from data-index logs

      ERROR [io.sma.rea.mes.provider] (vert.x-eventloop-thread-1) SRMSG00201: Error caught while processing a message 
      in method org.kie.kogito.index.service.messaging.BlockingMessagingEventConsumer#onProcessInstanceEvent: 
      java.lang.UnsupportedOperationException: Unrecognized event type 
          at org.kie.kogito.event.serializer.MultipleProcessInstanceDataEventDeserializer.getCloudEvent(...)
      

      Crash loop sequence

      1. The patch-k8s-resource workflow executes and completes the patch operation successfully
      2. The workflow publishes a process instance event to the data-index service via HTTP
      3. The data-index service receives the event but fails to deserialize it, throwing java.lang.UnsupportedOperationException: Unrecognized event type
      4. The reactive messaging consumer (BlockingMessagingEventConsumer#onProcessInstanceEvent) is marked as failed
      5. Both liveness and readiness health checks report DOWN:
        {"status":"DOWN","checks":[{"name":"SmallRye Reactive Messaging - liveness check","status":"DOWN",
        "data":{"application-...#onProcessInstanceEvent":"[KO] - Unrecognized event type "}}]}
        
      6. Kubernetes kills the pod because the liveness probe fails
      7. The pod restarts and the cycle repeats on the next workflow event
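To confirm that the failed consumer is what drives the probe to DOWN (steps 4 and 5), the health payload can be parsed for channels marked `[KO]`. This is a minimal sketch: the payload is the one quoted in step 5, with the elided channel name left exactly as it appears, and `failed_channels` is a hypothetical helper name. It needs Python 3.9 or later for `str.removeprefix`.

```python
import json

# Health payload as quoted in step 5; the elided channel name is kept as-is.
payload = (
    '{"status":"DOWN","checks":[{"name":"SmallRye Reactive Messaging - liveness check",'
    '"status":"DOWN","data":{"application-...#onProcessInstanceEvent":'
    '"[KO] - Unrecognized event type "}}]}'
)


def failed_channels(raw: str) -> list[tuple[str, str, str]]:
    """List (check name, channel, reason) for every channel reported as [KO]."""
    report = json.loads(raw)
    failures = []
    for check in report.get("checks", []):
        if check.get("status") != "DOWN":
            continue
        for channel, state in check.get("data", {}).items():
            text = str(state)
            if text.startswith("[KO]"):
                # Drop the "[KO] - " marker and surrounding whitespace.
                reason = text.removeprefix("[KO]").strip(" -")
                failures.append((check["name"], channel, reason))
    return failures


for name, channel, reason in failed_channels(payload):
    print(f"{name}: {channel} -> {reason}")
```

Because the liveness check itself carries the deserialization error, a single undeserializable event is enough to make the probe fail, which matches the restart in step 6.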

      Reproduction steps

      1. Deploy RHDH with Orchestrator and Resource Optimization plugin
      2. Trigger the patch-k8s-resource workflow (e.g., via Apply Recommendations in Cost Management > Optimizations)
      3. Observe the data-index-service pod entering a crash loop
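Step 3 can be made less manual by inspecting the pod's status, for example from the JSON printed by `kubectl get pod <name> -o json`. The sketch below is illustrative only: `crash_looping_containers`, the restart threshold of 3, and the sample pod (including its name and restart count) are hypothetical and not taken from the affected cluster. It relies on the standard `status.containerStatuses` fields of a Kubernetes Pod object.

```python
def crash_looping_containers(pod: dict, restart_threshold: int = 3) -> list[str]:
    """Return names of containers that look like they are crash looping.

    A container is flagged when it is waiting with reason CrashLoopBackOff
    or when its restart count reaches the (assumed) threshold.
    """
    flagged = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting") or {}
        in_backoff = waiting.get("reason") == "CrashLoopBackOff"
        many_restarts = status.get("restartCount", 0) >= restart_threshold
        if in_backoff or many_restarts:
            flagged.append(status["name"])
    return flagged


# Hypothetical sample shaped like a Pod object; values are made up.
sample_pod = {
    "metadata": {
        "name": "sonataflow-platform-data-index-service-example",
        "namespace": "rhdh-operator",
    },
    "status": {
        "containerStatuses": [
            {
                "name": "data-index-service",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            }
        ]
    },
}

print(crash_looping_containers(sample_pod))
```

Checking both the waiting reason and the restart count is a deliberate choice: between restarts the container briefly reports as running, so the restart count is the more stable signal.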

      Expected behavior

      The data-index service should be compatible with the workflow images shipped alongside it and should be able to deserialize all event types produced by those workflows.

      Environment

      • OpenShift 4.x
      • RHDH Operator deployed
      • SonataFlow Platform with data-index and workflow pods in rhdh-operator namespace

              ydayagi yaron dayagi
              gharden1 Gary Harden