KOGITO-1405 (project: Kogito)

data-index: race condition when restarting because of new protobuf files


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Done
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.1
    • Component/s: Data Index
    • Labels:
      None
    • Sprint:
      2020 Week 10-12 (from Mar 2), 2020 Week 13-15 (from Mar 23)
    • Steps to Reproduce:

      git clone https://github.com/kiegroup/kogito-cloud-operator
      cd kogito-cloud-operator

      make
      docker tag quay.io/kiegroup/kogito-cloud-operator:0.8.0 quay.io/YOUR_USERNAME/kogito-cloud-operator:0.8.0
      docker push quay.io/YOUR_USERNAME/kogito-cloud-operator:0.8.0

      make run-tests operator_image=quay.io/YOUR_USERNAME/kogito-cloud-operator services_image_version=0.8.0-rc3 build_image_version=0.8.0-rc3 tags="@dataindex && @persistence && @events && @native"


      Description

      I just came across a race condition in our infrastructure when deploying a new application with persistence and data-index.
      It happens mostly when the application is built as a Quarkus native executable.

      Build image version: 0.8.0-rc3
      Services image version: 0.8.0-rc3

      We have a Cucumber test scenario that fails when the application is built in native mode: https://github.com/kiegroup/kogito-cloud-operator/blob/d195873989b8561913bbf5e91066756a3ae870ce/test/features/install_dataindex.feature#L37

      This is what should happen on OpenShift when we launch the test:

      • Install the Kogito Operator and its dependent operators
      • Install data-index with 1 replica (which also brings up Kafka and Infinispan)
      • Build jbpm-quarkus-example in native mode with persistence and events
      • Once built, start the example application and restart data-index with the new protobuf files from the example
      • Send a POST request to the example to start a new orders process instance
      • The example application sends a message to a Kafka topic about the new process instance
      • Data-index consumes the message and stores it in Infinispan
      • We can then retrieve this process instance from data-index
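
      The expected flow can be modelled in a few lines of Python. This is a minimal sketch, not the data-index implementation: it assumes, as the description states, that data-index can only index events for processes whose protobuf files it loaded at startup. The class, method and field names (DataIndex, consume, get_process_instance, processId, instance-1) are hypothetical, a dict stands in for Infinispan and ValueError stands in for the exception data-index throws.

```python
class DataIndex:
    """Toy model of one data-index instance (hypothetical, not the Kogito API)."""

    def __init__(self, proto_processes):
        # Ids of the processes whose protobuf files were loaded at startup.
        self.proto_processes = set(proto_processes)
        # Stands in for the Infinispan cache that stores process instances.
        self.store = {}

    def consume(self, event):
        # Without a protobuf definition for the process the event cannot be
        # indexed; ValueError stands in for the real exception.
        process_id = event["processId"]
        if process_id not in self.proto_processes:
            raise ValueError(f"no protobuf schema for process {process_id!r}")
        self.store[event["id"]] = event

    def get_process_instance(self, instance_id):
        # Returns None when nothing was stored for this id.
        return self.store.get(instance_id)


# Data-index restarted with the protobuf files of the example application.
index = DataIndex(proto_processes={"orders"})

# Event the example publishes for the new orders process instance.
event = {"id": "instance-1", "processId": "orders"}

# Data-index consumes the event and stores it; the instance can then be retrieved.
index.consume(event)
assert index.get_process_instance("instance-1") == event

# An instance started before the example's protobuf files were added
# rejects the same event and stores nothing.
old_index = DataIndex(proto_processes=set())
try:
    old_index.consume(event)
except ValueError as err:
    print(err)  # no protobuf schema for process 'orders'
```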

      What happens is that while data-index is being restarted with the new protobuf files (a new instance is started, activated once it is running, and only then is the old one terminated), the application has already started (which is very fast with Quarkus native), sent a message to Kafka, and the "old" data-index has consumed it.
      As the old instance has no protobuf files for this process, it throws an exception and stores nothing.
      Then the "new" data-index is running and we try to retrieve the process instance information, and guess what: there is nothing, because nothing was stored due to the error.
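
      The timing side of the race can be sketched as a simple timeline. This is a hypothetical model, not Kogito code: it assumes the old instance remains the only consumer until the new one is ready, and that a message the old instance fails to process is not redelivered to the new one. The names RollingRestart and message_is_lost are made up for illustration, and the 20 s readiness delay and the publish times are illustrative values, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RollingRestart:
    """Timeline of a data-index restart (hypothetical model, times in seconds)."""

    new_started_at: float  # new instance created with the new protobuf files
    new_ready_at: float    # new instance is running and becomes the active consumer

    def consumer_at(self, t: float) -> str:
        # Assumption: the old instance keeps consuming until the new one is
        # ready; only then is the old one terminated.
        return "new" if t >= self.new_ready_at else "old"

    def race_window(self) -> float:
        # Length of the interval after the restart begins during which
        # messages still reach the old instance.
        return self.new_ready_at - self.new_started_at


def message_is_lost(restart: RollingRestart, published_at: float) -> bool:
    # The old instance has no protobuf files for the new process, so it
    # throws and stores nothing; assumption: the message is not redelivered.
    return restart.consumer_at(published_at) == "old"


restart = RollingRestart(new_started_at=0.0, new_ready_at=20.0)
print(restart.race_window())                       # 20.0
print(message_is_lost(restart, published_at=5.0))   # True: fast native start
print(message_is_lost(restart, published_at=25.0))  # False: published after switch-over
```

      Under these assumptions, any application that publishes its first event before the new data-index is ready loses it, which fits the observation that the fast-starting native build hits the problem most often.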

    People

    • Assignee: Cristiano Nicolai (cnicolai)
    • Reporter: Tristan Radisson (tradisso)
    • Tester: Tristan Radisson
    • Votes: 0
    • Watchers: 1