  1. Red Hat Data Grid
  2. JDG-6281

[Operator] XSite Cache CRs can falsely report Ready=True when XSite is not enabled


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Operator

      Issue only occurs when the ConfigListener is enabled.

      Given an Infinispan CR that does not define XSite and the Cache CR below with XSite backups configured, the expected behaviour is that, upon Cache creation, the Cache CR has its status set to Ready=False with a message containing error ISPN000571.

      apiVersion: infinispan.org/v2alpha1
      kind: Cache
      metadata: 
        name: cache-xsite
      spec: 
        clusterName: test-cache-cr
        name: cache-xsite
        template: |-
          <distributed-cache name="eu-customers" mode="ASYNC" statistics="true">
            <encoding media-type="application/x-protostream"/>
            <state-transfer enabled="false"/>
            <memory>
              <binary eviction="MEMORY" size="400000000"/> <!-- 400 MB -->
            </memory>
            <expiration lifespan="600000"/> <!-- 10 min -->
            <backups>
              <backup site="SiteA" strategy="ASYNC">
                <state-transfer chunk-size="600"
                                timeout="2400000"
                                max-retries="30"
                                wait-time="2000"
                                mode="AUTO"/>
                <take-offline after-failures="5" min-wait="10000"/>
              </backup>
            </backups>
          </distributed-cache>
        updates: 
          strategy: retain
      

      The current behaviour is that, upon initial Cache creation on the server, the Cache CR correctly reports the failure in its status:

      apiVersion: infinispan.org/v2alpha1
      kind: Cache
      metadata: 
        annotations: 
          infinispan.org/listener-generation: "2"
        creationTimestamp: "2023-06-02T08:19:15Z"
        finalizers: 
        - finalizer.infinispan.org
        generation: 2
        labels: 
          test-name: TestCacheCR
        name: cache-xsite
        namespace: namespace-for-testing
        resourceVersion: "10226"
        uid: 36080a6e-432f-4f85-8f6b-a695148991a3
      spec: 
        clusterName: test-cache-cr
        name: cache-xsite
        template: "distributedCache: \n  mode: \"ASYNC\"\n  statistics: \"true\"\n  backups: 
          \n    SiteA: \n      backup: \n        strategy: \"ASYNC\"\n        stateTransfer: 
          \n          chunkSize: \"600\"\n          timeout: \"2400000\"\n          maxRetries: 
          \"30\"\n          waitTime: \"2000\"\n          mode: \"AUTO\"\n        takeOffline: 
          \n          afterFailures: \"5\"\n          minWait: \"10000\"\n  encoding: \n
          \   mediaType: \"application/x-protostream\"\n  expiration: \n    lifespan: \"600000\"\n
          \ memory: \n    storage: \"BINARY\"\n    maxSize: \"400000000\"\n  stateTransfer: 
          \n    enabled: \"false\"\n"
        updates: 
          strategy: retain
      status: 
        conditions: 
        - message: 'unable to create cache with template: unexpected HTTP status code (400):
            unexpected error creating cache, response: ISPN000571: RELAY2 not found in the
            protocol stack. Cannot perform cross-site operations.'
          status: "False"
          type: Ready
      

      However, the ConfigListener then receives a Cache CREATED event from the server, which causes the Cache CR to be reconfigured and its status to be reset to Ready=True.

      apiVersion: infinispan.org/v2alpha1
      kind: Cache
      metadata: 
        annotations: 
          infinispan.org/listener-generation: "2"
        creationTimestamp: "2023-06-02T08:19:15Z"
        finalizers: 
        - finalizer.infinispan.org
        generation: 2
        labels: 
          test-name: TestCacheCR
        name: cache-xsite
        namespace: namespace-for-testing
        resourceVersion: "10227"
        uid: 36080a6e-432f-4f85-8f6b-a695148991a3
      spec: 
        clusterName: test-cache-cr
        name: cache-xsite
        template: "distributedCache: \n  mode: \"ASYNC\"\n  statistics: \"true\"\n  backups: 
          \n    SiteA: \n      backup: \n        strategy: \"ASYNC\"\n        stateTransfer: 
          \n          chunkSize: \"600\"\n          timeout: \"2400000\"\n          maxRetries: 
          \"30\"\n          waitTime: \"2000\"\n          mode: \"AUTO\"\n        takeOffline: 
          \n          afterFailures: \"5\"\n          minWait: \"10000\"\n  encoding: \n
          \   mediaType: \"application/x-protostream\"\n  expiration: \n    lifespan: \"600000\"\n
          \ memory: \n    storage: \"BINARY\"\n    maxSize: \"400000000\"\n  stateTransfer: 
          \n    enabled: \"false\"\n"
        updates: 
          strategy: retain
      status: 
        conditions: 
        - status: "True"
          type: Ready
      

      The reason for this is that the cache gets created on the server and only fails on startup, so a Cache CREATED event is still triggered. The REST endpoint returns the error as expected, and the Operator Cache CR controller correctly reports that the Cache failed and sets Ready=False. However, the ConfigListener then receives the Cache CREATED event via the ContainerResource#ConfigurationListener and sets Ready=True.

      We should add an additional check to the Cache controller to inspect the health of a cache before setting Ready=True.
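
      A minimal sketch of such a check, written in Go under stated assumptions. It is not the Operator's actual code: the HealthChecker interface, the ReadyCondition function, the staticHealth test double and the health state names are all hypothetical and only illustrate gating Ready=True on the cache's reported health.

```go
package main

import "fmt"

// CacheHealth is an assumed set of health states reported for a cache.
// The names are illustrative and not taken from the issue.
type CacheHealth string

const (
	Healthy  CacheHealth = "HEALTHY"
	Degraded CacheHealth = "DEGRADED"
	Failed   CacheHealth = "FAILED"
)

// Condition mirrors the shape of the Ready condition in the Cache CR status.
type Condition struct {
	Type    string
	Status  string
	Message string
}

// HealthChecker is a hypothetical abstraction over however the controller
// queries the server for the health of a single cache.
type HealthChecker interface {
	CacheHealth(cacheName string) (CacheHealth, error)
}

// ReadyCondition returns Ready=True only when the cache is reported healthy.
// A lookup error or any other health state results in Ready=False, so a
// CREATED event for a cache that failed on startup cannot reset the status.
func ReadyCondition(hc HealthChecker, cacheName string) Condition {
	health, err := hc.CacheHealth(cacheName)
	if err != nil {
		return Condition{
			Type:    "Ready",
			Status:  "False",
			Message: fmt.Sprintf("unable to retrieve health of cache %q: %v", cacheName, err),
		}
	}
	if health != Healthy {
		return Condition{
			Type:    "Ready",
			Status:  "False",
			Message: fmt.Sprintf("cache %q reported health %s", cacheName, health),
		}
	}
	return Condition{Type: "Ready", Status: "True"}
}

// staticHealth is a test double that returns a fixed health per cache name.
type staticHealth map[string]CacheHealth

func (s staticHealth) CacheHealth(cacheName string) (CacheHealth, error) {
	h, ok := s[cacheName]
	if !ok {
		return "", fmt.Errorf("cache %q not found", cacheName)
	}
	return h, nil
}
```

      With this shape, the ConfigListener path would call ReadyCondition instead of setting Ready=True unconditionally, so a cache such as cache-xsite that fails on startup would keep Ready=False.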

              remerson@redhat.com Ryan Emerson