Bug
Resolution: Not a Bug
Critical
None
8.1.0.GA-CR4, 8.1.0.Beta
False
False
User Experience
An EAP 8.1.0 Beta + Red Hat Data Grid 8.5.3.GA interoperability test on OpenShift, which validates EAP behavior during a remote RHDG failover, fails intermittently with an assertion error that suggests cache inconsistencies:
java.lang.AssertionError: 1 expectation failed.
JSON path value doesn't match.
Expected: is "10"
  Actual: null
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:73)
    at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:60)
    at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:86)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:57)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:277)
    at io.restassured.internal.ResponseSpecificationImpl$HamcrestAssertionClosure.validate(ResponseSpecificationImpl.groovy:512)
    at io.restassured.internal.ResponseSpecificationImpl$HamcrestAssertionClosure$validate$1.call(Unknown Source)
    at io.restassured.internal.ResponseSpecificationImpl.validateResponseIfRequired(ResponseSpecificationImpl.groovy:696)
    at io.restassured.internal.ResponseSpecificationImpl.this$2$validateResponseIfRequired(ResponseSpecificationImpl.groovy)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:198)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:62)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
    at io.restassured.internal.ResponseSpecificationImpl.body(ResponseSpecificationImpl.groovy:270)
    at io.restassured.specification.ResponseSpecification$body$1.callCurrent(Unknown Source)
    at io.restassured.internal.ResponseSpecificationImpl.body(ResponseSpecificationImpl.groovy:117)
    at io.restassured.internal.ValidatableResponseOptionsImpl.body(ValidatableResponseOptionsImpl.java:244)
    at org.jboss.qa.appsint.tests.eap.rhdg.eap8.session.offload.Eap8WebCacheOffloadedToOperatorRhdgTests.testValue(Eap8WebCacheOffloadedToOperatorRhdgTests.java:262)
    at org.jboss.qa.appsint.tests.eap.rhdg.eap8.session.offload.Eap8WebCacheOffloadedToOperatorRhdgTests.rhdgFailover(Eap8WebCacheOffloadedToOperatorRhdgTests.java:188)
    ...
The deployment is built via the EAP Maven plugin with the cloud-default-config layer plus the web-clustering, ejb, and ejb-dist-cache layers; the ejb-local-cache layer is excluded.
The infinispan subsystem is configured to connect via HotRod:
/socket-binding-group=standard-sockets/remote-destination-outbound-socket-binding=rhdg:add(host=${env.JDG_HOST}, port=${env.JDG_PORT})
/subsystem=infinispan/remote-cache-container=rhdg-container:add(default-remote-cluster=data-grid-cluster)
/subsystem=infinispan/remote-cache-container=rhdg-container/remote-cluster=data-grid-cluster:add(socket-bindings=[rhdg])
/subsystem=infinispan/cache-container=web/invalidation-cache=rhdg-cache:add()
/subsystem=infinispan/cache-container=web/invalidation-cache=rhdg-cache/store=hotrod:add(remote-cache-container=rhdg-container,fetch-state=false,purge=false,passivation=false,shared=true)
/subsystem=infinispan/cache-container=web:write-attribute(name=default-cache,value=rhdg-cache)
/subsystem=infinispan/remote-cache-container=rhdg-container:write-attribute(name=properties, value={infinispan.client.hotrod.auth_realm=default,infinispan.client.hotrod.use_auth=true,infinispan.client.hotrod.auth_username=${env.CACHE_USERNAME},infinispan.client.hotrod.auth_password=${env.CACHE_PASSWORD},infinispan.client.hotrod.auth_server_name=rhdg-host,infinispan.client.hotrod.sasl_properties.javax.security.sasl.qop=auth,infinispan.client.hotrod.sasl_mechanism=SCRAM-SHA-512,infinispan.client.hotrod.sni_host_name=rhdg-host,infinispan.client.hotrod.ssl_hostname_validation=false,infinispan.client.hotrod.trust_store_path=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt,})
The test creates an EAP cluster that offloads its web session cache to an RHDG cluster, then checks that the expected values are still stored in the cache after one RHDG instance is stopped ungracefully and the 3-replica RHDG cluster is scaled down to 2 immediately afterwards.
This is similar to JBEAP-29870, but about an RHDG failover scenario, rather than an EAP one.
The overall configuration (layers + infinispan subsystem) has been validated already by developers, so we're setting this as a blocker for 8.1.0 GA.
Regarding the test logic, here's a source code fragment, enriched with numbered comments to emphasize the most relevant steps:
// 1. start a 2-replica RHDG cluster, then, once it is well-formed, start a 2-replica EAP cluster
setInitialClustersReplicas();
List<Pod> pods = rhdgOpenShiftProvisioner.getPods();

// 2. get a reference to the RHDG pod that will be deleted
Pod podToFail = pods.get(0);
log.debug("The \"{}\" pod will be terminated ungracefully to simulate Infinispan/RHDG failover",
        podToFail.getMetadata().getName());

// 3. store a web session value, which is persisted to the remote Infinispan cache
RequestSpecification session = RestAssured.given().accept(ContentType.JSON)
        .filter(new SessionFilter());
putValue(session, 10);

// 4. as noted in https://issues.redhat.com/browse/JBEAP-29870, we need to sleep before the
// pod deletion, since it is not guaranteed that data was successfully replicated/persisted prior to an abrupt pod
// deletion, which would make the test fail intermittently.
Thread.sleep(PAUSE_TO_ALLOW_DATA_REPLICATION_IN_SECONDS * 1000);
testValue(session, 10);

// 5. scale the RHDG cluster up to 3 replicas
log.debug("Scaling Infinispan/RHDG cluster up to 3 replicas...");
rhdgOpenShiftProvisioner.scale(3, true);

// 6. delete the first RHDG pod
// killing the first pod will cause the RHDG Operator to try and redeploy it
rhdgOpenShiftProvisioner.getOpenShift().deletePod(podToFail);

// 7. scale the RHDG cluster down to 2 replicas immediately after the pod deletion,
// so the operator should:
// a. react to the #0-pod deletion by spinning it up again
// b. once it's ready, react to the scale-down request by deleting the #1-pod
log.debug("Scaling Infinispan/RHDG cluster down to 2 replicas...");
rhdgOpenShiftProvisioner.scale(2, true);

// 8. read the value; this is where the test fails intermittently
testValue(session, 10);
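The fragment above relies on a fixed sleep before the pod deletion and on a single read in its final step. As a diagnostic aid, a polling read could help tell whether the value is only temporarily unavailable during the RHDG rebalance or is lost for good. The following is a minimal sketch under that assumption: PollingCheck and pollUntil are hypothetical names that are not part of the test suite, and the reader in main is a stand-in for the real session read.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of the actual test suite.
public final class PollingCheck {

    private PollingCheck() {
    }

    /**
     * Reads a value repeatedly until it is accepted or the timeout elapses.
     * Returns the last value read, which may still be unaccepted on timeout.
     */
    public static <T> T pollUntil(Supplier<T> reader, Predicate<T> accepted,
                                  Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        T last = reader.get();
        while (!accepted.test(last) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return last;
            }
            last = reader.get();
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated reader: returns null for the first two reads, then "10".
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> reader = () -> calls.incrementAndGet() < 3 ? null : "10";

        String value = pollUntil(reader, "10"::equals,
                Duration.ofSeconds(2), Duration.ofMillis(50));

        // prints "value=10 after 3 reads"
        System.out.println("value=" + value + " after " + calls.get() + " reads");
    }
}
```

If a read wrapped this way eventually returns "10", the failure points to a transient window while the RHDG cluster rebalances; if it still returns null at the timeout, the session data is more likely gone from the remote cache.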
As a final note, both EAP pods have clean logs at the end of the test execution, and so do the 2 remaining RHDG pods.
Feel free to reach out for any additional details.