Type: Bug
Resolution: Done
Priority: Blocker
Fix Version: 5.2.0.Final
Labels: None
Tomas noticed this a while ago in a specific functional test:
https://bugzilla.redhat.com/show_bug.cgi?id=875151
I'm creating a more general JIRA, because I'm hitting this in a resilience test.
What I found with a quick debug session is that here:
for (segmentIdx <- 0 until numSegments) {
   val denormalizedSegmentHashIds = allDenormalizedHashIds(segmentIdx)
   val segmentOwners = ch.locateOwnersForSegment(segmentIdx)
   for (ownerIdx <- 0 until segmentOwners.length) {
      val address = segmentOwners(ownerIdx % segmentOwners.size)
      val serverAddress = members(address)
      val hashId = denormalizedSegmentHashIds(ownerIdx)
      log.tracef("Writing hash id %d for %s:%s", hashId, serverAddress.host, serverAddress.port)
      writeString(serverAddress.host, buf)
      writeUnsignedShort(serverAddress.port, buf)
      buf.writeInt(hashId)
   }
}
we're trying to obtain a serverAddress for a nonexistent address, and the resulting NoSuchElementException is not handled properly.
It happens after I kill a node in the resilience test; the exception appears when querying for the killed node in the members cache.
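For illustration only, a minimal sketch (not the actual fix) of how the lookup in the snippet above could be guarded; it reuses the names from that loop and assumes members is a Map[Address, ServerAddress]:

// Hypothetical guard: skip an owner that has already dropped out of the
// members cache instead of letting members(address) throw
// NoSuchElementException.
members.get(address) match {
   case Some(serverAddress) =>
      log.tracef("Writing hash id %d for %s:%s", hashId, serverAddress.host, serverAddress.port)
      writeString(serverAddress.host, buf)
      writeUnsignedShort(serverAddress.port, buf)
      buf.writeInt(hashId)
   case None =>
      log.tracef("Owner %s is not in the members cache, skipping", address)
}

Whether skipping is actually the right behaviour is a separate question (the client would then see fewer hash ids for that segment), so treat this purely as a sketch of the failure point.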
[ISPN-2550] NoSuchElementException in Hot Rod Encoder
Michal Linhard <mlinhard@redhat.com> changed the Status of bug 886565 from ON_QA to VERIFIED
Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from MODIFIED to ON_QA
Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from ASSIGNED to MODIFIED
Sorry Michal, I didn't refresh the JIRA page before posting my comment.
I'm glad the fix works, I'll try to get a unit test working as well before issuing a PR though.
I've reduced the number of entries in the cache during the test to 5000 1kB entries and got a clean resilience test run:
http://www.qa.jboss.com/~mlinhard/hyperion3/run0013/report/stats-throughput.png
only expected exceptions:
http://www.qa.jboss.com/~mlinhard/hyperion3/run0013/report/loganalysis/server/
There is still a problem with uneven request balancing (ISPN-2632) and with the whole system blocking after a join when there's more data (5% of heap filled), but it doesn't have to be related to the issues we're discussing here.
Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from NEW to ASSIGNED
As I said, ISPN-2642 didn't appear, so it seems to be fixed. I'm now investigating other problems I have with that test run.
@Galder, could we modify the JIRA subject to say this one happens during leave and the other happens during join then?
@Michal, commit https://github.com/danberindei/infinispan/commit/754b9de995221075e14bba7fa459e597bdb16287 should fix ISPN-2642 as well, have you tested it?
I've patched JDG 6.1.0.ER5 by replacing the infinispan-core and infinispan-server-hotrod jars with ones built from Dan's branch
and ran resilience tests in hyperion:
http://www.qa.jboss.com/~mlinhard/hyperion3/run0011/report/stats-throughput.png
The issues ISPN-2550 and ISPN-2642 didn't appear, but the run still wasn't OK. After the rejoin of the killed node0002, all operations were blocked for more than 5 minutes - i.e. zero throughput in the last stage of the test. I'm investigating what happened there.
The IndexOutOfBoundsException appears independently of Dan's fix, so I created ISPN-2642.
Plus, if there really is an issue when a node joins (as opposed to being killed), your fix won't work and would result in imbalances in the cluster... but let's not make judgements, let's see what ISPN-2624 is about and then we talk...
@Dan, ISPN-2624 is a different scenario. It happens when a node starts up and one of the nodes is apparently set up for storage only (no Netty endpoint). To avoid confusion, I'm treating it as a different case right now, because it smells like a misconfiguration. Michal's case is about killing nodes.
What is the issue in ISPN-2624? The subject looks the same to me.
Michal, yes, the commit https://github.com/danberindei/infinispan/commit/754b9de995221075e14bba7fa459e597bdb16287 was intended to fix the IndexOutOfBoundsException.
Tomas' functional issue has now been separated into ISPN-2624, leaving this JIRA fully focused on the situation when the nodes are killed.
Tomas, it seems the config you provided works fine as storage-only.
Can you create a separate issue to track yours? I don't want to mix it with the node-kill issue.
Also, if you can replicate the issue again, can you provide JDG version information, TRACE logs, etc.? Can you try to replicate the issue on JDG master too?
The IndexOutOfBoundsException was found when running with https://github.com/danberindei/infinispan/commit/c3325b134704016fa556343529d6a3a5b9a96bcb
BTW, now I can see another commit on the t_2550_m branch - would it still be helpful to test with it?
Michal, what is the last commit you had when you ran the test?
Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 875151 from NEW to ASSIGNED
900MB of tasty tracelogs from runs with 5.2.0.Beta5 (resilience tests on hudson / perflab)
http://www.qa.jboss.com/~mlinhard/test_results/serverlogs-trace-ispn2550.zip
njoy!
Dan, I wanted to try your change, but I don't see any further commit on the branch https://github.com/danberindei/infinispan/tree/t_2550_m
The IndexOutOfBoundsException seems to appear because we're generating numOwners (2) "denormalized" hash ids for each segment, but the consistent hash has more owners than that for one segment (3). This can happen during a join, when the write CH is a union between the previous CH and the new, balanced CH.
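To make the mismatch concrete, a self-contained sketch (illustrative only - the numbers and the ArrayBuffer stand in for the encoder's real state, they are not taken from the actual code):

import scala.collection.mutable.ArrayBuffer

object DenormalizedHashIdMismatch extends App {
   // Only numOwners hash ids are denormalized per segment...
   val numOwners = 2
   val denormalizedSegmentHashIds = ArrayBuffer(1000, 2000)

   // ...but during a join the union (write) CH can list an extra owner.
   val segmentOwners = List("nodeA", "nodeB", "nodeC")

   for (ownerIdx <- 0 until segmentOwners.length) {
      // ownerIdx == 2 is out of bounds for the two-element buffer and throws
      // java.lang.IndexOutOfBoundsException: 2, the same failure reported in the stack trace.
      val hashId = denormalizedSegmentHashIds(ownerIdx)
      println("owner=" + segmentOwners(ownerIdx) + " hashId=" + hashId)
   }
}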
Tomas, I've updated my branch to use the read CH instead, could you try again?
Tristan Tarrant <ttarrant@redhat.com> made a comment on bug 875151
Yes, RCMs get the server list dynamically from the servers. However only the servers with an endpoint should add their address to the list.
Martin Gencur <mgencur@redhat.com> made a comment on bug 875151
Just a note about the test: when we create a RemoteCacheManager and pass just one address to it, it does not mean that all requests through cache.put/get will go just to this one address - they can possibly go to all nodes in the cluster. Is that right? AFAIK the Hot Rod client dynamically gets the information about all clustered nodes and autonomously chooses one of the cluster nodes to send requests to. If my assumption is correct, we would need to use a Memcached or REST client to properly test the storage-only example, not Hot Rod.
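If I understand that correctly, a minimal sketch of the behaviour (assuming the Hot Rod Java client API of that era; host name and port are made up):

import org.infinispan.client.hotrod.RemoteCacheManager

object SingleAddressBootstrap extends App {
   // Bootstrapped with a single server address only...
   val rcm = new RemoteCacheManager("node1.example.com", 11222)
   val cache = rcm.getCache[String, String]()

   // ...but once the first response delivers the cluster topology, this put
   // may be routed to any node in the view - which is why a storage-only
   // node without a Hot Rod endpoint must not end up in that topology.
   cache.put("k", "v")

   rcm.stop()
}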
Tomas Sykora <tsykora@redhat.com> made a comment on bug 875151
Hey dberinde@redhat.com, can you check that IndexOutOfBoundsException issue? Let's see if Michal can upload TRACE logs.
NadirX, Tomas' issue appears to show a storage-only node (which shouldn't have any endpoints, log ending in 49...) responding to a client request, so the endpoint is somehow active. Can you check the JDG configuration he's using to see if there are any issues there?
Tomas Sykora <tsykora@redhat.com> made a comment on bug 875151
I've run tests locally with Dan's fix and I'm seeing these exceptions:
11:19:23,919 ERROR [org.infinispan.server.hotrod.HotRodDecoder] (HotRodClientMaster-5) ISPN005009: Unexpected error before any request parameters read
java.lang.IndexOutOfBoundsException: 2
	at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:44)
	at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1$$anonfun$apply$mcVI$sp$1.apply(AbstractTopologyAwareEncoder1x.scala:96)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1$$anonfun$apply$mcVI$sp$1.apply(AbstractTopologyAwareEncoder1x.scala:92)
	at scala.collection.immutable.Range.foreach(Range.scala:81)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1.apply$mcVI$sp(AbstractTopologyAwareEncoder1x.scala:92)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x.writeHashTopologyHeader(AbstractTopologyAwareEncoder1x.scala:89)
	at org.infinispan.server.hotrod.AbstractEncoder1x.writeHeader(AbstractEncoder1x.scala:62)
	at org.infinispan.server.hotrod.HotRodEncoder.encode(HotRodEncoder.scala:63)
	at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:67)
	at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:60)
	at org.jboss.netty.channel.Channels.write(Channels.java:712)
	at org.jboss.netty.channel.Channels.write(Channels.java:679)
	at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
	at org.infinispan.server.core.AbstractProtocolDecoder.exceptionCaught(AbstractProtocolDecoder.scala:295)
	at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:533)
	at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:49)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:472)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:333)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Tomas Sykora <tsykora@redhat.com> made a comment on bug 875151
I attached surefire report from our test suite.
Galder, please, see test: trunk/remote/config-examples/standalone-storage-only/src/test/java/com.jboss.datagrid.test.examples.StorageOnlyConfigExampleTest.java
It is failing on line 73: rc1.put("k", "v");
This put caused the attached stack trace.
We are starting one JDG server with standalone-ha.xml and a second one with standalone-storage-only.xml, which you can find in the JDG server's docs/examples/configs.
Tomas' tracelog shows exactly the same spot as my scenario: https://bugzilla.redhat.com/attachment.cgi?id=641649 (I'm not sure about his test scenario though)
Tomas, I was wondering which of the functional tests you had developed was failing, and where (stack trace of the failure, etc.). The idea is to replicate that specific test in the Infinispan codebase. Thanks.
Tomas Sykora <tsykora@redhat.com> made a comment on bug 875151
Hi Galder,
I experience this problem in our functional test suite for remote mode (server).
preNOTE: you probably don't need to install the Arquillian project, as its CR1 is published already.
preNOTE: you need to create an empty directory named "bundles" in edg0/, edg1/, etc.
Please see this doc: https://docspace.corp.redhat.com/docs/DOC-87715
Download our tests from SVN and run this specific test (for the storage-only example).
Just cd to edgTest/trunk/remote and run:
mvn -s ~/programs/eclipseWorkspace/settings_mead_jdg_plus_local.xml clean verify -Dstack=udp -pl config-examples/standalone-storage-only -Dnode0.edghome=/home/tsykora/edg0 -Dnode1.edghome=/home/tsykora/edg1 -Dnode2.edghome=/home/tsykora/edg2 -Dmaven.test.failure.ignore=true
NOTE: this user-specific Maven settings file (-s) points to my "local" repo, which comes with the regular ER builds. You can ignore it and simply run with these settings using the MEAD repo:
https://svn.devel.redhat.com/repos/jboss-qa/jdg/scripts/settings_mead_jdg.xml
You can obtain the latest JDG server from here: http://download.lab.bos.redhat.com/devel/jdg/stage/JDG-6.1.0-ER5/
I hope I didn't forget anything. In case of any problem, anything, let me know.
Right, that's true. I've just spoken with Tomas; he's going to supply a way to test this in his scenario.
I'll try to test Dan's fix as well.
Michal, in the beginning you mentioned something about Tomas finding this in a functional test; that's the test I'm looking for.
Also, if you can replicate the issue easily, can you try Dan's fix to see if it works?
And one more important thing: during the whole test, a constant small load from multiple Hot Rod clients is applied. I think I had to have at least 10 clients locally for the bug to appear. It seems to happen when they're receiving the new topology and it fails for some of them...
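A rough sketch of that kind of load (this is not the SmartFrog driver itself, just an assumption of what "constant small load from ~10 clients" could look like; host name, port and key names are placeholders):

import java.util.concurrent.Executors
import org.infinispan.client.hotrod.RemoteCacheManager

object SmallConstantLoad extends App {
   val pool = Executors.newFixedThreadPool(10)
   for (clientId <- 0 until 10) {
      pool.submit(new Runnable {
         def run(): Unit = {
            // Each client has its own RemoteCacheManager, so each one receives
            // topology updates independently when a node is killed or rejoins.
            val rcm = new RemoteCacheManager("node1.example.com", 11222)
            val cache = rcm.getCache[String, String]()
            try {
               while (!Thread.currentThread().isInterrupted()) {
                  cache.put("key-" + clientId, "value")
                  Thread.sleep(100) // keep the load small but constant
               }
            } finally {
               rcm.stop()
            }
         }
      })
   }
}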
I found this using a resilience test implemented in the distributed SmartFrog framework that we run in our perflab; I don't have it in a simple test method.
What it does is this:
1. start 4 nodes
2. let them run 5 min
3. kill node2
4. wait for a cluster of node1, node3, node4
5. wait 5 min
6. start node2
7. wait for a cluster of node1 - node4
8. wait 5 min
The exception happens in step 3, right after killing node2.
I also managed to reproduce this locally by running 4 nodes on my laptop - that's how I debugged it.
Michal, can you share the test so that we can map it to an Infinispan unit test and verify Dan's fix?
Dan, did you check the functional test Michal's referring to? You might be able to create a test out of that. I'm assigning this to you since you're more familiar with these changes.
Galder, I think I have a fix for this issue: https://github.com/danberindei/infinispan/commit/3712ffac1ec1503f17b3f9de022bfc98a20b90e1
The problem is that I don't have a test to go with it, so I'm not sure if it really works. I'm not issuing a PR yet, but I'm leaving it here for reference.
Michal Linhard <mlinhard@redhat.com> made a comment on bug 875151
I'm seeing this in resilience tests for 6.1.0.ER4.
I've created a more general JIRA for this.
Michal Linhard <mlinhard@redhat.com> made a comment on bug 886565
Verified for 6.1.0.ER8