-
Bug
-
Resolution: Done
-
Major
-
4.0.6
-
None
Hello,
I am using the following configuration:
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="urn:org:jgroups"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <UDP />
    <PING />
    <MERGE3 />
    <FD />
    <VERIFY_SUSPECT />
    <ASYM_ENCRYPT encrypt_entire_message="true"
                  sym_keylength="128"
                  sym_algorithm="AES/ECB/PKCS5Padding"
                  asym_keylength="2048"
                  asym_algorithm="RSA" />
    <pbcast.NAKACK2 />
    <UNICAST3 />
    <pbcast.STABLE />
    <FRAG2 />
    <AUTH auth_class="org.jgroups.auth.X509Token"
          auth_value="auth"
          keystore_path="keystore.jks"
          keystore_password="pwd"
          cert_alias="alias"
          cipher_type="RSA" />
    <pbcast.GMS />
</config>
I have 7 services, but I will only show logs for 2 of them: the coordinator and one arbitrary node. All the other nodes behave similarly.
Initially, when these nodes join the cluster, everything is fine.
The server is a shared machine with a slow CPU and a slow HDD, so sometimes, when other applications are busy with their tasks, my whole cluster can freeze for 3-5 minutes. During or at the end of such a freeze, a service may log the following:
org.jgroups.protocols.FD up
WARNING: node-26978: I was suspected by node-27291; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
WARNING: node-26978: unrecognized cipher; discarding message from node-27291
org.jgroups.protocols.Encrypt handleEncryptedMessage
WARNING: node-26978: unrecognized cipher; discarding message from node-27291
org.jgroups.protocols.Encrypt handleEncryptedMessage
WARNING: node-26978: unrecognized cipher; discarding message from node-36734
org.jgroups.protocols.Encrypt handleEncryptedMessage
So the node was kicked out of the cluster because it became a suspect, but the node itself does not accept that. The cluster coordinator has already changed the symmetric (secret) key, so the subsequent logs of this node show "unrecognized cipher".
In the cluster coordinator's logs I see the following:
INFO: ISPN100000: Node node-26978 joined the cluster
****
WARN: node-27291: unrecognized cipher; discarding message from node-26978
org.jgroups.logging.Slf4jLogImpl error
ERROR: key requester node-26978 is not in current view [***]; ignoring key request
org.jgroups.logging.Slf4jLogImpl warn
WARN: node-27291: unrecognized cipher; discarding message from node-26978
INFO: ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[node-26978|8] (7) [node-26978, node-12721, node-17625, node-45936, node-56674, node-36734, node-27291], 2 subgroups:
    [node-27291|7] (6) [node-27291, node-12721, node-17625, node-45936, node-56674, node-36734],
    [node-27291|6] (7) [node-27291, node-26978, node-12721, node-17625, node-45936, node-56674, node-36734]
My understanding of what has happened:
For example, I have 3 nodes (A, B and C) in the cluster. The cluster freezes for a few minutes, so node C becomes suspected and is kicked out of the cluster by the coordinator. For some reason C ignores that fact. Later, once the cluster is responsive again, the remaining members start ignoring messages from C, because the cluster uses ASYM_ENCRYPT and the key has been regenerated by the coordinator. Also, for some reason the MERGE operation doesn't work, so C cannot rejoin the cluster. The cluster now has 2 subgroups that don't communicate with each other, and I don't fully understand why this happens.
How I temporarily worked around this issue: I changed ASYM_ENCRYPT to SYM_ENCRYPT, and now any node can successfully come back to the cluster after a freeze, because the key doesn't change.
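For reference, the workaround would look roughly like the fragment below, which takes the place of the ASYM_ENCRYPT element in the stack. This is only a sketch: the keystore file name, password and alias are hypothetical placeholders, and the attribute names (keystore_name, store_password, alias) are the ones I believe SYM_ENCRYPT uses, so they should be checked against the schema of the JGroups version in use.

```xml
<!-- Sketch only: replaces <ASYM_ENCRYPT ... /> at the same position in the stack. -->
<!-- keystore_name, store_password and alias values are hypothetical placeholders. -->
<SYM_ENCRYPT encrypt_entire_message="true"
             sym_algorithm="AES/ECB/PKCS5Padding"
             keystore_name="sym-keystore.jceks"
             store_password="pwd"
             alias="alias" />
```

Since every member reads the same pre-shared secret key from the keystore, a view change does not invalidate the key, which matches the behaviour described above.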
I haven't tested it, but I think setting change_key_on_leave="false" would also help; however, that is not how I want to run the cluster.
So this looks like a problem with the AUTH + ASYM_ENCRYPT protocol combination, where in some cases a node cannot rejoin the cluster.
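The untested alternative would be a one-attribute change to the ASYM_ENCRYPT element from my configuration, roughly as sketched below. All other attributes are copied from the configuration above; only change_key_on_leave is added.

```xml
<!-- Sketch only, untested: same ASYM_ENCRYPT settings as above, -->
<!-- with change_key_on_leave disabled so the key is not regenerated when a member leaves. -->
<ASYM_ENCRYPT encrypt_entire_message="true"
              sym_keylength="128"
              sym_algorithm="AES/ECB/PKCS5Padding"
              asym_keylength="2048"
              asym_algorithm="RSA"
              change_key_on_leave="false" />
```

The trade-off is that a member that has left (or was excluded) still holds a key that can decrypt cluster traffic, which is why I would prefer not to use this setting.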