-
Bug
-
Resolution: Done
-
Major
-
2.6.1
-
None
-
Windows Server 2003
-
Workaround Exists
-
In UDP with receive_on_all_interfaces="true", if any interface is not configured (say, its network cable is unplugged), JGroups refuses to start and throws an exception (IP addresses changed to protect the innocent):
Caused by: org.jgroups.ChannelException: failed to start protocol stack
    at org.jgroups.JChannel.startStack(JChannel.java:1445)
    at org.jgroups.JChannel.connect(JChannel.java:356)
    at org.jgroups.blocks.NotificationBus.start(NotificationBus.java:126)
    at the application making use of JGroups
Caused by: java.lang.Exception: problem creating sockets (bind_addr=xxxxxxxx/11.22.33.44, mcast_addr=224.1.2.3:4444)
    at org.jgroups.protocols.UDP.start(UDP.java:363)
    at org.jgroups.stack.Configurator.startProtocolStack(Configurator.java:75)
    at org.jgroups.stack.ProtocolStack.startStack(ProtocolStack.java:301)
    at org.jgroups.JChannel.startStack(JChannel.java:1442)
    ... 9 more
Caused by: java.net.SocketException: bad argument for IP_MULTICAST_IF2: No IP addresses bound to interface
    at java.net.PlainDatagramSocketImpl.join(Ljava.net.InetAddress;Ljava.net.NetworkInterface;)V(Native Method)
    at java.net.PlainDatagramSocketImpl.joinGroup(PlainDatagramSocketImpl.java:196)
    at java.net.MulticastSocket.joinGroup(MulticastSocket.java:357)
    at org.jgroups.protocols.UDP.bindToInterfaces(UDP.java:525)
    at org.jgroups.protocols.UDP.createSockets(UDP.java:470)
    at org.jgroups.protocols.UDP.start(UDP.java:359)
    ... 12 more
Workaround: disabling every NIC whose status is "network cable unplugged" allows JGroups to start.
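To find which NICs would need disabling, something like the following may help. It is only a diagnostic sketch: the class name UnboundInterfaces and its methods are made up for illustration, and it assumes that an unplugged NIC shows up in Java as an interface with no IP addresses, which is what the "No IP addresses bound to interface" message suggests but which I have not verified on every platform.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic helper; not part of JGroups. */
public class UnboundInterfaces {

    /** True if the interface currently exposes no IP addresses to Java. */
    public static boolean hasNoAddresses(NetworkInterface nic) {
        return !nic.getInetAddresses().hasMoreElements();
    }

    /** Names of all interfaces that have no IP address bound. */
    public static List<String> find() throws SocketException {
        List<String> names = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return names; // some JVMs return null when no interfaces exist
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (hasNoAddresses(nic)) {
                names.add(nic.getName() + " (" + nic.getDisplayName() + ")");
            }
        }
        return names;
    }

    public static void main(String[] args) throws SocketException {
        for (String name : find()) {
            System.out.println("no IP address bound: " + name);
        }
    }
}
```

Any interface it prints is a candidate for the multicast join failure shown in the stack trace.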
This may be a feature request rather than a bug report, but the current behavior seems unnecessarily strict. With receive_on_all_interfaces="true", at least one interface coming up should be enough for JGroups to function.
In the application where I encountered this, I have both receive_on_all_interfaces="true" and send_on_all_interfaces="false". The reason is that if send_on_all_interfaces is true and JGroups fails to send a message (that is, the message cannot be sent on any interface at all), there is no notification of the failure. By sending on one interface but receiving on all, I appear to get the best of all worlds: a cluster of machines with multiple NICs should be able to communicate with one another no matter what the binding order of the NICs is and no matter what order Java presents the interfaces in.
The catch is that receive_on_all_interfaces="true" requires EVERY NIC that is not disabled to be fully functioning, or JGroups will not start at all. Unplug one NIC cable and the application cannot function, even though the clustered machines are all still reachable on another network.
Preferred behavior: If **at least one** interface successfully opens a socket when receive_on_all_interfaces="true", then succeed. If **all** interfaces fail as shown above, then throw the exception shown above.
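A rough sketch of that preferred behavior, to make the request concrete. This is not the actual code in UDP.bindToInterfaces; the class LenientMulticastJoin, the Joiner callback and both method names are hypothetical. It uses the standard MulticastSocket.joinGroup(SocketAddress, NetworkInterface) call and assumes Java 8 or later for the lambda.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical illustration of "succeed if at least one interface joins". */
public class LenientMulticastJoin {

    /** Hypothetical callback that attempts a join on one interface. */
    public interface Joiner<T> {
        void join(T iface) throws IOException;
    }

    /**
     * Tries every interface and returns the ones that joined.
     * Throws only if none of them could be joined.
     */
    public static <T> List<T> joinAtLeastOne(List<T> interfaces, Joiner<T> joiner)
            throws IOException {
        List<T> joined = new ArrayList<>();
        IOException lastFailure = null;
        for (T iface : interfaces) {
            try {
                joiner.join(iface);
                joined.add(iface);
            } catch (IOException e) {
                // e.g. "No IP addresses bound to interface": skip this NIC
                lastFailure = e;
            }
        }
        if (joined.isEmpty()) {
            throw new IOException("could not join on any interface", lastFailure);
        }
        return joined;
    }

    /** Applies the lenient policy to all interfaces known to the JVM. */
    public static List<NetworkInterface> joinOnAllInterfaces(
            MulticastSocket sock, InetSocketAddress group) throws IOException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        List<NetworkInterface> all = (nics == null)
                ? new ArrayList<NetworkInterface>()
                : Collections.list(nics);
        return joinAtLeastOne(all, nic -> sock.joinGroup(group, nic));
    }
}
```

With this policy an unplugged NIC would only be skipped (and ideally logged), and the exception would surface only when every interface fails.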