Friday, August 28, 2009

MPJ Express on Infiniband

Courtesy: Zafar Gillani (http://hpc.seecs.edu.pk/~zafar/about.html), who worked on porting MPJ Express to InfiniBand.

MPJ Express can be executed on InfiniBand in a couple of ways (so far):

Use IPoIB (IP over InfiniBand): For this, just put the IP addresses allotted to the InfiniBand HCAs in the "machines" file. Then run MPJ Express as you would on GigE or FE.

Over SDP (Sockets Direct Protocol): First create a configuration file somewhere on the system (such as in the ~/mpj-user/ directory). Let's say we created "sdp.conf". There are two approaches: write either bind rules or connect rules in the "sdp.conf" file (both are explained below). Comments start with a # sign.

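Bind rule (the keyword, then an address, then a port):
example: bind 192.168.2.1 *

Connect rule (the keyword, then an address, then a port):
example: connect 192.168.2.1 *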

A * in the port field means any free port can be used at runtime. The port can also be given as a range, such as 15000-*. A separate bind rule has to be defined for each compute node that has a configured IB HCA, so the file ends up looking much like a machines file:

bind 192.168.2.1 *
bind 192.168.2.2 *
bind 192.168.2.3 *
and so on
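Since the rule list mirrors the machines file, it can be generated rather than typed by hand. The following is only a minimal sketch: the class name SdpConfSketch and the method bindRules are made up for illustration and are not part of MPJ Express, and the addresses are the example ones used above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of MPJ Express): builds the contents of an
// sdp.conf file with one bind rule per InfiniBand address.
public class SdpConfSketch {

    // Returns a comment line followed by one "bind <address> *" rule per address.
    static String bindRules(List<String> addresses) {
        StringBuilder sb = new StringBuilder("# SDP bind rules, one per compute node\n");
        for (String address : addresses) {
            // "*" lets the runtime pick any free port
            sb.append("bind ").append(address).append(" *\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example addresses taken from the post; replace with your own HCA addresses.
        List<String> addresses = Arrays.asList("192.168.2.1", "192.168.2.2", "192.168.2.3");
        System.out.print(bindRules(addresses));
    }
}
```

Redirecting the output to a file (for example sdp.conf) gives a configuration with the same shape as the list above.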

When to use a bind rule? A bind rule says that the SDP transport should be used when a TCP socket binds to an address and port that match the rule. In Java this corresponds to socket.bind(SocketAddress). The bind rule is recommended since it explicitly binds an unbound socket to an IP address using SDP.

When to use a connect rule? A connect rule says that the SDP transport should be used when an unbound TCP socket attempts to connect to an address and port that match the rule.
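To make the two cases concrete, here is a small sketch using plain java.net sockets. It only shows which call a bind rule would be matched against and which call a connect rule would be matched against. It is not taken from the MPJ Express source, the class name BindConnectSketch is made up, and it uses the loopback address so that it runs on any machine; without a matching rule in an SDP configuration file, the sockets simply use ordinary TCP.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: shows the bind() and connect() calls that SDP rules refer to.
public class BindConnectSketch {

    // Sends one byte from a client socket to a server socket and returns what arrived.
    static int roundTrip(int value) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket()) {
            // Explicit bind of an unbound socket: the address and port used here
            // are what a "bind" rule would be compared against. Port 0 = any free port.
            server.bind(new InetSocketAddress(loopback, 0));

            try (Socket client = new Socket()) {
                // Connect from an unbound socket: the target address and port
                // are what a "connect" rule would be compared against.
                client.connect(new InetSocketAddress(loopback, server.getLocalPort()));

                try (Socket accepted = server.accept()) {
                    client.getOutputStream().write(value);
                    client.getOutputStream().flush();
                    return accepted.getInputStream().read();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("received: " + roundTrip(42));
    }
}
```

On a cluster, the loopback address would be replaced by the address of the IB HCA, so that it matches a rule in "sdp.conf".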

To execute MPJ Express on InfiniBand, simply use the following command:
mpjrun.sh -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true -np 2 Application

The switch -Djava.net.preferIPv4Stack=true is optional but recommended, since it explicitly tells the JVM to use IPv4. This prevents Java from using IPv6 if IPv6 is enabled on the IB HCAs (InfiniBand Host Channel Adapters, analogous to NICs).
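One quick way to check that both switches actually reached a JVM is to print the corresponding system properties. This is a sketch only: the class name SdpPropertyCheck and the method describe are made up for illustration, and it assumes that mpjrun.sh forwards the -D options to the JVM of each process, as the command above implies.

```java
// Hypothetical check (not part of MPJ Express): prints the two system
// properties set by the -D switches in the mpjrun.sh command.
public class SdpPropertyCheck {

    // Formats a property name and its value, or notes that it is missing.
    static String describe(String name, String value) {
        return name + (value == null ? " is not set" : " = " + value);
    }

    public static void main(String[] args) {
        String[] names = {"com.sun.sdp.conf", "java.net.preferIPv4Stack"};
        for (String name : names) {
            System.out.println(describe(name, System.getProperty(name)));
        }
    }
}
```

Run directly with java, passing the same two -D switches, it prints both values; without them it reports each property as not set.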