Sunday, November 10, 2013

Installing, configuring and using MPJ Express on Linux


This tutorial covers installing the MPJ Express software and configuring it on Linux machines, including the steps required to get MPJ Express up and running. We assume that Java 1.6 or higher is installed on the system or cluster where you want to run MPJ Express. Additionally, if you want to compile the MPJ Express source code, Apache Ant 1.6.2 or higher should be installed along with Perl. (Note that these are optional and are not required if you don’t want to recompile MPJ Express.)
As a first step, download the latest version of MPJ Express (currently 0.38). Once downloaded, extract it into a directory (we assume you have downloaded and extracted it under your home directory, i.e. export/home/mpj/mpj-v_038). Next, set the environment variables for MPJ Express: MPJ_HOME, pointing to the extracted directory, and PATH, extended with $MPJ_HOME/bin so that the MPJ Express scripts can be found. You can add these environment variables to .bashrc. (Note: on Ubuntu, define them at the start of the file, otherwise they will not be set properly and will cause problems. Ubuntu’s default .bashrc returns early for non-interactive shells, so anything defined further down is skipped.)
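As a minimal sketch, the lines added to .bashrc might look like the following. The install location $HOME/mpj-v_038 is an assumption based on the example directory mentioned above; substitute the directory where you actually extracted MPJ Express.

```shell
# Assumption: MPJ Express was extracted to $HOME/mpj-v_038; adjust as needed.
# On Ubuntu, place these lines at the very top of ~/.bashrc.
export MPJ_HOME="$HOME/mpj-v_038"      # root of the MPJ Express installation
export PATH="$MPJ_HOME/bin:$PATH"      # lets the shell find mpjrun.sh, mpjboot and mpjhalt
```

After saving the file, open a new terminal or run "source ~/.bashrc" so that the variables take effect.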


Once the environment variables are set, you can proceed to write a Hello World program in HelloWorld.java.
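The listing below is a minimal sketch of such a program, not necessarily the exact original. It uses only the MPI calls discussed in this tutorial and prints the same "Hi from <rank>" line that appears in the cluster-mode output further down. It is written as a shell here-document so that it can be pasted straight into a terminal; the class name HelloWorld matches the compile and run commands used later.

```shell
# Write a minimal HelloWorld.java into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the Java source.
cat > HelloWorld.java <<'EOF'
import mpi.*;

public class HelloWorld {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);                      // start the MPI environment
        int rank = MPI.COMM_WORLD.Rank();    // rank of this process
        int size = MPI.COMM_WORLD.Size();    // total number of processes (not printed here)
        System.out.println("Hi from <" + rank + ">");
        MPI.Finalize();                      // shut the MPI environment down
    }
}
EOF
```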

Next, compile the program. The first statement in the source code, “import mpi.*;”, brings in all the MPI functionality implemented by MPJ Express. With that import in place, “MPI.Init(args);” initializes the MPI environment and “MPI.Finalize();” terminates it. Other methods can be used in between, such as “MPI.COMM_WORLD.Rank();”, which returns the rank of the current process, and “MPI.COMM_WORLD.Size();”, which returns the total number of processes in the world. To compile this MPI code against MPJ Express, run:

$ javac -cp .:$MPJ_HOME/lib/mpj.jar HelloWorld.java

The -cp option (the Java classpath) points the compiler at the MPJ Express jar file, which contains the library classes needed to compile the Java source file. If compilation succeeds, it creates a HelloWorld.class file.
So far we have set up the environment and compiled Java code against MPJ Express. The next step is to run the compiled code. The current version of MPJ Express supports two configurations: multicore mode and cluster mode.

Running MPJ Express in multicore mode
In multicore mode, MPJ Express runs all processes on a single machine that has multiple cores or processors (an SMP). No network communication is involved; the processes communicate through shared memory. Running MPJ Express in multicore mode is much simpler than the other configuration. To run an MPJ Express job in multicore mode, execute the following command.
$ mpjrun.sh -np 4 HelloWorld

mpjrun.sh is the command used to run MPJ Express jobs. It accepts several parameters. The -np option specifies the number of MPI (MPJ Express) processes to start, so -np 4 runs the HelloWorld program on 4 processes. The command above does not specify a device (configuration mode), so the default, multicore mode, is used. The output of the above command follows.



Running MPJ Express in cluster mode

The second configuration is cluster mode, which uses network communication for message passing. MPJ Express currently supports two network devices: niodev, which communicates using the Java NIO package, and mxdev, which runs on Myrinet networks. This tutorial uses only niodev. Because cluster mode involves multiple nodes connected through a network, an MPJ Daemon must be started on each target machine. First create a file named "machines" that lists all the nodes, one per line, by IP address or hostname. A sample machines file looks like this:

compute-0-1
compute-0-2


Next, start the MPJ Daemon on the target machines by running the mpjboot command on the machines file. For mpjboot to work properly, first enable passwordless ssh authentication, because mpjboot uses ssh to start the MPJ Daemon on the remote machines listed in the machines file. (Note: we assume that the cluster uses a shared filesystem.)
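One possible way to enable passwordless ssh is sketched below. The helper name setup_passwordless_ssh is hypothetical, and the sketch assumes the same user name on every node, the default key path ~/.ssh/id_rsa, and that the standard OpenSSH tools ssh-keygen and ssh-copy-id are installed.

```shell
# Hypothetical helper: install your public key on every host in a machines file.
# Assumptions: same user name on all nodes, default key path ~/.ssh/id_rsa.
setup_passwordless_ssh() {
    machines_file="${1:-machines}"
    # Create a key pair with an empty passphrase, but only if none exists yet.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    # Word-splitting the file (instead of a "while read" loop) keeps ssh from
    # consuming the rest of the host list on standard input.
    for host in $(cat "$machines_file"); do
        ssh-copy-id "$host"      # asks for the password of each node one last time
    done
}

# Usage, followed by starting the daemons:
#   setup_passwordless_ssh machines
#   mpjboot machines
```

If "ssh compute-0-1 hostname" then returns without asking for a password, mpjboot should be able to reach that node.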


mpjboot prints a message stating that the MPJ Daemon processes have started. To verify that the daemons are actually running, ssh to a target node and check with the “ps aux” command. If the daemons are not running, enable debugging and logging, which is written to the wrapper.log file, by following Section 3 of the user guide. (For further questions, you can contact the MPJ Express mailing list at https://lists.sourceforge.net/lists/listinfo/mpjexpress-users.) If the MPJ Daemon is running properly, we can proceed to run an MPJ Express job with mpjrun.sh, this time selecting the niodev device.
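The “ps aux” check can be scripted over every host in the machines file, roughly as follows. The helper name check_mpj_daemons is hypothetical, and the search pattern assumes that the daemon's command line contains "mpj" (for example through the installation path); adjust it if your installation differs.

```shell
# Hypothetical helper: list MPJ-related processes on every host in a machines file.
# Assumption: the daemon's command line contains "mpj" in some letter case.
check_mpj_daemons() {
    machines_file="${1:-machines}"
    for host in $(cat "$machines_file"); do
        echo "== $host =="
        # The [m] bracket stops grep from matching its own command line.
        ssh "$host" "ps aux | grep -i '[m]pj'" \
            || echo "no MPJ process found on $host (or host unreachable)"
    done
}

# Usage:
#   check_mpj_daemons machines
```

Once every node reports a daemon, the job itself is launched with the command that follows.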

$ mpjrun.sh -np 2 -dev niodev HelloWorld

The -np option of mpjrun.sh tells it how many processes to launch, and the -dev switch specifies the device to use. Here we use niodev; the other device is mxdev (for Myrinet networks). In cluster mode, mpjrun.sh also expects the machines file to be in the directory from which the command is run. The output of the HelloWorld program in cluster mode is as follows.

MPJ Express (0.38) is started in the cluster configuration
Starting process <0> on <compute-0-1>
Starting process <1> on <compute-0-2>
Hi from <0>
Hi from <1>
Stopping process <0> on <compute-0-1>
Stopping process <1> on <compute-0-2>

If we want to stop the MPJ Daemon, we use

$ mpjhalt machines

This stops the MPJ Daemon running on each host listed in the "machines" file.