Tuesday, May 12, 2009

Communicating Multi-dimensional arrays using MPJ Express

Many scientific applications use multi-dimensional arrays to store data. Naturally, the parallel versions of these applications frequently need to communicate such arrays.

The current version of MPJ Express "directly" supports communicating basic datatypes to and from single-dimension arrays. It is nevertheless possible to communicate multi-dimensional arrays. Here we'll see how 2D arrays are communicated using the MPJ Express software.

There are two main ways of communicating 2D arrays. The first is to communicate the data using the MPI.OBJECT datatype. The second is to map (or flatten) the 2D array onto a 1D array and communicate it normally. If you are looking for performance, the second option is recommended, because the first is severely hampered by the cost of Java's serialization.
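To get a feel for the serialization cost mentioned above, here is a minimal sketch in plain Java (no MPJ Express calls) that pushes a 2D array through standard Java object serialization and back. The class and method names (SerializationSketch, toBytes, fromBytes) are made up for illustration, and this only approximates the kind of work an object-based send implies; it is not how MPJ Express is implemented internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class SerializationSketch {

    // Serialize a 2D array into a byte buffer using standard Java serialization.
    static byte[] toBytes(double[][] matrix) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(matrix);
        }
        return buffer.toByteArray();
    }

    // Rebuild the 2D array from the serialized bytes.
    static double[][] fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (double[][]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        int n = 4;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                matrix[i][j] = i * n + j;
            }
        }

        byte[] wire = toBytes(matrix);
        double[][] copy = fromBytes(wire);

        // The serialized form carries stream headers and per-row metadata
        // on top of the raw element data.
        System.out.println("raw element bytes: " + (n * n * Double.BYTES));
        System.out.println("serialized bytes:  " + wire.length);
        System.out.println("round trip intact: " + Arrays.deepEquals(matrix, copy));
    }
}
```

Besides the extra bytes, both sides also pay for walking the object graph and allocating fresh row arrays on receipt, which a primitive-typed send of a flat array can avoid.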

The MultidimMatrix class communicates data from the 2D array using the MPI.OBJECT datatype.

The MultiAsSingleDimMatrix class stores the 2D array as a 1D array. By doing this, the data can be communicated using the datatype of the array---in our example, the MPI.DOUBLE datatype.
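The flattening idea can be sketched in a few lines of plain Java. This is not the MultiAsSingleDimMatrix class itself, just an illustration that assumes a rectangular matrix stored in row-major order; FlattenSketch, flatten and unflatten are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class FlattenSketch {

    // Copy a rectangular rows x cols matrix into a row-major 1D array,
    // so that element (i, j) lands at index i * cols + j.
    static double[] flatten(double[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++) {
            System.arraycopy(matrix[i], 0, flat, i * cols, cols);
        }
        return flat;
    }

    // Rebuild the 2D view on the receiving side; rows and cols must be
    // known there (agreed in advance or sent separately).
    static double[][] unflatten(double[] flat, int rows, int cols) {
        double[][] matrix = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            System.arraycopy(flat, i * cols, matrix[i], 0, cols);
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] matrix = {{1, 2, 3}, {4, 5, 6}};
        double[] flat = flatten(matrix);
        System.out.println(Arrays.toString(flat));
        System.out.println(Arrays.deepToString(unflatten(flat, 2, 3)));
    }
}
```

With the mpiJava-style API that MPJ Express implements, the flat array could then be handed to a call along the lines of MPI.COMM_WORLD.Send(flat, 0, flat.length, MPI.DOUBLE, dest, tag), where dest and tag are placeholders; check the exact signature against your MPJ Express version. If the application keeps its data in the 1D layout all along, the copy in flatten is not needed at all.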

Friday, May 8, 2009

Overhead of using Multi-dimensional arrays in Java

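As discussed many times in the related literature, Java multi-dimensional arrays introduce considerable overhead because of the way they are stored in memory: a 2D array is an array of references to separately allocated row arrays, so every element access goes through an extra indirection. The purpose of this post is to quantify this performance overhead.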

Let's first look at the matrix multiplication implementation that uses the multidimensional arrays (the 2D version).

Now let's look at the same code implemented using a single-dimension array. Basically, the two-dimensional array is mapped onto a single-dimension array (the 1D version).
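For readers without the original listings at hand, the two variants can be sketched roughly as follows. This is an assumed reconstruction of the general technique (a naive triple loop over square matrices), not the code that produced the timings in this post; MatMulSketch, multiply2D and multiply1D are hypothetical names.

```java
// Hypothetical class, for illustration only.
class MatMulSketch {

    // 2D version: each a[i][k] access first loads the row reference a[i].
    static double[][] multiply2D(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    // 1D version: element (i, j) of an n x n matrix lives at index i * n + j.
    static double[] multiply1D(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 200; // small size chosen for a quick run, not for benchmarking
        double[][] a2 = new double[n][n];
        double[][] b2 = new double[n][n];
        double[] a1 = new double[n * n];
        double[] b1 = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a2[i][j] = a1[i * n + j] = i + j;
                b2[i][j] = b1[i * n + j] = i - j;
            }
        }

        long start = System.nanoTime();
        multiply2D(a2, b2);
        System.out.println("2D time (s): " + (System.nanoTime() - start) / 1e9);

        start = System.nanoTime();
        multiply1D(a1, b1, n);
        System.out.println("1D time (s): " + (System.nanoTime() - start) / 1e9);
    }
}
```

A single timed run like this is dominated by JIT warm-up at small sizes, so treat the printed numbers as a rough indication only.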

Now for some performance results:

aamir@barq:~/tmp> java MultidimMatrix
time => 19.821604495
aamir@barq:~/tmp> java Matrix
time -> 11.928243662

The 1D code is about 1.66 times faster than the 2D code (19.82 / 11.93 ≈ 1.66).

Tuesday, May 5, 2009

Parallel Programming with Java

I recently gave a talk on "Parallel Programming with Java" to Master's students at the University of La Coruna, Spain.

This talk gives a good introduction to getting started with the MPJ Express software.

Monday, May 4, 2009

Ports Used by the MPJ Express Software

There are three kinds of ports used by the MPJ Express software.
  1. Daemon ports (where the MPJ Express daemons listen on the compute nodes). The value can be changed in two steps:
  • Edit $MPJ_HOME/conf/wrapper.conf and search for the property "wrapper.app.parameter.2=10000". Change the value to whatever you want; let's call it X.
  • When you start your parallel application, use the -dport switch to specify X, something like this: mpjrun.sh ... -dport X ..
  2. Ports used by the MPJ Express runtime on the head node. This port is used to ship the code across. The default value is 15000. It can be changed with the switch -sport Y, where Y is the port that you choose.
  3. Each MPJ Express process uses a port for communication with its peers. The default value is 20000. It can be changed by passing the -mpjport Z switch to mpjrun.sh.
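Before picking values for X, Y and Z, it can help to check that the candidate ports are not already taken on the machine in question. The following is a small standalone sketch in plain Java; it is not part of MPJ Express, and PortCheckSketch and isFree are hypothetical names. It simply tries to bind a server socket to each of the default ports listed above.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical utility, not part of MPJ Express.
class PortCheckSketch {

    // Returns true if a server socket can currently be bound to the port
    // on this machine, false if the bind fails (for example, port in use).
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults taken from the list above: daemon, head-node runtime, process port.
        int[] defaults = {10000, 15000, 20000};
        for (int port : defaults) {
            System.out.println("port " + port + (isFree(port) ? " looks free" : " is in use"));
        }
    }
}
```

The check only reflects the moment it runs and the machine it runs on, so it would need to be run on the compute nodes for the daemon port and on the head node for the runtime port.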