Notes on MPI for LCG

The main problem with the current version of EDG-LCG with MPI is that OpenPBS moves files only from the CE to ONE WN. For MPI, the files need to be moved to the whole set of WNs chosen for the job. The script we developed does this job in a simple case: when the program to be executed stays in the job directory, that is, in the default directory where PBS puts the job files. The script then copies the whole job subdirectory from the WN where the job is executed to all the other WNs chosen for the job.

To use MPI without a shared home directory you need to:

1) have the latest OpenPBS from EDG-LCG, patched to work with SSH without a shared home;
2) configure SSH with HostbasedAuthentication so that every WN can contact every other WN;
3) have the script we developed.

Point 1
This is satisfied if you are updated to the latest EDG-LCG release.

Point 2
You have to configure every WN the way the CE is configured in Grid-IT.

The file /etc/ssh/sshd_config must contain at least the following rows:

row 1> HostbasedAuthentication yes
row 2> IgnoreUserKnownHosts yes
row 3> IgnoreRhosts yes

The file /etc/ssh/ssh_config must contain the following row in the section "Host *":

row> HostbasedAuthentication yes

The file /etc/ssh_known_hosts2 must contain the public keys of all the WNs and of the CE of the site, and must be replicated on every machine.

The file /etc/ssh/shosts.equiv must contain the list of the symbolic names of the WNs and of the CE.

(Example contents for these two files are sketched at the end of these notes.)

Point 3
To start an MPI job you have to prepare a JDL such as the following (an example submission command is given at the end of these notes):

JobType = "MPICH";
NodeNumber = 4;
Executable = "MPItest.sh";
Arguments = "executable.exe";
StdOutput = "test.out";
StdError = "test.err";
OutputSandbox = { "test.err", "test.out", "executable.out" };
RetryCount = 7;
Type = "Job";
InputSandbox = { "MPItest.sh", "executable.exe" };

where MPItest.sh is the following script:

#!/bin/bash
echo "********************************************"
echo $HOSTNAME
echo "********************************************"
echo $PBS_NODEFILE
echo $PWD
echo "********************************************"
env
echo -e "\n\n****************************************"
# copy the job directory to every WN listed in the PBS node file
# and make the executable runnable there
for i in `cat $PBS_NODEFILE`
do
  echo $i
  /usr/bin/rsync -v -az -e ssh `pwd` $i:$HOME/.mpi/
  ssh $i chmod 777 `pwd`/$1
  ssh $i ls -alR `pwd`
  echo "@@@@@@@@@@@@@@"
done
echo -e "\n\n****************************************"
echo "Executing"
chmod 777 $1
ls -l
mpirun -np 4 -machinefile $PBS_NODEFILE `pwd`/$1 > executable.out

As you can see, the script copies the current directory to the other WNs listed in the file referenced by PBS_NODEFILE. This file is generated by PBS when a request for multiple CPUs arrives.

WARNING: the number of nodes in the JDL (NodeNumber = x) and in the script (mpirun -np x ...) must be the same.

NOTES: For some strange and not yet well understood reason, after the first rsync the output of the script is lost. We therefore suggest redirecting the output of executable.exe to a file.

With this structure we are able to execute MPI binaries remotely on PBS queues. Some of us are working on a similar structure for LSF. With a straightforward modification of the script you can send a source file, compile it on one node, copy it to the other nodes and execute it (a sketch of such a variant is given at the end of these notes).

SUGGESTION
Every site that complies with these notes could publish the property MPICHOK on the Information Server, so that all the other sites know about it.

Giuseppe Andronico
Giacinto Donvito
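
EXAMPLES

For point 2, the two files could look roughly as follows. The host names ce01.example.org, wn01.example.org and wn02.example.org are placeholders for the real CE and WN names of the site, and the keys are truncated; the public key of each machine can be taken from its /etc/ssh/ssh_host_rsa_key.pub file.

/etc/ssh/shosts.equiv:

ce01.example.org
wn01.example.org
wn02.example.org

/etc/ssh_known_hosts2:

ce01.example.org ssh-rsa AAAAB3...
wn01.example.org ssh-rsa AAAAB3...
wn02.example.org ssh-rsa AAAAB3...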
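
For point 3, the JDL above, saved for example as MPItest.jdl (the file name and the VO name "infngrid" are only placeholders), could be submitted from a User Interface with something like:

edg-job-submit --vo infngrid -o jobIDfile MPItest.jdl
edg-job-status -i jobIDfile
edg-job-get-output -i jobIDfile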
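
The modification mentioned in the NOTES, to send a source file and compile it on one node before distributing it, could look roughly like the following sketch. The names MPIcompile.sh and program.c, and the availability of mpicc on the WNs, are assumptions and not part of the tested procedure:

#!/bin/bash
# MPIcompile.sh - sketch only, not part of the tested procedure.
# $1 is the name of the source file shipped in the InputSandbox,
# e.g. program.c; the resulting binary is then called program.

SRC=$1
BIN=${SRC%.*}

# compile on the WN where PBS started the job
mpicc -o $BIN $SRC

# copy the job directory (now containing the binary) to the other WNs,
# exactly as MPItest.sh does
for i in `cat $PBS_NODEFILE`
do
  /usr/bin/rsync -az -e ssh `pwd` $i:$HOME/.mpi/
  ssh $i chmod 777 `pwd`/$BIN
done

chmod 777 $BIN
# NodeNumber in the JDL must match -np here as well
mpirun -np 4 -machinefile $PBS_NODEFILE `pwd`/$BIN > executable.out

In the JDL, Executable would become "MPIcompile.sh", Arguments "program.c", and program.c would replace executable.exe in the InputSandbox.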