

A. Vendor MPI Implementations

HP MPI

When you use mpirun in stand-alone mode, you supply the names of the hosts that the MPI job will use. For better resource utilization, you can instead have LSF manage host allocation and coordinate the start-up phase with mpirun. To do so, precede the regular HP MPI mpirun command with:

   % bsub pam -mpi 

Example: To run a single-host job and have the LSF Batch system select the host, the command:

   % mpirun -np 14 a.out 

is entered as:

   % bsub pam -mpi mpirun -np 14 a.out 

Example: To run a multi-host job and have the LSF Batch system select the hosts, the command:

   % mpirun -f appfile 

is entered as:

   % bsub pam -mpi mpirun -f appfile 

where appfile contains the following entries:

   -h foo -np 8 a.out
   -h bar -np 4 b.out
   -h foo -np 2 c.out
 

In this example, foo and bar are treated as symbolic names that refer to the actual hosts the LSF Batch system allocates to the job. Because a.out and c.out both name foo, their processes are guaranteed to run on the same host. The b.out processes may run on a different host, depending on the resources available and the LSF Batch system scheduling algorithms.
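The grouping rule described above can be checked before submitting a job. The following is a minimal sketch, not part of HP MPI or LSF: group_by_host is a hypothetical helper that assumes every appfile entry has exactly the form -h name -np count program, as in the example, and ignores any other appfile options.

```shell
#!/bin/sh
# Sketch: list the programs that share each symbolic host name in an
# HP MPI appfile read from standard input.
# Assumption: every entry looks like "-h <name> -np <count> <program>".
group_by_host() {
    awk '$1 == "-h" && $3 == "-np" {
        if (!($2 in progs)) {
            order[++n] = $2          # remember names in first-seen order
            progs[$2] = $5
        } else {
            progs[$2] = progs[$2] " " $5
        }
        procs[$2] += $4              # total processes for this name
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s: %s (%d processes)\n", order[i], progs[order[i]], procs[order[i]]
    }'
}

# The appfile from the example above.
group_by_host <<'EOF'
-h foo -np 8 a.out
-h bar -np 4 b.out
-h foo -np 2 c.out
EOF
```

Programs listed under the same symbolic name are the ones that will share a host. The names themselves say nothing about which physical host LSF eventually picks.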

For a complete list of mpirun options and environment variable controls, refer to the mpirun man page and the HP MPI User's Guide version 1.4.

SGI MPI

The -mpi argument on the bsub and pam command line replaces mpirun in the SGI environment. Everything after -mpi must appear exactly as it would if mpirun were being used.

Example: To run the a.out job and have the LSF Batch system select the host, the command:

   % mpirun -np 4 a.out 

is entered as:

   % bsub pam -mpi -np 4 a.out 

Example: To run a multi-host job and have the LSF Batch system select the hosts, the command:

   % mpirun -f appfile 

is entered as:

   % bsub pam -mpi -f appfile 

where appfile contains the following entries:

   foo -np 4 a.out
   bar -np 4 b.out
   foo -np 2 c.out
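Because -mpi takes the place of mpirun in the SGI environment, the conversion is mechanical: drop the leading mpirun and put bsub pam -mpi in front of the remaining arguments. The sketch below uses a hypothetical helper, sgi_to_lsf, that only prints the resulting command line and submits nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the LSF form of an SGI MPI command line.
# It only echoes the command; nothing is submitted.
sgi_to_lsf() {
    # Drop the leading "mpirun"; under LSF, -mpi stands in for it.
    if [ "$1" = "mpirun" ]; then
        shift
    fi
    echo "bsub pam -mpi $*"
}

sgi_to_lsf mpirun -np 4 a.out
sgi_to_lsf mpirun -f appfile
```

Printing rather than executing makes it possible to inspect the command line before handing it to the LSF Batch system.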

For a complete list of mpirun options and environment variable controls, refer to the mpirun man page.

Sun HPC MPI

When running LSF Batch jobs on Sun platforms, you can include the Sun-specific argument -sunhpc on the bsub command line, after any other bsub arguments. The following arguments to -sunhpc provide additional control over bsub behavior in a Sun HPC environment.

-n processes

Specify the number of processes to run. Note that the bsub -n argument specifies the number of CPUs to be used for the job.

Example: To start a 48-process interactive job on PAM-enabled queue hpc that will wrap over at least 4, and as many as 16, CPUs:

   % bsub -I -n 4,16 -q hpc -sunhpc -n 48 jobname

Note

If you set the minimum number of CPUs to a number greater than 1, the job may fail to start when fewer CPUs than that minimum are available. In this example, if fewer than 4 CPUs are available, the job will not start. You can avoid this potential problem by setting the minimum number of CPUs to 1. However, the processes may then be wrapped over a smaller number of CPUs, at a potential cost to performance.
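To get a feel for the trade-off, the load on each CPU can be estimated for different allocations. This is a rough sketch only: procs_per_cpu is a hypothetical helper, and it assumes the processes are spread evenly and rounded up, which may not match how the processes are actually placed.

```shell
#!/bin/sh
# Sketch: processes per CPU when a job wraps N processes over C CPUs.
# Assumption: an even spread, rounded up (ceiling division).
procs_per_cpu() {
    # $1 = number of processes, $2 = number of CPUs allocated
    echo $(( ($1 + $2 - 1) / $2 ))
}

# The 48-process job from the example, at a minimum of 1 CPU and at the
# 4,16 bounds used above.
for cpus in 1 4 16; do
    echo "$cpus CPUs: up to $(procs_per_cpu 48 "$cpus") processes per CPU"
done
```

Under this assumption, lowering the minimum from 4 CPUs to 1 raises the worst case from 12 processes per CPU to all 48 on a single CPU.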

-P host:port

Specify the PAM address of another job with which the new job should colocate. The PAM address is the TCP socket used for communications between the job and PAM.

Example: To start a 4-CPU interactive job on PAM-enabled queue hpc:

   % bsub -I -n 4 -q hpc -sunhpc -P Athos:123 jobname 

The new job is colocated with the job whose PAM is running on host Athos, using port 123.

-j job_ID

Specify the job ID of another job with which the new job should colocate.

-J job_name

Specify the job name of another job with which the new job should colocate.

-s

Specify that the job is to be spawned in the STOPPED state.

To identify processes in the STOPPED state, issue the ps command with the -el argument:

orpheus 215 => ps -el
 F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY    TIME CMD
19 T     0     0     0  0   0 SY f0274e38      0          ?      0:00 sched

Here, the sched command is in STOPPED state, as indicated by the T entry in the S (State) column.

Note that, when spawning a process in the STOPPED state under LSF, the name of your program will not appear in the ps output. Instead, the stopped process will be identified as a RES daemon.
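On a busy host the full ps listing is long, so it can help to filter it down to stopped processes. The sketch below assumes the S (state) column is the second field, as in the listing above. stopped_only is a hypothetical helper name, and the init line in the sample input is made up for illustration.

```shell
#!/bin/sh
# Sketch: keep the header line plus every process whose state is T.
# Assumption: S (state) is the second column of the ps -el output.
stopped_only() {
    awk 'NR == 1 || $2 == "T"'
}

# Typical use (not run here):  ps -el | stopped_only
# Demonstration on sample text; the init line is illustrative only.
printf '%s\n' \
    ' F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY    TIME CMD' \
    '19 T     0     0     0  0   0 SY f0274e38      0          ?      0:00 sched' \
    ' 8 S     0     1     0  0  41 20 f5c1a000     98 f5c1a1c8 ?      0:00 init' |
    stopped_only
```

A job spawned with -s shows up in the filtered output under the RES daemon name rather than under the program name, as noted above.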

Example: To start a 1-CPU interactive job on PAM-enabled queue hpc, in the STOPPED state:

   % bsub -I -n 1 -q hpc -sunhpc -s jobname 





doc@platform.com

Copyright © 1994-1998 Platform Computing Corporation.
All rights reserved.