This chapter describes how to use some of the basic features of LSF. After following the examples in this chapter you should be able to use LSF for most of the everyday tasks.
Configuration options shown in the following examples, such as host types and model names, host CPU factors (representing relative processor speed), and resource names are examples only; your system likely has different values for these settings.
Cluster information includes the cluster master host, cluster name, cluster resource definitions, cluster administrator, etc.
LSF provides tools for users to get information about the system. The first command you will want to use when learning LSF is lsid. This command tells you the version of LSF, the name of your LSF cluster, and the current master host.
% lsid
LSF 3.1, Dec 1, 1997
Copyright 1992-1997 Platform Computing Corporation
My cluster name is test_cluster
My master name is hostA
To find out who your cluster administrator is and see a summary of your cluster, run the lsclusters command:
% lsclusters
CLUSTER_NAME STATUS MASTER_HOST ADMIN HOSTS SERVERS
test_cluster ok hostA lsf 6 6
If you are using the LSF MultiCluster product, the output of lsclusters shows one line for each of the clusters that your local cluster is connected to.
The lsinfo command lists all the resources available in the cluster.
% lsinfo
RESOURCE_NAME TYPE ORDER DESCRIPTION
r15s Numeric Inc 15-second CPU run queue length
r1m Numeric Inc 1-minute CPU run queue length (alias: cpu)
r15m Numeric Inc 15-minute CPU run queue length
ut Numeric Inc 1-minute CPU utilization (0.0 to 1.0)
pg Numeric Inc Paging rate (pages/second)
io Numeric Inc Disk IO rate (Kbytes/second)
ls Numeric Inc Number of login sessions (alias: login)
it Numeric Dec Idle time (minutes) (alias: idle)
tmp Numeric Dec Disk space in /tmp (Mbytes)
swp Numeric Dec Available swap space (Mbytes) (alias: swap)
mem Numeric Dec Available memory (Mbytes)
ncpus Numeric Dec Number of CPUs
ndisks Numeric Dec Number of local disks
maxmem Numeric Dec Maximum memory (Mbytes)
maxswp Numeric Dec Maximum swap space (Mbytes)
maxtmp Numeric Dec Maximum /tmp space (Mbytes)
cpuf Numeric Dec CPU factor
rexpri Numeric N/A Remote execution priority
server Boolean N/A LSF server host
irix Boolean N/A IRIX UNIX
hpux Boolean N/A HP-UX
solaris Boolean N/A Sun Solaris
cserver Boolean N/A Compute server
fserver Boolean N/A File server
aix Boolean N/A AIX UNIX
type String N/A Host type
model String N/A Host model
status String N/A Host status
hname String N/A Host name
TYPE_NAME
HPPA
SGI6
ALPHA
SUNSOL
RS6K
NTX86

MODEL_NAME CPU_FACTOR
DEC3000 10.00
R10K 14.00
PENT200 6.00
IBM350 7.00
SunSparc 6.00
HP735 9.00
HP715 5.00
The lsinfo command displays three lists of information: the available resources, the available host types, and the available host models. The resources listed by lsinfo include built-in resources maintained by the LIM and site-specific resources configured by the LSF administrator. For a complete description of how LSF manages resources, see `Resources' on page 35.
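If you want more detail about particular resources from the command line, lsinfo also accepts specific resource names along with a -l option for a longer listing. For example (the names below are the built-in swp and mem indices; the exact output depends on your cluster):
% lsinfo -l swp mem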
The host types and host models are defined by the LSF administrator. Host types represent binary compatible hosts; all hosts of the same type can run the same executables. Host models give the relative CPU performance of different processors. In this example, your LSF cluster treats an R10K processor as being twice as fast as an IBM 350 processor (these numbers were invented for this example, and do not necessarily correspond to the actual performance of these systems).
LSF keeps information about all hosts in the cluster. Some information is static and some is dynamic. Static information is either configured by the LSF administrator or is a fixed property of the system. An example of static host information is the amount of RAM available to users on a host.
Dynamic host information, or load indices, is determined by the LSF system, and updated regularly. Dynamic information represents the changing resources available on the host. Examples of dynamic host information are the current CPU load and the currently available temporary file space.
A load-sharing cluster may consist of hosts of differing architecture and speed. The lshosts command displays configuration information about hosts. All these parameters are defined by the LSF administrator in the LSF configuration files, or determined by the LIM directly from the system.
% lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
hostD SUNSOL SunSparc 6.0 1 64M 112M Yes (solaris cserver)
hostB ALPHA DEC3000 10.0 1 94M 168M Yes (alpha cserver)
hostM RS6K IBM350 7.0 1 64M 124M Yes (cserver aix)
hostC SGI6 R10K 14.0 16 1024M 1896M Yes (irix cserver)
hostA HPPA HP715 6.0 1 98M 200M Yes (hpux fserver)
In this example, the host type SUNSOL represents Sun SPARC systems running Solaris, and ALPHA represents Digital Alpha servers running Digital UNIX.
See `Listing Hosts' on page 27 for a complete description of the lshosts command.
The lsload command prints out current load information.
% lsload
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostD ok 0.1 0.0 0.1 2% 0.0 5 3 81M 82M 45M
hostC ok 0.7 1.2 0.5 50% 1.1 11 0 322M 337M 252M
hostM ok 0.8 2.2 1.4 60% 15.4 0 136 62M 57M 45M
hostA busy *5.2 3.6 2.6 99% *34.4 4 0 70M 34M 18M
hostB lockU 1.0 1.0 1.5 99% 0.8 5 33 12M 24M 23M
The first line lists the load index names, and each following line gives the load levels for one host. The r15s, r1m, and r15m fields give the CPU load averaged over different time intervals. The ut field gives the percentage of time the CPU is in use. pg is the paging rate, ls is the number of login sessions, it is the idle time (the time since the last interactive user activity), tmp is the available temporary disk space in megabytes, swp is the available swap space in megabytes, and mem is the available RAM in megabytes.
The status column gives the load status of the host. A host is busy if any load index is beyond its configured threshold. When a load index is beyond its threshold, it is printed with an asterisk `*'. In the above example, hostA is busy because the load indices r15s and pg are too high. The lshosts -l command shows the load thresholds.
Hosts with ok status are listed first. The ok hosts are sorted based on CPU and memory load, with the best host listed first.
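If you only want to see hosts that currently satisfy a particular resource requirement, you can pass a requirement string to lsload with the -R option. For example (the threshold values below are arbitrary):
% lsload -R "swp>50 && mem>20"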
The lsload command reports more load indices if the -l option is given.
The lsmon command provides an updating display of load information. The xlsmon command is an X Windows graphical display of host status and load levels in your LSF cluster.
See the lsload(1), lsmon(1), and xlsmon(1) manual pages for more information. Also see `Displaying the Load' on page 29.
LSF supports transparent execution of jobs on all server hosts in the cluster. You can run your program on the best available host and interact with it just as if it were running directly on your workstation. Keyboard signals such as ctrl-z and ctrl-c work as expected.
There are different ways to run jobs on a remote host. To run myjob on the best available host, enter:
% lsrun myjob
LSF automatically selects the best host that is of the same type as the local host.
If you want to run myjob on a host with specific resources, you must specify the resource requirements. For example,
% lsrun -R 'cserver && swp>100' myjob
runs myjob on a host that has the resource `cserver' (see `Displaying Available Resources' on page 14) and has at least 100 megabytes of virtual memory available.
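A resource requirement string can also relax the default restriction that the remote host be the same type as the local host. For example, if myjob is a shell script that can run on any host type, a selection like the following may be used (a hypothetical example; see the lsrun(1) manual page for the resource requirement syntax):
% lsrun -R "type==any" myjob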
If you want to run your job on a particular host, use the -m option:
% lsrun -m hostD myjob
When you run an interactive job on a remote host, you can control it in most of the same ways as a local job. If your shell supports job control, you can suspend and resume the job and bring it to the background or foreground as if it were a local job. For a complete description, see the lsrun(1) manual page.
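For example, with a shell that supports job control you can suspend the remote job with ctrl-z and continue it in the background, just as you would locally (a sketch only; the Suspended message and prompts come from your shell, not from LSF):
% lsrun -m hostD myjob
^Z
Suspended
% bg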
You can also write one-line shell scripts or csh aliases to hide the remote execution. For example:
#! /bin/sh
# Script to remotely execute myjob
exec lsrun -m hostD myjob
or
% alias myjob "lsrun -m hostD myjob"
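If you save the shell script version under a name of your choice (myjob_remote is only an example name) and make it executable, you can run it like any local command:
% chmod +x myjob_remote
% ./myjob_remote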
The lstcsh shell is a load-sharing version of the tcsh command interpreter. It is compatible with csh and supports many useful extensions. csh and tcsh users can use lstcsh to send jobs to other hosts in the cluster without needing to learn any new commands. You can run lstcsh from the command line, or use the chsh command to set it as your login shell. Refer to `Using lstcsh' on page 149 for a more detailed description of lstcsh.
LSF Make is a load-sharing, parallel version of GNU make. It is compatible with makefiles for most versions of make. LSF Make uses the LSF load information to choose the best group of hosts for your make job. Targets in the makefile are processed in parallel on the chosen hosts using the LSF remote execution facilities. You do not need to modify your makefile to use LSF Make. By default, LSF Make chooses hosts that are all of the same type. LSF Make is invoked using the lsmake command.
The following example uses the lsmake -V and -j 3 options to run on three hosts and produce verbose output:
% lsmake -V -j 3
[hostA] [hostD] [hostK]
<< Execute on local host >>
cc -O -c arg.c -o arg.o
<< Execute on remote host hostA >>
cc -O -c dev.c -o dev.o
<< Execute on remote host hostK >>
cc -O -c main.c -o main.o
<< Execute on remote host hostD >>
cc -O arg.o dev.o main.o
LSF Make includes control over parallelism for recursive makes, which are often used for source code trees that are organized into subdirectories. Parallelism can also be controlled by the load on the NFS file server, so that parallel makes do not overload the server and slow everyone else down. See `Using LSF Make' on page 159 for details.
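Because lsmake accepts the usual make command-line arguments, you can generally substitute it for make directly. For example, a parallel build of a hypothetical all target on up to four hosts might be started with:
% lsmake -j 4 all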
LSF Batch uses some (or all) of the hosts in an LSF cluster as batch server hosts. The host list is configured by the LSF administrator. The bhosts command displays information about these hosts.
% bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostA ok - 2 1 1 0 0 0
hostB ok - 3 2 1 0 0 1
hostC ok - 32 10 9 0 1 0
hostD ok - 32 10 9 0 1 0
hostM unavail - 3 3 1 1 1 0
STATUS gives the status of sbatchd. If a host is down or its sbatchd is not up, its STATUS is `unavail'. The JL/U column shows the maximum number of job slots a single user can use on each host at one time. MAX gives the maximum number of job slots that are configured for each host. The RUN, SSUSP, and USUSP columns display the number of job slots in use by jobs in the RUN state, suspended by the system, and suspended by the user, respectively. The RSV field shows job slots that are reserved by LSF Batch for some jobs. The NJOBS field shows the sum of the RUN, SSUSP, USUSP, and RSV fields.
For a more detailed description of the bhosts command, see `Batch Hosts' on page 79.
To submit a job to the LSF Batch system, use the bsub command.
For example, submit the job sleep 30. This command does nothing, and takes 30 seconds to do it. The LSF administrator configures one queue to be the default job queue; if you submit a job without specifying a queue, the job goes to the default queue.
% bsub sleep 30
Job <1234> is submitted to default queue <normal>
In the above example, 1234 is the job ID assigned by LSF Batch to this job, and normal is the name of the default job queue.
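To use a queue other than the default, name it with the -q option of bsub. For example, to send the same job to a queue called short (a hypothetical queue name; run bqueues to see the queues configured at your site):
% bsub -q short sleep 30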
Your batch job remains pending until all conditions for its execution are met. Each batch queue has execution conditions that apply to all jobs in the queue, and you can specify additional conditions when you submit the job.
The -m "hostA hostB ..." option specifies that the job must run on one of the specified hosts. By specifying a single host, you can force your job to wait until that host is available and then run on that host.
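For example, to submit myjob so that it runs only on hostA or hostB (host names taken from the earlier examples):
% bsub -m "hostA hostB" myjob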
For a detailed description of the bsub command, see `Submitting Batch Jobs' on page 89.
Job queues represent different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Each job queue can use a configured subset of the server hosts in the LSF cluster; the default is to use all server hosts.
System administrators can configure job queues to control resource access by different users and types of application. Users select the job queue that best fits each job. The bqueues command lists the available LSF Batch queues:
% bqueues
QUEUE_NAME PRIO NICE STATUS MAX JL/U JL/P NJOBS PEND RUN SUSP
owners 49 10 Open:Active - - - 1 0 1 0
priority 43 10 Open:Active 10 - - 8 5 3 0
night 40 10 Open:Inactive - - - 44 44 0 0
short 35 20 Open:Active 20 - 2 4 0 4 0
license 33 10 Open:Active 40 - - 1 1 0 0
normal 30 20 Open:Active - 2 - 0 0 0 0
A dash `-' in any entry means that the column does not apply to the row. In this example, some queues have no per-queue, per-user, or per-processor job limits configured, so the MAX, JL/U, and JL/P entries are `-'.
You can submit jobs to a queue as long as its STATUS is Open. However, jobs are not dispatched unless the queue is Active.
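To see the full configuration of a single queue, including its scheduling parameters, you can give the queue name to bqueues with the -l option (output omitted here; it depends on your site's configuration):
% bqueues -l normal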
The bjobs command reports the status of LSF Batch jobs. The -u all option specifies that jobs for all users should be listed; the default is to list only jobs you submitted. Running jobs are listed first. Pending jobs are listed in the order in which they will be scheduled. Jobs in high priority queues are listed before those in lower priority queues.
% bjobs -u all
JOBID USER  STAT  QUEUE    FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1004  user  RUN   short    hostA     hostA     job0     Dec 16 09:23
1235  user2 PEND  priority hostM               job1     Dec 11 13:55
1234  user2 SSUSP normal   hostD     hostM     job3     Dec 11 10:09
1250  user1 PEND  short    hostA               job4     Dec 11 13:59
If you also want to see jobs that finished recently, enter:
% bjobs -a
All of your jobs that are still in the LSF Batch system, as well as recently finished jobs, are displayed.
The bjobs command has many other options. See `Batch Jobs' on page 56. Also refer to the bjobs(1) manual page for a complete description.
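For example, to see detailed information about a single job, give its job ID to bjobs with the -l option (job 1234 is the job submitted in the earlier bsub example):
% bjobs -l 1234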
You can submit your job to the LSF Batch system using the graphical user interface application xbsub, as shown in Figure 3.
Figure 3. xbsub Job Submission Window
xlsbatch is another graphical user interface application for LSF Batch. You can use it to monitor host, job, and queue status, and to control your jobs.
Figure 4. xlsbatch Main Window
Both xbsub and xlsbatch have extensive on-line help available through the Help menu of each application.
xbsub can be started either directly from the command line or from xlsbatch using the `Submit' button.
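Both programs are standard X clients, so you would normally start them in the background from a shell that has access to your X display, for example:
% xlsbatch &
% xbsub &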