

2. Getting Started

This chapter describes how to use some of the basic features of LSF. After working through the examples in this chapter, you should be able to use LSF for most everyday tasks.

Configuration options shown in the following examples, such as host types and model names, host CPU factors (representing relative processor speed), and resource names, are examples only; your system likely has different values for these settings.

Getting Cluster Information

Cluster information includes the cluster master host, cluster name, cluster resource definitions, cluster administrator, etc.

Displaying the Cluster and Master Names

LSF provides tools for users to get information about the system. The first command you want to use when you learn LSF is lsid. This command tells you the version of LSF, the name of your LSF cluster, and the current master host.

% lsid
LSF 3.1, Dec 1, 1997
Copyright 1992-1997 Platform Computing Corporation
My cluster name is test_cluster
My master name is hostA

To find out who your cluster administrator is and see a summary of your cluster, run the lsclusters command:

% lsclusters
CLUSTER_NAME  STATUS  MASTER_HOST  ADMIN  HOSTS  SERVERS
test_cluster  ok      hostb        lsf    6      6

If you are using the LSF MultiCluster product, the output of lsclusters includes one line for each cluster that your local cluster is connected to.

Displaying Available Resources

The lsinfo command lists all the resources available in the cluster.

% lsinfo
RESOURCE_NAME   TYPE     ORDER  DESCRIPTION
r15s            Numeric  Inc    15-second CPU run queue length
r1m             Numeric  Inc    1-minute CPU run queue length (alias: cpu)
r15m            Numeric  Inc    15-minute CPU run queue length
ut              Numeric  Inc    1-minute CPU utilization (0.0 to 1.0)
pg              Numeric  Inc    Paging rate (pages/second)
io              Numeric  Inc    Disk IO rate (Kbytes/second)
ls              Numeric  Inc    Number of login sessions (alias: login)
it              Numeric  Dec    Idle time (minutes) (alias: idle)
tmp             Numeric  Dec    Disk space in /tmp (Mbytes)
swp             Numeric  Dec    Available swap space (Mbytes) (alias: swap)
mem             Numeric  Dec    Available memory (Mbytes)
ncpus           Numeric  Dec    Number of CPUs
ndisks          Numeric  Dec    Number of local disks
maxmem          Numeric  Dec    Maximum memory (Mbytes)
maxswp          Numeric  Dec    Maximum swap space (Mbytes)
maxtmp          Numeric  Dec    Maximum /tmp space (Mbytes)
cpuf            Numeric  Dec    CPU factor
rexpri          Numeric  N/A    Remote execution priority
server          Boolean  N/A    LSF server host
irix            Boolean  N/A    IRIX UNIX
hpux            Boolean  N/A    HP_UX
solaris         Boolean  N/A    Sun Solaris
cserver         Boolean  N/A    Compute server
fserver         Boolean  N/A    File server
aix             Boolean  N/A    AIX UNIX
type            String   N/A    Host type
model           String   N/A    Host model
status          String   N/A    Host status
hname           String   N/A    Host name

TYPE_NAME
HPPA
SGI6
ALPHA
SUNSOL
RS6K
NTX86

MODEL_NAME   CPU_FACTOR
DEC3000      10.00
R10K         14.00
PENT200      6.00
IBM350       7.00
SunSparc     6.00
HP735        9.00
HP715        5.00

The lsinfo command displays three lists of information: the resources defined in the cluster, the host types, and the host models with their CPU factors.

The resources listed by lsinfo include built-in resources maintained by the LIM and site-specific resources configured by the LSF administrator. For a complete description of how LSF manages resources, see `Resources' on page 35.

The host types and host models are defined by the LSF administrator. Host types represent binary compatible hosts; all hosts of the same type can run the same executables. Host models give the relative CPU performance of different processors. In this example, your LSF cluster treats an R10K processor as being twice as fast as an IBM 350 processor (these numbers were invented for this example, and do not necessarily correspond to the actual performance of these systems).
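The CPU factor comparison above is a simple ratio of the two factors. As a minimal sketch (the function name cpuf_ratio is hypothetical, and the factors are the invented example values from the lsinfo output above), the following shell function looks up two host models and divides their CPU factors:

```shell
#!/bin/sh
# MODEL_NAME / CPU_FACTOR pairs copied from the example lsinfo output.
# These are invented example values, not real benchmark results.
MODELS='DEC3000 10.00
R10K 14.00
PENT200 6.00
IBM350 7.00
SunSparc 6.00
HP735 9.00
HP715 5.00'

# cpuf_ratio MODEL_A MODEL_B  (hypothetical helper)
# Print cpuf(MODEL_A) / cpuf(MODEL_B) with two decimals, i.e. how many
# times faster the cluster treats MODEL_A compared with MODEL_B.
# Exits with status 1 if either model is not in the list.
cpuf_ratio() {
    printf '%s\n' "$MODELS" | awk -v a="$1" -v b="$2" '
        $1 == a { fa = $2 }
        $1 == b { fb = $2 }
        END {
            if (fa == "" || fb == "") {
                print "unknown host model" | "cat 1>&2"
                exit 1
            }
            printf "%.2f\n", fa / fb
        }'
}

cpuf_ratio R10K IBM350    # prints 2.00
```

With the same example factors, cpuf_ratio HP735 HP715 prints 1.80.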

Getting Host Information

LSF keeps information about all hosts in the cluster. Some information is static and some is dynamic. Static information is either configured by the LSF administrator, or is a fixed property of the system. An example of static host information is the amount of RAM available to users on a host.

Dynamic host information, or load indices, is determined by the LSF system, and updated regularly. Dynamic information represents the changing resources available on the host. Examples of dynamic host information are the current CPU load and the currently available temporary file space.

Displaying Static Host Information

A load sharing cluster may consist of hosts of differing architecture and speed. The lshosts command displays configuration information about hosts. All these parameters are defined by the LSF administrator in the LSF configuration files, or determined by the LIM directly from the system.

% lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostD      SUNSOL  SunSparc  6.0   1      64M     112M    Yes     (solaris cserver)
hostB      ALPHA   DEC3000   10.0  1      94M     168M    Yes     (alpha cserver)
hostM      RS6K    IBM350    7.0   1      64M     124M    Yes     (cserver aix)
hostC      SGI6    R10K      14.0  16     1024M   1896M   Yes     (irix cserver)
hostA      HPPA    HP715     6.0   1      98M     200M    Yes     (hpux fserver)

In this example, the host type SUNSOL represents Sun SPARC systems running Solaris, and ALPHA represents a Digital Alpha server running Digital Unix.

See `Listing Hosts' on page 27 for a complete description of the lshosts command.

Displaying Load Information

The lsload command prints out current load information.

% lsload
HOST_NAME  status  r15s  r1m  r15m  ut   pg     ls  it   tmp   swp   mem
hostD      ok      0.1   0.0  0.1   2%   0.0    5   3    81M   82M   45M
hostC      ok      0.7   1.2  0.5   50%  1.1    11  0    322M  337M  252M
hostM      ok      0.8   2.2  1.4   60%  15.4   0   136  62M   57M   45M
hostA      busy    *5.2  3.6  2.6   99%  *34.4  4   0    70M   34M   18M
hostB      lockU   1.0   1.0  1.5   99%  0.8    5   33   12M   24M   23M

The first line lists the load index names, and each following line gives the load levels for one host. The r15s, r1m, and r15m fields give the CPU run queue length averaged over 15 seconds, 1 minute, and 15 minutes, respectively. The ut field gives the CPU utilization over the last minute, as a percentage. pg is the paging rate in pages per second, ls is the number of login sessions, the it field is the idle time in minutes (the time since the last interactive user activity), tmp is the available temporary disk space in megabytes, swp is the available swap space in megabytes, and mem is the available RAM in megabytes.

The status column gives the load status of the host. A host is busy if any load index is beyond its configured threshold. When a load index is beyond its threshold, it is printed with an asterisk `*'. In the above example, hostA is busy because load indices r15s and pg are too high. The lshosts -l command shows the load thresholds.

Hosts with ok status are listed first. The ok hosts are sorted based on CPU and memory load, with the best host listed first.
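Because values beyond their thresholds are marked with a leading asterisk, the indices that make a host busy can be picked out of lsload output mechanically. The sketch below is illustrative only: the function name over_threshold is hypothetical, the sample data is the lsload output shown above, and it assumes every column is filled in and separated by white space.

```shell
#!/bin/sh
# Example lsload output from this chapter, including the header line.
LSLOAD_SAMPLE='HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostD ok 0.1 0.0 0.1 2% 0.0 5 3 81M 82M 45M
hostC ok 0.7 1.2 0.5 50% 1.1 11 0 322M 337M 252M
hostM ok 0.8 2.2 1.4 60% 15.4 0 136 62M 57M 45M
hostA busy *5.2 3.6 2.6 99% *34.4 4 0 70M 34M 18M
hostB lockU 1.0 1.0 1.5 99% 0.8 5 33 12M 24M 23M'

# over_threshold (hypothetical helper): read lsload-style output on stdin.
# The header line supplies the index names; for every host with at least
# one value starting with `*', print the host name followed by the names
# of the starred indices.
over_threshold() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        {
            out = ""
            for (i = 3; i <= NF; i++)
                if (substr($i, 1, 1) == "*") out = out " " name[i]
            if (out != "") print $1 out
        }'
}

printf '%s\n' "$LSLOAD_SAMPLE" | over_threshold    # prints: hostA r15s pg
```

On a live cluster the same function could be fed directly, as in lsload | over_threshold.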

The lsload command reports more load indices if the -l option is given.

The lsmon command provides a continuously updated display of load information. The xlsmon command provides a graphical display, under the X Window System, of host status and load levels in your LSF cluster.

See the lsload(1), lsmon(1), and xlsmon(1) manual pages for more information. Also see `Displaying the Load' on page 29.

Running Jobs

LSF supports transparent execution of jobs on all server hosts in the cluster. You can run your program on the best available host and interact with it just as if it were running directly on your workstation. Keyboard signals such as Ctrl-Z and Ctrl-C work as expected.

Running Jobs on Remote Hosts

There are different ways to run jobs on a remote host. To run myjob on the best available host, enter:

% lsrun myjob

LSF automatically selects the best host that is of the same type as the local host.

If you want to run myjob on a host with specific resources, you must specify the resource requirements. For example,

% lsrun -R 'cserver && swp>100' myjob

runs myjob on a host that has the resource `cserver' (see `Displaying Available Resources' on page 14) and has more than 100 megabytes of available swap space (the swp load index).
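To see what the requirement 'cserver && swp>100' means for the example hosts, the sketch below evaluates it by hand, using the RESOURCES column of the lshosts output and the swp column of the lsload output shown earlier in this chapter. It only illustrates the logic of the requirement; LSF performs host selection itself and also ranks candidate hosts by load. The function name eligible and the header-free sample data are assumptions made for this example.

```shell
#!/bin/sh
# Header-free copies of the example lshosts and lsload output.
LSHOSTS_DATA='hostD SUNSOL SunSparc 6.0 1 64M 112M Yes (solaris cserver)
hostB ALPHA DEC3000 10.0 1 94M 168M Yes (alpha cserver)
hostM RS6K IBM350 7.0 1 64M 124M Yes (cserver aix)
hostC SGI6 R10K 14.0 16 1024M 1896M Yes (irix cserver)
hostA HPPA HP715 6.0 1 98M 200M Yes (hpux fserver)'
LSLOAD_DATA='hostD ok 0.1 0.0 0.1 2% 0.0 5 3 81M 82M 45M
hostC ok 0.7 1.2 0.5 50% 1.1 11 0 322M 337M 252M
hostM ok 0.8 2.2 1.4 60% 15.4 0 136 62M 57M 45M
hostA busy *5.2 3.6 2.6 99% *34.4 4 0 70M 34M 18M
hostB lockU 1.0 1.0 1.5 99% 0.8 5 33 12M 24M 23M'

# eligible RESOURCE MIN_SWP  (hypothetical helper)
# Print the hosts that have the Boolean resource RESOURCE and more than
# MIN_SWP megabytes of available swap, i.e. 'RESOURCE && swp>MIN_SWP'.
eligible() {
    { printf '%s\n' "$LSHOSTS_DATA"; echo '--'; printf '%s\n' "$LSLOAD_DATA"; } |
    awk -v res="$1" -v min="$2" '
        $0 == "--" { in_load = 1; next }
        !in_load {
            # lshosts line: fields 9 and up hold the RESOURCES list
            for (i = 9; i <= NF; i++) {
                r = $i
                gsub(/[()]/, "", r)
                if (r == res) has[$1] = 1
            }
            next
        }
        {
            # lsload line: field 11 is swp, for example "337M"
            swp = $11
            sub(/M$/, "", swp)
            if (($1 in has) && swp + 0 > min + 0) print $1
        }'
}

eligible cserver 100    # prints hostC
```

Of the four cserver hosts, only hostC currently has more than 100 megabytes of swap available, so it is the only candidate in this example.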

If you want to run your job on a particular host, use the -m option:

% lsrun -m hostD myjob

When you run an interactive job on a remote host, you can control it much as you would a local job. If your shell supports job control, you can suspend and resume the job and move it between the foreground and the background. For a complete description, see the lsrun(1) manual page.

You can also write one-line shell scripts or csh aliases to hide the remote execution. For example:

#! /bin/sh
# Script to run myjob remotely on hostD, passing along any arguments
exec lsrun -m hostD myjob "$@"

or

% alias myjob "lsrun -m hostD myjob"

Load Sharing Commands With lstcsh

The lstcsh shell is a load-sharing version of the tcsh command interpreter. It is compatible with csh and supports many useful extensions. csh and tcsh users can use lstcsh to send jobs to other hosts in the cluster without needing to learn any new commands. You can run lstcsh from the command line, or use the chsh command to set it as your login shell. Refer to `Using lstcsh' on page 149 for a more detailed description of lstcsh.

Parallel Processing With LSF Make

LSF Make is a load-sharing, parallel version of GNU make. It is compatible with makefiles for most versions of make. LSF Make uses the LSF load information to choose the best group of hosts for your make job. Targets in the makefile are processed in parallel on the chosen hosts using the LSF remote execution facilities. You do not need to modify your makefile to use LSF Make. By default, LSF Make chooses hosts that are all of the same type. LSF Make is invoked using the lsmake command.

The following example uses the lsmake -V and -j 3 options to run on three hosts and produce verbose output:

% lsmake -V -j 3
[hostA] [hostD] [hostK]
<< Execute on local host >>
cc -O -c arg.c -o arg.o
<< Execute on remote host hostA >>
cc -O -c dev.c -o dev.o
<< Execute on remote host hostK >>
cc -O -c main.c -o main.o
<< Execute on remote host hostD >>
cc -O arg.o dev.o main.o

LSF Make includes control over parallelism for recursive makes, which are often used for source code trees that are organized into subdirectories. Parallelism can also be controlled by the load on the NFS file server, so that parallel makes do not overload the server and slow everyone else down. See `Using LSF Make' on page 159 for details.

Listing Hosts

LSF Batch uses some (or all) of the hosts in an LSF cluster as batch server hosts. The host list is configured by the LSF administrator. The bhosts command displays information about these hosts.

% bhosts
HOST_NAME  STATUS   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok       -     2    1      1    0      0      0
hostB      ok       -     3    2      1    0      0      1
hostC      ok       -     32   10     9    0      1      0
hostD      ok       -     32   10     9    0      1      0
hostM      unavail  -     3    3      1    1      1      0

STATUS gives the status of sbatchd on each host. If a host is down or its sbatchd is not running, its STATUS is `unavail'. The JL/U column shows the maximum number of job slots a single user can use on each host at one time. MAX gives the maximum number of job slots configured for each host. The RUN, SSUSP, and USUSP columns give the number of job slots used by running jobs, by jobs suspended by the system, and by jobs suspended by the user, respectively. The RSV column shows job slots that LSF Batch has reserved for some jobs. NJOBS is the sum of the RUN, SSUSP, USUSP, and RSV columns.
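The NJOBS arithmetic can be checked directly against the example. The sketch below (hypothetical function name, sample data copied from the bhosts output above) recomputes RUN + SSUSP + USUSP + RSV for each host and, on the assumption that the slots still free are simply MAX minus NJOBS, prints how many slots each host has left.

```shell
#!/bin/sh
# Example bhosts output from this chapter, including the header line.
BHOSTS_DATA='HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostA ok - 2 1 1 0 0 0
hostB ok - 3 2 1 0 0 1
hostC ok - 32 10 9 0 1 0
hostD ok - 32 10 9 0 1 0
hostM unavail - 3 3 1 1 1 0'

# slot_summary (hypothetical helper): for each host print
#   HOST  NJOBS  RUN+SSUSP+USUSP+RSV  MAX-NJOBS
# A `-' in MAX (no limit configured) is passed through as `-'.
slot_summary() {
    awk 'NR > 1 {
        sum = $6 + $7 + $8 + $9
        free = ($4 == "-") ? "-" : $4 - $5
        print $1, $5, sum, free
    }'
}

printf '%s\n' "$BHOSTS_DATA" | slot_summary
```

For every example host the second and third columns agree; hostC, for instance, is reported as hostC 10 10 22.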

For a more detailed description of the bhosts command see `Batch Hosts' on page 79.

Submitting a Job

To submit a job to the LSF Batch system, use the bsub command.

For example, submit the job sleep 30. This command does nothing, and takes 30 seconds to do it. The LSF administrator configures one queue to be the default job queue; if you submit a job without specifying a queue, the job goes to the default queue.

% bsub sleep 30
Job <1234> is submitted to default queue <normal>

In the above example, 1234 is the job ID assigned by LSF Batch to this job, and normal is the name of the default job queue.
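Scripts that submit batch jobs often need the job ID for later commands. A minimal sketch, assuming bsub writes its confirmation line to standard output in exactly the format shown above (the function name job_id is hypothetical):

```shell
#!/bin/sh
# job_id (hypothetical helper): read bsub's confirmation line on standard
# input and print the job ID, the number inside the first pair of angle
# brackets. Prints nothing if the line is not a submission confirmation.
job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

echo 'Job <1234> is submitted to default queue <normal>' | job_id   # prints 1234
```

In a script this might be used as JOBID=$(bsub sleep 30 | job_id).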

Your batch job remains pending until all conditions for its execution are met. Each batch queue has execution conditions that apply to all jobs in the queue, and you can specify additional conditions when you submit the job.

The -m "hostA hostB ..." option specifies that the job must run on one of the specified hosts. By specifying a single host, you can force your job to wait until that host is available and then run on that host.

For a detailed description of the bsub command see `Submitting Batch Jobs' on page 89.

Selecting a Job Queue

Job queues represent different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Each job queue can use a configured subset of the server hosts in the LSF cluster; the default is to use all server hosts.

System administrators can configure job queues to control resource access by different users and types of applications. Users select the job queue that best fits each job. The bqueues command lists the available LSF Batch queues:

% bqueues
QUEUE_NAME  PRIO  NICE STATUS        MAX  JL/U  JL/P NJOBS PEND RUN SUSP
owners      49   10    Open:Active   -    -     -    1     0    1   0
priority    43   10    Open:Active   10   -     -    8     5    3   0
night       40   10    Open:Inactive -    -     -    44    44   0   0
short       35   20    Open:Active   20   -     2    4     0    4   0
license     33   10    Open:Active   40   -     -    1     1    0   0
normal      30   20    Open:Active   -    2     -    0     0    0   0

A dash `-' in any entry means that the column does not apply to the row. In this example some queues have no per-queue, per-user or per-processor job limits configured, so the MAX, JL/U and JL/P entries are `-'.

You can submit jobs to a queue as long as its STATUS is Open. However, jobs are not dispatched unless the queue is Active.
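The Open/Active rule can be applied to the STATUS column mechanically. The sketch below uses the example bqueues output above and a hypothetical function name, queue_state; it treats any STATUS that does not begin with Open as closed to new submissions.

```shell
#!/bin/sh
# Example bqueues output from this chapter, including the header line.
BQUEUES_DATA='QUEUE_NAME PRIO NICE STATUS MAX JL/U JL/P NJOBS PEND RUN SUSP
owners 49 10 Open:Active - - - 1 0 1 0
priority 43 10 Open:Active 10 - - 8 5 3 0
night 40 10 Open:Inactive - - - 44 44 0 0
short 35 20 Open:Active 20 - 2 4 0 4 0
license 33 10 Open:Active 40 - - 1 1 0 0
normal 30 20 Open:Active - 2 - 0 0 0 0'

# queue_state (hypothetical helper): classify each queue by STATUS (field 4).
#   Open:Active -> jobs can be submitted and are dispatched
#   Open:...    -> jobs can be submitted but are not dispatched
#   otherwise   -> the queue does not accept new jobs
queue_state() {
    awk 'NR > 1 {
        if ($4 == "Open:Active")   state = "accepts and dispatches"
        else if ($4 ~ /^Open/)     state = "accepts, does not dispatch"
        else                       state = "does not accept"
        print $1 ": " state
    }'
}

printf '%s\n' "$BQUEUES_DATA" | queue_state
```

With the example data only night is reported as accepting jobs without dispatching them, which is consistent with all 44 of its jobs being pending.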

Tracking Batch Jobs

The bjobs command reports the status of LSF Batch jobs. The -u all option specifies that jobs for all users should be listed; the default is to list only jobs you submitted. Running jobs are listed first. Pending jobs are listed in the order in which they will be scheduled. Jobs in high priority queues are listed before those in lower priority queues.

% bjobs -u all
JOBID USER  STAT  QUEUE     FROM_HOST EXEC_HOST JOB_NAME  SUBMIT_TIME
1004  user  RUN   short     hostA     hostA     job0      Dec 16 09:23
1235  user2 PEND  priority  hostM               job1      Dec 11 13:55
1234  user2 SSUSP normal    hostD     hostM     job3      Dec 11 10:09
1250  user1 PEND  short     hostA               job4      Dec 11 13:59
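A quick tally of job states is often handy with bjobs -u all. The sketch below counts jobs by the STAT column. Because pending jobs have an empty EXEC_HOST column, later columns shift position; the sketch relies only on STAT, the third column, which stays in place. The function name count_by_stat is hypothetical and the sample data is the bjobs output shown above.

```shell
#!/bin/sh
# Example bjobs -u all output from this chapter, including the header line.
BJOBS_DATA='JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1004 user RUN short hostA hostA job0 Dec 16 09:23
1235 user2 PEND priority hostM job1 Dec 11 13:55
1234 user2 SSUSP normal hostD hostM job3 Dec 11 10:09
1250 user1 PEND short hostA job4 Dec 11 13:59'

# count_by_stat (hypothetical helper): print one "STAT COUNT" line per
# job state found in the third column, sorted by state name.
count_by_stat() {
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

printf '%s\n' "$BJOBS_DATA" | count_by_stat
```

For the example output this reports two pending jobs, one running job, and one job suspended by the system.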

If you also want to see jobs that finished recently, enter:

% bjobs -a

This displays all of your jobs that are still in the LSF Batch system, together with jobs that finished recently.

The bjobs command has many other options. See `Batch Jobs' on page 56. Also refer to the bjobs(1) manual page for a complete description.

xbsub and xlsbatch GUI Applications

You can submit your job to the LSF Batch system using the graphical user interface application xbsub as shown in Figure 3.

Figure 3. xbsub Job Submission Window

xlsbatch is another graphical user interface application for LSF Batch. You can use it to monitor host, job, and queue status, and control your jobs.

Figure 4. xlsbatch Main Window

Both xbsub and xlsbatch have extensive on-line help available through the Help menu of each application.

xbsub can be started either directly from the command line or from xlsbatch using the `Submit' button.




doc@platform.com

Copyright © 1994-1998 Platform Computing Corporation.
All rights reserved.