This chapter describes the system resources LSF keeps track of and how you use LSF resource specifications.
A computer may be thought of as a collection of resources used to execute programs. Different applications often require different resources. For example, a number crunching program may take a lot of CPU power, but a large spreadsheet may need a lot of memory to run well. Some applications may run only on machines of a specific type, and not on others. To run applications as efficiently as possible, the LSF system needs to take these factors into account.
In LSF, resources are handled by naming them and tracking information relevant to them. LSF schedules work according to each application's resource requirements and the resources available on individual hosts. LSF classifies resources in different ways.
general resources
These are resources that are available on all hosts, e.g. all the load indices, number of processors on a host, total swap space, host status.
special resources
These are resources that are only associated with some hosts, e.g. FileServer, aix, solaris, SYSV.
dynamic resources
These are resources that change their values dynamically, e.g. all the load indices, host status.
static resources
These are resources whose values do not change; all resources except the load indices and host status are static.
numerical resources
These are resources that take numerical values, e.g. all the load indices, number of processors on a host, host CPU factor.
string resources
These are resources that take string values, e.g. host type, host model, host status.
Boolean resources
These are resources that denote the availability of specific features, e.g. hspice, FileServer, SYSV, aix.
configured resources
These are resources defined by user sites, such as external indices and resources defined in the lsf.shared file, e.g. FileServer, fddi.
built-in resources
These are resources that are always defined by LIM, e.g. load indices, number of CPUs, total swap space.
host-based resources
These are resources that are not shared among hosts, but are tied to individual hosts. An application must run on a particular host to access such resources, e.g. CPU, memory (using up memory on one host does not affect the available memory on another host), swap space.
shared resources
These are resources that are not associated with individual hosts in the same way, but are "owned" by the entire cluster, or a subset of hosts within the cluster. An application can access such a resource from any host which is configured to share it, but doing so affects its value as seen by other hosts, e.g. floating licenses, shared file systems.
Resource names are case sensitive, and can be up to 29 characters in length (excluding some characters reserved as operators in resource requirement strings). You can list the resources available in your cluster using the lsinfo command.
Load indices measure the availability of dynamic, non-shared resources on hosts in the LSF cluster. Load indices built into the LIM are updated at fixed time intervals. External load indices are updated when new values are received from the external load collection program, ELIM, configured by the LSF administrator. Load indices are numeric in value.
Table 1 summarizes the load indices collected by the LIM.
Table 1 (only one row recoverable): tmp: available space in the temporary file system (directory C:\temp on NT and /tmp on UNIX).
The status index is a string indicating the current status of the host. This status applies to the LIM and RES. The possible values for status are:
ok
The LIM can select the host for remote execution
busy
A load index exceeds the threshold defined by the LSF administrator; the LIM will not select the host for interactive jobs
lockU
The host is locked by a user or the LSF administrator
lockW
The host's availability time window is closed
unavail
The LIM on the host is not responding
unlicensed
The host does not have a valid LSF license.
If the LIM is available but the RES server is not responding, status begins with a `-'.
Here is an example of the output from lsload:
% lsload
HOST_NAME  status   r15s  r1m  r15m  ut   pg   ls  it   tmp  swp   mem
hostN      ok       0.0   0.0  0.1   1%   0.0  1   224  43M  67M   3M
hostK      -ok      0.0   0.0  0.0   3%   0.0  3   0    38M  40M   7M
hostG      busy     *6.2  6.9  9.5   85%  1.1  30  0    5M   400M  385M
hostF      busy     0.1   0.1  0.3   7%   *17  6   0    9M   23M   28M
hostV      unavail
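The leading `-' convention can be handled mechanically when reading lsload output in a script. The following is a minimal sketch, not part of LSF; the function name parse_status is hypothetical, and the rows are copied from the example above.

```python
def parse_status(field):
    """Split an lsload status field into (status, res_flagged_down).

    A leading '-' means the LIM is available but the RES is not responding.
    """
    if field.startswith("-"):
        return field[1:], True
    return field, False


rows = [
    "hostN ok 0.0 0.0 0.1 1% 0.0 1 224 43M 67M 3M",
    "hostK -ok 0.0 0.0 0.0 3% 0.0 3 0 38M 40M 7M",
    "hostV unavail",
]

for row in rows:
    # The first two whitespace-separated fields are the host name and status.
    host, status_field = row.split()[:2]
    status, res_flagged_down = parse_status(status_field)
    note = " (RES not responding)" if res_flagged_down else ""
    print(host, status + note)
```

For a host whose LIM is not responding (unavail), the flag carries no information about the RES.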
The r15s, r1m and r15m load indices are the 15-second, 1-minute and 15-minute average CPU run queue lengths. This is the average number of processes ready to use the CPU during the given interval.
Run queue length indices are not necessarily the same as the load averages printed by the uptime(1) command; uptime load averages on some platforms also include processes that are in short term wait states (such as paging or disk I/O).
On multiprocessor systems more than one process can execute at a time. LSF scales the run queue value on multiprocessor systems to make the CPU load of uniprocessors and multiprocessors comparable. The scaled value is called the effective run queue length. The -E option of lsload shows the effective run queue length.
LSF also adjusts the CPU run queue based on the relative speeds of the processors (the CPU factor). The normalized run queue length is adjusted for both number of processors and CPU speed. The host with the lowest normalized run queue length will run a CPU intensive job the fastest. The -N option of lsload shows the normalized CPU run queue lengths.
The ut index measures CPU utilization, which is the percentage of time spent running system and user code. A host with no process running has a ut value of 0 percent; a host on which the CPU is completely busy has a ut of 100 percent.
The pg index gives the virtual memory paging rate in pages per second. This index is closely tied to the amount of available RAM and the total size of the processes running on a host; if there is not enough RAM to satisfy all processes, the paging rate will be high. Paging rate is a good measure of how a machine will respond to interactive use; a machine that is paging heavily feels very slow.
The paging rate is reported in units of pages rather than kilobytes, because the relationship between interactive response and paging rate is largely independent of the page size.
The ls index gives the number of users logged in. Each user is counted once, no matter how many times they have logged into the host.
The it index is the interactive idle time of the host, in minutes. Idle time is measured from the last input or output on a directly attached terminal or a network pseudo-terminal supporting a login session.
This does not include activity directly through the X server such as CAD applications or emacs windows, except on SunOS 4, Solaris, and HP-UX systems.
The tmp index is the space available, in megabytes, on the file system that contains the /tmp (UNIX) or C:\temp (NT) directory.
The swp index gives the currently available swap space in megabytes. This represents the largest process that can be started on the host.
The mem index is an estimate of the real memory currently available to user processes. This represents the approximate size of the largest process that could be started on a host without causing the host to start paging. This is an approximation because the virtual memory behaviour of operating systems varies from system to system and is hard to predict.
The io index is only displayed with the -l option to lsload. This index measures I/O throughput to disks attached directly to this host, in kilobytes per second. It does not include I/O to disks that are mounted from other hosts.
External load indices are defined by the LSF administrator. The lsinfo command lists the external load indices and the lsload -l command displays the values of all load indices. If you need more information about the external load indices defined at your site, contact your LSF administrator.
Static resources represent host information that does not change over time such as the maximum RAM available to user processes and the number of processors in a machine. Most static resources are determined by the LIM at start-up time. Table 2 lists the static resources reported by the LIM.
The type and model static resources are strings specifying the host type and model.
The CPU factor is the speed of the host's CPU relative to other hosts in the cluster. If one processor is twice the speed of another, its CPU factor should be twice as large. The CPU factors are defined by the LSF administrator. For multiprocessor hosts the CPU factor is the speed of a single processor; LSF automatically scales the host CPU load to account for additional processors.
The server static resource is Boolean; its value is 1 if the host is configured to execute tasks from other hosts, and 0 if the host is a client.
Static resources can be used to select appropriate hosts for particular jobs based on binary architecture, relative CPU speed, and system configuration.
A shared resource is a resource that is not tied to a specific host, but is associated with the entire cluster, or a specific subset of hosts within the cluster. Examples of shared resources include floating licenses and shared file systems.
An application may use a shared resource by running on any host from which that resource is accessible. For example, in a cluster in which each host has a local disk but can also access a disk on a file server, the disk on the file server is a shared resource, and the local disk is a host-based resource. There will be one value for the entire cluster which measures the utilization of the shared resource, but each host-based resource is measured separately.
LSF does not contain any built-in shared resources. All shared resources must be configured by the LSF Administrator. A shared resource may be configured to be dynamic or static. In the above example, the total space on the shared disk may be static while the amount of space currently free is dynamic. A site may also configure the shared resource to report numeric, string or Boolean values.
To view the shared resources in the cluster, use the -s option of the lshosts, lsload, and bhosts commands. For example, suppose a cluster consists of two hosts, each of which has access to a total of five floating licenses for a particular package. They also access a scratch directory, containing 500MB of disk space, from a file server. The LSF administrator has set the resource definitions as shown in Table 3.
The output of lshosts -s could be:
% lshosts -s
RESOURCE     VALUE  LOCATION
tot_lic      5      host1 host2
tot_scratch  500    host1 host2
The "VALUE" column indicates the amount of that resource. The "LOCATION" column shows the hosts which share this resource. The information displayed by lshosts(1) is static, meaning that the value will not change over time. The lsload -s command displays information about dynamic shared resources:
% lsload -s
RESOURCE       VALUE  LOCATION
avail_lic      2      host1 host2
avail_scratch  100    host1 host2
The above output indicates that 2 licenses are available, and that the shared scratch directory currently has 100MB of space available.
Under LSF Batch, shared resources may be viewed using bhosts -s:
% bhosts -s
RESOURCE       TOTAL  RESERVED  LOCATION
tot_lic        5      0.0       hostA hostB
tot_scratch    500    0.0       hostA hostB
avail_lic      2      3.0       hostA hostB
avail_scratch  100    400.0     hostA hostB
The "TOTAL" column gives the value of the resource. For dynamic resources, the "RESERVED" column displays the amount that has been reserved by running jobs.
Boolean resource names are used to describe features that may be available only on some machines in a cluster, for example hspice, FileServer, SYSV, or aix.
Any characteristics or attributes of certain hosts that can be useful in selecting hosts for remote jobs may be configured as Boolean resources. Specifying a Boolean resource in the resource requirements of a job limits the set of computers that can execute the job. Table 4 lists some examples of Boolean resources.
The lsinfo command lists all the resources configured in the LSF cluster. See `Displaying Available Resources' on page 14 for an example of the lsinfo command. The lsinfo -l option gives more detail about each index:
% lsinfo -l r1m
RESOURCE_NAME: r1m
DESCRIPTION: 1-minute CPU run queue length (alias: cpu)
TYPE     ORDER  INTERVAL  BUILTIN  DYNAMIC
Numeric  Inc    15        Yes      Yes
TYPE indicates whether the resource is numeric, string, or Boolean.
ORDER is Inc if the numeric value of the load index increases as the load it measures increases, such as ut (CPU utilization), or Dec if the numeric value decreases as the load increases. If the resource is not numeric, the ORDER is N/A.
INTERVAL shows the number of seconds between updates of that index.
BUILTIN is Yes if the index is built into the LIM and No if the index is external.
DYNAMIC is Yes if the resource is a load index that changes over time and No if the resource is a static or Boolean resource.
A resource requirement string describes the resources a job needs. LSF uses resource requirements to select hosts for remote execution and job execution.
A resource requirement string is divided into four sections: selection, ordering, resource usage, and job spanning.
The selection section specifies the criteria for selecting hosts from the system. The ordering section indicates how the hosts that meet the selection criteria should be sorted. The resource usage section specifies the expected resource consumption of the task. The job spanning section indicates if a (parallel) batch job should span across multiple hosts.
The syntax of a resource requirement expression is:
select[selectstring] order[orderstring] rusage[usagestring] span[spanstring]
The section names are select, order, rusage, and span. The syntax for each of selectstring, orderstring, usagestring, and spanstring is defined below.
The square brackets are an essential part of the resource requirement expression.
Depending on the command, one or more of these sections may apply. The lshosts command only selects hosts, but does not order them. The lsload command selects and orders hosts, while lsplace uses the information in select, order, and rusage sections to select an appropriate host for a task. The lsloadadj command uses the rusage section to determine how the load information should be adjusted on a host, while bsub uses all four sections. Sections that do not apply for a command are ignored.
If no section name is given, then the entire string is treated as a selection string. The select keyword may be omitted if the selection string is the first string in the resource requirement.
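The sectioning rules above can be illustrated with a small splitter. This is only a sketch under simplifying assumptions: it is not LSF's own parser, the function name split_resreq is hypothetical, and it assumes that section bodies contain no nested square brackets.

```python
import re

# A named section is one of the four keywords followed by a bracketed body.
SECTION = re.compile(r"\b(select|order|rusage|span)\s*\[([^\]]*)\]")


def split_resreq(resreq):
    """Split a resource requirement string into its named sections."""
    sections = {}
    first = SECTION.search(resreq)
    # Text before the first named section (or the whole string when no
    # section name is present) is treated as the selection string.
    leading = resreq[:first.start()] if first else resreq
    if leading.strip():
        sections["select"] = leading.strip()
    for name, body in SECTION.findall(resreq):
        sections[name] = body.strip()
    return sections


print(split_resreq("swp > 50 order[r1m:pg]"))
print(split_resreq("select[swp > 50] rusage[mem=20] span[hosts=1]"))
```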
The selection string specifies the characteristics a host must have to match the resource requirement. It is a logical expression built from a set of resource names. The lsinfo command lists all the resource names and their descriptions. The resource names swap, idle, logins, and cpu are accepted as aliases for swp, it, ls, and r1m respectively.
The selection string can combine resource names with logical and arithmetic operators. Non-zero arithmetic values are treated as logical TRUE, and zero as logical FALSE. Boolean resources (for example, server to denote LSF server hosts) have a value of one if they are defined for a host, and zero otherwise.
Table 5 shows the operators that can be used in selection strings. The operators are listed in order of decreasing precedence.
The selection string is evaluated for each host; if the result is non-zero, then that host is selected. For example:
select[(swp > 50 && type == MIPS) || (swp > 35 && type == ALPHA)]
select[((2*r15s + 3*r1m + r15m) / 6 < 1.0) && !fs && (cpuf > 4.0)]
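To make the per-host evaluation concrete, the first selection string can be mirrored as an ordinary predicate. This is an illustration only: the host names and load values are made up, and the Python expression is a hand translation, not something LSF produces.

```python
# Hypothetical hosts with the two resources the selection string uses.
hosts = [
    {"name": "hostA", "type": "MIPS", "swp": 60},
    {"name": "hostB", "type": "ALPHA", "swp": 40},
    {"name": "hostC", "type": "ALPHA", "swp": 30},
]


def matches(h):
    # (swp > 50 && type == MIPS) || (swp > 35 && type == ALPHA)
    # The result is 1 (TRUE) or 0 (FALSE), as in a selection string.
    return int((h["swp"] > 50 and h["type"] == "MIPS")
               or (h["swp"] > 35 and h["type"] == "ALPHA"))


# A host is selected when the expression evaluates to a non-zero value.
selected = [h["name"] for h in hosts if matches(h) != 0]
print(selected)
```

Here hostC is rejected because its available swap (30) does not exceed the threshold of 35 that applies to ALPHA hosts.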
For the string resources type and model, the special value any selects any value and local selects the same value as that of the local host. For example, type==local selects hosts of the same type as the host submitting the job. If a job can run on any type of host, include type==any in the resource requirements. If no type is specified, the default depends on the command. For lshosts, lsload, lsmon and lslogin the default is type==any. For lsplace, lsrun, lsgrun, and bsub the default is type==local unless a model or Boolean resource is specified, in which case it is type==any.
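The default rule for type can be summarized as a small lookup. This sketch only restates the rule above; the function name default_type and its boolean parameter are hypothetical.

```python
# Commands grouped by the default described in the text.
DEFAULT_ANY = {"lshosts", "lsload", "lsmon", "lslogin"}
DEFAULT_LOCAL = {"lsplace", "lsrun", "lsgrun", "bsub"}


def default_type(command, has_model_or_boolean=False):
    """Return the implied type clause when the user specifies no type."""
    if command in DEFAULT_ANY:
        return "type==any"
    if command in DEFAULT_LOCAL:
        # A model or Boolean resource in the requirement lifts the
        # restriction to the local host type.
        return "type==any" if has_model_or_boolean else "type==local"
    raise ValueError("command not covered by this rule: " + command)


print(default_type("lsload"))
print(default_type("bsub"))
print(default_type("bsub", has_model_or_boolean=True))
```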
The order string allows the selected hosts to be sorted according to the values of resources. The syntax of the order string is
[-]res[:[-]res]...
Each res must be a dynamic load index; that is, one of the indices r15s, r1m, r15m, ut, pg, io, ls, it, tmp, swp, mem, or an external load index defined by the LSF administrator. For example, swp:r1m:tmp:r15s is a valid order string.
The values of r15s, r1m, and r15m used for sorting are the normalized load indices returned by lsload -N (see `Load Indices' on page 37).
The order string is used for host sorting and selection. The ordering begins with the rightmost index in the order string and proceeds from right to left. The hosts are sorted into order based on each load index, and if more hosts are available than were requested, the LIM drops the least desirable hosts according to that index. The remaining hosts are then sorted by the next index.
After the hosts are sorted by the leftmost index in the order string, the final phase of sorting orders the hosts according to their status, with hosts that are currently not available for load sharing (that is, not in the ok state) listed at the end.
Because the hosts are re-sorted for each load index, only the host status and the leftmost index in the order string actually affect the order in which hosts are listed. The other indices are only used to drop undesirable hosts from the list.
When sorting is done on each index, the direction in which the hosts are sorted (increasing vs decreasing values) is determined by the default order returned by lsinfo for that index. This direction is chosen such that after sorting, the hosts are ordered from best to worst on that index.
When an index name is preceded by a minus sign `-', the sorting order is reversed so that hosts are ordered from worst to best on that index.
The default sorting order is r1m:pg (except for lslogin(1): ls:r1m).
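The right-to-left procedure can be sketched in a few lines. This is an illustration, not the LIM's implementation. Two points are assumptions because the text does not specify them: how many hosts are dropped on each pass (here a hypothetical drop_per_pass parameter, removing hosts from the end of the sorted list) and the Inc/Dec direction of each index, which on a real cluster comes from lsinfo. Host names and load values are made up.

```python
# Default sort direction per index, as lsinfo would report it. These three
# entries are assumptions for this example; check lsinfo on a real cluster.
ORDER = {"r1m": "Inc", "pg": "Inc", "swp": "Dec"}


def sort_hosts(hosts, order_string, requested, drop_per_pass=1):
    """Return host names ordered by an order string (illustrative only)."""
    remaining = list(hosts)
    # Work through the indices from right to left.
    for term in reversed(order_string.split(":")):
        reversed_by_user = term.startswith("-")
        index = term.lstrip("-")
        # "Inc" means a larger value is a heavier load, so best-first is
        # ascending; "Dec" is the opposite. A leading '-' flips the result.
        descending = (ORDER[index] == "Dec") != reversed_by_user
        remaining.sort(key=lambda h: h[index], reverse=descending)
        # Assumption: drop up to drop_per_pass hosts from the end of the
        # list while more hosts remain than were requested.
        excess = len(remaining) - requested
        if excess > 0:
            del remaining[len(remaining) - min(excess, drop_per_pass):]
    # Final phase: hosts that are not in the ok state go to the end.
    # list.sort is stable, so the leftmost index still orders each group.
    remaining.sort(key=lambda h: h["status"] != "ok")
    return [h["name"] for h in remaining]


hosts = [
    {"name": "hostA", "status": "ok", "r1m": 0.2, "pg": 5.0},
    {"name": "hostB", "status": "ok", "r1m": 0.1, "pg": 9.0},
    {"name": "hostC", "status": "busy", "r1m": 0.0, "pg": 1.0},
    {"name": "hostD", "status": "ok", "r1m": 1.5, "pg": 0.5},
]

print(sort_hosts(hosts, "r1m:pg", requested=2))
```

In this run the pg pass drops hostB, the r1m pass drops hostD, and the status phase moves the busy hostC behind hostA, which shows why only the leftmost index and the status decide the final order.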
This string defines the expected resource usage of the task. It is used to specify resource reservations for LSF Batch jobs, or for mapping tasks onto hosts and adjusting the load when running interactive jobs.
For LSF Batch jobs, the resource usage section is used along with the queue configuration parameter RES_REQ (see `Scheduling Conditions' on page 65). External indices are also considered in the resource usage string.
The syntax of the resource usage string is
res=value[:res=value]...[:res=value][:duration=value][:decay=value]
The res parameter can be any load index. The value parameter is the initial reserved amount. If res or value is not given, the default is not to reserve that resource.
The duration parameter is the time period within which the specified resources should be reserved. It is specified in minutes by default. If the value is followed by the letter 'h', it is specified in hours. For example, 'duration=30' and 'duration=2h' specify a duration of 30 minutes and two hours respectively. If duration is not specified, the default is to reserve the total amount for the lifetime of the job.
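The unit rule for duration is easy to get wrong in scripts that generate resource requirement strings, so here is a minimal sketch of the conversion. The function name duration_minutes is hypothetical, and the sketch assumes whole-number values.

```python
def duration_minutes(value):
    """Convert a duration value such as '30' or '2h' to minutes."""
    value = value.strip()
    if value.endswith("h"):
        # A trailing 'h' means the number is given in hours.
        return int(value[:-1]) * 60
    # Otherwise the value is already in minutes.
    return int(value)


print(duration_minutes("30"))
print(duration_minutes("2h"))
```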
The decay parameter indicates how the reserved amount should decrease over the duration. A value of 1, 'decay=1', indicates that the system should linearly decrease the amount reserved over the duration. The default decay value is 0, which causes the total amount to be reserved for the entire duration. Values other than 0 or 1 are unsupported. If duration is not specified, decay is ignored.
rusage[mem=50:duration=100:decay=1]
The above example indicates that 50MB of memory should be reserved for the job. As the job runs, the amount reserved will decrease at approximately 0.5 megabytes per minute until the 100 minutes is up.
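The arithmetic behind this example can be written out as a function of elapsed time. This is a sketch of the idealized linear model only (LSF's actual bookkeeping is described as approximate); the function name reserved_amount is hypothetical.

```python
def reserved_amount(initial, duration, decay, elapsed):
    """Amount still reserved after `elapsed` minutes (idealized model).

    initial  -- amount reserved at the start (e.g. megabytes)
    duration -- reservation period in minutes, or None if not specified
    decay    -- 0 (constant) or 1 (linear decrease over the duration)
    """
    if duration is None:
        # No duration: the full amount is reserved for the job's lifetime
        # and decay is ignored.
        return float(initial)
    if elapsed >= duration:
        # The reservation period is over.
        return 0.0
    if decay == 1:
        # Linear decrease from `initial` to zero over `duration` minutes.
        return initial * (duration - elapsed) / duration
    return float(initial)


# rusage[mem=50:duration=100:decay=1]
for minute in (0, 20, 50, 100):
    print(minute, reserved_amount(50, 100, 1, minute))
```

With these parameters the reservation falls by 50/100 = 0.5 megabytes per minute, matching the rate quoted above.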
Resource reservation is only available for LSF Batch. If you run jobs using LSF Base, such as through lsrun, LIM uses resource usage to determine the placement of jobs. LIM's placement is limited in comparison to LSF Batch in that the LIM does not track when an application finishes. Resource usage requests are used to temporarily increase the load so that a host is not overloaded. When LIM gives placement advice, external load indices are not considered in the resource usage string. In this case, the syntax of the resource usage string is
res[=value]:res[=value]: ... :res[=value]
The res is one of the resources whose value is returned by the lsload command.
rusage[r1m=0.5:mem=20:swp=40]
The above example indicates that the task is expected to increase the 1-minute run queue length by 0.5, consume 20 Mbytes of memory and 40 Mbytes of swap space.
If no value is specified, the task is assumed to be intensive in using that resource. In this case no more than one task will be assigned to a host regardless of how many CPUs it has.
The default resource usage for a task is r15s=1.0:r1m=1.0:r15m=1.0. This indicates a CPU intensive task which consumes few other resources.
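The LSF Base form of the usage string, where a value is optional, can be read into a simple mapping. This is a sketch rather than LSF's parser; the function name parse_usage is hypothetical, and None is used here to mark a resource given without a value (the "intensive" case).

```python
def parse_usage(usagestring):
    """Parse 'res[=value]:res[=value]...' into a {res: value} mapping."""
    usage = {}
    for term in usagestring.split(":"):
        name, sep, value = term.strip().partition("=")
        # No '=value' part: the task is assumed to be intensive in that
        # resource, recorded here as None.
        usage[name] = float(value) if sep else None
    return usage


print(parse_usage("r1m=0.5:mem=20:swp=40"))
print(parse_usage("mem"))
# The default usage for a task, as given above.
print(parse_usage("r15s=1.0:r1m=1.0:r15m=1.0"))
```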
This string specifies the locality of a parallel job (see `Specifying Locality' on page 104). Currently only the following two cases are supported:
span[hosts=1]
This indicates that all the processors allocated to this job must be on the same host.
span[ptile=n]
This indicates that only n processors on each host should be allocated to the job regardless of how many processors the host possesses.
If span is omitted, LSF Batch will allocate the required processors for the job from the available set of processors.
A shared resource may be used in the resource requirement string of any LSF command. For example, when submitting an LSF Batch job which requires a certain amount of shared scratch space, you might submit the job as follows:
% bsub -R "avail_scratch > 200 && swap > 50" myjob
The above assumes that all hosts in the cluster have access to the shared scratch space. The job will only be scheduled if the value of the "avail_scratch" resource is more than 200MB and will go to a host with at least 50MB of available swap space.
It is possible for a system to be configured so that only some hosts within the LSF cluster have access to the scratch space. In order to exclude hosts which cannot access a shared resource, the "defined(resource_name)" function must be specified in the resource requirement string. For example:
% bsub -R "defined(avail_scratch) && avail_scratch > 100 && swap > 100" myjob
would exclude any hosts which cannot access the scratch resource. The LSF administrator configures which hosts do and do not have access to a particular shared resource.
Shared resources can also work together with the resource reservation mechanism of LSF Batch to prevent over-committing resources when scheduling. To indicate that a shared resource is to be reserved while a job is running, specify the resource name in the 'rusage' section of the resource requirement string. For example:
% bsub -R "select[defined(verilog_lic)] rusage[verilog_lic=1]" myjob
would schedule the job on a host when there is a verilog license available. The license will be reserved by the job after it is scheduled, until it completes.
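When such submissions are generated by a script, the select and rusage sections can be assembled together so that the defined() guard is never forgotten. The helper below is a hypothetical sketch; it only builds the command line as a string and uses nothing beyond the bsub -R form shown above.

```python
import shlex


def reserve_shared(resource, amount, command):
    """Build a bsub command line that guards and reserves a shared resource."""
    # defined() excludes hosts that cannot access the resource; rusage
    # reserves the given amount while the job runs.
    resreq = "select[defined({0})] rusage[{0}={1}]".format(resource, amount)
    # Quote the requirement string so the shell passes it as one argument.
    return "bsub -R " + shlex.quote(resreq) + " " + shlex.quote(command)


print(reserve_shared("verilog_lic", 1, "myjob"))
```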
Some applications require resources other than the default. LSF can store resource requirements for specific applications so that the application automatically runs with the correct resources. For frequently used commands and software packages, the LSF administrator can set up cluster-wide resource requirements available to all users in the cluster. See the LSF Batch Administrator's Guide for more information.
You may have applications that you need to control yourself. Perhaps your administrator did not set them up for load sharing for all users, or you need a non-standard setup. You can use LSF commands to find out resource names available in your system, and tell LSF about the needs of your applications. LSF stores the resource requirements for you from then on.
A task is a UNIX or NT command or a user-created executable program; the terms application or job are also used to refer to tasks.
The resource requirements of applications are stored in the remote task list file. When you run a job through LSF, LSF automatically picks up the job's default resource requirement string from the remote task list files, unless you explicitly override the default by specifying the resource requirement string on the command line.
There are three sets of task list files: the system-wide default file lsf.task, the cluster default file lsf.task.cluster, and the user file $HOME/.lsftask. The system and cluster default files apply to all users. The user file specifies the tasks to be added to or removed from the system lists for your jobs. Resource requirements specified in your user file override those in the system lists.
The lsrtasks command inspects and modifies the remote task list. Invoking lsrtasks with no arguments displays the resource requirements of tasks in the remote list, separated from the task name by `/'.
% lsrtasks
cc/cpu
cfd3d/type == SG1 && cpu
compressdir/cpu:mem
f77/cpu
verilog/cpu && cadence
compress/cpu
dsim/type == any
hspice/cpu && cadence
nas/swp > 200 && cpu
compress/-:cpu:mem
epi/hpux11 sparc
regression/cpu
cc/type == local
synopsys/swp >150 && cpu
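Each entry in this listing has the form task/resreq. A script that post-processes the listing can split an entry at the first `/', which keeps any later `/' (for example a division in a selection string) inside the requirement. The function name parse_entry is hypothetical, and the sketch assumes task names themselves contain no `/'.

```python
def parse_entry(entry):
    """Split one 'task/resreq' entry into (task, resreq)."""
    # Split at the first '/' only; anything after it is the requirement.
    task, sep, resreq = entry.partition("/")
    return task.strip(), (resreq.strip() if sep else None)


for entry in ("cc/cpu", "nas/swp > 200 && cpu", "compress/-:cpu:mem"):
    print(parse_entry(entry))
```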
You can specify resource requirements when tasks are added to the user's remote task list. If the task to be added is already in the list, its resource requirements are replaced.
% lsrtasks + myjob/swap>=100 && cpu
This adds myjob to the remote task list with its resource requirements.
Most LSF commands accept a -R resreq argument to specify resource requirements. The exact behaviour depends on the command; for example, specifying a resource requirement for the lsload command displays the load levels for all hosts that have the requested resources.
Specifying resource requirements for the lsrun command causes LSF to select the best host out of the set of hosts that have the requested resources. The -R resreq option overrides any resource requirements specified in the remote task list. For an example of the lsrun command with the -R resreq option see `Running Remote Jobs with lsrun' on page 140.