LSF Batch is a distributed batch system for clusters of UNIX and Windows NT computers. With LSF Batch, you can use a heterogeneous network of computers as a single system. All batch jobs go through a consistent interface, independent of the resources they need or the hosts they run on.
LSF Batch has the same view of cluster and master host as the LSF Base, although LSF Batch may only use some of the hosts in the cluster as servers. The slave batch daemon, sbatchd, runs on every host that the LSF administrator configures as an LSF Batch server. The master batch daemon, mbatchd, always runs on the same host as the master LIM. See `Finding the Master' on page 25 for more information on the master LIM.
This chapter provides important background information on how LSF Batch works and describes the commands that give information about your LSF Batch system. Topics include:
Each LSF Batch job goes through a series of state transitions until it eventually completes its task, crashes or is terminated. Figure 9 shows the possible states of a job during its life cycle.
Many jobs enter only three states:
PEND - waiting in a queue for scheduling and dispatch
RUN - dispatched to a host and running
DONE - finished normally
A job remains pending until all conditions for its execution are met. The conditions may include:
A job may terminate abnormally for various reasons. Job termination may happen from any state. An abnormally terminated job goes into EXIT state. The situations where a job terminates abnormally include:
Jobs may also be suspended at any time. A job can be suspended by its owner, by the LSF administrator or by the LSF Batch system. There are three different states for suspended jobs:
PSUSP - suspended by its owner or the LSF administrator while in PEND state
USUSP - suspended by its owner or the LSF administrator after being dispatched
SSUSP - suspended by the LSF Batch system after being dispatched
After a job has been dispatched and started on a host, it is suspended by the LSF Batch system if the load on the execution host or hosts becomes too high. In such a case, batch jobs could be interfering with each other or with interactive jobs. In either case, some jobs should be suspended to maximize host performance or to guarantee interactive response time. LSF Batch suspends jobs according to their priority.
When a host is busy, LSF Batch suspends lower priority jobs first unless the scheduling policy associated with the job dictates otherwise. A job may also be suspended by the system if the job queue has a time window and the current time goes outside the time window.
A system-suspended job can later be resumed by LSF Batch when the load on the execution host drops to an acceptable level or when the queue's closed time window opens again.
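To see which state each of your jobs is in, run the bjobs command and look at the STAT column. The output below is only an illustration; the job IDs, hosts, and times are invented, and the exact columns may vary with your version of LSF Batch.
% bjobs
JOBID USER   STAT  QUEUE  FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1501  user1  RUN   normal hostA     hostD     sim1     Oct 27 09:04
1502  user1  PEND  normal hostA               sim2     Oct 27 09:05
1499  user1  SSUSP night  hostA     hostB     verify   Oct 27 08:40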
The default First-Come-First-Served (FCFS) job scheduling is often insufficient for an environment with competing users. Fairshare scheduling is an alternative to the default FCFS scheduling. Fairshare scheduling divides the processing power of the LSF cluster among users and groups to provide fair access to resources. Fairshare is not necessarily equal share. Your cluster administrator can configure shares for users or groups to achieve controlled access to resources.
Your LSF cluster administrator defines fairshare policies by assigning shares to users or groups. The special names others and default can also be assigned shares.
The name others is a virtual group referring to all other users not explicitly listed in the share parameter. For example, the product group may be assigned 100 shares, while all other users together are assigned 10 shares.
The name default is a virtual user referring to each of the other users not explicitly configured in the share parameter. For example, if the product group is assigned 100 shares and the default user is assigned 10 shares, then every user not belonging to the product group has 10 shares, as if their user names were explicitly listed in the share parameter. As a special case, if default is the only user name in the share parameter, it implements the equal share policy.
LSF Batch uses an account to maintain information about shares and resource consumption of every user or group. A dynamic priority is calculated for each user or group according to configured shares, CPU time consumed (CPU time used for fairshare is not normalized by the host CPU speed factors) for the past HIST_HOURS hours (see `Configuration Parameters' on page 85), number of jobs currently running, and the total elapsed time of jobs. This dynamic priority is then used to decide which user's or group's jobs should be dispatched first. If some users or groups have used less than their fairshare of the resources, their pending jobs (if any) are scheduled next, jumping ahead of jobs of other users.
The CPU time used for host partition scheduling is not normalized by the host CPU speed factors.
LSF Batch provides three different varieties of fairshare configuration. These are queue level fairshare, host partition fairshare, and hierarchical fairshare.
Host partition fairshare scheduling allows sharing policy to be defined for a group of hosts, rather than in a queue. A host partition specifies a group of hosts together with share allocations among the users or groups. The special host name all can be used to refer to all hosts used by LSF Batch.
Note that only users or groups who are configured to use the host partition can run jobs on these hosts.
Fairshare defined by host partition applies to all queues that run jobs on these hosts.
To find out what host partitions are configured in your cluster, run the bhpart command.
Host partition fairshare is an alternative to queue level fairshare scheduling. You cannot use both in the same LSF cluster.
Fairshare policy can be defined at the queue level to allow different policies to be applied for different queues. Queue-level fairshare handles resource contention among user jobs within the same queue.
To find out if a queue has fairshare defined, run the bqueues -l command. Your queue has fairshare defined if you see the parameter "USER_SHARES" in the output of the above command.
Queue level fairshare scheduling is an alternative to host partition fairshare scheduling. You cannot use both in the same LSF cluster.
When assigning shares in a fairshare queue or host partition to a user group, either each member of the group can be given the same number of shares, or the shares can be assigned to the group collectively. When the shares are assigned collectively, the share each member receives depends on the size of the group and the number of jobs submitted by its members.
With large user groups, it is desirable that the shares assigned to a group are subdivided among subgroups. The shares may be further partitioned within subgroups to create a hierarchical share assignment. Figure 10 gives an example of how an engineering department might want to configure sharing among several groups.
Figure 10. Sample Fairsharing Configuration
The situation pictured in Figure 10 is a share tree in which users are organized into hierarchical groups. Shares are assigned to users or groups at each level in the hierarchy. In the above example, the Development group will get the largest share (50%) of the resources in the event of contention. Shares assigned to the Development group can be further divided among the Systems, Application and Test groups which receive 50%, 35%, and 15%, respectively. At the lowest level, individual users may be allocated shares of the immediate group they belong to.
Each node in the share tree represents either a group account or a user account. A user account corresponds to an individual user who runs jobs, while a group account allows shares to be assigned collectively to a group and subdivided among its members. Note that user accounts are leaf nodes in the share tree while group accounts are always non-leaf nodes. The resource consumption of a group account is the total of the consumption of all user accounts defined recursively under that group. By assigning shares to groups, the administrator can control the rate of allocation of resources to all members of the group.
LSF Batch implements hierarchical fairshare in two steps. First, define hierarchical share distribution by defining hierarchical user groups as discussed above. Second, use the hierarchical share distribution in queue-level fairshare or host partition fairshare definitions. When a fairshare policy uses a group name that represents a hierarchical share distribution, it allocates resources according to the share distribution as if the hierarchy were defined inside the policy.
Hierarchical share distribution information can be displayed by the bugroup command with the -l option. See `Viewing Hierarchical Share Information' on page 82 for more information.
Each host partition or queue is considered a share provider and may specify its own fairshare hierarchy for controlling the allocation of resources to its users. For example, a user or group may have a large share in one queue but a small share in another. When a share provider selects a job to run, it searches the share tree from the top and picks the node with the highest priority at each level until a leaf node corresponding to a user is encountered. A job from that user is selected and dispatched if a suitable host is found, and that user's priority, together with that of its parent groups, is updated. As a user or group account dispatches jobs, its priority decreases, giving other users or groups a chance to access the resources.
As a user, you can belong to one or more of the groups in the share tree. There is a separate user account for each group you belong to. The priority of your jobs is affected by the group's share assignment. When there is contention for resources among the groups, the system favours those groups with a larger share. Users belonging to multiple groups can specify the share account that will be used to determine the priority of each job. Note that a given user may not have an account under a particular group. In the previous example, 'User3' does not have an account under the 'ChipX' group.
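For example, if your version of the bsub command supports the -G option, you can associate a job with a specific group account in the share tree. The group name below is taken from the sample hierarchy in Figure 10 and is only illustrative; check with your administrator for the groups configured in your cluster.
% bsub -G Systems myjob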
This section discusses other LSF Batch scheduling policies. All these policies can co-exist in the same system.
When LSF Batch schedules jobs, those in higher priority queues are considered first. Jobs in lower priority queues are only started if all higher priority jobs are waiting for specified resources, hosts, starting times, or other constraints.
When a high priority job is ready to run, all the LSF Batch server hosts may already be running lower priority jobs. The high priority job ends up waiting for the low priority jobs to finish. If the low priority jobs take a long time to complete, the higher priority jobs may be blocked for an unacceptably long time.
LSF solves this problem by allowing preemptive scheduling within LSF Batch queues. Jobs pending in a preemptive queue can preempt lower priority jobs on a host by suspending them and starting the higher priority jobs on the host.
A queue can also be defined as preemptable. In this case, jobs in higher priority queues can preempt jobs in the preemptable queue even if the higher priority queues are not specified as preemptive.
When the preemptive scheduling policy is used, jobs in preemptive queues may violate the user or host job slot limits. However, LSF Batch ensures that the total number of slots used by running jobs (excluding jobs that are suspended) does not exceed the job slot limits. This is done by suspending lower priority jobs.
Some queues accept exclusive jobs. A job can run exclusively only if it is submitted with the -x option to the bsub command, specifying a queue that is configured to accept exclusive jobs. An exclusive job runs by itself on a host -- it is dispatched only to a host with no other batch jobs running and LSF does not send any other jobs to the host until the exclusive job completes.
Once an exclusive job is started on a host, the LSF Batch system locks that host out of load sharing by sending a request to the underlying LSF to change the host's status to lockU. The host is no longer available for load sharing by any other task (either interactive or batch) until the exclusive job finishes.
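For example, assuming your administrator has configured a queue named exclusive to accept exclusive jobs (the queue name here is only an assumption), you could submit an exclusive job as follows:
% bsub -x -q exclusive myjob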
The scheduling of parallel jobs supports the notion of processor reservation. Parallel jobs requiring a large number of processors often cannot be started if there are many lower priority sequential jobs in the system: there may not be enough resources at any one instant to satisfy a large parallel job, even though there are enough to allow a sequential job to be started. The processor reservation feature reduces this starvation of parallel jobs.
When a parallel job cannot be dispatched because there aren't enough execution slots to satisfy its minimum processor requirements, the currently available slots will be reserved for the job. These reserved job slots are accumulated until there are enough available to start the job. When a slot is reserved for a job it is unavailable to any other job.
To use this feature, a queue must have the processor reservation policy enabled through the SLOT_RESERVE parameter (see `Processor Reservation for Parallel Jobs' on page 211 of the LSF Batch Administrator's Guide). To avoid deadlock situations, the period of reservation is specified through the MAX_RESERVE_TIME parameter. The system accumulates reserved slots for a job for up to MAX_RESERVE_TIME minutes; if enough slots have not been accumulated by then, all the reserved slots are freed and made available to other jobs. The MAX_RESERVE_TIME parameter takes effect from the start of the first reservation for a job, and a job can go through multiple reservation cycles before it accumulates enough slots to actually be started.
Reserved slots can be displayed with the bjobs command. The number of reserved slots can be displayed with the bqueues, bhosts, bhpart, and busers commands. Look in the RSV column.
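As a sketch, assuming a queue named parallel has SLOT_RESERVE enabled (the queue name and job ID here are hypothetical), a large parallel job is submitted in the usual way and its reserved slots can then be watched while it is pending:
% bsub -q parallel -n 32 my_parallel_job
% bjobs -l 3425
% bqueues parallel
While the job is pending, the RSV column of bqueues and bhosts shows the slots being accumulated for it.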
Processor reservation ensures that large parallel jobs will not suffer processor starvation. However, in a heavily loaded LSF Batch system with jobs requiring a varying number of processors, a large number of parallel jobs submitted earlier may keep reserving processors. In such cases, FCFS discipline imposes long average wait times on each job, and thereby degrades the system's utilization as many available processor slots are reserved but not used. Backfill policy allows jobs requiring fewer processors and running for shorter periods of time to use the processors reserved by the larger parallel jobs, if these smaller jobs will not delay the start of any of the large parallel jobs.
Since backfilled jobs cannot delay the start of the jobs that reserved the job slots, backfilled jobs are never preempted.
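For example, assuming a queue named short has the BACKFILL policy enabled (the queue name is an assumption), a small job that declares a run limit with the -W option of bsub can be considered for backfilling, since LSF Batch can then estimate that it will finish before the reserving parallel job is due to start:
% bsub -q short -n 1 -W 10 my_short_job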
Scheduling parameters specify the load conditions under which pending jobs are dispatched, running jobs are suspended, and suspended jobs are resumed. These parameters are configured by the LSF administrator in a variety of ways.
Load thresholds can be configured by your LSF administrator to schedule jobs in queues. There are two possible types of load thresholds: loadSched and loadStop. Each load threshold specifies a load index value. A loadSched threshold is the scheduling threshold which determines the load condition for dispatching pending jobs. If a host's load is beyond any defined loadSched, a job will not be started on the host. This threshold is also used as the condition for resuming suspended jobs. A loadStop threshold is the suspending condition that determines when running jobs should be suspended.
Thresholds can be configured for each queue, for each host, or a combination of both. To schedule a job on a host, the load levels on that host must satisfy both the thresholds configured for that host and the thresholds for the queue from which the job is being dispatched.
The value of a load index may either increase or decrease with load, depending on the meaning of the specific load index. Therefore, when comparing the host load conditions with the threshold values, you need to use either greater than (>) or less than (<), depending on the load index.
When jobs are running on a host, LSF Batch periodically checks the load levels on that host. If any load index exceeds the corresponding per-host or per-queue suspending threshold for a job, LSF Batch suspends the job. The job remains suspended until the load levels satisfy the scheduling thresholds.
To find out what parameters are configured for your cluster, see `Detailed Queue Information' on page 69 and `Batch Hosts' on page 79.
Scheduling conditions are a more general way of specifying job dispatching conditions at the queue level. Three parameters, RES_REQ, STOP_COND and RESUME_COND, can be specified in the definition of a queue. These parameters take resource requirement strings as values (see `Resource Requirement Strings' on page 46 for more details), which allows a more flexible specification of conditions than load thresholds.
The resource requirement conditions for dispatching a job to a host can be specified through the queue-level RES_REQ parameter (see `Queue-Level Resource Requirement' on page 213 of the LSF Batch Administrator's Guide for further details). This parameter provides an alternative to the loadSched threshold as described in `Load Thresholds' on page 64.
You can also specify the resource requirements for your job using the -R option to the bsub command. If you specify resource requirements that are already defined in the queue, the host must satisfy both requirements to be eligible for running the job. In some cases, the queue specification sets an upper or lower bound on a resource. If you attempt to exceed that bound, your job will be rejected.
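For example, to ask for a host with more than 50 megabytes of available swap space and at least 20 megabytes of free memory, you could submit the job as shown below. The swp and mem load indices are standard, but the threshold values here are arbitrary, and the queue must not impose stricter bounds of its own.
% bsub -R "swp>50 && mem>20" myjob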
The condition for suspending a job can be specified using the queue-level STOP_COND parameter. It is defined by a resource requirement string (see `Suspending Condition' on page 215 of the LSF Batch Administrator's Guide). The stopping condition can only be specified in the queue. This parameter provides a similar but more flexible function than the loadStop threshold described in `Load Thresholds' on page 64.
The resource requirement conditions that must be satisfied on a host before a suspended job can be resumed are specified using the queue-level RESUME_COND parameter (for more detail see `Resume Condition' on page 215 of the LSF Batch Administrator's Guide). The resume condition can only be specified in the queue.
To find out details about the parameters of your cluster, see `Detailed Queue Information' on page 69 and `Batch Hosts' on page 79.
Separate time windows can be defined to control when jobs can be dispatched and when they are to be suspended.
Run windows are time windows during which jobs are allowed to run. When the windows are closed, running jobs are suspended and no new jobs are dispatched. The default is no restriction, or always open. Run windows can only be defined for queues (see `Detailed Queue Information' on page 69).
These windows are only applicable to batch jobs. Interactive jobs scheduled by the Load Information Manager (LIM) of LSF are controlled by another set of run windows (see `Listing Hosts' on page 27).
Dispatch windows are time windows during which jobs are allowed to be started. However, dispatch windows have no effect on jobs that have already started. This means that jobs are allowed to run outside the dispatch windows, but no new jobs will be started. The default is no restriction, or always open. Note that no jobs are allowed to start when the run windows are closed. Dispatch windows can be defined for both queues (see `Detailed Queue Information' on page 69) and batch server hosts (see `Batch Hosts' on page 79).
Batch queues represent different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Batch queues do not correspond to individual hosts; each job queue can use all server hosts in the cluster, or a configured subset of the server hosts.
The LSF administrator can configure job queues to control resource accesses by different users and types of application. Users select the job queue that best fits each job.
The bqueues command lists the available LSF Batch queues.
% bqueues
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
interactive 400 Open:Active - - - - 2 0 2 0
priority 43 Open:Active - - - - 16 4 11 1
night 40 Open:Inactive - - - - 4 4 0 0
short 35 Open:Active - - - - 6 1 5 0
license 33 Open:Active - - - - 0 0 0 0
normal 30 Open:Active - - - - 0 0 0 0
idle 20 Open:Active - - - - 6 3 1 2
The PRIO column gives the priority of the queue. The bigger the value, the higher the priority. Queue priorities are used by LSF Batch for job scheduling and control. Jobs from higher priority queues are dispatched first. Jobs from lower priority queues are suspended first when hosts are overloaded.
The STATUS column shows the queue status. A queue accepts new jobs only if it is open and dispatches jobs only if it is active. A queue can be opened or closed only by the LSF administrator. Jobs submitted to a queue that is later closed are still dispatched as long as the queue is active. A queue can be made active or inactive either by the LSF administrator or by the run and dispatch windows of the queue.
The MAX column shows the limit on the number of jobs dispatched from this queue at one time. This limit prevents jobs from a single queue from using too many hosts in a cluster at one time.
The JL/U column shows the limit on the number of jobs dispatched at one time from this queue for each user. This prevents a single user from occupying too many hosts in a cluster while other users' jobs are waiting in the queue.
The JL/P column shows the limit on the number of jobs from this queue dispatched to each processor. This prevents a single queue from occupying too many of the resources on a host.
The JL/H column shows the maximum number of job slots a host can allocate for this queue. This limit controls the number of job slots for the queue on each host, regardless of the type of host: uniprocessor or multiprocessor.
The NJOBS column shows the total number of job slots required by all jobs in the queue, including jobs that have not been dispatched and jobs that have been dispatched but have not finished. A parallel job with N components would require N job slots.
The PEND column shows the number of job slots needed by pending jobs in this queue.
The RUN column shows the number of job slots used by running jobs in this queue.
The SUSP column shows the number of job slots required by suspended jobs in this queue.
The -l option to the bqueues command displays the complete status and configuration for each queue. You can specify queue names on the command line to select specific queues:
% bqueues -l normal
QUEUE: normal
-- For normal low priority jobs, running only if hosts are lightly loaded. This is the default queue.
PARAMETERS/STATISTICS
PRIO NICE STATUS MAX JL/U JL/P NJOBS PEND RUN SSUSP USUSP
40 20 Open:Active 100 50 11 1 1 0 0 0
Migration threshold is 30 min.
CPULIMIT RUNLIMIT
20 min of IBM350 342800 min of IBM350
FILELIMIT DATALIMIT STACKLIMIT CORELIMIT MEMLIMIT PROCLIMIT
20000 K 20000 K 2048 K 20000 K 5000 K 3
SCHEDULING PARAMETERS
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - 0.7 1.0 0.2 4.0 50 - - - - -
loadStop - 1.5 2.5 - 8.0 240 - - - - -
SCHEDULING POLICIES: FAIRSHARE PREEMPTIVE PREEMPTABLE EXCLUSIVE
USER_SHARES: [groupA, 70] [groupB, 15] [default, 1]
DEFAULT HOST SPECIFICATION : IBM350
RUN_WINDOWS: 2:40-23:00 23:30-1:30
DISPATCH_WINDOWS: 1:00-23:50
USERS: groupA/ groupB/ user5
HOSTS: hostA, hostD, hostB
ADMINISTRATORS: user7
PRE_EXEC: /tmp/apex_pre.x > /tmp/preexec.log 2>&1
POST_EXEC: /tmp/apex_post.x > /tmp/postexec.log 2>&1
REQUEUE_EXIT_VALUES: 45
The bqueues -l command only displays fields that apply to the queue. Any field that is not displayed has a default value that does not affect job scheduling or execution. In addition to the fields displayed by the default bqueues command, the fields that may be displayed are:
DESCRIPTION
A description of the typical use of the queue.
Default queue indication
Indicates that this is the default queue.
SSUSP
The number of job slots required by jobs suspended by the system because of load levels or run windows.
USUSP
The number of job slots required by jobs suspended by the user or the LSF administrator.
RSV
The number of job slots in the queue that are reserved by LSF Batch for pending jobs.
Migration threshold
The time that a job dispatched from this queue will remain suspended by the system before LSF Batch attempts to migrate the job to another host.
CPULIMIT
The maximum CPU time a job can use, in minutes, relative to the CPU factor of the named host. CPULIMIT is scaled by the CPU factor of the execution host so that jobs are allowed more time on slower hosts.
When the job-level CPULIMIT is reached, the system sends SIGXCPU to all processes belonging to the job.
RUNLIMIT
The maximum wall clock time a process can use, in minutes. RUNLIMIT is scaled by the CPU factor of the execution host. When a job has been in the RUN state for a total of RUNLIMIT minutes, LSF Batch sends a SIGUSR2 signal to the job. If the job does not exit within 10 minutes, LSF Batch sends a SIGKILL signal to kill the job.
FILELIMIT
The maximum file size a process can create, in kilobytes. This limit is enforced by the UNIX setrlimit system call if it supports the RLIMIT_FSIZE option, or the ulimit system call if it supports the UL_SETFSIZE option.
DATALIMIT
The maximum size of the data segment of a process, in kilobytes. This restricts the amount of memory a process can allocate. DATALIMIT is enforced by the setrlimit system call if it supports the RLIMIT_DATA option, and is unsupported otherwise.
STACKLIMIT
The maximum size of the stack segment of a process, in kilobytes. This restricts the amount of memory a process can use for local variables or recursive function calls. STACKLIMIT is enforced by the setrlimit system call if it supports the RLIMIT_STACK option.
CORELIMIT
The maximum size of a core file, in kilobytes. This limit is enforced by the setrlimit system call if it supports the RLIMIT_CORE option.
MEMLIMIT
The maximum resident set size (RSS) of a process, in kilobytes. If a process uses more than MEMLIMIT kilobytes of memory, its priority is reduced so that other processes are more likely to be paged in to available memory. This limit is enforced by the setrlimit system call if it supports the RLIMIT_RSS option.
PROCLIMIT
The maximum number of processors allocated to a job. Jobs requesting more processors than the queue's PROCLIMIT are rejected.
PROCESSLIMIT
The maximum number of concurrent processes allocated to a job. If PROCESSLIMIT is reached, the system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL.
SWAPLIMIT
The swap space limit that a job may use. If SWAPLIMIT is reached, the system sends the following signals in sequence to all processes in the job: SIGINT, SIGTERM, and SIGKILL.
loadSched
The load thresholds LSF Batch uses to determine whether a pending job in this queue can be dispatched to a host, and to determine when a suspended job can be resumed. The load indices are explained in `Load Indices' on page 37.
loadStop
The load thresholds LSF Batch uses to determine when to suspend a running batch job in this queue.
SCHEDULING POLICIES
Scheduling policies of the queue. Optionally, one or more of the following policies may be configured:
FAIRSHARE
Jobs in this queue are scheduled based on a fairshare policy. In general, a job will be dispatched before other jobs in this queue if the job's owner has more shares (see USER_SHARES below), has fewer running jobs, has used less CPU time in the recent past, and the job has waited longer. If all the users have the same shares, jobs in this queue are scheduled in a round-robin fashion.
If the fairshare policy is not specified, jobs in this queue are scheduled based on the conventional first-come-first-served (FCFS) policy. That is, jobs are dispatched in the order they were submitted.
PREEMPTIVE
Jobs in this queue may preempt running jobs from lower priority queues. That is, jobs in this queue may still be able to start even though the job limit of a host or a user has been reached, as long as some of the job slots defined by the job limit are taken by jobs from those queues whose priorities are lower than the priority of this queue. Jobs from lower priority queues will be suspended to ensure that the running jobs (excluding suspended jobs) are within the corresponding job limit. If the preemptive policy is not specified, the default is not to preempt any job.
PREEMPTABLE
Jobs in this queue may be preempted by jobs in higher priority queues, even if the higher priority queues are not specified as preemptive.
EXCLUSIVE
Jobs dispatched from this queue can run exclusively on a host if the user so specifies at job submission time (see `Other bsub Options' on page 112). Exclusive execution means that the job is sent to a host with no other batch jobs running there, and no further job--batch or interactive--will be dispatched to that host while the job is running. The default is not to allow exclusive jobs.
BACKFILL
Parallel jobs can reserve job slots on hosts so that they are not prevented from executing if they are competing with jobs requiring fewer processors (as specified via bsub -n). The maximum slot reservation time controls how long, in seconds, a slot is reserved for a job. The backfill policy allows a site to make use of the reserved slots for short jobs without delaying the starting time of the parallel job doing the reserving.
The run limit of currently started jobs is used to compute the estimated start time of a job when backfilling is enabled. A job can backfill the reserved slots of another job if it will finish, based on its run limit, before the estimated start time of the job that reserved the slots. Jobs in a backfill queue can backfill any jobs which are reserving slots. If backfilling is enabled, the estimated start time of a job can be viewed using bjobs -l. LSF Batch provides support for backfill at the queue level.
USER_SHARES
A list of [username, share] pairs. username is either a user name or a user group name. share is the number of shares of resources assigned to the user or user group. A party will get a portion of the resources proportional to the party's share divided by the sum of the shares of all parties specified in this queue.
DEFAULT HOST SPECIFICATION
A host name or host model name. The appropriate CPU scaling factor of the host or host model (see lsinfo(1)) is used to adjust the actual CPU time limit at the execution host (see CPULIMIT above). This specification overrides the system default DEFAULT_HOST_SPEC (see `Configuration Parameters' on page 85).
RUN_WINDOWS
One or more run windows in a week during which jobs in this queue may execute. When a queue is out of its window or windows, no job in this queue will be dispatched. In addition, when the end of a run window is reached, any running jobs from this queue are suspended until the beginning of the next run window, when they are resumed. The default is no restriction, or always open.
A window is displayed in the format of begin_time-end_time. Time is specified in the format of [day:]hour[:minute], where all fields are numbers in their respective legal ranges: 0(Sunday)-6 for day, 0-23 for hour, and 0-59 for minute. The default value for minute is 0 (on the hour). The default value for day is every day of the week. The begin_time and end_time of a window are separated by `-', with no blank characters (SPACE or TAB) in between. Both begin_time and end_time must be present for a window. Windows are separated by blank characters. If only the character `-' is displayed, the windows are always open.
DISPATCH_WINDOWS
One or more dispatch windows in a week during which jobs in this queue may be dispatched to run. When a queue is out of its windows, no job in this queue can be dispatched. Jobs already dispatched are not affected by the dispatch windows. The default is no restriction, or always open. Dispatch windows are displayed in the same format as run windows (see RUN_WINDOWS above).
USERS
The list of users allowed to submit jobs to this queue.
HOSTS
The list of hosts to which this queue can dispatch jobs.
NQS DESTINATION QUEUES
The list of NQS queues to which this queue can dispatch jobs.
ADMINISTRATORS
A list of administrators of the queue. The users whose names are specified here are allowed to operate on the jobs in the queue and on the queue itself.
JOB_STARTER
An executable file that runs immediately prior to the batch job, taking the batch job file as an input argument. All jobs submitted to the queue are run via the job starter, which is generally used to create a specific execution environment before processing the jobs themselves.
PRE_EXEC
Queue's pre-execution command. This command is executed before the real batch job is run on the execution host (or on the first host selected for a parallel batch job).
POST_EXEC
Queue's post-execution command. This command is executed on the execution host when a job terminates.
REQUEUE_EXIT_VALUES
Jobs that exit with these values are automatically requeued.
RES_REQ
Resource requirements of the queue. Only hosts that satisfy this resource requirement can be used by the queue.
RESUME_COND
The condition(s) that must be satisfied to resume a suspended job on a host.
STOP_COND
The condition(s) which determine whether a job running on a host should be suspended.
Note that some parameters are displayed only if they are defined.
When more than one batch queue is available, you need to decide which queue to use. If you submit a job without specifying a queue name, the LSF Batch system automatically chooses a suitable queue for the job from the candidate default queues, based on the requirements of the job.
LSF Batch has default queues. The bparams command displays them:
% bparams
Default Queues: normal
...
The user can override this list by defining the environment variable LSB_DEFAULTQUEUE.
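For example, C shell users could make the priority queue (shown in the earlier bqueues listing) their personal default as follows; Bourne shell users would set and export the variable instead.
% setenv LSB_DEFAULTQUEUE priority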
Although simple to use, automatic queue selection may not behave as expected if you do not choose your candidate queues properly. The criteria LSF Batch uses for selecting a suitable queue are as follows:
If multiple queues satisfy the above requirements, then the first queue listed in the candidate queues (as defined by DEFAULT_QUEUE or LSB_DEFAULTQUEUE) that satisfies the requirements is selected.
The default queues are normally suitable to run most jobs for most users, but they may have a very low priority or restrictive execution conditions to minimize interference with other jobs. If automatic queue selection is not satisfactory, you should choose the most suitable queue for each job.
The factors affecting your decision are user access restrictions, size of the job, resource limits of the queue, scheduling priority of the queue, active time windows of the queue, hosts used by the queue, the scheduling load conditions, and the queue description displayed by the bqueues -l command.
The -u user_name option specifies a user or user group so that bqueues displays only the queues that accept jobs from these users.
The -m host_name option allows users to specify a host name or host group name so that bqueues displays only the queues that use these hosts to run jobs.
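For example, the following command (with a hypothetical user and host name) lists only the queues that accept jobs from user1 and can dispatch them to hostA:
% bqueues -u user1 -m hostA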
You must also be sure that the queue is enabled.
The following examples are based on the queues defined in the default LSF configuration. Your LSF administrator may have configured different queues.
To run a job during off hours because the job generates very high load to both the file server and the network, you can submit it to the night queue; use bsub -q night.
If you have an urgent job to run, you may want to submit it to the priority queue; use bsub -q priority.
If you want to use hosts owned by others and you do not want to bother the owners, you may want to run your low priority jobs in the idle queue, so that your jobs are suspended as soon as the owner comes back.
If you are running small jobs and do not want to wait too long to get the results, you can submit jobs to the short queue to be dispatched with higher priority. Make sure your jobs are short enough that they are not killed for exceeding the CPU time limit of the queue (check the resource limits of the queue, if any).
If your job requires a specific execution environment, you may need to submit it to a queue that has a particular job starter defined. Because only your system administrator can specify a queue-level job starter as part of the queue definition, check with your administrator for more information. See `Queue-Level Job Starters' on page 129 of the LSF Batch Administrator's Guide for information on queue-level job starters.
The busers command displays the maximum number of jobs a user or group may execute on a single processor, the maximum number of job slots a user or group may use in the cluster, the total number of job slots required by all submitted jobs of the user, and the number of job slots in the PEND, RUN, SSUSP, and USUSP states. If no user is specified, the default is to display information about the user who invokes this command. Here is an example of the output from the busers command:
% busers all
USER/GROUP JL/P MAX NJOBS PEND RUN SSUSP USUSP RSV
default 1 12 - - - - - -
user9 1 12 34 22 10 2 0 0
groupA - 100 20 7 11 1 1 0
Note that if the reserved user name all is specified, busers reports all users who currently have jobs in the system, as well as default, which represents a typical user. The purpose of listing default in the output is to show the job slot limits (JL/P and MAX) of a typical user. No other parameters make sense for default.
The counters displayed by busers treat a parallel job requesting N processors the same as N jobs requesting one processor.
LSF Batch uses some (or all) of the hosts in an LSF cluster as execution hosts. The host list is configured by the LSF administrator. The bhosts command displays information about these hosts.
% bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostA ok 2 2 0 0 0 0 0
hostD ok 2 4 2 1 0 0 1
hostB ok 1 2 2 1 0 1 0
STATUS gives the status of the host and the sbatchd daemon. If a host is down or the LIM is unreachable, the STATUS is unavail. If the LIM is reachable but the sbatchd is not up, STATUS is unreach.
JL/U is the job slot limit per user. The host will not allocate more than JL/U job slots for one user at the same time. MAX gives the maximum number of job slots that are allowed on this host. This does not mean that the host has to always allocate this many job slots if there are waiting jobs; the host must also satisfy its configured load conditions to accept more jobs.
The columns NJOBS, RUN, SSUSP, USUSP, and RSV show the number of job slots used by jobs currently dispatched to the host, running on the host, suspended by the system, suspended by the user, and reserved on the host, respectively.
The -l option to the bhosts command gives all information about each batch server host, such as the CPU speed factor and the load threshold values for starting, resuming and suspending jobs. You can also specify host names on the command line to list the information for specific hosts.
% bhosts -l hostB
HOST: hostB
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOWS
ok 9 1 2 2 1 0 0 1 2:00-20:30
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - 40 - - - - - -
Migration threshold is 40 min.
Files are copied at checkpoint.
The DISPATCH_WINDOWS column shows the time windows during which jobs can be started on the host. See `Detailed Queue Information' on page 69 for a description of the format of the DISPATCH_WINDOWS column. Unlike the queue run windows, jobs are not suspended when the host dispatch windows close. Jobs running when the host dispatch windows close continue running, but no new jobs are started until the windows reopen.
CPUF is the host CPU factor. loadSched and loadStop are the scheduling and suspending thresholds for the host. If a threshold is not defined, the threshold from the queue definition applies. If both the host and the queue define a threshold for a load index, the most restrictive threshold is used.
The migration threshold is the time that a job dispatched to this host can remain suspended by the system before LSF Batch attempts to migrate the job to another host.
If the host's operating system supports checkpoint copy, this is indicated here. With checkpoint copy, the operating system automatically copies all open files to the checkpoint directory when a process is checkpointed. Checkpoint copy is currently supported only on ConvexOS and Cray systems.
The LSF administrator can configure user and host groups. The group names act as convenient aliases wherever lists of user or host names can be specified on the command line or in configuration files. The administrator can also limit the total number of running jobs belonging to a user or a group of users. User groups can also be defined to reflect the hierarchical share distribution, as discussed in `Hierarchical Fairshare' on page 60.
The bugroup and bmgroup commands list the configured group names and members for user and host groups respectively.
% bugroup acct_users
GROUP_NAME USERS
acct_users : user1 user2 user4 group1/
Note that if a name ends with a `/', it is a group.
% bmgroup big_servers
GROUP_NAME HOSTS
big_servers : hostD hostK
Specifying a user or host group to an LSF Batch command is the same as specifying all the user or host names in the group. For example, the command bsub -m big_servers specifies that the job may be dispatched to either of the hosts hostD or hostK. The command bjobs -l lists detailed information about the job, including the specified hosts and the load thresholds that apply to the job.
% bsub -m big_servers myjob
Job <31556> is submitted to default queue <normal>.
% bjobs -l 31556
Job Id <31556>, User <user1>, Status <DONE>, Queue <normal>, Command <hostname>
Thu Oct 27 01:47:51: Submitted from host <hostA>, CWD <$HOME>,
Specified Hosts <big_servers>;
Thu Oct 27 01:47:52: Started on <hostK>;
Thu Oct 27 01:47:53: Done successfully. The CPU time used is 0.2
seconds.
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - 12 -
loadStop - - - - 55 - - - - - -
The hierarchical share distribution can be displayed by the bugroup command with the -l option. The following gives an example of a system consisting of three groups:
% bugroup -l
GROUP_NAME: g0
USERS: g1/ g2/
SHARES: [g2, 20] [g1, 10]
GROUP_NAME: g1
USERS: user1 user2 user3
SHARES: [others, 10] [user3, 4]
GROUP_NAME: g2
USERS: all users
SHARES: [user2, 10] [default, 5]
For fairsharing to take effect, the group share definitions must be associated with individual share providers (queues or host partitions) in the system. For example, if the above share definition was associated with a host partition consisting of hostA, hostB, and hostC, the bhpart command can display the share distribution information. By default, the command only displays the top level share accounts associated with the partition. Use the -r option to recursively display the entire share tree associated with the provider.
% bhpart hpartest
HOST_PARTITION_NAME: hpartest
HOSTS: hostA hostB hostC
SHARE_INFO_FOR: hpartest/
USER/GROUP SHARES PRIORITY STARTED RESERVED CPU_TIME RUN_TIME
g0 100 5.440 5 0 0.0 1324
% bhpart -r hpartest
HOST_PARTITION_NAME: hpartest
HOSTS: hopper
SHARE_INFO_FOR: hpartest/
USER/GROUP SHARES PRIORITY STARTED RESERVED CPU_TIME RUN_TIME
g0 100 5.477 5 0 0.0 1324
SHARE_INFO_FOR: hpartest/g0/
USER/GROUP SHARES PRIORITY STARTED RESERVED CPU_TIME RUN_TIME
g2 20 1.645 3 0 0.0 816
g1 10 1.099 2 0 0.0 508
SHARE_INFO_FOR: hpartest/g0/g2/
USER/GROUP SHARES PRIORITY STARTED RESERVED CPU_TIME RUN_TIME
user3 10 3.333 0 0 0.0 0
user2 5 1.667 3 0 0.0 0
user1 5 1.667 0 0 0.0 0
SHARE_INFO_FOR: hpartest/g0/g1/
USER/GROUP SHARES PRIORITY STARTED RESERVED CPU_TIME RUN_TIME
user2 4 1.333 0 0 0.0 0
others 10 1.099 2 0 0.0 508
Note that when displaying the share tree recursively, the output consists of a series of group accounts starting from the share provider. Each group contains the account information of any subgroups or users under that group. The ACCOUNT_PATH gives the path name of the group account starting from the name of the share provider. Each user account can similarly be identified by a unique path, e.g. hpartest/g0/g1/user2.
The information associated with each account includes the static share assigned to that group or user as well as its dynamic priority. A higher priority value indicates that the user's or group's jobs will be considered before those with lower priority at the same level. Priorities for accounts at different levels in the tree should not be compared. Details about the number of started and reserved jobs, together with the CPU time and run time used by the account's previously submitted jobs, are displayed. Note that a group account's job counters and CPU time fields are the sum of those for all users or subgroups underneath it.
The -Y option of bqueues will display a similar share tree for a given fairshare queue.
If you frequently need to submit batch jobs that have to be started in a particular environment or require some type of setup to be performed before they are executed, your system administrator can include a job starter function in the definition of a selected queue. In a shell environment, this type of pre-execution setup is often handled by writing the preliminary procedures into a file (referred to as a wrapper) that itself contains a call to start the desired job.
In LSF, a queue-level job starter does the work of a wrapper. A job starter is simply a command (or set of commands) which, when included in the queue definition, is run immediately prior to all jobs submitted to the selected queue. The job starter performs its setup or environment functions, then calls the submitted job itself, which can inherit the execution environment created by the job starter. One typical use of this feature is to customize LSF for use with the Atria ClearCase environment (see `Support for Atria ClearCase' on page 275 of the LSF Batch Administrator's Guide).
A queue-level job starter can only be specified by the LSF administrator. You can specify a job starter for your interactive jobs using the LSF_JOB_STARTER environment variable. See `Command-Level Job Starters' on page 144 for detailed information.
Queue-level job starters have no effect on interactive jobs, unless the interactive job is submitted to a queue as an interactive batch job (see `Interactive Batch Job Support' on page 145 for information on interactive batch jobs).
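For example, to run your interactive remote tasks under a Bourne shell, you might set the LSF_JOB_STARTER variable as shown below; this is only an illustrative setting, and `Command-Level Job Starters' on page 144 describes the exact behaviour.
% setenv LSF_JOB_STARTER "/bin/sh -c"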
The bparams command reports some generic configuration parameters of the LSF Batch system. These include the default queues, default host or host model for CPU speed scaling, job dispatch interval, job checking interval, job accepting interval, etc. The command can display such information in either short format or long format. The short format summarizes a few key parameters. For example:
% bparams
Default Queues: normal idle
Default Host Specification: DECAXP
Job Dispatch Interval: 20 seconds
Job Checking Interval: 15 seconds
Job Accepting Interval: 20 seconds
The -l option to the bparams command displays the information in long format, which gives a brief description of each parameter as well as the name of the parameter as it appears in the lsb.params file. In addition, the long format lists every parameter defined in the lsb.params file. Here is an example of the output from the long format of the bparams command:
% bparams -l
System default queues for automatic queue selection:
DEFAULT_QUEUE = normal idle
The interval for dispatching jobs by master batch daemon:
MBD_SLEEP_TIME = 20 (seconds)
The interval for checking jobs by slave batch daemon:
SBD_SLEEP_TIME = 15 (seconds)
The interval for a host to accept two batch jobs subsequently:
JOB_ACCEPT_INTERVAL = 1 (* MBD_SLEEP_TIME)
The idle time of a host for resuming pg suspended jobs:
PG_SUSP_IT = 180 (seconds)
The amount of time during which finished jobs are kept in core:
CLEAN_PERIOD = 3600 (seconds)
The maximum number of finished jobs that are logged in current event file:
MAX_JOB_NUM = 2000
The maximum number of retries for reaching a slave batch daemon:
MAX_SBD_FAIL = 3
The number of hours of resource consumption history:
HIST_HOURS = 5
The default project assigned to jobs.
DEFAULT_PROJECT = default
By default, LSF assumes a uniform user name space within a cluster. Some sites do not satisfy this assumption. For such sites, LSF provides support for the execution of batch jobs within a cluster with a non-uniform user name space.
You can set up a hidden .lsfhosts file in your home directory that tells what accounts to use when you send jobs to remote hosts and which remote users are allowed to run jobs under your local account. This is similar to the .rhosts file used by rcp, rlogin and rsh.
The .lsfhosts file consists of multiple lines, where each line is of the form:
hostname|clustername username [send|recv]
A `+' in the hostname or username field indicates any LSF host or user respectively. The keyword send indicates that if you send a job to host hostname, then the account username should be used. The keyword recv indicates that your local account is enabled to run jobs from user username on host hostname. If neither send nor recv is specified, then your local account can both send jobs to and receive jobs from the account username on hostname.
The clustername argument is used for the LSF MultiCluster product. See `Using LSF MultiCluster' on page 187.
Lines beginning with `#' are ignored.
The permission on your .lsfhosts file must be set to read/write only by the owner. Otherwise, your .lsfhosts file is silently ignored.
For example, assume that hostB and hostA in your cluster do not share the same user name/user ID space. You have an account user1 on host hostB and an account ruser_1 on host hostA. You want to be able to submit jobs from hostB to run on hostA.
Your .lsfhosts files should be set up as follows:
% cat ~user1/.lsfhosts
hostA ruser_1 send
% cat ~ruser_1/.lsfhosts
hostB user1 recv
As another example, assume you have account user1 on host hostB and want to use the lsfguest account when sending jobs to be run on host hostA. The lsfguest account is intended to be used by any user submitting jobs from any LSF host.
The .lsfhosts files should be set up as follows:
% cat ~user1/.lsfhosts
hostA lsfguest send
% cat ~lsfguest/.lsfhosts
+ + recv
When using account mapping, your job is always started as a login shell so that the start-up files of the user account, under which your job will run, are sourced.
Your .lsfhosts file is read at job submission time. Subsequent changes made to this file will not affect the account used to run jobs that have already been submitted; jobs submitted after the changes are made will pick up the new entries.
If you attempt to map to an account for which you have no permission, your job is put into PSUSP state. You can modify the .lsfhosts file of the execution account to give appropriate permission and resume the job.
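For example, after the owner of the execution account has added the appropriate recv line to their .lsfhosts file, you can resume the job with the bresume command (the job ID here is hypothetical):
% bresume 1501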
The bpeek command will not work on a job running under a different user account.
File transfer using the -f option to the bsub command will not work when running under a different user account unless rcp(1) is set up to do the file copying.
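For example, assuming rcp access is set up between the two accounts, the -f option can copy an input file to the execution host before the job starts and copy a result file back when it finishes; the file names below are only placeholders.
% bsub -f "data.in > data.in" -f "data.out < data.out" myjob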