

5. Using LSF Batch

LSF Batch is a distributed batch system for clusters of UNIX and Windows NT computers. With LSF Batch, you can use a heterogeneous network of computers as a single system. All batch jobs go through a consistent interface, independent of the resources they need or the hosts they run on.

LSF Batch has the same view of cluster and master host as the LSF Base, although LSF Batch may only use some of the hosts in the cluster as servers. The slave batch daemon, sbatchd, runs on every host that the LSF administrator configures as an LSF Batch server. The master batch daemon, mbatchd, always runs on the same host as the master LIM. See `Finding the Master' on page 25 for more information on the master LIM.

This chapter provides important background information on how LSF Batch works and describes the commands that give information about your LSF Batch system. Topics include:

Batch Jobs

Fairshare Scheduling Policy

Other Scheduling Policies

Scheduling Parameters

Time Windows for Queues and Hosts

Batch Queues

Batch Users

Batch Hosts

User and Host Groups

Queue-Level Job Starters

Configuration Parameters

User Controlled Account Mapping

Batch Jobs

Each LSF Batch job goes through a series of state transitions until it eventually completes its task, crashes, or is terminated. Figure 9 shows the possible states of a job during its life cycle.

Figure 9. Batch Job States

Many jobs enter only three states:

PEND - waiting in the queue

RUN - dispatched to a host and running

DONE - terminated normally
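
You can check the current state of your jobs in the STAT column of the bjobs command output. The job IDs, queues, and hosts below are purely illustrative:

% bjobs
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
1004   user1  RUN   normal  hostA      hostD      myjob      Oct 27 14:03
1005   user1  PEND  night   hostA                 sleep 60   Oct 27 14:05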

A job remains pending until all conditions for its execution are met. The conditions may include:

A job may terminate abnormally for various reasons. Job termination may happen from any state. An abnormally terminated job goes into EXIT state. The situations where a job terminates abnormally include:

Jobs may also be suspended at any time. A job can be suspended by its owner, by the LSF administrator or by the LSF Batch system. There are three different states for suspended jobs:

PSUSP - suspended by its owner or the LSF administrator while in PEND state

USUSP - suspended by its owner or the LSF administrator after being dispatched

SSUSP - suspended by the LSF Batch system after being dispatched

After a job has been dispatched and started on a host, it is suspended by the LSF Batch system if the load on the execution host or hosts becomes too high. In such a case, batch jobs could be interfering among themselves or could be interfering with interactive jobs. In either case, some jobs should be suspended to maximize host performance or to guarantee interactive response time. LSF Batch suspends jobs according to their priority.

When a host is busy, LSF Batch suspends lower priority jobs first unless the scheduling policy associated with the job dictates otherwise. A job may also be suspended by the system if the job queue has a time window and the current time goes outside the time window.

A system-suspended job can later be resumed by LSF Batch when the load conditions on the execution host fall back within the scheduling thresholds, or when the queue's closed time window opens again.
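
For example, you can suspend one of your own jobs with the bstop command and resume it later with bresume; the job ID and messages below are illustrative:

% bstop 1004
Job <1004> is being stopped
% bresume 1004
Job <1004> is being resumed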

Fairshare Scheduling Policy

The default First-Come-First-Served (FCFS) job scheduling is often insufficient for an environment with competing users. Fairshare scheduling is an alternative to the default FCFS scheduling. Fairshare scheduling divides the processing power of the LSF cluster among users and groups to provide fair access to resources. Fairshare is not necessarily equal share. Your cluster administrator can configure shares for users or groups to achieve controlled accesses to resources.

Your LSF cluster administrator defines fairshare policies by assigning shares to users or groups. The special names others and default can also be assigned shares.

The name others is a virtual group referring to all other users not explicitly listed in the share parameter. For example, the product group may be assigned 100 shares, while all other users together are assigned 10 shares.

The name default is a virtual user referring to each of the other users not explicitly configured in the share parameter. For example, if the product group is assigned 100 shares and the default user is assigned 10 shares, then every user not belonging to the product group has 10 shares, as if their user names were explicitly listed in the share parameter. As a special case, if default is the only user name in the share parameter, it implements the equal share policy.

LSF Batch maintains an account for every user or group, recording its shares and resource consumption. A dynamic priority is calculated for each user or group from the configured shares, the CPU time consumed during the past HIST_HOURS hours (see `Configuration Parameters' on page 85), the number of jobs currently running, and the total elapsed time of jobs. (The CPU time used for fairshare is not normalized by the host CPU speed factors.) This dynamic priority is then used to decide which user's or group's jobs should be dispatched first. If some users or groups have used less than their fair share of the resources, their pending jobs (if any) are scheduled next, jumping ahead of jobs of other users.

Note

The CPU time used for host partition scheduling is not normalized by the host CPU speed factors.

LSF Batch provides three different varieties of fairshare configuration. These are queue level fairshare, host partition fairshare, and hierarchical fairshare.

Host Partition Fairshare Scheduling

Host partition fairshare scheduling allows sharing policy to be defined for a group of hosts, rather than in a queue. A host partition specifies a group of hosts together with share allocations among the users or groups. A special host name all can be used to refer to all hosts used by LSF Batch.

Note that only users or groups who are configured to use the host partition can run jobs on these hosts.

Fairshare defined by host partition applies to all queues that run jobs on these hosts.

To find out what host partitions are configured in your cluster, run the bhpart command.

Note

Host partition fairshare is an alternative to queue level fairshare scheduling. You cannot use both in the same LSF cluster.

Queue-Level Fairshare Scheduling

Fairshare policy can be defined at the queue level to allow different policies to be applied for different queues. Queue-level fairshare handles resource contention among user jobs within the same queue.

To find out if a queue has fairshare defined, run the bqueues -l command. Your queue has fairshare defined if you see the parameter "USER_SHARES" in the output of the above command.

Note

Queue level fairshare scheduling is an alternative to host partition fairshare scheduling. You cannot use both in the same LSF cluster.

Hierarchical Fairshare

When shares in a fairshare queue or host partition are assigned to a user group, each member of the group can be given an individual share, or the share can be assigned to the group collectively. When the share is assigned collectively, the share each member effectively receives depends on the size of the group and the number of jobs submitted by its members.

With large user groups, it is desirable that the shares assigned to a group are subdivided among subgroups. The shares may be further partitioned within subgroups to create a hierarchical share assignment. Figure 10 gives an example of how an engineering department might want to configure sharing among several groups.

Figure 10. Sample Fairsharing Configuration

The situation pictured in Figure 10 is a share tree in which users are organized into hierarchical groups. Shares are assigned to users or groups at each level in the hierarchy. In the above example, the Development group will get the largest share (50%) of the resources in the event of contention. Shares assigned to the Development group can be further divided among the Systems, Application and Test groups which receive 50%, 35%, and 15%, respectively. At the lowest level, individual users may be allocated shares of the immediate group they belong to.

Each node in the share tree represents either a group account or a user account. A user account corresponds to an individual user who runs jobs, while a group account allows shares to be assigned collectively to a group and subdivided among its members. Note that user accounts are leaf nodes in the share tree, while group accounts are always non-leaf nodes. The resource consumption of a group account is the total of the consumption of all user accounts defined recursively under that group. By assigning shares to groups, the administrator can control the rate of allocation of resources to all members of the group.

LSF Batch implements hierarchical fairshare in two steps. First, define hierarchical share distribution by defining hierarchical user groups as discussed above. Second, use the hierarchical share distribution in queue-level fairshare or host partition fairshare definitions. When a fairshare policy uses a group name that represents a hierarchical share distribution, it allocates resources according to the share distribution as if the hierarchy were defined inside the policy.
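
Configuring the hierarchy itself is an administrator task covered in the LSF Batch Administrator's Guide. The following is only a rough sketch of a hierarchical group definition, assuming the UserGroup section format of the lsb.users file and using the group names from Figure 10:

Begin UserGroup
GROUP_NAME     GROUP_MEMBER                 USER_SHARES
development    (systems application test)   ([systems, 50] [application, 35] [test, 15])
systems        (user1 user2)                ([default, 1])
End UserGroup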

Hierarchical share distribution information can be displayed by the bugroup command with -l option. See `Viewing Hierarchical Share Information' on page 82 for more information.

Each host partition or queue is considered a share provider and may specify its own fairshare hierarchy for controlling the allocation of resources to its users. For example, a user or group may have a large share in one queue but a small share in another. When a share provider selects a job to run, it searches the share tree from the top, picking the node with the highest priority at each level until it reaches a leaf node corresponding to a user. A job from that user is selected and dispatched if a suitable host is found, and the user's priority, together with that of its parent groups, is updated. As a user or group account dispatches jobs, its priority decreases, giving other users or groups a chance to access the resources.

As a user you can belong to one or more of the groups in the share tree. There will be a separate user account for each group you belong to. The priority of your jobs will be affected by the group's share assignment. When there is contention for resources among the groups, the system will favour those groups with a larger share. Users belonging to multiple groups can specify the share account that will be used to determine the priority of each job. Note that a given user may not have an account under a particular group. In the previous example, 'User3' does not have an account under the 'ChipX' group.
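
How the share account for a job is chosen depends on your release. As an assumption not documented in this chapter, a release that supports the -G option to bsub would let a user who has an account under the ChipX group name it explicitly:

% bsub -G ChipX myjob
Job <1234> is submitted to default queue <normal>.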

Other Scheduling Policies

This section discusses other LSF Batch scheduling policies. All these policies can co-exist in the same system.

Preemptive Scheduling

When LSF Batch schedules jobs, those in higher priority queues are considered first. Jobs in lower priority queues are only started if all higher priority jobs are waiting for specified resources, hosts, starting times, or other constraints.

When a high priority job is ready to run, all the LSF Batch server hosts may already be running lower priority jobs. The high priority job ends up waiting for the low priority jobs to finish. If the low priority jobs take a long time to complete, the higher priority jobs may be blocked for an unacceptably long time.

LSF solves this problem by allowing preemptive scheduling within LSF Batch queues. Jobs pending in a preemptive queue can preempt lower priority jobs on a host by suspending them and starting the higher priority jobs on the host.

A queue can also be defined as preemptable. In this case, jobs in higher priority queues can preempt jobs in the preemptable queue even if the higher priority queues are not specified as preemptive.
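
You can tell whether a queue is preemptive or preemptable from the SCHEDULING POLICIES line of the bqueues -l output (see `Detailed Queue Information' on page 69). The queue name below is only an illustration:

% bqueues -l priority
...
SCHEDULING POLICIES:  PREEMPTIVE
...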

Note

When the preemptive scheduling policy is used, jobs in preemptive queues may violate the user or host job slot limits. However, LSF Batch ensures that the total number of slots used by running jobs (excluding jobs that are suspended) does not exceed the job slot limits. This is done by suspending lower priority jobs.

Exclusive Scheduling

Some queues accept exclusive jobs. A job can run exclusively only if it is submitted with the -x option to the bsub command specifying a queue that is configured to accept exclusive jobs. An exclusive job runs by itself on a host -- it is dispatched only to a host with no other batch jobs running and LSF does not send any other jobs to the host until the exclusive job completes.

Once an exclusive job is started on a host, the LSF Batch system locks that host out of load sharing by sending a request to the underlying LSF to change the host's status to lockU. The host is no longer available for load sharing by any other task (either interactive or batch) until the exclusive job finishes.
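
For example, assuming your administrator has configured a queue, here called excl_q, that accepts exclusive jobs, you would submit an exclusive job with the -x option of bsub:

% bsub -x -q excl_q myjob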

Processor Reservation

The scheduling of parallel jobs supports the notion of processor reservation. Parallel jobs requiring a large number of processors often cannot be started if there are many lower priority sequential jobs in the system: there may never be enough resources free at any one instant to satisfy the large parallel job, even though there is always enough to start another sequential job. The processor reservation feature reduces this starvation of parallel jobs.

When a parallel job cannot be dispatched because there aren't enough execution slots to satisfy its minimum processor requirements, the currently available slots will be reserved for the job. These reserved job slots are accumulated until there are enough available to start the job. When a slot is reserved for a job it is unavailable to any other job.

To use this feature, a queue must have the processor reservation policy enabled through the SLOT_RESERVE parameter (see `Processor Reservation for Parallel Jobs' on page 211 of the LSF Batch Administrator's Guide). To avoid deadlock situations, the period of reservation is specified through the MAX_RESERVE_TIME parameter. The system accumulates reserved slots for a job for up to MAX_RESERVE_TIME minutes; if the job still cannot be started when that time expires, all of its reserved slots are freed and made available to other jobs. The MAX_RESERVE_TIME period is measured from the start of the first reservation for a job, and a job can go through multiple reservation cycles before it accumulates enough slots to actually be started.

Reserved slots can be displayed with the bjobs command. The number of reserved slots can be displayed with the bqueues, bhosts, bhpart, and busers commands. Look in the RSV column.
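
For example, a parallel job needing at least 8 and at most 16 processors could be submitted to a queue with processor reservation enabled (the queue name is illustrative); bjobs -l then reports any slots reserved for the job while it is pending:

% bsub -n 8,16 -q parallel myjob
Job <1205> is submitted to queue <parallel>.
% bjobs -l 1205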

Backfill Scheduling

Processor reservation ensures that large parallel jobs do not suffer processor starvation. However, in a heavily loaded LSF Batch system with jobs requiring varying numbers of processors, a large number of parallel jobs submitted earlier may keep reserving processors. In such cases, the FCFS discipline imposes long average wait times on every job and degrades the system's utilization, because many available processor slots are reserved but not used. The backfill policy allows jobs requiring fewer processors and running for shorter periods of time to use the processors reserved by the larger parallel jobs, provided these smaller jobs will not delay the start of any of the large parallel jobs.

Since backfilled jobs cannot delay the start of the jobs that reserved the job slots, backfilled jobs are never preempted.
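
For a job to backfill reserved slots, LSF Batch must know how long the job will run, so give it a run limit with the -W option of bsub (in minutes). The queue name below is only an illustration:

% bsub -q backfill -W 10 short_job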

Scheduling Parameters

Scheduling parameters specify the load conditions under which pending jobs are dispatched, running jobs are suspended, and suspended jobs are resumed. These parameters are configured by the LSF administrator in a variety of ways.

Load Thresholds

Load thresholds can be configured by your LSF administrator to schedule jobs in queues. There are two possible types of load thresholds: loadSched and loadStop. Each load threshold specifies a load index value. A loadSched threshold is the scheduling threshold which determines the load condition for dispatching pending jobs. If a host's load is beyond any defined loadSched, a job will not be started on the host. This threshold is also used as the condition for resuming suspended jobs. A loadStop threshold is the suspending condition that determines when running jobs should be suspended.

Thresholds can be configured for each queue, for each host, or a combination of both. To schedule a job on a host, the load levels on that host must satisfy both the thresholds configured for that host and the thresholds for the queue from which the job is being dispatched.

The value of a load index may either increase or decrease with load, depending on the meaning of the specific load index. Therefore, when comparing the host load conditions with the threshold values, you need to use either greater than (>) or less than (<), depending on the load index.

When jobs are running on a host, LSF Batch periodically checks the load levels on that host. If any load index exceeds the corresponding per-host or per-queue suspending threshold for a job, LSF Batch suspends the job. The job remains suspended until the load levels satisfy the scheduling thresholds.

To find out what parameters are configured for your cluster, see `Detailed Queue Information' on page 69 and `Batch Hosts' on page 79.

Scheduling Conditions

Scheduling conditions are a more general way of specifying job dispatching conditions at the queue level. Three parameters, RES_REQ, STOP_COND, and RESUME_COND, can be specified in the definition of a queue. These parameters take resource requirement strings as values (see `Resource Requirement Strings' on page 46 for more details), which allows a more flexible specification of conditions than load thresholds.

The resource requirement conditions for dispatching a job to a host can be specified through the queue-level RES_REQ parameter (see `Queue-Level Resource Requirement' on page 213 of the LSF Batch Administrator's Guide for further details). This parameter provides an alternative to the loadSched threshold described in `Load Thresholds' on page 64.

You can also specify the resource requirements for your job using the -R option to the bsub command. If you specify resource requirements that are already defined in the queue, the host must satisfy both requirements to be eligible for running the job. In some cases, the queue specification sets an upper or lower bound on a resource. If you attempt to exceed that bound, your job will be rejected.
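
For example, a job that needs at least 20 megabytes of swap space and 10 megabytes of memory, and that prefers the host with the lowest one-minute load, might be submitted with a resource requirement string such as the following (the thresholds are purely illustrative):

% bsub -R "select[swp>20 && mem>10] order[r1m]" myjob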

The condition for suspending a job can be specified using the queue-level STOP_COND parameter. It is defined by a resource requirement string (see `Suspending Condition' on page 215 of the LSF Batch Administrator's Guide). The stopping condition can only be specified in the queue. This parameter provides a similar but more flexible function than the loadStop threshold described in `Load Thresholds' on page 64.

The resource requirement conditions that must be satisfied on a host before a suspended job can be resumed are specified using the queue-level RESUME_COND parameter (for more details, see `Resume Condition' on page 215 of the LSF Batch Administrator's Guide). The resume condition can only be specified in the queue.

To find out details about the parameters of your cluster, see `Detailed Queue Information' on page 69 and `Batch Hosts' on page 79.

Time Windows for Queues and Hosts

Separate time windows can be defined to control when jobs can be dispatched and when they are to be suspended.

Run Windows

Run windows are time windows during which jobs are allowed to run. When the windows are closed, running jobs are suspended and no new jobs are dispatched. The default is no restriction, or always open. Run windows can only be defined for queues (see `Detailed Queue Information' on page 69).

Note

These windows are only applicable to batch jobs. Interactive jobs scheduled by the Load Information Manager (LIM) of LSF are controlled by another set of run windows (see `Listing Hosts' on page 27).

Dispatch Windows

Dispatch windows are time windows during which jobs are allowed to be started. However, dispatch windows have no effect on jobs that have already started. This means that jobs are allowed to run outside the dispatch windows, but no new jobs will be started. The default is no restriction, or always open. Note that no jobs are allowed to start when the run windows are closed. Dispatch windows can be defined for both queues (see `Detailed Queue Information' on page 69) and batch server hosts (see `Batch Hosts' on page 79).

Batch Queues

Batch queues represent different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Batch queues do not correspond to individual hosts; each job queue can use all server hosts in the cluster, or a configured subset of the server hosts.

The LSF administrator can configure job queues to control resource access by different users and types of applications. Users select the job queue that best fits each job.

Finding Out What Queues Are Available

The bqueues command lists the available LSF Batch queues.

% bqueues
QUEUE_NAME   PRIO  STATUS        MAX JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
interactive  400   Open:Active   -   -    -    -    2      0     2    0
priority     43    Open:Active   -   -    -    -    16     4     11   1
night        40    Open:Inactive -   -    -    -    4      4     0    0
short        35    Open:Active   -   -    -    -    6      1     5    0
license      33    Open:Active   -   -    -    -    0      0     0    0
normal       30    Open:Active   -   -    -    -    0      0     0    0
idle         20    Open:Active   -   -    -    -    6      3     1    2

The PRIO column gives the priority of the queue. The bigger the value, the higher the priority. Queue priorities are used by LSF Batch for job scheduling and control. Jobs from higher priority queues are dispatched first. Jobs from lower priority queues are suspended first when hosts are overloaded.

The STATUS column shows the queue status. A queue accepts new jobs only if it is open and dispatches jobs only if it is active. A queue can be opened or closed only by the LSF administrator. Jobs submitted to a queue that is later closed are still dispatched as long as the queue is active. A queue can be made active or inactive either by the LSF administrator or by the run and dispatch windows of the queue.

The MAX column shows the limit on the number of jobs dispatched from this queue at one time. This limit prevents jobs from a single queue from using too many hosts in a cluster at one time.

The JL/U column shows the limit on the number of jobs dispatched at one time from this queue for each user. This prevents a single user from occupying too many hosts in a cluster while other users' jobs are waiting in the queue.

The JL/P column shows the limit on the number of jobs from this queue dispatched to each processor. This prevents a single queue from occupying too many of the resources on a host.

The JL/H column shows the maximum number of job slots a host can allocate for this queue. This limit controls the number of job slots for the queue on each host, regardless of the type of host: uniprocessor or multiprocessor.

The NJOBS column shows the total number of job slots required by all jobs in the queue, including jobs that have not been dispatched and jobs that have been dispatched but have not finished.

Note

A parallel job with N components would require N job slots.

The PEND column shows the number of job slots needed by pending jobs in this queue.

The RUN column shows the number of job slots used by running jobs in this queue.

The SUSP column shows the number of job slots required by suspended jobs in this queue.

Detailed Queue Information

The -l option to the bqueues command displays the complete status and configuration for each queue. You can specify queue names on the command line to select specific queues:

% bqueues -l normal
QUEUE: normal
  -- For normal low priority jobs, running only if hosts are lightly loaded. This is the default queue.
PARAMETERS/STATISTICS
PRIO NICE  STATUS      MAX JL/U JL/P NJOBS  PEND  RUN SSUSP USUSP
40   20    Open:Active 100 50   11   1      1     0   0     0
Migration threshold is 30 min.
CPULIMIT           RUNLIMIT
20 min of IBM350   342800 min of IBM350
FILELIMIT  DATALIMIT  STACKLIMIT  CORELIMIT  MEMLIMIT  PROCLIMIT
20000 K    20000 K    2048 K      20000 K    5000 K    3
SCHEDULING PARAMETERS
           r15s  r1m  r15m  ut   pg   io   ls  it  tmp  swp  mem
loadSched  -     0.7  1.0   0.2  4.0  50   -   -   -    -    -
loadStop   -     1.5  2.5   -    8.0  240  -   -   -    -    -
SCHEDULING POLICIES:  FAIRSHARE  PREEMPTIVE PREEMPTABLE EXCLUSIVE
USER_SHARES:  [groupA, 70] [groupB, 15]  [default, 1]
DEFAULT HOST SPECIFICATION : IBM350
RUN_WINDOWS:  2:40-23:00 23:30-1:30
DISPATCH_WINDOWS:  1:00-23:50
USERS: groupA/ groupB/ user5
HOSTS:  hostA, hostD, hostB
ADMINISTRATORS:  user7
PRE_EXEC: /tmp/apex_pre.x > /tmp/preexec.log 2>&1
POST_EXEC:  /tmp/apex_post.x > /tmp/postexec.log 2>&1
REQUEUE_EXIT_VALUES:  45

The bqueues -l command only displays fields that apply to the queue. Any field that is not displayed has a default value that does not affect job scheduling or execution. In addition to the fields displayed by the default bqueues command, the fields that may be displayed are:

DESCRIPTION
A description of the typical use of the queue.
Default queue indication
Indicates that this is the default queue.
SSUSP
The number of job slots required by jobs suspended by the system because of load levels or run windows.
USUSP
The number of job slots required by jobs suspended by the user or the LSF administrator.
RSV
The number of job slots in the queue that are reserved by LSF Batch for pending jobs.
Migration threshold
The time that a job dispatched from this queue will remain suspended by the system before LSF Batch attempts to migrate the job to another host.
CPULIMIT
The maximum CPU time a job can use, in minutes relative to the CPU factor of the named host. CPULIMIT is scaled by the CPU factor of the execution host so that jobs are allowed more time on slower hosts.
When the job-level CPULIMIT is reached, the system sends SIGXCPU to all processes belonging to the job.
RUNLIMIT
The maximum wall clock time a process can use, in minutes. RUNLIMIT is scaled by the CPU factor of the execution host. When a job has been in the RUN state for a total of RUNLIMIT minutes, LSF Batch sends a SIGUSR2 signal to the job. If the job does not exit within 10 minutes, LSF Batch sends a SIGKILL signal to kill the job.
FILELIMIT
The maximum file size a process can create, in kilobytes. This limit is enforced by the UNIX setrlimit system call if it supports the RLIMIT_FSIZE option, or the ulimit system call if it supports the UL_SETFSIZE option.
DATALIMIT
The maximum size of the data segment of a process, in kilobytes. This restricts the amount of memory a process can allocate. DATALIMIT is enforced by the setrlimit system call if it supports the RLIMIT_DATA option, and is ignored otherwise.
STACKLIMIT
The maximum size of the stack segment of a process, in kilobytes. This restricts the amount of memory a process can use for local variables or recursive function calls. STACKLIMIT is enforced by the setrlimit system call if it supports the RLIMIT_STACK option.
CORELIMIT
The maximum size of a core file, in kilobytes. This limit is enforced by the setrlimit system call if it supports the RLIMIT_CORE option.
MEMLIMIT
The maximum resident set size (RSS) of a process, in kilobytes. If a process uses more than MEMLIMIT kilobytes of memory, its priority is reduced so that other processes are more likely to be paged into available memory. This limit is enforced by the setrlimit system call if it supports the RLIMIT_RSS option.
PROCLIMIT
The maximum number of processors allocated to a job. Jobs requesting more processors than the queue's PROCLIMIT are rejected.
PROCESSLIMIT
The maximum number of concurrent processes allocated to a job. If PROCESSLIMIT is reached, the system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL.
SWAPLIMIT
The swap space limit that a job may use. If SWAPLIMIT is reached, the system sends the following signals in sequence to all processes in the job: SIGINT, SIGTERM, and SIGKILL.
loadSched
The load thresholds LSF Batch uses to determine whether a pending job in this queue can be dispatched to a host, and to determine when a suspended job can be resumed. The load indices are explained in `Load Indices' on page 37.
loadStop
The load thresholds LSF Batch uses to determine when to suspend a running batch job in this queue.
SCHEDULING POLICIES
Scheduling policies of the queue. Optionally, one or more of the following policies may be configured:
FAIRSHARE
Jobs in this queue are scheduled based on a fairshare policy. In general, a job will be dispatched before other jobs in this queue if the job's owner has more shares (see USER_SHARES below), fewer running jobs, and has used less CPU time in the recent past, and the job has waited longer. If all the users have the same shares, jobs in this queue are scheduled in a round-robin fashion.
If the fairshare policy is not specified, jobs in this queue are scheduled based on the conventional first-come-first-served (FCFS) policy. That is, jobs are dispatched in the order they were submitted.
PREEMPTIVE
Jobs in this queue may preempt running jobs from lower priority queues. That is, jobs in this queue may still be able to start even though the job limit of a host or a user has been reached, as long as some of the job slots defined by the job limit are taken by jobs from those queues whose priorities are lower than the priority of this queue. Jobs from lower priority queues will be suspended to ensure that the running jobs (excluding suspended jobs) are within the corresponding job limit. If the preemptive policy is not specified, the default is not to preempt any job.
PREEMPTABLE
Jobs in this queue may be preempted by jobs in higher priority queues, even if the higher priority queues are not specified as preemptive.
EXCLUSIVE
Jobs dispatched from this queue can run exclusively on a host if the user so specifies at job submission time (see `Other bsub Options' on page 112). Exclusive execution means that the job is sent to a host with no other batch jobs running there, and no further job--batch or interactive--will be dispatched to that host while the job is running. The default is not to allow exclusive jobs.
BACKFILL
Parallel jobs can reserve job slots on hosts so that they are not prevented from executing by competition with jobs requiring fewer processors (as specified via bsub -n). The maximum slot reservation time controls how long, in seconds, a slot is reserved for a job. The backfill policy allows a site to make use of the reserved slots for short jobs without delaying the start of the parallel job doing the reserving.
The run limits of currently started jobs are used to compute the estimated start time of a pending job when backfilling is enabled. A job can backfill the reserved slots of another job if, based on its own run limit, it will finish before the estimated start time of the job that reserved the slots. Jobs in a backfill queue can backfill any jobs that are reserving slots. If backfilling is enabled, the estimated start time of a job can be viewed using bjobs -l. LSF Batch supports backfill at the queue level.
USER_SHARES
A list of [username, share] pairs. username is either a user name or a user group name. share is the number of shares of resources assigned to the user or user group. A party will get a portion of the resources proportional to the party's share divided by the sum of the shares of all parties specified in this queue.
DEFAULT HOST SPECIFICATION
A host name or host model name. The appropriate CPU scaling factor of the host or host model (see lsinfo(1)) is used to adjust the actual CPU time limit at the execution host (see CPULIMIT above). This specification overrides the system default DEFAULT_HOST_SPEC (see `Configuration Parameters' on page 85).
RUN_WINDOWS
One or more run windows in a week during which jobs in this queue may execute. When a queue is out of its window or windows, no job in this queue will be dispatched. In addition, when the end of a run window is reached, any running jobs from this queue are suspended until the beginning of the next run window, when they are resumed. The default is no restriction, or always open.
A window is displayed in the format of begin_time-end_time. Time is specified in the format of [day:]hour[:minute], where all fields are numbers in their respective legal ranges: 0(Sunday)-6 for day, 0-23 for hour, and 0-59 for minute. The default value for minute is 0 (on the hour). The default value for day is every day of the week. The begin_time and end_time of a window are separated by `-', with no blank characters (SPACE or TAB) in between. Both begin_time and end_time must be present for a window. Windows are separated by blank characters. If only the character `-' is displayed, the windows are always open.
DISPATCH_WINDOWS
One or more dispatch windows in a week during which jobs in this queue may be dispatched to run. When a queue is out of its windows, no job in this queue can be dispatched. Jobs already dispatched are not affected by the dispatch windows. The default is no restriction, or always open. Dispatch windows are displayed in the same format as run windows (see RUN_WINDOWS above).
USERS
The list of users allowed to submit jobs to this queue.
HOSTS
The list of hosts to which this queue can dispatch jobs.
NQS DESTINATION QUEUES
The list of NQS queues to which this queue can dispatch jobs.
ADMINISTRATORS
A list of administrators of the queue. The users whose names are specified here are allowed to operate on the jobs in the queue and on the queue itself.
JOB_STARTER
An executable file that runs immediately prior to the batch job, taking the batch job file as an input argument. All jobs submitted to the queue are run via the job starter, which is generally used to create a specific execution environment before processing the jobs themselves.
PRE_EXEC
Queue's pre-execution command. This command is executed before the real batch job is run on the execution host (or on the first host selected for a parallel batch job).
POST_EXEC
Queue's post-execution command. This command is executed on the execution host when a job terminates.
REQUEUE_EXIT_VALUES
Jobs that exit with these values are automatically requeued.
RES_REQ
Resource requirements of the queue. Only hosts that satisfy this resource requirement can be used by the queue.
RESUME_COND
The condition(s) that must be satisfied to resume a suspended job on a host.
STOP_COND
The condition(s) which determine whether a job running on a host should be suspended.
Note that some parameters are displayed only if they are defined.

Automatic Queue Selection

When more than one batch queue is available, you need to decide which queue to use. If you submit a job without specifying a queue name, the LSF Batch system automatically chooses a suitable queue for the job from the candidate default queues, based on the requirements of the job.

Specifying Default Queues

LSF Batch has default queues. The bparams command displays them:

% bparams
Default Queues: normal
...

The user can override this list by defining the environment variable LSB_DEFAULTQUEUE.
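
For example, to make the priority queue your personal default (the queue name is only an illustration), set the variable in your shell:

% setenv LSB_DEFAULTQUEUE priority
$ LSB_DEFAULTQUEUE=priority; export LSB_DEFAULTQUEUE

The first line is for the C shell, the second for the Bourne shell.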

Queue Selection Mechanism

Although simple to use, automatic queue selection may not behave as expected, if you do not choose your candidate queues properly. The criteria LSF Batch uses for selecting a suitable queue are as follows:

If multiple queues satisfy the above requirements, then the first queue listed in the candidate queues (as defined by DEFAULT_QUEUE or LSB_DEFAULTQUEUE) that satisfies the requirements is selected.

Choosing a Queue

The default queues are normally suitable to run most jobs for most users, but they may have a very low priority or restrictive execution conditions to minimize interference with other jobs. If automatic queue selection is not satisfactory, you should choose the most suitable queue for each job.

The factors affecting your decision are user access restrictions, size of the job, resource limits of the queue, scheduling priority of the queue, active time windows of the queue, hosts used by the queue, the scheduling load conditions, and the queue description displayed by the bqueues -l command.

The -u user_name option specifies a user or user group so that bqueues displays only the queues that accept jobs from these users.

The -m host_name option allows users to specify a host name or host group name so that bqueues displays only the queues that use these hosts to run jobs.

You must also be sure that the queue is enabled.

The following examples are based on the queues defined in the default LSF configuration. Your LSF administrator may have configured different queues.

To run a job during off hours because the job generates very high load to both the file server and the network, you can submit it to the night queue; use bsub -q night.

If you have an urgent job to run, you may want to submit it to the priority queue; use bsub -q priority.

If you want to use hosts owned by others and you do not want to bother the owners, you may want to run your low priority jobs on the idle queue so that as soon as the owner comes back, your jobs get suspended.

If you are running small jobs and do not want to wait too long to get the results, you can submit jobs to the short queue to be dispatched with higher priority. Make sure your jobs are short enough that they are not killed for exceeding the CPU time limit of the queue (check the resource limits of the queue, if any).

If your job requires a specific execution environment, you may need to submit it to a queue that has a particular job starter defined. Because only your system administrator is able to specify a queue-level job starter as part of the queue definition, you should check with him for more information. See `Queue-Level Job Starters' on page 129 of the LSF Batch Administrator's Guide for information on queue-level job starters.

Batch Users

The busers command displays the maximum number of jobs a user or group may execute on a single processor, the maximum number of job slots a user or group may use in the cluster, the total number of job slots required by all submitted jobs of the user, and the number of job slots in the PEND, RUN, SSUSP, and USUSP states. If no user is specified, the default is to display information about the user who invokes this command. Here is an example of the output from the busers command:

% busers all
USER/GROUP  JL/P  MAX  NJOBS  PEND  RUN  SSUSP  USUSP  RSV
default     1     12   -      -     -    -      -      -
user9       1     12   34     22    10   2      0      0
groupA      -     100  20     7     11   1      1      0

Note that if the reserved user name all is specified, busers reports all users who currently have jobs in the system, as well as default, which represents a typical user. The purpose of listing default in the output is to show the job slot limits (JL/P and MAX) of a typical user. No other parameters make sense for default.

Note

The counters displayed by busers treat a parallel job requesting N processors the same as N jobs requesting one processor.

Batch Hosts

LSF Batch uses some (or all) of the hosts in an LSF cluster as execution hosts. The host list is configured by the LSF administrator. The bhosts command displays information about these hosts.

% bhosts
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok      2     2    0      0    0      0      0
hostD      ok      2     4    2      1    0      0      1
hostB      ok      1     2    2      1    0      1      0

STATUS gives the status of the host and the sbatchd daemon. If a host is down or the LIM is unreachable, the STATUS is unavail. If the LIM is reachable but the sbatchd is not up, STATUS is unreach.

JL/U is the job slot limit per user. The host will not allocate more than JL/U job slots for one user at the same time. MAX gives the maximum number of job slots that are allowed on this host. This does not mean that the host has to always allocate this many job slots if there are waiting jobs; the host must also satisfy its configured load conditions to accept more jobs.

The columns NJOBS, RUN, SSUSP, USUSP, and RSV show the number of job slots used by jobs currently dispatched to the host, running on the host, suspended by the system, suspended by the user, and reserved on the host respectively.

The -l option to the bhosts command gives all information about each batch server host such as the CPU speed factor and the load threshold values for starting, resuming and suspending jobs. You can also specify host names on the command line to list the information for specific hosts.

% bhosts -l hostB
HOST: hostB
STATUS CPUF JL/U MAX NJOBS  RUN SSUSP USUSP RSV  DISPATCH_WINDOWS
ok     9    1    2   2      1   0     0     1    2:00-20:30
          r15s   r1m  r15m   ut    pg    io   ls    it    tmp    swp    mem
loadSched -      -    -      -     -     -    -     -     -      -      -
loadStop  -      -    -      -     40    -    -     -     -      -      -
Migration threshold is 40 min.
Files are copied at checkpoint.

The DISPATCH_WINDOWS column shows the time windows during which jobs can be started on the host. See `Detailed Queue Information' on page 69 for a description of the format of the DISPATCH_WINDOWS column. Unlike the queue run windows, jobs are not suspended when the host dispatch windows close. Jobs running when the host dispatch windows close continue running, but no new jobs are started until the windows reopen.

CPUF is the host CPU factor. loadSched and loadStop are the scheduling and suspending thresholds for the host. If a threshold is not defined, the threshold from the queue definition applies. If both the host and the queue define a threshold for a load index, the most restrictive threshold is used.

The migration threshold is the time that a job dispatched to this host can remain suspended by the system before LSF Batch attempts to migrate the job to another host.

If the host's operating system supports checkpoint copy, this is indicated here. With checkpoint copy, the operating system automatically copies all open files to the checkpoint directory when a process is checkpointed. Checkpoint copy is currently supported only on ConvexOS and Cray systems.

User and Host Groups

The LSF administrator can configure user and host groups. The group names act as convenient aliases wherever lists of user or host names can be specified on the command line or in configuration files. The administrator can also limit the total number of running jobs belonging to a user or a group of users. User groups can also be defined to reflect the hierarchical share distribution, as discussed in `Hierarchical Fairshare' on page 60.

The bugroup and bmgroup commands list the configured group names and members for user and host groups respectively.

% bugroup acct_users
GROUP_NAME    USERS
acct_users :  user1 user2 user4 group1/

Note that a name followed by a `/' is a group.

% bmgroup big_servers
GROUP_NAME    HOSTS
big_servers : hostD hostK

Specifying a user or host group to an LSF Batch command is the same as specifying all the user or host names in the group. For example, the command bsub -m big_servers specifies that the job may be dispatched to either of the hosts hostD or hostK. The command bjobs -l lists detailed information about the job, including the specified hosts and the load thresholds that apply to the job.

% bsub -m big_servers myjob
Job <31556> is submitted to default queue <normal>.
% bjobs -l 31556
Job Id <31556>, User <user1>, Status <DONE>, Queue <normal>, Command <hostname>
Thu Oct 27 01:47:51: Submitted from host <hostA>, CWD <$HOME>,
                     Specified Hosts <big_servers>;
Thu Oct 27 01:47:52: Started on <hostK>;
Thu Oct 27 01:47:53: Done successfully. The CPU time used is 0.2 seconds.
            r15s  r1m  r15m   ut  pg  io  ls  it  tmp  swp  mem
loadSched   -     -    -      -   -   -   -   -   -    12   -
loadStop    -     -    -      -   55  -   -   -   -     -   -

Viewing Hierarchical Share Information

The hierarchical share distribution can be displayed by the bugroup command with -l option. The following gives an example of a system consisting of three groups:

% bugroup -l
GROUP_NAME: g0
USERS: g1/ g2/
SHARES: [g2, 20] [g1, 10]

GROUP_NAME: g1
USERS: user1 user2 user3
SHARES: [others, 10] [user3, 4]

GROUP_NAME: g2
USERS: all users
SHARES: [user2, 10] [default, 5]

For fairsharing to take effect, the group share definitions must be associated with individual share providers (queues or host partitions) in the system. For example, if the above share definition was associated with a host partition consisting of hostA, hostB, and hostC, the bhpart command can display the share distribution information. By default, the command only displays the top level share accounts associated with the partition. Use the -r option to recursively display the entire share tree associated with the provider.

% bhpart hpartest
HOST_PARTITION_NAME: hpartest
HOSTS: hostA hostB hostC

SHARE_INFO_FOR: hpartest/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
g0          100     5.440     5        0         0.0       1324

% bhpart -r hpartest
HOST_PARTITION_NAME: hpartest
HOSTS: hopper

SHARE_INFO_FOR: hpartest/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
g0          100     5.477     5        0         0.0       1324

SHARE_INFO_FOR: hpartest/g0/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
g2          20      1.645     3        0         0.0       816
g1          10      1.099     2        0         0.0       508

SHARE_INFO_FOR: hpartest/g0/g2/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
user3       10      3.333     0        0         0.0       0
user2       5       1.667     3        0         0.0       0
user1       5       1.667     0        0         0.0       0

SHARE_INFO_FOR: hpartest/g0/g1/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME
user2       4       1.333     0        0         0.0       0
others      10      1.099     2        0         0.0       508

Note that when the share tree is displayed recursively, the output consists of a series of group accounts starting from the share provider. Each group contains the account information of any subgroups or users under that group. The account path gives the path name of the group account, starting from the name of the share provider. Each user account can similarly be identified by a unique path, for example hpartest/g0/g1/user2.

The information associated with each account includes the static shares assigned to that group or user, as well as its dynamic priority. A higher priority value indicates that the user's or group's jobs will be considered before those with a lower priority at the same level. Priorities of accounts at different levels in the tree should not be compared. The number of started and reserved jobs, together with the CPU time and run time used by the account's previously submitted jobs, is also displayed. Note that a group account's job counters and CPU time fields are the sums of those of all users and subgroups beneath it.

The -Y option of bqueues will display a similar share tree for a given fairshare queue.

Queue-Level Job Starters

If you frequently need to submit batch jobs that have to be started in a particular environment or require some type of setup to be performed before they are executed, your system administrator can include a job starter function in the definition of a selected queue. In a shell environment, this type of pre-execution setup is often handled by writing the preliminary procedures into a file (referred to as a wrapper) that itself contains a call to start the desired job.

In LSF, a queue-level job starter does the work of a wrapper. A job starter is simply a command (or set of commands) which, when included in the queue definition, is run immediately prior to all jobs submitted to the selected queue. The job starter performs its setup or environment functions, then calls the submitted job itself, which can inherit the execution environment created by the job starter. One typical use of this feature is to customize LSF for use with the Atria ClearCase environment (see `Support for Atria ClearCase' on page 275 of the LSF Batch Administrator's Guide).

A queue-level job starter can only be specified by the LSF administrator. You can specify a job starter for your interactive jobs using the LSF_JOB_STARTER environment variable. See `Command-Level Job Starters' on page 144 for detailed information.
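
For example, to have your interactive remote tasks run under the C shell, you might set the variable as follows; this particular value is only an illustration:

% setenv LSF_JOB_STARTER "/bin/csh -c"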

Queue-level job starters have no effect on interactive jobs, unless the interactive job is submitted to a queue as an interactive batch job (see `Interactive Batch Job Support' on page 145 for information on interactive batch jobs).

Configuration Parameters

The bparams command reports some generic configuration parameters of the LSF Batch system. These include the default queues, default host or host model for CPU speed scaling, job dispatch interval, job checking interval, job accepting interval, etc. The command can display such information in either short format or long format. The short format summarizes a few key parameters. For example:

bparams
Default Queues:  normal idle
Default Host Specification:  DECAXP
Job Dispatch Interval:  20 seconds
Job Checking Interval:  15 seconds
Job Accepting Interval:  20 seconds

The -l option to the bparams command displays the information in long format, which gives a brief description of each parameter as well as the name of the parameter as it appears in the lsb.params file. In addition, the long format lists every parameter defined in the lsb.params file. Here is an example of the output from the long format of the bparams command:

bparams -l
System default queues for automatic queue selection:
    DEFAULT_QUEUE = normal idle
The interval for dispatching jobs by master batch daemon:
    MBD_SLEEP_TIME = 20 (seconds)
The interval for checking jobs by slave batch daemon:
    SBD_SLEEP_TIME = 15 (seconds)
The interval for a host to accept two batch jobs subsequently:
    JOB_ACCEPT_INTERVAL = 1 (* MBD_SLEEP_TIME)
The idle time of a host for resuming pg suspended jobs:
    PG_SUSP_IT = 180 (seconds)
The amount of time during which finished jobs are kept in core:
    CLEAN_PERIOD = 3600 (seconds)
The maximum number of finished jobs that are logged in the current event file:
    MAX_JOB_NUM = 2000
The maximum number of retries for reaching a slave batch daemon:
    MAX_SBD_FAIL = 3
The number of hours of resource consumption history:
    HIST_HOURS = 5
The default project assigned to jobs:
    DEFAULT_PROJECT = default

User Controlled Account Mapping

By default, LSF assumes a uniform user name space within a cluster. Some sites do not satisfy this assumption. For such sites, LSF provides support for the execution of batch jobs within a cluster with a non-uniform user name space.

You can set up a hidden .lsfhosts file in your home directory that tells LSF which accounts to use when you send jobs to remote hosts, and which remote users are allowed to run jobs under your local account. This is similar to the .rhosts file used by rcp, rlogin, and rsh.

The .lsfhosts file consists of multiple lines, where each line is of the form:

hostname|clustername username [send|recv]

A `+' in the hostname or username field indicates any LSF host or user respectively. The keyword send indicates that if you send a job to host hostname, then the account username should be used. The keyword recv indicates that your local account is enabled to run jobs from user username on host hostname. If neither send nor recv are specified, then your local account can both send jobs to and receive jobs from the account username on hostname.

Note

The clustername argument is used for the LSF MultiCluster product. See `Using LSF MultiCluster' on page 187.

Lines beginning with `#' are ignored.

Note

The permission on your .lsfhosts file must be set to read/write only by the owner. Otherwise, your .lsfhosts file is silently ignored.

For example, assume that hostB and hostA in your cluster do not share the same user name/user ID space. You have an account user1 on host hostB and an account ruser_1 on host hostA. You want to be able to submit jobs from hostB to run on hostA.

Your .lsfhosts files should be set up as follows:

On hostB:

cat ~user1/.lsfhosts
hostA ruser_1 send

On hostA:

cat ~ruser_1/.lsfhosts
hostB user1 recv

As another example, assume you have account user1 on host hostB and want to use the lsfguest account when sending jobs to be run on host hostA. The lsfguest account is intended to be used by any user submitting jobs from any LSF host.

The .lsfhosts files should be set up as follows:

On hostB:

cat ~user1/.lsfhosts
hostA lsfguest send

On hostA:

cat ~lsfguest/.lsfhosts
+  + recv

When account mapping is used, your job is always started as a login shell so that the start-up files of the account under which your job runs are sourced.

Your .lsfhosts file is read at job submission time. Changes made to this file after a job is submitted do not affect the account used to run that job; only jobs submitted after the changes pick up the new entries.

If you attempt to map to an account for which you have no permission, your job is put into PSUSP state. You can modify the .lsfhosts file of the execution account to give appropriate permission and resume the job.

Note

The bpeek command will not work on a job running under a different user account.

File transfer using the -f option to the bsub command will not work when running under a different user account unless rcp(1) is set up to do the file copying.



