This chapter describes the operating concepts and maintenance tasks of the batch queuing system, LSF Batch. This chapter requires you to understand concepts from `Managing LSF Base' on page 45. The topics covered in this chapter are:
LSF Batch log files
Controlling LSF Batch servers, daemons, and queues
Managing LSF Batch configuration
Validating job submissions
Controlling LSF Batch jobs
xlsadmin, the cluster administration GUI
Managing error log files for LSF Batch daemons was described in `Managing Error Logs' on page 45. This section discusses the other important log files LSF Batch daemons produce. The LSF Batch log files are found in the directory LSB_SHAREDIR/cluster/logdir.
Each time a batch job completes or exits, an entry is appended to the lsb.acct file. This file can be used to create accounting summaries of LSF Batch system use. The bacct(1) command produces one form of summary. The lsb.acct file is a text file suitable for processing with awk, perl, or similar tools. See the lsb.acct(5) manual page for details of the contents of this file. Additionally, the LSF Batch API supports calls to process the lsb.acct records. See the LSF Programmer's Guide for details of the LSF Batch API.
You should move the lsb.acct file to a backup location, and then run your accounting on the backup copy. The daemon automatically creates a new lsb.acct file to replace the moved file. This prevents problems that might occur if the daemon writes new log entries while the accounting programs are running. When the accounting is complete, you can remove or archive the backup copy.
The LSF Batch daemons keep an event log in the lsb.events file. The mbatchd daemon uses this information to recover from server failures, host reboots, and LSF Batch reconfiguration. The lsb.events file is also used by the bhist command to display detailed information about the execution history of batch jobs, and by the badmin command to display the operational history of hosts, queues, and LSF Batch daemons.
For performance reasons, the mbatchd automatically backs up and rewrites the lsb.events file after every 1000 batch job completions (this is the default; the value is controlled by the MAX_JOB_NUM parameter in the lsb.params file). The old lsb.events file is moved to lsb.events.1, and each old lsb.events.n file is moved to lsb.events.n+1. The mbatchd never deletes these files. If disk storage is a concern, the LSF administrator should arrange to archive or remove old lsb.events.n files occasionally.
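If old event logs need to be pruned, a periodic housekeeping sketch might look like the following (the path and the 30-day retention policy are illustrative assumptions, not LSF defaults):
# Compress rotated event logs older than 30 days; leave the live lsb.events alone
LOGDIR=/usr/local/lsf/work/cluster1/logdir    # assumed LSB_SHAREDIR/cluster/logdir
find $LOGDIR -name 'lsb.events.[0-9]*' ! -name '*.gz' -mtime +30 -exec gzip {} \;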
Do not remove or modify the lsb.events file; doing so could cause batch jobs to be lost.
By default, LSF Batch stores all state information needed to recover from server failures, host reboots, or reconfiguration in a file in the LSB_SHAREDIR directory. Typically, the LSB_SHAREDIR directory resides on a reliable file server that also contains other critical applications necessary for running users' jobs. The reasoning is that if the central file server is unavailable, users' applications cannot run anyway, and the failure of LSF Batch to continue processing jobs is a secondary issue.
For sites that do not wish to rely solely on a central file server for recovery information, LSF can be configured to maintain a replica of the recovery file. The replica is stored on the file server and used if the primary copy is unavailable; this is referred to as duplicate event logging. When LSF is configured this way, the primary event log is stored on the first master host and resynchronized with the replicated copy when that host recovers.
To enable the replication feature, define LSB_LOCALDIR in the lsf.conf file. LSB_LOCALDIR should be a local directory and it should exist only on the first master host (that is, the first host configured in the lsf.cluster.cluster file).
LSB_LOCALDIR is used to store the primary copy of the batch state information. The contents of LSB_LOCALDIR are copied to a replica in LSB_SHAREDIR, which resides on a central file server. As before, LSB_SHAREDIR is assumed to be accessible from all hosts which can potentially become the master.
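A minimal lsf.conf entry to turn this on (the directory path is an illustrative assumption; it must be a local directory that exists only on the first master host):
# In lsf.conf
LSB_LOCALDIR=/var/lsf/batchlog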
With the replication feature enabled, the following scenarios can occur:
If the file server containing LSB_SHAREDIR goes down, LSF will continue to process jobs. Client commands such as bhist(1) and bacct(1) which directly read LSB_SHAREDIR will not work. When the file server recovers, the replica in LSB_SHAREDIR will be updated.
If the first master host fails, then the primary copy of the recovery file in the LSB_LOCALDIR directory becomes unavailable. A new master host will be selected, which will use the recovery file replica in LSB_SHAREDIR to restore its state and to log future events. Note that there is no replication by the second master.
When the first master host becomes available again, it will update the primary copy in LSB_LOCALDIR from the replica in LSB_SHAREDIR and continue operations as before.
The replication feature improves the reliability of LSF Batch operations provided that the following assumptions hold:
LSB_LOCALDIR and the file server containing LSB_SHAREDIR do not fail simultaneously. If they do, LSF Batch will be unavailable.
A network partition does not result in two mbatchd daemons running at the same time. This may happen given certain network topologies and failure modes. For example, suppose connectivity is lost between the first master, M1, and both the file server and the secondary master, M2. Both M1 and M2 will run the mbatchd service, with M1 logging events to LSB_LOCALDIR and M2 logging to LSB_SHAREDIR. When connectivity is restored, the changes made by M2 to LSB_SHAREDIR will be lost when M1 updates LSB_SHAREDIR from its copy in LSB_LOCALDIR.
The lsadmin command is used to control the LSF Base daemons, LIM and RES. LSF Batch has the badmin command to perform similar operations on the LSF Batch daemons.
To check the status of LSF Batch server hosts and queues, use the bhosts and bqueues commands:
% bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostA ok 2 1 0 0 0 0 0
hostB closed 2 2 2 2 0 0 0
hostD ok - 8 1 1 0 0 0
% bqueues
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
night 30 Open:Inactive - - - - 4 4 0 0
short 10 Open:Active 50 5 - - 1 0 1 0
simulation 10 Open:Active - 2 - - 0 0 0 0
default 1 Open:Active - - - - 6 4 2 0
If the status of a batch server is `closed', then it will not accept more jobs. The LSF administrator can force a job to run using the brun(1) command. See `Forcing Job Execution -- brun -f' on page 98 for details.
Use the bhosts -l command to see more information about the status of closed servers. One of the following conditions will be indicated:
The host is not configured as a server in the lsbatch system.
The host has reached the job limit configured in lsb.hosts.
The LIM on the host is unreachable, but sbatchd is ok.
An inactive queue will accept new job submissions, but will not dispatch any new jobs. A queue can become inactive if the LSF cluster administrator explicitly inactivates it using the badmin command, or if the queue has a dispatch or run window defined and the current time is outside that window.
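For example, a queue defined in lsb.queues with a run window is inactive outside that window. A minimal sketch (the window value is an illustrative assumption; see `The lsb.queues File' on page 208 for the syntax):
Begin Queue
QUEUE_NAME = night
RUN_WINDOW = 19:00-07:00      # jobs are dispatched only between 7 p.m. and 7 a.m.
End Queue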
mbatchd automatically logs the history of the LSF Batch daemons in the LSF Batch event log. You can display the administrative history of the batch system using the badmin command.
The badmin hhist command displays the times when LSF Batch server hosts are opened and closed by the LSF administrator.
The badmin qhist command displays the times when queues are opened, closed, activated, and inactivated.
The badmin mbdhist command displays the history of the mbatchd daemon, including the times when the master starts, exits, reconfigures, or changes to a different host.
The badmin hist command displays all LSF Batch history information, including all the events listed above.
You can use the badmin hstartup command to start sbatchd on some or all remote hosts from one host:
% badmin hstartup all
Start up slave batch daemon on <hostA> ......done
Start up slave batch daemon on <hostB> ......done
Start up slave batch daemon on <hostD> ......done
Note that you do not have to be root to use the badmin command to start the LSF Batch daemons.
For remote startup to work, the /etc/lsf.sudoers file must be set up properly and you must be able to run rsh across all LSF hosts without having to enter a password. See `The lsf.sudoers File' on page 189 for configuration details of lsf.sudoers.
mbatchd is restarted by the badmin reconfig command. sbatchd can be restarted using the badmin hrestart command:
% badmin hrestart hostD
Restart slave batch daemon on <hostD> ...... done
You can specify more than one host name to restart sbatchd on multiple hosts, or use all to refer to all LSF Batch server hosts. Restarting sbatchd on a host does not affect batch jobs that are running on that host.
The badmin hshutdown command shuts down sbatchd:
% badmin hshutdown hostD
Shut down slave batch daemon on <hostD> .... done
If sbatchd is shut down, that particular host will not be available for running new jobs. Existing jobs running on that host will continue to completion, but the results will not be sent to the user until sbatchd is later restarted.
To shut down mbatchd, you must first use the badmin hshutdown command to shut down the sbatchd on the master host, and then run the badmin reconfig command. The mbatchd is normally restarted by sbatchd; if there is no sbatchd running on the master host, badmin reconfig causes mbatchd to exit.
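A sketch of this sequence, assuming hostA is the current master host:
% badmin hshutdown hostA
Shut down slave batch daemon on <hostA> .... done
% badmin reconfig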
If mbatchd is shut down, all LSF Batch services are temporarily unavailable. However, existing jobs are not affected. When mbatchd is later restarted, the previous status is restored from the event log file and job scheduling continues.
Occasionally, you might want to drain a batch server host for the purposes of rebooting, maintenance, or host removal. This can be achieved by running the badmin hclose command:
% badmin hclose hostB
Close <hostB> ...... done
When a host is open, LSF Batch can dispatch jobs to that host. When a host is closed, no new batch jobs are dispatched, but jobs already dispatched to the host continue to execute. To reopen a batch server host, run the badmin hopen command:
% badmin hopen hostB
Open <hostB> ...... done
To view the history of a batch server host, run the badmin hhist command:
% badmin hhist hostB
Wed Nov 20 14:41:58: Host <hostB> closed by administrator <lsf>.
Wed Nov 20 15:23:39: Host <hostB> opened by administrator <lsf>.
Each batch queue can be open or closed, active or inactive. Users can submit jobs to open queues, but not to closed queues. Active queues start jobs on available server hosts, and inactive queues hold all jobs. The LSF administrator can change the state of any queue. Queues can also become active or inactive because of queue run or dispatch windows.
The current status of a particular queue or all queues is displayed by the bqueues(1) command. The bqueues -l option also gives current statistics about the jobs in a particular queue, such as the total number of jobs in this queue, the number of jobs running, suspended, and so on.
% bqueues normal
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
normal 30 Open:Active - - - 2 6 4 2 0
When a batch queue is open, users can submit jobs to the queue. When a queue is closed, users cannot submit jobs to the queue. If a user tries to submit a job to a closed queue, an error message is printed and the job is rejected. If a queue is closed but still active, previously submitted jobs continue to be processed. This allows the LSF administrator to drain a queue.
% badmin qclose normal
Queue <normal> is closed
% bqueues normal
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
normal 30 Closed:Active - - - 2 6 4 2 0
% bsub -q normal hostname
normal: Queue has been closed
% badmin qopen normal
Queue <normal> is opened
When a queue is active, jobs in the queue are started if appropriate hosts are available. When a queue is inactive, jobs in the queue are not started. Queues can be activated and inactivated by the LSF administrator using the badmin qact and badmin qinact commands, or by configured queue run or dispatch windows.
If a queue is open and inactive, users can submit jobs to the queue but no new jobs are dispatched to hosts. Currently running jobs continue to execute. This allows the LSF administrator to let running jobs complete before removing queues or making other major changes.
% badmin qinact normal
Queue <normal> is inactivated
% bqueues normal
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
normal 30 Open:Inactive - - - - 0 0 0 0
% badmin qact normal
Queue <normal> is activated
The LSF Batch cluster is a subset of the LSF Base cluster. All servers used by LSF Batch must belong to the base cluster; however, not all servers in the base cluster must provide LSF Batch services.
LSF Batch configuration consists of four files: lsb.params, lsb.hosts, lsb.users, and lsb.queues. These files are stored in LSB_CONFDIR/cluster/configdir, where cluster is the name of your cluster.
All these files are optional. If any of these files do not exist, LSF Batch will assume a default configuration.
The lsb.params file defines general parameters about LSF Batch system operation, such as the name of the default queue when the user does not specify one, scheduling intervals for mbatchd and sbatchd, and so on. Detailed parameters are described in `The lsb.params File' on page 193.
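A sketch of an lsb.params file (the parameter values shown are illustrative, not recommended settings):
Begin Parameters
DEFAULT_QUEUE = normal        # queue used when a job is submitted without -q
MBD_SLEEP_TIME = 60           # mbatchd job scheduling interval, in seconds
SBD_SLEEP_TIME = 30           # sbatchd scheduling interval, in seconds
MAX_JOB_NUM = 1000            # lsb.events is rewritten after this many finished jobs
End Parameters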
The lsb.hosts file defines LSF Batch server hosts together with their attributes. Not all LSF hosts defined by LIM configuration have to be configured to run batch jobs. Batch server host attributes include scheduling load thresholds, dispatch windows, and job slot limits. This file is also used to define host groups and host partitions. See `The lsb.hosts File' on page 202 for details of this file.
The lsb.users file contains user-related parameters such as user groups, user job slot limits, and account mapping. See `The lsb.users File' on page 198 for details.
The lsb.queues file defines job queues. Numerous controls are available at the queue level to allow cluster administrators to customize site resource allocation policies. See `The lsb.queues File' on page 208 for more details.
When you first install LSF on your cluster, some example queues are already configured for you. You should customize these queues or define new queues to meet your site needs.
After changing any of the LSF Batch configuration files, you need to run badmin reconfig to tell mbatchd to pick up the new configuration. You must also run badmin reconfig every time you change the LIM configuration.
You can add a batch server host to the LSF Batch configuration by following the steps below:
1. Edit the LSB_CONFDIR/cluster/configdir/lsb.hosts file to add the new host together with its attributes. If you want to limit the added host for use only by some queues, you should also update the lsb.queues file. Since host types and host models, as well as the virtual name `default', can be used to refer to all hosts of that type or model, or to every other LSF host not covered by the definitions, you might not need to change any of the files if the host is already covered. A sketch of a host entry is shown after these steps.
2. Run badmin reconfig to tell mbatchd to pick up the new configuration.
3. Start sbatchd on the added host by running badmin hstartup, or simply start it manually.
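A sketch of an lsb.hosts entry for the new host (the host name, job slot limit, threshold, and dispatch window are illustrative assumptions; see `The lsb.hosts File' on page 202 for the authoritative format):
Begin Host
HOST_NAME   MXJ   r1m       DISPATCH_WINDOW    # keywords
default     2     ()        ()                 # hosts not listed explicitly
hostE       4     3.5/4.5   (19:00-07:00)      # the newly added batch server
End Host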
To remove a host as a batch server host, follow the steps below:
1. Run badmin hclose to prevent new batch jobs from starting on the host, and wait for any running jobs on that host to finish. If you want to shut the host down before all jobs complete, use bkill to kill the running jobs.
2. Modify lsb.hosts and lsb.queues in the LSB_CONFDIR/cluster/configdir directory and remove the host from any of the sections.
3. Run badmin hshutdown to shut down sbatchd on that host.
You should never remove the master host from LSF Batch. Change LIM configuration to assign a different default master host if you want to remove your current default master from the LSF Batch server pool.
Adding a batch queue does not affect pending or running LSF Batch jobs. To add a batch queue to a cluster:
1. Edit the LSB_CONFDIR/cluster/configdir/lsb.queues file to add the new queue definition. You can copy another queue definition from this file as a starting point; remember to change the QUEUE_NAME of the copied queue. Save the changes to lsb.queues. See `The lsb.queues File' on page 208 for a complete description of LSF Batch queue configuration. A sketch of a queue definition follows this procedure.
2. Run badmin ckconfig to check the new queue definition. If any errors are reported, fix the problem and check the configuration again. See `Overview of LSF Configuration Files' on page 50 for an example of normal output from badmin ckconfig.
3. Run badmin reconfig.
The master batch daemon (mbatchd) is unavailable for approximately one minute while it reconfigures. Pending and running jobs are not affected.
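A sketch of a new queue definition added to lsb.queues (the queue name, priority, and limits are illustrative assumptions):
Begin Queue
QUEUE_NAME   = idle
PRIORITY     = 20
UJOB_LIMIT   = 2              # job slot limit per user (illustrative)
PJOB_LIMIT   = 1              # job slot limit per processor (illustrative)
DESCRIPTION  = low priority queue for long idle-time jobs
End Queue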
Before removing a queue, you should make sure there are no jobs in that queue. If you remove a queue that has jobs in it, the jobs are temporarily moved to a lost and found queue. Jobs in the lost and found queue remain pending until the user or the LSF administrator uses the bswitch command to switch the jobs into regular queues. Jobs in other queues are not affected.
This example moves all pending and running jobs in the night queue to the idle queue, and then deletes the night queue.
% badmin qclose night
Queue <night> is closed
The bswitch -q night argument chooses jobs from the night queue, and the job ID number 0 specifies that all jobs should be switched:
% bjobs -u all -q night
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
5308 user5 RUN night hostA hostD sleep 500 Nov 21 18:16
5310 user5 PEND night hostA sleep 500 Nov 21 18:17
% bswitch -q night idle 0
Job <5308> is switched to queue <idle>
Job <5310> is switched to queue <idle>
Then edit the LSB_CONFDIR/cluster/configdir/lsb.queues file, remove (or comment out) the definition for the queue being removed, and save the changes.
Run badmin reconfig. If any problems are reported, fix them and run badmin reconfig again. The batch system is unavailable for about one minute while the system rereads the configuration.
A user's job can be rejected at submission time if the submission parameters cannot be validated. Sites can implement their own policy to determine valid values or combinations of submission parameters. The validation checking is performed by an external submission program (esub) located in LSF_SERVERDIR (see `External Submission and Execution Executables' on page 42).
The esub is invoked at job submission and modification time. It is also invoked when a checkpointed job is restarted. In each of these cases the user is allowed to specify parameters that affect the scheduling or execution of the job. To validate these parameters, the esub is invoked with two environment variables set:
LSB_SUB_PARM_FILE points to a file containing the submission parameters as lines of the form "option_name=value", with the option names listed in the following table. The esub can read this file.
LSB_SUB_ABORT_VALUE is the exit value to be returned by the esub whenever an operation is to be aborted as a result of the parameters encountered by the esub when the LSB_SUB_PARM_FILE file is read.
The LSB_SUB_PARM_FILE option names are shown below:
LSB_SUB_JOB_NAME: the specified job name
LSB_SUB_QUEUE: the specified queue name
LSB_SUB_IN_FILE: the specified standard input file name
LSB_SUB_OUT_FILE: the specified standard output file name
LSB_SUB_ERR_FILE: the specified standard error file name
LSB_SUB_EXCLUSIVE: "Y" specifies exclusive execution
LSB_SUB_NOTIFY_END: "Y" specifies email notification when the job ends
LSB_SUB_NOTIFY_BEGIN: "Y" specifies email notification when the job begins
LSB_SUB_USER_GROUP: the specified user group name
LSB_SUB_CHKPNT_PERIOD: the specified checkpoint period
LSB_SUB_CHKPNT_DIR: the specified checkpoint directory
LSB_SUB_RESTART_FORCE: "Y" specifies a forced restart job
LSB_SUB_RESTART: "Y" specifies a restart job
LSB_SUB_RERUNNABLE: "Y" specifies a rerunnable job
LSB_SUB_WINDOW_SIG: the specified window signal number
LSB_SUB_HOST_SPEC: the specified hostspec
LSB_SUB_DEPEND_COND: the specified dependency condition
LSB_SUB_RES_REQ: the specified resource requirement string
LSB_SUB_PRE_EXEC: the specified pre-execution command
LSB_SUB_LOGIN_SHELL: the specified login shell
LSB_SUB_MAIL_USER: the specified user for sending email
LSB_SUB_MODIFY: "Y" specifies a modification request
LSB_SUB_MODIFY_ONCE: "Y" specifies a modification-once request
LSB_SUB_PROJECT_NAME: the specified project name
LSB_SUB_INTERACTIVE: "Y" specifies an interactive job
LSB_SUB_PTY: "Y" specifies an interactive job with PTY support
LSB_SUB_PTY_SHELL: "Y" specifies an interactive job with PTY shell support
LSB_SUB_TIME_EVENT: the specified time event expression
LSB_SUB_HOSTS: the list of execution host names
LSB_SUB_NUM_PROCESSORS: the minimum number of processors requested
LSB_SUB_MAX_NUM_PROCESSORS: the maximum number of processors requested
LSB_SUB_BEGIN_TIME: the begin time, in seconds since 00:00:00 GMT, January 1, 1970
LSB_SUB_TERM_TIME: the termination time, in seconds since 00:00:00 GMT, January 1, 1970
LSB_SUB_OTHER_FILES: always "SUB_RESET" if defined, to indicate that bmod is being performed to reset the number of files to be transferred
LSB_SUB_OTHER_FILES_nn: nn is an index number indicating the particular file transfer; the value is the specified file transfer expression. For example, for 'bsub -f "a > b" -f "c < d"', the following would be defined: LSB_SUB_OTHER_FILES_0="a > b" and LSB_SUB_OTHER_FILES_1="c < d"
LSB_SUB_EXCEPTION: the specified exception condition
LSB_SUB_RLIMIT_CPU: the specified CPU limit
LSB_SUB_RLIMIT_FSIZE: the specified file size limit
LSB_SUB_RLIMIT_DATA: the specified data size limit
LSB_SUB_RLIMIT_STACK: the specified stack size limit
LSB_SUB_RLIMIT_CORE: the specified core file size limit
LSB_SUB_RLIMIT_RSS: the specified resident set size limit
LSB_SUB_RLIMIT_RUN: the specified wall-clock run limit
Any messages that need to be provided to the user should be directed to the standard error stream and not the standard output stream.
One use of this feature is to support project-based accounting. The user can request that the resources used by a job be charged to a particular project. Projects are defined outside of the LSF configuration files, so LSF will accept any arbitrary string for a project name. To ensure that only valid projects are entered and that the user is eligible to charge to that project, an esub can be written.
The following is an example of an external submission program, written as a Bourne shell script, that does this.
#!/bin/sh
# Read the job submission parameters written by LSF
. $LSB_SUB_PARM_FILE
# Redirect stdout to stderr so echo can be used for error messages
exec 1>&2
# Check for a valid project name
if [ "$LSB_SUB_PROJECT_NAME" != "proj1" -a "$LSB_SUB_PROJECT_NAME" != "proj2" ]; then
    echo "Invalid project name specified"
    exit $LSB_SUB_ABORT_VALUE
fi
USER=`whoami`
if [ "$LSB_SUB_PROJECT_NAME" = "proj1" ]; then
    # Only user1 and user2 can charge to proj1
    if [ "$USER" != "user1" -a "$USER" != "user2" ]; then
        echo "You are not allowed to charge to this project"
        exit $LSB_SUB_ABORT_VALUE
    fi
fi
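A rough way to exercise this esub by hand, outside LSF, is to build a parameter file and set the two environment variables yourself; the file name and abort value below are illustrative assumptions, and LSF sets both variables automatically at submission time:
% echo 'LSB_SUB_PROJECT_NAME="proj3"' > /tmp/parmfile
% env LSB_SUB_PARM_FILE=/tmp/parmfile LSB_SUB_ABORT_VALUE=97 ./esub
Invalid project name specified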
The LSF administrator can control batch jobs belonging to any user. Other users can control only their own jobs. Jobs can be suspended, resumed, killed, and moved within and between queues.
The bswitch command moves pending and running jobs from queue to queue. The btop and bbot commands change the dispatching order of pending jobs within a queue. The LSF administrator can move any job. Other users can move only their own jobs.
The btop and bbot commands do not allow users to move their own jobs ahead of those submitted by other users; only the execution order of the user's own jobs is changed. The LSF administrator can move one user's job ahead of another user's. The btop, bbot, and bswitch commands are described in the LSF Batch User's Guide and in the btop(1) and bswitch(1) manual pages.
The bstop, bresume, and bkill commands send signals to batch jobs. See the kill(1) manual page for a discussion of the signals on UNIX.
bstop sends SIGSTOP to sequential jobs and SIGTSTP to parallel jobs.
bkill sends the specified signal to the process group of the specified jobs. If the -s option is not present, the default operation of bkill is to send a SIGKILL signal to the specified jobs to kill them. Twenty seconds before SIGKILL is sent, SIGTERM and SIGINT are sent to give the job a chance to catch the signals and clean up.
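For example, the -s option can name the signal to send; a minimal sketch (the job ID is illustrative):
% bkill -s TERM 5310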
On Windows NT, job control messages replace the SIGINT and SIGTERM signals, but only customized applications are able to process them. Termination is implemented by the TerminateProcess() system call.
Users are only allowed to send signals to their own jobs. The LSF administrator can send signals to any job. See the LSF Batch User's Guide and the manual pages for more information about these commands.
This example shows the use of the bstop and bkill commands:
% bstop 5310
Job <5310> is being stopped
% bjobs 5310
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
5310 user5 PSUSP night hostA sleep 500 Nov 21 18:17
% bkill 5310
Job <5310> is being terminated
% bjobs 5310
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
5310 user5 EXIT night hostA sleep 500 Nov 21 18:17
A pending batch job can be forced to run by using the brun command. This operation can only be performed by an LSF administrator. To force a job to run, you must specify the host on which the job will run. For parallel jobs, a list of hosts can be specified. The number of host names in the list must be at least equal to the minimum number of processors requested by the job. For example, the following command will force the sequential job 104 to run on hostA:
% brun -m hostA 104
The following command will force the parallel job 105 to run on hostA, hostB, hostC, and hostD:
% brun -m "hostA hostB hostC hostD" 105
If the job had requested more than 4 processors at a minimum, the request would have been rejected. If the number of hosts specified for a parallel job is greater than the maximum number of processors the job requested, the extra hosts are ignored.
When a job is forced to run, any other constraints associated with the job (such as resource requirements or dependency conditions) are ignored. Moreover, any scheduling policy (such as fairshare or job limits) specified in the batch configuration is also ignored. In this situation you might see some job slot limits, such as the maximum number of jobs that can run on a host, being violated. See `Job Slot Limits' on page 26 for details on job slot limits. However, after a job is forced to run, it can still be suspended due to the underlying queue's run window and threshold conditions and the execution hosts' threshold conditions. To override these so that the job runs until completion, ignoring these load conditions, use the -f option. An example of a job forced to run until completion is shown below:
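A minimal sketch, assuming the same sequential job used earlier:
% brun -f -m hostA 104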
Cluster Administrator (xlsadmin) is a graphical tool designed to assist you in the management of your LSF cluster. This tool allows you to perform the management tasks described in Chapter 2, `Managing LSF Base', on page 45, and in the preceding portion of Chapter 3, `Managing LSF Batch', on page 79.
xlsadmin has two operating modes: management and configuration.
Management mode provides the tools to:
Display the status of cluster hosts, load, batch servers, and queues, as with the lshosts, lsload, bhosts, and bqueues commands.
Control LIM, RES, and the LSF Batch daemons, hosts, and queues, as with the lsadmin and badmin commands.
Configuration mode provides the tools to:
Add, modify, and delete cluster hosts, batch hosts, queues, and other configuration objects.
Apply the new configuration, as with the lsadmin reconfig and badmin reconfig commands.
Figure 2 shows the xlsadmin Manage Base tab, which displays all cluster hosts defined by LIM configuration. Figure 3 shows the xlsadmin Manage Batch tab, which displays all the configured batch hosts and queues.
System messages and LSF command responses are displayed:
In the message area at the bottom of the xlsadmin window.
In the message dialog activated by choosing View | Show Message Box...
Figure 2. xlsadmin Manage Base Tab
Figure 3. xlsadmin Manage Batch Tab
On the Manage Base and Manage Batch tabs, double-click any item to display status dialogs.
Right-click any item to display a menu of control tasks for that item.
Figure 4 shows the xlsadmin Configure Base tab. This tab displays the base hosts defined by LIM configuration and provides tools to add, modify, and delete base hosts and global objects (host types, host models, and resources).
Figure 4. xlsadmin Configure Base Tab
Figure 5 shows the xlsadmin Configure Batch tab. This tab displays the configured batch hosts and queues and provides the tools to add, modify, and delete batch hosts, queues, host groups, user groups, host partitions, and batch parameters.
Figure 5. xlsadmin Configure Batch Tab
To add, modify, and delete base hosts, batch hosts, and queues, use the right-click menu and choose the appropriate command. Figure 6 shows the Cluster Host dialog used to edit and add base hosts, and Figure 7 shows the Batch Queue dialog used to edit and add queues.
To add, modify, and delete host types, host models, resources (configured in lsf.shared), host groups, host partitions (configured in lsb.hosts), user groups (configured in lsb.users), and batch parameters (configured in lsb.params), choose the appropriate tool button. For example, Figure 7 shows the Resources dialog used to edit, add, and delete resources.
After making modifications to the cluster configuration, complete the following steps: