

3. Managing LSF Batch

This chapter describes the operating concepts and maintenance tasks of the batch queuing system, LSF Batch. It requires you to understand the concepts from `Managing LSF Base' on page 45. The topics covered in this chapter are:

Managing LSF Batch Logs
Controlling LSF Batch Servers
Controlling LSF Batch Queues
Managing LSF Batch Configuration
Validating Job Submissions
Controlling LSF Batch Jobs
Managing an LSF Cluster Using xlsadmin

Managing LSF Batch Logs

Managing error log files for LSF Batch daemons was described in `Managing Error Logs' on page 45. This section discusses the other important log files LSF Batch daemons produce. The LSF Batch log files are found in the directory LSB_SHAREDIR/cluster/logdir.

LSF Batch Accounting Log

Each time a batch job completes or exits, an entry is appended to the lsb.acct file. This file can be used to create accounting summaries of LSF Batch system use. The bacct(1) command produces one form of summary. The lsb.acct file is a text file suitable for processing with awk, perl, or similar tools. See the lsb.acct(5) manual page for details of the contents of this file. Additionally, the LSF Batch API supports calls to process the lsb.acct records. See the LSF Programmer's Guide for details of the LSF Batch API.

You should move the lsb.acct file to a backup location, and then run your accounting on the backup copy. The daemon automatically creates a new lsb.acct file to replace the moved file. This prevents problems that might occur if the daemon writes new log entries while the accounting programs are running. When the accounting is complete, you can remove or archive the backup copy.
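The following is a minimal sketch of that procedure. The cluster name, the LSB_SHAREDIR path, and the assumption that bacct accepts a -f option to read an alternative accounting file should all be checked against your own installation.

#!/bin/sh
# Sketch: back up lsb.acct and run accounting on the backup copy.
# Substitute your own cluster name and LSB_SHAREDIR path.
LOGDIR=/usr/share/lsf/mycluster/logdir
BACKUP=$LOGDIR/lsb.acct.`date +%Y%m%d`

# Move the live file aside; the daemon creates a new lsb.acct automatically.
mv $LOGDIR/lsb.acct $BACKUP

# Produce the accounting summary from the backup copy, then archive or
# remove the backup when the accounting is complete.
bacct -f $BACKUP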

LSF Batch Event Log

The LSF Batch daemons keep an event log in the lsb.events file. The mbatchd daemon uses this information to recover from server failures, host reboots, and LSF Batch reconfiguration. The lsb.events file is also used by the bhist command to display detailed information about the execution history of batch jobs, and by the badmin command to display the operational history of hosts, queues, and LSF Batch daemons.

For performance reasons, the mbatchd automatically backs up and rewrites the lsb.events file after every 1000 batch job completions (this is the default; the value is controlled by the MAX_JOB_NUM parameter in the lsb.params file). The old lsb.events file is moved to lsb.events.1, and each old lsb.events.n file is moved to lsb.events.n+1. The mbatchd never deletes these files. If disk storage is a concern, the LSF administrator should arrange to archive or remove old lsb.events.n files occasionally.
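A simple script run periodically (for example, from cron) can handle that housekeeping. This is a sketch only; the directory paths and the 90-day retention period are assumptions to adjust for your site.

#!/bin/sh
# Sketch: archive rotated event logs older than 90 days.
# Only the numbered backups (lsb.events.1, lsb.events.2, ...) are moved;
# the live lsb.events file is never touched.
LOGDIR=/usr/share/lsf/mycluster/logdir
ARCHIVE=/archive/lsf/events

[ -d $ARCHIVE ] || mkdir -p $ARCHIVE
find $LOGDIR -name 'lsb.events.[0-9]*' -mtime +90 -exec mv {} $ARCHIVE \;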

CAUTION!

Do not remove or modify the lsb.events file. Removing or modifying the lsb.events file could cause batch jobs to be lost.

Duplicate Event Logging

By default, LSF Batch stores all state information needed to recover from server failures, host reboots, or reconfiguration in a file in the LSB_SHAREDIR directory. Typically, the LSB_SHAREDIR directory resides on a reliable file server that also contains other critical applications needed to run users' jobs. The reasoning is that if the central file server is unavailable, users' applications cannot run anyway, so the failure of LSF Batch to continue processing jobs is a secondary concern.

For sites that do not wish to rely solely on a central file server for recovery information, LSF can be configured to maintain a replica of the recovery file; this is referred to as duplicate event logging. The replica is stored on the file server and used if the primary copy is unavailable. When LSF is configured this way, the primary event log is stored on the first master host and resynchronized with the replicated copy when that host recovers.

Configuring Duplicate Event Logging

To enable the replication feature, define LSB_LOCALDIR in the lsf.conf file. LSB_LOCALDIR should be a local directory and it should exist only on the first master host (that is, the first host configured in the lsf.cluster.cluster file).

LSB_LOCALDIR is used to store the primary copy of the batch state information. The contents of LSB_LOCALDIR are copied to a replica in LSB_SHAREDIR which resides on a central file server. As before, LSB_SHAREDIR is assumed to be accessible from all hosts which can potentially become the master.
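A minimal lsf.conf excerpt for this setup might look like the following. The directory paths are examples only; LSB_SHAREDIR is normally set when LSF is installed.

# Excerpt from lsf.conf -- duplicate event logging (sketch; paths are examples)
# Replica of the event log, kept under LSB_SHAREDIR on the central file server
LSB_SHAREDIR=/usr/share/lsf
# Primary copy, on a local disk of the first master host
LSB_LOCALDIR=/var/lsf/logdir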

How Duplicate Event Logging Works

With the replication feature enabled, the following scenarios can occur:

Failure of File Server

If the file server containing LSB_SHAREDIR goes down, LSF continues to process jobs. However, client commands such as bhist(1) and bacct(1), which read LSB_SHAREDIR directly, will not work. When the file server recovers, the replica in LSB_SHAREDIR is updated.

Failure of First Master Host

If the first master host fails, the primary copy of the recovery file in the LSB_LOCALDIR directory becomes unavailable. A new master host is selected, and it uses the recovery file replica in LSB_SHAREDIR to restore its state and to log future events. Note that there is no duplicate event logging while a second master host is in control.

Recovery of First Master Host

When the first master host becomes available again, it updates the primary copy in LSB_LOCALDIR from the replica in LSB_SHAREDIR and continues operations as before.

The replication feature improves the reliability of LSF Batch operations, provided that the file server containing LSB_SHAREDIR and the first master host do not fail at the same time.

Controlling LSF Batch Servers

The lsadmin command is used to control LSF Base daemons, LIM, and RES. LSF Batch has the badmin command to perform similar operations on LSF Batch daemons.

LSF Batch System Status

To check the status of LSF Batch server hosts and queues, use the bhosts and bqueues commands:

bhosts
HOST_NAME          STATUS    JL/U  MAX  NJOBS  RUN  SSUSP USUSP  RSV
hostA                ok        2     1     0     0     0     0     0
hostB              closed      2     2     2     2     0     0     0
hostD                ok        -     8     1     1     0     0     0

bqueues
QUEUE_NAME     PRIO      STATUS     MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
night           30    Open:Inactive  -     -    -    -    4     4     0    0
short           10    Open:Active    50    5    -    -    1     0     1    0
simulation      10    Open:Active    -     2    -    -    0     0     0    0
default          1    Open:Active    -     -    -    -    6     4     2    0

If the status of a batch server is `closed', then it will not accept more jobs. The LSF administrator can force a job to run using the brun(1) command. See `Forcing Job Execution -- brun -f' on page 98 for details.

Use the bhosts -l command to see more information about the status of closed servers. One of the following conditions will be indicated:

An inactive queue accepts new job submissions but does not dispatch any new jobs. A queue can become inactive if the LSF cluster administrator explicitly inactivates it using the badmin command, or if the queue has a dispatch or run window defined and the current time is outside that window.
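For example, a queue that should dispatch jobs only overnight could be given a run window in lsb.queues. The excerpt below is a sketch only; the queue name, values, and the begin-end time syntax shown should be checked against `The lsb.queues File' on page 208.

# Excerpt from lsb.queues -- a queue that dispatches jobs only at night (sketch)
Begin Queue
QUEUE_NAME  = night
PRIORITY    = 30
RUN_WINDOW  = 19:00-07:00
DESCRIPTION = runs jobs only between 7 p.m. and 7 a.m.
End Queue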

mbatchd automatically logs the history of the LSF Batch daemons in the LSF Batch event log. You can display the administrative history of the batch system using the badmin command.

The badmin hhist command displays the times when LSF Batch server hosts are opened and closed by the LSF administrator.

The badmin qhist command displays the times when queues are opened, closed, activated, and inactivated.

The badmin mbdhist command displays the history of the mbatchd daemon, including the times when the master starts, exits, reconfigures, or changes to a different host.

The badmin hist command displays all LSF Batch history information, including all the events listed above.

Remote Start-up of sbatchd

You can use the badmin hstartup command to start sbatchd on some or all remote hosts from a single host:

% badmin hstartup all
Start up slave batch daemon on <hostA> ......done
Start up slave batch daemon on <hostB> ......done
Start up slave batch daemon on <hostD> ......done

Note that you do not have to be root to use the badmin command to start LSF Batch daemons.

For remote start-up to work, the /etc/lsf.sudoers file must be set up properly, and you must be able to run rsh across all LSF hosts without entering a password. See `The lsf.sudoers File' on page 189 for configuration details of lsf.sudoers.
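The entries below sketch what such a file might contain. The parameter names and values are assumptions made for illustration; `The lsf.sudoers File' on page 189 is the authoritative reference.

# Sketch of /etc/lsf.sudoers entries for remote daemon start-up
# (parameter names and values here are assumptions; see page 189)
LSF_STARTUP_USERS="lsfadmin"
LSF_STARTUP_PATH=/usr/local/lsf/etc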

Restarting sbatchd

mbatchd is restarted by the badmin reconfig command. sbatchd can be restarted using the badmin hrestart command:

% badmin hrestart hostD
Restart slave batch daemon on <hostD> ...... done

You can specify more than one host name to restart sbatchd on multiple hosts, or use all to refer to all LSF Batch server hosts. Restarting sbatchd on a host does not affect batch jobs that are running on that host.
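For example, to restart the daemon on several hosts at once (the host names are examples):

% badmin hrestart hostA hostB
Restart slave batch daemon on <hostA> ...... done
Restart slave batch daemon on <hostB> ...... done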

Shutting Down LSF Batch Daemons

The badmin hshutdown command shuts down the sbatchd.

% badmin hshutdown hostD
Shut down slave batch daemon on <hostD> .... done

If sbatchd is shut down, that particular host will not be available for running new jobs. Existing jobs running on that host continue to completion, but the results are not sent to the user until sbatchd is later restarted.

To shut down mbatchd, you must first use the badmin hshutdown command to shut down sbatchd on the master host, and then run the badmin reconfig command. mbatchd is normally restarted by sbatchd; if there is no sbatchd running on the master host, badmin reconfig causes mbatchd to exit.

If mbatchd is shut down, all LSF Batch services are temporarily unavailable. However, existing jobs are not affected. When mbatchd is later restarted, the previous status is restored from the event log file and job scheduling continues.

Opening and Closing of Batch Server Hosts

Occasionally, you might want to drain a batch server host for the purposes of rebooting, maintenance, or host removal. This can be achieved by running the badmin hclose command:

% badmin hclose hostB
Close <hostB> ...... done

When a host is open, LSF Batch can dispatch jobs to that host. When a host is closed, no new batch jobs are dispatched, but jobs already dispatched to the host continue to execute. To reopen a batch server host, run the badmin hopen command:

% badmin hopen hostB
Open <hostB> ...... done

To view the history of a batch server host, run the badmin hhist command:

% badmin hhist hostB
Wed Nov 20 14:41:58: Host <hostB> closed by administrator <lsf>.
Wed Nov 20 15:23:39: Host <hostB> opened by administrator <lsf>.

Controlling LSF Batch Queues

Each batch queue can be open or closed, active or inactive. Users can submit jobs to open queues, but not to closed queues. Active queues start jobs on available server hosts, and inactive queues hold all jobs. The LSF administrator can change the state of any queue. Queues can also become active or inactive because of queue run or dispatch windows.

bqueues -- Queue Status

The current status of a particular queue or of all queues is displayed by the bqueues(1) command. The bqueues -l option also gives current statistics about the jobs in a particular queue, such as the total number of jobs in the queue and the number of jobs that are running, suspended, and so on.

bqueues normal
QUEUE_NAME     PRIO      STATUS      MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
normal          30    Open:Active      -    -    -    2     6     4    2     0

Opening and Closing Queues

When a batch queue is open, users can submit jobs to the queue. When a queue is closed, users cannot submit jobs to the queue. If a user tries to submit a job to a closed queue, an error message is printed and the job is rejected. If a queue is closed but still active, previously submitted jobs continue to be processed. This allows the LSF administrator to drain a queue.

badmin qclose normal
Queue <normal> is closed
bqueues normal
QUEUE_NAME     PRIO      STATUS      MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
normal          30   Closed:Active     -    -    -    2     6     4    2     0
bsub -q normal hostname
normal: Queue has been closed
badmin qopen normal
Queue <normal> is opened

Activating and Inactivating Queues

When a queue is active, jobs in the queue are started if appropriate hosts are available. When a queue is inactive, jobs in the queue are not started. Queues can be activated and inactivated by the LSF administrator using the badmin qact and badmin qinact commands, or by configured queue run or dispatch windows.

If a queue is open and inactive, users can submit jobs to the queue but no new jobs are dispatched to hosts. Currently running jobs continue to execute. This allows the LSF administrator to let running jobs complete before removing queues or making other major changes.

badmin qinact normal
Queue <normal> is inactivated
bqueues normal
QUEUE_NAME     PRIO      STATUS      MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
normal          30   Open:Inactive     -    -    -    -     0     0     0     0
badmin qact normal
Queue <normal> is activated

Managing LSF Batch Configuration

The LSF Batch cluster is a subset of the LSF Base cluster. All servers used by LSF Batch must belong to the base cluster; however, not all servers in the base cluster must provide LSF Batch services.

LSF Batch configuration consists of four files: lsb.params, lsb.hosts, lsb.users, and lsb.queues. These files are stored in LSB_CONFDIR/cluster/configdir, where cluster is the name of your cluster.

All these files are optional. If any of these files do not exist, LSF Batch will assume a default configuration.

The lsb.params file defines general parameters about LSF Batch system operation, such as the name of the default queue when the user does not specify one, scheduling intervals for mbatchd and sbatchd, and so on. Detailed parameters are described in `The lsb.params File' on page 193.
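A small excerpt is shown below as a sketch; the parameter values are examples only.

# Excerpt from lsb.params (sketch; values are examples)
Begin Parameters
# Queue used when bsub is given no -q option
DEFAULT_QUEUE = normal
# mbatchd scheduling interval, in seconds
MBD_SLEEP_TIME = 60
# Number of job completions between lsb.events rotations
MAX_JOB_NUM = 1000
End Parameters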

The lsb.hosts file defines LSF Batch server hosts together with their attributes. Not all LSF hosts defined by LIM configuration have to be configured to run batch jobs. Batch server host attributes include scheduling load thresholds, dispatch windows, and job slot limits. This file is also used to define host groups and host partitions. See `The lsb.hosts File' on page 202 for details of this file.

The lsb.users file contains user-related parameters such as user groups, user job slot limits, and account mapping. See `The lsb.users File' on page 198 for details.

The lsb.queues file defines job queues. Numerous controls are available at the queue level to allow cluster administrators to customize site resource allocation policies. See `The lsb.queues File' on page 208 for more details.

When you first install LSF on your cluster, some example queues are already configured for you. You should customize these queues or define new queues to meet your site needs.

Note

After changing any of the LSF Batch configuration files, you need to run badmin reconfig to tell mbatchd to pick up the new configuration. You must also run this every time you change LIM configuration.

Adding a Batch Server Host

You can add a batch server host to LSF Batch configuration following the steps below:

  1. If you are adding a host that has not been added to the LSF Base cluster yet, perform steps described in `Adding a Host to a Cluster' on page 56.
  2. Modify the LSB_CONFDIR/cluster/configdir/lsb.hosts file to add the new host together with its attributes (a sketch follows these steps). If you want to limit the added host to use by only some queues, also update the lsb.queues file. Because host types, host models, and the virtual host name `default' can refer to all hosts of a given type or model, or to every LSF host not otherwise covered, you might not need to change either file if the new host is already covered.
  3. Run badmin reconfig to tell mbatchd to pick up the new configuration.
  4. Start sbatchd on the added host by running badmin hstartup or simply start it manually.
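The following lsb.hosts excerpt sketches step 2 for a hypothetical host hostE. The job slot limit (MXJ) and the load threshold columns shown are examples; see `The lsb.hosts File' on page 202 for the full column list.

# Excerpt from lsb.hosts -- adding hostE as a batch server (sketch)
# MXJ is the job slot limit; r1m and pg are scheduling load thresholds,
# and () leaves a threshold undefined.
Begin Host
HOST_NAME    MXJ    r1m    pg
hostE        4      3.5    15
default      2      ()     ()
End Host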

Removing a Batch Server Host

To remove a host as a batch server host, follow the steps below:

  1. If you need to permanently remove a host from your cluster, you should use badmin hclose to prevent new batch jobs from starting on the host, and wait for any running jobs on that host to finish. If you want to shut the host down before all jobs complete, use bkill to kill the running jobs.
  2. Modify lsb.hosts and lsb.queues in LSB_CONFDIR/cluster/configdir directory and remove the host from any of the sections.
  3. Run badmin hshutdown to shut down sbatchd on that host.

CAUTION!

You should never remove the master host from LSF Batch. Change LIM configuration to assign a different default master host if you want to remove your current default master from the LSF Batch server pool.

Adding a Batch Queue

Adding a batch queue does not affect pending or running LSF Batch jobs. To add a batch queue to a cluster:

  1. Log in as the LSF administrator on any host in the cluster.
  2. Edit the LSB_CONFDIR/cluster/configdir/lsb.queues file to add the new queue definition. You can copy another queue definition from this file as a starting point; remember to change the QUEUE_NAME of the copied queue. Save the changes to lsb.queues (a sample definition is sketched after these steps). See `The lsb.queues File' on page 208 for a complete description of LSF Batch queue configuration.
  3. Run the command badmin ckconfig to check the new queue definition. If any errors are reported, fix the problem and check the configuration again. See `Overview of LSF Configuration Files' on page 50 for an example of normal output from badmin ckconfig.
  4. When the configuration files are ready, run badmin reconfig. The master batch daemon (mbatchd) is unavailable for approximately one minute while it reconfigures. Pending and running jobs are not affected.
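The excerpt below sketches what a new queue definition for step 2 might look like; the queue name, priority, and description are examples only.

# Excerpt from lsb.queues -- a new queue definition (sketch)
Begin Queue
QUEUE_NAME  = chemistry
PRIORITY    = 40
DESCRIPTION = batch jobs submitted by the chemistry department
End Queue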

Removing a Batch Queue

Before removing a queue, you should make sure there are no jobs in that queue. If you remove a queue that has jobs in it, the jobs are temporarily moved to a lost and found queue. Jobs in the lost and found queue remain pending until the user or the LSF administrator uses the bswitch command to switch them into regular queues. Jobs in other queues are not affected.

The following example moves all pending and running jobs in the night queue to the idle queue, and then deletes the night queue.

  1. Log in as the LSF administrator on any host in the cluster.
  2. Close the queue to prevent any new jobs from being submitted:
% badmin qclose night
Queue <night> is closed
  3. Move all pending and running jobs into another queue. The bswitch -q night argument chooses jobs from the night queue, and the job ID number 0 specifies that all jobs should be switched:
% bjobs -u all -q night
JOBID USER  STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
5308  user5 RUN   night    hostA       hostD       sleep 500  Nov 21 18:16
5310  user5 PEND  night    hostA                   sleep 500  Nov 21 18:17

% bswitch -q night idle 0
Job <5308> is switched to queue <idle>
Job <5310> is switched to queue <idle>
  4. Edit the LSB_CONFDIR/cluster/configdir/lsb.queues file. Remove (or comment out) the definition for the queue being removed. Save the changes.
  5. Run the command badmin reconfig. If any problems are reported, fix them and run badmin reconfig again. The batch system is unavailable for about one minute while the system rereads the configuration.

Validating Job Submissions

A user's job can be rejected at submission time if the submission parameters cannot be validated. Sites can implement their own policy to determine valid values or combinations of submission parameters. The validation checking is performed by an external submission program (esub) located in LSF_SERVERDIR (see `External Submission and Execution Executables' on page 42).

The esub is invoked at job submission and modification time, and also when a checkpointed job is restarted. In each of these cases the user is allowed to specify parameters that affect the scheduling or execution of the job. To validate these parameters, the esub is invoked with two environment variables set:

LSB_SUB_PARM_FILE
the name of a temporary file containing the job submission parameters, written as option_name=value lines that can be sourced by a shell script
LSB_SUB_ABORT_VALUE
the exit value the esub should return to indicate that the job should be rejected

The option names that can appear in the LSB_SUB_PARM_FILE are shown below:

LSB_SUB_JOB_NAME
the specified job name
LSB_SUB_QUEUE
the specified queue name
LSB_SUB_IN_FILE
the specified standard input file name
LSB_SUB_OUT_FILE
the specified standard output file name
LSB_SUB_ERR_FILE
the specified standard error file name
LSB_SUB_EXCLUSIVE
"Y" specifies exclusive execution
LSB_SUB_NOTIFY_END
"Y" specifies email notification when the job ends
LSB_SUB_NOTIFY_BEGIN
"Y" specifies email notification when the job begins
LSB_SUB_USER_GROUP
the specified user group name
LSB_SUB_CHKPNT_PERIOD
the specified checkpoint period
LSB_SUB_CHKPNT_DIR
the specified checkpoint directory
LSB_SUB_RESTART_FORCE
"Y" specifies forced restart job
LSB_SUB_RESTART
"Y" specifies a restart job
LSB_SUB_RERUNNABLE
"Y" specifies a rerunnable job
LSB_SUB_WINDOW_SIG
the specified window signal number
LSB_SUB_HOST_SPEC
the specified hostspec
LSB_SUB_DEPEND_COND
the specified dependency condition
LSB_SUB_RES_REQ
the specified resource requirement string
LSB_SUB_PRE_EXEC
the specified pre-execution command
LSB_SUB_LOGIN_SHELL
the specified login shell
LSB_SUB_MAIL_USER
the specified user for sending email
LSB_SUB_MODIFY
"Y" specifies a modification request
LSB_SUB_MODIFY_ONCE
"Y" specifies a modification-once request
LSB_SUB_PROJECT_NAME
the specified project name
LSB_SUB_INTERACTIVE
"Y" specifies an interactive job
LSB_SUB_PTY
"Y" specifies an interactive job with PTY support
LSB_SUB_PTY_SHELL
"Y" specifies an interactive job with PTY shell support
LSB_SUB_TIME_EVENT
the time event expression
LSB_SUB_HOSTS
the list of execution host names
LSB_SUB_NUM_PROCESSORS
the minimum number of processors requested
LSB_SUB_MAX_NUM_PROCESSORS
the maximum number of processors requested
LSB_SUB_BEGIN_TIME
the begin time, in seconds since 00:00:00 GMT, Jan. 1, 1970
LSB_SUB_TERM_TIME
the termination time, in seconds since 00:00:00 GMT, Jan. 1, 1970
LSB_SUB_OTHER_FILES
always "SUB_RESET" if defined to indicate a bmod is being performed to reset the number of files to be transferred
LSB_SUB_OTHER_FILES_nn
the specified file transfer expression, where nn is an index number identifying the particular file transfer; for example, for 'bsub -f "a > b" -f "c < d"', the following would be defined:
LSB_SUB_OTHER_FILES_0="a > b"
LSB_SUB_OTHER_FILES_1="c < d"
LSB_SUB_EXCEPTION
the specified exception condition
LSB_SUB_RLIMIT_CPU
the specified CPU limit
LSB_SUB_RLIMIT_FSIZE
the specified file limit
LSB_SUB_RLIMIT_DATA
the specified data size limit
LSB_SUB_RLIMIT_STACK
the specified stack size limit
LSB_SUB_RLIMIT_CORE
the specified core file size limit
LSB_SUB_RLIMIT_RSS
the specified resident size limit
LSB_SUB_RLIMIT_RUN
the specified wall clock run limit

Any messages that need to be provided to the user should be directed to the standard error stream and not the standard output stream.

One use of this feature is to support project-based accounting. The user can request that the resources used by a job be charged to a particular project. Projects are defined outside of the LSF configuration files, so LSF will accept any arbitrary string for a project name. To ensure that only valid project names are entered and that the user is eligible to charge to the specified project, an esub can be written.

The following is an example of an external submission program, written as a Bourne shell script, that does this.

#!/bin/sh
# Read the job submission parameters from the parameter file
. $LSB_SUB_PARM_FILE

# Redirect stderr to stdout so echo can be used for error messages
exec 1>&2

# Check for a valid project name
if [ "$LSB_SUB_PROJECT_NAME" != "proj1" -a "$LSB_SUB_PROJECT_NAME" != "proj2" ]; then
   echo "Invalid project name specified"
   exit $LSB_SUB_ABORT_VALUE
fi

USER=`whoami`
if [ "$LSB_SUB_PROJECT_NAME" = "proj1" ]; then
   # Only user1 and user2 can charge to proj1
   if [ "$USER" != "user1" -a "$USER" != "user2" ]; then
      echo "You are not allowed to charge to this project"
      exit $LSB_SUB_ABORT_VALUE
   fi
fi

Controlling LSF Batch Jobs

The LSF administrator can control batch jobs belonging to any user. Other users can control only their own jobs. Jobs can be suspended, resumed, killed, and moved within and between queues.

Moving Jobs -- bswitch, btop, and bbot

The bswitch command moves pending and running jobs from queue to queue. The btop and bbot commands change the dispatching order of pending jobs within a queue. The LSF administrator can move any job. Other users can move only their own jobs.

The btop and bbot commands do not allow users to move their own jobs ahead of those submitted by other users. Only the execution order of the user's own jobs is changed. The LSF administrator can move one user's job ahead of another user's. The btop, bbot, and bswitch commands are described in the LSF Batch User's Guide and in the btop(1) and bswitch(1) manual pages.

Signalling Jobs -- bstop, bresume, and bkill

The bstop, bresume, and bkill commands send signals to batch jobs. See the kill(1) manual page for a discussion of the signals on UNIX.

bstop sends SIGSTOP to sequential jobs and SIGTSTP to parallel jobs.

bresume sends a SIGCONT.

bkill sends the specified signal to the process group of the specified jobs. If the -s option is not present, the default operation of bkill is to send a SIGKILL signal to kill the specified jobs. Twenty seconds before SIGKILL is sent, SIGTERM and SIGINT are sent to give the job a chance to catch the signals and clean up.
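For example, to send only a SIGTERM to a job so that it can clean up on its own terms (the job ID is hypothetical):

% bkill -s TERM 5309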

On Windows NT, job control messages replace the SIGINT and SIGTERM signals, but only customized applications will be able to process them. Termination is implemented by the TerminateProcess() system call.

Users are only allowed to send signals to their own jobs. The LSF administrator can send signals to any job. See the LSF Batch User's Guide and the manual pages for more information about these commands.

This example shows the use of the bstop and bkill commands:

bstop 5310
Job <5310> is being stopped
bjobs 5310
JOBID USER  STAT  QUEUE    FROM_HOST   EXEC_HOST  JOB_NAME   SUBMIT_TIME
5310  user5 PSUSP night    hostA                  sleep 500  Nov 21 18:17
bkill 5310
Job <5310> is being terminated
bjobs 5310
JOBID USER  STAT  QUEUE    FROM_HOST   EXEC_HOST  JOB_NAME   SUBMIT_TIME
5310  user5 EXIT  night    hostA                  sleep 500  Nov 21 18:17

Forcing Job Execution -- brun -f

A pending batch job can be forced to run by using the brun command. This operation can only be performed by an LSF administrator. To force a job to run, you must specify the host on which that job will run. For parallel jobs, a list of hosts can be specified. The number of host names in the list must be at least equal to the minimum number of processors requested by the job. For example, the following command will force the sequential job 104 to run on hostA:

% brun -m hostA 104

The following command will force the parallel job 105 to run on hostA, hostB, hostC, and hostD.

% brun -m "hostA hostB hostC hostD" 105

If the job had requested a minimum of more than four processors, the request would have been rejected. If the number of hosts specified for a parallel job is greater than the maximum number of processors the job requested, the extra hosts are ignored.

When a job is forced to run, any other constraints associated with the job (such as resource requirements or dependency conditions) are ignored. Moreover, any scheduling policy (such as fairshare or job limit) specified in the batch configuration is also ignored. In this situation you might see some job slot limits, such as the maximum number of jobs that can run on a host, being violated. See `Job Slot Limits' on page 26 for details on job slot limits. However, after a job is forced to run, it can still be suspended due to the underlying queue's run window and threshold conditions and the job's execution hosts' threshold conditions. To override these so that a job can be run until completion, ignoring these load conditions, use the -f option. An example of a job forced to run until completion is shown below:

% brun -f -m hostA 124

Managing an LSF Cluster Using xlsadmin

Cluster Administrator (xlsadmin) is a graphical tool designed to assist you in the management of your LSF cluster. This tool allows you to perform the management tasks described in Chapter 2, `Managing LSF Base', on page 45, and in the preceding portions of this chapter, `Managing LSF Batch', on page 79.

xlsadmin has two operating modes: management and configuration.

Management mode provides the tools to monitor the status of cluster hosts, batch server hosts, and queues, and to perform control tasks on them.

Configuration mode provides the tools to add, modify, and delete base hosts, batch hosts, queues, and other configuration objects such as host types, host models, resources, host groups, user groups, host partitions, and batch parameters.

xlsadmin Management Mode

Figure 2 shows the xlsadmin Manage Base tab which displays all cluster hosts defined by LIM configuration. Figure 3 shows the xlsadmin Manage Batch tab which displays all the configured batch hosts and queues.

System messages and LSF command responses are displayed:

In the message area at the bottom of the xlsadmin window.
In the message dialog activated by choosing View | Show Message Box...

Figure 2. xlsadmin Manage Base Tab

Figure 3. xlsadmin Manage Batch Tab

On the Manage Base and Manage Batch tabs, double-click any item to display status dialogs.

Right-click any item to perform control tasks on that host or queue.

xlsadmin Configuration Mode

Figure 4 shows the xlsadmin Configure Base tab. This tab displays the base hosts defined by LIM configuration and provides tools to add, modify, and delete base hosts and global objects (host types, host models, and resources).

Figure 4. xlsadmin Configure Base Tab

Figure 5 shows the xlsadmin Configure Batch tab. This tab displays the configured batch hosts and queues and provides the tools to add, modify, and delete batch hosts, queues, host groups, user groups, host partitions, and batch parameters.

Figure 5. xlsadmin Configure Batch Tab

To add, modify, and delete base hosts, batch hosts, and queues, use the right-click menu and choose the appropriate command. Figure 6 shows the Cluster Host dialog used to edit and add base hosts, and Figure 7 shows the Batch Queue dialog used to edit and add queues.

Figure 6. Cluster Host Dialog

To add, modify, and delete host types, host models, resources (configured in lsf.shared), host groups, host partitions (configured in lsb.hosts), user groups (configured in lsb.users), and batch parameters (configured in lsb.params), choose the appropriate tool button; for example, the Resources dialog is used to edit, add, and delete resources.

Figure 7. Batch Queue Dialog

After making modifications to the cluster configuration, complete the following steps:

  1. Choose File | Check. View the messages displayed in the message area and correct any errors.
  2. Save all modifications by choosing File | Save to Files...
  3. Reconfigure the LSF cluster using the modified configuration by choosing File | Commit.




