

6. LSF Base Configuration Reference

This chapter contains a detailed description of the contents of the LSF Base configuration files. These include the installation file lsf.conf; the LIM configuration files lsf.shared, lsf.cluster.cluster, lsf.task, and lsf.task.cluster; and the optional LSF hosts file for additional host name information.

The lsf.conf File

Installation and operation of LSF are controlled by the lsf.conf file. The lsf.conf file is created during installation, and records all the settings chosen when LSF is installed. This information is used by LSF daemons and commands to locate other configuration files, executables, and network services.

lsf.conf contains LSF installation settings as well as some system-wide options. This file is initially created by the lsfsetup utility during LSF installation and updated, if necessary, when you upgrade to a new version. Many of the parameters are set during the installation. This file can also be expanded to include LSF application-specific parameters.

LSB_CONFDIR

LSF Batch configuration directories are installed under LSB_CONFDIR. Configuration files for each LSF cluster are stored in a subdirectory of LSB_CONFDIR. This subdirectory contains several files that define the LSF Batch user and host lists, operation parameters, and batch queues.

All files and directories under LSB_CONFDIR must be readable from all hosts in the cluster. LSB_CONFDIR/cluster/configdir must be owned by the LSF administrator.

Default: LSF_CONFDIR/lsbatch

You should not try to redefine this parameter once LSF has been installed. If you want to move these directories to another location, you must make sure the permissions of directories and files are set properly. See Appendix B, `LSF Directories', beginning on page 255 for details.

LSB_DEBUG

If this is defined, LSF Batch will run in single user mode. In this mode, no security checking is performed, so the LSF Batch daemons should not run as root. When LSB_DEBUG is defined, LSF Batch will not look in the system services database for port numbers. Instead, it uses port number 40000 for mbatchd and port number 40001 for sbatchd unless LSB_MBD_PORT/LSB_SBD_PORT are defined in the file lsf.conf. The valid values for LSB_DEBUG are 1 and 2. You should always choose 1 unless you are testing LSF Batch.

Default: undefined

LSB_MAILPROG

LSF Batch normally uses /usr/lib/sendmail as the mail transport agent to send mail to users. If your site does not use sendmail, configure LSB_MAILPROG with the name of a sendmail-compatible transport program. LSF Batch calls LSB_MAILPROG with the following arguments:

LSB_MAILPROG -F "LSF Batch system" -f Manager@host dest_addr

The -F "LSF Batch system" argument sets the full name of the sender; the -f Manager@host argument gives the return address for LSF Batch mail, which is the LSF administrator's mailbox. dest_addr is the destination address, generated by the rules given for LSB_MAILTO below.

LSB_MAILPROG must read the body of the mail message from the standard input. The end of the message is marked by end-of-file. Any program or shell script that accepts the arguments and input, and delivers the mail correctly, can be used. LSB_MAILPROG must be executable by any user.

If this parameter is modified, the LSF administrator must restart the sbatchd daemons on all hosts to pick up the new value.

Default: /usr/lib/sendmail
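As an illustration of the calling convention above, the following is a minimal sketch, in Python and purely hypothetical, of how a sendmail-compatible replacement might parse its arguments and read the message body; a real site would hand the message off to its own mail transport:

```python
import sys

def parse_mail_args(argv):
    # Parse the argument form LSF Batch uses:
    #   LSB_MAILPROG -F "full name" -f return_addr dest_addr
    full_name = return_addr = dest_addr = None
    i = 0
    while i < len(argv):
        if argv[i] == "-F":
            full_name = argv[i + 1]
            i += 2
        elif argv[i] == "-f":
            return_addr = argv[i + 1]
            i += 2
        else:
            dest_addr = argv[i]
            i += 1
    return full_name, return_addr, dest_addr

def deliver(argv, body):
    # A real transport would deliver the message here; this sketch
    # only formats the headers it would pass to the local mailer.
    full_name, return_addr, dest_addr = parse_mail_args(argv)
    return "From: %s <%s>\nTo: %s\n\n%s" % (full_name, return_addr,
                                            dest_addr, body)

if __name__ == "__main__":
    # The message body is read from standard input until end-of-file.
    print(deliver(sys.argv[1:], sys.stdin.read()))
```

As noted above, any such program must be executable by any user.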

LSB_MAILTO

LSF Batch sends electronic mail to users when their jobs complete or have errors, and to the LSF administrator in the case of critical errors in the LSF Batch system. The default is to send mail to the user who submitted the job, on the host where the daemon is running; this assumes that your electronic mail system forwards messages to a central mailbox.

The LSB_MAILTO parameter changes the mailing address used by LSF Batch. LSB_MAILTO is a format string that is used to build the mailing address. The substring !U, if found, is replaced with the user's account name; the substring !H is replaced with the name of the submission host. All other characters (including any other `!') are copied exactly. Common formats are:

!U
Mail is sent to the submitting user's account name on the local host.
!U@!H
Mail is sent to user@submission_hostname
!U@company_name.com
Mail is sent to user@company_name.com

If this parameter is modified, the LSF administrator must restart the sbatchd daemons on all hosts to pick up the new value.

Default: !U
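The substitution rule above is simple enough to sketch. This hypothetical helper shows how the !U and !H substrings expand while every other character, including any other `!', is copied verbatim:

```python
def expand_mailto(fmt, user, host):
    # Expand an LSB_MAILTO format string: !U becomes the user's
    # account name, !H the submission host name; all other
    # characters (including any other '!') are copied exactly.
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "!" and i + 1 < len(fmt) and fmt[i + 1] == "U":
            out.append(user)
            i += 2
        elif fmt[i] == "!" and i + 1 < len(fmt) and fmt[i + 1] == "H":
            out.append(host)
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)
```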

LSB_SHAREDIR

LSF Batch keeps job history and accounting log files for each cluster. These files are necessary for correct operation of the system. Like the organization under LSB_CONFDIR, there is one subdirectory for each cluster.

The LSB_SHAREDIR/cluster/logdir directory must be owned by the LSF administrator.

Default: LSF_INDEP/work

Note

All files and directories under LSB_SHAREDIR must allow read and write access from the LSF master host. See `Fault Tolerance' on page 5 and `Resource and Resource Requirements' on page 8.

LSF_AFS_CELLNAME

This must be defined as the AFS cell name if the AFS file system is in use.

Default: undefined

LSB_LOCALDIR

This parameter must be defined if you want to use the duplicate event logging feature. It specifies a directory that is local to the default master host, that is, the first host configured in your lsf.cluster.<cluster> file. See `Duplicate Event Logging' on page 81 for more information about this topic.

LSF_AUTH

This is an optional definition. By default, external user authentication is used, and LSF_AUTH is defined to be eauth. External authentication is the only way to provide security for clusters that contain Windows NT hosts. See `External Authentication' on page 11 for details.

If this parameter is changed, all the LSF daemons must be shut down and restarted by running lsf_daemons start on each of the LSF server hosts so that the daemons will use the new authentication method.

If LSF_AUTH is defined as ident, RES uses the RFC 1413 identification protocol to verify the identity of the remote user. RES is also compatible with the older RFC 931 authentication protocol. The name, ident, must be registered in the system services database. See `Resource Requirements' on page 24 for instructions on registering service names.

If LSF_AUTH is not defined, LSF uses privileged ports for user authentication. LSF commands must be installed setuid to root to operate correctly. If the LSF commands are installed in an NFS mounted shared file system, the file system must be mounted with setuid execution allowed (that is, without the nosuid option). See the manual page for mount for more details.

Windows NT does not have the concept of setuid binaries and does not restrict access to privileged ports, so this method does not provide any security on Windows NT.

Default: eauth

LSF_EAUTH_KEY

This defines a key the eauth uses to encrypt and decrypt the user authentication data. If you want to improve the security of your site by specifying a key, make sure it is at least six characters long and uses only printable characters (like choosing a normal UNIX password).

If this parameter is not defined, then eauth will use an internal key.

Default: undefined

LSF_BINDIR

Directory where all user commands are installed.

Default: LSF_MACHDEP/bin

LSF_CONFDIR

The directory where all LIM configuration files are installed. These files are shared throughout the system and should be readable from any host. This directory can contain configuration files for more than one cluster.

Default: LSF_INDEP/conf

LSF_CROSS_UNIX_NT

Optional. If this exists and has the value no, No, or NO, all cross-platform job submissions and requests will fail.

This means that in a mixed UNIX/NT cluster, jobs submitted from a UNIX user account on a UNIX host must be run on a UNIX host, and requests to stop or modify the job must also be submitted from a UNIX user account. Windows NT jobs can only be started, stopped, or modified by Windows NT user accounts on Windows NT hosts.

If this parameter is undefined, or defined as any other value, mixed UNIX/NT clusters operate properly, and only the user name is used for authentication of the user account.

Default: undefined

LSF_ECHKPNTDIR

Optional. Specifies the directory where the echkpnt and erestart executable files are installed, if they are not in the default location LSF_SERVERDIR.

Default: undefined

LSF_ENVDIR

LSF normally installs the lsf.conf file in the /etc directory. The lsf.conf file is installed by creating a shared copy in LSF_SERVERDIR and adding a symbolic link from /etc/lsf.conf to the shared copy. If LSF_ENVDIR is set, the symbolic link is installed in LSF_ENVDIR/lsf.conf.

Default: /etc

LSF_INCLUDEDIR

Directory under which the LSF API header file <lsf/lsf.h> is installed.

Default: LSF_INDEP/include

LSF_INDEP

Specifies the default top-level directory for all host-type independent LSF files. This includes manual pages, configuration files, working directories, and examples. For example, defining LSF_INDEP as /usr/local/lsf places manual pages in /usr/local/lsf/man, configuration files in /usr/local/lsf/conf, and so on.

Default: /usr/local/lsf

LSF_LIBDIR

Directory where the LSF application programming interface library liblsf.a is installed.

Default: LSF_MACHDEP/lib

LSF_LICENSE_FILE

Either the full path name of the FLEXlm license file used by LSF, or the host name of the license server host and the port number of the license service (format: port_number@host_name). If this variable is not defined, LIM looks for the license in /usr/local/flexlm/licenses/license.dat on UNIX and in C:\flexlm\license.dat on NT.

Default: LSF_CONFDIR/license.dat

LSF_LIM_DEBUG

If LSF_LIM_DEBUG is defined, the Load Information Manager (LIM) will operate in single user mode. No security checking is performed, so LIM should not run as root. LIM will not look in the services database for the LIM service port number. Instead, it uses port number 36000 unless LSF_LIM_PORT has been defined. The valid values for LSF_LIM_DEBUG are 1 and 2. You should always choose 1 unless you are testing LSF.

Default: undefined

LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT

Internet port numbers to use for communication with the LSF daemons. The port numbers are normally obtained by looking up the LSF service names in the /etc/services file or the NIS (UNIX). If it is not possible to modify the service database, these variables can be defined to set the port numbers.

With careful use of these settings along with the LSF_ENVDIR and PATH environment variables, it is possible to run two versions of the LSF software on a host, selecting between the versions by setting the PATH environment variable to include the correct version of the commands and the LSF_ENVDIR environment variable to point to the directory containing the appropriate lsf.conf file.

Default: get port numbers from services database on UNIX. On NT, these parameters are mandatory.

LSF_LOGDIR

This is an optional definition on UNIX and a mandatory parameter on NT.

Error messages from all servers are logged into files in this directory. If a server is unable to write in this directory, then the error logs are created in /tmp on UNIX and C:\temp on NT.

If LSF_LOGDIR is not defined, then syslog is used to log everything to the system log using the LOG_DAEMON facility. The syslog facility is available by default on most UNIX systems. The /etc/syslog.conf file controls the way messages are logged, and the files they are logged to. See the manual pages for the syslogd daemon and the syslog function for more information.

Default: log messages go to syslog

LSF_LOG_MASK

The message log level for LSF daemons. On UNIX, this is similar to syslog. All messages logged at the specified level or higher are recorded; lower level messages are discarded. The log levels, in order from highest to lowest, are LOG_EMERG, LOG_ALERT, LOG_CRIT, LOG_ERR, LOG_WARNING, LOG_NOTICE, LOG_INFO, and LOG_DEBUG.

The most important LSF log messages are at the LOG_ERR or LOG_WARNING level. Messages at the LOG_INFO and LOG_DEBUG level are only useful for debugging.

Note that although message log level implements similar functionalities to UNIX syslog, there is no dependency on UNIX syslog. It works even if messages are being logged to files instead of syslog.

Default: LOG_WARNING

LSF_MACHDEP

Specifies the directory where host type dependent files are installed. In clusters with a single host type, LSF_MACHDEP is usually the same as LSF_INDEP. The machine dependent files are the user programs, daemons, and libraries. You should not need to modify this parameter.

Default: /usr/local/lsf

LSF_MANDIR

Directory under which all manual pages are installed. The manual pages are placed in the man1, man3, man5 and man8 subdirectories of the LSF_MANDIR directory. This is created by the LSF installation process and you should not need to modify this parameter.
Default: LSF_INDEP/man

Note

Manual pages are installed in a format suitable for BSD style man commands.

LSF_MISC

Directory where miscellaneous machine independent files such as LSF example source programs and scripts are installed.

Default: LSF_CONFDIR/misc

LSF_RES_ACCT

If defined, RES will log task information by default (see lsf.acct(5)). If this parameter is not defined, the LSF administrator must use the lsadmin command (see lsadmin(8)) to turn task logging on after the RES has started up. A CPU time (in milliseconds) can be specified as the value of this parameter; only tasks that have consumed more than the specified CPU time will be logged. If it is defined as LSF_RES_ACCT=0, all tasks will be logged.

Default: undefined

LSF_RES_ACCTDIR

The directory where the RES task log file is stored. If LSF_RES_ACCTDIR is not defined, the log file is stored in the /tmp directory.

Default: /tmp on UNIX. C:\temp on NT.

LSF_RES_DEBUG

If LSF_RES_DEBUG is defined, the Remote Execution Server (RES) will operate in single user mode. No security checking is performed, so RES should not run as root. RES will not look in the services database for the RES service port number. Instead, it uses port number 36002 unless LSF_RES_PORT has been defined. The valid values for LSF_RES_DEBUG are 1 and 2. You should always choose 1 unless you are testing RES.

Default: undefined

LSF_ROOT_REX

This is an optional definition.
If LSF_ROOT_REX is defined, RES accepts requests from the superuser (root) on remote hosts, subject to identification checking. If LSF_ROOT_REX is undefined, remote execution requests from user root are refused. Sites that have separate root accounts on different hosts within the cluster should not define LSF_ROOT_REX; otherwise, this setting should be based on local security policies. If this parameter is defined as `all', root remote execution across the cluster is enabled; this applies to LSF MultiCluster only. Setting LSF_ROOT_REX to any other value only enables root remote execution within the local cluster.
Default: undefined. Root execution is not allowed.

LSF_SERVERDIR

Directory where all server binaries are installed. These include lim, res, nios, sbatchd, mbatchd, and eeventd (for LSF JobScheduler only). If you use elim, eauth, eexec, esub, and so on, they should also be installed in this directory.

Default: LSF_MACHDEP/etc

LSF_SERVER_HOSTS

This defines one or more LSF server hosts that the application should contact to find a Load Information Manager (LIM). This is used on client hosts where no LIM is running on the local host. The LSF server hosts are hosts that run LSF daemons and provide load-sharing services. Client hosts are hosts that only run LSF commands or applications but do not provide services to any hosts.

If LSF_SERVER_HOSTS is not defined, the application tries to contact the LIM on the local host. See `Associating Resources with Hosts' on page 60 for more details about server and client hosts.

The host names in LSF_SERVER_HOSTS must be enclosed in quotes and separated by white space; for example:

LSF_SERVER_HOSTS="hostA hostD hostB"

Default: undefined

LSF_STRIP_DOMAIN

This is an optional definition.

If all the hosts in your cluster can be reached using short host names, you can configure LSF to use the short host names by specifying the portion of the domain name to remove. If your hosts are in more than one domain, or have more than one domain name, you can specify more than one domain suffix to remove, separated by a colon `:'.

For example, given this definition of LSF_STRIP_DOMAIN:

LSF_STRIP_DOMAIN=.foo.com:.bar.com

LSF accepts hostA, hostA.foo.com, and hostA.bar.com as names for host hostA, and uses the name hostA in all output. The leading period `.' is required.

Default: undefined
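The suffix-stripping rule described above can be sketched as follows (a hypothetical helper; LIM performs this matching internally):

```python
def strip_domain(hostname, lsf_strip_domain):
    # Remove the first matching domain suffix listed in the
    # colon-separated LSF_STRIP_DOMAIN value; each suffix must
    # include the leading '.'. Unmatched names are returned as-is.
    for suffix in lsf_strip_domain.split(":"):
        if suffix and hostname.endswith(suffix):
            return hostname[:-len(suffix)]
    return hostname
```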

LSF_USE_HOSTEQUIV

This is an optional definition.
If LSF_USE_HOSTEQUIV is defined, RES and mbatchd call the ruserok(3) function to decide if a user is allowed to run remote jobs. If LSF_USE_HOSTEQUIV is not defined, all normal users in the cluster can execute remote jobs on any host. If LSF_ROOT_REX is set, root can also execute remote jobs with the same permission test as for normal users.
Default: undefined

XLSF_APPDIR

The directory where X application default files for LSF products are installed. The LSF commands that use X look in this directory to find the application defaults. Users do not need to set environment variables to use the LSF X applications. The application default files are platform-independent.
Default: LSF_INDEP/misc

XLSF_UIDDIR

The directory where Motif User Interface Definition files are stored. These files are platform-specific.
Default: LSF_LIBDIR/uid

LSF_RES_RLIMIT_UNLIM

By default, the RES sets the hard limits for a remote task to be the same as the hard limits of the local process. This parameter specifies those hard limits which are to be set to unlimited, instead of inheriting those of the local process. Valid values are cpu, fsize, data, stack, core, and vmem, for cpu, file size, data size, stack, core size, and virtual memory limits, respectively.

For example:

LSF_RES_RLIMIT_UNLIM="cpu core stack"

will set the cpu, core size, and stack hard limits to be unlimited for all remote tasks.

Default: undefined

Note

The LSF_RES_RLIMIT_UNLIM parameter applies to LSF Base only.
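The keywords accepted by LSF_RES_RLIMIT_UNLIM map naturally onto the POSIX resource limits. This hypothetical sketch shows one such mapping; vmem is assumed to correspond to RLIMIT_AS here, and the exact limit RES manipulates is platform-dependent:

```python
import resource

# Assumed mapping from LSF_RES_RLIMIT_UNLIM keywords to POSIX limits.
_LIMITS = {
    "cpu": resource.RLIMIT_CPU,      # cpu time
    "fsize": resource.RLIMIT_FSIZE,  # file size
    "data": resource.RLIMIT_DATA,    # data size
    "stack": resource.RLIMIT_STACK,  # stack size
    "core": resource.RLIMIT_CORE,    # core file size
    "vmem": resource.RLIMIT_AS,      # virtual memory (assumed)
}

def limits_to_unlimit(setting):
    # Return the resource constants named by a setting such as
    # LSF_RES_RLIMIT_UNLIM="cpu core stack"; RES would raise the
    # hard limit of each to RLIM_INFINITY for remote tasks.
    return [_LIMITS[name] for name in setting.split()]
```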

The lsf.shared File

The lsf.shared file contains definitions that are used by all load sharing clusters. This includes lists of cluster names, host types, host models, the special resources available, and external load indices.

Clusters

The mandatory Cluster section defines all cluster names recognized by the LSF system, with one line for each cluster.

The ClusterName keyword is mandatory. All cluster names referenced anywhere in the LSF system must be defined here. The file names of cluster-specific configuration files must end with the associated cluster name.

Begin Cluster
ClusterName
cluster1
cluster2
End Cluster

Host Types

The mandatory HostType section lists the valid host type names in the cluster. Each host is assigned a host type in the lsf.cluster.cluster file. All hosts that can run the same binary programs should have the same host type, even if they have different models of processor. LSF uses the host type as a default requirement for task placement. Unless specified otherwise, jobs are always run on hosts of the same type.

The TYPENAME keyword is mandatory. Host types are usually based on a combination of the hardware name and operating system. For example, a HP-PA system runs the HP-UX operating system, so you could assign the host type HPPA. If your site already has a system for naming host types, you can use the same names for LSF.

Begin HostType
TYPENAME
SUN41
SOLSPARC
ALPHA
HPPA
NTX86
End HostType

Host Models

The mandatory HostModel section lists the various models of machines and gives the relative CPU speed for each model. LSF uses the relative CPU speed to normalize the CPU load indices so that jobs are more likely to be sent to faster hosts. The MODELNAME and CPUFACTOR keywords are mandatory.

Generally, you first identify the distinct host types in your system, such as MIPS and SPARC, and then the machine models within each, such as SparcIPC, Sparc1, Sparc2, and Sparc10.

Though it is not required, you would typically assign a CPU factor of 1.0 to the slowest machine model in your system, and higher numbers for the others. For example, for a machine model that executes at twice the speed of your slowest model, a factor of 2.0 should be assigned.

Begin HostModel
MODELNAME  CPUFACTOR
SparcIPC   1.0
Sparc10    2.0
End HostModel

The CPU factor affects the calculation of job execution time limits and accounting. Using large values for the CPU factor can cause confusing results when CPU time limits or accounting are used. See `Resource Limits' on page 217 for more information.
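For illustration only (the exact normalization LIM applies is internal to LSF), dividing a raw run queue length by the host's CPU factor shows why a faster host appears less loaded:

```python
def normalized_load(raw_r1m, cpu_factor):
    # Illustrative normalization: a Sparc10 (CPUFACTOR 2.0) with a
    # raw 1-minute run queue of 2.0 looks as loaded as a SparcIPC
    # (CPUFACTOR 1.0) with a run queue of 1.0.
    return raw_r1m / cpu_factor
```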

Resources

The Resource section is optional. It is used to define resource names. The following keywords are supported:

RESOURCENAME

This parameter is mandatory for each resource to be configured. A resource name is an arbitrary character string, except the following reserved names:

r15s
The 15-second exponentially averaged CPU run queue length.
r1m
The 1-minute exponentially averaged CPU run queue length.
r15m
The 15-minute exponentially averaged CPU run queue length.
cpu
Alias for r1m.
ut
The CPU utilization, exponentially averaged over the last minute, between 0 and 1.
pg
The memory paging rate, exponentially averaged over the last minute, in pages per second.
io
The disk I/O rate exponentially averaged over the last minute, in KBytes per second.
ls
The number of current login users.
logins
Alias for ls.
it
The idle time of the host (keyboard not touched on all logged in sessions), in minutes.
idle
Alias for it.
tmp
The amount of free space in /tmp, in MBytes.
swp
The amount of currently available swap space, in MBytes.
swap
Alias for swp.
mem
The amount of currently available memory, in MBytes.
ncpus
The number of CPUs on the host.
ndisks
The number of local disks on the host.
maxmem
The maximum physical memory, in MBytes.
maxswp
The maximum swap space, in MBytes.
maxtmp
The maximum space in the disk partition containing the /tmp directory, in MBytes.
cpuf
The processor CPU factor.
type
The host type.
model
The host model.
status
The host status.

A resource name cannot begin with a number, and cannot contain any of the following characters:
: . ( ) [ + - * / ! & | < > @ =

TYPE
The TYPE is either boolean, numeric, or string. A boolean resource has a value of 1 on hosts which have that resource, and 0 otherwise. If TYPE is not given, the default type is boolean. Examples of boolean resource names include sparc (architecture), sysv (System V Unix), fs (file server), cs (compute server), and solaris (operating system).
INTERVAL
This parameter defines the time interval (in seconds) at which the resource is sampled by the external LIM. This keyword applies to dynamic resources only. A dynamic resource changes its value over time. An ELIM needs to be configured to sample and report this value to the LIM. If the resource has type numeric and has INTERVAL defined, then this resource becomes an external load index. This way of defining an external load index obsoletes the NewIndex section in the lsf.shared file. If INTERVAL is not given, the resource is considered static.
INCREASING
This parameter applies to numeric resources only. If a larger value means a greater load, then INCREASING should be defined as `Y', otherwise `N'.
DESCRIPTION
This is a brief description of the resource. The information defined here will be returned by the ls_info() API call or printed out by the lsinfo command as an explanation of the meaning of the resource.
RELEASE
This parameter controls whether a shared resource is released when a job is suspended. RELEASE applies to numeric resources only, such as floating licenses. When a job using a shared resource is suspended the resource is held or released by the job depending on the configuration of this parameter.
Specify N to hold the resource.
Specify Y to release the resource.
Default: Y
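The resource name rules given under RESOURCENAME above can be sketched as a validity check (a hypothetical helper; the reserved names are not checked here):

```python
import re

def valid_resource_name(name):
    # A resource name must not begin with a digit and must not
    # contain any of the characters : . ( ) [ + - * / ! & | < > @ =
    if not name or name[0].isdigit():
        return False
    return re.search(r"[:.()\[+\-*/!&|<>@=]", name) is None
```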

The lsf.cluster.cluster File

This is the load-sharing cluster configuration file. There is one such file for each load-sharing cluster in the system. The cluster suffix must agree with the name defined in the Cluster section of the lsf.shared file.

Parameters

The Parameters section is optional. This section contains miscellaneous parameters for the LIM.

PRODUCTS

The PRODUCTS line specifies which LSF product(s) will be enabled in the cluster. The PRODUCTS line can specify any combination of the strings `LSF_Base', `LSF_Batch', `LSF_JobScheduler', `LSF_MultiCluster', and `LSF_Analyzer' to enable the operation of LSF Base, LSF Batch, LSF JobScheduler, LSF MultiCluster, and LSF Analyzer, respectively. If any of `LSF_Batch', `LSF_JobScheduler', or `LSF_MultiCluster' are specified then `LSF_Base' is automatically enabled as well. Specifying the PRODUCTS line enables the feature for all hosts in the cluster. Individual hosts can be configured to run as LSF Batch servers or LSF JobScheduler servers within the same cluster. LSF MultiCluster is either enabled or disabled for multicluster operation for the entire cluster.

The PRODUCTS line is created automatically by the installation program lsfsetup. For example:

Begin Parameters
PRODUCTS=LSF_Base LSF_Batch
End Parameters

If the PRODUCTS line is not specified, the default is to enable the operation of `LSF_Base' and `LSF_Batch'.

Note

The features defined by the PRODUCTS line must match the license file used to serve the cluster. A host will be unlicensed if the license is unavailable for the component it was configured to run. For example, if you configure a cluster to run LSF JobScheduler on all hosts, and the license file does not contain the LSF JobScheduler feature, then the hosts will be unlicensed, even if there are licenses for LSF Base or LSF Batch.

Default: LSF_Base LSF_Batch

ELIMARGS

The ELIMARGS parameter specifies any necessary command line arguments for the external LIM. This parameter is ignored if no external load indices are configured.

Default: none

EXINTERVAL

The time interval (in seconds) at which the LIM daemons exchange load information. On extremely busy hosts or networks, load may interfere with the periodic communication between LIM daemons. Setting EXINTERVAL to a longer interval can reduce network load and slightly improve reliability, at the cost of slower reaction to dynamic load changes.

Default: 15 seconds

ELIM_POLL_INTERVAL

The time interval (in seconds) at which the LIM daemon samples load information. This parameter only needs to be set if an ELIM is being used to report information more frequently than every 5 seconds.

Default: 5 seconds

HOST_INACTIVITY_LIMIT

An integer reflecting a multiple of EXINTERVAL. This parameter controls the maximum time a slave LIM will take to send its load information to the master LIM as well as the frequency at which the master LIM will send a heartbeat message to its slaves. A slave LIM can send its load information any time from EXINTERVAL to (HOST_INACTIVITY_LIMIT-2)*EXINTERVAL seconds. A master LIM will send a master announce to each host at least every EXINTERVAL*HOST_INACTIVITY_LIMIT seconds.

Default: 5

MASTER_INACTIVITY_LIMIT

An integer reflecting a multiple of EXINTERVAL. A slave will attempt to become master if it does not hear from the previous master after (HOST_INACTIVITY_LIMIT + hostNo * MASTER_INACTIVITY_LIMIT) * EXINTERVAL seconds, where hostNo is the position of the host in the lsf.cluster.cluster file.

Default: 2
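As a worked example of the formula above, the following sketch computes the takeover delay using the defaults given in this section (EXINTERVAL=15, HOST_INACTIVITY_LIMIT=5, MASTER_INACTIVITY_LIMIT=2); the numbering of hostNo is assumed to start at 1 for the first slave:

```python
def takeover_delay(host_no, exinterval=15, host_inactivity_limit=5,
                   master_inactivity_limit=2):
    # Seconds a slave LIM waits before attempting to become master,
    # per the formula above (defaults from this chapter; hostNo
    # numbering assumed to start at 1 for the first slave).
    return (host_inactivity_limit
            + host_no * master_inactivity_limit) * exinterval
```

Under these assumptions the first slave waits 105 seconds and the second 135 seconds, which staggers the takeover attempts.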

PROBE_TIMEOUT

Before taking over as the master, a slave LIM will try to connect to the last known master via TCP. This parameter specifies the time-out in seconds to be used for the connect(2) system call.

Default: 2 seconds

RETRY_LIMIT

An integer reflecting a multiple of EXINTERVAL. This parameter controls the number of retries a master (slave) LIM makes before assuming the slave (master) is unavailable. If the master does not hear from a slave for HOST_INACTIVITY_LIMIT exchange intervals, it will actively poll the slave for RETRY_LIMIT exchange intervals before it will declare the slave as unavailable. If a slave does not hear from the master for HOST_INACTIVITY_LIMIT exchange intervals, it will actively poll the master for RETRY_LIMIT intervals before assuming the master is down.

Default: 2

LSF Administrators

The ClusterAdmins section defines the LSF administrator(s) for this cluster. Both UNIX user and group names may be specified with the ADMINISTRATORS keyword. The LIM will expand the definition of a group name using the getgrnam(3) call. The first administrator of the expanded list is considered the primary LSF administrator. The primary administrator is the owner of the LSF configuration files, as well as the working files under LSB_SHAREDIR/cluster. If the primary administrator is changed, make sure the owner of the configuration files and the files under LSB_SHAREDIR/cluster are changed as well. All LSF administrators have the same authority to perform actions on LSF daemons, jobs, queues, or hosts in the system.

For backwards compatibility, ClusterManager and Manager are synonyms for ClusterAdmins and ADMINISTRATORS, respectively. It is possible to have both sections present in the same lsf.cluster.cluster file to allow daemons from different LSF versions to share the same file.

If this section is not present, the default LSF administrator is root. For flexibility, each cluster may have its own LSF administrator(s), identified by a user name, although the same administrator(s) can be responsible for several clusters.

The ADMINISTRATORS parameter is normally set during the installation procedure.

Use the -l option of the lsclusters(1) command to display all the administrators within a cluster.

The following gives an example of a cluster with three LSF administrators. The user listed first, user2, is the primary administrator.

Begin ClusterAdmins
ADMINISTRATORS = user2 lsfgrp user7
End ClusterAdmins

Hosts

The Host section is the last section in lsf.cluster.cluster and is the only required section. It lists all the hosts in the cluster and gives configuration information for each host.

The order in which the hosts are listed in this section is important. The LIM on the first host listed becomes the master LIM if that host is up; otherwise, the LIM on the second host becomes the master if its host is up, and so on.

Since the master LIM makes all placement decisions for the cluster, you want it on a fast machine. Also, to avoid the delays involved in switching masters if the first machine goes down, you want the master to be on a reliable machine. It is desirable to arrange the list such that the first few hosts in the list are always in the same subnet. This avoids a situation where the second host takes over as master when there are communication problems between subnets.

Configuration information is of two types. Some fields in a host entry simply describe the machine and its configuration. Other fields set thresholds for various resources. Both types are listed below.

Descriptive Fields

The HOSTNAME, model, type, and RESOURCES fields must be defined in the Host section. The server, nd, RUNWINDOW and REXPRI fields are optional.

HOSTNAME
The official name of the host as returned by hostname(1). Must be listed in lsf.shared as belonging to this cluster.
model
Host model. Must be one of those defined in the lsf.shared file. This determines the CPU speed scaling factor applied in load and placement calculations.
type
A host type as defined in the HostType section of lsf.shared. The strings used for host types are decided by the system administrator. For example, SPARC, DEC, or HPPA. The host type is used to identify binary-compatible hosts.
The host type is used as the default resource requirement. That is, if no resource requirement is specified in a placement request then the task is run on a host of the same type as the sending host.
Often one host type can be used for many machine models. For example, the host type name SUN41 might be used for any computer with a SPARC processor running SunOS 4.1. This would include many Sun models and quite a few from other vendors as well.
server
1 if the host can receive jobs from other hosts, 0 otherwise. If server is set to 0, the host is an LSF client. Client hosts do not run the LSF daemons. Client hosts can submit interactive and batch jobs to an LSF cluster, but cannot execute jobs sent from other hosts. If this field is not defined, then the default is 1.
nd
The number of local disks. This corresponds to the ndisks static resource. On most host types, LSF automatically determines the number of disks, and the nd parameter is ignored.
nd should only count local disks with file systems on them. Do not count either disks used only for swapping or disks mounted with NFS.
Default: the number of disks determined by the LIM, or 1 if the LIM cannot determine this
RESOURCES
Boolean resources available on this host. The resource names are strings defined in the Resource section of the file lsf.shared. You may list any number of resources, enclosed in parentheses and separated by blanks or tabs. For example, (fs frame hpux).
RUNWINDOW
Dispatch window during which the LIM recommends this host for task execution. When the host is not available for remote execution, the host status is lockW (locked by run window). LIM does not schedule interactive tasks on hosts locked by dispatch windows. Note that LSF Batch uses its own (optional) host dispatch windows to control batch job processing on batch server hosts.
A dispatch window consists of one or more time windows. See `How LSF Batch Schedules Jobs' on page 19 for a description of the format of time window specifications.
Default: always accept remote jobs
REXPRI
The default execution priority for interactive remote jobs run under the RES. Range: -20 to 20. REXPRI corresponds to the BSD style nice value used for remote jobs. For hosts with System V style nice values with the range 0 - 39, a REXPRI of -20 corresponds to a nice value of 0 and +20 corresponds to 39. Higher values of REXPRI correspond to lower execution priority; -20 gives the highest priority, 0 is the default priority for login sessions, and +20 is the lowest priority.
Default: 0
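The mapping described above can be sketched in Python; this is an illustrative calculation based on the ranges given here, not code from LSF itself:

```python
def rexpri_to_sysv_nice(rexpri):
    """Map a BSD-style REXPRI (-20..20) to a System V nice value (0..39).

    Per the description above, REXPRI -20 corresponds to nice 0
    (highest priority) and REXPRI +20 corresponds to nice 39 (lowest).
    """
    if not -20 <= rexpri <= 20:
        raise ValueError("REXPRI must be in the range -20 to 20")
    # Shift the range by 20, clamping the top so +20 maps to 39, not 40.
    return min(rexpri + 20, 39)

print(rexpri_to_sysv_nice(0))  # 20, the default login-session priority
```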

Threshold Fields

The LIM uses these thresholds in determining whether to place remote jobs on a host. If one or more LSF load indices exceeds the corresponding threshold (too many users, not enough swap space, etc.), then the host is regarded as busy and LIM will not recommend jobs to that host.

Note

The CPU run queue length threshold values (r15s, r1m, and r15m) are taken as effective queue lengths as reported by lsload -E.

All of these fields are optional; you only need to configure thresholds for load indices you wish to use for determining whether hosts are busy. Fields that are not configured are not considered when determining host status.

Thresholds can be set for any load index supported internally by the LIM, and for any external load index (see `Load Thresholds' on page 216).
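The busy test can be sketched as follows. This is an illustrative sketch rather than LIM source code, and it assumes the usual LSF convention that tmp, swp, and mem measure available capacity, so their thresholds act as floors rather than ceilings:

```python
# Indices that measure available capacity: the host is busy when the
# value drops BELOW the configured threshold (e.g. not enough swap).
FLOOR_INDICES = {"tmp", "swp", "mem"}

def violated_thresholds(load, thresholds):
    """Return the load indices that make a host busy.

    Both arguments are dicts keyed by load index name. Indices with no
    configured threshold are not considered, as described above.
    """
    busy = []
    for name, limit in thresholds.items():
        value = load.get(name)
        if value is None:
            continue
        exceeded = value < limit if name in FLOOR_INDICES else value > limit
        if exceeded:
            busy.append(name)
    return busy

load = {"r1m": 4.0, "pg": 10.0, "swp": 20}
print(violated_thresholds(load, {"r1m": 3.5, "pg": 15, "swp": 40}))
# ['r1m', 'swp']: run queue too long, too little swap free
```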

This example Host section contains descriptive and threshold information for two hosts.

Begin Host
HOSTNAME model    type  server r1m pg tmp RESOURCES     RUNWINDOW
hostA    SparcIPC Sparc 1      3.5 15 0   (sunos)       ()
hostD    Sparc10  Sparc 1      3.5 15 0   (sunos frame) (18:00-08:00)
End Host

Resource Map

The ResourceMap section is needed when you define shared resources in your cluster. It specifies the mapping between shared resources and the hosts that share them. When you define resources in the Resource section of the lsf.shared file, there is no distinction between shared and non-shared resources; by default, all resources are non-shared and local to each host. By defining a ResourceMap section, you can define resources that are shared by all hosts in the cluster, or resources that are shared by only some of the hosts in the cluster.

This section must appear after the Host section of the lsf.cluster.cluster file because it has a dependency on host names defined in the Host section. The following parameters must be defined in the ResourceMap section:

RESOURCENAME
The name of the resource. This resource name must be defined in the Resource section of the lsf.shared file.
LOCATION
This defines the hosts that share the resource. For a static resource, the value must be defined here as well. The syntax is:
(value@[instance] ...) ...
You must not define a value for a dynamic resource. instance is a list of host names that share an instance of the resource. The reserved words all, others, and default can be specified for the instance:
all
Indicates that there is only one instance of the resource in the whole cluster, and that this resource is shared by all of the hosts.
others
Indicates that the rest of the server hosts not explicitly listed in the LOCATION field comprise one instance of the resource.
For example,
2@[apple] 4@[others]
Indicates that there are 2 units of the resource on apple, and 4 units of the resource shared by all other hosts.
default
Indicates an instance of a resource on each host in the cluster. This specifies a special case where the resource is in effect not shared and is local to every host. default means at each host. Normally you should not need to use default because by default all resources are local to each host. You might want to use ResourceMap for a non-shared static resource if you need to specify different values for the resource on different hosts.
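The way these keywords carve the cluster's hosts into instances can be sketched with a small parser. This is a hypothetical helper written for illustration; expand_location and its output format are not part of LSF:

```python
import re

def expand_location(location, cluster_hosts):
    """Expand a LOCATION string (e.g. '2@[apple] 4@[others]') into
    (value, hosts) pairs using the all/others/default keywords."""
    instances, named = [], set()
    for value, names in re.findall(r'(?:(\S+)@)?\[([^\]]+)\]', location):
        value = value or None            # dynamic resources have no value
        names = names.split()
        if names == ["all"]:
            instances.append((value, list(cluster_hosts)))
        elif names == ["others"]:
            instances.append((value, None))   # fill in after named specs
        elif names == ["default"]:
            # one non-shared instance per host
            instances.extend((value, [h]) for h in cluster_hosts)
        else:
            named.update(names)
            instances.append((value, names))
    rest = [h for h in cluster_hosts if h not in named]
    return [(v, rest if h is None else h) for v, h in instances]

print(expand_location("2@[apple] 4@[others]", ["apple", "orange", "plum"]))
# [('2', ['apple']), ('4', ['orange', 'plum'])]
```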

The ResourceMap section may be specified as demonstrated in the following example:

Begin ResourceMap
RESOURCENAME   LOCATION
verilog        (5@[all])
local          ([apple orange] [others])
End ResourceMap

The resources "verilog" and "local" must already have been defined in the Resource section of the lsf.shared file. "verilog" is a static numeric resource shared by all hosts; its value is 5. "local" is a numeric shared resource with two instances in the cluster. The first instance is shared by two machines, apple and orange. The second instance is shared by all other hosts.

Resources defined in the ResourceMap section can be viewed using the -s option of the lshosts (for static resources) and lsload (for dynamic resources) commands.

The lsf.task and lsf.task.cluster Files

Users should not have to specify a resource requirement each time they submit a job. LSF supports the concept of a task list.

A task list is a list maintained by LSF that keeps track of the default resource requirements for different applications. The term task refers to an application name. With a task list defined, LSF automatically supplies the resource requirement of the job whenever users submit a job unless one is explicitly specified together with the job submission.

LSF takes the job's command name as the task name and uses that name to find the matching resource requirement for the job from the task list. If a task does not have an entry in the task list, then LSF assumes the default resource requirement, that is a host that has the same host type as the submission host will be chosen to run the job.

LSF's task list can be configured at three levels: a system-wide task list that applies to all clusters and all users, a cluster-wide task list that applies to all users in the same cluster, and a user task list that applies only to the individual user. The system-wide and cluster-wide task lists are configured in the lsf.task and lsf.task.cluster files and may be modified only by the cluster administrator. The user-specific task list is maintained in the .lsftask file in the user's home directory; users manipulate their own task lists with the lsrtasks command.

LSF combines the system-wide, cluster-wide, and user-specific task lists to form each user's view of the task list. In cases of conflict, such as different resource requirements specified for the same task name in different lists, the cluster-wide list overrides the system-wide list, and the user-specific list overrides both.

Each task list file contains a RemoteTasks section that maps task names to resource requirements, one task per line. Each line in the section is an entry consisting of a task name and a resource requirement string separated by a slash `/'. A plus sign `+' or a minus sign `-' can optionally precede each entry; if neither is specified, `+' is assumed. A `+' before a task name adds a new entry (if none exists) or replaces an existing entry in the task list. A `-' before a task name removes an entry that was added by a higher-level task list.

Below is an example of a task list file:

Begin RemoteTasks
+ "newjob/mem>25"
+ "verilog/select[type==any && swp>100]"
+ "f77/type==any"
+ "compressdir/fs"
End RemoteTasks
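The precedence and +/- rules described above can be sketched as follows; this merge function is illustrative only and does not reproduce LSF's internal behaviour exactly:

```python
def merge_task_lists(*levels):
    """Merge task lists given from lowest to highest precedence
    (system-wide, then cluster-wide, then user-specific).

    Each entry looks like '+ "task/resreq"' or '- "task"'; a bare
    entry is treated as '+'. Returns {task: resource requirement}.
    """
    tasks = {}
    for level in levels:
        for entry in level:
            entry = entry.strip()
            op = "+"
            if entry[:1] in ("+", "-"):
                op, entry = entry[:1], entry[1:].strip()
            task, _, resreq = entry.strip('"').partition("/")
            if op == "+":
                tasks[task] = resreq      # add, or override a lower level
            else:
                tasks.pop(task, None)     # remove an inherited entry
    return tasks

system_wide = ['+ "f77/type==any"', '+ "verilog/swp>100"']
user_level  = ['- "f77"', '+ "verilog/mem>50"']
print(merge_task_lists(system_wide, user_level))
# {'verilog': 'mem>50'}: f77 removed, verilog overridden by the user list
```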

The hosts File

If your LSF clusters include hosts that have more than one interface and are configured with more than one official host name, you must either modify the host name configuration or create a private hosts file for LSF to use. The LSF hosts file is stored in LSF_CONFDIR. The format of LSF_CONFDIR/hosts is the same as for the /etc/hosts file.

For every host that has more than one official name, you must duplicate the hosts database information except that all entries for the host should use the same official name. Configure all the other names for the host as aliases so that people can still refer to the host by any name. For example, if your /etc/hosts file contains

AA.AA.AA.AA  host-AA host # first interface
BB.BB.BB.BB  host-BB      # second interface

then the LSF_CONFDIR/hosts file should contain:

AA.AA.AA.AA  host host-AA # first interface
BB.BB.BB.BB  host host-BB # second interface

The LSF hosts file should only contain entries for hosts with more than one official name. All other host names and addresses are resolved using the default method for your hosts. See `Hosts, Machines, and Computers' on page 3 for a detailed discussion of official host names.
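A quick way to check that a private hosts file follows this rule is to verify that every name in the file resolves to a single official name. The validator below is a hypothetical illustration, not an LSF utility:

```python
def official_names(hosts_text):
    """Map each host name in a hosts-format file to the set of
    official names (the first name on each line) it appears under."""
    seen = {}
    for line in hosts_text.splitlines():
        line = line.split("#")[0].strip()   # strip comments
        if not line:
            continue
        fields = line.split()               # address, official, aliases...
        for name in fields[1:]:
            seen.setdefault(name, set()).add(fields[1])
    return seen

text = """\
AA.AA.AA.AA  host host-AA  # first interface
BB.BB.BB.BB  host host-BB  # second interface
"""
# Correctly configured: every name maps to the one official name 'host'.
assert all(names == {"host"} for names in official_names(text).values())
```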

The lsf.sudoers File

The format of this file is very similar to that of the lsf.conf file (see `The lsf.conf File' on page 161). Each line of the file is a NAME=VALUE statement, where NAME describes an authorized operation and VALUE is a single string or multiple strings enclosed in quotes. On UNIX, lines starting with `#' are comments and are ignored, and the lsf.sudoers file itself is optional.

On Windows NT, except for the LSF_LOCAL_ADMIN_GROUP variable, the parameters described in the lsf.sudoers file are stored in a Registry key located at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LSF Service\lsf.sudoers.

The following variables are defined:

LSF_LOCAL_ADMIN_GROUP
Windows NT only. This is a Registry key that defines the local LSF administrators group. Members of this user group are assigned privileges that allow them to start and stop the LSF services.

The location of this value in the Registry is:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LSF Service

Default value: LSF Local Admins
LSF_STARTUP_USERS
UNIX only. This variable is equivalent to the local LSF administrators group in Windows NT, and it enables a list of specified users to start LSF daemons using the LSF administration commands lsadmin and badmin. By default, root is the only user who can start up the LSF daemons as root, and lsadmin and badmin must be installed as setuid root programs.
LSF_STARTUP_USERS="user1 user2"
This allows listed users to perform the startup operations. If this list contains only one user, quotes are not necessary.
LSF_STARTUP_USERS = all_admins
This allows all the LSF administrators configured in the lsf.cluster.cluster file to start up LSF daemons using the lsadmin and badmin commands.

CAUTION!

Defining LSF_STARTUP_USERS as all_admins incurs some security risk because administrators can be configured by a primary LSF administrator who is not root. You should explicitly list the login names of all authorized administrators here so that you have full control of who can start daemons as root.

LSF_STARTUP_PATH
The absolute pathname of the directory where the server binaries, namely lim, res, and sbatchd, are installed. This is normally LSF_SERVERDIR as defined in your lsf.conf file. LSF allows the specified administrators (see LSF_STARTUP_USERS or LSF_LOCAL_ADMIN_GROUP) to start the daemons installed in the LSF_STARTUP_PATH directory.
On UNIX, both LSF_STARTUP_USERS and LSF_STARTUP_PATH must be defined for this feature to work.
LSB_PRE_POST_EXEC_USER
This parameter defines the authorized user for the LSF Batch queue level pre-execution and post-execution commands. These commands can be configured at the queue level by the LSF administrator. If LSB_PRE_POST_EXEC_USER is defined, the queue level pre-execution and post-execution commands will be run as the user defined. If this parameter is not defined, the commands will be run as the user who submitted the job. In particular, you can define this variable if you need to run commands as root on UNIX.
See `Pre- and Post-execution Commands' on page 36 for details of pre-execution and post-execution.
You can only define a single username in this parameter.
LSF_EAUTH_USER
This defines the username used to run the external authentication executable, eauth. If this parameter is not defined, then eauth is run as the primary LSF administrator. See `External Authentication' on page 11 for an explanation of external authentication.
LSF_EAUTH_KEY
This defines a key that eauth uses to encrypt and decrypt the user authentication data. If this parameter is not defined, then eauth encrypts and decrypts authentication data using an internal key.
This parameter gives a site the opportunity to improve its security. The rules for choosing a key are the same as for choosing a password. If you want to change the key, you must modify the lsf.sudoers file on every host; for the hosts to work together, they must all use the same key.
See `External Authentication' on page 11 for an explanation of external authentication.
LSF_EEXEC_USER
This defines the user name to run the external execution command, eexec. If this parameter is not defined, then eexec will be run as the user who submitted the job. See `External Submission and Execution Executables' on page 42 for an explanation of external execution.




doc@platform.com

Copyright © 1994-1998 Platform Computing Corporation.
All rights reserved.