4. LSF Base Configuration Reference

This chapter contains a detailed description of the contents of the LSF Base configuration files. These include the installation file lsf.conf, the LIM configuration files lsf.shared and lsf.cluster.cluster, and the optional LSF hosts file for additional host name information.

The lsf.conf File

Installation and operation of LSF are controlled by the lsf.conf file. The lsf.conf file is created during installation, and records all the settings chosen when LSF is installed. This information is used by LSF daemons and commands to locate other configuration files, executables, and network services.

lsf.conf contains LSF installation settings as well as some system wide options. This file is initially created by the lsfsetup utility during LSF installation, and updated if necessary when you upgrade to a new version. Many of the parameters are set during the installation. This file can also be expanded to include LSF application specific parameters.

LSB_CONFDIR

LSF JobScheduler configuration directories are installed under LSB_CONFDIR. Configuration files for each LSF cluster are stored in a subdirectory of LSB_CONFDIR. This subdirectory contains several files that define the LSF JobScheduler user and host lists, operation parameters, and queues.

All files and directories under LSB_CONFDIR must be readable from all hosts in the cluster. LSB_CONFDIR/cluster/configdir must be owned by the LSF administrator.

Default: LSF_CONFDIR/lsbatch

You should not try to redefine this parameter once LSF has been installed.

LSB_DEBUG

If this is defined, LSF JobScheduler will run in single user mode. In this mode, no security checking is performed, so the LSF JobScheduler daemons should not run as root. When LSB_DEBUG is defined, LSF JobScheduler will not look in the system services database for port numbers. Instead, it uses port number 40000 for mbatchd and port number 40001 for sbatchd unless LSB_MBD_PORT/LSB_SBD_PORT are defined in the file lsf.conf. The valid values for LSB_DEBUG are 1 and 2. You should always choose 1 unless you are testing LSF JobScheduler.

Default: undefined

LSB_MAILPROG

LSF JobScheduler normally uses /usr/lib/sendmail as the mail transport agent to send mail to users. If your site does not use sendmail, configure LSB_MAILPROG with the name of a sendmail-compatible transport program. LSF JobScheduler calls LSB_MAILPROG with the following arguments:

LSB_MAILPROG -F "LSF System" -f Manager@host dest_addr

The -F "LSF System" argument sets the full name of the sender; the -f Manager@host argument gives the return address for LSF JobScheduler mail, which is the LSF administrator's mailbox. dest_addr is the destination address, generated by the rules given for LSB_MAILTO below.

LSB_MAILPROG must read the body of the mail message from the standard input. The end of the message is marked by end-of-file. Any program or shell script that accepts the arguments and input and delivers the mail correctly can be used. LSB_MAILPROG must be executable by any user.

If this parameter is modified, the LSF administrator must restart the sbatchd daemons on all hosts to pick up the new value.

Default: /usr/lib/sendmail
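
As a rough illustration of the calling convention described for LSB_MAILPROG, the following Python sketch accepts the same arguments and takes the message text that would be read from standard input. The function names parse_args and render are hypothetical, and no delivery is attempted: the sketch only formats the message, because the delivery step depends on your site's mail transport.

```python
import argparse
from email.message import EmailMessage
from email.utils import formataddr


def parse_args(argv):
    """Parse the arguments that LSB_MAILPROG is called with."""
    parser = argparse.ArgumentParser(description="sendmail-style wrapper (sketch)")
    parser.add_argument("-F", dest="full_name", default="")    # sender's full name
    parser.add_argument("-f", dest="return_addr", default="")  # return address
    parser.add_argument("dest_addr")                           # destination address
    return parser.parse_args(argv)


def render(argv, body):
    """Format the text read from standard input as a mail message."""
    args = parse_args(argv)
    msg = EmailMessage()
    msg["From"] = formataddr((args.full_name, args.return_addr))
    msg["To"] = args.dest_addr
    msg.set_content(body)
    # Assumption: a real replacement would pass this text to a mail transport.
    return msg.as_string()
```

Run as a script, such a program would call render(sys.argv[1:], sys.stdin.read()) and then hand the result to whatever transport the site uses.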

LSB_MAILTO

LSF JobScheduler sends electronic mail to users when their jobs complete or have errors, and to the LSF administrator in the case of critical errors in the LSF JobScheduler system. The default is to send mail to the user who submitted the job, on the host where the daemon is running; this assumes that your electronic mail system forwards messages to a central mailbox.

The LSB_MAILTO parameter changes the mailing address used by LSF JobScheduler. LSB_MAILTO is a format string that is used to build the mailing address. The substring !U, if found, is replaced with the user's account name; the substring !H is replaced with the name of the submission host. All other characters (including any other `!') are copied exactly. Common formats are:

!U - mail is sent to the submitting user's account name on the local host

!U@!H - mail is sent to user@submission_hostname

!U@company_name.com - mail is sent to user@company_name.com

If this parameter is modified, the LSF administrator must restart the sbatchd daemons on all hosts to pick up the new value.

Default: !U
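
The substitution rule for LSB_MAILTO can be sketched in a few lines of Python. The function name expand_mailto is hypothetical; the sketch only mirrors the rule stated above (replace !U and !H, copy everything else) and is not taken from the LSF source.

```python
import re


def expand_mailto(fmt, user, submission_host):
    """Build a mail address from an LSB_MAILTO-style format string."""
    substitutions = {"!U": user, "!H": submission_host}
    # One left-to-right pass: any other character, including a lone '!',
    # is copied unchanged, and substituted text is not scanned again.
    return re.sub(r"!U|!H", lambda match: substitutions[match.group(0)], fmt)
```

For example, expand_mailto("!U@!H", "user1", "hostA") returns "user1@hostA".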

LSB_SHAREDIR

LSF JobScheduler keeps job history and accounting log files for each cluster. These files are necessary for correct operation of the system. Like the organization under LSB_CONFDIR, there is one subdirectory for each cluster.

The LSB_SHAREDIR/cluster/logdir directory must be owned by the LSF administrator.

Default: LSF_INDEP/work

Note

All files and directories under LSB_SHAREDIR must allow read and write access from the LSF master host. See `Fault Tolerance' on page 2 and `Using LSF JobScheduler without Shared File Systems' on page 5.

LSF_BINDIR

Directory where all user commands are installed.

Default: LSF_MACHDEP/bin

LSF_CONFDIR

The directory where all LIM configuration files are installed. These files are shared throughout the system and should be readable from any host. This directory can contain configuration files for more than one cluster.

Default: LSF_INDEP/conf

LSF_ENVDIR

LSF normally installs the lsf.conf file in the /etc directory. The lsf.conf file is installed by creating a shared copy in LSF_SERVERDIR and adding a symbolic link from /etc/lsf.conf to the shared copy. If LSF_ENVDIR is set, the symbolic link is installed in LSF_ENVDIR/lsf.conf.

Default: /etc

LSF_INCLUDEDIR

Directory under which the LSF API header file <lsf/lsf.h> is installed.

Default: LSF_INDEP/include

LSF_INDEP

Specifies the default top-level directory for all host type independent LSF files. This includes manual pages, configuration files, working directories, and examples. For example, defining LSF_INDEP as /usr/local/lsf/mnt places manual pages in /usr/local/lsf/mnt/man, configuration files in /usr/local/lsf/mnt/conf, and so on.

Default: /usr/local/lsf/mnt

LSF_LIBDIR

Directory where the LSF application programming interface library liblsf.a is installed.

Default: LSF_MACHDEP/lib

LSF_LICENSE_FILE

The full path name of the FLEXlm license file used by LSF. This variable is set to LSF_CONFDIR/license.dat by default at installation time.

UNIX Default: /usr/local/flexlm/licenses/license.dat
Windows NT Default: C:\Flexlm\License.dat

LSF_LIM_DEBUG

If LSF_LIM_DEBUG is defined, the Load Information Manager (LIM) will operate in single user mode. No security checking is performed, so LIM should not run as root. LIM will not look in the services database for the LIM service port number. Instead, it uses port number 36000 unless LSF_LIM_PORT has been defined. The valid values for LSF_LIM_DEBUG are 1 and 2. You should always choose 1 unless you are testing LSF.

Default: undefined

LSF_LIM_PORT,
LSF_RES_PORT,
LSB_MBD_PORT,
LSB_SBD_PORT

Internet port numbers are used for communication with the LSF daemons. The port numbers are normally obtained by looking up the LSF service names in the /etc/services file or the services YP map. If it is not possible to modify the service database, these variables can be defined to set the port numbers.

With careful use of these settings along with the LSF_ENVDIR and PATH environment variables, it is possible to run two versions of the LSF software on a host, selecting between the versions by setting the PATH environment variable to include the correct version of the commands and the LSF_ENVDIR environment variable to point to the directory containing the appropriate lsf.conf file.

Default: get port numbers from services database
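
The way a daemon port is chosen can be sketched as follows, using the debug-mode port numbers quoted in this chapter. The function name daemon_port and the service names in SERVICE_NAMES are hypothetical, and the rule that an explicit lsf.conf setting takes precedence over the services database outside debug mode is an assumption of this sketch.

```python
import socket

# Debug-mode ports quoted in this chapter.
DEBUG_PORTS = {
    "LSF_LIM_PORT": 36000,
    "LSF_RES_PORT": 36002,
    "LSB_MBD_PORT": 40000,
    "LSB_SBD_PORT": 40001,
}

# Hypothetical service names; check the entries your installation created.
SERVICE_NAMES = {
    "LSF_LIM_PORT": "lim",
    "LSF_RES_PORT": "res",
    "LSB_MBD_PORT": "mbatchd",
    "LSB_SBD_PORT": "sbatchd",
}


def daemon_port(variable, conf, debug=False, lookup=socket.getservbyname):
    """Return the port for one daemon, given parsed lsf.conf settings."""
    if variable in conf:
        return int(conf[variable])           # explicit setting in lsf.conf
    if debug:
        return DEBUG_PORTS[variable]         # services database is not consulted
    return lookup(SERVICE_NAMES[variable])   # normal case: services database
```

The lookup argument defaults to the system services database and can be replaced when experimenting on a host that has no LSF service entries.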

LSF_LOGDIR

This is an optional definition.

If LSF_LOGDIR is defined, error messages from all servers are logged into files in this directory. If a server is unable to write in this directory, then the error logs are created in /tmp.

If LSF_LOGDIR is not defined, then syslog is used to log everything to the system log using the LOG_DAEMON facility. The syslog facility is available by default on most UNIX systems. The /etc/syslog.conf file controls the way messages are logged, and the files they are logged to. See the manual pages for the syslogd daemon and the syslog function for more information.

UNIX Default: log messages go to syslog
Windows NT Default: log messages lost if LSF_LOGDIR undefined

LSF_LOG_MASK

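The error message log level for LSF daemons. This definition applies no matter where the LSF daemons are logging messages. All messages logged at the specified level or higher are recorded; lower level messages are discarded. The log levels, which correspond to the standard syslog priorities, in order from highest to lowest are:

LOG_EMERG, LOG_ALERT, LOG_CRIT, LOG_ERR, LOG_WARNING, LOG_NOTICE, LOG_INFO, LOG_DEBUG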

Most important LSF log messages are at the LOG_ERR or LOG_WARNING level. Messages at the LOG_INFO and LOG_DEBUG level are only useful for debugging.

Default: LOG_WARNING
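
The filtering rule for LSF_LOG_MASK can be illustrated with a short Python sketch. It assumes the eight standard syslog priority names and their usual ordering; the function name is_recorded is hypothetical.

```python
# Standard syslog priorities, highest first (assumed ordering for this sketch).
LOG_LEVELS = [
    "LOG_EMERG", "LOG_ALERT", "LOG_CRIT", "LOG_ERR",
    "LOG_WARNING", "LOG_NOTICE", "LOG_INFO", "LOG_DEBUG",
]


def is_recorded(message_level, log_mask="LOG_WARNING"):
    """Return True if a message at message_level passes the configured mask."""
    # A smaller index means a higher priority.
    return LOG_LEVELS.index(message_level) <= LOG_LEVELS.index(log_mask)
```

With the default mask, a LOG_ERR message is recorded and a LOG_INFO message is discarded.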

LSF_MACHDEP

Specifies the directory where host type dependent files are installed. The machine dependent files are the user programs, daemons, and libraries.

Default: /usr/local/lsf

LSF_MANDIR

Directory under which all manual pages are installed. The manual pages are placed in the man1, man3, man5 and man8 subdirectories of the LSF_MANDIR directory.

Default: LSF_INDEP/man

LSF_MISC

Directory where miscellaneous machine independent files such as LSF example source programs and scripts are installed.

Default: LSF_CONFDIR/misc

LSF_RES_DEBUG

If LSF_RES_DEBUG is defined, the Remote Execution Server (RES) will operate in single user mode. No security checking is performed, so RES should not run as root. RES will not look in the services database for the RES service port number. Instead, it uses port number 36002 unless LSF_RES_PORT has been defined. The valid values for LSF_RES_DEBUG are 1 and 2. You should always choose 1 unless you are testing RES.

Default: undefined

LSF_SERVERDIR

Directory where all server binaries are installed. These include lim, res, nios, sbatchd, mbatchd, and eeventd. If you use elim, eauth, eexec, esub, and so on, they should also be installed in this directory.

Default: LSF_MACHDEP/etc

LSF_SERVER_HOSTS

This defines one or more LSF server hosts that the application should contact to get in touch with a Load Information Manager (LIM). This is used on client hosts where no LIM is running on the local host. The LSF server hosts are hosts that run LSF daemons and provide load sharing services. Client hosts are hosts that only run LSF commands or applications but do not provide services to any hosts.

If LSF_SERVER_HOSTS is not defined, the application tries to contact the LIM on the local host.

The host names in LSF_SERVER_HOSTS must be enclosed in quotes and separated by white space; for example:

LSF_SERVER_HOSTS="hostA hostD hostB"

Default: undefined
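
A minimal Python sketch of how an application might turn this setting into a list of hosts to contact. The function name lim_candidates is hypothetical, and trying the hosts in the order listed is an assumption of the sketch rather than documented behaviour.

```python
def lim_candidates(conf, local_host):
    """Hosts an application would try when looking for a LIM."""
    value = conf.get("LSF_SERVER_HOSTS")
    if value is None:
        return [local_host]                    # undefined: contact the local LIM
    # The value is quoted and separated by white space.
    return value.strip().strip('"').split()
```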

LSF_STRIP_DOMAIN

This is an optional definition.

If all the hosts in your cluster can be reached using short host names, you can configure LSF to use the short host names by specifying the portion of the domain name to remove. If your hosts are in more than one domain, or have more than one domain name, you can specify more than one domain suffix to remove, separated by a colon `:'.

For example, given this definition of LSF_STRIP_DOMAIN:

LSF_STRIP_DOMAIN=.foo.com:.bar.com

LSF accepts hostA, hostA.foo.com, and hostA.bar.com as names for host hostA, and uses the name hostA in all output. The leading period `.' is required.

Default: undefined
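
The effect of LSF_STRIP_DOMAIN on a host name can be sketched as below. The function name strip_domain is hypothetical, and trying the suffixes in the order listed is an assumption; the sketch ignores any suffix that lacks the required leading period.

```python
def strip_domain(host_name, strip_setting):
    """Remove the first matching domain suffix from host_name."""
    for suffix in strip_setting.split(":"):
        # Each suffix must carry its leading period, as in ".foo.com".
        if suffix.startswith(".") and host_name.endswith(suffix):
            return host_name[: -len(suffix)]
    return host_name
```

With the definition shown above, both hostA.foo.com and hostA.bar.com reduce to hostA, and a name in any other domain is left unchanged.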

XLSF_APPDIR

The directory where X application default files for LSF products are installed. The LSF commands that use X look in this directory to find the application defaults. Users do not need to set environment variables to use the LSF X applications. The application default files are platform-independent.

Default: LSF_INDEP/misc

XLSF_UIDDIR

The directory where Motif User Interface Definition files are stored. These files are platform specific.

Default: LSF_LIBDIR/uid
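
Many of the directory defaults in this section are expressed in terms of other parameters. The following Python sketch collects those defaults in one table and expands them. The function name resolve_dir is hypothetical, and the assumption that a default follows a redefined parent directory is made for illustration only.

```python
# Defaults as listed in this chapter; a leading parameter name refers to
# another directory setting.
DEFAULTS = {
    "LSF_INDEP": "/usr/local/lsf/mnt",
    "LSF_MACHDEP": "/usr/local/lsf",
    "LSF_ENVDIR": "/etc",
    "LSF_CONFDIR": "LSF_INDEP/conf",
    "LSF_INCLUDEDIR": "LSF_INDEP/include",
    "LSF_MANDIR": "LSF_INDEP/man",
    "LSF_MISC": "LSF_CONFDIR/misc",
    "LSF_BINDIR": "LSF_MACHDEP/bin",
    "LSF_LIBDIR": "LSF_MACHDEP/lib",
    "LSF_SERVERDIR": "LSF_MACHDEP/etc",
    "LSB_CONFDIR": "LSF_CONFDIR/lsbatch",
    "LSB_SHAREDIR": "LSF_INDEP/work",
    "XLSF_APPDIR": "LSF_INDEP/misc",
    "XLSF_UIDDIR": "LSF_LIBDIR/uid",
}


def resolve_dir(name, conf):
    """Expand one directory parameter to an absolute path."""
    value = conf.get(name, DEFAULTS[name])
    head, sep, tail = value.partition("/")
    if head in DEFAULTS:
        # The value starts with another parameter name: expand that first.
        return resolve_dir(head, conf) + sep + tail
    return value
```

For example, with nothing redefined, resolve_dir("LSB_CONFDIR", {}) returns /usr/local/lsf/mnt/conf/lsbatch.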

The lsf.shared File

The lsf.shared file contains definitions that are used by all load sharing clusters. This includes lists of cluster names, host types, host models, the special resources available, and external load indices.

Clusters

The mandatory "Cluster" section defines all cluster names recognized by the LSF system, with one line for each cluster.

The ClusterName keyword is mandatory. All cluster names referenced anywhere in the LSF system must be defined here. The file names of cluster-specific configuration files must end with the associated cluster name.

Begin Cluster
ClusterName
cluster1
cluster2
End Cluster

Host Types

The mandatory HostType section lists the valid host type names in the cluster. Each host is assigned a host type in the lsf.cluster.cluster file. All hosts that can run the same binary programs should have the same host type, even if they have different models of processor. LSF uses the host type as a default requirement for task placement. Unless specified otherwise, jobs are always run on hosts of the same type.

The TYPENAME keyword is mandatory. Host types are usually based on a combination of the hardware name and operating system. If a job does not have a resource requirement specified, LSF runs the job on a host of the same type as the submission host, so you should give careful consideration to the host type for each host in the cluster. If your site already has a system for naming host types, you can use the same names for LSF.

Begin HostType
TYPENAME
SUN41
NT86
ALPHA
HPPA
End HostType

Host Models

The mandatory HostModel section lists the various models of machines and gives the relative CPU speed for each model. LSF uses the relative CPU speed to normalize the CPU load indices so that jobs are more likely to be sent to faster hosts. The MODELNAME and CPUFACTOR keywords are mandatory.

It is up to you to identify the different host models in your system, but generally you need to identify first the distinct host types, such as HPPA and SPARC, and then the machine models within each, such as SparcIPC, Sparc1, Sparc2, and Sparc10.

Though it is not required, you would typically assign a CPU factor of 1.0 to the slowest machine model in your system, and higher numbers for the others. For example, for a machine model that executes at twice the speed of your slowest model, a factor of 2.0 should be assigned.

Begin HostModel
MODELNAME  CPUFACTOR
SparcIPC   1.0
Sparc10    2.0
End HostModel

The CPU factor affects the calculation of job execution time limits and accounting. Using large values for the CPU factor can cause confusing results when CPU time limits or accounting are used.

Resources

The optional "Resource" section contains a list of resource names. Resource names are character strings chosen by the LSF administrator. You can use any name other than the reserved resource names. The keywords RESOURCENAME and DESCRIPTION are mandatory.

For a more general discussion of boolean resources, see Section 4, `Resources', beginning on page 45 of the LSF JobScheduler User's Guide.

Resource names must be strings of numbers and letters, beginning with a letter and no more than 29 characters long. You can define up to 32 resource names in lsf.shared.

This sample Resource section defines boolean resources to represent processor types, operating system versions, and software licenses:

Begin Resource
RESOURCENAME  DESCRIPTION
sparc         (Sparc CPU)
sunos4        (Running SunOS 4.x)
solaris       (Running Solaris 2.x)
frame         (FrameMaker license)
End Resource
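
The tabular sections of lsf.shared all follow the same pattern: a Begin line, a keyword line, one line per entry, and an End line. A minimal Python sketch of a reader for that pattern follows. The function name parse_sections is hypothetical, and the sketch makes no attempt to handle syntax not shown in this chapter.

```python
def parse_sections(text):
    """Parse tabular Begin/End sections into {section: [row dicts]}."""
    sections = {}
    name, columns = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        words = line.split()
        if words[0] == "Begin" and len(words) == 2:
            name, columns = words[1], None
            sections[name] = []
        elif words[0] == "End" and len(words) == 2:
            name, columns = None, None
        elif name is not None and columns is None:
            columns = words                      # keyword line
        elif name is not None:
            # Split into as many fields as there are keywords, so that a
            # parenthesized description keeps its spaces.
            values = line.split(None, len(columns) - 1)
            sections[name].append(dict(zip(columns, values)))
    return sections
```

Applied to the sample Resource section above, the entry for sparc has the DESCRIPTION value "(Sparc CPU)".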

The lsf.cluster.cluster File

This is the load-sharing cluster configuration file. There is one such file for each load sharing cluster in the system. The cluster suffix must agree with the name defined in the Cluster section of the lsf.shared file.

Parameters

The Parameters section is optional. This section contains miscellaneous parameters for the LIM.

PRODUCTS

The PRODUCTS line specifies which LSF products will be enabled in the cluster. The PRODUCTS line can specify any combination of the strings `LSF_Base', `LSF_Batch', `LSF_JobScheduler', `LSF_MultiCluster', and `LSF_Analyzer' to enable the operation of these products. If any of `LSF_Batch', `LSF_JobScheduler', or `LSF_MultiCluster' is specified, then `LSF_Base' is automatically enabled as well. Specifying the PRODUCTS line enables the product for all hosts in the cluster. Individual hosts can be configured to run as LSF JobScheduler servers or LSF Batch servers within the same cluster. LSF MultiCluster is either enabled or disabled for multicluster operation for the entire cluster.

The PRODUCTS line is created automatically by the installation program lsfsetup. For example:

Begin Parameters
PRODUCTS=LSF_Base LSF_JobScheduler
End Parameters

If the PRODUCTS line is not specified, the default is to enable the operation of LSF Base and LSF Batch.

Note

The products defined by the PRODUCTS line must match the license file used to serve the cluster. A host will be unlicensed if the license is unavailable for the component it was configured to run. For example, if you configure a cluster to run LSF JobScheduler on all hosts, and the license file does not contain the LSF JobScheduler product, then the hosts will be unlicensed, even if there are licenses for LSF Base or LSF Batch.

Default: LSF_Base LSF_Batch
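
The rules for the PRODUCTS line can be summarized in a short Python sketch. The function name enabled_products is hypothetical; the sketch encodes only the rules stated above and does not check licenses.

```python
# Products that automatically enable LSF_Base.
NEEDS_BASE = {"LSF_Batch", "LSF_JobScheduler", "LSF_MultiCluster"}


def enabled_products(products_value=None):
    """Return the set of products enabled by a PRODUCTS value."""
    if products_value is None:
        return {"LSF_Base", "LSF_Batch"}       # no PRODUCTS line
    products = set(products_value.split())
    if products & NEEDS_BASE:
        products.add("LSF_Base")
    return products
```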

LSF Administrators

The ClusterAdmins section defines the LSF administrator(s) for this cluster. Both UNIX user and group names may be specified with the ADMINISTRATORS keyword. The LIM will expand the definition of a group name using the getgrnam(3) call. The first administrator of the expanded list is considered the primary LSF administrator. The primary administrator is the owner of the LSF configuration files, as well as the working files under LSB_SHAREDIR/cluster. If the primary administrator is changed, make sure the owner of the configuration files and the files under LSB_SHAREDIR/cluster are changed as well. All LSF administrators have the same authority to perform actions on LSF daemons, jobs, queues, or hosts in the system.

For backwards compatibility, ClusterManager and Manager are synonyms for ClusterAdmins and ADMINISTRATORS respectively. It is possible to have both sections present in the same lsf.cluster.cluster file to allow daemons from different LSF versions to share the same file.

If this section is not present, the default LSF administrator is root. For flexibility, each cluster may have its own LSF administrator(s), identified by a username, although the same administrator(s) can be responsible for several clusters.

The ADMINISTRATORS parameter is normally set during the installation procedure.

Use the -l option of the lsclusters(1) command to display all the administrators within a cluster.

The following gives an example of a cluster with three LSF administrators. The user listed first, user2, is the primary administrator.

Begin ClusterAdmins
ADMINISTRATORS = user2 lsfgrp user7
End ClusterAdmins
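
The expansion of the ADMINISTRATORS list can be sketched in Python as follows. The function names are hypothetical. How the LIM decides whether a name is a user or a group is not described here, so the sketch assumes that a name with no group members is treated as a user name.

```python
def system_group_members(name):
    """Return the members of a UNIX group, or None if there is no such group."""
    import grp  # UNIX only
    try:
        return list(grp.getgrnam(name).gr_mem)
    except KeyError:
        return None


def expand_admins(entries, group_members=system_group_members):
    """Expand an ADMINISTRATORS value into an ordered list of user names."""
    admins = []
    for entry in entries.split():
        members = group_members(entry)
        # Assumption: a name with no group members is treated as a user name.
        for user in (members if members else [entry]):
            if user not in admins:
                admins.append(user)
    return admins
```

The first name in the returned list corresponds to the primary LSF administrator.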

Hosts

The Host section is the last section in lsf.cluster.cluster and is the only required section. It lists all the hosts in the cluster and gives configuration information for each host.

The order in which the hosts are listed in this section is important. The LIM on the first host listed becomes the master LIM if this host is up; otherwise, that on the second becomes the master if its host is up, and so on.

Since the master LIM makes all placement decisions for the cluster, you want it on a fast machine. Also, to avoid the delays involved in switching masters if the first machine goes down, you want the master to be on a reliable machine. It is desirable to arrange the list such that the first few hosts in the list are always in the same subnet. This avoids the situation where the second host takes over the master when there are communication problems between subnets.
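
The ordering rule for choosing the master can be written as a short Python sketch. The function name elect_master and the is_up callback are hypothetical; how the LIMs actually detect that a host is down is outside the scope of the sketch.

```python
def elect_master(hosts, is_up):
    """Return the first host in configuration order that is up, or None."""
    for host in hosts:
        if is_up(host):
            return host
    return None
```

For example, if hostA is listed first but is down, the next listed host that is up is chosen.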

Configuration information is of two types. Some fields in a host entry simply describe the machine and its configuration. Other fields set thresholds for various resources. Both types are listed below.

Descriptive Fields

The HOSTNAME, model, type, and RESOURCES fields must be defined in the Host section. The server, nd, RUNWINDOW and REXPRI fields are optional.

HOSTNAME - the official name of the host as returned by hostname(1). Must be listed in lsf.shared as belonging to this cluster.

model - host model. Must be one of those defined in the lsf.shared file. This determines the CPU speed scaling factor applied in load and placement calculations.

type - a host type as defined in the HostType section of lsf.shared. The strings used for host types are decided by the system administrator, e.g. SPARC, DEC, HPPA. The host type is used to identify binary-compatible hosts.

The host type is used as the default resource requirement. That is, if no resource requirement is specified in a placement request then the task is run on a host of the same type as the sending host.

Often one host type can be used for many machine models. For example, the host type name SUN41 might be used for any computer with a SPARC processor running SunOS 4.1. This would include many Sun models and quite a few from other vendors as well.

server - 1 if the host can receive jobs from other hosts; 0 otherwise. If server is set to 0, the host is an LSF client. Client hosts do not run the LSF daemons. Client hosts can submit jobs to an LSF cluster, but cannot execute jobs sent from other hosts. If this field is not defined, then the default is 1.

RESOURCES - boolean resources available on this host. The resource names are strings defined in the Resource section of the file lsf.shared. You may list any number of resources, enclosed in parentheses and separated by blanks or tabs. For example, (fs frame hpux).

The lsf.sudoers File

This file allows a list of permitted users to perform certain privileged operations in the LSF cluster as either the superuser or any other designated user. This file is optional.

The lsf.sudoers file must be located in /etc and it must be owned by root.

The format of this file is very similar to that of the lsf.conf file (see `The lsf.conf File' on page 75). Each line of the file is a NAME=VALUE statement, where NAME describes an authorized operation and VALUE is a single string or multiple strings enclosed in quotes. Lines starting with `#' are comments and are ignored.
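
A minimal Python sketch of a reader for this NAME=VALUE format follows. The function name parse_name_value is hypothetical, and silently skipping lines that contain no `=' is a choice made for the sketch, not documented behaviour.

```python
def parse_name_value(text):
    """Parse NAME=VALUE statements, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue                  # not a NAME=VALUE statement; skipped here
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]       # drop the enclosing quotes
        settings[name.strip()] = value
    return settings
```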

The currently recognized variables in this file include:

LSF_STARTUP_USERS

This parameter is used to enable a list of specified users to start up LSF daemons as root using the LSF administrative commands lsadmin and badmin.

By default, the superuser is the only user who can start up the LSF daemons as root.

Note that lsadmin and badmin must be installed as setuid root programs for this to work. Possible values for this variable include:

all_admins - this allows all LSF administrators configured in the lsf.cluster.cluster file to start up LSF daemons as root by running the lsadmin and badmin commands.

user1 user2 - this allows listed user(s) to perform the startup operations. If this list contains more than one user, it must be enclosed with quotes. For example:

LSF_STARTUP_USERS="user1 user2"

CAUTION!

Defining LSF_STARTUP_USERS as all_admins incurs some security risk because administrators can be configured by a primary LSF administrator who is not root. You should explicitly list the login names of all authorized administrators here so that you have full control of who can start daemons as root.

LSF_STARTUP_PATH

The absolute pathname of the directory where the server binaries (namely lim, res, and sbatchd) are installed. This is normally LSF_SERVERDIR as defined in your lsf.conf file. LSF will allow the users defined in LSF_STARTUP_USERS to start the daemons installed in the LSF_STARTUP_PATH directory as root.

Note

Both LSF_STARTUP_USERS and LSF_STARTUP_PATH must be defined for this feature to work.
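
The check implied by these two parameters can be sketched in Python as follows. The function name may_start_daemons is hypothetical; sudoers stands for the parsed lsf.sudoers settings and cluster_admins for the administrators configured in the lsf.cluster.cluster file.

```python
def may_start_daemons(user, sudoers, cluster_admins):
    """Return True if user may start the LSF daemons as root."""
    if user == "root":
        return True                   # the superuser is always allowed
    users = sudoers.get("LSF_STARTUP_USERS")
    path = sudoers.get("LSF_STARTUP_PATH")
    if not users or not path:
        return False                  # both parameters must be defined
    if users == "all_admins":
        return user in cluster_admins
    return user in users.split()
```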

LSB_PRE_POST_EXEC_USER

This parameter defines the authorized user for the LSF JobScheduler queue level pre-execution and post-execution commands. These commands can be configured at the queue level by the LSF administrator. If LSB_PRE_POST_EXEC_USER is defined, the queue level pre-execution and post-execution commands will be run as the user defined. If this parameter is not defined, the commands will be run as the user who submitted the job. In particular, you can define this variable if you need to run commands as root.

See `Pre- and Post-Execution Commands' on page 17 for details of pre-execution and post-execution.

You can only define a single user name in this parameter.

LSF_EAUTH_USER

This defines the user name to run the external authentication executable, eauth. If this parameter is not defined, then eauth will be run as the primary LSF administrator.

LSF_EEXEC_USER

This defines the user name to run the external execution command, eexec. If this parameter is not defined, then eexec will be run as the user who submitted the job.
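
The defaults for LSB_PRE_POST_EXEC_USER, LSF_EAUTH_USER, and LSF_EEXEC_USER can be compared side by side in a short Python sketch. The function name run_as_user and the command labels "pre_exec" and "post_exec" are hypothetical; sudoers stands for the parsed lsf.sudoers settings.

```python
def run_as_user(command, sudoers, submitter, primary_admin):
    """Return the account under which a command would be run."""
    if command in ("pre_exec", "post_exec"):
        # Queue level pre-execution and post-execution commands.
        return sudoers.get("LSB_PRE_POST_EXEC_USER", submitter)
    if command == "eauth":
        return sudoers.get("LSF_EAUTH_USER", primary_admin)
    if command == "eexec":
        return sudoers.get("LSF_EEXEC_USER", submitter)
    raise ValueError("unknown command: " + command)
```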





doc@platform.com

Copyright © 1994-1998 Platform Computing Corporation.
All rights reserved.