This chapter contains a detailed description of the contents of the LSF Base configuration files. These include the installation file lsf.conf; the LIM configuration files lsf.shared and lsf.cluster.cluster; and the optional LSF hosts file for additional host name information.
Installation and operation of LSF are controlled by the lsf.conf file, which is created during installation and records all the settings chosen when LSF is installed. This information is used by LSF daemons and commands to locate other configuration files, executables, and network services.
lsf.conf contains LSF installation settings as well as some system-wide options. The file is initially created by the lsfsetup utility during LSF installation, and updated if necessary when you upgrade to a new version. Many of the parameters are set during installation. The file can also be expanded to include LSF application-specific parameters.
LSF JobScheduler configuration directories are installed under LSB_CONFDIR. Configuration files for each LSF cluster are stored in a subdirectory of LSB_CONFDIR. This subdirectory contains several files that define the LSF JobScheduler user and host lists, operation parameters, and queues.
All files and directories under LSB_CONFDIR must be readable from all hosts in the cluster. LSB_CONFDIR/cluster/configdir must be owned by the LSF administrator.
You should not try to redefine this parameter once LSF has been installed.
If this is defined, LSF JobScheduler will run in single-user mode. In this mode, no security checking is performed, so the LSF JobScheduler daemons should not run as root. When LSB_DEBUG is defined, LSF JobScheduler will not look in the system services database for port numbers. Instead, it uses port number 40000 for mbatchd and port number 40001 for sbatchd, unless LSB_MBD_PORT and LSB_SBD_PORT are defined in the lsf.conf file. The valid values for LSB_DEBUG are 1 and 2. You should always choose 1 unless you are testing LSF JobScheduler.
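For example, a minimal single-user test setup in lsf.conf might look like the following sketch; the explicit port lines are optional, since 40000 and 40001 are the built-in debug defaults:
LSB_DEBUG=1
LSB_MBD_PORT=40000
LSB_SBD_PORT=40001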
LSF JobScheduler normally uses /usr/lib/sendmail as the mail transport agent to send mail to users. If your site does not use sendmail, configure LSB_MAILPROG with the name of a sendmail-compatible transport program. LSF JobScheduler calls LSB_MAILPROG with the following arguments:
LSB_MAILPROG -F "LSF system" -f Manager@host dest_addr
The -F "LSF System"
argument sets the full name of the sender; the -f
Manager@
host argument gives the return address for LSF JobScheduler mail, which is the LSF administrator's mailbox. dest_addr is the destination address, generated by the rules given for LSB_MAILTO
above.
LSB_MAILPROG must read the body of the mail message from the standard input. The end of the message is marked by end-of-file. Any program or shell script that accepts the arguments and input and delivers the mail correctly can be used. LSB_MAILPROG must be executable by any user.
If this parameter is modified, the LSF administrator must restart the sbatchd daemons on all hosts to pick up the new value.
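As an illustration only, a sendmail-compatible wrapper could look like the following sketch; the script name mylsfmail and the mailer /usr/local/bin/mymailer are hypothetical placeholders for a site-specific transport:
#!/bin/sh
# mylsfmail - hypothetical LSB_MAILPROG replacement (illustrative sketch).
# LSF JobScheduler invokes it as:
#   mylsfmail -F "LSF system" -f Manager@host dest_addr
# and writes the message body on standard input.
FULLNAME=$2     # sender's full name ("LSF system")
RETURN=$4       # return address (Manager@host)
DEST=$5         # destination address
# Pass the body read from standard input to the site mailer;
# mymailer and its flags stand in for whatever your site uses.
exec /usr/local/bin/mymailer -fullname "$FULLNAME" -from "$RETURN" -to "$DEST"
The corresponding lsf.conf entry would then be LSB_MAILPROG=/usr/local/bin/mylsfmail.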
LSF JobScheduler sends electronic mail to users when their jobs complete or have errors, and to the LSF administrator in the case of critical errors in the LSF JobScheduler system. The default is to send mail to the user who submitted the job, on the host where the daemon is running; this assumes that your electronic mail system forwards messages to a central mailbox.
The LSB_MAILTO parameter changes the mailing address used by LSF JobScheduler. LSB_MAILTO is a format string that is used to build the mailing address. The substring !U, if found, is replaced with the user's account name; the substring !H is replaced with the name of the submission host. All other characters (including any other `!') are copied exactly. Common formats are:
!U - mail is sent to the submitting user's account name on the local host
!U@!H - mail is sent to user@submission_hostname
!U@company_name.com - mail is sent to user@company_name.com
If this parameter is modified, the LSF administrator must restart the sbatchd daemons on all hosts to pick up the new value.
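For example, to have job mail delivered to user@submission_hostname, the lsf.conf entry would be:
LSB_MAILTO=!U@!H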
LSF JobScheduler keeps job history and accounting log files for each cluster. These files are necessary for correct operation of the system. Like the organization under LSB_CONFDIR, there is one subdirectory for each cluster.
The LSB_SHAREDIR/cluster/logdir directory must be owned by the LSF administrator.
All files and directories under LSB_SHAREDIR must allow read and write access from the LSF master host. See `Fault Tolerance' on page 2 and `Using LSF JobScheduler without Shared File Systems' on page 5.
Directory where all user commands are installed.
The directory where all LIM configuration files are installed. These files are shared throughout the system and should be readable from any host. This directory can contain configuration files for more than one cluster.
LSF normally installs the lsf.conf file in the /etc directory. The lsf.conf file is installed by creating a shared copy in LSF_SERVERDIR and adding a symbolic link from /etc/lsf.conf to the shared copy. If LSF_ENVDIR is set, the symbolic link is installed as LSF_ENVDIR/lsf.conf.
Directory under which the LSF API header file <lsf/lsf.h> is installed.
Specifies the default top-level directory for all host type independent LSF files. This includes manual pages, configuration files, working directories, and examples. For example, defining LSF_INDEP as /usr/local/lsf/mnt places manual pages in /usr/local/lsf/mnt/man, configuration files in /usr/local/lsf/mnt/conf, and so on.
Directory where the LSF application programming interface library liblsf.a is installed.
The full path name of the FLEXlm license file used by LSF. This variable is set to LSF_CONFDIR/license.dat by default at installation time.
UNIX default: /usr/local/flexlm/licenses/license.dat
Windows NT default: C:\Flexlm\License.dat
If LSF_LIM_DEBUG is defined, the Load Information Manager (LIM) will operate in single-user mode. No security checking is performed, so LIM should not run as root. LIM will not look in the services database for the LIM service port number. Instead, it uses port number 36000 unless LSF_LIM_PORT has been defined. The valid values for LSF_LIM_DEBUG are 1 and 2. You should always choose 1 unless you are testing LSF.
Internet port numbers are used for communication with the LSF daemons. The port numbers are normally obtained by looking up the LSF service names in the /etc/services file or the services YP map. If it is not possible to modify the service database, these variables can be defined to set the port numbers.
With careful use of these settings along with the LSF_ENVDIR and PATH environment variables, it is possible to run two versions of the LSF software on a host. To select a version, set the PATH environment variable to include the correct version of the commands, and set the LSF_ENVDIR environment variable to point to the directory containing the appropriate lsf.conf file, as illustrated below.
Default: get port numbers from services database
This is an optional definition.
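For instance, a shell session selecting one of two installed versions might look like this sketch; the version directory /usr/local/lsf_v3 is a hypothetical example:
LSF_ENVDIR=/usr/local/lsf_v3/etc
PATH=/usr/local/lsf_v3/bin:$PATH
export LSF_ENVDIR PATH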
If LSF_LOGDIR is defined, error messages from all servers are logged into files in this directory. If a server is unable to write in this directory, the error logs are created in /tmp.
If LSF_LOGDIR is not defined, then syslog is used to log everything to the system log using the LOG_DAEMON facility. The syslog facility is available by default on most UNIX systems. The /etc/syslog.conf file controls the way messages are logged, and the files they are logged to. See the manual pages for the syslogd daemon and the syslog function for more information.
UNIX default: log messages go to syslog
Windows NT default: log messages are lost if LSF_LOGDIR is not defined
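For example, to collect daemon error logs in a dedicated directory, you could add a line such as the following to lsf.conf (the path shown is only illustrative):
LSF_LOGDIR=/var/log/lsf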
The error message log level for LSF daemons. This definition applies no matter where the LSF daemons are logging messages. All messages logged at the specified level or higher are recorded; lower level messages are discarded. The log levels, in order from highest to lowest, are LOG_EMERG, LOG_ALERT, LOG_CRIT, LOG_ERR, LOG_WARNING, LOG_NOTICE, LOG_INFO, and LOG_DEBUG.
The most important LSF log messages are at the LOG_ERR or LOG_WARNING level. Messages at the LOG_INFO and LOG_DEBUG levels are only useful for debugging.
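Assuming this parameter carries its conventional lsf.conf name, LSF_LOG_MASK, a typical production setting that records warnings and errors but discards informational and debug messages would be:
LSF_LOG_MASK=LOG_WARNING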
Specifies the directory where host type dependent files are installed. The machine dependent files are the user programs, daemons, and libraries.
Directory under which all manual pages are installed. The manual pages are placed in the man1, man3, man5, and man8 subdirectories of the LSF_MANDIR directory.
Directory where miscellaneous machine independent files such as LSF example source programs and scripts are installed.
If LSF_RES_DEBUG is defined, the Remote Execution Server (RES) will operate in single-user mode. No security checking is performed, so RES should not run as root. RES will not look in the services database for the RES service port number. Instead, it uses port number 36002 unless LSF_RES_PORT has been defined. The valid values for LSF_RES_DEBUG are 1 and 2. You should always choose 1 unless you are testing RES.
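For example, to run both LIM and RES in single-user test mode on their default debug ports (36000 and 36002), an lsf.conf sketch would be:
LSF_LIM_DEBUG=1
LSF_RES_DEBUG=1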
Directory where all server binaries are installed. These include lim, res, nios, sbatchd, mbatchd, and eeventd. If you use elim, eauth, eexec, esub, and so on, they should also be installed in this directory.
This defines one or more LSF server hosts that the application should contact to get in touch with a Load Information Manager (LIM). This is used on client hosts where no LIM is running on the local host. The LSF server hosts are hosts that run LSF daemons and provide load sharing services. Client hosts are hosts that only run LSF commands or applications but do not provide services to any hosts.
If LSF_SERVER_HOSTS is not defined, the application tries to contact the LIM on the local host.
The host names in LSF_SERVER_HOSTS must be enclosed in quotes and separated by white space; for example:
LSF_SERVER_HOSTS="hostA hostD hostB"
This is an optional definition.
If all the hosts in your cluster can be reached using short host names, you can configure LSF to use the short host names by specifying the portion of the domain name to remove. If your hosts are in more than one domain, or have more than one domain name, you can specify more than one domain suffix to remove, separated by a colon `:'.
For example, given this definition of LSF_STRIP_DOMAIN:
LSF_STRIP_DOMAIN=.foo.com:.bar.com
LSF accepts hostA, hostA.foo.com, and hostA.bar.com as names for host hostA, and uses the name hostA in all output. The leading period `.' is required.
The directory where X application default files for LSF products are installed. The LSF commands that use X look in this directory to find the application defaults. Users do not need to set environment variables to use the LSF X applications. The application default files are platform-independent.
The directory where Motif User Interface Definition files are stored. These files are platform specific.
Default: LSF_LIBDIR/uid
The lsf.shared file contains definitions that are used by all load sharing clusters. This includes lists of cluster names, host types, host models, the special resources available, and external load indices.
The mandatory "Cluster" section defines all cluster names recognized by the LSF system, with one line for each cluster.
The ClusterName keyword is mandatory. All cluster names referenced anywhere in the LSF system must be defined here. The file names of cluster-specific configuration files must end with the associated cluster name.
Begin Cluster
ClusterName
cluster1
cluster2
End Cluster
The mandatory HostType section lists the valid host type names in the cluster. Each host is assigned a host type in the lsf.cluster.cluster file. All hosts that can run the same binary programs should have the same host type, even if they have different models of processor. LSF uses the host type as a default requirement for task placement. Unless specified otherwise, jobs are always run on hosts of the same type.
The TYPENAME keyword is mandatory. Host types are usually based on a combination of the hardware name and operating system. If a job does not have a resource requirement specified, LSF runs the job on a host of the same type as the submission host, so you should give careful consideration to the host type for each host in the cluster. If your site already has a system for naming host types, you can use the same names for LSF.
Begin HostType
TYPENAME
SUN41
NT86
ALPHA
HPPA
End HostType
The mandatory HostModel section lists the various models of machines and gives the relative CPU speed for each model. LSF uses the relative CPU speed to normalize the CPU load indices so that jobs are more likely to be sent to faster hosts. The MODELNAME and CPUFACTOR keywords are mandatory.
It is up to you to identify the different host models in your system, but generally you need to identify first the distinct host types, such as HPPA and SPARC, and then the machine models within each, such as SparcIPC, Sparc1, Sparc2, and Sparc10.
Though it is not required, you would typically assign a CPU factor of 1.0 to the slowest machine model in your system, and higher numbers for the others. For example, for a machine model that executes at twice the speed of your slowest model, a factor of 2.0 should be assigned.
Begin HostModel
MODELNAME CPUFACTOR
SparcIPC 1.0
Sparc10 2.0
End HostModel
The CPU factor affects the calculation of job execution time limits and accounting. Using large values for the CPU factor can cause confusing results when CPU time limits or accounting are used.
The optional "Resource" section contains a list of resource names. Resource names are character strings chosen by the LSF administrator. You can use any name other than the reserved resource names. The keywords RESOURCENAME
and DESCRIPTION
are mandatory.
For a more general discussion of boolean resources, see Section 4, `Resources', beginning on page 45 of the LSF JobScheduler User's Guide.
Resource names must be strings of numbers and letters, beginning with a letter and no more than 29 characters long. You can define up to 32 resource names in lsf.shared.
This sample Resource section defines boolean resources to represent processor types, operating system versions, and software licenses:
Begin Resource
RESOURCENAME DESCRIPTION
sparc (Sparc CPU)
sunos4 (Running SunOS 4.x)
solaris (Running Solaris 2.x)
frame (FrameMaker license)
End Resource
This is the load-sharing cluster configuration file. There is one such file for each load sharing cluster in the system. The cluster suffix must agree with the name defined in the Cluster section of the lsf.shared file.
The Parameters section is optional. This section contains miscellaneous parameters for the LIM.
The PRODUCTS line specifies which LSF products will be enabled in the cluster. It can specify any combination of the strings `LSF_Base', `LSF_Batch', `LSF_JobScheduler', `LSF_MultiCluster', and `LSF_Analyzer' to enable the operation of these products. If any of `LSF_Batch', `LSF_JobScheduler', or `LSF_MultiCluster' is specified, then `LSF_Base' is automatically enabled as well. Specifying the PRODUCTS line enables the product for all hosts in the cluster. Individual hosts can be configured to run as LSF JobScheduler servers or LSF Batch servers within the same cluster. LSF MultiCluster operation is either enabled or disabled for the entire cluster.
The PRODUCTS line is created automatically by the installation program lsfsetup. For example:
Begin Parameters
PRODUCTS=LSF_Base LSF_JobScheduler
End Parameters
If the PRODUCTS line is not specified, the default is to enable the operation of LSF Base and LSF Batch.
The products defined by the PRODUCTS line must match the license file used to serve the cluster. A host will be unlicensed if the license is unavailable for the component it was configured to run. For example, if you configure a cluster to run LSF JobScheduler on all hosts, and the license file does not contain the LSF JobScheduler product, then the hosts will be unlicensed, even if there are licenses for LSF Base or LSF Batch.
The ClusterAdmins section defines the LSF administrator(s) for this cluster. Both UNIX user and group names may be specified with the ADMINISTRATORS keyword. The LIM will expand a group name using the getgrnam(3) call. The first administrator of the expanded list is considered the primary LSF administrator. The primary administrator is the owner of the LSF configuration files, as well as the working files under LSB_SHAREDIR/cluster. If the primary administrator is changed, make sure the owner of the configuration files and the files under LSB_SHAREDIR/cluster are changed as well. All LSF administrators have the same authority to perform actions on LSF daemons, jobs, queues, or hosts in the system.
For backwards compatibility, ClusterManager and Manager are synonyms for ClusterAdmins and ADMINISTRATORS, respectively. It is possible to have both sections present in the same lsf.cluster.cluster file to allow daemons from different LSF versions to share the same file.
If this section is not present, the default LSF administrator is root. For flexibility, each cluster may have its own LSF administrator(s), identified by a username, although the same administrator(s) can be responsible for several clusters.
The ADMINISTRATORS parameter is normally set during the installation procedure.
Use the -l option of the lsclusters(1) command to display all the administrators within a cluster.
The following gives an example of a cluster with three LSF administrators. The user listed first, user2, is the primary administrator.
Begin ClusterAdmins
ADMINISTRATORS = user2 lsfgrp user7
End ClusterAdmins
The Host section is the last section in lsf.cluster.cluster and is the only required section. It lists all the hosts in the cluster and gives configuration information for each host.
The order in which the hosts are listed in this section is important. The LIM on the first host listed becomes the master LIM if this host is up; otherwise, that on the second becomes the master if its host is up, and so on.
Since the master LIM makes all placement decisions for the cluster, you want it on a fast machine. Also, to avoid the delays involved in switching masters if the first machine goes down, you want the master to be on a reliable machine. It is desirable to arrange the list so that the first few hosts are always in the same subnet. This avoids the situation where the second host takes over as master when there are communication problems between subnets.
Configuration information is of two types. Some fields in a host entry simply describe the machine and its configuration. Other fields set thresholds for various resources. Both types are listed below.
The HOSTNAME, model, type, and RESOURCES fields must be defined in the Host section. The server, nd, RUNWINDOW, and REXPRI fields are optional.
HOSTNAME - the official name of the host as returned by hostname(1). Must be listed in lsf.shared as belonging to this cluster.
model - host model. Must be one of those defined in the lsf.shared file. This determines the CPU speed scaling factor applied in load and placement calculations.
type - a host type as defined in the HostType section of lsf.shared. The strings used for host types are decided by the system administrator, e.g. SPARC, DEC, HPPA. The host type is used to identify binary-compatible hosts.
The host type is used as the default resource requirement. That is, if no resource requirement is specified in a placement request then the task is run on a host of the same type as the sending host.
Often one host type can be used for many machine models. For example, the host type name SUN41 might be used for any computer with a SPARC processor running SunOS 4.1. This would include many Sun models and quite a few from other vendors as well.
server - 1 if the host can receive jobs from other hosts; 0 otherwise. If server is set to 0, the host is an LSF client. Client hosts do not run the LSF daemons. Client hosts can submit jobs to an LSF cluster, but cannot execute jobs sent from other hosts. If this field is not defined, the default is 1.
RESOURCES - boolean resources available on this host. The resource names are strings defined in the Resource section of the file lsf.shared. You may list any number of resources, enclosed in parentheses and separated by blanks or tabs. For example, (fs frame hpux).
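Putting these fields together, a Host section might look like the following sketch; the host names are illustrative and reuse the models, types, and resources defined in the earlier lsf.shared examples:
Begin Host
HOSTNAME  model     type   server  RESOURCES
hostA     SparcIPC  SUN41  1       (sparc sunos4)
hostD     Sparc10   SUN41  1       (sparc solaris frame)
End Host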
This file allows specified users to perform certain privileged operations in the LSF cluster, either as the superuser or as another designated user. This file is optional.
The lsf.sudoers file must be located in /etc, and it must be owned by root.
The format of this file is very similar to that of the lsf.conf file (see `The lsf.conf File' on page 75). Each line of the file is a NAME=VALUE statement, where NAME describes an authorized operation and VALUE is a single string or multiple strings enclosed in quotes. Lines starting with `#' are comments and are ignored.
The currently recognized variables in this file include:
This parameter enables a list of specified users to start up LSF daemons as root using the LSF administrative commands lsadmin and badmin.
By default, the superuser is the only user who can start up the LSF daemons as root.
Note that lsadmin and badmin must be installed as setuid root programs for this to work. Possible values for this variable include:
all_admins - allows all LSF administrators configured in the lsf.cluster.cluster file to start up LSF daemons as root by running the lsadmin and badmin commands.
user1 user2 - allows the listed user(s) to perform the startup operations. If this list contains more than one user, it must be enclosed in quotes. For example:
LSF_STARTUP_USERS="user1 user2"
Defining LSF_STARTUP_USERS as all_admins incurs some security risk, because administrators can be configured by a primary LSF administrator who is not root. You should explicitly list the login names of all authorized administrators here so that you have full control over who can start daemons as root.
The absolute path name of the directory where the server binaries, namely lim, res, and sbatchd, are installed. This is normally LSF_SERVERDIR as defined in your lsf.conf file. LSF will allow the users defined in LSF_STARTUP_USERS to start the daemons installed in the LSF_STARTUP_PATH directory as root.
Both LSF_STARTUP_USERS and LSF_STARTUP_PATH must be defined for this feature to work.
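For example, a complete lsf.sudoers sketch enabling two named administrators to start the daemons might read as follows; the user names and installation path are illustrative:
LSF_STARTUP_USERS="user2 user7"
LSF_STARTUP_PATH=/usr/local/lsf/etc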
This parameter defines the authorized user for the LSF JobScheduler queue-level pre-execution and post-execution commands. These commands can be configured at the queue level by the LSF administrator. If LSB_PRE_POST_EXEC_USER is defined, the queue-level pre-execution and post-execution commands will be run as that user. If this parameter is not defined, the commands will be run as the user who submitted the job. In particular, you can define this variable if you need to run commands as root.
See `Pre- and Post-Execution Commands' on page 17 for details of pre-execution and post-execution.
You can only define a single user name in this parameter.
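For example, to run queue-level pre- and post-execution commands as root, the lsf.sudoers entry would be:
LSB_PRE_POST_EXEC_USER=root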
This defines the user name used to run the external authentication executable, eauth. If this parameter is not defined, then eauth will be run as the primary LSF administrator.
This defines the user name used to run the external execution command, eexec. If this parameter is not defined, then eexec will be run as the user who submitted the job.
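Assuming these two definitions carry their conventional lsf.sudoers names, LSF_EAUTH_USER and LSF_EEXEC_USER, an illustrative sketch would be (the account name lsfadmin is hypothetical):
LSF_EAUTH_USER=lsfadmin
LSF_EEXEC_USER=lsfadmin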