

2. Managing LSF Base

This chapter describes the operation, maintenance and tuning of an LSF Base cluster. The correct operation of LSF Base is essential to LSF JobScheduler. This chapter should be read by all LSF cluster administrators.

Managing Error Logs

Error logs contain important information about daemon operations. When you see any abnormal behavior related to any of the LSF daemons, you should check the relevant error logs to find out the cause of the problem.

LSF log files grow over time. These files should occasionally be cleared, either by hand or using automatic scripts. You can also define a calendar-driven job to do the cleanup regularly.

LSF JobScheduler Daemon Error Log

All LSF log files are reopened each time a message is logged, so if you rename or remove a log file of an LSF daemon, the daemons will automatically create a new log file.

The LSF daemons log messages when they detect problems or unusual situations. The daemons can be configured to put these messages into files.

The daemons can also be configured to send error messages to the system error logs using the syslog facility.

If LSF_LOGDIR is defined in the lsf.conf file, LSF daemons try to store their messages in files in that directory. Note that LSF_LOGDIR must be writable by root. The error log file names for the LSF Base system daemons, LIM and RES, are lim.log.hostname, and res.log.hostname.

The error log file names for LSF JobScheduler daemons are sbatchd.log.hostname, mbatchd.log.hostname, pim.log.hostname, and eeventd.log.hostname.

LSF daemons log error messages at different levels so that you can choose to log all messages or only messages that are critical enough. This is controlled by the LSF_LOG_MASK parameter in the lsf.conf file. Possible values for this parameter are discussed in `LSF_LOG_MASK' on page 80.

If LSF_LOGDIR is defined but the daemons cannot write to files there, the error log files are created in /tmp on UNIX, or in C:\TEMP on Windows NT.
If LSF_LOGDIR is not defined, errors are logged to syslog using the LOG_DAEMON facility. syslog messages are highly configurable, and the default configuration varies widely from system to system. Start by looking at the file /etc/syslog.conf, and read the manual pages for syslog and/or syslogd.
If LSF daemons cannot find the lsf.conf file when they start, they cannot find the definition of LSF_LOGDIR. In this case, error messages go to syslog. If you cannot find any error messages in the log files, they are likely in the syslog.
On Windows NT, which has no syslog facility, LSF_LOGDIR must be defined, or all error messages will be lost.
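
For example, the logging-related lines in lsf.conf might look like the following. The directory path is only an illustration, and LOG_WARNING is shown as a typical value; see `LSF_LOG_MASK' on page 80 for the full list of levels.

# Directory for LSF daemon error logs (must exist and be writable by root)
LSF_LOGDIR=/usr/local/lsf/log
# Only log messages at this level or above
LSF_LOG_MASK=LOG_WARNING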

FLEXlm Log

The FLEXlm license server daemons log messages about the state of the license servers, and when licenses are checked in or out. This log helps to resolve problems with the license servers and to track license use.

The FLEXlm log is configured by the lsflicsetup command. This log file grows over time. You can remove or rename the existing FLEXlm log file at any time; the script that lsf_license uses to run the FLEXlm daemons creates a new log file when necessary.

Note

If you already have a FLEXlm server running for other products and the LSF JobScheduler licenses are added to the existing license file, the FLEXlm log messages go to the same place that you previously set up for the other products.

Controlling LIM and RES Daemons

The LSF cluster administrator can monitor the status of the hosts in a cluster, start and stop the LSF daemons, and reconfigure the cluster. Many operations are done using the lsadmin command, which performs administrative operations on LSF Base daemons, LIM and RES.

Checking Host Status

The lshosts and lsload commands report the current status and load levels of hosts in an LSF cluster. The lsmon and xlsmon commands provide a running display of the same information. The LSF administrator can find unavailable or overloaded hosts with these tools.

% lsload
HOST_NAME  status  r15s   r1m  r15m   ut    pg   ls   it   tmp   swp   mem
hostD          ok   1.3   1.2   0.9  92%   0.0    2   20    5M  148M   88M
hostB         -ok   0.1   0.3   0.7  0%    0.0    1   67   45M   25M   34M
hostA        busy   8.0  *7.0   4.9  84%   4.6    6   17    1M   81M   27M

When the status of a host is preceded by a `-', the RES on that host is not running. In the above example, the RES on hostB is down.

Restarting LIM and RES

LIM and RES can be restarted to upgrade software or clear persistent errors. Jobs running on the host are not affected by restarting the daemons. The LIM and RES daemons are restarted using the lsadmin command:

% lsadmin
lsadmin>limrestart hostD
Checking configuration files ...
No errors found.
Restart LIM on <hostD> ...... done
lsadmin>resrestart hostD
Restart RES on <hostD> ...... done
lsadmin>quit

Note

You must log in as the LSF cluster administrator to run the lsadmin command.

The lsadmin command can be applied to all available hosts by using the host name all, as follows:

% lsadmin limrestart all

If a daemon is not responding to network connections, lsadmin displays an error message with the host name. In this case you must kill and restart the daemon by hand.
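
For example, on the unresponsive host you might locate and restart the LIM along the following lines. This is only a sketch; the ps options shown assume a System V-style ps, and the daemons must be restarted as root (or with LSF_SERVERDIR/lsf_daemons start).

% ps -ef | grep lim | grep -v grep     # find the process ID of the LIM
% kill <LIM process ID>
% LSF_SERVERDIR/lim                    # restart the LIM as root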

Remote Startup of LIM and RES

LSF administrators can start any or all LSF daemons on any or all LSF hosts, from any host in the LSF JobScheduler cluster. For this to work, the /etc/lsf.sudoers file must be set up properly so that you can start the daemons as root, and you must be able to run rsh across LSF hosts without having to enter a password. See `The lsf.sudoers File' on page 89 for configuration details of lsf.sudoers.

The `limstartup' and `resstartup' options in lsadmin start the LIM and RES daemons, respectively. Specifying a host name starts the daemon on that particular host. For example:

% lsadmin limstartup hostA
Starting up LIM on <hostA> ...... done
% lsadmin resstartup hostA
Starting up RES on <hostA> ...... done

The lsadmin command can be used to start up all available hosts by using the host name `all'; for example, `lsadmin limstartup all'. All LSF daemons, including LIM, RES, and sbatchd, can be started on all LSF hosts using the command lsfstartup.

Shutting Down LIM and RES

All LSF daemons can be shut down at any time. If the LIM daemon on the current master host is shut down, another host automatically takes over as master. If the RES daemon is shut down while remote interactive tasks are running on the host, the running tasks continue but no new tasks are accepted. To shut down LIM and RES, use the lsadmin command:

% lsadmin
lsadmin>limshutdown hostD
Shut down LIM on <hostD> ...... done
lsadmin>resshutdown hostD
Shut down RES on <hostD> ...... done
lsadmin>quit

You can run lsadmin reconfig while the LSF system is in use; users may be unable to submit new jobs for a short time, but all current remote executions are unaffected.

Managing LSF Configuration

Overview of LSF Configuration Files

LSF configuration consists of several levels:

lsf.conf

This is the generic LSF environment configuration file. It defines general installation parameters so that all LSF executables can find the necessary information. This file is typically installed in the directory in which all LSF server binaries are installed, and a symbolic link is made from a convenient directory defined by the environment variable LSF_ENVDIR, or from the default directory /etc. The file is created by lsfsetup during installation. Note that many of the parameters in this file are machine specific. Detailed contents of this file are described in `The lsf.conf File' on page 75.

LIM Configuration Files

LIM is the kernel of your cluster that provides the single system image to all applications. LIM reads the LIM configuration files and determines your cluster and the cluster master host.

LIM configuration files include lsf.shared and lsf.cluster.cluster, where cluster is the name of your LSF JobScheduler cluster. These files define the host members, general host attributes, and resource definitions for your cluster. The individual functions of each file are described below.

lsf.shared defines the available resource names, host types, host models, cluster names and external load indices that can be used by all clusters. This file is shared by all clusters.

The lsf.cluster.cluster file is a per-cluster configuration file. It contains two types of configuration information: cluster definition information and LIM policy information. Cluster definition information affects all LSF applications, while LIM policy information affects applications that rely on LIM's policy for job placement.

The cluster definition information defines cluster administrators, all the hosts that make up the cluster, attributes of each individual host such as host type, host model, and resources using the names defined in lsf.shared.

LIM policy information defines the load sharing and job placement policies provided by LIM. More details about LIM policies are described in `Controlling LIM and RES Daemons' on page 25.

LIM configuration files are stored in the directory LSF_CONFDIR, as defined in the lsf.conf file. Details of the LIM configuration files are described in `The lsf.shared File' on page 83.

LSF JobScheduler Configuration Files

These files define LSF JobScheduler-specific configuration such as queues and server hosts. They are read only by mbatchd. The LSF JobScheduler configuration relies on the LIM configuration; LSF JobScheduler daemons get the cluster configuration information from the LIM via the LSF API.

LSF JobScheduler configuration files are stored in directory LSB_CONFDIR/cluster, where LSB_CONFDIR is defined in lsf.conf, and cluster is the name of your cluster. Details of LSF JobScheduler configuration files are described in Section 5, `LSF JobScheduler Configuration Reference', beginning on page 93.

Configuration File Formats

All configuration files except lsf.conf use a section-based format. Each file contains a number of sections. Each section starts with a line beginning with the reserved word Begin followed by a section name, and ends with a line beginning with the reserved word End followed by the same section name. Begin, End, section names and keywords are all case insensitive.

Sections can either be vertical or horizontal. A horizontal section contains a number of lines, each having the format: keyword = value, where value is one or more strings. For example:

Begin exampleSection
key1 = string1
key2 = string2 string3
key3 = string4
End exampleSection

Begin exampleSection
key1 = STRING1
key2 = STRING2 STRING3
End exampleSection

In many cases you can define more than one object of the same type by giving more than one horizontal section with the same section name.

A vertical section has a line of keywords as the first line. The lines following the first line are values assigned to the corresponding keywords. Values that contain more than one string must be bracketed with `(' and `)'. The above examples can also be expressed in one vertical section:

Begin exampleSection
key1     key2               key3
string1  (string2 string3)  string4
STRING1  (STRING2 STRING3)  -
End exampleSection

Each line in a vertical section is equivalent to a horizontal section with the same section name.

Some keys in certain sections are optional. For a horizontal section, an optional key does not appear in the section if its value is not defined. For a vertical section, an optional keyword must appear in the keyword line if any line in the section defines a value for that keyword. To specify the default value use `-' or `()' in the corresponding column, as shown for key3 in the example above.

Each line may have multiple columns, separated by either spaces or TAB characters. A line can be extended by a `\' (backslash) at the end of the line. A `#' (pound sign) indicates the beginning of a comment; characters from the `#' to the end of the line are not interpreted. Blank lines are ignored.
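
For example, the following horizontal section (a made-up section, for illustration only) uses a comment and a continued line:

Begin exampleSection
# values used by the examples in this chapter
key1 = string1 string2 \
       string3
End exampleSection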

Example Configuration Files

Below are some examples of LIM configuration and LSF JobScheduler configuration files. Detailed explanations of the variables are given in Section 4, `LSF Base Configuration Reference', beginning on page 75.

Sample lsf.shared file

Begin Cluster
ClusterName                          # This line is keyword(s)
test_cluster
End Cluster
Begin HostType
TYPENAME                             # This line is keyword(s)
hppa
SUNSOL
sgi
rs6000
alpha
NTX86
End HostType
Begin HostModel
MODELNAME          CPUFACTOR          # This line is keyword(s)
HP735              4.0
DEC3000            5.0
ORIGIN2K           8.0
PENTI120           3.0
End HostModel
Begin Resource
RESOURCENAME        DESCRIPTION        #This line is keyword(s)
hpux                (HP-UX operating system)
decunix             (Digital Unix)               
solaris             (Sun Solaris operating system)
NT                  (Windows NT operating system)
fserver             (File Server)
cserver             (Compute Server)  
End Resource

Sample lsf.cluster.test_cluster file

Begin ClusterManager
Manager = lsf user7
End ClusterManager
Begin Host
HostName   Model      Type     server    swp    Resources 
hostA      HP735      hppa     1         2     (fserver hpux)
hostD      ORIGIN2K   sgi      1         2     (cserver)
hostB      PENTI120   NTX86    1         2     (NT)
End Host

Changing LIM Configuration

This section gives procedures for some common changes to the LIM configuration. There is more than one way to make these changes; the following procedures focus on editing the configuration files with a text editor, so that you can understand the concepts behind the configuration changes.

Adding a Host to a Cluster

To add a host to an existing LSF JobScheduler cluster, use the following procedure.

  1. If you are adding a host of a new host type, make sure you do the steps described in `Installing an Additional Host Type' on page 73 of the LSF Installation Guide first.
  2. If you are adding a host of a type for which you already installed LSF binaries, make sure that the LSF binaries, configuration files, and working directories are NFS-mounted on the new host. For each new host you add, follow the host setup procedure as described in `Adding an Additional Host to an Existing Cluster' on page 79 in the LSF Installation Guide.
  3. If you are adding a new host type to the cluster, modify the "HostType" section of the lsf.shared file to add the new host type. A host type can be any alphanumeric string up to 29 characters long.
  4. If you are adding a new host model, modify the "HostModel" section of your lsf.shared file to add in the new model together with its CPU speed factor relative to other models.
  5. For each host you add into the cluster, you should add a line to the "Host" section of the lsf.cluster.cluster file with host name, host type, and all other attributes defined, as shown in `Example Configuration Files' on page 31.

    The master LIM and mbatchd daemons run on the first available host in the "Host" section of your lsf.cluster.cluster file, so you should list reliable batch server hosts first. For more information, see `Fault Tolerance' on page 5.

If you are adding a client host, set the SERVER field for the host to 0 (zero).

  6. Reconfigure your LSF JobScheduler cluster so that LIM knows that you have added a new host to the cluster. Follow the instructions in `Reconfiguring an LSF Cluster' on page 39. If you are adding more than one host, do this step after you have done steps 1 to 5 for all added hosts.
  7. If you are adding hosts as LSF JobScheduler server hosts, add these hosts to the LSF JobScheduler configuration by following the steps described in `Restarting sbatchd' on page 56.
  8. Start the LSF daemons on the newly added host(s) by running the following command, and use ps to make sure that res, lim, and sbatchd have started (a quick check is shown after the caution below):
LSF_SERVERDIR/lsf_daemons start

CAUTION!

The lsf_daemons start command must be run as root. If you are creating a private cluster, do not attempt to use lsf_daemons to start your daemons; start them manually.
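
To confirm that the daemons are running on the newly added host, check for their processes; for example (the ps options shown assume a System V-style ps):

% ps -ef | egrep 'lim|res|sbatchd' | grep -v egrep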

Removing Hosts From a Cluster

To remove a host from an existing LSF JobScheduler cluster, use the following procedure.

  1. If you are running LSF JobScheduler, make sure you first remove the unwanted hosts from LSF JobScheduler, following the steps described in `Restarting sbatchd' on page 56.
  2. Edit your lsf.cluster.cluster file and remove the unwanted hosts from the "Host" section.
  3. Log in to any host in the cluster as the LSF JobScheduler administrator. Run the following command:

    lsadmin resshutdown  host1 host2 ...

    Here, host1, host2, ... are hosts you want to remove from your cluster.

  4. Follow instructions in `Reconfiguring an LSF Cluster' on page 39 to reconfigure your LSF JobScheduler cluster. The LIMs on the removed hosts will quit upon reconfiguration.
  5. Remove the LSF section from the host's system startup files. This undoes what you have done previously to start LSF daemons at boot time. See `Starting LSF Servers at Boot Time' on page 86 in the LSF Installation Guide for details.

Host Resources

Your cluster is most likely heterogeneous. Even if all your computers are of the same type, the cluster can still be heterogeneous: some machines are configured as file servers while others are compute servers; some have more memory, others less; some have four CPUs, others only one; some have host-locked software licenses installed, others do not. LSF JobScheduler provides powerful resource selection mechanisms so that hosts with the required resources are chosen to run your jobs.

Customizing Host Resources

For maximum flexibility, you should characterize your resources clearly enough that users have sufficient choices. For example, if some of your machines are connected to both Ethernet and FDDI, while others are connected only to Ethernet, you probably want to define a resource called fddi and associate it with the machines connected to FDDI. Users can then specify the resource fddi if they want their jobs to run on machines connected to FDDI. A configuration sketch for this example follows the procedure below.

To customize host resources for your cluster, use the following procedure.

  1. Log in to any host in the cluster as the LSF JobScheduler administrator.
  2. Define new resource names by modifying the "Resource" section of the lsf.shared file. Add a brief description to each added resource name. Resource descriptions are displayed to users by the lsinfo command.
  3. If you want to associate an added resource name with an application, edit the lsf.task file to add the resource to the resource requirements of the application. Alternatively, you can leave this to individual users, who can use the lsrtasks command to customize their own task lists.
  4. Edit the lsf.cluster.cluster file to modify the RESOURCES column of the "Host" section so that all hosts that have the added resources will now have the added resource names in that column.
  5. Follow instructions in `Reconfiguring an LSF Cluster' on page 39 to reconfigure your LSF JobScheduler cluster.
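
As a concrete sketch of the fddi example above (the host name and its existing resources are taken from the sample files on page 31), the added lines would look like this:

# Added to the "Resource" section of lsf.shared:
fddi                (Host connected to FDDI)

# In the "Host" section of lsf.cluster.cluster, add fddi to the Resources column:
hostA      HP735      hppa     1         2     (fserver hpux fddi)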

Configuring Resources in LSF Base

Resources are defined in the "Resource" section of the lsf.shared file. The definition of a resource involves specifying a name and description, as well as, optionally, the type of its value, its update interval, and whether a higher or lower value indicates greater availability.

The mandatory resource information fields are RESOURCENAME and DESCRIPTION.

The optional resource information fields are TYPE (the type of the resource's value), INTERVAL (the update interval), and INCREASING (whether a higher or lower value indicates greater availability).

When the optional attributes are not specified, the resource is treated as static and Boolean-valued.

The following is a sample of a "Resource" section from an lsf.shared file:

Begin Resource
RESOURCENAME  TYPE    INTERVAL  INCREASING  DESCRIPTION
mips          Boolean ()        ()          (MIPS architecture)
dec           Boolean ()        ()          (DECStation system)
sparc         Boolean ()        ()          (SUN SPARC)
hppa          Boolean ()        ()          (HPPA architecture)
bsd           Boolean ()        ()          (BSD unix)
sysv          Boolean ()        ()          (System V UNIX)
hpux          Boolean ()        ()          (HP-UX UNIX)
aix           Boolean ()        ()          (AIX UNIX)
nt            Boolean ()        ()          (Windows NT)
scratch       Numeric 30        N           (Shared scratch space on server)
synopsys      Numeric 30        N           (Floating licenses for Synopsys)
verilog       Numeric 30        N           (Floating licenses for Verilog)
console       String  30        N           (User Logged in on console)
End Resource

There is no distinction between shared and non-shared resources in the resource definition in the lsf.shared file.

Note

The NewIndex section in the lsf.shared file is obsolete. To achieve the same effect, the "Resource" section of the lsf.shared file can be used to define a dynamic numeric resource, and the "default" keyword can be used in the LOCATION field of the "ResourceMap" section of the lsf.cluster.cluster file.
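
For example, to achieve what a NewIndex definition used to do, you could add a dynamic numeric resource and map it to every host along the following lines. This is only a sketch; the resource name myindex and the 60-second interval are hypothetical.

# Added to the "Resource" section of lsf.shared:
myindex       Numeric 60        Y           (Site-defined external load index)

# Added to the "ResourceMap" section of lsf.cluster.cluster:
myindex        [default]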

Associating Resources with Hosts

Resources are associated with the host(s) on which they are available in the "ResourceMap" section of the lsf.cluster.cluster file (where cluster is the name of the cluster). The RESOURCENAME and LOCATION fields must be completed for each resource.

The following is an example of a "ResourceMap" section from an lsf.cluster.cluster file:

Begin ResourceMap
RESOURCENAME   LOCATION
verilog        5@[all]
synopsys       (2@[apple] 2@[others])
console        (1@[apple] 1@[orange])
End ResourceMap

The possible states of a resource that may be specified in the LOCATION column are:

For static resources, the LOCATION column should contain the value of the resource.

The syntax of the information in the fields of the LOCATION column takes one of two forms. For static resources, where the value must be specified, use:

(value1@[host1 host2 ...] value2@[host3 host4] ...)

For dynamic resources, where the value is updated by an ELIM, use:

([host1 host2 ...] [host3 host4 ...] ...)

Each set of hosts listed within the square brackets specifies an instance of the resource. All hosts within the instance share the resource whose quantity is indicated by its value. In the above example, host1, host2,... form one instance of the resource, host3, host4,... form another instance and so on.

Note

The same host cannot be in more than one instance of a resource.

Three pre-defined words have special meaning in this specification: all refers to all the server hosts in the cluster, others refers to all the hosts not otherwise listed for the resource, and default refers to each host in the cluster individually, that is, one instance of the resource per host.

These syntax examples assume that static resources (requiring values) are being specified. For dynamic resources, use the same syntax but omit the value.

The following items should be taken into consideration when configuring resources under LSF Base.

In the lsf.cluster.cluster file, the "Host" section must precede the "ResourceMap" section since the "ResourceMap" section uses the hostnames defined in the "Host" section.

If the "ResourceMap" section is not defined, then any dynamic resources specified in lsf.shared are considered to be host-based (i.e the resource is available on each host in the cluster).

Reconfiguring an LSF Cluster

After changing the LIM configuration files, you must tell LIM to read the new configuration. Use the lsadmin command to make LIM pick up the new configuration.

Operations can be specified on the command line or entered at a prompt. Run the lsadmin command with no arguments, and enter help to see the available operations.

The lsadmin reconfig command checks the LIM configuration files for errors. If no errors are found, the command confirms that you want to restart the LIMs on all hosts, and reconfigures all the LIM daemons:

% lsadmin reconfig
Checking configuration files ...
No errors found.
Do you really want to restart LIMs on all hosts? [y/n] y
Restart LIM on <hostD> ...... done
Restart LIM on <hostA> ...... done
Restart LIM on <hostC> ...... done


In the above example no errors are found. If any non-fatal errors are found, the command asks you to confirm the reconfiguration. If fatal errors are found, the reconfiguration is aborted.

If you want to see details on any errors, run the command lsadmin ckconfig -v. This reports all errors to your terminal.

If you change the configuration file of LIM, you should also reconfigure LSF JobScheduler by running badmin reconfig because LSF JobScheduler depends on LIM configuration. If you change the configuration of LSF JobScheduler, then you only need to run badmin reconfig.
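
For example, after editing a LIM configuration file, the complete sequence is:

% lsadmin reconfig
% badmin reconfig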

External Resource Collection

The values of static external resources are specified through the lsf.cluster.cluster file. All dynamic resources, regardless of whether they are shared or host-based, are collected through an ELIM. An ELIM is started on a host if that host has host-based dynamic resources defined, and on the master host if any cluster-wide (shared) dynamic resources are defined.

Note that at most one ELIM is started on each host, regardless of the types of resources it reports on. If only cluster-wide resources are being used, an ELIM is started only on the master host. To make it possible to write a single ELIM for all hosts that reports on a combination of shared and non-shared resources, certain variables are set in the ELIM's environment.

Restrictions

The following restrictions apply to the use of shared resources in LSF products.

Writing an External LIM

The ELIM can be any executable program, either an interpreted script or compiled code. Example code for an ELIM is included in the misc directory in the LSF distribution. The elim.c file is an ELIM written in C. You can customize this example to collect the load indices you want.

The ELIM communicates with the LIM by periodically writing a load update string to its standard output. The load update string contains the number of indices followed by a list of name-value pairs in the following format:

N name1 value1 name2 value2 ... nameN valueN

For example:

3 tmp2 47.5 nio 344.0 licenses 5

This string reports 3 indices: tmp2, nio, and licenses, with values 47.5, 344.0, and 5 respectively. Index values must be numbers between -INFINIT_LOAD and INFINIT_LOAD as defined in the lsf.h header file.

If the ELIM is implemented as a C program, as part of initialization it should use setbuf(3) to establish unbuffered output to stdout.

The ELIM should ensure that the entire load update string is written successfully to stdout. This can be done by checking the return value of printf(3s) if the ELIM is implemented as a C program or the return code of /bin/echo(1) from a shell script. The ELIM should exit if it fails to write the load information.

Each LIM sends updated load information to the master every 15 seconds. Depending on how quickly your external load indices change, the ELIM should write the load update string once every 15 seconds at most. If the external load indices rarely change, the ELIM can write the new values only when a change is detected. The LIM continues to use the old values until new values are received.
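
As an illustration, the following Bourne shell ELIM reports a single external index every 15 seconds. This is only a sketch, not the elim.c example shipped in the misc directory; the index name scratch, the file system path, and the df output format are assumptions for this example.

#!/bin/sh
# Example ELIM: report one external index every 15 seconds.
# Assumes an index named "scratch" is defined in the LIM configuration.
while true
do
    # Free space (in megabytes) in a shared scratch file system.
    space=`df -k /share/scratch | awk 'NR==2 {printf "%d", $4/1024}'`
    # Load update string: <number of indices> <name1> <value1> ...
    echo "1 scratch $space"
    if [ $? -ne 0 ]; then
        exit 1              # exit if the write to stdout fails
    fi
    sleep 15
done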

The executable for the ELIM must be in LSF_SERVERDIR and must have the name `elim'. If any external load indices are defined in the LIM configuration file, the LIM invokes the ELIM automatically on startup. The ELIM runs with the same user id and file access permission as the LIM.

The LIM restarts the ELIM if it exits; to prevent problems in case of a fatal error in the ELIM, it is restarted once every 90 seconds at most. When the LIM terminates, it sends a SIGTERM signal to the ELIM. The ELIM must exit upon receiving this signal.

Overriding Built-In Load Indices

The ELIM can also return values for the built-in load indices. In this case the value produced by the ELIM overrides the value produced by the LIM. The ELIM must ensure that the semantics of any index it supplies is the same as that of the corresponding index returned by the lsinfo(1) command.

For example, some sites prefer to use /usr/tmp for temporary files. To override the tmp load index, write a program that periodically measures the space in the /usr/tmp file system, and writes the value to standard output. Name this program elim and put it in the LSF_SERVERDIR directory.

Note

The name of an external load index must not be one of the resource name aliases cpu, idle, logins, or swap. To override one of these indices, use its formal name: r1m, it, ls, or swp.

You must configure the external load index even if you are overriding a built-in load index.

Tuning CPU Factors

CPU factors are used to differentiate the relative speed of different machines. LSF JobScheduler runs jobs on the best possible machines so that the response time is minimized. To achieve this, it is important that you define correct CPU factors for each machine model in your cluster by changing the "HostModel" section of your lsf.shared file.

CPU factors should be set based on a benchmark that reflects your work load. (If there is no such benchmark, CPU factors can be set based on raw CPU power.) The CPU factor of the slowest host should be set to 1, and the factors of faster hosts should be proportional to their speed relative to the slowest. For example, consider a cluster with two hosts, hostA and hostB, where hostA takes 30 seconds to run your favourite benchmark and hostB takes 15 seconds to run the same test. hostA should have a CPU factor of 1, and hostB (since it is twice as fast) should have a CPU factor of 2.
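
For the two-host example above, the "HostModel" section of lsf.shared would contain entries along the following lines; the model names are hypothetical.

Begin HostModel
MODELNAME          CPUFACTOR
MODEL_A            1.0
MODEL_B            2.0
End HostModel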

LSF JobScheduler uses a normalized CPU performance rating to decide which host has the most available CPU power. The normalized ratings can be seen by running the lsload -N command. The hosts in your cluster are displayed in order from best to worst. Normalized CPU run queue length values are based on an estimate of the time it would take each host to run one additional unit of work, given that an unloaded host with CPU factor 1 runs one unit of work in one unit of time.

Incorrect CPU factors can reduce performance in two ways. If the CPU factor for a host is too low, that host may not be selected for job placement when a slower host is available. This means that jobs would not always run on the fastest available host. If the CPU factor is too high, jobs are run on the fast host even when they would finish sooner on a slower but lightly loaded host. This causes the faster host to be overused while the slower hosts are underused.

Both of these conditions are self-correcting to some extent. If the CPU factor for a host is too high, jobs are sent to that host until the CPU load threshold is reached. The LIM then marks that host as busy, and no further jobs will be sent there. If the CPU factor is too low, jobs may be sent to slower hosts. This increases the load on the slower hosts, making LSF JobScheduler more likely to schedule future jobs on the faster host.

LSF License Management

LSF software is licensed using the FLEXlm license manager from Globetrotter Software, Inc. The LSF license key controls the hosts allowed to run LSF.

The procedures for obtaining, installing, and upgrading license keys are described in `Getting License Key Information' on page 34 and `Setting Up the License Key' on page 36 of the LSF Installation Guide.
Information on installing and upgrading license keys can also be found in `License Installation Options' on page 56 of the LSF Installation Guide.

FLEXlm controls the total number of hosts configured in all your LSF clusters. You can organize your hosts into clusters in whatever fashion you choose. Each server host requires at least one license. Multiprocessor hosts require more than one, as a function of the number of processors. Each client host requires 1/5 of a license.

LSF uses two kinds of FLEXlm license: file-based DEMO licenses and server-based permanent licenses.

DEMO Licenses

The DEMO license allows you to try LSF out on an unlimited number of hosts on any supported host type. The trial period has a fixed expiry date, and the LSF software will not function after that date. DEMO licenses do not require any additional daemons.

Permanent Licenses

Permanent licenses are the most common. A permanent license limits only the total number of hosts that can run the LSF software, and normally has no time limit. You can choose which hosts in your network will run LSF, and how they are arranged into clusters. Permanent licenses are counted by a license daemon running on one host on your network.

For permanent licenses, you need to choose a license server host and send hardware host identification numbers for the license server host to your software vendor. The vendor uses this information to create a permanent license that is keyed to the license server host. Some host types have a built-in hardware host ID; on others, the hardware address of the primary LAN interface is used.

How FLEXlm Works

FLEXlm is used by many UNIX software packages because it provides a simple and flexible method for controlling access to licensed software. A single FLEXlm license server can handle licenses for many software packages, even if those packages come from different vendors. This reduces the systems administration load, since you do not need to install a new license manager every time you get a new package.

The License Server Daemon

FLEXlm uses a daemon called lmgrd to manage permanent licenses. This daemon runs on one host on your network, and handles license requests from all applications. Each license key is associated with a particular software vendor. lmgrd automatically starts a vendor daemon; the LSF version is called lsf_ld and is provided by Platform Computing Corporation. The vendor daemon keeps track of all licenses supported by that vendor. DEMO licenses do not require you to run license daemons.

The license server daemons should be run on a reliable host, since licensed software will not run if it cannot contact the license server. The FLEXlm daemons create very little load, so they are usually run on the file server. If you are concerned about availability, you can run lmgrd on a set of three hosts. As long as a majority of the license server hosts are available, applications can obtain licenses.

The License File

The license file is named license.dat.

Location

Software licenses are stored in a text file.

On UNIX, the default location for this license file is /usr/local/flexlm/licenses/license.dat.
On Windows NT, the default location is c:\flexlm\license.dat.

This may be different at your site, depending on what decisions were made when FLEXlm was initially installed.

The license file must be accessible from every host that runs licensed software. Normally, it is most convenient to place the license file in a shared directory. The variable LSF_LICENSE_FILE in the lsf.conf file should point to this location, allowing LSF to locate the license file.
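
For example, lsf.conf might contain a line such as the following; the path shown is the default UNIX location, and yours may differ.

LSF_LICENSE_FILE=/usr/local/flexlm/licenses/license.dat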

Port@Host Configuration

An alternative to specifying a file pathname in the LSF_LICENSE_FILE variable is to use "port@host" notation to indicate the name of the license server host and port being used by the lmgrd daemon. For example:

LSF_LICENSE_FILE="1700@hostD, 1700@hostC, 1700@hostB"

The port number must be the same as that specified in the license file.

Contents

The license.dat file normally contains SERVER and DAEMON lines identifying the license server host(s) and the vendor daemon, and one or more FEATURE lines describing the licensed products.

The FEATURE line contains an encrypted code to prevent tampering. For permanent licenses, the licenses granted by the FEATURE line can be accessed only through license servers listed on the SERVER lines.

For DEMO licenses no FLEXlm daemons are needed, so the license file contains only the FEATURE line.

Sample License Files

This sample DEMO license file contains one line for each separate product (see `Modifying LSF Products and Licensing' on page 49). However, no SERVER or DAEMON information is needed. The license is for LSF version 3.1 and is valid until Jun. 10, 1998.

FEATURE lsf_base lsf_ld 3.100 10-Jun-1998 0 5C51F231E238555BAD7F "Platform" DEMO
FEATURE lsf_jobscheduler lsf_ld 3.100 10-Jun-1998 0 6CC1D2C137651068E23C "Platform" DEMO
FEATURE lsf_jobscheduler_server lsf_ld 3.100 10-Jun-1998 0 6CC1D2C137651068E23C "Platform" DEMO
FEATURE lsf_multicluster lsf_ld 3.100 10-Jun-1998 0 2CC1F2E132C85B8D1806 "Platform" DEMO

In this sample permanent license file, the license server is configured to run on hostD, using TCP port 1700. This allows 10 hosts to run LSF, with no expiry date.

SERVER hostD 08000962cc47 1700
DAEMON lsf_ld /usr/local/lsf/etc/lsf_ld
FEATURE lsf_base lsf_ld 3.100 01-Jan-0000 0 51F2315CE238555BAD7F "Platform"
FEATURE lsf_jobscheduler lsf_ld 3.100 01-Jan-0000 0 C1D2C1376C651068E23C "Platform"
FEATURE lsf_jobscheduler_server lsf_ld 3.100 01-Jan-0000 0 C1D2C1376C651068E23C "Platform"
FEATURE lsf_multicluster lsf_ld 3.100 01-Jan-0000 0 C1F2E1322CC85B8D1806 "Platform"


License Management Utilities


FLEXlm provides several utility programs for managing software licenses. These utilities and their manual pages are included in the LSF software distribution.

Because these utilities can be used to shut down the FLEXlm license server, and thus prevent licensed software from running, they are installed in the LSF_SERVERDIR directory. The file permissions are set so that only root and members of group 0 can use them.

The utilities included are:

lmcksum - calculate check sums of the license key information

lmdown - shut down the FLEXlm server

lmhostid - display the hardware host ID

lmremove - remove a feature from the list of checked out features

lmreread - tell the license daemons to re-read the license file

lmstat - display the status of the license servers and checked out licenses

lmver - display the FLEXlm version information for a program or library

For complete details on these commands, see the on-line manual pages.

Updating an LSF License

FLEXlm only accepts one license key for each feature listed in a license key file. If there is more than one FEATURE line for the same feature, only the first FEATURE line is used. To add hosts to your LSF cluster, you must replace the old FEATURE line with a new one listing the new total number of licenses.

The procedure for updating a license key file to include new license keys is described in `Adding a Permanent License' on page 40 of the LSF Installation Guide.

Changing the FLEXlm Server TCP Port

The fourth field on the SERVER line specifies the TCP port number that the FLEXlm server uses. Choose an unused port number. LSF usually uses port numbers in the range 3879 to 3882, so the numbers from 3883 on are good choices. If the lmgrd daemon complains that the license server port is in use, you can choose another port number and restart lmgrd.

For example, if your license file contains the line:

SERVER hostname host-id 1700

and you want your FLEXlm server to use TCP port 3883, change the SERVER line to:

SERVER hostname host-id 3883

Modifying LSF Products and Licensing

LSF Suite V3.1 includes the following products: LSF Base, LSF Batch, LSF JobScheduler, LSF MultiCluster, LSF Make, and LSF Analyzer.

The configuration changes to enable a particular product in a cluster are handled during installation by lsfsetup. If at some later time you want to modify the products in your cluster, edit the PRODUCTS line in the `Parameters' section of the lsf.cluster.cluster file. You can specify any combination of the strings `LSF_Base', `LSF_Batch', `LSF_JobScheduler', `LSF_MultiCluster', and `LSF_Analyzer'. If any of `LSF_Batch', `LSF_JobScheduler', or `LSF_MultiCluster' are specified, then `LSF_Base' is automatically enabled as well.

If the lsf.cluster.cluster file is shared, adding a product name to the PRODUCTS line enables that product for all hosts in the cluster. For example, the following enables the operation of LSF Base, LSF Batch, and LSF MultiCluster:

Begin Parameters
PRODUCTS=LSF_Base LSF_Batch LSF_MultiCluster
End Parameters

Enable the operation of LSF Base only:

Begin Parameters
PRODUCTS=LSF_Base
End Parameters

Enable the operation of LSF JobScheduler:

Begin Parameters
PRODUCTS=LSF_JobScheduler
End Parameters

Selected Hosts

It is possible to indicate that only certain hosts within a cluster run LSF Batch or LSF JobScheduler. This is done by specifying `LSF_Batch' or `LSF_JobScheduler' in the RESOURCES field of the "Host" section of the lsf.cluster.cluster file. For example, the following enables hosts hostA, hostB, and hostC to run LSF JobScheduler and hosts hostD, hostE, and hostF to run LSF Batch.

Begin Parameters
PRODUCTS=LSF_Batch
End Parameters
Begin   Host
HOSTNAME   model   type      server RESOURCES
hostA      SUN41   SPARCSLC  1      (sparc bsd LSF_JobScheduler)
hostB      HPPA9   HP735     1      (linux LSF_JobScheduler)
hostC      SGI     SGIINDIG  1      (irix cs LSF_JobScheduler)
hostD      SUNSOL  SunSparc  1      (solaris)
hostE      HP_UX   A900      1      (hpux cs bigmem)
hostF      ALPHA   DEC5000   1      (alpha)
End Host

The license file used to serve the cluster must have the corresponding features. A host shows as unlicensed if the license for the product it was configured to run is unavailable. For example, if a cluster is configured to run LSF JobScheduler on all hosts, and the license file does not contain the LSF JobScheduler product, then the hosts will be unlicensed, even if there are licenses for LSF Base or LSF Batch.




