This section describes the procedures for starting the LSF daemons, testing the LSF cluster configuration, and making LSF available to users at your site. These procedures can be performed only after the LSF software has been installed and the hosts have been configured individually (see 'Default Installation' on page 13 or 'Custom Installation' on page 23).
 
Before you start any LSF daemons, make sure that your cluster configuration is correct. The lsfsetup program includes an option to check the LSF configuration. The default LSF configuration should work as installed by the steps described in 'Default Installation Procedures' on page 14.

To check the configuration, log in as the LSF administrator on any host in the cluster defined by lsf.cluster.cluster (cluster is the name of the cluster) and run:

lsadmin ckconfig -v

The lsadmin program is located in the LSF_TOP/bin directory.
The output should look something like the following:
Checking configuration files ...
LSF v3.1, Sept 10, 1997
Copyright 1992-1997 Platform Computing Corporation
Reading configuration from /etc/lsf.conf
Dec 21 21:15:51 13412 /usr/local/lsf/etc/lim -C
Dec 21 21:15:52 13412 initLicense: Trying to get license for LIM from source
</usr/local/lsf/conf/license.dat>
Dec 21 21:15:52 13412 main: Got 1 licenses
Dec 21 21:15:52 13412 main: Configuration checked. No fatal errors found.
---------------------------------------------------------
No errors found.
The messages shown above are the normal output from lsadmin ckconfig -v. Other messages may indicate problems with the LSF configuration. Both LSF Batch and LSF JobScheduler require this check to be made.
To check the LSF Batch configuration files, LIM must be running on the master host. If LIM is not running, start it by running LSF_SERVERDIR/lim, then use the lsid program to make sure LIM is available. The lsid program is located in the LSF_TOP/bin directory. Then run:

badmin ckconfig -v
    The output should look something like the following:
Checking configuration files ...
Dec 21 21:22:14 13545 mbatchd: LSF_ENVDIR not defined; assuming /etc
Dec 21 21:22:15 13545 minit: Trying to call LIM to get cluster name ...
Dec 21 21:22:17 13545 readHostFile: 3 hosts have been specified in file 
</usr/local/lsf/conf/lsbatch/test_cluster/configdir/lsb.hosts>; only these 
hosts will be used by lsbatch
Dec 21 21:22:17 13545 Checking Done
---------------------------------------------------------
No fatal errors found.
      
  The above messages are normal; other messages may indicate problems with the LSF configuration.
 
The LSF daemons can be started using the lsf_daemons program. This program must be run from the root account, so if you are starting daemons for a private cluster, do not use lsf_daemons: start the daemons manually instead.
To start the daemons, run:

lsf_daemons start
Make sure that the res, lim, and sbatchd processes have started, using the ps command.
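For example, on many UNIX systems a command along the following lines lists the daemon processes (the ps options and the exact egrep pattern vary by platform, so treat this as a sketch rather than the required form):

ps -ef | egrep 'lim|res|sbatchd'

Each of the three daemons should appear in the listing, typically running from LSF_SERVERDIR.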
If you choose, you can start the LSF daemons for all machines using the lsadmin and badmin commands instead of lsf_daemons. Execute the following commands in order:

lsadmin limstartup
lsadmin resstartup
badmin hstartup
lsfsetup creates a default LSF Batch configuration (including 
      a set of batch queues) which is used by both LSF Batch and LSF JobScheduler. 
      You do not need to change any LSF Batch files to use the default configuration.
After you have started the LSF daemons in your cluster, you should run some simple tests. Wait a minute or two for all the LIMs to get in touch with each other, to elect a master, and to exchange some setup information.
 
The testing should be performed as a non-root user. This user's PATH must include the LSF user binaries (LSF_BINDIR as defined in LSF_ENVDIR/lsf.conf).
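A quick way to confirm this, assuming LSF_BINDIR is /usr/local/lsf/bin (the installation directory used in the sample output earlier in this section), is:

% which lsid
/usr/local/lsf/bin/lsid

If the command is not found, add LSF_BINDIR to the PATH before continuing.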
Testing consists of running a number of LSF commands and making sure that correct results are reported for all hosts in the cluster. This section shows suggested tests and examples of correct output. The output you see on your system will reflect your local configuration.
The following steps may be performed from any host in the cluster.
The lsid command displays the LSF version, cluster name, and master host name:

% lsid
LSF 3.1, Dec 10, 1997
Copyright 1992-1997 Platform Computing Corporation

My cluster name is test_cluster
My master name is hostA
The master name may vary but is usually the first host configured in the 
      Hosts section of the lsf.cluster.cluster 
      file.  
If the LIM is not available on the local host, lsid displays the following message:

lsid: ls_getmastername failed: LIM is down; try later

If the LIM is not running, try running lsid a few more times.
The error message

lsid: ls_getmastername failed: Cannot locate master LIM now, try later

means that the local LIM is running, but the master LIM has not contacted the local LIM yet. Check the LIM on the first host listed in lsf.cluster.cluster. If it is running, wait for 30 seconds and try lsid again. Otherwise, another LIM will take over after one or two minutes.
The lsinfo command displays cluster-wide configuration information:

% lsinfo
RESOURCE_NAME TYPE ORDER DESCRIPTION
r15s Numeric Inc 15-second CPU run queue length
r1m Numeric Inc 1-minute CPU run queue length (alias: cpu)
r15m Numeric Inc 15-minute CPU run queue length
ut Numeric Inc 1-minute CPU utilization (0.0 to 1.0)
pg Numeric Inc Paging rate (pages/second)
ls Numeric Inc Number of login sessions (alias: login)
it Numeric Dec Idle time (minutes) (alias: idle)
tmp Numeric Dec Disk space in /tmp (Mbytes)
mem Numeric Dec Available memory (Mbytes)
ncpus Numeric Dec Number of CPUs
maxmem Numeric Dec Maximum memory (Mbytes)
maxtmp Numeric Dec Maximum /tmp space (Mbytes)
cpuf Numeric Dec CPU factor
type String N/A Host type
model String N/A Host model
status String N/A Host status
server Boolean N/A LSF server host
cserver Boolean N/A Compute Server
solaris Boolean N/A Sun Solaris operating system
fserver Boolean N/A File Server
NT Boolean N/A Windows NT operating system
TYPE_NAME
hppa
SUNSOL
alpha
sgi
NTX86
rs6000

MODEL_NAME CPU_FACTOR
HP735 4.0
ORIGIN2K 8.0
DEC3000 5.0
PENT200 3.0
The resource names, host types, and host models should be those configured 
      in LSF_CONFDIR/lsf.shared.
The lshosts command displays configuration information about your hosts:
    % lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
hostA hppa HP735 4.00 1 128M 256M Yes (fserver hpux)
hostD sgi ORIGIN2K 8.00 32 512M 1024M Yes (cserver)
hostB NTX86 PENT200 3.00 1 96M 180M Yes (NT)
  The output should contain one line for each host 
      configured in the cluster, and the type, model, 
      and RESOURCES should be those configured for that host in lsf.cluster.cluster. 
      cpuf should match the CPU factor given for the host model in 
      lsf.shared. 
The lsload command displays the current load levels in the cluster:

% lsload
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostA ok 0.3 0.1 0.0 3% 1.0 1 12 122M 116M 56M
hostD ok 0.6 1.2 2.0 23% 3.0 14 0 63M 698M 344M
hostB ok 0.6 0.3 0.0 5% 0.3 1 0 55M 41M 37M
The output contains one line for each host in the cluster.
If any host has unavail in the status column, 
      the master LIM is unable to contact the LIM on that host. This can occur 
      if the LIM was started recently and has not yet contacted the master LIM, 
      or if no LIM was started on that host, or if that host was not configured 
      correctly.  
If the entry in the status column begins with - 
      (for example, -ok), the RES is not available on that host. 
      RES status is checked every 90 seconds, so allow enough time for STATUS 
      to reflect this.
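If RES remains unavailable on a host, you can try starting it there. As a sketch, assuming your version of lsadmin accepts a host name argument to resstartup (check the lsadmin manual page), and using hostB from the sample output above:

lsadmin resstartup hostB

Because RES status is only checked every 90 seconds, allow some time before the - prefix disappears from the status column.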
If all these tests succeed, the LIMs on all hosts are running correctly.
The lsgrun command runs a UNIX command on a group of hosts:
    % lsgrun -v -m "hostA hostD hostB" hostname
<<Executing hostname on hostA>>
hostA
<<Executing hostname on hostD>>
hostD
<<Executing hostname on hostB>>
hostB
If remote execution fails on any host, check the RES error log on that host.
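Where the RES error log lives depends on lsf.conf: if LSF_LOGDIR is defined, the daemons write per-host log files under it; otherwise they log through syslog. As a sketch, assuming LSF_LOGDIR is /usr/local/lsf/log and the conventional res.log.<hostname> file name:

% tail /usr/local/lsf/log/res.log.hostD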
Next, test LSF Batch by running the LSF Batch commands and making sure that correct results are reported for all hosts in the cluster.
The bhosts command lists the batch server hosts in the cluster:
    % bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostD ok - 10 1 1 0 0 0
hostA ok - 10 4 2 2 0 0
hostC unavail - 3 1 1 0 0 0
The STATUS column shows the status of sbatchd 
      on that host. If the STATUS column contains unavail, 
      that host is not available. Either the sbatchd on that host 
      has not started or it has started but has not yet contacted the mbatchd. 
      If hosts are still listed as unavailable after roughly three minutes, check 
      the error logs on those hosts.
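For example, under the same assumptions as the RES log example above (LSF_LOGDIR set to /usr/local/lsf/log and per-host log file names), you could examine the sbatchd log on the unavailable host, hostC in the sample output:

% tail /usr/local/lsf/log/sbatchd.log.hostC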
See the bhosts(1) manual page for explanations of the other columns.
Use the bsub command to submit a job to the default queue:

% bsub sleep 60
Job <1> is submitted to default queue <normal>
If the job you submitted was the first ever, it should have job ID 1. Otherwise, the number varies.
The bqueues command displays the LSF Batch queues:

% bqueues
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
interactive 400 Open:Active - - - - 1 1 0 0
fairshare 300 Open:Active - - - - 2 0 2 0
owners 43 Open:Active - - - - 0 0 0 0
priority 43 Open:Active - - - - 29 29 0 0
night 40 Open:Inactive - - - - 1 1 0 0
short 35 Open:Active - - - - 0 0 0 0
normal 30 Open:Active - - - - 0 0 0 0
idle 20 Open:Active - - - - 0 0 0 0
See the bqueues(1) manual page for an explanation of the output.
The bjobs command displays the status of batch jobs:

% bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1 fred RUN normal hostA hostD sleep 60 Dec 10 22:44
Note that if all hosts are busy, the job is not started immediately so 
      the STAT column says PEND. This job should take 
      one minute to run. When the job completes, you should receive mail reporting 
      the job completion.
You do not need to read this section if you are not using the LSF MultiCluster product.
LSF MultiCluster unites multiple LSF clusters so that they can share resources transparently while still maintaining the resource ownership and autonomy of the individual clusters.
LSF MultiCluster extends the functionality of a single cluster. Configuration involves a few more steps. First you set up a single cluster as described above, then you need to do some additional steps specific to LSF MultiCluster.
You do not need to read this section if you are not using the LSF JobScheduler product.
LSF JobScheduler provides reliable production job scheduling according to user-specified calendars and events. It runs user-defined jobs automatically at the right time, under the right conditions, and on the right machines.
The configuration of LSF JobScheduler is almost the same as that of the LSF Batch cluster, except that you may have to define system-level calendars for your cluster and you might need to add additional events to monitor your site.
When you have finished installing and testing the LSF cluster, you can let users try it out. LSF users must add LSF_BINDIR to their PATH environment variable to run the LSF utilities.
 
Users also need access to the on-line manual pages, which were installed in LSF_MANDIR (as defined in lsf.conf) by the lsfsetup installation procedure. For most versions of UNIX, users should add the directory LSF_MANDIR to their MANPATH environment variable. If your system has a man command that does not understand MANPATH, you should either install the manual pages in the /usr/man directory or get one of the freely available man programs.
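For example, assuming LSF_BINDIR is /usr/local/lsf/bin and LSF_MANDIR is /usr/local/lsf/man (hypothetical locations; take the real values from lsf.conf), Bourne and Korn shell users could add the following to their .profile:

PATH=$PATH:/usr/local/lsf/bin
MANPATH=${MANPATH:+$MANPATH:}/usr/local/lsf/man
export PATH MANPATH

C shell users would use the equivalent setenv commands in their .cshrc.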
 
The /etc/lsf.conf file (or LSF_CONFDIR/lsf.conf if you used the Default installation procedure) must be available.
 
You can use the xlsadmin graphical tool to do most of the cluster configuration and management work that has been described in this chapter.