Within a single organization, divisions, departments, or sites may have separate LSF clusters managed independently. Many organizations have realized that it is desirable to allow these clusters to cooperate so as to reap the benefits of global load sharing.
LSF MultiCluster enables a large organization to form multiple cooperating clusters of computers so that load sharing happens not only within clusters but also among them. It enables load sharing across large numbers of hosts, allows resource ownership and autonomy to be enforced, supports non-shared user accounts and file systems, and takes communication limitations among the clusters into consideration in job scheduling.
LSF MultiCluster is a separate product in the LSF product suite. You must obtain a specific license for LSF MultiCluster before you can use it.
This chapter describes the configuration and operational details of LSF MultiCluster.
To enable the sharing of load information, interactive tasks, and batch jobs between clusters, list the participating clusters in the lsf.shared file and enable the multicluster feature in the PRODUCTS line of each cluster's lsf.cluster.cluster file. Your licence must have multicluster support.
The LIM configuration files lsf.shared and lsf.cluster.cluster (stored in LSF_CONFDIR) are affected by multicluster operation.
For sharing to take place between clusters, they must share common definitions in terms of host types, models, and resources. For this reason, it is desirable to make the lsf.shared file the same on each cluster, often by putting it into a shared file system or replicating it across all clusters.
Where it is not possible to maintain a common lsf.shared file, and each cluster maintains its own, the exchange of system information and jobs between clusters is based on the common definitions. A resource, host type, or model defined in one cluster is considered to be equivalent to that defined in another cluster if the name is the same. It is possible, for example, to define a host model with the same name but with different CPU factors so that each cluster considers the relative CPU speed differently.
In such cases, each cluster will interpret resource, host type, or model information received from another cluster based on its local lsf.shared file. If the definition is not found locally, it is ignored.
For example, if the remote cluster defines a static boolean resource local_res and associates it with hostA, then when hostA is viewed from the local cluster, local_res will not be associated with it. Similarly, a user will not be able to submit a job locally specifying a resource which is only defined in a remote cluster.
Each LIM reads the lsf.shared file and its own lsf.cluster.cluster file. All information about a remote cluster is retrieved dynamically by the master LIMs on each cluster communicating with each other. However, before this can occur, a master LIM must know the names of at least some of the LSF server hosts in each remote cluster with which it will interact. The names of the servers in a remote cluster are used to locate the current master LIM on that cluster as well as to ensure that any remote master is a valid host for that cluster. The latter is necessary to ensure security and prevent a bogus LIM from interacting with your cluster.
The lsf.shared file in LSF_CONFDIR should list the names of all clusters. For example:
Begin Cluster
ClusterName
clus1
clus2
End Cluster
The LIM will read the lsf.cluster.cluster file in LSF_CONFDIR for each remote cluster and save the first ten host names listed in the Host section. These are considered valid servers for that cluster; that is, one of these servers must be up and running as the master.
If LSF_CONFDIR is not shared or replicated, it is necessary to specify a list of valid servers in each cluster using the Servers option in the Cluster section. For example:
Begin Cluster
ClusterName Servers
clus1 (hostC hostD hostE)
clus2 (hostA hostB hostF)
End Cluster
The hosts listed in the Servers column are the contacts through which LIMs in remote clusters get in touch with the local cluster. One of the hosts listed in the Servers column must be up and running as the master for other clusters to contact the local cluster.
To enable the multicluster feature, insert the following section into the lsf.cluster.cluster file:
Begin Parameters
PRODUCTS=LSF_Base LSF_MultiCluster LSF_Batch
End Parameters
Note: The license file must support the LSF MultiCluster feature. If you have configured the cluster to run LSF MultiCluster on all hosts, and the license file does not contain the LSF MultiCluster feature, then the hosts will be unlicensed, even if you have valid licenses for other LSF components. See `Setting Up the License Key' on page 36 of the LSF Installation Guide for more details.
By default, the local cluster can obtain information about all other clusters specified in lsf.shared. However, if the local cluster is only interested in certain remote clusters, you can use the following section in lsf.cluster.cluster to limit which remote clusters your cluster interacts with. For example:
Begin RemoteClusters
CLUSTERNAME
clus3
clus4
End RemoteClusters
This means that local applications will not know anything about clusters other than clus3 and clus4. Note that this also affects the way RES behaves when it is authenticating a remote user: remote execution requests originating from users outside these clusters are rejected. The default behaviour is to accept requests from all clusters listed in lsf.shared.
The RemoteClusters section may be used to specify the following parameters associated with each cluster, in addition to the CLUSTERNAME parameter.
Load and host information is requested on demand from the remote cluster and cached by the local master LIM. Clients in the local cluster receive the cached copy of the remote cluster information. The CACHE_INTERVAL parameter controls how long, in seconds, load information from the remote cluster is cached. The default is 60 seconds. Upon a request from a command, the cached information is used if it is less than CACHE_INTERVAL seconds old; otherwise, fresh information is retrieved from the relevant remote cluster by the local master LIM and returned to the user. Host information is cached twice as long as load information.
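For example, the following RemoteClusters entry (a sketch reusing cluster clus2 from the examples in this chapter, with only the columns of interest shown) caches load information from clus2 for 90 seconds:
Begin RemoteClusters
CLUSTERNAME   CACHE_INTERVAL
clus2         90
End RemoteClusters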
The LSF utilities such as lsload, lshosts, lsplace, and lsrun normally return information only about the local cluster. To get information about, or run tasks on, hosts in remote clusters, you must explicitly specify a cluster name (see the sections below). To make resources in remote clusters as transparent as possible to the user, you can set EQUIV to `Y' to specify that a remote cluster is equivalent to the local cluster. The master LIM will then consider all equivalent clusters when servicing requests from clients for load, host, or placement information, so you do not have to explicitly specify remote cluster names. For example, lsload will list hosts of the local cluster as well as the remote clusters.
By default, if two clusters are configured to access each other's load information, they also accept interactive jobs from each other. If you want your cluster to access load information of another cluster but not to accept interactive jobs from that cluster, set RECV_FROM to `N'. Otherwise, set RECV_FROM to `Y'.
For cluster clus1, clus2 is equivalent to the local cluster. Load information is refreshed every 30 seconds. However, clus1 rejects interactive jobs from clus2.
# Excerpt of lsf.cluster.clus1
Begin RemoteClusters
CLUSTERNAME EQUIV CACHE_INTERVAL RECV_FROM
clus2 Y 30 N
...
End RemoteClusters
Cluster clus2 does not treat clus1 as equivalent to the local cluster. Load information is refreshed every 45 seconds. Interactive jobs from clus1 are accepted.
# Excerpt of lsf.cluster.clus2
Begin RemoteClusters
CLUSTERNAME EQUIV CACHE_INTERVAL RECV_FROM
clus1 N 45 Y
...
End RemoteClusters
By default, root access across clusters is not allowed. To allow root access from a remote cluster, specify LSF_ROOT_REX=all in lsf.conf. This implies that root jobs from both the local and remote clusters are accepted. This applies to both interactive and batch jobs.
If you want clusters clus1 and clus2 to allow root execution for local jobs only, insert the line LSF_ROOT_REX=local into the lsf.conf of both cluster clus1 and cluster clus2. However, if you want clus2 to also allow root execution from any cluster, change the line in the lsf.conf of cluster clus2 to LSF_ROOT_REX=all.
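As a sketch of this scenario, the relevant lsf.conf lines would be:
# lsf.conf of cluster clus1: root jobs accepted from the local cluster only
LSF_ROOT_REX=local
# lsf.conf of cluster clus2: root jobs accepted from any cluster
LSF_ROOT_REX=all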
The lsf.conf file is host type specific and is not shared across different platforms. You must make sure that the lsf.conf files for all your host types are changed consistently.
To enable batch jobs to flow across clusters, the keywords SNDJOBS_TO and RCVJOBS_FROM are used in the queue definitions of the lsb.queues file.
Begin Queue
QUEUE_NAME=normal
SNDJOBS_TO=Queue1@Cluster1 Queue2@Cluster2 ... QueueN@ClusterN
RCVJOBS_FROM=Cluster1 Cluster2 ... ClusterN
PRIORITY=30
NICE=20
End Queue
You do not specify a remote queue in the RCVJOBS_FROM parameter. The administrator of the remote cluster determines which queues will forward jobs to the normal queue in this cluster.
It is up to you and the administrators of the remote clusters to ensure that the policies of the local and remote queues are equivalent in terms of the scheduling behaviour seen by users' jobs.
If a RCVJOBS_FROM queue specifies REQUEUE_EXIT_VALUES, it only applies to jobs submitted locally. Even if a remote job's exit value matches a value specified in REQUEUE_EXIT_VALUES, the job is not requeued, but the job and its exit value are forwarded to the submission cluster.
When accepting a job with a pre-execution command from a remote cluster, the local cluster can limit the number of times it will attempt the pre-execution command before returning the job to the submission cluster. The submission cluster forwards the job to one cluster at a time. The maximum number of times a remote job's pre-execution command is retried is controlled by setting MAX_PREEXEC_RETRY in lsb.params.
In order to set up a queue that will forward jobs to remote clusters but will not run any jobs in the local cluster, you can specify that the queue uses no local hosts. This is done by setting the HOSTS parameter in the queue to the keyword "none".
For example, the following definition sets up a queue remote_only in cluster clus1 which sends jobs to the import queue in cluster clus2:
Begin Queue
QUEUE_NAME = remote_only
HOSTS = none
SNDJOBS_TO = import@clus2
PRIORITY = 30
DESCRIPTION = A remote only queue
End Queue
Any jobs submitted to the queue remote_only will be forwarded to the import queue in cluster clus2. This is done without attempting to schedule the job locally, which reduces the latency of multicluster queues.
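For example, a user in clus1 submits to this queue in the usual way; myjob below is a placeholder for the actual command:
% bsub -q remote_only myjob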
For clus2, the queue import can be specified as follows:
Begin Queue
QUEUE_NAME = import
RCVJOBS_FROM = clus1
PRIORITY = 50
DESCRIPTION = A queue that imports jobs from clus1
End Queue
The information collected by LIMs on remote clusters can be viewed locally. The list of clusters and associated resources can be viewed with the lsclusters command.
% lsclusters
CLUSTER_NAME STATUS MASTER_HOST ADMIN HOSTS SERVERS
clus2 ok hostA user1 3 3
clus1 ok hostC user2 3 3
If you have defined EQUIV to be `Y' for cluster clus2 in your lsf.cluster.clus1 file, you will see all hosts in cluster clus2 if you run lsload or lshosts from cluster clus1. For example:
% lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
hostA NTX86 PENT200 10.0 1 64M 100M Yes (pc nt)
hostF HPPA HP735 14.0 1 58M 94M Yes (hpux cs)
hostB SUN41 SPARCSLC 8.0 1 15M 29M Yes (sparc bsd)
hostD HPPA A900 30.0 4 264M 512M Yes (hpux cs bigmem)
hostE SGI ORIGIN2K 36.0 32 596M 1024M Yes (irix cs bigmem)
hostC SUNSOL SunSparc 12.0 1 56M 75M Yes (solaris cs)
You can use a cluster name in place of a host name to get information specific to a cluster. For example:
% lshosts clus1
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
hostD HPPA A900 30.0 4 264M 512M Yes (hpux cs bigmem)
hostE SGI ORIGIN2K 36.0 32 596M 1024M Yes (irix cs bigmem)
hostC SUNSOL SunSparc 12.0 1 56M 75M Yes (solaris cs)
% lshosts clus2
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
hostA NTX86 PENT200 10.0 1 64M 100M Yes (pc nt)
hostF HPPA HP735 14.0 1 58M 94M Yes (hpux cs)
hostB SUN41 SPARCSLC 8.0 1 15M 29M Yes (sparc bsd)
% lsload clus1 clus2
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostD ok 0.2 0.3 0.4 19% 6.0 6 3 146M 319M 52M
hostC ok 0.1 0.0 0.1 1% 0.0 3 43 63M 44M 7M
hostA ok 0.3 0.3 0.4 35% 0.0 3 1 40M 42M 10M
hostB busy *1.3 1.1 0.7 68% *57.5 2 4 18M 25M 8M
hostE lockU 1.2 2.2 2.6 30% 5.2 35 0 10M 293M 399M
hostF unavail
The LSF commands lshosts, lsload, lsmon, lsrun, lsgrun, and lsplace can accept a cluster name in addition to host names.
The lsrun and lslogin commands can be used to run interactive jobs both within and across clusters. See `Running Batch Jobs across Clusters' on page 189 of the LSF Batch User's Guide for examples.
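For example, assuming clus2 accepts interactive jobs from the local cluster, a task can be placed on a host in clus2 by giving the cluster name to the -m option:
% lsrun -m clus2 hostname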
You can configure the multicluster environment so that one cluster accepts interactive jobs from the other cluster, but not vice versa. For example, to make clus1 reject interactive jobs from clus2, specify the RECV_FROM field in the file lsf.cluster.clus1:
Begin RemoteClusters
CLUSTERNAME EQUIV CACHE_INTERVAL RECV_FROM
clus2 Y 30 N
End RemoteClusters
When a user in clus2 attempts to use the cluster clus1, an error will result. For example:
% lsrun -m clus1 -R - hostname
ls_placeofhosts: Not enough host(s) currently eligible
Cluster clus2 will not make any placement of jobs on clus1, and therefore lsrun will return an error about not being able to find enough hosts.
% lsrun -m hostC -R - hostname
ls_rsetenv: Request from a non-LSF host rejected
In this case, the job request is sent to the host hostC, and the RES on hostC rejects the job because the requesting host is not considered a valid LSF host.
RECV_FROM only controls accessibility of interactive jobs. It does not affect jobs submitted to LSF Batch.
As the administrator, you can configure a queue to send jobs to a queue in a remote cluster. Jobs submitted to the local queue can automatically get sent to remote clusters. The following commands can be used to get information about multiple clusters:
The bclusters command displays a list of queues together with their relationship with queues in remote clusters.
% bclusters
LOCAL_QUEUE JOB_FLOW REMOTE CLUSTER STATUS
testmc send testmc clus2 ok
testmc recv - clus2 ok
The JOB_FLOW field describes whether the local queue is to send jobs to or receive jobs from the remote cluster.
If the value of JOB_FLOW is send (that is, SNDJOBS_TO is defined in the local queue), then the REMOTE field indicates a queue name in the remote cluster. If the remote queue in the remote cluster does not have RCVJOBS_FROM defined to accept jobs from this cluster, the status field will never be ok; it will be either disc or reject. disc means that communication between the two clusters has not yet been established. This could occur if there are no jobs waiting to be dispatched or the remote master cannot be located. If the remote cluster agrees to accept jobs from the local queue and communication has been successfully established, the status will be ok; otherwise the status will be reject.
If the value of JOB_FLOW is recv (that is, RCVJOBS_FROM is defined in the local queue), then the REMOTE field is always `-'. The CLUSTER field then indicates the cluster name from which jobs will be accepted. The status field will be ok if a connection with the remote cluster has been established.
For example, before communication between the clusters has been established, the status is displayed as disc:
% bclusters
LOCAL_QUEUE JOB_FLOW REMOTE CLUSTER STATUS
testmc send testmc clus2 disc
testmc recv - clus2 disc
The -m host_name option of the bqueues command can also take a cluster name to display the queues in a remote cluster.
% bqueues -m clus2
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
fair 3300 Open:Active 5 - - - 0 0 0 0
interactive 1055 Open:Active - - - - 0 0 0 0
testmc 55 Open:Active - - - - 5 2 2 1
priority 43 Open:Active - - - - 0 0 0 0
The bjobs command can display the cluster name in the FROM_HOST and EXEC_HOST fields. The format of these fields can be `host@cluster' to indicate which cluster the job originated from or was forwarded to. Use the -w option to get the full cluster name. To query the jobs in a specific cluster, use the -m option and specify the cluster name.
% bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
101 user7 RUN testmc hostC hostA@clus2 simulate Oct 8 18:32
102 user7 USUSP testmc hostC hostB@clus2 simulate Oct 8 18:56
104 user7 RUN testmc hostA@clus2 hostC verify Oct 8 19:20
% bjobs -m clus2
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
521 user7 RUN testmc hostC@clus1 hostA simulate Oct 8 18:35
522 user7 USUSP testmc hostC@clus1 hostB simulate Oct 8 19:23
520 user7 RUN testmc hostA hostC@clus1 verify Oct 8 19:26
Note that jobs forwarded to a remote cluster are assigned new job IDs. You only need to use local job IDs when manipulating local jobs. The SUBMIT_TIME field displays the real job submission time for local jobs, and the job forwarding time for jobs from remote clusters.
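For example, to act on job 101 from the bjobs output above (forwarded to clus2 as job 521), a user in clus1 needs only the local job ID, for instance:
% bkill 101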
To view the hosts of a specific cluster you can use a cluster name in place of a host name.
% bhosts clus2
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostA ok - 10 1 1 0 0 0
hostB ok - 10 1 1 0 0 0
hostF closed - 3 3 3 0 0 0
The bhist command displays the history of events, including when a job was forwarded to another cluster or accepted from another cluster.
% bhist -l 101
Job Id <101>, User <user7>, Project <default>, Command <simulate>
Tue Oct 08 18:32:11: Submitted from host <hostC> to Queue <testmc>, CWD </homes/user7>, Requested Resources <type!=ALPHA>;
Tue Oct 08 18:35:07: Forwarded job to cluster clus2;
Tue Oct 08 18:35:25: Dispatched to <hostA>;
Tue Oct 08 18:35:35: Running with execution home </homes/user7>, Execution CWD </homes/user7>, Execution Pid <25212>;
Tue Oct 08 20:30:50: USER suspend action initiated (actpid 25672);
Tue Oct 08 20:30:50: Suspended by the user or administrator.
Summary of time in seconds spent in various states by Tue Oct 08 20:35:24 1996
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
176 0 6943 274 0 0 7393
By default, LSF assumes a uniform user name space within a cluster and between clusters, but it is not uncommon for an organization to fail to satisfy this assumption. LSF Batch supports the execution of batch jobs across non-uniform user name spaces between clusters by allowing user account mapping between such clusters, at both the system level and the individual user level.
Individual users of the LSF cluster can set up their own account mapping by creating a .lsfhosts file in their home directories. In the .lsfhosts file used to support account mapping, cluster names can be specified in place of host names.
A user has accounts on two clusters, clus1 and clus2. On cluster clus1 the user name is userA, and on clus2 the user name is user_A. To run jobs in either cluster under the appropriate user name, the .lsfhosts files should be set up as follows:
% cat ~userA/.lsfhosts
clus2 user_A
% cat ~user_A/.lsfhosts
clus1 userA
A user has the account userA on cluster clus1 and wants to use the lsfguest account when running jobs on cluster clus2. The .lsfhosts files should be set up as follows:
% cat ~userA/.lsfhosts
clus2 lsfguest send
% cat ~lsfguest/.lsfhosts
clus1 userA recv
A site has two clusters, clus1 and clus2. A user has the uniform account name userB on all hosts in clus2. In clus1, the user has the uniform account name userA, except on hostX, where the account name is userA1. This user would like to use both clusters transparently.
To implement this mapping, the user should set up the .lsfhosts files in his home directories on the different machines as follows:
On hostX in clus1:
% cat ~userA1/.lsfhosts
clus1 userA
hostX userA1
clus2 userB
On any other machine in clus1:
% cat ~userA/.lsfhosts
clus2 userB
hostX userA1
On the machines in clus2:
% cat ~userB/.lsfhosts
clus1 userA
hostX userA1
An LSF administrator can set up system-level account mapping in the lsb.users file.
For a job submitted as one user at the submission cluster to run as another user in a remote execution cluster, the LSF Batch system requires that both clusters agree with this account mapping. The submission cluster can propose a set of user mappings and the execution cluster decides whether to accept these settings or not.
The system-level account mapping is defined in the "UserMap" section of the lsb.users file. It contains multiple account mapping entries, where each entry contains three fields:
LOCAL: defines a list of local users
REMOTE: defines a list of remote users in the form username@clustername
DIRECTION: takes one of two values, "export" or "import". The "export" keyword indicates that exported jobs of the users defined in the LOCAL column run as the users in the REMOTE column. The "import" keyword indicates that imported jobs belonging to the remote users specified in the REMOTE column run as the users specified in the LOCAL column.
For userA on cluster clus1 to map to userB on cluster clus2, the lsb.users file at clus1 can be set up as follows:
Begin UserMap
LOCAL REMOTE DIRECTION
.
userA userB@clus2 export
.
End UserMap
At clus2, the lsb.users file is set up as:
Begin UserMap
LOCAL REMOTE DIRECTION
.
userB userA@clus1 import
.
End UserMap
As another example, userA on clus1 wants to run as userB or userC on clus2. The lsb.users file at clus1 should be set up as follows:
Begin UserMap
LOCAL REMOTE DIRECTION
.
userA (userB@clus2 userC@clus2) export
.
End UserMap
At clus2, userA is allowed to run as either userB or userD:
Begin UserMap
LOCAL REMOTE DIRECTION
(userB userD) userA@clus1 import
End UserMap
Although clus2 allows userA to also map to userD, clus1 does not propose such a mapping, and hence the account mapping agreed upon by both clus1 and clus2 for userA is userA@clus1 running as userB@clus2.