

5. Managing LSF MultiCluster

What is LSF MultiCluster?

Within a single organization, divisions, departments, or sites may have separate LSF clusters managed independently. Many organizations have found it desirable to allow these clusters to cooperate so that they can reap the benefits of global load sharing.

LSF MultiCluster enables a large organization to form multiple cooperating clusters of computers so that load sharing happens not only within the clusters but also among them. It enables load sharing across a large number of hosts, allows resource ownership and autonomy to be enforced, supports non-shared user accounts and file systems, and takes communication limitations among the clusters into account in job scheduling.

LSF MultiCluster is a separate product in the LSF product suite. You must obtain a specific license for LSF MultiCluster before you can use it.

This chapter describes the configuration and operational details of LSF MultiCluster. The topics covered are enabling MultiCluster functionality, inter-cluster load and host information sharing, running interactive jobs on remote clusters, distributing batch jobs across clusters, and account mapping between clusters.

Enabling MultiCluster Functionalities

Follow these steps to enable the sharing of load information, interactive tasks, and batch jobs between clusters:

  1. Define the MultiCluster feature in the lsf.cluster.cluster file. Your license must include MultiCluster support.
  2. Configure LIM to specify the sharing of load information and interactive job access control.
  3. Configure LSF Batch to specify the queues that share jobs and the account mapping between users (see the sketch after this list).
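
As a quick preview, the excerpts below sketch where each of the three steps is configured. The cluster name clus2, the queue names, and the parameter values shown here are placeholders; every parameter is explained in detail in the sections that follow.

# Step 1: lsf.cluster.cluster -- enable the MultiCluster feature
Begin Parameters
PRODUCTS=LSF_Base LSF_MultiCluster LSF_Batch
End Parameters

# Step 2: lsf.cluster.cluster -- share load information with a remote
# cluster and control interactive job access
Begin RemoteClusters
CLUSTERNAME   EQUIV   CACHE_INTERVAL   RECV_FROM
clus2         Y       60               Y
End RemoteClusters

# Step 3: lsb.queues -- exchange batch jobs with a queue in the remote cluster
Begin Queue
QUEUE_NAME=normal
SNDJOBS_TO=normal@clus2
RCVJOBS_FROM=clus2
PRIORITY=30
End Queue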

The LIM configuration files lsf.shared and lsf.cluster.cluster (stored in LSF_CONFDIR) are affected by multicluster operation. For sharing to take place between clusters, they must share common definitions of host types, models, and resources. For this reason, it is desirable to make the lsf.shared file identical on each cluster, often by putting it in a shared file system or by replicating it across all clusters.

Where it is not possible to maintain a common lsf.shared file and each cluster maintains its own, the exchange of system information and jobs between clusters is based on the definitions the clusters have in common. A resource, host type, or model defined in one cluster is considered equivalent to one defined in another cluster if it has the same name. It is possible, for example, to define a host model with the same name but with different CPU factors, so that each cluster considers the relative CPU speed differently.

In such cases, each cluster interprets resource, host type, or model information received from another cluster based on its local lsf.shared file. If the definition is not found locally, it is ignored.

For example, if the remote cluster defines a static boolean resource local_res and associates it with hostA, then when hostA is viewed from the local cluster, local_res will not be associated with it. Similarly, a user will not be able to submit a job locally specifying a resource which is only defined in a remote cluster.

Each LIM reads the lsf.shared file and its own lsf.cluster.cluster file. All information about a remote cluster is retrieved dynamically by the master LIMs on each cluster communicating with each other. However, before this can occur, a master LIM must know the names of at least some of the LSF server hosts in each remote cluster with which it will interact. The names of the servers in a remote cluster are used to locate the current master LIM on that cluster, and to ensure that any remote master is a valid host for that cluster. The latter is necessary to ensure security and to prevent a bogus LIM from interacting with your cluster.

The lsf.shared File

The lsf.shared file in LSF_CONFDIR should list the names of all clusters. For example:

Begin Cluster
ClusterName
clus1
clus2
End Cluster

The LIM reads the lsf.cluster.cluster file in LSF_CONFDIR for each remote cluster and saves the first ten host names listed in the Host section. These are considered the valid servers for that cluster; one of these servers must be up and running as the master.

If LSF_CONFDIR is not shared or replicated, you must specify a list of valid servers for each cluster using the Servers column of the Cluster section. For example:

Begin Cluster
ClusterName      Servers
clus1          (hostC hostD hostE)
clus2          (hostA hostB hostF)
End Cluster

The hosts listed in the Servers column are the contact points that LIMs in remote clusters use to reach the local cluster. One of these hosts must be up and running as the master for other clusters to be able to contact the local cluster.

The lsf.cluster.cluster File

To enable the multicluster feature, insert the following section into the lsf.cluster.cluster file.

Begin Parameters
PRODUCTS=LSF_Base LSF_MultiCluster LSF_Batch
End Parameters

Note

The license file must support the LSF MultiCluster feature. If you have configured the cluster to run LSF MultiCluster on all hosts, and the license file does not contain the LSF MultiCluster feature, then the hosts will be unlicensed, even if you have valid licenses for other LSF components. See `Setting Up the License Key' on page 36 of the LSF Installation Guide for more details.

By default, the local cluster can obtain information about all other clusters specified in lsf.shared. However, if the local cluster needs to interact with only certain remote clusters, you can add the following section to lsf.cluster.cluster to limit the remote clusters it recognizes. For example:

Begin RemoteClusters
CLUSTERNAME
clus3
clus4
End RemoteClusters

This means local applications will know nothing about clusters other than clus3 and clus4. Note that this also affects the way RES behaves when it is authenticating a remote user: remote execution requests originating from users outside of these clusters are rejected. The default behaviour is to accept requests from all the clusters listed in lsf.shared.

In addition to the CLUSTERNAME parameter, the RemoteClusters section may specify the following parameters for each cluster.

CACHE_INTERVAL

Load and host information is requested on demand from the remote cluster and cached by the local master LIM. Clients in the local cluster receive the cached copy of the remote cluster information. This parameter controls how long, in seconds, load information from the remote cluster is cached. The default is 60 seconds. Upon a request from a command, the cached information is used if it is less than CACHE_INTERVAL seconds old; otherwise, fresh information is retrieved from the relevant remote cluster by the local master LIM and returned to the user. Host information is cached for twice as long as load information.

EQUIV

LSF utilities such as lsload, lshosts, lsplace, and lsrun normally return information about the local cluster only. To get information about, or run tasks on, hosts in remote clusters, you must explicitly specify a cluster name (see the sections below). To make resources in remote clusters as transparent as possible to the user, you can specify a remote cluster as being equivalent to the local cluster. The master LIM then considers all equivalent clusters when servicing requests from clients for load, host, or placement information, so you do not have to specify remote cluster names explicitly. For example, lsload will list hosts of the remote clusters as well as those of the local cluster.

RECV_FROM

By default, if two clusters are configured to access each other's load information, they also accept interactive jobs from each other. If you want your cluster to access the load information of another cluster but not to accept interactive jobs from it, set RECV_FROM to `N'. Otherwise, set RECV_FROM to `Y'.

Example

For cluster clus1, clus2 is equivalent to the local cluster. Load information is refreshed every 30 seconds. However, clus1 rejects interactive jobs from clus2.

# Excerpt of lsf.cluster.clus1
Begin RemoteClusters
CLUSTERNAME      EQUIV   CACHE_INTERVAL  RECV_FROM
clus2              Y          30             N
...
End RemoteClusters

Cluster clus2 does not treat clus1 as equivalent to the local cluster. Load information is refreshed every 45 seconds. Interactive jobs from clus1 are accepted.

# Excerpt of lsf.cluster.clus2
Begin RemoteClusters
CLUSTERNAME      EQUIV   CACHE_INTERVAL  RECV_FROM
clus1              N         45             Y
...
End RemoteClusters

Root Access

By default, root access across clusters is not allowed. To allow root access from a remote cluster, specify LSF_ROOT_REX=all in lsf.conf. This implies that root jobs from both the local and remote clusters are accepted. This applies to both interactive and batch jobs.

If you want clusters clus1 and clus2 to allow root execution for local jobs only, insert the line LSF_ROOT_REX=local into the lsf.conf of both clus1 and clus2. If you then want clus2 to also allow root execution from any cluster, change the line in the lsf.conf of cluster clus2 to LSF_ROOT_REX=all.
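
For this scenario, the relevant lsf.conf lines would look like the following sketch (the comments are added here for illustration):

# lsf.conf for cluster clus1: root jobs accepted for the local cluster only
LSF_ROOT_REX=local

# lsf.conf for cluster clus2: root jobs accepted from any cluster
LSF_ROOT_REX=all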

Note

The lsf.conf file is host type specific and is not shared across different platforms. Make sure that the lsf.conf files for all your host types are changed consistently.

LSF Batch Configuration

To enable batch jobs to flow across clusters, use the keywords SNDJOBS_TO and RCVJOBS_FROM in the queue definitions of the lsb.queues file.

The syntax is as follows:

Begin Queue
QUEUE_NAME=normal
SNDJOBS_TO=Queue1@Cluster1 Queue2@Cluster2 ... QueueN@ClusterN
RCVJOBS_FROM=Cluster1 Cluster2 ... ClusterN
PRIORITY=30
NICE=20
End Queue

Note

You do not specify a remote queue in the RCVJOBS_FROM parameter. The administrator of the remote cluster determines which queues will forward jobs to the normal queue in this cluster.

It is up to you and the administrators of the remote clusters to ensure that the policies of the local and remote queues are equivalent in terms of the scheduling behaviour seen by users' jobs.

If a RCVJOBS_FROM queue specifies REQUEUE_EXIT_VALUES, it applies only to jobs submitted locally. Even if a remote job's exit value matches a value specified in REQUEUE_EXIT_VALUES, the job is not requeued; instead, the job and its exit value are forwarded to the submission cluster.
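
For example, a receiving queue with requeue behaviour might be configured as follows. The queue name recv_q and the exit values 99 and 100 are placeholders, and the requeueing applies only to jobs submitted locally to this queue:

Begin Queue
QUEUE_NAME=recv_q
RCVJOBS_FROM=clus1
REQUEUE_EXIT_VALUES=99 100
PRIORITY=50
End Queue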

When accepting a job with a pre-execution command from a remote cluster, the local cluster can limit the number of times it attempts the pre-execution command before returning the job to the submission cluster. The submission cluster forwards the job to one cluster at a time. The maximum number of times a remote job's pre-execution command is retried is controlled by setting MAX_PREEXEC_RETRY in lsb.params.
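
For instance, to allow at most five attempts in the execution cluster, its lsb.params might contain the following (the value 5 is only an illustration):

Begin Parameters
MAX_PREEXEC_RETRY = 5
End Parameters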

Remote-Only MultiCluster Queues

To set up a queue that forwards jobs to remote clusters but does not run any jobs in the local cluster, specify that the queue uses no local hosts. This is done by setting the HOSTS parameter of the queue to the keyword "none".

For example, the following definition sets up a queue remote_only in cluster clus1 which sends jobs to the import queue in cluster clus2:

Begin Queue
QUEUE_NAME = remote_only
HOSTS = none
SNDJOBS_TO = import@clus2
PRIORITY = 30
DESCRIPTION = A remote only queue
End Queue

Any jobs submitted to queue remote_only will be forwarded to the queue import in cluster clus2. This is done without attempting to schedule the jobs locally, which reduces the latency of multicluster queues.

For clus2, the queue import can be specified as follows:

Begin Queue
QUEUE_NAME = import
RCVJOBS_FROM = clus1
PRIORITY = 50
DESCRIPTION = A queue that imports jobs from clus1
End Queue

Inter-cluster Load and Host Information Sharing

The information collected by LIMs on remote clusters can be viewed locally. The list of clusters and associated resources can be viewed with the lsclusters command.

lsclusters
CLUSTER_NAME   STATUS   MASTER_HOST             ADMIN    HOSTS  SERVERS
clus2          ok       hostA                   user1    3      3
clus1          ok       hostC                   user2    3      3

If you have defined EQUIV to be `Y' for cluster clus2 in your lsf.cluster.clus1 file, you will see all hosts in cluster clus2 if you run lsload or lshosts from cluster clus1. For example:

lshosts
HOST_NAME   type   model    cpuf ncpus maxmem maxswp server RESOURCES
hostA       NTX86  PENT200  10.0     1    64M   100M    Yes (pc nt)
hostF       HPPA   HP735    14.0     1    58M    94M    Yes (hpux cs)
hostB       SUN41  SPARCSLC  8.0     1    15M    29M    Yes (sparc bsd)
hostD       HPPA   A900     30.0     4   264M   512M    Yes (hpux cs bigmem)
hostE       SGI    ORIGIN2K 36.0    32   596M  1024M    Yes (irix cs bigmem)
hostC       SUNSOL SunSparc 12.0     1    56M    75M    Yes (solaris cs)

You can use a cluster name in place of a host name to get information specific to a cluster. For example:

lshosts clus1
HOST_NAME   type   model    cpuf ncpus maxmem maxswp server RESOURCES
hostD       HPPA   A900     30.0     4   264M   512M    Yes (hpux cs bigmem)
hostE       SGI    ORIGIN2K 36.0    32   596M  1024M    Yes (irix cs bigmem)
hostC       SUNSOL SunSparc 12.0     1    56M    75M    Yes (solaris cs)

lshosts clus2
HOST_NAME   type   model    cpuf ncpus maxmem maxswp server RESOURCES
hostA       NTX86  PENT200  10.0     1    64M   100M    Yes (pc nt)
hostF       HPPA   HP735    14.0     1    58M    94M    Yes (hpux cs)
hostB       SUN41  SPARCSLC  8.0     1    15M    29M    Yes (sparc bsd)

lsload clus1 clus2
HOST_NAME   status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
hostD           ok   0.2   0.3   0.4  19%   6.0   6     3  146M  319M   52M
hostC           ok   0.1   0.0   0.1   1%   0.0   3    43   63M   44M    7M
hostA           ok   0.3   0.3   0.4  35%   0.0   3     1   40M   42M   10M
hostB         busy  *1.3   1.1   0.7  68% *57.5   2     4   18M   25M    8M
hostE        lockU   1.2   2.2   2.6  30%   5.2  35     0   10M  293M  399M
hostF      unavail

LSF commands lshosts, lsload, lsmon, lsrun, lsgrun, and lsplace can accept a cluster name in addition to host names.

Running Interactive Jobs on Remote Clusters

The lsrun and lslogin commands can be used to run interactive jobs both within and across clusters. See `Running Batch Jobs across Clusters' on page 189 of the LSF Batch User's Guide for examples.
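
For instance, because lsrun accepts a cluster name in place of a host name, a command such as the following would run the task on a suitable host in the remote cluster clus2 (assuming clus2 is accessible from the local cluster):

lsrun -m clus2 hostname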

You can configure the multicluster environment so that one cluster accepts interactive jobs from the other cluster, but not vice versa. For example, to make clus1 reject interactive jobs from clus2, you need to specify the RECV_FROM field in file lsf.cluster.clus1:

Begin RemoteClusters
CLUSTERNAME  EQUIV   CACHE_INTERVAL     RECV_FROM
clus2        Y       30                 N
End RemoteClusters

When a user in clus2 attempts to use the cluster clus1, an error will result. For example:

lsrun -m clus1 -R - hostname
ls_placeofhosts: Not enough host(s) currently eligible

Cluster clus2 will not place jobs on clus1, and therefore lsrun returns an error about not being able to find enough eligible hosts.

lsrun -m hostC -R - hostname
ls_rsetenv: Request from a non-LSF host rejected

In this case, the job request is sent directly to the host hostC, and the RES on hostC rejects it because the requesting host is not considered a valid LSF host.

Note

RECV_FROM only controls accessibility of interactive jobs. It does not affect jobs submitted to LSF Batch.

Distributing Batch Jobs Across Clusters

As the administrator, you can configure a queue to send jobs to a queue in a remote cluster. Jobs submitted to the local queue can then automatically be forwarded to remote clusters. The following commands can be used to get information about multiple clusters:

bclusters

The bclusters command displays a list of queues together with their relationship with queues in remote clusters.

bclusters
LOCAL_QUEUE     JOB_FLOW   REMOTE     CLUSTER    STATUS
testmc          send       testmc      clus2      ok
testmc          recv         -         clus2      ok

The JOB_FLOW field describes whether the local queue is to send jobs to or receive jobs from the remote cluster.

If the value of JOB_FLOW is send (that is, SNDJOBS_TO is defined in the local queue), the REMOTE field indicates a queue name in the remote cluster. If that queue in the remote cluster does not have RCVJOBS_FROM defined to accept jobs from the local cluster, the status field will never be ok; it will be either disc or reject. disc means that communication between the two clusters has not yet been established. This could occur if there are no jobs waiting to be dispatched or if the remote master cannot be located. If the remote cluster agrees to accept jobs from the local queue and communication has been successfully established, the status will be ok; otherwise the status will be reject.

If the value of JOB_FLOW is recv (that is, RCVJOBS_FROM is defined in the local queue), the REMOTE field is always `-'. The CLUSTER field indicates the cluster name from which jobs will be accepted. The status field will be ok if a connection with the remote cluster has been established. For example, if the connection has not yet been established, bclusters displays:

bclusters
LOCAL_QUEUE     JOB_FLOW   REMOTE     CLUSTER    STATUS
testmc          send       testmc      clus2      disc
testmc          recv         -         clus2      disc

bqueues

The -m host_name option can optionally take a cluster name to display the queues in a remote cluster.

bqueues -m clus2
QUEUE_NAME    PRIO       STATUS      MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
fair          3300    Open:Active      5    -    -    -     0     0     0     0
interactive   1055    Open:Active      -    -    -    -     0     0     0     0
testmc          55    Open:Active      -    -    -    -     5     2     2     1
priority        43    Open:Active      -    -    -    -     0     0     0     0

bjobs

The bjobs command can display the cluster name in the FROM_HOST and EXEC_HOST fields. The format of these fields can be `host@cluster' to indicate which cluster the job originated from or was forwarded to. Use the -w option to get the full cluster name. To query the jobs in a specific cluster, use the -m option and specify the cluster name.

bjobs
JOBID USER     STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
101   user7    RUN   testmc     hostC       hostA@clus2 simulate   Oct  8 18:32
102   user7    USUSP testmc     hostC       hostB@clus2 simulate   Oct  8 18:56
104   user7    RUN   testmc     hostA@clus2 hostC        verify    Oct  8 19:20
bjobs -m clus2
JOBID USER     STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
521   user7    RUN   testmc     hostC@clus1 hostA       simulate   Oct  8 18:35
522   user7    USUSP testmc     hostC@clus1 hostB       simulate   Oct  8 19:23
520   user7    RUN   testmc     hostA       hostC@clus1  verify    Oct  8 19:26

Note that jobs forwarded to a remote cluster are assigned new job IDs. You only need to use local job IDs when manipulating local jobs. The SUBMIT_TIME field displays the real job submission time for local jobs, and job forwarding time for jobs from remote clusters.

bhosts

To view the hosts of a specific cluster you can use a cluster name in place of a host name.

bhosts clus2
HOST_NAME          STATUS    JL/U  MAX  NJOBS  RUN  SSUSP USUSP  RSV
hostA              ok          -    10     1     1     0     0     0
hostB              ok          -    10     1     1     0     0     0
hostF              closed      -     3     3     3     0     0     0

bhist

The bhist command displays the history of job events, including when a job was forwarded to another cluster or accepted from another cluster.

bhist -l 101
Job Id <101>, User <user7>, Project <default>, Command <simulate>
Tue Oct 08 18:32:11: Submitted from host <hostC> to Queue <testmc>, CWD <
                     /homes/user7>, Requested Resources <type!=ALPHA>
                   ;
Tue Oct 08 18:35:07: Forwarded job to cluster clus2;
Tue Oct 08 18:35:25: Dispatched to <hostA>;
Tue Oct 08 18:35:35: Running with execution home </homes/user7>, Execution C
                     WD </homes/user7>, Execution Pid <25212>;
Tue Oct 08 20:30:50: USER suspend action  initiated (actpid 25672);
Tue Oct 08 20:30:50: Suspended by the user or administrator.
Summary of time in seconds spent in various states by Tue Oct 08 20:35:24 1996
PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
176      0        6943     274      0        0        7393

Account Mapping Between Clusters

By default, LSF assumes a uniform user name space within a cluster and between clusters, but it is not uncommon for an organization to fall short of this assumption. LSF Batch supports the execution of batch jobs across non-uniform user name spaces between clusters by allowing user account mapping, at both the system level and the individual user level.

User Level Account Mapping

Individual users of an LSF cluster can set up their own account mapping by creating a .lsfhosts file in their home directory. In a multicluster environment, the .lsfhosts file can specify cluster names in place of host names.

Example #1

A user has accounts on two clusters, clus1 and clus2. On cluster clus1 the user name is userA, and on clus2 the user name is user_A. To run jobs in either cluster under the appropriate user name, the .lsfhosts files should be set up as follows:

On machines in cluster clus1:

cat ~userA/.lsfhosts
clus2 user_A

On machines in cluster clus2:

cat ~user_A/.lsfhosts
clus1 userA

Example #2

A user has the account userA on cluster clus1 and wants to use the lsfguest account when running jobs on cluster clus2. The .lsfhosts files should be set up as follows:

On machines in cluster clus1:

cat ~userA/.lsfhosts
clus2 lsfguest send

On machines in cluster clus2:

cat ~lsfguest/.lsfhosts
clus1 userA recv

Example #3

A site has two clusters, clus1 and clus2. A user has the uniform account name userB on all hosts in clus2. In clus1, the same user has the account name userA on every host except hostX, where the account name is userA1. This user would like to use both clusters transparently.

To implement this mapping, the user should set up the .lsfhosts files in the home directories on the different machines as follows:

On hostX of clus1:

cat ~userA1/.lsfhosts
clus1    userA
hostX    userA1
clus2    userB

On any other machine in clus1:

cat ~userA/.lsfhosts
clus2    userB
hostX    userA1

On the clus2 machines:

cat ~userB/.lsfhosts
clus1    userA
hostX    userA1

System Level Account Mapping

An LSF administrator can set up system level account mapping in the lsb.users file.

For a job submitted as one user at the submission cluster to run as another user at a remote execution cluster, the LSF Batch system requires that both clusters agree on the account mapping. The submission cluster proposes a set of user mappings, and the execution cluster decides whether or not to accept them.

The system level account mapping is defined in the "UserMap" section of the lsb.users file. The section contains multiple account mapping entries, where each entry has three fields: LOCAL, listing one or more user names in the local cluster; REMOTE, listing the corresponding user@cluster names in the remote cluster; and DIRECTION, which is either export (for mappings proposed by the submission cluster) or import (for mappings accepted by the execution cluster).

Example #1

For userA on cluster clus1 to map to userB on cluster clus2, the lsb.users file at clus1 can be set up as follows:

Begin UserMap
LOCAL     REMOTE                       DIRECTION
.
userA     userB@clus2                  export
.
End UserMap

At clus2, the lsb.users file is set up as:

Begin UserMap
LOCAL     REMOTE                       DIRECTION
.
userB     userA@clus1                  import
.
End UserMap

Example #2

As another example, suppose userA on clus1 wants to run as userB or userC on clus2. The lsb.users file of clus1 should be set up as follows:

Begin UserMap
LOCAL     REMOTE                       DIRECTION
.
userA     (userB@clus2 userC@clus2)     export
.
End UserMap


At clus2, userA is allowed to run as either userB or userD:

Begin UserMap
LOCAL            REMOTE                DIRECTION
(userB userD)    userA@clus1           import
End UserMap 

Although clus2 also allows userA to map to userD, clus1 does not propose such a mapping. Hence, the agreed account mapping between clus1 and clus2 is userA@clus1 running as userB@clus2.


