Within a company or organization, each division, department, or site may have a separately managed LSF cluster. Many organizations have realized that it is desirable to allow their multitude of clusters to cooperate to reap the benefits of global load sharing:
LSF MultiCluster enables a large organization to form multiple cooperating clusters of computers so that load sharing happens not only within the clusters but also among them. It enables load sharing across large numbers of hosts, allows resource ownership and autonomy to be enforced, non-shared user accounts and file systems to be supported, and communication limitations among the clusters to be taken into consideration in job scheduling.
The commands lshosts, lsload, and lsmon can accept a cluster name to allow you to view the remote cluster. A list of clusters and associated information can be viewed with the lsclusters command.
% lsclusters
CLUSTER_NAME   STATUS   MASTER_HOST   ADMIN   HOSTS   SERVERS
clus1          ok       hostC         user1       3         3
clus2          ok       hostA         user1       3         3
% lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      NTX86   PENT200   10.0      -       -       -  Yes     (NT)
hostF      HPPA    HP735     14.0      1     58M     94M  Yes     (hpux cserver)
hostB      SUN41   SPARCSLC   3.0      1     15M     29M  Yes     (sparc bsd)
hostD      HPPA    HP735     14.0      1    463M    812M  Yes     (hpux cserver)
hostE      SGI     R10K      16.0     16    896M   1692M  Yes     (irix cserver)
hostC      SUNSOL  SunSparc  12.0      1     56M     75M  Yes     (solaris cserver)
% lshosts clus1
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostD      HPPA    HP735     14.0      1    463M    812M  Yes     (hpux cserver)
hostE      SGI     R10K      16.0     16    896M   1692M  Yes     (irix cserver)
hostC      SUNSOL  SunSparc  12.0      1     56M     75M  Yes     (solaris cserver)
% lshosts clus2
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      NTX86   PENT200   10.0      -       -       -  Yes     (NT)
hostF      HPPA    HP735     14.0      1     58M     94M  Yes     (hpux cserver)
hostB      SUN41   SPARCSLC   3.0      1     15M     29M  Yes     (sparc bsd)
% lsload clus1 clus2
HOST_NAME  status   r15s  r1m  r15m  ut    pg     ls  it  tmp   swp   mem
hostD      ok        0.2  0.3   0.4  19%   6.0     6   3  146M  319M  252M
hostC      ok        0.1  0.0   0.1   1%   0.0     3  43   63M   44M   27M
hostA      ok        0.3  0.3   0.4  35%   0.0     3   1   40M   42M   13M
hostB      busy     *1.3  1.1   0.7  68%  *57.5    2   4   18M   20M    8M
hostE      lockU     1.2  2.2   2.6  30%   5.2    35   0   10M  693M  399M
hostF      unavail
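The lsmon command accepts a cluster name in the same way. Because lsmon runs as a full-screen, continuously updating display, its output is not reproduced here; a typical invocation (an invocation sketch, not a transcript from this guide) to monitor the hosts of clus2 would be:
% lsmon clus2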
A queue may be configured to send LSF Batch jobs to a queue in a remote cluster (see `LSF Batch Configuration' on page 148 of the LSF Batch Administrator's Guide). When you submit a job to such a local queue, it is automatically sent to the remote cluster.
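As a rough sketch of such a configuration (the exact syntax is described in the LSF Batch Administrator's Guide; the parameter names SNDJOBS_TO and RCVJOBS_FROM below are assumptions to be checked against your LSF version), a queue definition in lsb.queues that exchanges jobs with the cluster clus2 might look like this:
Begin Queue
QUEUE_NAME   = testmc
PRIORITY     = 55
SNDJOBS_TO   = testmc@clus2    # forward jobs to queue testmc in cluster clus2
RCVJOBS_FROM = clus2           # accept jobs forwarded from cluster clus2
DESCRIPTION  = MultiCluster test queue
End Queue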
The bclusters command displays a list of local queues together with their relationships with queues in remote clusters.
% bclusters
LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER  STATUS
testmc       send      testmc  clus2    ok
testmc       recv      -       clus2    ok
The meanings of the displayed fields are:

LOCAL_QUEUE
The name of a local queue that either receives jobs from queues in remote clusters, or forwards jobs to queues in remote clusters.

JOB_FLOW
The value is either send or recv. If the value is send, the line describes a job flow from the local queue to a queue in a remote cluster. If the value is recv, the line describes a job flow from a remote cluster to the local queue.

REMOTE
The name of the queue in a remote cluster that the local queue can send jobs to. This field is always "-" if the JOB_FLOW field is "recv".

CLUSTER
The name of the remote cluster.

STATUS
The connection status between the local queue and the remote queue. If the JOB_FLOW field is "send", the possible values of the STATUS field are "ok", "reject", and "disc"; otherwise the possible values are "ok" and "disc". A status of "ok" indicates that both queues agree on the job flow. A status of "disc" means that communication between the local and remote clusters has not yet been established; this may be because no jobs have needed to be forwarded to the remote cluster yet, or because the mbatchd daemons of the two clusters have not been able to contact each other. The STATUS is "reject" if the job flow is send and the queue in the remote cluster is not configured to receive jobs from the local queue.
In the above example, the local queue testmc can forward jobs to the testmc queue of the remote cluster clus2, and vice versa.
If there is no queue in your cluster that is configured for remote clusters, you will see the following:
% bclusters
No local queue sending/receiving jobs from remote clusters
Use the -m option of the bqueues command with a cluster name to display the queues in the remote cluster.
% bqueues -m clus2
QUEUE_NAME    PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
fair          3300  Open:Active    5     -     -     -      1     1    0     0
interactive   1055  Open:Active    -     -     -     -      1     0    1     0
testmc          55  Open:Active    -     -     -     -      5     2    2     1
priority        43  Open:Active    -     -     -     -      0     0    0     0
Submit your job with the bsub command to the queue that sends jobs to the remote cluster.
% bsub -q testmc -J mcjob myjob
Job <101> is submitted to queue <testmc>.
The bjobs command will display the cluster name in the FROM_HOST and EXEC_HOST fields. The format of these fields is `host@cluster', indicating which cluster the job originated from or was forwarded to. To query jobs running in another cluster, use the -m option and specify a cluster name.
% bjobs
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST    JOB_NAME  SUBMIT_TIME
101    user7  RUN   testmc  hostC      hostA@clus2  mcjob     Oct 19 19:41
% bjobs -m clus2
JOBID  USER   STAT  QUEUE   FROM_HOST    EXEC_HOST  JOB_NAME  SUBMIT_TIME
522    user7  RUN   testmc  hostC@clus2  hostA      mcjob     Oct 19 23:09
Note that the submission time shown from the remote cluster is the time when the job was forwarded to that cluster.
To view the hosts of another cluster, you can use a cluster name in place of a host name as the argument to the bhosts command.
% bhosts clus2
HOST_NAME  STATUS   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok          -   10      1    1      0      0    0
hostB      ok          -   10      2    1      0      0    1
hostF      unavail     -    3      1    1      0      0    0
Run the bhist command to see the history of your job, including information about job forwarding to another cluster.
% bhist -l 101
Job Id <101>, Job Name <mcjob>, User <user7>, Project <default>, Command
        <myjob>
Sun Oct 19 19:41:14: Submitted from host <hostC> to Queue <testmc>, CWD <$HOME>
Sun Oct 19 21:18:40: Parameters are modified to: Project <test>, Queue <testmc>,
                     Job Name <mcjob>;
Sun Oct 19 23:09:26: Forwarded job to cluster clus2;
Sun Oct 19 23:09:26: Dispatched to <hostA>;
Sun Oct 19 23:09:40: Running with execution home </home/user7>, Execution CWD
                     </home/user7>, Execution Pid <4873>;
Mon Oct 20 07:02:53: Done successfully. The CPU time used is 12981.4 seconds;
Summary of time in seconds spent in various states by Mon Oct 20 07:02:53 1997
  PEND   PSUSP   RUN     USUSP   SSUSP   UNKWN   TOTAL
  5846   0       28399   0       0       0       34245
The lsrun command allows you to specify a cluster name instead of a host name. When a cluster name is specified, a host is selected from that cluster. For example:
% lsrun -m clus2 -R type==any hostname
hostA
The -m option of the lslogin command can be used to specify a cluster name. This allows you to log in to the best host in a remote cluster.
% lslogin -v -m clus2
<<Remote login to hostF>>
The multicluster environment can be configured so that one cluster accepts interactive jobs from the other cluster, but not vice versa. See `Running Interactive Jobs on Remote Clusters' on page 152 of the LSF Batch Administrator's Guide. If the remote cluster will not accept jobs from your cluster, you will get an error:
% lsrun -m clus2 -R type==any hostname
ls_placeofhosts: Not enough host(s) currently eligible
By default, LSF assumes a uniform user name space within a cluster and between clusters. Many organizations do not satisfy this assumption. For the execution of batch jobs, LSF supports non-uniform user name spaces between clusters: the .lsfhosts file used for account mapping can specify cluster names in place of host names.
For example, suppose you have accounts on two clusters, clus1 and clus2. In clus1 your user name is `user1', and in clus2 your user name is `ruser_1'. To run your jobs in either cluster under the appropriate user name, set up your .lsfhosts files as follows:
% cat ~user1/.lsfhosts
clus2 ruser_1
% cat ~ruser_1/.lsfhosts
clus1 user1
As another example, suppose you have the account `user1' on cluster clus1 and want to use the `lsfguest' account when sending jobs to be run on cluster clus2. The .lsfhosts files should be set up as follows:
% cat ~user1/.lsfhosts
clus2 lsfguest send
% cat ~lsfguest/.lsfhosts
clus1 user1 recv
The other features of the .lsfhosts file also work in the multicluster environment. See `User Controlled Account Mapping' on page 86 for further details, and `Account Mapping Between Clusters' on page 155 of the LSF Batch Administrator's Guide.
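For instance, host-level entries and cluster-level entries can be combined in the same .lsfhosts file (a hypothetical sketch; hostX and user99 are made-up names, and the cluster-level line reuses the guest-account mapping from the previous example):
% cat ~user1/.lsfhosts
clus2   lsfguest send
hostX   user99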