This chapter provides simple examples that demonstrate the use of LSLIB functions in an application. The function prototypes as well as data structures that are used by the functions are described. Many of the examples resemble the implementation of the existing LSF utilities.
One of the services that LSF provides to applications is the cluster configuration information service. This section describes how to use that service from a C program through LSLIB.
In the previous chapter, a very simple application was introduced that prints the name of the LSF cluster. This section extends that example to print out more information about the LSF cluster, namely, the current master host name and the defined resource names in the cluster. It uses the following additional LSLIB function calls:
struct lsInfo *ls_info()
char *ls_getmastername()
The function ls_info() returns a pointer to the following data structure (as defined in <lsf/lsf.h>):
struct lsInfo {
int nRes; Number of resources in the system
struct resItem *resTable; A resItem for each resource in the system
int nTypes; Number of host types
char hostTypes[MAXTYPES][MAXLSFNAMELEN]; Host types
int nModels; Number of host models
char hostModels[MAXMODELS][MAXLSFNAMELEN]; Host models
float cpuFactor[MAXMODELS]; CPU factors of each host model
int numIndx; Total number of load indices in resItem
int numUsrIndx; Number of user-defined load indices
};
The function ls_getmastername() returns a string containing the name of the current master host.
Both of these functions return NULL on failure and set lserrno to indicate the error.
The resItem structure describes the valid resources defined in the LSF cluster:
struct resItem {
char name[MAXLSFNAMELEN]; The name of the resource
char des[MAXRESDESLEN]; The description of the resource
enum valueType valueType; BOOLEAN, NUMERIC, STRING
enum orderType orderType; INCR, DECR, NA
int flags; RESF_BUILTIN | RESF_DYNAMIC | RESF_GLOBAL
int interval; The update interval for a load index, in seconds
};
The constants MAXTYPES, MAXMODELS, and MAXLSFNAMELEN are defined in <lsf/lsf.h>. MAXLSFNAMELEN is the maximum length of a name in the LSF system.
A host type in LSF refers to a class of hosts that are considered to be compatible from an application point of view. This is entirely configurable, although normally hosts with the same architecture (binary compatible hosts) should be configured to have the same host type.
A host model in LSF refers to a class of hosts with the same CPU performance. The CPU factor of a host model should be configured to reflect the CPU speed of the model relative to other host models in the LSF cluster.
Below is an example program that displays the general LSF cluster information using the above LSLIB function calls.
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(void)
{
    struct lsInfo *lsInfo;
    char *cluster, *master;
    int i;

    /* Name of the local cluster */
    cluster = ls_getclustername();
    if (cluster == NULL) {
        ls_perror("ls_getclustername");
        exit(-1);
    }
    printf("My cluster name is <%s>\n", cluster);

    /* Name of the current master host */
    master = ls_getmastername();
    if (master == NULL) {
        ls_perror("ls_getmastername");
        exit(-1);
    }
    printf("Master host is <%s>\n", master);

    /* General cluster information, including the resource table */
    lsInfo = ls_info();
    if (lsInfo == NULL) {
        ls_perror("ls_info");
        exit(-1);
    }

    /* One line per resource defined in the cluster */
    printf("\n%-15.15s %s\n", "RESOURCE_NAME", "DESCRIPTION");
    for (i = 0; i < lsInfo->nRes; i++)
        printf("%-15.15s %s\n",
               lsInfo->resTable[i].name, lsInfo->resTable[i].des);
    exit(0);
}
The returned data structure of every LSLIB function is dynamically allocated inside LSLIB. This storage is automatically freed by LSLIB and re-allocated next time the same LSLIB function is called. An application should never attempt to free the storage returned by LSLIB. If you need to keep this information across calls, make your own copy of the data structure. This applies to all LSLIB function calls.
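The copy rule above can be illustrated without LSLIB. The following is a minimal sketch, not LSLIB code: demo_get_name() is a hypothetical stand-in for a call such as ls_getmastername() that returns library-owned storage and reuses it on the next call, and keep_copy() is a hypothetical helper that makes a caller-owned copy.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an LSLIB call: the returned storage belongs
 * to the "library" and is overwritten by the next call.
 */
const char *demo_get_name(int which)
{
    static char buf[32];

    strcpy(buf, which == 0 ? "hostA" : "hostB");
    return buf;
}

/*
 * Return a heap copy of s that the caller owns and must free().
 * Returns NULL if s is NULL or if memory cannot be allocated.
 */
char *keep_copy(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL)
        return NULL;
    len = strlen(s) + 1;          /* include the terminating '\0' */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

A caller would copy the result immediately, for example keep_copy(demo_get_name(0)), and free() only the copy, never the pointer handed back by the library. Structures that contain pointers (such as a resource table) would need a deep copy along the same lines.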
The above program will produce output similar to the following:
% a.out
My cluster name is <test_cluster>
Master host is <hostA>
RESOURCE_NAME DESCRIPTION
r15s 15-second CPU run queue length
r1m 1-minute CPU run queue length (alias: cpu)
r15m 15-minute CPU run queue length
ut 1-minute CPU utilization (0.0 to 1.0)
pg Paging rate (pages/second)
io Disk IO rate (Kbytes/second)
ls Number of login sessions (alias: login)
it Idle time (minutes) (alias: idle)
tmp Disk space in /tmp (Mbytes)
swp Available swap space (Mbytes) (alias: swap)
mem Available memory (Mbytes)
ncpus Number of CPUs
ndisks Number of local disks
maxmem Maximum memory (Mbytes)
maxswp Maximum swap space (Mbytes)
maxtmp Maximum /tmp space (Mbytes)
cpuf CPU factor
type Host type
model Host model
status Host status
rexpri Remote execution priority
server LSF server host
sparc SUN SPARC
hppa HPPA architecture
bsd BSD UNIX
sysv System V UNIX
hpux HP-UX UNIX
solaris SUN SOLARIS
cs Compute server
fddi Hosts connected to the FDDI
alpha DEC alpha
Host configuration information describes the static attributes of individual hosts in the LSF cluster. Examples of such attributes are host type, host model, number of CPUs, total physical memory, and the special resources associated with the host. These attributes are either read from the LSF configuration file, or found out by LIM on starting up.
The host configuration information can be obtained by calling the following LSLIB function:
struct hostInfo *
ls_gethostinfo(resreq, numhosts, hostlist, listsize, options)
The following parameters are used by this function:
char *resreq; Resource requirements that a host of interest must satisfy
int *numhosts; If numhosts is not NULL, *numhosts contains the size of the returned array
char **hostlist; An array of candidate hosts
int listsize; Number of candidate hosts
int options; Options, currently only DFT_FROMTYPE
On success, this function returns an array containing a hostInfo
structure for each host of interest. On failure, it returns NULL
and sets lserrno
to indicate the error.
The hostInfo
structure is defined in lsf.h
as
struct hostInfo {
char hostName[MAXHOSTNAMELEN]; Host name
char *hostType; Host type
char *hostModel; Host model
float cpuFactor; CPU factor of the host's CPUs
int maxCpus; Number of CPUs on the host
int maxMem; Size of physical memory on the host in MB
int maxSwap; Amount of swap space on the host in MB
int maxTmp; Size of the /tmp file system on the host in MB
int nDisks; Number of disks on the host
int nRes; Size of the resources array
char **resources; An array of resources configured for the host
char *windows; Run windows of the host
int numIndx; Size of the busyThreshold array
float *busyThreshold; Array of load thresholds for determining if the host is busy
char isServer; TRUE if the host is a server, FALSE otherwise
char licensed; TRUE if the host has an LSF license, FALSE otherwise
int rexPriority; Default priority for remote task execution on the host
};
On Solaris, when referencing MAXHOSTNAMELEN, netdb.h must be included before lsf.h or lsbatch.h.
The following example shows how to use the above LSLIB function in a program. This example program displays the name, host type, total memory, number of CPUs and special resources for each host that has more than 50MB of total memory.
#include <netdb.h> /* Required for Solaris to reference MAXHOSTNAMELEN */
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(void)
{
    struct hostInfo *hostinfo;
    char *resreq;
    int numhosts = 0;
    int options = 0;
    int i, j;

    resreq = "maxmem>50";
    /* NULL host list and size 0: consider every host in the cluster */
    hostinfo = ls_gethostinfo(resreq, &numhosts, NULL, 0, options);
    if (hostinfo == NULL) {
        ls_perror("ls_gethostinfo");
        exit(-10);
    }
    printf("There are %d hosts with more than 50MB total memory\n\n",
           numhosts);
    printf("%-11.11s %8.8s %6.6s %6.6s %9.9s\n",
           "HOST_NAME", "type", "maxMem", "ncpus", "RESOURCES");
    for (i = 0; i < numhosts; i++) {
        printf("%-11.11s %8.8s ", hostinfo[i].hostName,
               hostinfo[i].hostType);
        if (hostinfo[i].maxMem > 0)
            printf("%5dM ", hostinfo[i].maxMem);
        else /* maxMem info not available for this host */
            printf("%6.6s ", "-");
        if (hostinfo[i].maxCpus > 0)
            printf("%6d ", hostinfo[i].maxCpus);
        else /* ncpus is not known for this host */
            printf("%6.6s ", "-");
        /* Special resources configured for this host */
        for (j = 0; j < hostinfo[i].nRes; j++)
            printf(" %s", hostinfo[i].resources[j]);
        printf("\n");
    }
    exit(0);
}
In the above example, resreq is the resource requirement string used to select the hosts. The names you can use in a resource requirement string must be resource names returned by ls_info(). You can also run the lsinfo command to obtain a list of valid resource names in your LSF cluster.
Note that NULL
and 0
were supplied for the third and fourth parameters of the ls_gethostinfo()
call. This causes all LSF hosts meeting resreq
to be returned. If a host list parameter is supplied with this call, the selection of hosts will be limited to those belonging to the list.
If resreq
is NULL
, then the default resource requirements will be used. See `Handling Default Resource Requirements' on page 26 for details.
Note the test of maxMem and maxCpus. The values of these fields (along with maxSwap, maxTmp and nDisks) are determined when LIM starts on a host. If the host is unavailable, the master LIM supplies a negative value.
The above example program produces output similar to the following:
% a.out
There are 4 hosts with more than 50MB total memory
HOST_NAME type maxMem ncpus RESOURCES
hostA HPPA10 128M 1 hppa hpux cs
hostB ALPHA 58M 2 alpha cs
hostD ALPHA 72M 4 alpha fddi
hostC SUNSOL 54M 1 solaris fddi
LSLIB also provides functions simpler than ls_gethostinfo()
to get frequently used information. These functions include:
char *ls_gethosttype(hostname)
char *ls_gethostmodel(hostname)
float *ls_gethostfactor(hostname)
See `List of LSF API Functions' on page 99 for more details about these functions.
Some LSLIB functions require a resource requirement parameter. This parameter is passed to LIM for host selection. It is important to understand how LSF handles default resource requirements. See the LSF User's Guide for further information about resource requirements.
It is desirable that LSF automatically assume default values for some key requirements if they are not specified by the user.
The default resource requirements depend on the specific application context. For example, the lsload command assumes `type==any order[r15s:pg]' as the default resource requirements, while lsrun assumes `type==local order[r15s:pg]'. This is because the user usually expects lsload to show the load on all hosts, while with lsrun, the conservative approach of running a task on the same host type as the local host will in most cases cause the task to run on the correct host type.
LSLIB provides flexibility for the application programmer to decide what the default behavior should be.
LSF default resource requirements contain two parts, a type requirement and an order requirement. The former makes sure that the correct type of hosts are selected, while the latter is used to order the selected hosts according to some reasonable criteria.
LSF appends a type resource requirement to the resource requirement string supplied by an application in the following situations:
resreq
is NULL
or an empty string.
resreq
does not contain a boolean resource, for example, `hppa
', and does not contain a type or model resource, for example, `type==solaris
', `model==HP715
'.
The default type requirement can be either `type==any
' or `type==$fromtype
' depending on whether or not the flag DFT_FROMTYPE
is set in the options
parameter of the function call, where DFT_FROMTYPE
is defined in lsf.h
.
If DFT_FROMTYPE
is set in the options
parameter, the default type requirement is `type==$fromtype
'. If DFT_FROMTYPE
is not set, then the default type requirement is `type==any
'.
The value of fromtype
depends on the function call. If the function has a fromhost
parameter, then fromtype
is the host type of the fromhost
. Otherwise, fromtype
is `local
'.
LSF also appends an order requirement, order[r15s:pg]
, to the resource requirement string if an order requirement is not already specified.
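The rules above can be read as a small decision procedure. The following is a minimal sketch of that reading, not LSF's actual implementation: DEMO_DFT_FROMTYPE is a hypothetical stand-in for DFT_FROMTYPE, the list of boolean resource names is supplied by the caller (in a real program it could come from ls_info()), and names are detected with a naive whole-word search rather than a real resource requirement parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEMO_DFT_FROMTYPE 0x1   /* hypothetical stand-in for DFT_FROMTYPE */

static int is_ident(int c)
{
    return isalnum(c) || c == '_';
}

/* Return 1 if word occurs in s as a whole identifier, 0 otherwise. */
int has_word(const char *s, const char *word)
{
    size_t n = strlen(word);
    const char *p = s;

    if (n == 0)
        return 0;
    while ((p = strstr(p, word)) != NULL) {
        int left_ok = (p == s) || !is_ident((unsigned char)p[-1]);
        int right_ok = !is_ident((unsigned char)p[n]);

        if (left_ok && right_ok)
            return 1;
        p++;
    }
    return 0;
}

/*
 * Return the default type requirement that would be appended to resreq,
 * or NULL if resreq already names a boolean, type, or model resource.
 */
const char *default_type_req(const char *resreq, int options,
                             const char *const *bool_names, int n_bool)
{
    int i;

    if (resreq != NULL && resreq[0] != '\0') {
        if (has_word(resreq, "type") || has_word(resreq, "model"))
            return NULL;
        for (i = 0; i < n_bool; i++)
            if (has_word(resreq, bool_names[i]))
                return NULL;
    }
    return (options & DEMO_DFT_FROMTYPE) ? "type==$fromtype" : "type==any";
}

/*
 * Return the default order requirement, or NULL if resreq already
 * contains an order section.
 */
const char *default_order_req(const char *resreq)
{
    if (resreq != NULL && strstr(resreq, "order[") != NULL)
        return NULL;
    return "order[r15s:pg]";
}
```

With a boolean list containing `hppa', the string `maxmem>50' would receive both defaults, while `hppa' would receive only the order requirement. How LSF combines the pieces into the final string is not shown here.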
The table below lists some examples of how LSF appends the default resource requirements.
LSLIB provides several functions to obtain dynamic load information about hosts. The dynamic load information is updated periodically by LIM. The definition of all resources is stored in the struct lsInfo
data structure returned by the ls_info(3)
API call (see `Getting General Cluster Configuration Information' on page 19 for details). We can classify LSF resources into two groups by resource location, namely host-based resources and shared resources (see Chapter 2 of the LSF Batch Administrator's Guide for more information on host-based and shared resources).
Dynamic host-based resources are frequently referred to as load indices, consisting of 11 built-in load indices and a number of external load indices. The built-in load indices report load situation about the CPU, memory, disk subsystem, interactive activities, etc. on each host. The external load indices are optionally defined by your LSF administrator to collect additional host-based dynamic load information that is of interest to your site. The LSLIB function that reports information about load indices is:
struct hostLoad *
ls_load(resreq, numhosts, options, fromhost)
On success, this function returns an array containing a hostLoad
structure for each host of interest. On failure, it returns NULL
and sets lserrno
to indicate the error.
This function has the following parameters:
char *resreq; Resource requirements that each host of interest must satisfy
int *numhosts; *numhosts initially contains the number of hosts requested
int options; Option flags that affect the selection of hosts
char *fromhost; Used in conjunction with the DFT_FROMTYPE option
The value of *numhosts
determines how many hosts should be returned by this call. If *numhosts
is 0, information is requested on all hosts satisfying resreq
. If numhosts
is NULL
, load information is requested on one host. If numhosts
is not NULL
, then on a successful return *numhosts
will contain the number of hostLoad
structures returned.
The options
argument is constructed from the bitwise inclusive OR of zero or more of the option flags defined in <lsf/lsf.h
>. The most commonly used flags are:
EXACT
Exactly *numhosts hosts are desired. If EXACT is set, either exactly *numhosts hosts are returned, or the call returns an error. If EXACT is not set, then up to *numhosts hosts are returned. If *numhosts is zero, then the EXACT flag is ignored and as many hosts in the load sharing system as are eligible (that is, those that satisfy the resource requirement) are returned.
OK_ONLY
Return only those hosts that are currently in the ok state. If OK_ONLY is set, those hosts that are busy, locked, unlicensed or unavail are not returned. If OK_ONLY is not set, then some or all of the hosts whose status is not ok may also be returned, depending on the value of *numhosts and whether the EXACT flag is set.
NORMALIZE
Normalize CPU load indices. If NORMALIZE is set, then the CPU run queue length load indices r15s, r1m, and r15m of each host returned are normalized. See the LSF User's Guide for different types of run queue lengths. The default is to return the raw run queue length.
EFFECTIVE
If EFFECTIVE is set, then the CPU run queue length load indices of each host returned are the effective load. The default is to return the raw run queue length. The options EFFECTIVE and NORMALIZE are mutually exclusive.
DFT_FROMTYPE
This flag determines the default resource requirements. See `Handling Default Resource Requirements' on page 26 for details.
The fromhost
parameter is used when DFT_FROMTYPE
is set in options
. If fromhost
is NULL
, the local host is assumed.
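The interaction between numhosts and the EXACT flag described above can be summarized in a few lines. This is a minimal sketch that models only the counting rules, not ls_load() itself: DEMO_EXACT is a hypothetical stand-in for the EXACT flag, eligible is the number of hosts assumed to satisfy resreq, the effect of OK_ONLY is ignored, and treating "no eligible host" as an error is an assumption.

```c
#include <stddef.h>

#define DEMO_EXACT 0x1   /* hypothetical stand-in for the EXACT flag */

/*
 * Number of hostLoad entries a call would yield, or -1 for an error.
 *   numhosts - pointer as passed by the caller (may be NULL)
 *   options  - demo option flags
 *   eligible - number of hosts that satisfy the resource requirement
 */
int hosts_returned(const int *numhosts, int options, int eligible)
{
    int wanted;

    if (eligible <= 0)
        return -1;            /* assumption: no eligible host is an error */
    if (numhosts == NULL)
        return 1;             /* NULL pointer: one host is requested */
    wanted = *numhosts;
    if (wanted == 0)
        return eligible;      /* 0: all eligible hosts, EXACT is ignored */
    if (wanted > eligible)    /* fewer hosts available than requested */
        return (options & DEMO_EXACT) ? -1 : eligible;
    return wanted;
}
```

For example, requesting 9 hosts when 5 are eligible yields 5 without the flag and an error with it.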
ls_load()
returns an array of the following data structure as defined in <lsf/lsf.h>
:
struct hostLoad {
char hostName[MAXHOSTNAMELEN]; Name of the host
int status[2]; The operational and load status of the host
float *li; Values for all load indices of this host
};
The returned hostLoad
array is ordered according to the order requirement in the resource requirements. For details about the ordering of hosts, see the LSF User's Guide.
The following example takes no option, and periodically displays the host name, host status and 1-minute effective CPU run queue length for each Sun SPARC host in the LSF cluster.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* sleep() */
#include <lsf/lsf.h>

int main(void)
{
    int i;
    struct hostLoad *hosts;
    char *resreq = "type==sparc";
    int numhosts;
    int options = EFFECTIVE;
    char *fromhost = NULL;
    char field[20] = "*";

    for (;;) { /* repeatedly display load */
        /* ls_load() overwrites numhosts, so ask for all hosts each pass */
        numhosts = 0;
        hosts = ls_load(resreq, &numhosts, options, fromhost);
        if (hosts == NULL) {
            ls_perror("ls_load");
            exit(-1);
        }
        printf("%-15.15s %6.6s%6.6s\n", "HOST_NAME", "status", "r1m");
        for (i = 0; i < numhosts; i++) {
            printf("%-15.15s ", hosts[i].hostName);
            if (LS_ISUNAVAIL(hosts[i].status)) {
                printf("%6s\n", "unavail");
                continue; /* no load values for an unavailable host */
            } else if (LS_ISBUSY(hosts[i].status))
                printf("%6.6s", "busy");
            else if (LS_ISLOCKED(hosts[i].status))
                printf("%6.6s", "locked");
            else
                printf("%6.6s", "ok");
            if (hosts[i].li[R1M] >= INFINIT_LOAD)
                printf("%6.6s\n", "-");
            else {
                /* field[0] is '*'; the value is formatted after it */
                sprintf(field + 1, "%5.1f", hosts[i].li[R1M]);
                if (LS_ISBUSYON(hosts[i].status, R1M))
                    printf("%6.6s\n", field);     /* with leading '*' */
                else
                    printf("%6.6s\n", field + 1); /* value only */
            }
        }
        sleep(60); /* until next minute */
    }
}
The output of the above program is similar to the following:
% a.out
HOST_NAME status r1m
hostB ok 0.0
hostC ok 1.2
hostA busy 0.6
hostD busy *4.3
hostF unavail
If the host status is busy
because of r1m
, then a `*' is printed in front of the value of the r1m
load index.
In the above example, note that the returned data structure hostLoad
never needs to be freed by the program even if ls_load()
is called repeatedly.
Each element of the li
array is a floating point number between 0.0 and INFINIT_LOAD
(defined in lsf.h
). The index value is set to INFINIT_LOAD
by LSF to indicate an invalid or unknown value for an index.
The li array can be indexed in different ways. The constants defined in lsf.h (see the ls_load(3) man page) can be used to index any built-in load index, as shown in the above example. If external load indices are used, the load indices are returned in the same order as the resources returned by ls_info(). The variables numUsrIndx and numIndx in the lsInfo structure can be used to determine which resources are load indices. See `Advanced Programming Topics' on page 83 for a discussion of more flexible ways to map load index names to values.
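One way to map a load index name to a position in the li array is a linear search over the resource names. The following is a minimal sketch under a stated assumption: that the load indices are the first numIndx entries of the resource table, in the same order as li. The struct demo_res type and DEMO_NAME_LEN are local stand-ins for resItem and MAXLSFNAMELEN so that the fragment compiles without LSLIB.

```c
#include <string.h>

#define DEMO_NAME_LEN 40   /* stand-in for MAXLSFNAMELEN */

/* Stand-in for the name field of struct resItem. */
struct demo_res {
    char name[DEMO_NAME_LEN];
};

/*
 * Return the position of the load index called name among the first
 * num_indx entries of table, or -1 if name is not a load index.
 */
int load_index_of(const struct demo_res *table, int num_indx,
                  const char *name)
{
    int i;

    for (i = 0; i < num_indx; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;
    return -1;
}
```

With real LSLIB data, the result would be used as hosts[i].li[pos] after checking that pos is not -1 and that the value is below INFINIT_LOAD.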
LSF defines a set of macros in lsf.h
to test the status
field. The most commonly used macros include:
LS_ISUNAVAIL(status)
Returns 1 if the LIM on the host is unavailable.
LS_ISBUSY(status)
Returns 1 if the host is busy.
LS_ISBUSYON(status, index)
Returns 1 if the host is busy on the given index.
LS_ISLOCKED(status)
Returns 1 if the host is locked.
LS_ISOK(status)
Returns 1 if none of the above is true.
Unlike host-based resources which are inherent properties contributing to the making of each host, shared resources are shared among a set of hosts. The availability of a shared resource is characterized by having multiple instances, with each instance being shared among a set of hosts.
The LSLIB function that can be used to access shared resource information is:
LS_SHARED_RESOURCE_INFO_T
*ls_sharedresourceinfo(resources, numresources, hostname, options)
On success, this function returns an array containing a shared resource information structure (LS_SHARED_RESOURCE_INFO_T)
for each shared resource. On failure, this function returns NULL
and sets lserrno
to indicate the error. This function has the following parameters:
char **resources; NULL-terminated array of resource names
int *numresources; Number of shared resources
char *hostname; Host name
int options; Options (currently set to 0)
resources
is a list (NULL
terminated array) of shared resource names whose resource information is to be returned. Specify NULL
to return resource information for all shared resources defined in the cluster.
numresources is a pointer to an integer specifying the number of resource information structures (LS_SHARED_RESOURCE_INFO_T) to return. Specify 0 to return resource information for all shared resources in the cluster. On success, *numresources is assigned the number of LS_SHARED_RESOURCE_INFO_T structures returned.
hostname is the name of a host. Specifying hostname indicates that only the shared resource information for the named host is to be returned. Specify NULL to return resource information for all shared resources defined in the cluster.
ls_sharedresourceinfo
returns an array of the following data structure as defined in <lsf/lsf.h>
:
typedef struct lsSharedResourceInfo {
char *resourceName; Resource name
int nInstances; Number of instances
LS_SHARED_RESOURCE_INST_T *instances; Pointer to the array of instances
} LS_SHARED_RESOURCE_INFO_T;
For each shared resource, LS_SHARED_RESOURCE_INFO_T
encapsulates an array of instances in the instances
field. Each instance is represented by the data type LS_SHARED_RESOURCE_INST_T
defined in <lsf/lsf.h>
:
typedef struct lsSharedResourceInstance {
char *value; Value associated with the instance
int nHosts; Number of hosts sharing the instance
char **hostList; Hosts associated with the instance
} LS_SHARED_RESOURCE_INST_T;
The value field of the LS_SHARED_RESOURCE_INST_T structure contains the ASCII representation of the actual value of the resource. Interpreting the value requires knowledge of the resource type (Boolean, Numeric or String), which can be obtained from the resItem structure accessible through the lsInfo structure returned by ls_info(). See `Getting General Cluster Configuration Information' on page 19 for details.
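Because the value arrives as text, a program that wants to compute with a numeric resource has to convert it first. The following is a minimal sketch of such a conversion: enum demo_value_type is a local stand-in for the valueType field of resItem, and numeric_value() is a hypothetical helper, not an LSLIB function.

```c
#include <stdlib.h>

/* Local stand-in for the value types listed in struct resItem. */
enum demo_value_type { DEMO_BOOLEAN, DEMO_NUMERIC, DEMO_STRING };

/*
 * Convert the text of a numeric resource value to a double.
 * Returns 0 and stores the result in *out on success.
 * Returns -1 if the resource is not numeric or the text is not a number.
 */
int numeric_value(enum demo_value_type type, const char *text, double *out)
{
    char *end;
    double v;

    if (type != DEMO_NUMERIC || text == NULL || *text == '\0')
        return -1;
    v = strtod(text, &end);
    if (*end != '\0')         /* trailing characters: not a plain number */
        return -1;
    *out = v;
    return 0;
}
```

In a real program the type would come from the resItem entry of the resource and the text from the value field of an instance.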
The following example shows how to use ls_sharedresourceinfo()
to collect dynamic shared resource information in an LSF cluster. This example displays information from all the dynamic shared resources in the cluster. For each resource, the resource name, instance number, value and locations are displayed.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lsf/lsf.h>

static struct resItem *getResourceDef(char *);
static struct lsInfo *lsInfo;

int main(void)
{
    struct lsSharedResourceInfo *resLocInfo;
    int numRes = 0;
    int i, j, k;

    /* Resource definitions are needed to tell dynamic from static */
    lsInfo = ls_info();
    if (lsInfo == NULL) {
        ls_perror("ls_info");
        exit(-10);
    }
    /* NULL resource list and NULL host: all shared resources */
    resLocInfo = ls_sharedresourceinfo(NULL, &numRes, NULL, 0);
    if (resLocInfo == NULL) {
        ls_perror("ls_sharedresourceinfo");
        exit(-1);
    }
    printf("%-11.11s %8.8s %6.6s %14.14s\n",
           "NAME", "INSTANCE", "VALUE", "LOCATIONS");
    for (k = 0; k < numRes; k++) {
        struct resItem *resDef;

        resDef = getResourceDef(resLocInfo[k].resourceName);
        if (!(resDef->flags & RESF_DYNAMIC))
            continue; /* skip static shared resources */
        printf("%-11.11s", resLocInfo[k].resourceName);
        for (i = 0; i < resLocInfo[k].nInstances; i++) {
            struct lsSharedResourceInstance *instance;

            if (i == 0)
                printf(" %8.1d", i+1);
            else
                printf(" %19.1d", i+1);
            instance = &resLocInfo[k].instances[i];
            printf(" %6.6s", instance->value);
            /* One line per host sharing this instance */
            for (j = 0; j < instance->nHosts; j++)
                if (j == 0)
                    printf(" %14.14s\n", instance->hostList[j]);
                else
                    printf(" %41.41s\n", instance->hostList[j]);
        } /* for */
    } /* for */
    exit(0);
} /* main */

/* Look up a resource definition by name; exits if it is not found */
static struct resItem *
getResourceDef(char *resourceName)
{
    int i;

    for (i = 0; i < lsInfo->nRes; i++) {
        if (strcmp(resourceName, lsInfo->resTable[i].name) == 0)
            return &lsInfo->resTable[i];
    }
    /* Fail to find the matching resource */
    fprintf(stderr, "Cannot find resource definition for <%s>\n",
            resourceName);
    exit(-1);
}
The output of the above program is similar to the following:
% a.out
NAME INSTANCE VALUE LOCATIONS
dynamic1 1 2 hostA
hostC
hostD
2 4 hostB
hostE
dynamic2 1 3 hostA
hostE
Note that the resource dynamic1
has two instances, one contains two resource units shared by hostA
, hostC
and hostD
and the other contains four resource units shared by hostB
and hostE
. The dynamic2 resource has only one instance with three resource units shared by hostA
and hostE
.
If you are writing an application that needs to run tasks on the best available hosts, you need to make a placement decision for each task, that is, decide on which host it should run.
Placement decision takes two factors into consideration. The first factor is the resource requirements of the task. Every task has a certain set of resource requirements. These may be static, such as a particular hardware architecture or operating system, or dynamic, such as a certain amount of swap space for virtual memory.
LSLIB provides services for placement advice. All you have to do is to call the appropriate LSLIB function with appropriate resource requirements.
Placement advice can be obtained by calling either ls_load() or ls_placereq(). ls_load() returns placement advice together with load index values. ls_placereq() returns only the qualified host names. The resulting list of hosts is ordered by preference, with the first being the best. ls_placereq() is useful when a simple placement decision suffices. ls_load() can be used if the placement advice from LSF must be adjusted by your own additional criteria. The LSF utilities lsrun, lsmake, lslogin, and lstcsh all use ls_placereq() for placement decisions, whereas lsbatch uses ls_load() to get an ordered list of qualified hosts, and then makes placement decisions by considering lsbatch-specific policies.
To make optimal placement decisions, it is important that your resource requirements accurately describe the resource needs of the application. For example, if your task is memory intensive, then your resource requirement string should have `mem' in the order segment, as in `fddi order[mem:r1m]'.
The LSLIB function, ls_placereq()
, takes the form of
char **ls_placereq(resreq, num, options, fromhost)
On success, this function returns an array of host names that best meet the resource requirements. Hosts may be duplicated for hosts that have sufficient resources to accept multiple tasks (for example, multiprocessors).
On failure, this function returns NULL
and sets lserrno
to indicate the error.
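Since a host name may appear more than once in the returned array, an application that starts several tasks may want to know how many of them were placed on a given host. The following is a minimal sketch of that count. slots_on_host() is a hypothetical helper, and the array it inspects stands in for the result of ls_placereq().

```c
#include <string.h>

/*
 * Count how many entries of hosts[0..n-1] name the given host.
 * A count above 1 means several tasks were placed on that host.
 */
int slots_on_host(char **hosts, int n, const char *host)
{
    int i;
    int count = 0;

    for (i = 0; i < n; i++)
        if (strcmp(hosts[i], host) == 0)
            count++;
    return count;
}
```

For a placement list of hostD, hostD, hostB, the helper reports two tasks for hostD and one for hostB.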
The parameters for ls_placereq()
are very similar to those of the ls_load()
function described in the previous section.
LSLIB appends the default resource requirements to resreq according to the rules described in `Handling Default Resource Requirements' on page 26.
Preference is given to fromhost
over remote hosts that do not have significantly lighter load or greater resources. This preference avoids unnecessary task transfer and reduces overhead. If fromhost
is NULL
, then the local host is assumed.
The example program below takes a resource requirement string as an argument and displays the host in the LSF cluster that best satisfies the resource requirement.
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(int argc, char *argv[])
{
    char *resreq;
    char **best;
    int num = 1; /* ask for one host only */
    int options = 0;
    char *fromhost = NULL; /* NULL: the local host is assumed */

    if (argc != 2) {
        fprintf(stderr, "Usage: %s resreq\n", argv[0]);
        exit(-2);
    }
    resreq = argv[1];
    best = ls_placereq(resreq, &num, options, fromhost);
    if (best == NULL) {
        ls_perror("ls_placereq()");
        exit(-1);
    }
    printf("The best host is <%s>\n", best[0]);
    exit(0);
}
The above program will produce output similar to the following:
% a.out "type==local order[r1m:ls]"
The best host is <hostD>
LSLIB also provides a variant of ls_placereq()
. ls_placeofhosts()
lets you provide a list of candidate hosts. See the ls_policy(3)
man page for details.
Host selection relies on resource requirements. To avoid the need to specify resource requirements each time you execute a task, LSF maintains a list of task names together with their default resource requirements for each user. This information is kept in three task list files: the system-wide defaults, the per-cluster defaults, and the per-user defaults.
A user can put a task name together with its resource requirements into his/her remote task list by running the lsrtasks
command. The lsrtasks
command can be used to add, delete, modify, or display a task entry in the task list. For more information on remote task list and an explanation of resource requirement strings, see the LSF User's Guide.
LSLIB provides a function to get the resource requirements associated with a task name. With this function, LSF applications or utilities can automatically retrieve the resource requirements of a given task if the user does not explicitly specify it. For example, the LSF utility lsrun
tries to find the resource requirements of the user-typed command automatically if `-R
' option is not specified by the user on the command line.
The LSLIB function call ls_resreq()
obtains resource requirements of a given task. The syntax of this function is:
char *
ls_resreq(taskname)
If taskname
does not appear in the remote task list, this function returns NULL
.
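The lookup order described for lsrun (an explicit `-R' option first, then the remote task list) can be sketched as a small helper. choose_resreq() is a hypothetical function for illustration, not part of LSLIB. Its second argument stands in for the result of ls_resreq(), which may be NULL.

```c
#include <stddef.h>

/*
 * Pick the resource requirement string for a task.
 *   from_option   - string given explicitly by the user, or NULL
 *   from_tasklist - string found in the remote task list, or NULL
 * An explicit, non-empty option wins. If both are missing, NULL is
 * returned so that the default resource requirements apply.
 */
const char *choose_resreq(const char *from_option, const char *from_tasklist)
{
    if (from_option != NULL && from_option[0] != '\0')
        return from_option;
    return from_tasklist;
}
```

The returned string, including NULL, could then be passed as the resreq argument of a placement call.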
Typically, the resource requirements of a task are then used for host selection. The following program takes a task name as its argument, gets the associated resource requirements from the remote task list, and then supplies them to an ls_placereq() call to get the best host for running this task.
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(int argc, char *argv[])
{
    char *taskname;
    char *resreq;
    char **best;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s taskname\n", argv[0]);
        exit(-1);
    }
    taskname = argv[1];
    /* NULL means the task is not in the remote task list */
    resreq = ls_resreq(taskname);
    if (resreq)
        printf("Resource requirement for %s is \"%s\".\n", taskname, resreq);
    else
        printf("Resource requirement for %s is NULL.\n", taskname);
    /* A NULL resreq selects the default resource requirements */
    best = ls_placereq(resreq, NULL, 0, NULL);
    if (best == NULL) {
        ls_perror("ls_placereq");
        exit(-1);
    }
    printf("Best host for %s is <%s>\n", taskname, best[0]);
    exit(0);
}
The above program will produce output similar to the following:
% a.out myjob
Resource requirement for myjob is "swp>50 order[cpu:mem]".
Best host for myjob is <hostD>
Remote execution of interactive tasks in LSF is supported through the Remote Execution Server (RES). The RES listens on a well-known port for service requests. Applications initiate remote execution by making an LSLIB call.
The following steps are typically involved during a remote execution:
stdin, stdout and stderr associated with the pseudo-terminal or socket. The remote task runs, and the RES forwards any output from the remote task back to the client's NIOS.
When the remote task finishes, the RES collects its status and resource usage and sends them back to the client through its NIOS.
Note that all of the above transactions are triggered by an LSLIB remote execution function call and take place transparently to the programmer. Figure 5 shows the relationships between these entities.
The same NIOS is shared by all remote tasks running on different hosts started by the same instance of LSLIB. The LSLIB contacts multiple RESes and they all call back to the same NIOS. The sharing of the NIOS is restricted to within the same application.
Remotely executed tasks behave as if they were executing locally. The local execution environment passed to the RES is re-established on the remote host, and the task's status and resource usage are passed back to the client. Terminal I/O is transparent, so even applications such as vi
that do complicated terminal manipulation run transparently on remote hosts. UNIX signals are supported across machines, so that remote tasks get signals as if they were running locally. Job control also is done transparently. This level of transparency is maintained between heterogeneous hosts.
Before executing a task remotely, an application must call the following LSLIB function:
int ls_initrex(numports, options)
On success, this function initializes the LSLIB for remote execution. If your application is installed as a setuid to root program, this function returns the number of socket descriptors bound to privileged ports. If your program is not installed as a setuid to root program, this function returns numports on success.
On failure, this function returns -1 and sets the global variable lserrno
to indicate the error.
This function must be called before any other remote execution function (see ls_rex(3)) or any remote file operation function (see ls_rfs(3)) in LSLIB can be called.
ls_initrex()
has the following parameters:
int numports; The number of privileged ports to create
int options; either KEEPUID or 0
If your program is installed as a setuid to root program, numports file descriptors, starting from FIRST_RES_SOCK (defined in <lsf/lsf.h>), are bound to privileged ports by ls_initrex(). These sockets are used only for remote connections to RES. If numports is 0, the system uses the default value LSF_DEFAULT_SOCKS, defined in lsf.h.
By default, ls_initrex() resets the effective user ID to the real user ID if the program is installed as a setuid to root program. If options is KEEPUID (defined in lsf.h), ls_initrex() preserves the current effective user ID. This option is useful if the application needs to be a setuid to root program for some other purpose as well and should not return to the real user ID immediately after ls_initrex().
If the KEEPUID flag is set in options, you must make sure that your application restores the real user ID at an appropriate point during program execution.
The ls_initrex() function selects the security option according to the following rule: if the application program invoking it has an effective uid of root, privileged ports are created; otherwise, no such ports are created and, at remote task start-up time, RES uses the authentication protocol defined by LSF_AUTH in the lsf.conf file.
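The KEEPUID pattern described above can be sketched as follows. This is a minimal sketch, not LSF's own code: the helper names restore_real_uid() and init_rex_keep_uid() are hypothetical, HAVE_LSF is an assumed build macro used only so that the fragment compiles on a machine without LSF installed, and the privilege drop relies on the standard POSIX setuid() call.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef HAVE_LSF                 /* assumed build macro, not part of LSF */
#include <lsf/lsf.h>
#endif

/* Hypothetical helper: give up setuid-root privileges by setting the
 * effective user ID back to the real user ID.
 * Returns 0 on success, -1 on failure. */
int restore_real_uid(void)
{
    if (setuid(getuid()) < 0) {
        perror("setuid");
        return -1;
    }
    /* Verify that the drop actually took effect. */
    return (geteuid() == getuid()) ? 0 : -1;
}

/* Hypothetical helper: initialize remote execution while keeping the
 * effective user ID, do any other privileged set-up, then drop it. */
int init_rex_keep_uid(void)
{
#ifdef HAVE_LSF
    if (ls_initrex(1, KEEPUID) < 0) {
        ls_perror("ls_initrex");
        return -1;
    }
#endif
    /* ... other work that still needs the root effective uid ... */
    return restore_real_uid();
}
```

Keeping the drop in one small function makes it easier to confirm that every path through the program reaches it before user-supplied commands are handled.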
The example program below runs a command on one of the best available hosts. It makes use of the ls_resreq()
function described in `Getting Task Resource Requirements' on page 38, the ls_placereq()
function described in `Making a Placement Decision' on page 36, the ls_initrex()
function described in `Initializing an Application for Remote Execution' on page 42, and the following LSLIB function:
int ls_rexecv(host, argv, options)
This function executes a program on the specified host. It does not return if successful. It returns -1 on failure.
This function is basically a remote execvp
. If a connection with the RES on host has not been established, ls_rexecv()
sets one up. The remote execution environment is set up to be exactly the same as the local one and is cached by the remote RES server. This LSLIB function has the following parameters:
char *host; The execution host
char *argv[]; The command and its arguments
int options; See below
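Because ls_rexecv() behaves like execvp, code placed after the call runs only when the call fails. For comparison, the same calling pattern for a purely local exec might look like the sketch below. It uses only standard POSIX calls rather than LSLIB, the helper name run_local() is hypothetical, and the child process is forked only so that the caller survives to report the exit status.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] locally and return its exit status,
 * or -1 if it could not be started or did not exit normally. */
int run_local(char *argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Reached only if execvp failed, just as code after
         * ls_rexecv() is reached only on failure. */
        perror("execvp");
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```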
The options argument is constructed from the bitwise inclusive OR of zero or more of the option flags defined in <lsf/lsf.h> with names starting with `REXF_'. The most commonly used flag is:
REXF_USEPTY
Use a remote pseudo-terminal as the stdin, stdout, and stderr of the remote task. This option provides a higher degree of terminal I/O transparency, but it is necessary only for interactive screen applications such as vi. A pseudo-terminal incurs more overhead and should be used only when needed.
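Building the options argument amounts to OR-ing flags into an integer that starts at zero. The sketch below illustrates this under stated assumptions: rex_options() is a hypothetical helper, HAVE_LSF is an assumed build macro, and the fallback definition of REXF_USEPTY is a placeholder that exists only so the fragment compiles without the LSF header. In a real build the value must come from <lsf/lsf.h>.

```c
#ifdef HAVE_LSF                 /* assumed build macro, not part of LSF */
#include <lsf/lsf.h>
#endif

#ifndef REXF_USEPTY
/* Placeholder so the sketch compiles without <lsf/lsf.h>; the real
 * value must come from the LSF header. */
#define REXF_USEPTY 0x1
#endif

/* Hypothetical helper: build the options argument for ls_rexecv().
 * want_pty is nonzero when the task needs a pseudo-terminal. */
int rex_options(int want_pty)
{
    int options = 0;            /* zero flags: plain socket I/O */

    if (want_pty)
        options |= REXF_USEPTY;
    return options;
}
```

A caller would pass the result straight through, for example ls_rexecv(host, argv, rex_options(1)) when starting a screen editor, and rex_options(0) for ordinary batch-style commands.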
LSLIB also provides an ls_rexecve(3)
function that allows you to specify the environment to be set up on the remote host.
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(int argc, char *argv[])
{
    char *command;
    char *resreq;
    char **best;
    int num = 1;    /* ask for a single host */

    if (argc < 2) {
        fprintf(stderr, "Usage: %s command [argument ...]\n", argv[0]);
        exit(-1);
    }
    command = argv[1];

    /* initialize LSLIB for remote execution */
    if (ls_initrex(1, 0) < 0) {
        ls_perror("ls_initrex");
        exit(-1);
    }

    /* look up the command's resource requirements, then find the best host */
    resreq = ls_resreq(command);
    best = ls_placereq(resreq, &num, 0, NULL);
    if (best == NULL) {
        ls_perror("ls_placereq()");
        exit(-1);
    }

    printf("<<Execute %s on %s>>\n", command, best[0]);
    ls_rexecv(best[0], argv + 1, 0);

    /* should never get here */
    ls_perror("ls_rexecv()");
    exit(-1);
}
The output of the above program would be something like:
% a.out myjob
<<Execute myjob on hostD>>
(output from myjob goes here ....)
Any application that uses LSF's remote execution service must be installed for proper authentication. See `Authentication' on page 17.
The LSF utility lsrun is implemented using the ls_rexecv() function. After the remote task is initiated, lsrun calls the ls_rexecv() function, which then executes NIOS to handle all input/output to and from the remote task and exits with the same status when the remote task exits.
See `Advanced Programming Topics' on page 83 for an alternative way to start remote tasks.