

1. Introduction

What is LSF JobScheduler?

Production job scheduling has been an integral part of mainframe data processing operations for decades. With the emergence of distributed computing, along with UNIX and Windows NT workstations and file servers, system architecture has changed drastically, calling for a new approach to job scheduling.

LSF JobScheduler is part of Platform's Workload Management solutions. It centralizes and automates the scheduling of production workload in distributed UNIX and Windows NT environments. LSF JobScheduler integrates heterogeneous servers into a "virtual mainframe" to deliver high availability, robustness, and ease of use. It provides the functions of traditional mainframe job scheduling with transparent operation across a network of heterogeneous UNIX and Windows NT systems.

With LSF JobScheduler, you can target jobs to specific servers, or you can let the system match the requirements of your jobs to the capabilities of your servers. LSF JobScheduler dynamically collects system load information about all aspects of computing resources including CPU, memory, I/O, disk space, and interactive activities. Jobs are dynamically scheduled to run on the most suitable servers available. LSF JobScheduler offers graphical tools in addition to the standard command line interface.
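The two placement modes described above (targeting a specific server, or letting the system match job requirements to server capabilities) can be illustrated with a minimal Python sketch. This is not LSF code: the `Host` and `JobRequirements` classes, their fields, and the `pick_host` function are hypothetical names invented for illustration, and choosing the lowest CPU load among qualifying hosts is an assumed, simplified stand-in for the real scheduling policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    """Hypothetical snapshot of one server's load information."""
    name: str
    cpu_load: float      # lower means less busy
    free_mem_mb: int
    free_disk_mb: int


@dataclass
class JobRequirements:
    """Hypothetical job requirements; target_host pins the job to one server."""
    min_mem_mb: int = 0
    min_disk_mb: int = 0
    target_host: Optional[str] = None


def pick_host(hosts, req):
    """Return the least-loaded host that satisfies req, or None if none does."""
    candidates = [
        h for h in hosts
        if h.free_mem_mb >= req.min_mem_mb
        and h.free_disk_mb >= req.min_disk_mb
        and (req.target_host is None or h.name == req.target_host)
    ]
    return min(candidates, key=lambda h: h.cpu_load, default=None)


hosts = [
    Host("hostA", cpu_load=0.5, free_mem_mb=512, free_disk_mb=1000),
    Host("hostB", cpu_load=0.1, free_mem_mb=128, free_disk_mb=1000),
    Host("hostC", cpu_load=0.3, free_mem_mb=1024, free_disk_mb=1000),
]
chosen = pick_host(hosts, JobRequirements(min_mem_mb=256))
```

In this sketch hostB is the least loaded but lacks the required memory, so the job goes to hostC, the least-loaded host that qualifies.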

Some of the features of LSF JobScheduler are:

Structure of LSF JobScheduler

LSF JobScheduler consists of a master scheduler and a number of slave execution servers distributed across a cluster of computers. There is only one master scheduler in the whole cluster, and one slave execution server on each machine that runs jobs. The master scheduler (mbatchd) accepts jobs created by users and schedules them to be run by the slave execution servers on individual machines. A slave execution server (sbatchd) accepts jobs dispatched from the master scheduler and runs them on the local machine, controlling the execution according to job specifications from the master.

The components of LSF JobScheduler are shown in Figure 1.

LSF Cluster

An LSF cluster is a group of computers that are configured to act as a single, integrated system for job scheduling. All machines configured into the cluster share resources transparently. An LSF cluster consists of one or more server hosts and zero or more client hosts. A client host is a machine that does not run user-submitted jobs, but allows users to submit, monitor, and control jobs running on LSF JobScheduler server hosts. A server host does everything a client host can do, and also runs user-submitted jobs.

One of the server hosts acts as the master for the cluster. It runs the master scheduler, mbatchd. Each server host runs a slave execution server, sbatchd, which manages jobs dispatched by the master scheduler. Each server host also runs a Load Information Manager daemon, LIM. It monitors the availability of resources and makes this information available to LSF JobScheduler and other LSF utilities.

Each cluster can have one or more LSF cluster administrators. An LSF cluster administrator is a user account that has permission to change the LSF JobScheduler configuration and perform other maintenance functions. The LSF cluster administrator has the authority to decide how the LSF JobScheduler cluster is configured.

The master scheduler maintains the status of all entities defined in the system including jobs, events, calendars, and queues.

Figure 1. Components of LSF JobScheduler

Jobs

A job is a program or command that is scheduled to run in a specific environment. A job has many attributes specifying its scheduling and execution requirements. Job attributes are specified by the user who submits the job. LSF JobScheduler uses job attributes, system resource information, and configured scheduling policies to decide when, where, and how to run jobs. The system assigns each job a unique job identification number, and you can also give a job your own name to make it easier to reference.

Job Groups

A job group is a container for jobs in much the same way that a directory is a container for files. When developing a complex schedule involving many jobs, it is useful to organize related jobs into groups so that it becomes easier to view and manipulate them. For example, a payroll application may have one group of jobs that calculates weekly payments, another job group for calculating monthly salaries, and a third job group that handles the salaries of part-time or contract employees. Users can view and operate on the job groups rather than looking at individual jobs.
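The directory analogy can be made concrete with a small Python sketch of a job group tree, using the payroll example above. The `JobGroup` class, its methods, the slash-separated path syntax, and the job names are all hypothetical and chosen for illustration only; they are not the LSF JobScheduler interface.

```python
class JobGroup:
    """Hypothetical container for jobs and nested job groups."""

    def __init__(self, name):
        self.name = name
        self.jobs = []        # jobs held directly in this group
        self.subgroups = {}   # child groups, keyed by name

    def add_job(self, path, job_name):
        """Add job_name under a slash-separated group path, creating groups as needed."""
        group = self
        for part in [p for p in path.split("/") if p]:
            if part not in group.subgroups:
                group.subgroups[part] = JobGroup(part)
            group = group.subgroups[part]
        group.jobs.append(job_name)

    def all_jobs(self):
        """Return every job in this group and in all groups below it."""
        found = list(self.jobs)
        for sub in self.subgroups.values():
            found.extend(sub.all_jobs())
        return found


payroll = JobGroup("payroll")
payroll.add_job("weekly", "calc_hours")
payroll.add_job("weekly", "print_cheques")
payroll.add_job("monthly", "calc_salaries")
payroll.add_job("contract", "calc_contract_pay")

weekly_jobs = payroll.subgroups["weekly"].all_jobs()
```

Operating on `payroll.subgroups["weekly"]` affects only the weekly jobs, while `payroll.all_jobs()` reaches every job in the application, which mirrors how a directory operation applies to everything beneath it.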

Events

An event is a change or occurrence in the system that can be used to trigger jobs, such as the creation of a specific file, a tape drive becoming available, a prior job completing successfully, or a particular time being reached. LSF JobScheduler responds to the following types of events:

When defining a job, it is possible to specify any combination of events that must be satisfied before the job is considered eligible for execution.
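One way to picture a combination of events is as a boolean expression evaluated against the current state of the system. The Python sketch below assumes that model; the helper functions `event`, `all_of`, and `any_of` and the event names are hypothetical, and the actual syntax LSF JobScheduler uses for event expressions is not shown here.

```python
def event(name):
    """Condition that holds when the named event has occurred."""
    return lambda state: bool(state.get(name, False))


def all_of(*conditions):
    """Condition that holds only when every sub-condition holds."""
    return lambda state: all(c(state) for c in conditions)


def any_of(*conditions):
    """Condition that holds when at least one sub-condition holds."""
    return lambda state: any(c(state) for c in conditions)


# Hypothetical job: needs the input file, plus either a free tape drive
# or a completed preparation job.
eligible = all_of(
    event("input_file_created"),
    any_of(event("tape_drive_available"), event("prep_job_done")),
)

state = {"input_file_created": True, "prep_job_done": True}
is_eligible = eligible(state)
```

The job stays ineligible until the whole expression is satisfied, no matter which individual events have already occurred.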

Calendars

A calendar consists of a sequence of days on which the calendar is considered active. A job is scheduled when the calendar is active and a time of day specification is met. Calendars are defined and manipulated independently of jobs so that multiple jobs can share the same calendar. Each user can maintain a private set of calendars, reference calendars of other users, or use the calendars configured into the system. A calendar can be modified after it has been created. Any new jobs associated with it will automatically run according to the new definition.
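The relationship between calendars, time-of-day specifications, and jobs can be sketched in Python as follows. The `Calendar` class and `is_due` function are hypothetical illustrations, and treating "time of day is met" as "the current time is at or past the start time" is a simplifying assumption.

```python
from datetime import datetime, time


class Calendar:
    """Hypothetical calendar: a named rule deciding which days are active."""

    def __init__(self, name, is_active_day):
        self.name = name
        self.is_active_day = is_active_day   # callable: date -> bool

    def is_active(self, day):
        return self.is_active_day(day)


def is_due(calendar, start_time, now):
    """A job is due when its calendar is active today and its start time has passed."""
    return calendar.is_active(now.date()) and now.time() >= start_time


# One calendar shared by two hypothetical jobs with different start times.
weekdays = Calendar("weekdays", lambda d: d.weekday() < 5)
backup_start = time(1, 0)
report_start = time(8, 0)

monday_morning = datetime(2024, 1, 1, 9, 0)   # 1 January 2024 was a Monday
saturday_morning = datetime(2024, 1, 6, 9, 0)

backup_due = is_due(weekdays, backup_start, monday_morning)
report_due_on_saturday = is_due(weekdays, report_start, saturday_morning)
```

Because both jobs hold a reference to the same `weekdays` object rather than a copy, replacing its `is_active_day` rule changes the schedule of every job that uses it.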

Exceptions and Alarms

When managing critical jobs, it is important to ensure that the jobs run properly. When problems are detected during the processing of a job, some form of corrective action is needed. LSF JobScheduler allows you to associate each job with one or more exception handlers, each of which tells the system to watch for a particular type of error and take a specified action if it occurs. An exception condition represents a problem in processing a job. LSF JobScheduler can watch for several types of exception conditions during a job's life cycle.

An alarm specifies how a notification should be sent in the event of an exception.
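The division of labour between exception handlers and alarms might be pictured as in the Python sketch below. Everything here is hypothetical: the `Alarm` and `Job` classes, the `handle` function, and the condition names `abnormal_exit` and `overrun` are invented for illustration and do not come from the product.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    """Hypothetical alarm: records who is notified and with what message."""
    name: str
    recipients: list
    sent: list = field(default_factory=list)

    def notify(self, message):
        for recipient in self.recipients:
            self.sent.append((recipient, message))


@dataclass
class Job:
    """Hypothetical job carrying a map of exception condition -> action."""
    name: str
    handlers: dict = field(default_factory=dict)


def handle(job, condition):
    """Run the job's handler for condition; return False if none is defined."""
    action = job.handlers.get(condition)
    if action is None:
        return False
    action(job, condition)
    return True


ops_alarm = Alarm("ops", recipients=["operator"])
nightly = Job(
    "nightly_load",
    handlers={"abnormal_exit": lambda job, cond: ops_alarm.notify(f"{job.name}: {cond}")},
)

handled = handle(nightly, "abnormal_exit")
```

The handler decides what to do when a condition is detected, and the alarm only describes how the notification is delivered, so one alarm can serve many jobs.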

Queues

Production job scheduling provides efficient, timely execution of mission-critical jobs. When you submit a job, it is placed into a queue. The LSF JobScheduler system runs jobs from the queue at their scheduled time, once the appropriate resources are available. Jobs from a queue can be dispatched to any server host in your cluster that is configured to run jobs for the queue.

LSF JobScheduler allows you to define various types of services by configuring different queues. For each queue, you can configure a set of parameters that customize job scheduling policies, job execution behaviour, and resource allocation constraints.
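As a rough illustration of how a queue ties jobs to the hosts configured for it, consider the Python sketch below. The `Queue` class, its fields, and the use of plain integers as scheduled times are hypothetical simplifications; real queues carry many more parameters than are modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical queue: pending jobs plus the hosts allowed to run them."""
    name: str
    hosts: set
    pending: list = field(default_factory=list)   # (scheduled_time, job_name)

    def submit(self, job_name, scheduled_time):
        self.pending.append((scheduled_time, job_name))

    def dispatch(self, now, free_hosts):
        """Pair due jobs with free hosts configured for this queue."""
        usable = sorted(self.hosts & set(free_hosts))
        due = sorted(entry for entry in self.pending if entry[0] <= now)
        assignments = []
        for (scheduled_time, job_name), host in zip(due, usable):
            assignments.append((job_name, host))
            self.pending.remove((scheduled_time, job_name))
        return assignments


night = Queue("night", hosts={"hostA", "hostB"})
night.submit("backup", scheduled_time=10)
night.submit("report", scheduled_time=20)
night.submit("archive", scheduled_time=99)

# hostC is free but is not configured for this queue, so it is ignored.
assignments = night.dispatch(now=30, free_hosts=["hostB", "hostC", "hostA"])
```

Jobs whose scheduled time has not arrived, or for which no configured host is free, simply remain pending until a later dispatch pass.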

Inter-job Dependency

LSF JobScheduler allows you to make a job's execution depend on the completion, failure, or startup of other jobs. For example, you can configure the system to start several main processing jobs only after a data preparation job has completed, and then start the post-processing job after all the main processing jobs are done. These jobs do not have to run on the same host.
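The example above amounts to a dependency graph that is worked through in stages. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to compute those stages. It models only "start after completion" dependencies, the job names are hypothetical, and it is an illustration of the idea rather than a description of how LSF JobScheduler resolves dependencies internally.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must complete before it starts.
dependencies = {
    "prepare_data": set(),
    "main_a": {"prepare_data"},
    "main_b": {"prepare_data"},
    "post_process": {"main_a", "main_b"},
}


def run_in_waves(deps):
    """Return lists of jobs that can start together, in dependency order."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                       # raises CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose prerequisites are all done
        waves.append(ready)
        sorter.done(*ready)                 # mark them complete to release successors
    return waves


waves = run_in_waves(dependencies)
```

Here the main processing jobs land in the same wave, so they could run at the same time on different hosts, while the post-processing job waits for both.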

Command Set and GUI Tools

LSF JobScheduler provides a rich set of commands and GUI tools to define, monitor and manage the workload using any desktop as the system console. Typically, you define your calendars and jobs together with any interdependencies using the GUI tools xbcal and xbsub. Once these are set up, LSF JobScheduler will ensure that jobs are run according to the conditions and policies specified.

You can keep close track of your jobs with LSF JobScheduler using the GUI program xlsjs. As well as monitoring the status of jobs, the system allows you to perform various operations on them, including:

LSF JobScheduler also comes with a comprehensive set of tools for monitoring your cluster. These tools allow you to view your cluster of resources from any host in the cluster so that you know the dynamic resource usage of all your machines.

Job History

LSF JobScheduler maintains full history data for all jobs. The history information tells you what has happened to your jobs.





doc@platform.com

Copyright © 1994-1998 Platform Computing Corporation.
All rights reserved.