<?xml version="1.0" encoding="UTF-8"?>
Licensed to the Apache Software Foundation (ASF) under one or more contributor
license agreements. See the NOTICE.txt file distributed with this work for
additional information regarding copyright ownership. The ASF licenses this
file to you under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
<title>CAS PGE Basic Developer Guide</title>
<author email="">Brian Foster</author>
<author email="">Rishi Verma</author>
<section name="Introduction">
This is the developer guide for the Apache OODT Catalog and Archive Service (CAS)
Product Generation Executable (PGE) component, or CAS-PGE for short. This guide
explains the CAS-PGE architecture as well as its tailorable extension points.
<p>The remainder of this guide is separated into the following sections:</p>
<li><a href="#section1">Project Description</a></li>
<li><a href="#section2">Architecture</a></li>
<li><a href="#section3">Extension Points</a></li>
<a name="section1"/>
<section name="Project Description">
<p>To fully understand the CAS-PGE component, it helps to have a solid grasp
of the CAS Workflow component. If you need some background on CAS Workflow, please
see our <a href="../../workflow/development/developer.html">CAS Workflow Developer Guide</a>.
CAS Workflow is often used as part of a data processing system, where
workflows control the run order of different Product Generation
Executables (PGEs). In such systems, CAS-PGE can help wrap a PGE as part of a
CAS Workflow. One can think of a PGE as a piece of code that, given a set of inputs,
generates output files. CAS-PGE is designed to handle the most common actions
required to run PGEs, i.e., finding their input files, executing the PGE, and saving their output files.
CAS-PGE performs some of these actions by interacting with a second CAS component:
CAS File Manager. CAS File Manager can be part of this type of workflow-based data processing
system, where it manages data files and supports metadata-filtering queries across those
files for fast retrieval. In other words, CAS File Manager complements CAS-PGE by
cataloging the files involved in PGE operations.</p>
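<p>The three actions named above (finding input files, executing the PGE, and saving
output files) can be sketched as follows. This is a minimal illustration and not the
CAS-PGE implementation: SketchCatalog, InMemoryCatalog, and PgeLifecycleSketch are
hypothetical names, the catalog is an in-memory stand-in for a file catalog such as
CAS File Manager, and the "PGE" is a toy function that renames files.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a file catalog such as CAS File Manager; not the OODT API.
interface SketchCatalog {
    List<String> findInputs(String query);
    void ingest(String outputFile);
}

// In-memory catalog so the sketch runs without any services.
final class InMemoryCatalog implements SketchCatalog {
    final List<String> stored = new ArrayList<>();

    InMemoryCatalog(List<String> initial) {
        stored.addAll(initial);
    }

    @Override
    public List<String> findInputs(String query) {
        // Toy "query": match file names containing the given text.
        List<String> hits = new ArrayList<>();
        for (String name : stored) {
            if (name.contains(query)) {
                hits.add(name);
            }
        }
        return hits;
    }

    @Override
    public void ingest(String outputFile) {
        stored.add(outputFile);
    }
}

final class PgeLifecycleSketch {
    // Runs the three steps in order and returns the produced output files.
    static List<String> run(SketchCatalog catalog, String query,
                            Function<List<String>, List<String>> pge) {
        List<String> inputs = catalog.findInputs(query); // 1. find input files
        List<String> outputs = pge.apply(inputs);        // 2. execute the PGE
        for (String out : outputs) {
            catalog.ingest(out);                         // 3. save output files
        }
        return outputs;
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog(
                Arrays.asList("raw_001.dat", "raw_002.dat", "notes.txt"));
        // Toy PGE: produces one output name per input name.
        List<String> outputs = run(catalog, "raw_", inputs -> {
            List<String> produced = new ArrayList<>();
            for (String in : inputs) {
                produced.add(in.replace("raw_", "l1_"));
            }
            return produced;
        });
        System.out.println(outputs); // prints [l1_001.dat, l1_002.dat]
    }
}
```

<p>Passing the PGE in as a function keeps the three-step skeleton independent of any one
executable, which mirrors the wrapping role described above.</p>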
<p>In summary, CAS-PGE's role is to provide tools for
encapsulating PGEs, and it leverages other CAS
components in support of that goal.</p>
<a name="section2"/>
<section name="Architecture">
<a name="section3"/>
<section name="Extension Points">
<p>PGEs usually need to be told how
to run, what to run with (i.e. input files), where to place their
output files, and what to name them. CAS-PGE accomplishes this, among other tasks,
through customizable extension points.</p>
<p>The following is a description of the most common extension points:</p>
<li><b>SciPgeConfigFileWriter</b> - writes configuration files describing how a
PGE will run, which input files it will run with, and where the output will be placed</li>
<li><b>PcsMetFileWriter</b> - controls which metadata should be sent to the CAS File
Manager (with each output file) for ingestion</li>
<li><b>PGETaskInstance</b> - an extensible module which performs the most generic
and common actions required by typical PGEs. This module makes getting started with a
default PGE configuration simple.</li>
<li><b>PgeConfigBuilder</b> - builds a PgeConfig object, which has the ability to
control how a CAS-PGE will run</li>
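<p>The plug-in pattern behind these extension points can be sketched as below. The
interfaces are hypothetical analogues written for this illustration. ConfigWriterSketch,
MetWriterSketch, and ExtensionPointSketch are assumed names, and their method
signatures are not those of the real SciPgeConfigFileWriter, PcsMetFileWriter, or
PGETaskInstance classes.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical analogue of a config file writer: renders the text of a PGE config file.
interface ConfigWriterSketch {
    String write(Map<String, String> metadata);
}

// Hypothetical analogue of a met file writer: picks the metadata sent with an output file.
interface MetWriterSketch {
    Map<String, String> metadataFor(String outputFile, Map<String, String> metadata);
}

// Hypothetical analogue of a task instance that is handed its writers
// instead of hard-coding them.
final class ExtensionPointSketch {
    private final ConfigWriterSketch configWriter;
    private final MetWriterSketch metWriter;

    ExtensionPointSketch(ConfigWriterSketch configWriter, MetWriterSketch metWriter) {
        this.configWriter = configWriter;
        this.metWriter = metWriter;
    }

    String buildConfigFile(Map<String, String> metadata) {
        return configWriter.write(metadata);
    }

    Map<String, String> buildOutputMetadata(String outputFile, Map<String, String> metadata) {
        return metWriter.metadataFor(outputFile, metadata);
    }

    public static void main(String[] args) {
        // Illustrative metadata keys; these are not reserved CAS-PGE field names.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("InputFile", "raw_001.dat");
        metadata.put("OutputDir", "/tmp/out");

        ExtensionPointSketch task = new ExtensionPointSketch(
                // Custom config writer: a simple key=value file.
                md -> "input=" + md.get("InputFile") + "\noutput_dir=" + md.get("OutputDir") + "\n",
                // Custom met writer: forwards only the fields worth cataloging.
                (file, md) -> {
                    Map<String, String> met = new LinkedHashMap<>();
                    met.put("Filename", file);
                    met.put("InputFile", md.get("InputFile"));
                    return met;
                });

        System.out.print(task.buildConfigFile(metadata));
        System.out.println(task.buildOutputMetadata("l1_001.dat", metadata));
    }
}
```

<p>Swapping either lambda changes the generated config file or the cataloged metadata
without touching the task class, which is the point of an extension point.</p>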
<p>The relationship between these extension points and other CAS-PGE components
is shown in the figure below.</p>
<p><img src="../images/pge_instance_plugin_points.png"
alt="Extension Points"/></p>
<subsection name="Runtime Execution">
At runtime, CAS-PGE uses two mediums to configure how a
PGE will run: metadata and a
PgeConfig object. From these two pieces of information, CAS-PGE determines how
many configuration files to generate, which SciPgeConfigFileWriter(s) to use to create
them, which PcsMetFileWriter generates the metadata for each output file
for CAS File Manager ingestion, how to run the PGE, which CAS File
Manager to talk to, and so on. For the first medium (metadata), CAS-PGE expects a set of reserved metadata
fields that affect the way it runs (e.g. which CAS
File Manager to ingest to). For the second medium (PgeConfig), the PgeConfigBuilder builds up a
PgeConfig object, which also controls how CAS-PGE runs.
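<p>The interplay of the two mediums can be sketched as below, under stated assumptions.
RunConfigSketch and RuntimeConfigSketch are hypothetical names, the reserved key
"Sketch/FileManagerUrl" and the command "run_pge" are invented for illustration, the
URL is a placeholder, and deriving the config object from metadata is an assumption
of this sketch rather than documented CAS-PGE behavior.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical analogue of a PgeConfig: settings that shape a single run.
final class RunConfigSketch {
    final String command;
    final String outputDir;

    RunConfigSketch(String command, String outputDir) {
        this.command = command;
        this.outputDir = outputDir;
    }
}

final class RuntimeConfigSketch {
    // Invented reserved metadata key; the real reserved field names are not listed here.
    static final String FILE_MANAGER_URL_KEY = "Sketch/FileManagerUrl";

    // Hypothetical analogue of a config builder: derives the config object from metadata.
    static RunConfigSketch buildConfig(Map<String, String> metadata) {
        String outputDir = metadata.getOrDefault("OutputDir", "output");
        String command = "run_pge --out " + outputDir; // "run_pge" is a made-up executable
        return new RunConfigSketch(command, outputDir);
    }

    // Combines both mediums into a human-readable description of the run.
    static String describeRun(Map<String, String> metadata) {
        RunConfigSketch config = buildConfig(metadata);                              // medium 2
        String fileManager = metadata.getOrDefault(FILE_MANAGER_URL_KEY, "(none)");  // medium 1
        return "exec: " + config.command
                + "; ingest " + config.outputDir + " to " + fileManager;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("OutputDir", "/data/out");
        metadata.put(FILE_MANAGER_URL_KEY, "http://localhost:9000"); // placeholder URL
        System.out.println(describeRun(metadata));
        // prints exec: run_pge --out /data/out; ingest /data/out to http://localhost:9000
    }
}
```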