<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more contributor
license agreements. See the NOTICE.txt file distributed with this work for
additional information regarding copyright ownership. The ASF licenses this
file to you under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
-->
<document>
<properties>
<title>CAS File Manager Developer Guide</title>
<author email="Chris.Mattmann@jpl.nasa.gov">Chris Mattmann</author>
<author email="woollard@jpl.nasa.gov">Dave Woollard</author>
</properties>
<body>
<section name="Introduction">
<p>
This is the developer guide for the Apache OODT Catalog and Archive Service (CAS)
File Manager component, or File Manager for short. This guide primarily
explains the File Manager architecture and interfaces, including its
tailorable extension points. For information on installation, configuration,
and examples, please see our <a href="../user/basic.html">User Guides</a>.</p>
<p>The remainder of this guide is separated into the following sections:</p>
<ul>
<li><a href="#section1">Project Description</a></li>
<li><a href="#section2">Architecture</a></li>
<li><a href="#section3">Extension Points</a></li>
<li><a href="#section4">Current Extension Point Implementations</a></li>
</ul>
</section>
<a name="section1"/>
<section name="Project Description">
<p>The File Manager component is responsible for tracking, ingesting and moving
file data and metadata between a client system and a server system. The File
Manager is an extensible software component that provides an XML-RPC external
interface, and a fully tailorable Java-based API for file management.</p>
</section>
<a name="section2"/>
<section name="Architecture">
<p>In this section, we will describe the architecture of the File Manager,
including its constituent components, object model, and key capabilities.</p>
<subsection name="Components">
<p>The major components of the File Manager are the Client and Server, the
Repository Manager, the Catalog, the Validation Layer, the Versioner, and
the Transferer. The relationships between these components are shown in
the diagram below:</p>
<p><img src="../images/fm_extension_points.png" alt="File Manager Architecture"/></p>
<p>The File Manager Server contains both a Repository Manager that manages
products (and the products' location in the archive as specified by the
Versioner), and a Catalog that validates metadata via the Validation Layer.
Transfer of data products from the Client to the Server is the domain of the
Transferer and can be initiated at either the Client or the Server.</p>
</subsection>
<subsection name="Object Model">
<p>The critical objects managed by the File Manager include:</p>
<ul>
<li><strong>Products</strong> - Collections of one or more files, and their
associated Metadata.</li>
<li><strong>Metadata</strong> - A map from each key to one or more values of
descriptive information about a Product. See
<a href="../../metadata/">CAS-Metadata</a> for more
information on Metadata.</li>
<li><strong>Reference</strong> - A pointer to a Product file's (or files')
original location, and to its final resting location within the archive
constructed by the File Manager.</li>
<li><strong>Product Type</strong> - Descriptive information about a Product
that includes what type of file URI generation scheme to use, the root
repository location for a particular Product, and a description of the
Product.</li>
<li><strong>Element</strong> - A singular Metadata element, such as "Author",
or "Creator". Elements may have additional metadata, in the form of the
associated definition and even a corresponding
<a href="http://dublincore.org/">Dublin Core</a> attribute. See
<a href="../../metadata/">CAS-Metadata</a> for more
information on Metadata Elements.</li>
<li><strong>Versioner</strong> - A URI generation scheme for Product Types
that defines the location within the archive (built by the File Manager)
where a file belonging to a Product (that belongs to the associated Product
Type) should be placed.</li>
</ul>
<p>Each Product contains one or more References and one Metadata object. Each
Product is a member of a single Product Type. The Metadata collected for each
Product is defined by a mapping from its Product Type to one or more Elements.
Each Product Type has an associated Versioner. These relationships are shown
in the figure below.</p>
<img src="../images/fm_object_model.png" alt="File Manager Object Model"/>
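<p>The relationships above can be sketched in Java as follows. This is a
minimal illustration only: the class and field names are simplified,
hypothetical stand-ins chosen for this example, not the File Manager's actual
classes, and Elements are left out for brevity.</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-ins for the object model described above.
// These are NOT the File Manager's actual classes.
public class ObjectModelSketch {

    /** Policy for a kind of Product: repository root, Versioner, description. */
    static final class ProductType {
        final String name;
        final String repositoryRoot;
        final String versionerName;
        final String description;

        ProductType(String name, String repositoryRoot, String versionerName, String description) {
            this.name = name;
            this.repositoryRoot = repositoryRoot;
            this.versionerName = versionerName;
            this.description = description;
        }
    }

    /** A file's original location and its final location in the archive. */
    static final class Reference {
        final String originalLocation;
        final String archiveLocation;

        Reference(String originalLocation, String archiveLocation) {
            this.originalLocation = originalLocation;
            this.archiveLocation = archiveLocation;
        }
    }

    /** Each key maps to one or more values. */
    static final class Metadata {
        private final Map<String, List<String>> values = new LinkedHashMap<>();

        void add(String key, String value) {
            values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        List<String> getAll(String key) {
            List<String> found = values.get(key);
            return found == null
                    ? Collections.<String>emptyList()
                    : Collections.unmodifiableList(found);
        }
    }

    /** One Product Type, one Metadata object, one or more References. */
    static final class Product {
        final String name;
        final ProductType type;
        final List<Reference> references = new ArrayList<>();
        final Metadata metadata = new Metadata();

        Product(String name, ProductType type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Builds one example Product with a single Reference and two Author values. */
    static Product example() {
        ProductType type = new ProductType("GenericFile", "/archive", "ExampleVersioner", "Any file");
        Product product = new Product("report-2024", type);
        product.references.add(
                new Reference("/staging/report.pdf", "/archive/report-2024/report.pdf"));
        product.metadata.add("Author", "Jane Doe");
        product.metadata.add("Author", "John Roe");
        return product;
    }

    public static void main(String[] args) {
        Product p = example();
        System.out.println(p.name + " is a " + p.type.name + " with "
                + p.references.size() + " reference(s), authors "
                + p.metadata.getAll("Author"));
    }
}
```
]]></source>
<p>Keeping the repository root and Versioner on the Product Type, rather than
on each Product, mirrors the description above: policy lives with the type,
while instance data lives with the Product.</p>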
</subsection>
<subsection name="Key Capabilities">
<p>The File Manager has been designed with a number of key capabilities in mind.
These capabilities include:</p>
<p><strong>Easy management of different types of Products.</strong> The
Repository Manager extension point is responsible for managing Product Types,
and their associated information. Management of Product Types includes adding
new types, deleting and updating existing types, and retrieving Product Type
Objects, by their ID or by their name.</p>
<p><strong>Support for different kinds of back end catalogs.</strong> The
Catalog extension point allows Product instance metadata and file location
information to be stored in different types of back end data stores quite
easily. Existing implementations of the Catalog interface include a JDBC based
back end database, along with a flat-file index powered by
<a href="http://lucene.apache.org/">Lucene.</a></p>
<p><strong>Management of Product instance information.</strong> Management
includes adding, deleting and updating product instance information, including
file locations (References), along with Product Metadata. It also includes
retrieving Metadata and References associated with existing Products as well
as obtaining the Products themselves.</p>
<p><strong>Element management for Metadata.</strong> The File Manager's
Validation Layer extension point allows for the management of Element policy
information in different types of back end stores. For instance, Element policy
could be stored in XML files, a Database, or a Metadata Registry.</p>
<p><strong>Data transfer mechanism interface.</strong> By having
an extension point for Data Transfer, the File Manager can support different
Data Transfer protocols, both local and remote.</p>
<p><strong>Advanced support for File Repository layouts.</strong>
The Versioner extension point allows for different File Repository layouts
based on Product Types.</p>
<p><strong>Support for multiple Product structures.</strong> The File Manager
Client allows Products to be either Flat or Hierarchical. Flat Products
are collections of individual files that are aggregated together to make a
Product. Hierarchical Products contain collections of directories,
subdirectories, and files.</p>
<p><strong>Design for scalability.</strong> The File Manager uses the popular
client-server paradigm, allowing new File Manager servers to be instantiated,
as needed, without affecting the File Manager clients, and vice-versa.</p>
<p><strong>Standard communication protocols.</strong> The File Manager uses
XML-RPC as its main external interface between the File Manager client and
server. XML-RPC, the little brother of SOAP, is fast, extensible, and uses
the underlying HTTP protocol for data transfer.</p>
<p><strong>RSS-based Product syndication.</strong> The File Manager web
interface allows for the RSS-based syndication of Product feeds based on
Product Type.</p>
<p><strong>Data transfer status tracking.</strong> The File Manager tracks
all current Product and File transfers and even publishes an RSS-feed of
existing transfers.</p>
<p>This capability set is not exhaustive, and is meant to give the user a
<i>feel</i> for what general features are provided by the File Manager. Most
likely the user will find that the File Manager provides many other
capabilities besides those described here.</p>
</subsection>
</section>
<a name="section3"/>
<section name="Extension Points">
<p>We have constructed the File Manager making use of the <i>factory
method pattern</i> to provide multiple extension points for the File Manager. An
extension point is an interface within the File Manager that can have many
implementations. This is particularly useful when it comes to software
component configuration because it allows different implementations of
an existing interface to be selected at deployment time.</p>
<div class="info">The factory method pattern is a creational pattern common to
object oriented design. Each File Manager extension point involves the
implementation of two interfaces: an <i>extension factory</i> and an
<i>extension</i> implementation. At run-time, the File Manager loads a
properties file that specifies a factory class to use during extension point
instantiation. For example, the File Manager may communicate with a
database-based Catalog and an XML-based Element Store (called a Validation
Layer), or it may use a Lucene-based Catalog and a database-based Validation
Layer.</div>
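<p>The mechanism described above can be sketched as follows. This is a
minimal illustration under stated assumptions: the <code>Catalog</code> and
<code>CatalogFactory</code> interfaces shown here are reduced to a single
method each, and the property key <code>example.catalog.factory</code> is a
made-up name for this example, not necessarily the key the File Manager
reads.</p>
<source><![CDATA[
```java
import java.util.Properties;

// A sketch of the factory method pattern driven by a properties file.
// Interface names and the property key are hypothetical.
public class ExtensionFactorySketch {

    /** The extension: what the rest of the system programs against. */
    interface Catalog {
        String describe();
    }

    /** The extension factory: knows how to build one kind of Catalog. */
    interface CatalogFactory {
        Catalog createCatalog();
    }

    /** One possible implementation, selected purely by class name. */
    public static class InMemoryCatalogFactory implements CatalogFactory {
        @Override
        public Catalog createCatalog() {
            return () -> "in-memory catalog";
        }
    }

    /** Reads the factory class name from the properties and instantiates it reflectively. */
    static Catalog loadCatalog(Properties props) {
        String factoryClass = props.getProperty("example.catalog.factory");
        if (factoryClass == null) {
            throw new IllegalStateException("No catalog factory configured");
        }
        try {
            CatalogFactory factory = (CatalogFactory) Class.forName(factoryClass)
                    .getDeclaredConstructor()
                    .newInstance();
            return factory.createCatalog();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create factory " + factoryClass, e);
        }
    }

    public static void main(String[] args) {
        // In a deployment this would be loaded from a file; here it is set in code.
        Properties props = new Properties();
        props.setProperty("example.catalog.factory", InMemoryCatalogFactory.class.getName());
        System.out.println(loadCatalog(props).describe());
    }
}
```
]]></source>
<p>Because the calling code only depends on the two interfaces, swapping the
implementation is a matter of changing one property value at deployment
time.</p>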
<p>Using extension points, it is fairly simple to support many different types
of what are typically referred to as "plug-in architectures." Each of the core
extension points for the File Manager is described below:</p>
<table>
<tr>
<td>Catalog</td>
<td>The Catalog extension point is responsible for storing all the
instance data for Products, Metadata, and for file References.
Additionally, the Catalog provides a query capability for Products.
</td>
</tr>
<tr>
<td>Data Transfer</td>
<td>The Data Transfer extension point allows for the movement of a
Product to and from the archive managed by the File Manager component.
Different protocols for Data Transfer may include local (disk-based)
copy, or remote XML-RPC based transfer across networked machines.
</td>
</tr>
<tr>
<td>Repository Manager</td>
<td>The Repository Manager extension point provides a means for
managing all of the policy information (i.e., the Product Types and
their associated information) for Products managed by the File Manager.
</td>
</tr>
<tr>
<td>Validation Layer</td>
<td>The Validation Layer extension point allows for the querying of
element definitions associated with a particular Product Type. The
extension point also maps Product Type to Elements.
</td>
</tr>
<tr>
<td>Versioning</td>
<td>The Versioning extension point allows for the definition of
different URI generation schemes that define the final resting location
of files for a particular Product.
</td>
</tr>
<tr>
<td>System</td>
<td>The extension point that provides the external interface to the
File Manager services. This includes the File Manager server interface,
as well as the associated File Manager client interface, that
communicates with the server.
</td>
</tr>
</table>
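<p>To make the Versioning extension point more concrete, the sketch below
shows two URI generation schemes behind one interface. The
<code>Versioner</code> interface here, its <code>archiveLocations</code>
method, and both implementations are hypothetical and much simpler than a
real Versioner; they only illustrate how different schemes yield different
archive layouts for the same files.</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of two archive layout schemes behind one interface.
public class VersionerSketch {

    /** Maps original file locations to final locations inside the archive. */
    interface Versioner {
        List<String> archiveLocations(String repositoryRoot, String productName,
                                      List<String> originalFiles);
    }

    /** Layout: root/productName/fileName */
    static class FlatVersioner implements Versioner {
        @Override
        public List<String> archiveLocations(String repositoryRoot, String productName,
                                             List<String> originalFiles) {
            List<String> result = new ArrayList<>();
            for (String file : originalFiles) {
                result.add(repositoryRoot + "/" + productName + "/" + fileName(file));
            }
            return result;
        }
    }

    /** Layout: root/productName/version/fileName */
    static class VersionedDirectoryVersioner implements Versioner {
        private final String version;

        VersionedDirectoryVersioner(String version) {
            this.version = version;
        }

        @Override
        public List<String> archiveLocations(String repositoryRoot, String productName,
                                             List<String> originalFiles) {
            List<String> result = new ArrayList<>();
            for (String file : originalFiles) {
                result.add(repositoryRoot + "/" + productName + "/" + version
                        + "/" + fileName(file));
            }
            return result;
        }
    }

    /** Returns the part of the path after the last '/', or the whole path if none. */
    static String fileName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("/staging/a.dat", "/staging/b.dat");
        System.out.println(new FlatVersioner()
                .archiveLocations("/archive", "granule-1", files));
        System.out.println(new VersionedDirectoryVersioner("v2")
                .archiveLocations("/archive", "granule-1", files));
    }
}
```
]]></source>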
</section>
<a name="section4"/>
<section name="Current Extension Point Implementations">
<p>There are at least two implementations of all of the aforementioned
extension points for the File Manager. The implementations of the Catalog,
Data Transfer, Repository Manager, Validation Layer, and System extension
points are listed in this section.</p>
<subsection name="Catalog">
<ul>
<li><strong>Data Source based Catalog.</strong> An implementation of
the Catalog extension point interface that uses a JDBC accessible database
backend.</li>
<li><strong>Lucene based Catalog.</strong> An implementation of the
Catalog extension point interface that uses the
<a href="http://lucene.apache.org/">Lucene</a> free text index system to
store Product instance information.</li>
</ul>
</subsection>
<subsection name="Data Transfer">
<ul>
<li><strong>Local Data Transfer.</strong> An implementation of the Data
Transfer interface that uses Apache's
<a href="http://jakarta.apache.org/commons-io/">commons-io</a> to perform
local, disk based filesystem data transfer. This implementation also
supports locally accessible Network File System (NFS) disks.</li>
<li><strong>Remote Data Transfer.</strong> An implementation of the Data
Transfer interface that uses the XML-RPC File Manager client to transfer
files to a remote XML-RPC File Manager server.</li>
<li><strong>InPlace Data Transfer.</strong> An implementation of the Data
Transfer interface that avoids transferring any products. This can be used
in situations where metadata about a particular product should be
recorded, but no physical transfer needs to occur.</li>
</ul>
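<p>The difference between a local transfer and an in-place transfer can be
sketched as below. This is an illustration only: the
<code>DataTransfer</code> interface shown is hypothetical, and the copy uses
the JDK's <code>java.nio.file.Files</code> rather than commons-io so that the
example has no external dependencies.</p>
<source><![CDATA[
```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch contrasting a local copy with an in-place (no-op) transfer.
public class DataTransferSketch {

    /** Moves one file from its original location to its archive location. */
    interface DataTransfer {
        void transfer(Path source, Path destination) throws IOException;
    }

    /** Copies the file on the local (or locally mounted) filesystem. */
    static class LocalTransfer implements DataTransfer {
        @Override
        public void transfer(Path source, Path destination) throws IOException {
            Files.createDirectories(destination.getParent());
            Files.copy(source, destination, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Leaves the file where it is; only the metadata would be recorded. */
    static class InPlaceTransfer implements DataTransfer {
        @Override
        public void transfer(Path source, Path destination) {
            // Intentionally empty: no bytes are moved.
        }
    }

    /** Runs both transfers against temporary directories and reports what exists. */
    static String runDemo() {
        try {
            Path staging = Files.createTempDirectory("staging");
            Path archive = Files.createTempDirectory("archive");
            Path source = staging.resolve("data.txt");
            Files.write(source, "hello".getBytes(StandardCharsets.UTF_8));

            Path localTarget = archive.resolve("local").resolve("data.txt");
            new LocalTransfer().transfer(source, localTarget);

            Path inPlaceTarget = archive.resolve("inplace").resolve("data.txt");
            new InPlaceTransfer().transfer(source, inPlaceTarget);

            return "local=" + Files.exists(localTarget)
                    + ", inPlace=" + Files.exists(inPlaceTarget);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```
]]></source>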
</subsection>
<subsection name="Repository Manager">
<ul>
<li><strong>Data Source based Repository Manager.</strong> An
implementation of the Repository Manager extension point that stores
Product Type policy information in a JDBC accessible database.</li>
<li><strong>XML based Repository Manager.</strong> An implementation of
the Repository Manager extension point that stores Product Type policy
information in an XML file called <code>product-types.xml</code>.</li>
</ul>
</subsection>
<subsection name="Validation Layer">
<ul>
<li><strong>Data Source based Validation Layer.</strong> An
implementation of the Validation Layer extension point that stores
Element policy information in a JDBC accessible database.</li>
<li><strong>XML based Validation Layer.</strong> An implementation of
the Validation Layer extension point that stores Element policy
information in two XML files called <code>elements.xml</code> and
<code>product-type-element-map.xml</code>.</li>
</ul>
</subsection>
<subsection name="System (File Manager client and File Manager server)">
<ul>
<li><strong>XML-RPC based File Manager server.</strong> An
implementation of the external server interface for the File Manager that
uses XML-RPC as the transportation medium.</li>
<li><strong>XML-RPC based File Manager client.</strong> An
implementation of the client interface for the XML-RPC File Manager
server that uses XML-RPC as the transportation medium.</li>
</ul>
</subsection>
</section>
<section name="Use Cases">
<p>
The File Manager was built to support the capabilities outlined in the
Architecture section. In particular, there were several use cases that we
wanted to support. One of them, file ingestion, is described below.
</p>
<img src="../images/fm_use_case1.png" alt="File Manager Ingest Use Case"/>
<p>The red numbers in the figure above mark the sequence of steps, and the
interactions between the File Manager extension points, that together perform
file ingestion. In Step 1, a File Manager client is invoked for the ingest
operation and sends the Metadata and References for the Product to be ingested
to the File Manager server's System Interface extension point. The System
Interface uses the Product Type policy made available by the Repository
Manager to determine whether or not the Product should be transferred, where
its root repository path should be, and so on. The System Interface then
catalogs the file References and Metadata using the Catalog extension point.
During cataloging, the Catalog extension point uses the Validation Layer to
determine which Elements should be extracted for the Product, based upon its
Product Type. After that, Data Transfer is initiated at either the client or
the server end. The first step of Data Transfer is to use the Product's
associated Versioner to generate the final file References. Once the final
file References have been determined, the file data is transferred by the
server or by the client, using the Data Transfer extension point.</p>
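<p>The ingest sequence described above can be sketched in code as follows.
Every interface and method name in this sketch is hypothetical and reduced to
the bare minimum. In particular, the Validation Layer lookup is inlined into
the <code>ingest</code> method for brevity, whereas in the description above
it is the Catalog that consults the Validation Layer.</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the ingest flow across the extension points.
public class IngestFlowSketch {

    interface RepositoryManager { String repositoryRootFor(String productType); }
    interface ValidationLayer { List<String> elementsFor(String productType); }
    interface Catalog {
        void store(String productName, Map<String, String> metadata, List<String> references);
    }
    interface Versioner {
        String archiveLocation(String repositoryRoot, String productName, String originalFile);
    }
    interface DataTransfer { void transfer(String from, String to); }

    /** Runs the ingest steps in order and returns the final file References. */
    static List<String> ingest(String productName, String productType,
                               Map<String, String> metadata, List<String> originalFiles,
                               RepositoryManager repositoryManager,
                               ValidationLayer validationLayer,
                               Catalog catalog, Versioner versioner,
                               DataTransfer dataTransfer) {
        // Look up Product Type policy (here: just the repository root).
        String root = repositoryManager.repositoryRootFor(productType);

        // Keep only the Elements mapped to this Product Type.
        Map<String, String> validated = new LinkedHashMap<>();
        for (String element : validationLayer.elementsFor(productType)) {
            if (metadata.containsKey(element)) {
                validated.put(element, metadata.get(element));
            }
        }

        // Catalog the References and the validated Metadata.
        catalog.store(productName, validated, originalFiles);

        // Generate final References with the Versioner, then transfer each file.
        List<String> finalReferences = new ArrayList<>();
        for (String file : originalFiles) {
            String target = versioner.archiveLocation(root, productName, file);
            dataTransfer.transfer(file, target);
            finalReferences.add(target);
        }
        return finalReferences;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Author", "Jane Doe");
        metadata.put("Unmapped", "not part of this Product Type");

        List<String> references = ingest("granule-1", "GenericFile", metadata,
                Arrays.asList("/staging/a.dat"),
                type -> "/archive",
                type -> Arrays.asList("Author"),
                (name, md, files) -> log.add("cataloged " + name + " " + md.keySet()),
                (root, name, file) -> root + "/" + name + file.substring(file.lastIndexOf('/')),
                (from, to) -> log.add("transfer " + from + " -> " + to));

        System.out.println(log);
        System.out.println(references);
    }
}
```
]]></source>
<p>Passing each extension point in as an interface keeps the flow itself
independent of any particular Catalog, Versioner, or Data Transfer
implementation.</p>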
</section>
<section name="Conclusion">
<p>The aim of this document is to provide information relevant to developers
about the CAS File Manager. Specifically, this document has described the File
Manager's architecture, including its constituent components, object model and
key capabilities. Additionally, this document provides an overview of the
current implementations of the File Manager's extension points.</p>
<p>In the <a href="../user/basic.html">Basic User Guide</a> and
<a href="../user/advanced.html">Advanced User Guide</a>, we will cover topics
like installation, configuration, and example uses as well as advanced topics
like scaling and other tips and tricks.</p>
</section>
</body>
</document>