<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<document>
<properties>
<title>Serving Large Products</title>
<author email="Sean.Kelly@jpl.nasa.gov">Sean Kelly</author>
</properties>
<body>
<section name="Serving Large Products">
<p>In the <a href="../qh/">last tutorial</a>, we created a query
handler and "installed" it in a product server. We could
query it for products (mathematical constants) using the
XMLQuery's postfix boolean stacks. The handler would return
results by embedding them in the returned XMLQuery. Now we'll
return larger products that live outside of the XMLQuery.
</p>
</section>
<section name="What's Large?">
<p>There's a <a
href="http://www.worldslargestthings.com/california/clam.htm">giant
clam</a> at Pismo Beach, a <a
href="http://www.worldslargestthings.com/kansas/cawkercity.htm">giant
ball of twine</a> in Kansas, and for those who drive SUVs, a
<a
href="http://www.worldslargestthings.com/missouri/gaspump.htm">giant
gas pump</a>. For the OODT framework, large is similarly hard to define.
</p>
<p>One of the original architects of the OODT framework thought
that putting a product result in with the query meant that
you'd never lose the association between a product and the query
that generated it. I'm not sure I see the value in that, but
regardless, it posed a practical challenge: an
<code>XMLQuery</code> object in memory with one or two large
results in it can exhaust the Java virtual machine's available
memory.
</p>
<p>It's even worse when the XMLQuery is expressed as a
textual XML document. In this case, a binary product must be
encoded in a text format (we use <a
href="ftp://ftp.rfc-editor.org/in-notes/rfc2045.txt">Base64</a>),
making the XMLQuery in XML format even larger than as a Java
object. Moreover, those XML documents must be parsed at some
time to reconstitute them as Java objects. We use a DOM-based
parser, which holds the entire document in memory. Naturally,
things tend to explode at this rate.
</p>
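<p>To get a feel for the text-encoding overhead alone, here is a
minimal sketch. It assumes the JDK's <code>java.util.Base64</code>
class (Java 8 and later), which is not necessarily the encoder the
OODT framework uses, and the <code>Base64Growth</code> class name is
hypothetical. Base64 emits four characters for every three bytes, so
the text form is about a third larger before any XML markup or MIME
line breaks are added.
</p>

```java
import java.util.Base64;

// Sketch only: measures how much a binary product grows as Base64 text.
final class Base64Growth {
    /** Returns the number of Base64 characters needed for byteCount bytes. */
    static int encodedLength(int byteCount) {
        byte[] product = new byte[byteCount];   // stand-in for binary product data
        // The basic encoder adds padding but no line breaks.
        return Base64.getEncoder().encodeToString(product).length();
    }

    public static void main(String[] args) {
        int raw = 3000000;
        System.out.println(raw + " bytes become "
            + encodedLength(raw) + " Base64 characters");
    }
}
```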
<p>There is a way out of the quagmire, though. Instead of
writing a <code>QueryHandler</code>, write a
<code>LargeProductQueryHandler</code>. A
<code>QueryHandler</code> puts <code>Result</code> objects
into the <code>XMLQuery</code> which hold the entire product.
A <code>LargeProductQueryHandler</code> puts
<code>LargeResult</code> objects which hold <em>a reference to
the product</em>.
</p>
</section>
<section name="Large Handlers and Large Results">
<p>The OODT framework provides an extension to the
<code>QueryHandler</code> interface called
<code>jpl.eda.product.LargeProductQueryHandler</code>. This
interface adds two methods that you must implement:
</p>
<ul>
<li><code>retrieveChunk</code>. This method returns a byte
array representing a chunk of the product. The OODT
framework calls this method repeatedly to gather chunks of
the product for the product client. It takes a <em>product
ID</em> (a string) that identifies which product is being
retrieved. It also takes a byte offset into the product
data and a size of the byte chunk to return. You return the
matching chunk.
</li>
<li><code>close</code>. This method is called by the OODT
framework to tell the query handler it's done getting a
product. It takes a <em>product ID</em> that tells which
product is no longer being retrieved. You use this method
to perform any cleanup necessary.
</li>
</ul>
<p>Because it extends the <code>QueryHandler</code> interface,
you still have to implement the <code>query</code> method.
However, as a <code>LargeProductQueryHandler</code>, you can
add <code>LargeResult</code> objects to the
<code>XMLQuery</code> passed in. <code>LargeResult</code>s
identify the <em>product ID</em> (string) that the OODT
framework will later use when it calls
<code>retrieveChunk</code> and <code>close</code>.
</p>
<p>For example, suppose you're serving large images by
generating them from various other data sources:
</p>
<ol>
<li>The <code>query</code> method would examine the user's
query, consult the various data sources, and generate the
image, storing it in a temporary file. It would also assign
a string <em>product ID</em> to this file, use that product
ID in a <code>LargeResult</code> object, add the
<code>LargeResult</code> to the <code>XMLQuery</code>, and
return the modified <code>XMLQuery</code>.
</li>
<li>Shortly afterward, the OODT framework will repeatedly call
the <code>retrieveChunk</code> method. This method would
check the <em>product ID</em> passed in and locate the
corresponding temporary file generated earlier by the
<code>query</code> method. It would index into the file by
the offset requested by the framework, read the number of
bytes requested by the framework, package that up into a
byte array, and return it. Eventually, the OODT framework
will have read the entire product this way.
</li>
<li>Lastly, the OODT framework will call the
<code>close</code> method. This method would check the
<em>product ID</em> and locate and delete the temporary
file.
</li>
</ol>
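<p>The three steps above can be sketched without any OODT classes.
Everything in this example is a hypothetical stand-in: the
<code>TempProductStore</code> class, its method names, and the choice
of numeric product IDs are assumptions made for illustration. A real
handler would do this work inside <code>query</code>,
<code>retrieveChunk</code>, and <code>close</code>.
</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical bookkeeping for generated products kept in temporary files.
final class TempProductStore {
    private final Path dir;
    private final AtomicLong nextId = new AtomicLong();

    TempProductStore() {
        try {
            dir = Files.createTempDirectory("products");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Product IDs are decimal numbers, so parsing rejects IDs that are
    // not numbers (file paths, for example) with a NumberFormatException.
    private Path fileFor(String id) {
        return dir.resolve("product-" + Long.parseLong(id) + ".tmp");
    }

    /** Step 1 (query): save the generated product and return its product ID. */
    String store(byte[] product) {
        String id = Long.toString(nextId.getAndIncrement());
        try {
            Files.write(fileFor(id), product);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return id;
    }

    /** Step 2 (retrieveChunk): read up to length bytes starting at offset. */
    byte[] retrieveChunk(String id, long offset, int length) {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try (FileChannel ch = FileChannel.open(fileFor(id), StandardOpenOption.READ)) {
            while (buf.hasRemaining()) {
                int n = ch.read(buf, offset + buf.position());
                if (n == -1) break;             // reached the end of the file
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    /** Step 3 (close): the framework is done, so delete the temporary file. */
    void close(String id) {
        try {
            Files.deleteIfExists(fileFor(id));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

<p>Generating the product ID on the server, rather than accepting a
path from the client, means a later <code>retrieveChunk</code> call
can only name files this store created.
</p>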
<p>To put this into practice, let's create a
<code>LargeProductQueryHandler</code> that serves files out of
the product server's filesystem.
</p>
</section>
<section name="Writing the Handler">
<p>We'll develop a <code>FileHandler</code> that will serve
files out of the product server's filesystem. Providing
filesystem access through the OODT framework in this way is
probably not a very good idea (after all, product clients
could request copies of sensitive files), but for a
demonstration it'll do.
</p>
<p>Because files can be quite large, we'll use a
<code>LargeProductQueryHandler</code>. It will serve queries
of the form
</p>
<p><code>file = <var>path</var></code></p>
<p>where <var>path</var> is the full path of the file the user
wants. The handler will add <code>LargeResult</code>s to the
XMLQuery, and the <em>product ID</em> will simply be the
<var>path</var> of the requested file. The
<code>retrieveChunk</code> method will open the file with the
given product ID (which is just the path to the file) and
return a block of data out of it. The <code>close</code>
method won't need to do anything, since we're not creating
temporary files or making network connections or anything;
there's just nothing to clean up.
</p>
<subsection name="Getting the Path">
<p>First, let's create a utility method that takes the
<code>XMLQuery</code> and returns a <code>java.io.File</code>
that matches the requested file. Because the query takes the form
</p>
<p><code>file = <var>path</var></code></p>
<p>there should be three <code>QueryElement</code>s on the "where" stack:</p>
<ol>
<li>The zeroth (topmost) has role = <code>elemName</code>
and value = <code>file</code>.
</li>
<li>The first (middle) has role = <code>LITERAL</code> and
value = the <var>path</var> of the file the user wants.
</li>
<li>The last (bottom) has role = <code>RELOP</code> and
value = <code>EQ</code>.
</li>
</ol>
<p>We'll reject any other query by returning <code>null</code>
from this method. Further, if the file named by the
<var>path</var> doesn't exist, or if it's not a file (for
example, it's a directory or a socket), we'll return <code>null</code>.
</p>
<p>Here's the start of our <code>FileHandler.java</code>:</p>
<source>import java.io.File;
import java.util.List;
import jpl.eda.product.LargeProductQueryHandler;
import jpl.eda.xmlquery.QueryElement;
import jpl.eda.xmlquery.XMLQuery;
public class FileHandler
implements LargeProductQueryHandler {
private static File getFile(XMLQuery q) {
List stack = q.getWhereElementSet();
if (stack.size() != 3) return null;
QueryElement e = (QueryElement) stack.get(0);
if (!"elemName".equals(e.getRole())
|| !"file".equals(e.getValue()))
return null;
e = (QueryElement) stack.get(2);
if (!"RELOP".equals(e.getRole())
|| !"EQ".equals(e.getValue()))
return null;
e = (QueryElement) stack.get(1);
if (!"LITERAL".equals(e.getRole()))
return null;
File file = new File(e.getValue());
if (!file.isFile()) return null;
return file;
}
}</source>
</subsection>
<subsection name="Checking the MIME Type">
<p>Recall that the user can say what MIME types of products
are acceptable by specifying the preference list in the
XMLQuery. This lets a product server that serves, say,
video clips, convert them to <code>video/mpeg</code>
(MPEG-2), <code>video/mpeg4-generic</code> (MPEG-4),
<code>video/quicktime</code> (Apple Quicktime), or some
other format, in order to better serve its clients.
</p>
<p>Since our product server just serves <em>files of any
format</em>, we won't really bother with the list of
acceptable MIME types. After all, the
<code>/etc/passwd</code> file <em>could</em> be a JPEG
image on some systems. (Yes, we could go through the
extra step of determining the MIME type of a file by
looking at its extension or its contents, but this is an
OODT tutorial, not a something-else-tutorial!)
</p>
<p>However, we will honor the user's wishes by labeling the
result's MIME type based on what the user specifies in the
acceptable MIME type list. So, if the product client says
that <code>image/jpeg</code> is acceptable and the file is
<code>/etc/passwd</code>, we'll call
<code>/etc/passwd</code> a JPEG image. However, we won't
try to read the client's mind: if the user wants
<code>image/*</code>, then we'll just say it's a binary
file, <code>application/octet-stream</code>.
</p>
<p>Here's the code:</p>
<source>import java.util.Iterator;
...
public class FileHandler
implements LargeProductQueryHandler {
...
private static String getMimeType(XMLQuery q) {
for (Iterator i = q.getMimeAccept().iterator();
i.hasNext();) {
String t = (String) i.next();
if (t.indexOf('*') == -1) return t;
}
return "application/octet-stream";
}
}</source>
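<p>To see the selection rule in isolation, here is a stand-alone
restatement that takes plain strings instead of an
<code>XMLQuery</code>. The <code>MimeChooser</code> class and its
<code>choose</code> method are hypothetical names used only for this
sketch.
</p>

```java
// Sketch only: the same "first type without a wildcard" rule, minus OODT classes.
final class MimeChooser {
    static String choose(String... acceptable) {
        for (String type : acceptable) {
            if (type.indexOf('*') == -1) return type;   // first concrete type wins
        }
        return "application/octet-stream";              // only wildcards, or nothing, offered
    }

    public static void main(String[] args) {
        System.out.println(choose("image/*", "image/jpeg"));  // prints image/jpeg
        System.out.println(choose("image/*", "*/*"));         // prints application/octet-stream
    }
}
```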
</subsection>
<subsection name="Inserting the Result">
<p>Once we've got the file that the user wants and the MIME
type to call it, all we have to do is insert the
<code>LargeResult</code>. Remember that it's the
<code>LargeResult</code> that tells the OODT framework what
the <em>product ID</em> is for later
<code>retrieveChunk</code> and <code>close</code> calls.
The <em>product ID</em> is passed as the first argument to
the <code>LargeResult</code> constructor.
</p>
<p>We'll write a utility method to insert the <code>LargeResult</code>:</p>
<source>import java.io.IOException;
import java.util.Collections;
import jpl.eda.xmlquery.LargeResult;
...
public class FileHandler
implements LargeProductQueryHandler {
...
private static void insert(File file, String type,
XMLQuery q) throws IOException {
String id = file.getCanonicalPath();
long size = file.length();
LargeResult lr = new LargeResult(id, type,
/*profileID*/null, /*resourceID*/null,
/*headers*/Collections.EMPTY_LIST, size);
q.getResults().add(lr);
}
}</source>
</subsection>
<subsection name='Handling the Query'>
<p>With our three utility methods in hand, writing the
required <code>query</code> method is a piece of cake. Here
it is:
</p>
<source>import jpl.eda.product.ProductException;
...
public class FileHandler
implements LargeProductQueryHandler {
...
public XMLQuery query(XMLQuery q)
throws ProductException {
try {
File file = getFile(q);
if (file == null) return q;
String type = getMimeType(q);
insert(file, type, q);
return q;
} catch (IOException ex) {
throw new ProductException(ex);
}
}
}</source>
<p>The <code>query</code> method as defined by the
<code>QueryHandler</code> interface (and extended into the
<code>LargeProductQueryHandler</code> interface) is allowed
to throw only one kind of checked exception:
<code>ProductException</code>. So, in case the
<code>insert</code> method throws an
<code>IOException</code>, we transform it into a
<code>ProductException</code>.
</p>
<p>Now there are just two more required methods to implement,
<code>retrieveChunk</code> and <code>close</code>.
</p>
</subsection>
<subsection name='Blowing Chunks'>
<p>The OODT framework repeatedly calls the handler's
<code>retrieveChunk</code> method to get chunks of the
product, eventually getting the entire product (unless the
product client decides to abort the transfer). For our file
handler, <code>retrieveChunk</code> just has to
</p>
<ol>
<li>Make sure the file specified by the <em>product ID</em>
still exists (after all, it could be deleted at any time,
even before the first <code>retrieveChunk</code> got
called).
</li>
<li>Open the file.</li>
<li>Skip into the file by the requested offset.</li>
<li>Read the requested number of bytes out of the file.</li>
<li>Return those bytes.</li>
<li>Close the file.</li>
</ol>
<p>We'll write a quick little <code>skip</code> method to skip
into a file's input stream:
</p>
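<source>private static void skip(long offset,
InputStream in) throws IOException {
while (offset > 0) {
long skipped = in.skip(offset);
// skip() may return 0; fail rather than loop forever.
if (skipped == 0) throw new IOException(
"couldn't skip to requested offset");
offset -= skipped;
}
}</source>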
<p>And here's another little utility method to read a
specified number of bytes out of a file's input stream:
</p>
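<source>private static byte[] read(int length,
InputStream in) throws IOException {
byte[] buf = new byte[length];
int numRead;
int index = 0;
int toRead = length;
while (toRead > 0) {
numRead = in.read(buf, index, toRead);
// read() returns -1 at end of file; don't treat it as a count.
if (numRead == -1) throw new IOException(
"file ended before the chunk was complete");
index += numRead;
toRead -= numRead;
}
return buf;
}</source>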
<p>(By now, you're probably wondering why we didn't just use
<code>java.io.RandomAccessFile</code>; I'm wondering that
too!)</p>
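<p>For the curious, here is a rough sketch of that alternative. It is
not the code this tutorial goes on to use, and the
<code>RandomAccessChunks</code> class and <code>readChunk</code>
method are hypothetical names. Unlike the stream-based helpers, this
version returns a shorter array when the request runs past the end of
the file, which is a design choice of the sketch and not something the
framework is known to require.
</p>

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch only: chunk reading with seek() instead of skip-and-read loops.
final class RandomAccessChunks {
    /** Reads up to length bytes at offset; fewer (possibly zero) near end of file. */
    static byte[] readChunk(File file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long remaining = Math.max(0L, raf.length() - offset);
            byte[] buf = new byte[(int) Math.min((long) length, remaining)];
            raf.seek(offset);       // jump straight to the requested offset
            raf.readFully(buf);     // fills buf completely or throws EOFException
            return buf;
        }
    }
}
```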
<p>Finally, we can implement the required
<code>retrieveChunk</code> method:
</p>
<source>import java.io.BufferedInputStream;
import java.io.FileInputStream;
...
public class FileHandler
implements LargeProductQueryHandler {
...
public byte[] retrieveChunk(String id, long offset,
int length) throws ProductException {
BufferedInputStream in = null;
try {
File f = new File(id);
if (!f.isFile()) throw new ProductException(id
+ " isn't a file (anymore?)");
in = new BufferedInputStream(new FileInputStream(f));
skip(offset, in);
byte[] buf = read(length, in);
return buf;
} catch (IOException ex) {
throw new ProductException(ex);
} finally {
if (in != null) try {
in.close();
} catch (IOException ignore) {}
}
}
}</source>
</subsection>
<subsection name='Closing Up'>
<p>Because the OODT framework has no idea what data sources a
<code>LargeProductQueryHandler</code> will eventually
consult, what temporary files it may need to clean up, what
network sockets it might need to shut down, and so forth, it
needs some way to indicate to a query handler that it's
done calling <code>retrieveChunk</code> for a certain
<em>product ID</em>. The <code>close</code> method does this.
</p>
<p>In our example, <code>close</code> doesn't need to do
anything, but we are obligated to implement it:
</p>
<source>...
public class FileHandler
implements LargeProductQueryHandler {
...
public void close(String id) {}
}</source>
</subsection>
<subsection name='Complete Source Code'>
<p>Here's the complete source file, <code>FileHandler.java</code>:</p>
<source>import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import jpl.eda.product.LargeProductQueryHandler;
import jpl.eda.product.ProductException;
import jpl.eda.xmlquery.LargeResult;
import jpl.eda.xmlquery.QueryElement;
import jpl.eda.xmlquery.XMLQuery;
public class FileHandler
implements LargeProductQueryHandler {
private static File getFile(XMLQuery q) {
List stack = q.getWhereElementSet();
if (stack.size() != 3) return null;
QueryElement e = (QueryElement) stack.get(0);
if (!"elemName".equals(e.getRole())
|| !"file".equals(e.getValue()))
return null;
e = (QueryElement) stack.get(2);
if (!"RELOP".equals(e.getRole())
|| !"EQ".equals(e.getValue()))
return null;
e = (QueryElement) stack.get(1);
if (!"LITERAL".equals(e.getRole()))
return null;
File file = new File(e.getValue());
if (!file.isFile()) return null;
return file;
}
private static String getMimeType(XMLQuery q) {
for (Iterator i = q.getMimeAccept().iterator();
i.hasNext();) {
String t = (String) i.next();
if (t.indexOf('*') == -1) return t;
}
return "application/octet-stream";
}
private static void insert(File file, String type,
XMLQuery q) throws IOException {
String id = file.getCanonicalPath();
long size = file.length();
LargeResult lr = new LargeResult(id, type,
/*profileID*/null, /*resourceID*/null,
/*headers*/Collections.EMPTY_LIST, size);
q.getResults().add(lr);
}
public XMLQuery query(XMLQuery q)
throws ProductException {
try {
File file = getFile(q);
if (file == null) return q;
String type = getMimeType(q);
insert(file, type, q);
return q;
} catch (IOException ex) {
throw new ProductException(ex);
}
}
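private static void skip(long offset,
InputStream in) throws IOException {
while (offset > 0) {
long skipped = in.skip(offset);
// skip() may return 0; fail rather than loop forever.
if (skipped == 0) throw new IOException(
"couldn't skip to requested offset");
offset -= skipped;
}
}
private static byte[] read(int length,
InputStream in) throws IOException {
byte[] buf = new byte[length];
int numRead;
int index = 0;
int toRead = length;
while (toRead > 0) {
numRead = in.read(buf, index, toRead);
// read() returns -1 at end of file; don't treat it as a count.
if (numRead == -1) throw new IOException(
"file ended before the chunk was complete");
index += numRead;
toRead -= numRead;
}
return buf;
}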
public byte[] retrieveChunk(String id, long offset,
int length) throws ProductException {
BufferedInputStream in = null;
try {
File f = new File(id);
if (!f.isFile()) throw new ProductException(id
+ " isn't a file (anymore?)");
in = new BufferedInputStream(new FileInputStream(f));
skip(offset, in);
byte[] buf = read(length, in);
return buf;
} catch (IOException ex) {
throw new ProductException(ex);
} finally {
if (in != null) try {
in.close();
} catch (IOException ignore) {}
}
}
public void close(String id) {}
}</source>
</subsection>
</section>
<section name='Compiling the Code'>
<p>We'll compile this code using the J2SDK command-line tools,
but if you're more comfortable with some kind of Integrated
Development Environment (IDE), adjust as necessary.
</p>
<p>Let's go back again to the <code>$PS_HOME</code> directory we
made earlier; create the file
<code>$PS_HOME/src/FileHandler.java</code> with the contents
shown above. Then, compile and update the jar file as follows:
</p>
<source>% <b>javac -extdirs lib \
-d classes src/FileHandler.java</b>
% <b>ls -l classes</b>
total 8
-rw-r--r-- 1 kelly kelly 2524 25 Feb 15:46 ConstantHandler.class
-rw-r--r-- 1 kelly kelly 3163 26 Feb 16:15 FileHandler.class
% <b>jar -uf lib/my-handlers.jar \
-C classes FileHandler.class</b>
% <b>jar -tf lib/my-handlers.jar</b>
META-INF/
META-INF/MANIFEST.MF
ConstantHandler.class
FileHandler.class</source>
<p>We've now got a jar with the <code>ConstantHandler</code>
from the <a href="../qh/">last tutorial</a> and our new
<code>FileHandler</code>.
</p>
</section>
<section name='Specifying and Running the New Query Handler'>
<p>The <code>$PS_HOME/bin/ps</code> script already has a system
property specifying the <code>ConstantHandler</code>, so we
just need to add the <code>FileHandler</code> to that list.
</p>
<p>First, stop the product server by hitting CTRL+C (or your
interrupt key) in the window in which it's currently running.
Then, modify the <code>$PS_HOME/bin/ps</code> script to read
as follows:
</p>
<source>#!/bin/sh
exec java -Djava.ext.dirs=$PS_HOME/lib \
-Dhandlers=ConstantHandler,FileHandler \
jpl.eda.ExecServer \
jpl.eda.product.rmi.ProductServiceImpl \
urn:eda:rmi:MyProductService</source>
<p>Then start the server by running
<code>$PS_HOME/bin/ps</code>. If all goes well, the product
server will be ready to answer queries again, this time
passing each incoming <code>XMLQuery</code> to <em>two</em>
different query handlers.
</p>
<p>Edit the <code>$PS_HOME/bin/pc</code> script once more to
make sure the <code>-out</code> and not the <code>-xml</code>
command-line argument is being used. Let's try querying for a
file:
</p>
<source>% <b>$PS_HOME/bin/pc "file = /etc/passwd"</b>
nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
...</source>
<p>If you like, you can change the <code>-out</code> to
<code>-xml</code> again and examine the XML version. This
time, the product data isn't in the XMLQuery object.
</p>
</section>
<section name="What's the Difference?">
<p>On the client side, the interface to get product results in
<code>LargeResult</code>s versus regular <code>Result</code>s
is identical. The client calls <code>getInputStream</code> to
get a binary stream to read the product data.
</p>
<p>There is a speed penalty for large results. What
<code>Result.getInputStream</code> returns is an input stream
to product data already contained in the XMLQuery. It's a
stream to a buffer already in the client's address space, so
it's nice and fast.
</p>
<p><code>LargeResult</code> overrides the
<code>getInputStream</code> method to instead return an input
stream that repeatedly makes calls back to the product
server's <code>retrieveChunk</code> method. Since the product
is <em>not</em> already in the local address space of the
client, getting large products is a bit slower. To
compensate, the input stream actually starts a background
thread to start retrieving chunks of the product ahead of the
product client, up to a certain point (we don't want to run
out of memory again).
</p>
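<p>The idea of a stream that pulls chunks on demand can be sketched as
follows. This is not the framework's implementation: the
<code>ChunkFetcher</code> interface stands in for the remote call to
the server's <code>retrieveChunk</code> method, the
<code>ChunkedInputStream</code> class is a hypothetical name, and the
background read-ahead thread described above is left out to keep the
sketch short.
</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical stand-in for the remote call to the product server.
interface ChunkFetcher {
    byte[] retrieveChunk(String id, long offset, int length) throws IOException;
}

// Sketch only: fetches the next chunk whenever the current one is used up.
final class ChunkedInputStream extends InputStream {
    private final ChunkFetcher fetcher;
    private final String id;
    private final long size;        // total product size, known up front
    private final int chunkSize;
    private long offset;            // offset of the next chunk to request
    private byte[] chunk = new byte[0];
    private int pos;                // read position within the current chunk

    ChunkedInputStream(ChunkFetcher fetcher, String id, long size, int chunkSize) {
        this.fetcher = fetcher;
        this.id = id;
        this.size = size;
        this.chunkSize = chunkSize;
    }

    @Override
    public int read() throws IOException {
        if (pos == chunk.length) {                  // current chunk is exhausted
            if (offset >= size) return -1;          // whole product delivered
            int want = (int) Math.min((long) chunkSize, size - offset);
            chunk = fetcher.retrieveChunk(id, offset, want);
            if (chunk.length == 0)
                throw new IOException("empty chunk at offset " + offset);
            offset += chunk.length;
            pos = 0;
        }
        return Byte.toUnsignedInt(chunk[pos++]);    // InputStream contract: 0 to 255
    }

    public static void main(String[] args) throws IOException {
        byte[] product = "pretend this is a very large product".getBytes();
        // An in-memory "server" that slices the array instead of reading a file.
        ChunkFetcher server = (pid, off, len) ->
            Arrays.copyOfRange(product, (int) off, (int) off + len);
        InputStream in = new ChunkedInputStream(server, "demo", product.length, 8);
        StringBuilder text = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) text.append((char) b);
        System.out.println(text);
    }
}
```

<p>Each exhausted chunk costs one round trip to the fetcher, which is
where the speed penalty comes from.
</p>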
<p>On the server side, the difference is in programming
complexity. Creating a <code>LargeProductQueryHandler</code>
requires implementing three methods instead of just one. You
may have to clean up temporary files, close network ports, or
do other cleanup. You may even have to guard against clients
that present specially crafted product IDs that try to
circumvent access controls to products.
</p>
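<p>One possible guard, for handlers whose product IDs are file paths,
is to accept only IDs that resolve to a location under a designated
directory. The sketch below is an assumption-laden illustration: the
<code>ProductIdGuard</code> class, the <code>isUnder</code> method,
and the <code>/srv/products</code> directory are all hypothetical, and
the <code>FileHandler</code> in this tutorial performs no such check.
It also does not resolve symbolic links, and it assumes Unix-style
paths.
</p>

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: rejects product IDs that escape a designated root directory.
final class ProductIdGuard {
    /** Returns true if productId, resolved against root, stays inside root. */
    static boolean isUnder(Path root, String productId) {
        Path base = root.toAbsolutePath().normalize();
        // resolve() keeps absolute IDs as they are; normalize() collapses "..".
        Path target = base.resolve(productId).normalize();
        return target.startsWith(base);     // compares whole path elements
    }

    public static void main(String[] args) {
        Path root = Paths.get("/srv/products");     // hypothetical product directory
        System.out.println(isUnder(root, "images/a.jpg"));
        System.out.println(isUnder(root, "../../etc/passwd"));
        System.out.println(isUnder(root, "/etc/passwd"));
    }
}
```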
<p><code>LargeResult</code>s are more general, and will work for
any size product, from zero bytes on up. And you can even mix
and match: a <code>LargeProductQueryHandler</code> can add
regular <code>Result</code>s to an XMLQuery as well as
<code>LargeResult</code>s. You might program some logic that
returns regular <code>Result</code>s for products under a
certain size threshold, and <code>LargeResult</code>s for
anything bigger.
</p>
</section>
<section name='Conclusion'>
<p>In this tutorial, we implemented a
<code>LargeProductQueryHandler</code> that served large
products. In this case, large could mean zero bytes (empty
products) up to gargantuan numbers of bytes. This handler
queried for files in the product server's filesystem, which is
a bit insecure so you might want to terminate the product
server as soon as possible. We also learned what the
advantages and disadvantages are of regular product
results versus large product results, and that
<code>LargeProductQueryHandler</code>s can use
<code>LargeResult</code> objects in addition to regular
<code>Result</code> objects.
</p>
<p>If you've also completed the <a href="../ps">Your First
Product Service</a> tutorial and the <a
href="../qh/">Developing a Query Handler</a> tutorial, you
are now a master of the OODT Product Service.
Congratulations!
</p>
</section>
</body>
</document>