//////////////////////
* Copyright (c) 2010, Rickard Öberg. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
* either express or implied.
* See the License for the specific language governing permissions
* and limitations under the License.
//////////////////////
[[howto-use-io,Use I/O API]]
= Use I/O API =
NOTE: This article was written on Rickard Öberg's blog, 6 Nov 2010
The past week I've had to deal with a lot of data shuffling, both in raw form as bytes and strings, and as SPI and
domain level objects. What struck me is that it is notoriously hard to shuffle things from one place to another in a
way that is scalable, performant and handles errors correctly. And I had to do some things over and over again, like
reading strings from files.
So the thought occurred: there must be a general pattern to how this thing works, which can be extracted and put into a
library. "Reading lines from a text file" should only have to be done once, and then used in whatever scenario requires
it. Let's take a look at a typical example of reading from one file and writing to another to see if we can find out
what the possible pieces could be:
[source,java]
-------------
1: File source = new File( getClass().getResource( "/iotest.txt" ).getFile() );
1: File destination = File.createTempFile( "test", ".txt" );
1: destination.deleteOnExit();
2: BufferedReader reader = new BufferedReader(new FileReader(source));
3: long count = 0;
2: try
2: {
4: BufferedWriter writer = new BufferedWriter(new FileWriter(destination));
4: try
4: {
2: String line = null;
2: while ((line = reader.readLine()) != null)
2: {
3: count++;
4: writer.append( line ).append( '\n' );
2: }
4: writer.close();
4: } catch (IOException e)
4: {
4: writer.close();
4: destination.delete();
4: }
2: } finally
2: {
2: reader.close();
2: }
1: System.out.println(count);
-------------
As the numbers to the left indicate, I've identified four parts in this type of code that could be separated from
each other.
1) is the client code that initiates a transfer, and which has to know the input and the output.
2) is the code that reads lines from an input.
3) is helper code that I use to keep track of what's going on, and which I'd like to reuse no matter what kind of
transfer is being done.
4) receives the data and writes it down. In this code, if I wanted to implement batching on the read and write side I
could do so by changing the 2 and 4 parts to read/write multiple lines at a time.
== The API ==
If you want to reproduce what's explained in this tutorial, remember to depend on the Core Runtime artifact that depends
on Core API, Core SPI, Core Bootstrap and Core Functional & I/O APIs:
include::../../../../core/runtime/build/docs/buildinfo/artifact.txt[]
See the <<howto-depend-on-zest>> tutorial for details.
Once these parts were identified it was mostly just a matter of putting interfaces on these pieces, and making sure
they can be easily used in many different situations. The result is as follows.
To start with we have Input:
[snippet,java]
--------------
source=core/io/src/main/java/org/qi4j/io/Input.java
tag=input
--------------
Inputs, like Iterables, can be used over and over again to initiate transfers of data from one place to another, in
this case an Output. Since I want this to be generic, the type of the items being sent is T, which can be anything
(byte[], String, EntityState, MyDomainObject). I also want the sender and receiver of data to be able to throw their
own exceptions, and this is marked by declaring these as generic exception types. For example, the input may want to
throw SQLException and the output IOException, if anything goes wrong. This should be strongly typed, and both sender
and receiver must know when either side screws up, so that they can recover properly and close any resources they have
opened.
On the receiving side we then have Output:
[snippet,java]
--------------
source=core/io/src/main/java/org/qi4j/io/Output.java
tag=output
--------------
When receiveFrom is invoked by an Input, as a result of invoking transferTo on the Input, the Output should open
whatever resources it needs to write to, and then expect data to come from a Sender. Both the Input and Output must
have the same type T, so that they agree on what is being sent. We will see later how this can be handled if this is
not the case.
Next we have Sender:
[snippet,java]
--------------
source=core/io/src/main/java/org/qi4j/io/Sender.java
tag=sender
--------------
The Output invokes sendTo and passes in a Receiver that the Sender will use to send individual items. The sender at
this point can start transferring data of type T to the receiver, one at a time. The Receiver looks like this:
[snippet,java]
--------------
source=core/io/src/main/java/org/qi4j/io/Receiver.java
tag=receiver
--------------
When the receiver gets the individual items from the sender it can either immediately write them to its underlying
resource, or batch them up. Since the receiver will know when the transfer is done (sendTo returns) it can write the
remaining batches properly and close any resource it holds.
This simple pattern, with two interfaces on the sending side and two on the receiving side, gives us the potential to
do scalable, performant and fault-tolerant transfers of data.
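To make the interplay of the four roles concrete, here is a minimal, self-contained sketch of an in-memory transfer. The interface declarations are simplified stand-ins written for this illustration (the authoritative ones are in the snippets above), and TransferSketch, iterable and collect are hypothetical names that are not part of the library.

[source,java]
--------------
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TransferSketch {
    // Simplified stand-ins for the four roles (assumed shapes, not the library source).
    interface Receiver<T, RE extends Throwable> { void receive(T item) throws RE; }
    interface Sender<T, SE extends Throwable> {
        <RE extends Throwable> void sendTo(Receiver<? super T, RE> receiver) throws RE, SE;
    }
    interface Output<T, RE extends Throwable> {
        <SE extends Throwable> void receiveFrom(Sender<? extends T, SE> sender) throws RE, SE;
    }
    interface Input<T, SE extends Throwable> {
        <RE extends Throwable> void transferTo(Output<? super T, RE> output) throws SE, RE;
    }

    // Hypothetical Input: sends every element of an Iterable, one at a time.
    static <T> Input<T, RuntimeException> iterable(final Iterable<T> items) {
        return new Input<T, RuntimeException>() {
            public <RE extends Throwable> void transferTo(Output<? super T, RE> output) throws RE {
                // The Input hands a Sender to the Output; the Output decides when to pull.
                output.receiveFrom(new Sender<T, RuntimeException>() {
                    public <RE2 extends Throwable> void sendTo(Receiver<? super T, RE2> receiver)
                            throws RE2 {
                        for (T item : items) {
                            receiver.receive(item);
                        }
                    }
                });
            }
        };
    }

    // Hypothetical Output: appends every received item to a List.
    static <T> Output<T, RuntimeException> collect(final List<T> target) {
        return new Output<T, RuntimeException>() {
            public <SE extends Throwable> void receiveFrom(Sender<? extends T, SE> sender) throws SE {
                // Resources would be opened here; sendTo returning marks the end of the transfer.
                sender.sendTo(new Receiver<T, RuntimeException>() {
                    public void receive(T item) {
                        target.add(item);
                    }
                });
            }
        };
    }

    public static void main(String[] args) {
        List<String> copy = new ArrayList<String>();
        iterable(Arrays.asList("a", "b", "c")).transferTo(collect(copy));
        System.out.println(copy); // prints [a, b, c]
    }
}
--------------

Note how the call chain runs transferTo, then receiveFrom, then sendTo, then receive. Each side gets a stack frame wrapped around the whole transfer, which is what lets both of them clean up when the other side throws.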
== Standard Inputs and Outputs ==
Now that the above API defines the contract for sending and receiving data, I can create a couple of standard
inputs and outputs. Let's say, reading lines of text from a file, and writing lines of text to a file. These
implementations I can then put in static methods so they are easy to use. In the end, making a copy of a text file
looks like this:
[snippet,java]
--------------
source=manual/src/main/java/org/qi4j/manual/recipes/io/Docs.java
tag=filter
--------------
One line of code that handles the reading, the writing, the cleaning up, buffering, and whatnot. Pretty nifty! The
transferTo method will throw IOException, which I can catch if I want to present any errors to the user. But actually
dealing with those errors, i.e. closing the files and potentially deleting the destination if the transfer failed, is
already handled by the Input and Output. I will never have to deal with the details of reading text from a file ever
again!
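As an illustration of what such standard implementations might do internally, the following sketch shows one possible text-file Input and Output. It is not the code behind Inputs.text and Outputs.text: the interfaces are simplified stand-ins, the names TextFileSketch, textInput and textOutput are hypothetical, and the UTF-8 encoding is an assumption made for the example.

[source,java]
--------------
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

final class TextFileSketch {
    // Simplified stand-ins for the four roles (assumed shapes, not the library source).
    interface Receiver<T, RE extends Throwable> { void receive(T item) throws RE; }
    interface Sender<T, SE extends Throwable> {
        <RE extends Throwable> void sendTo(Receiver<? super T, RE> receiver) throws RE, SE;
    }
    interface Output<T, RE extends Throwable> {
        <SE extends Throwable> void receiveFrom(Sender<? extends T, SE> sender) throws RE, SE;
    }
    interface Input<T, SE extends Throwable> {
        <RE extends Throwable> void transferTo(Output<? super T, RE> output) throws SE, RE;
    }

    // Opens the file only when a transfer starts and closes it whatever happens.
    static Input<String, IOException> textInput(final File source) {
        return new Input<String, IOException>() {
            public <RE extends Throwable> void transferTo(Output<? super String, RE> output)
                    throws IOException, RE {
                try (final BufferedReader reader =
                         Files.newBufferedReader(source.toPath(), StandardCharsets.UTF_8)) {
                    output.receiveFrom(new Sender<String, IOException>() {
                        public <RE2 extends Throwable> void sendTo(Receiver<? super String, RE2> receiver)
                                throws RE2, IOException {
                            String line;
                            while ((line = reader.readLine()) != null) {
                                receiver.receive(line);
                            }
                        }
                    });
                }
            }
        };
    }

    // Writes one line per item; deletes the destination if the transfer fails on either side.
    static Output<String, IOException> textOutput(final File destination) {
        return new Output<String, IOException>() {
            public <SE extends Throwable> void receiveFrom(Sender<? extends String, SE> sender)
                    throws IOException, SE {
                boolean completed = false;
                try {
                    try (final BufferedWriter writer =
                             Files.newBufferedWriter(destination.toPath(), StandardCharsets.UTF_8)) {
                        sender.sendTo(new Receiver<String, IOException>() {
                            public void receive(String item) throws IOException {
                                writer.write(item);
                                writer.newLine();
                            }
                        });
                    }
                    completed = true; // reached only if sending and closing both succeeded
                } finally {
                    if (!completed) {
                        destination.delete();
                    }
                }
            }
        };
    }

    public static void main(String[] args) throws IOException {
        File source = File.createTempFile("source", ".txt");
        File destination = File.createTempFile("copy", ".txt");
        source.deleteOnExit();
        destination.deleteOnExit();
        Files.write(source.toPath(), Arrays.asList("first", "second"), StandardCharsets.UTF_8);
        textInput(source).transferTo(textOutput(destination));
        System.out.println(Files.readAllLines(destination.toPath(), StandardCharsets.UTF_8));
    }
}
--------------

The cleanup logic from parts 2 and 4 of the opening example ends up in exactly one place each, so client code is left with the single transferTo call.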
== Intercepting the transfer ==
While the above handles the basic input/output of a transfer, there are usually other things that I want to do as well.
I may want to count how many items were transferred, do some filtering, or log every 1000 items or so to see what's
going on. Since input and output are now separated this becomes simply a matter of inserting something in the middle
that mediates the input and output. Since many of these mediations have a similar character I can put these into
standard utility methods to make them easier to use.
The first standard decorator is a filter. I will implement this by means of supplying a Specification:
[source,java]
--------------
public static <T,ReceiverThrowableType extends Throwable> Output<T, ReceiverThrowableType> filter( final Specification<T> specification, final Output<T, ReceiverThrowableType> output)
{
... create an Output that filters items based on the Specification<T> ...
}
--------------
Where Specification is:
[source,java]
--------------
interface Specification<T>
{
boolean test(T item);
}
--------------
With this simple construct I can now perform transfers and easily filter out items I don't want on the receiving side.
This example removes empty lines from a text file.
[source,java]
--------------
File source = ...
File destination = ...
Inputs.text( source ).transferTo( Transforms.filter(new Specification<String>()
{
public boolean test(String string)
{
return string.length() != 0;
}
}, Outputs.text(destination) ) );
--------------
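The body of filter is elided above. One way it could be written is sketched below, assuming simplified stand-in interfaces; FilterSketch and nonEmpty are hypothetical names used only for this illustration, and the actual implementation in Transforms may differ.

[source,java]
--------------
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FilterSketch {
    // Simplified stand-ins (assumed shapes, not the library source).
    interface Specification<T> { boolean test(T item); }
    interface Receiver<T, RE extends Throwable> { void receive(T item) throws RE; }
    interface Sender<T, SE extends Throwable> {
        <RE extends Throwable> void sendTo(Receiver<? super T, RE> receiver) throws RE, SE;
    }
    interface Output<T, RE extends Throwable> {
        <SE extends Throwable> void receiveFrom(Sender<? extends T, SE> sender) throws RE, SE;
    }

    // Decorates an Output so that only items accepted by the Specification reach it.
    static <T, RE extends Throwable> Output<T, RE> filter(
            final Specification<T> specification, final Output<T, RE> output) {
        return new Output<T, RE>() {
            public <SE extends Throwable> void receiveFrom(final Sender<? extends T, SE> sender)
                    throws RE, SE {
                // Give the real Output a Sender that wraps the original Sender.
                output.receiveFrom(new Sender<T, SE>() {
                    public <RE2 extends Throwable> void sendTo(final Receiver<? super T, RE2> receiver)
                            throws RE2, SE {
                        sender.sendTo(new Receiver<T, RE2>() {
                            public void receive(T item) throws RE2 {
                                if (specification.test(item)) {
                                    receiver.receive(item);
                                }
                            }
                        });
                    }
                });
            }
        };
    }

    // Demo helper: pushes the given lines through a non-empty filter into a list.
    static List<String> nonEmpty(final List<String> lines) {
        final List<String> kept = new ArrayList<String>();
        Output<String, RuntimeException> collector = new Output<String, RuntimeException>() {
            public <SE extends Throwable> void receiveFrom(Sender<? extends String, SE> sender) throws SE {
                sender.sendTo(new Receiver<String, RuntimeException>() {
                    public void receive(String item) { kept.add(item); }
                });
            }
        };
        Sender<String, RuntimeException> sender = new Sender<String, RuntimeException>() {
            public <RE extends Throwable> void sendTo(Receiver<? super String, RE> receiver) throws RE {
                for (String line : lines) { receiver.receive(line); }
            }
        };
        filter(new Specification<String>() {
            public boolean test(String line) { return line.length() != 0; }
        }, collector).receiveFrom(sender);
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(nonEmpty(Arrays.asList("one", "", "two"))); // prints [one, two]
    }
}
--------------

Because the decorator only wraps the Sender and the Receiver, the exception types of both sides pass through unchanged.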
The second common operation is mapping from one type to another. This deals with the case where the Input you have does
not match the Output you want to send to, but there's a way to map from the input type to the output type. An example
would be to map from String to JSONObject. The operation itself looks like this:
[source,java]
--------------
public static <From,To,ReceiverThrowableType extends Throwable> Output<From, ReceiverThrowableType> map( Function<From,To> function, Output<To, ReceiverThrowableType> output)
--------------
Where Function is defined as:
[source,java]
--------------
interface Function<From, To>
{
To map(From from);
}
--------------
With this I can then connect an Input of Strings to an Output of JSONObject like so:
[source,java]
--------------
Input<String,IOException> input = ...;
Output<JSONObject,RuntimeException> output = ...;
input.transferTo(Transforms.map(new String2JSON(), output));
--------------
Where String2JSON implements Function and its map method converts the String into a JSONObject.
At this point we can now deal with the last part of the initial example, the counting of items. This can be implemented
as a generic Function that has the same input and output type, and just maintains a count internally that updates on
every call to map(). The example can then be written as:
[source,java]
--------------
File source = ...
File destination = ...
Counter<String> counter = new Counter<String>();
Inputs.text( source ).transferTo( Transforms.map(counter, Outputs.text(destination) ));
System.out.println("Nr of lines:"+counter.getCount());
--------------
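For readers who want to see how map and the Counter could fit together, here is a minimal sketch under the same caveats as before: the interfaces are simplified stand-ins, MapSketch and copyAndCount are hypothetical names, and this Counter is an assumed implementation rather than the one shipped with the library.

[source,java]
--------------
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MapSketch {
    // Simplified stand-ins (assumed shapes, not the library source).
    interface Function<From, To> { To map(From from); }
    interface Receiver<T, RE extends Throwable> { void receive(T item) throws RE; }
    interface Sender<T, SE extends Throwable> {
        <RE extends Throwable> void sendTo(Receiver<? super T, RE> receiver) throws RE, SE;
    }
    interface Output<T, RE extends Throwable> {
        <SE extends Throwable> void receiveFrom(Sender<? extends T, SE> sender) throws RE, SE;
    }

    // Decorates an Output of To so that it accepts items of type From.
    static <From, To, RE extends Throwable> Output<From, RE> map(
            final Function<From, To> function, final Output<To, RE> output) {
        return new Output<From, RE>() {
            public <SE extends Throwable> void receiveFrom(final Sender<? extends From, SE> sender)
                    throws RE, SE {
                output.receiveFrom(new Sender<To, SE>() {
                    public <RE2 extends Throwable> void sendTo(final Receiver<? super To, RE2> receiver)
                            throws RE2, SE {
                        sender.sendTo(new Receiver<From, RE2>() {
                            public void receive(From item) throws RE2 {
                                receiver.receive(function.map(item));
                            }
                        });
                    }
                });
            }
        };
    }

    // Identity function that counts how many items pass through it.
    static final class Counter<T> implements Function<T, T> {
        private long count;
        public T map(T item) { count++; return item; }
        public long getCount() { return count; }
    }

    // Demo helper: copies items into sink through a Counter and returns the count.
    static long copyAndCount(final List<String> items, final List<String> sink) {
        Counter<String> counter = new Counter<String>();
        Output<String, RuntimeException> collector = new Output<String, RuntimeException>() {
            public <SE extends Throwable> void receiveFrom(Sender<? extends String, SE> sender) throws SE {
                sender.sendTo(new Receiver<String, RuntimeException>() {
                    public void receive(String item) { sink.add(item); }
                });
            }
        };
        Sender<String, RuntimeException> sender = new Sender<String, RuntimeException>() {
            public <RE extends Throwable> void sendTo(Receiver<? super String, RE> receiver) throws RE {
                for (String item : items) { receiver.receive(item); }
            }
        };
        map(counter, collector).receiveFrom(sender);
        return counter.getCount();
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<String>();
        System.out.println("Nr of lines:" + copyAndCount(Arrays.asList("a", "b", "c"), sink));
    }
}
--------------

A Counter written like this is not thread-safe and keeps counting across transfers, so a fresh instance per transfer is the simplest way to use it.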
== Usage in the Zest™ SPI ==
Now I can finally get back to my initial problem that led me to look into this: how to implement a good way to access
EntityStates in a Zest™ EntityStore, and perform restores of backups. The current way of accessing EntityStates looks
like this:
[source,java]
--------------
<ThrowableType extends Throwable> void visitEntityStates( EntityStateVisitor<ThrowableType> visitor, ModuleSPI module )
throws ThrowableType;
interface EntityStateVisitor<ThrowableType extends Throwable>
{
void visitEntityState( EntityState entityState )
throws ThrowableType;
}
--------------
This can now be replaced with:
[source,java]
--------------
Input<EntityState, EntityStoreException> entityStates(ModuleSPI module);
--------------
Because of the pattern outlined above, users of this will get more information about what's happening in the traversal,
such as if the EntityStore raised an EntityStoreException during the traversal, which they can then handle gracefully.
It also becomes easy to add decorators such as maps and filters for users of this. Let's say you are only interested in
EntityStates of a given type. Then add a filter for this, without changing the consumer.
For importing backup data into an EntityStore, the interface used to look like this:
[source,java]
--------------
interface ImportSupport
{
ImportResult importFrom( Reader in )
throws IOException;
}
--------------
This ties the EntityStore to only being able to read JSON lines from Readers. The client will not know whether the
IOException raised is due to errors in the Reader or in writing to the store. And the ImportResult, which contains a
list of exceptions and a count of imported items, is quite ugly to create and use. With the I/O API at hand this can
now be replaced with:
[source,java]
--------------
interface ImportSupport
{
Output<String,EntityStoreException> importJSON();
}
--------------
To use this, given the helpers outlined above, is as simple as:
[source,java]
--------------
File backup = ...
ImportSupport entityStore = ...
Inputs.text(backup).transferTo(entityStore.importJSON());
--------------
If the client wants any "extras", such as counting how many objects were imported, this can be done by adding decorators
as previously shown. If you only want to, say, import entities modified before a particular date (let's say you know
some junk was introduced after a given time), then add a specification filter that performs this check. And so on.
== Conclusion ==
It is quite common while developing software that you have to shuffle data or objects from one input to another output,
possibly with some transformations in the middle. Usually these things have to be done from scratch, which invites
errors and badly applied patterns. By introducing a generic Input/Output API that encapsulates and separates these
things properly it becomes easier to perform these tasks in a scalable, performant and error-free way, while still
allowing these tasks to be decorated with extra features when needed.
This article has outlined one way to do this, and the API and helpers that I've described are available in the current
Zest™ Core 1.3-SNAPSHOT in Git (see Zest™ homepage for access details). The idea is to start using it throughout Zest
wherever we need to do I/O of the type described here.