| ////////////////////// |
| * Copyright (c) 2010, Rickard Öberg. All Rights Reserved. |
| * |
| * Licensed under the Apache License, Version 2.0 (the "License"); |
| * you may not use this file except in compliance with the License. |
| * You may obtain a copy of the License at |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * Unless required by applicable law or agreed to in writing, |
| * software distributed under the License is distributed on an |
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, |
| * either express or implied. |
| * See the License for the specific language governing permissions |
| * and limitations under the License. |
| ////////////////////// |
| |
| [[howto-use-io,Use I/O API]] |
| = Use I/O API = |
| |
| NOTE: This article was written on Rickard Öberg's blog, 6 Nov 2010 |
| |
| |
| The past week I've had to deal with a lot of data shuffling, both in raw form as bytes and strings, and as SPI and |
| domain level objects. What struck me is that it is notoriously hard to shuffle things from one place to another in a |
| way that is scalable, performant and handles errors correctly. And I had to do some things over and over again, like |
| reading strings from files. |
| |
| So the thought occurred: there must be a general pattern to how this thing works, which can be extracted and put into a |
| library. "Reading lines from a text file" should only have to be done once, and then used in whatever scenario requires |
| it. Let's take a look at a typical example of reading from one file and writing to another to see if we can find out |
| what the possible pieces could be: |
| |
| [source,java] |
| ------------- |
| 1: File source = new File( getClass().getResource( "/iotest.txt" ).getFile() ); |
| 1: File destination = File.createTempFile( "test", ".txt" ); |
| 1: destination.deleteOnExit(); |
| 2: BufferedReader reader = new BufferedReader(new FileReader(source)); |
| 3: long count = 0; |
| 2: try |
| 2: { |
| 4: BufferedWriter writer = new BufferedWriter(new FileWriter(destination)); |
| 4: try |
| 4: { |
| 2: String line = null; |
| 2: while ((line = reader.readLine()) != null) |
| 2: { |
| 3: count++; |
| 4: writer.append( line ).append( '\n' ); |
| 2: } |
| 4: writer.close(); |
| 4: } catch (IOException e) |
| 4: { |
| 4: writer.close(); |
| 4: destination.delete(); |
| 4: } |
| 2: } finally |
| 2: { |
| 2: reader.close(); |
| 2: } |
| 1: System.out.println(count) |
| ------------- |
| |
| As the numbers to the left indicates, I've identified four parts in this type of code that could be separated from |
| each other. |
| |
| 1) is the client code that initiates a transfer, and which have to know the input and output source. |
| |
| 2) is the code that reads lines from an input. |
| |
| 3) is helper code that I use to keep track of what's going on, and which I'd like to reuse no matter what kind of |
| transfer is being done. |
| |
| 4) receives the data and writes it down. In this code, if I wanted to implement batching on the read and write side I |
| could do so by changing the 2 and 4 parts to read/write multiple lines at a time. |
| |
| == The API == |
| |
| If you want to reproduce what's explained in this tutorial, remember to depend on the Core Runtime artifact that depends |
| on Core API, Core SPI, Core Bootstrap and Core Functional & I/O APIs: |
| |
| include::../../../../core/runtime/build/docs/buildinfo/artifact.txt[] |
| |
| See the <<howto-depend-on-zest>> tutorial for details. |
| |
| Once theses parts were identified it was mostly just a matter of putting interfaces on these pieces, and making sure |
| they can be easily used in many different situations. The result is as follows. |
| |
| To start with we have Input: |
| |
| [snippet,java] |
| -------------- |
| source=core/io/src/main/java/org/qi4j/io/Input.java |
| tag=input |
| ------------- |
| |
| Inputs, like Iterables, can be used over and over again to initiate transfers of data from one place to another, in |
| this case an Output. Since I want this to be generic the type of things that is sent is T, so can be anything |
| (byte[], String, EntityState, MyDomainObject). I also want the sender and receiver of data to be able to throw their |
| own exceptions, and this is marked by declaring these as generic exception types. For example, the input may want to |
| throw SQLException and the output IOException, if anything goes wrong. This should be strongly typed, and both sender |
| and receiver must know when either side screws up, so that they can recover properly and close any resources they have |
| opened. |
| |
| On the receiving side we then have Output: |
| |
| [snippet,java] |
| -------------- |
| source=core/io/src/main/java/org/qi4j/io/Output.java |
| tag=output |
| ------------- |
| |
| When receiveFrom is invoked by an Input, as a result of invoking transferTo on the Input, the Output should open |
| whatever resources it needs to write to, and then expect data to come from a Sender. Both the Input and Output must |
| have the same type T, so that they agree on what is being sent. We will see later how this can be handled if this is |
| not the case. |
| |
| Next we have Sender: |
| |
| [snippet,java] |
| -------------- |
| source=core/io/src/main/java/org/qi4j/io/Sender.java |
| tag=sender |
| ------------- |
| |
| The Output invokes sendTo and passes in a Receiver that the Sender will use to send individual items. The sender at |
| this point can start transferring data of type T to the receiver, one at a time. The Receiver looks like this: |
| |
| [snippet,java] |
| -------------- |
| source=core/io/src/main/java/org/qi4j/io/Receiver.java |
| tag=receiver |
| ------------- |
| |
| When the receiver gets the individual items from the sender it can either immediately write them to its underlying |
| resource, or batch them up. Since the receiver will know when the transfer is done (sendTo returns) it can write the |
| remaining batches properly and close any resource it holds. |
| |
| This simple pattern, with two interfaces on the sending side and two on the receiving side, gives us the potential to |
| do scalable, performant and fault-tolerant transfers of data. |
| |
| == Standard Inputs and Outputs == |
| |
| So now that the above API defines the contract of sending and receiving data, I can then create a couple of standard |
| inputs and outputs. Let's say, reading lines of text from a file, and writing lines of text to a file. These |
| implementations I can then put in static methods so they are easy to use. In the end, to make a copy of a text file |
| looks like this: |
| |
| [snippet,java] |
| -------------- |
| source=manual/src/main/java/org/qi4j/manual/recipes/io/Docs.java |
| tag=filter |
| -------------- |
| |
| One line of code that handles the reading, the writing, the cleaning up, buffering, and whatnot. Pretty nifty! The |
| transferTo method will throw IOException, which I can catch if I want to present any errors to the user. But actually |
| dealing with those errors, i.e. closing the files and potentially deleting the destination if the transfer failed, is |
| already handled by the Input and Output. I will never have to deal with the details of reading text from a file ever |
| again! |
| |
| == Intercepting the transfer == |
| |
| While the above handles the basic input/output of a transfer, there are usually other things that I want to do as well. |
| I may want to count how many items were transferred, do some filtering, or log every 1000 items or so to see what's |
| going on. Since input and output are now separated this becomes simply a matter of inserting something in the middle |
| that mediates the input and output. Since many of these mediations have a similar character I can put these into |
| standard utility methods to make them easier to use. |
| |
| The first standard decorator is a filter. I will implement this by means of supplying a Specification: |
| |
| [source,java] |
| -------------- |
| public static <T,ReceiverThrowableType extends Throwable> Output<T, ReceiverThrowableType> filter( final Specification<T> specification, final Output<T, ReceiverThrowableType> output) |
| { |
| ... create an Output that filters items based on the Specification<T> ... |
| } |
| -------------- |
| |
| Where Specification is: |
| |
| [source,java] |
| -------------- |
| interface Specification<T> |
| { |
| boolean test(T item); |
| } |
| -------------- |
| |
| With this simple construct I can now perform transfers and easily filter out items I don't want on the receiving side. |
| This example removes empty lines from a text file. |
| |
| [source,java] |
| -------------- |
| File source = ... |
| File destination = ... |
| Inputs.text( source ).transferTo( Transforms.filter(new Specification<String>() |
| { |
| public boolean test(String string) |
| { |
| return string.length() != 0; |
| } |
| }, Outputs.text(destination) ); |
| -------------- |
| |
| |
| The second common operation is mapping from one type to the other. This deals with the case that one Input you have may |
| not match the Output you want to send to, but there's a way to map from the input type to the output type. An example |
| would be to map from String to JSONObject, for example. The operation itself looks like this: |
| |
| [source,java] |
| -------------- |
| public static <From,To,ReceiverThrowableType extends Throwable> Output<From, ReceiverThrowableType> map( Function<From,To> function, Output<To, ReceiverThrowableType> output) |
| -------------- |
| |
| Where Function is defined as: |
| |
| [source,java] |
| -------------- |
| interface Function<From, To> |
| { |
| To map(From from); |
| } |
| -------------- |
| |
| With this I can then connect an Input of Strings to an Output of JSONObject like so: |
| |
| [source,java] |
| -------------- |
| Input<String,IOException> input = ...; |
| Output<JSONObject,RuntimeException> output = ...; |
| input.transferTo(Transforms.map(new String2JSON(), output); |
| -------------- |
| |
| Where String2JSON implements Function and it's map method converts the String into a JSONObject. |
| |
| At this point we can now deal with the last part of the initial example, the counting of items. This can be implemented |
| as a generic Map that has the same input and output type, and just maintains a count internally that updates on every |
| call to map(). The example can then be written as: |
| |
| [source,java] |
| -------------- |
| File source = ... |
| File destination = ... |
| Counter<String> counter = new Counter<String>(); |
| Inputs.text( source ).transferTo( Transforms.map(counter, Outputs.text(destination) )); |
| System.out.println("Nr of lines:"+counter.getCount()) |
| -------------- |
| |
| == Usage in the Zest™ SPI == |
| |
| Now I can finally get back to my initial problem that led me to look into this: how to implement a good way to access |
| EntityStates in a Zest™ EntityStore, and perform restores of backups. The current version of accessing EntityStates look |
| like this: |
| |
| [source,java] |
| -------------- |
| <ThrowableType extends Throwable> void visitEntityStates( EntityStateVisitor<ThrowableType> visitor, ModuleSPI module ) |
| throws ThrowableType; |
| |
| interface EntityStateVisitor<ThrowableType extends Throwable> |
| { |
| void visitEntityState( EntityState entityState ) |
| throws ThrowableType; |
| } |
| -------------- |
| |
| This can now be replaced with: |
| |
| [source,java] |
| -------------- |
| Input<EntityState, EntityStoreException> entityStates(ModuleSPI module); |
| -------------- |
| |
| Because of the pattern outlined above, users of this will get more information about what's happening in the traversal, |
| such as if the EntityStore raised an EntityStoreException during the traversal, which they can then handle gracefully. |
| It also becomes easy to add decorators such as maps and filters to users of this. Let's say you only are interested in |
| EntityState's of a given type. Then add a filter for this, without changing the consumer. |
| |
| For importing backup data into an EntityStore, the interface used to look like this: |
| |
| [source,java] |
| -------------- |
| interface ImportSupport |
| { |
| ImportResult importFrom( Reader in ) |
| throws IOException; |
| } |
| -------------- |
| |
| This ties the EntityStore to only being able to read JSON lines from Reader's, the client will not know if the |
| IOException raised is due to errors in the Reader or writing in the store, and the ImportResult, which contains a list |
| of exceptions and count of stuff, is quite ugly to create and use. With the I/O API at hand this can now be replaced |
| with: |
| |
| [source,java] |
| -------------- |
| interface ImportSupport |
| { |
| Output<String,EntityStoreException> importJSON(); |
| } |
| -------------- |
| |
| |
| To use this, given the helpers outlined above, is as simple as: |
| |
| [source,java] |
| -------------- |
| File backup = ... |
| ImportSupport entityStore = ... |
| Inputs.text(backup).transferTo(entityStore.importJSON()); |
| -------------- |
| |
| If the client wants any "extras", such as counting how many objects were imported, this can be done by adding filters |
| as previously shown. If you only want to, say, import entities modified before a particular date (let's say you know |
| some junk was introduced after a given time), then add a specification filter that performs this check. And so on. |
| |
| == Conclusion == |
| |
| It is quite common while developing software that you have to shuffle data or objects from one input to another output, |
| possible with some transformations in the middle. Usually these things have to be done from scratch, which opens up for |
| errors and badly applied patterns. By introducing a generic Input/Output API that encapsulates and separates these |
| things properly it becomes easier to perform these tasks in a scalable, performant and error-free way, and while still |
| allowing these tasks to be decorated with extra features when needed. |
| |
| This article has outlined one way to do this, and the API and helpers that I've described are available in the current |
| Zest™ Core 1.3-SNAPSHOT in Git (see Zest™ homepage for access details). The idea is to start using it throughout Zest |
| wherever we need to do I/O of the type described here. |