<?xml version="1.0" encoding="UTF-8"?>
<!-- | |
Licensed to the Apache Software Foundation (ASF) under one | |
or more contributor license agreements. See the NOTICE file | |
distributed with this work for additional information | |
regarding copyright ownership. The ASF licenses this file | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this file except in compliance | |
with the License. You may obtain a copy of the License at | |
http://www.apache.org/licenses/LICENSE-2.0 | |
Unless required by applicable law or agreed to in writing, | |
software distributed under the License is distributed on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
-->
<chapter id="ref_guide_slice">
<title>
Distributed Persistence
</title>
<para> | |
The standard JPA runtime environment works with a <emphasis>single</emphasis>
database instance. OpenJPA can be extended via a plug-in to work with
multiple databases within the same transaction without any change to the
existing application. This capability for distributed
database environments is called <emphasis>Slice</emphasis> and is explained in
the following sections.
</para> | |
<section id="slice_overview"> | |
<title>Overview</title> | |
<para> | |
Enterprise applications are increasingly deployed in distributed database
environments. A distributed, often horizontally partitioned,
database environment may be needed to counter massive data growth, to
support multiple external clients on a hosted platform, or for many other
practical scenarios that benefit from data partitioning.
</para> | |
<para> | |
Any JPA-based user application has to address serious technical and conceptual
challenges to interact directly with a set of physical databases
within a single transaction.
Slice encapsulates the complexity of a distributed database environment
via the abstraction of a <emphasis>virtual</emphasis> database, which internally
manages multiple physical databases. We refer to each physical database instance
as a <emphasis>slice</emphasis>.
<emphasis>Virtualization</emphasis> of distributed databases
lets the OpenJPA object management kernel and
the user application work in the same way as with a single physical
database.
</para> | |
</section> | |
<section id="Features and Limitations"> | |
<title>Salient Features</title> | |
<section><title>Transparency</title> | |
<para> | |
The existing application or the persistent domain model requires | |
<emphasis>no change</emphasis> to upgrade from a single database | |
to a distributed database environment. | |
</para> | |
</section> | |
<section><title>Custom Distribution Policy</title> | |
<para> | |
The user application decides how newly persistent instances are
distributed across the database slices. The data
distribution policy across the slices may be based on the attributes
of the data itself. For example, all Customers whose first name begins with
a character from 'A' to 'M' are stored in one slice, while those with names
beginning with 'N' to 'Z' are stored in another slice.
</para> | |
<para> | |
The user application specifies this custom data distribution policy by implementing the
<classname>org.apache.openjpa.slice.DistributionPolicy</classname>
interface.
</para> | |
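<para>
The name-based example above can be sketched as follows. This is only an
illustration: the local <code>DistributionPolicy</code> interface is a stand-in
that mirrors the signature of the real
<classname>org.apache.openjpa.slice.DistributionPolicy</classname> so that the
listing compiles on its own, and <code>Customer</code> and
<code>NameRangePolicy</code> are hypothetical names. Sending everything that is
not in the 'N' to 'Z' range to the first slice is an assumption of the sketch.
<programlisting>
<![CDATA[
```java
import java.util.List;

/** Local stand-in mirroring org.apache.openjpa.slice.DistributionPolicy. */
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

/** Hypothetical entity; only the first name matters for this sketch. */
class Customer {
    private final String firstName;

    Customer(String firstName) {
        this.firstName = firstName;
    }

    String getFirstName() {
        return firstName;
    }
}

/** Routes customers to a slice by the first letter of their first name. */
class NameRangePolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        if (pc instanceof Customer && slices.size() > 1) {
            String name = ((Customer) pc).getFirstName();
            if (name != null && !name.isEmpty()) {
                char first = Character.toUpperCase(name.charAt(0));
                if (first >= 'N' && first <= 'Z') {
                    return slices.get(1); // 'N' to 'Z': second configured slice
                }
            }
        }
        // 'A' to 'M', other characters and other types: first slice (assumption)
        return slices.get(0);
    }
}
```
]]>
</programlisting>
The policy returns one of the names it receives in <code>slices</code>, so the
returned identifier always matches a configured slice name.
</para>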
<para> | |
Slice tracks the original database for existing instances. When
an application issues a query, the resulting instances can be loaded
from different slices. This tracking is important because a subsequent
update to any of these instances is committed to its
original database slice.
</para> | |
<note> | |
<para> | |
You can find the original slice of an instance <code>pc</code> with
the static utility method
<methodname>SlicePersistence.getSlice(pc)</methodname>.
This method returns the slice identifier string associated with the
given <emphasis>managed</emphasis> instance. If the instance is not
being managed, the method returns null because an unmanaged or
detached instance is not associated with any slice.
</para> | |
</note> | |
<para> | |
<warning>Currently, there is no provision for migrating an | |
existing instance from one slice to another. | |
</warning> | |
</para> | |
</section> | |
<section><title>Heterogeneous Database</title> | |
<para> | |
Each slice can be configured independently with its own JDBC
driver and other connection parameters. Hence the target database
environment can consist of heterogeneous databases.
</para> | |
</section> | |
<section><title>Parallel Execution</title> | |
<para> | |
All database operations such as query, commit or flush operate
in parallel across the database slices. The execution threading
policy is configurable.
</para> | |
</section> | |
<section><title>Distributed Query</title> | |
<para> | |
Queries are executed across all slices and the results are
merged into a single list. The results of a query that includes an
<code>ORDER BY</code> clause are sorted correctly by merging the
results from each individual slice.
</para>
<para>
Queries that specify an aggregate projection such as
<code>COUNT()</code>, <code>MAX()</code>, <code>MIN()</code>
and <code>SUM()</code>
are correctly evaluated <emphasis>only if</emphasis> they
return a single result.
</para>
<para> | |
<warning> | |
The aggregate operation <code>AVG()</code> is not supported. | |
</warning> | |
</para> | |
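<para>
To illustrate what merging per-slice results involves, here is a minimal,
self-contained sketch. It is not Slice's internal implementation; the class
<code>SliceResultMerge</code> and its methods are hypothetical names. It merges
lists that each slice has already sorted, and combines single-valued
<code>COUNT()</code>, <code>SUM()</code> and <code>MAX()</code> results.
<programlisting>
<![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class SliceResultMerge {
    private SliceResultMerge() {
    }

    /** Cursor over one slice's rows: the current head plus the remaining rows. */
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(Iterator<T> rest) {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    /** Merges lists that are each already sorted by 'order' into one sorted list. */
    static <T> List<T> mergeSorted(List<List<T>> perSlice, Comparator<? super T> order) {
        PriorityQueue<Cursor<T>> queue =
            new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
        for (List<T> rows : perSlice) {
            if (!rows.isEmpty()) {
                queue.add(new Cursor<>(rows.iterator()));
            }
        }
        List<T> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor<T> next = queue.poll(); // slice whose head sorts first
            merged.add(next.head);
            if (next.rest.hasNext()) {
                next.head = next.rest.next();
                queue.add(next);
            }
        }
        return merged;
    }

    /** Single-valued COUNT and SUM results combine by addition. */
    static long combineSum(long... perSlice) {
        return Arrays.stream(perSlice).sum();
    }

    /** Single-valued MAX results combine by taking the largest one. */
    static long combineMax(long... perSlice) {
        return Arrays.stream(perSlice).max().getAsLong();
    }
}
```
]]>
</programlisting>
Note that an average cannot be combined from per-slice averages alone, because
the per-slice row counts are needed as well.
</para>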
</section> | |
<section><title>Targeted Query</title> | |
<para> | |
You can target a query at a subset of slices rather than
all slices by setting a <emphasis>hint</emphasis>. The hint key
<code>openjpa.hint.slice.Target</code> can be set on any query, and the
hint value is a
comma-separated list of slice identifiers. The following
example shows how to target a query only at slice <code>"One"</code>:
<programlisting> | |
<![CDATA[import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.Query;
import org.apache.openjpa.slice.SlicePersistence;

EntityManager em = ...;
em.getTransaction().begin();
String hint = "openjpa.hint.slice.Target";
// restrict the query to the slice named "One"
Query query = em.createQuery("SELECT p FROM PObject p").setHint(hint, "One");
List result = query.getResultList();
// verify that each instance originates from the given slice
for (Object pc : result) {
    String sliceOrigin = SlicePersistence.getSlice(pc);
    assertEquals("One", sliceOrigin); // JUnit assertion
}
]]>
</programlisting> | |
</para> | |
</section> | |
<section><title>Distributed Transaction</title> | |
<para> | |
The database slices participate in a global transaction provided
each slice is configured with an XA-compliant JDBC driver, even
when the persistence unit is configured for <code>RESOURCE_LOCAL</code>
transactions.
</para> | |
<para> | |
<warning> | |
If any of the configured slices is not XA-compliant <emphasis>and</emphasis>
the persistence unit is configured for <code>RESOURCE_LOCAL</code>
transactions, then each slice is committed without a two-phase
commit protocol. If the commit on any slice fails, the atomicity of
the transaction is not ensured.
</warning> | |
</para> | |
</section> | |
<section id="collocation_constraint"><title>Collocation Constraint</title> | |
<para> | |
No relationship can exist across database slices. In O-R mapping parlance,
this condition translates to the limitation that the closure of an object graph must be
<emphasis>collocated</emphasis> in the same database.
For example, consider a domain model where Person relates to Address.
Person X refers to Address A while Person Y refers to Address B.
The collocation constraint means that <emphasis>both</emphasis> X and A
must be stored in the same
database slice. Similarly, Y and B must be stored in a single slice.
</para> | |
<para> | |
Slice, however, helps to maintain the collocation constraint automatically.
The instances in the closure set of any newly persistent instance that are
reachable via cascaded relationships are stored in the same slice.
The user-defined distribution policy needs to supply the slice
for the root instance only.
</para> | |
</section> | |
</section> | |
<section id="slice_configuration"> | |
<title>Usage</title> | |
<para> | |
Slice is activated via the following property settings: | |
</para> | |
<section> | |
<title>How to activate Slice Runtime?</title> | |
<para> | |
The basic configuration property is | |
<programlisting> | |
<![CDATA[ <property name="openjpa.BrokerFactory" value="slice"/>]]> | |
</programlisting> | |
This critical configuration activates a specialized factory class, aliased
as <code>slice</code>, that creates an object management kernel that
can work against multiple databases.
</para> | |
</section> | |
<section> | |
<title>How to configure each database slice?</title> | |
<para> | |
Each database slice is identified by a logical name that is unique within a
persistence unit. The list of slices is specified by the
<code>openjpa.slice.Names</code> property.
For example, specify three slices named <code>"One"</code>,
<code>"Two"</code> and <code>"Three"</code> as follows:
<programlisting> | |
<![CDATA[ <property name="openjpa.slice.Names" value="One, Two, Three"/>]]> | |
</programlisting> | |
</para> | |
<para> | |
This property is not mandatory. If this property is not specified then | |
the configuration is scanned for logical slice names. Any property | |
<code>"abc"</code> of the form <code>openjpa.slice.XYZ.abc</code> will | |
register a slice with logical | |
name <code>"XYZ"</code>. | |
</para> | |
<para> | |
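The order of the names is significant when the <code>openjpa.slice.Master</code>
property is not specified, because the first slice in the list is then used as
the master slice. If <code>openjpa.slice.Names</code> is not specified either,
the persistence unit is scanned to find
all configured slice names and they are ordered alphabetically.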
</para> | |
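<para>
The name registration rules above can be sketched as follows. This is an
illustration of the described behavior, not Slice's actual code:
<code>SliceNameScanner</code> is a hypothetical class, the properties are
assumed to be available as a plain map, and global plug-in properties that
carry dotted sub-keys are not handled.
<programlisting>
<![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class SliceNameScanner {
    private static final String PREFIX = "openjpa.slice.";

    private SliceNameScanner() {
    }

    /** Returns the slice names in the order a policy would receive them. */
    static List<String> sliceNames(Map<String, String> props) {
        // An explicit openjpa.slice.Names list wins and keeps its own order.
        String explicit = props.get(PREFIX + "Names");
        if (explicit != null && !explicit.trim().isEmpty()) {
            List<String> names = new ArrayList<>();
            for (String name : explicit.split(",")) {
                if (!name.trim().isEmpty()) {
                    names.add(name.trim());
                }
            }
            return names;
        }
        // Otherwise every key of the form openjpa.slice.XYZ.abc registers "XYZ".
        // TreeSet orders by String comparison, which is case-sensitive.
        TreeSet<String> discovered = new TreeSet<>();
        for (String key : props.keySet()) {
            if (!key.startsWith(PREFIX)) {
                continue;
            }
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot > 0) {
                discovered.add(rest.substring(0, dot));
            }
        }
        return new ArrayList<>(discovered);
    }
}
```
]]>
</programlisting>
With only <code>openjpa.slice.One.ConnectionURL</code> and
<code>openjpa.slice.Two.ConnectionURL</code> configured, the sketch yields
<code>One</code> followed by <code>Two</code>, so <code>One</code> would be the
default master slice.
</para>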
<para> | |
The properties of each database slice can be configured independently.
For example, the
following configuration registers two slices with the logical names
<code>One</code> and <code>Two</code>.
<programlisting> | |
<![CDATA[<property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"/>
<property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://localhost/slice2"/>]]>
</programlisting> | |
</para> | |
<para> | |
Any OpenJPA-specific property can be configured on a per-slice basis.
For example, the following configuration uses two different JDBC
drivers for slices <code>One</code> and <code>Two</code>.
<programlisting> | |
<![CDATA[<property name="openjpa.slice.One.ConnectionDriverName" value="com.mysql.jdbc.Driver"/> | |
<property name="openjpa.slice.Two.ConnectionDriverName" value="com.mysql.jdbc.jdbc2.optional.MysqlXADataSource"/>]]> | |
</programlisting> | |
</para> | |
<para> | |
Any property left unspecified for a particular slice defaults to the value of the
corresponding OpenJPA property. For example, consider the following three slices:
<programlisting>
<![CDATA[<property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"/>
<property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://localhost/slice2"/>
<property name="openjpa.slice.Three.ConnectionURL" value="jdbc:oracle:thin:@//localhost/slice3"/>
<property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
<property name="openjpa.slice.Three.ConnectionDriverName" value="oracle.jdbc.OracleDriver"/>]]>
</programlisting>
In this example, slice <code>Three</code> uses the slice-specific
driver <code>oracle.jdbc.OracleDriver</code>, while slices
<code>One</code> and <code>Two</code> use
the driver <code>com.mysql.jdbc.Driver</code> as
specified by the <code>openjpa.ConnectionDriverName</code>
property value.
</para> | |
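<para>
The defaulting rule can be expressed as a small lookup. The following is a
sketch of the rule as described here, under the assumption that the
configuration is available as a plain map; <code>SlicePropertyLookup</code> is
a hypothetical helper and not part of OpenJPA.
<programlisting>
<![CDATA[
```java
import java.util.Map;

final class SlicePropertyLookup {
    private SlicePropertyLookup() {
    }

    /**
     * Resolves a property for one slice: openjpa.slice.[slice].[property]
     * if present, otherwise the persistence-unit-wide openjpa.[property].
     * Returns null when neither key is configured.
     */
    static String resolve(Map<String, String> props, String slice, String property) {
        String specific = props.get("openjpa.slice." + slice + "." + property);
        if (specific != null) {
            return specific;
        }
        return props.get("openjpa." + property);
    }
}
```
]]>
</programlisting>
For the configuration above, resolving <code>ConnectionDriverName</code> for
slice <code>One</code> falls back to the value of
<code>openjpa.ConnectionDriverName</code>, while slice <code>Three</code> keeps
its own value.
</para>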
</section> | |
<section id="distribution_policy"> | |
<title>Implement DistributionPolicy interface</title> | |
<para> | |
Slice needs to determine which slice will persist a new instance.
Only the application can decide this policy (for example,
all PurchaseOrders before April 30 go to slice <code>One</code> and
all the rest go to slice <code>Two</code>). This is why
the application has to implement
<code>org.apache.openjpa.slice.DistributionPolicy</code> and
specify the implementation class in the configuration:
<programlisting> | |
<![CDATA[ <property name="openjpa.slice.DistributionPolicy" value="com.acme.foo.MyOptimialDistributionPolicy"/>]]> | |
</programlisting> | |
</para> | |
<para> | |
The interface <code>org.apache.openjpa.slice.DistributionPolicy</code> | |
is simple with a single method. The complete listing of the | |
documented interface follows: | |
<programlisting> | |
<![CDATA[ | |
public interface DistributionPolicy { | |
/** | |
* Gets the name of the slice where a given instance will be stored. | |
* | |
* @param pc The newly persistent or to-be-merged object. | |
* @param slices name of the configured slices. | |
* @param context persistence context managing the given instance. | |
* | |
* @return identifier of the slice. This name must match one of the | |
* configured slice names. | |
* @see DistributedConfiguration#getSliceNames() | |
*/ | |
String distribute(Object pc, List<String> slices, Object context); | |
} | |
]]> | |
</programlisting> | |
</para> | |
<para> | |
While implementing a distribution policy, the most important thing to
remember is the <link linkend="collocation_constraint">collocation constraint</link>.
Because Slice cannot establish or query any cross-database relationship, all
related instances must be stored in the same database slice.
Slice can determine the closure of a root object by traversing
cascaded relationships. Hence the user-defined policy only has to decide the
database for the root instance, that is, the explicit argument to the
<methodname>EntityManager.persist()</methodname> call.
Slice ensures that all other related instances that get persisted by cascade
are assigned to the same database slice as the root instance.
However, the user-defined distribution policy must return the
same slice identifier for instances that are logically related but
not cascaded for persist.
</para> | |
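<para>
One way to return the same slice for instances that are logically related but
not cascaded is to derive the slice from a key that the related instances
share. The following is a minimal sketch under that assumption. The local
<code>DistributionPolicy</code> interface is a stand-in mirroring the listing
above so that the example compiles on its own; <code>Partitioned</code>,
<code>Customer</code>, <code>Invoice</code> and <code>KeyHashPolicy</code> are
hypothetical names.
<programlisting>
<![CDATA[
```java
import java.util.List;

/** Local stand-in mirroring org.apache.openjpa.slice.DistributionPolicy. */
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

/** Hypothetical contract: exposes the key shared by logically related instances. */
interface Partitioned {
    Object partitionKey();
}

class Customer implements Partitioned {
    private final long id;

    Customer(long id) {
        this.id = id;
    }

    @Override
    public Object partitionKey() {
        return id; // a customer is partitioned by its own identifier
    }
}

class Invoice implements Partitioned {
    private final long customerId;

    Invoice(long customerId) {
        this.customerId = customerId;
    }

    @Override
    public Object partitionKey() {
        return customerId; // an invoice follows the customer it belongs to
    }
}

/** Chooses the slice from the hash of the shared key. */
class KeyHashPolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        if (pc instanceof Partitioned) {
            int hash = ((Partitioned) pc).partitionKey().hashCode();
            // floorMod keeps the index non-negative for negative hash codes
            return slices.get(Math.floorMod(hash, slices.size()));
        }
        return slices.get(0); // fallback for other types (assumption)
    }
}
```
]]>
</programlisting>
A hash-based choice depends on the number and order of the configured slices,
so changing the slice list would map the same key to a different slice for new
instances. That matters because existing instances cannot be migrated between
slices.
</para>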
</section> | |
</section> | |
<title>Configuration Properties</title> | |
<para> | |
The properties that configure Slice can be classified into two broad groups.
The <emphasis>global</emphasis> properties apply to all slices, for example,
the thread pool used to execute queries in parallel or the transaction
manager used to coordinate a transaction across multiple slices.
The <emphasis>per-slice</emphasis> properties apply to an individual slice, for example,
the JDBC connection URL of a slice.
</para> | |
<section> | |
<title>Global Properties</title> | |
<section> | |
<title>openjpa.slice.DistributionPolicy</title> | |
<para> | |
This <emphasis>mandatory</emphasis> plug-in property determines how newly | |
persistent instances are distributed across individual slices. | |
The value of this property is a fully-qualified class name that implements | |
<ulink url="../javadoc/org/apache/openjpa/slice/DistributionPolicy.html"> | |
<classname>org.apache.openjpa.slice.DistributionPolicy</classname> | |
</ulink> interface. | |
</para> | |
</section> | |
<section><title>openjpa.slice.Lenient</title> | |
<para> | |
This boolean plug-in property controls the behavior when one or more slices
cannot be connected to or are unavailable for some other reason.
If <code>true</code>, the unreachable slices are ignored. If
<code>false</code>, any unreachable slice raises an exception
during startup.
</para> | |
<para> | |
By default this value is set to <code>false</code> i.e. all configured | |
slices must be available. | |
</para> | |
</section> | |
<section> | |
<title>openjpa.slice.Master</title> | |
<para> | |
This plug-in property identifies the master slice by name.
The master slice is used when a primary key is to be generated from a database sequence.
</para> | |
<para> | |
By default the master slice is the first slice in the list of configured slice names. | |
</para> | |
<para> | |
<warning> | |
Currently, there is no provision to use sequence from | |
multiple database slices. | |
</warning> | |
</para> | |
</section> | |
<section> | |
<title>openjpa.slice.Names</title> | |
<para> | |
This plug-in property can be used to register the logical slice names.
The value of this property is a comma-separated list of slice names.
The ordering of the names in this list is
<emphasis>significant</emphasis> because
<link linkend="distribution_policy">DistributionPolicy</link> receives
the slice names as its input argument in the same order.
</para> | |
<para> | |
If logical slice names are not registered explicitly via this property, | |
then all logical slice names available in the persistence unit are | |
registered. The ordering of the slice names in this case is alphabetical. | |
</para> | |
<para> | |
If logical slice names are registered explicitly via this property, then | |
any logical slice that is available in the persistence unit but excluded | |
from this list is ignored. | |
</para> | |
</section> | |
<section> | |
<title>openjpa.slice.ThreadingPolicy</title> | |
<para> | |
This plug-in property determines the kind of thread pool used
for database operations such as query or flush on individual slices.
The value of the property is a | |
fully-qualified class name that implements | |
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ExecutorService.html"> | |
<classname>java.util.concurrent.ExecutorService</classname> | |
</ulink> interface. | |
Two pre-defined pools can be chosen via their aliases namely | |
<code>fixed</code> or <code>cached</code>. | |
</para> | |
<para> | |
The pre-defined alias <code>cached</code> activates a | |
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool()">cached thread pool</ulink>. | |
A cached thread pool creates new threads as needed, but will reuse | |
previously constructed threads when they are available. This pool | |
is suitable in scenarios that execute many short-lived asynchronous tasks. | |
The way Slice uses the thread pool to execute database operations is
akin to such a scenario, and hence <code>cached</code> is the default
value for this plug-in property.
</para> | |
<para> | |
The <code>fixed</code> alias activates a | |
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)">fixed thread pool</ulink>. | |
The fixed thread pool can be further parameterized with | |
<code>CorePoolSize</code>, <code>MaximumPoolSize</code>, | |
<code>KeepAliveTime</code> and <code>RejectedExecutionHandler</code>. | |
The meaning of these parameters is described in the
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html">JavaDoc</ulink>.
Users can exercise finer control over thread pool behavior via these
parameters.
By default, the core pool size is <code>10</code>, maximum pool size is | |
also <code>10</code>, keep alive time is <code>60</code> seconds and | |
rejected execution is | |
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.AbortPolicy.html">aborted</ulink>. | |
</para> | |
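<para>
For orientation, the two kinds of pool can be built with the JDK alone as
sketched below. This is not Slice's internal code: <code>SlicePools</code> is a
hypothetical class, and the unbounded work queue is an assumption of the
sketch. The numeric values are the defaults stated above.
<programlisting>
<![CDATA[
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class SlicePools {
    private SlicePools() {
    }

    /** A cached pool: creates threads on demand and reuses idle ones. */
    static ExecutorService cached() {
        return Executors.newCachedThreadPool();
    }

    /** A pool parameterized like the options of the fixed alias. */
    static ThreadPoolExecutor fixed(int corePoolSize, int maximumPoolSize,
                                    long keepAliveSeconds) {
        return new ThreadPoolExecutor(
            corePoolSize,
            maximumPoolSize,
            keepAliveSeconds, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(),    // queue choice is an assumption
            new ThreadPoolExecutor.AbortPolicy());  // rejected tasks are aborted
    }

    /** The documented defaults: core 10, maximum 10, keep-alive 60 seconds. */
    static ThreadPoolExecutor fixedWithDefaults() {
        return fixed(10, 10, 60L);
    }
}
```
]]>
</programlisting>
A pool created this way should be shut down with <code>shutdown()</code> when
it is no longer needed, otherwise its worker threads can keep the JVM alive.
</para>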
<para> | |
Both of the pre-defined aliases can be parameterized with a fully-qualified | |
class name that implements | |
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadFactory.html"> | |
<classname>java.util.concurrent.ThreadFactory</classname> | |
</ulink> interface. | |
</para> | |
</section> | |
<section> | |
<title>openjpa.slice.TransactionPolicy</title> | |
<para> | |
This plug-in property determines the policy for transaction commit | |
across multiple slices. The value of this property is a fully-qualified | |
class name that implements | |
<ulink url="http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/transaction/TransactionManager.html"> | |
<classname>javax.transaction.TransactionManager</classname> | |
</ulink> interface. | |
</para> | |
<para> | |
Three pre-defined policies can be chosen | |
by their aliases namely <code>default</code>, | |
<code>xa</code> and <code>jndi</code>. | |
</para> | |
<para> | |
The <code>default</code> policy employs
a Transaction Manager that commits or rolls back the transaction on individual
slices <emphasis>without</emphasis> a two-phase commit protocol.
It does <emphasis>not</emphasis>
guarantee atomicity of the transaction across all slices, because if
one or more slices fail to commit, there is no way to roll back the transaction
on the other slices that committed successfully.
</para> | |
<para> | |
The <code>xa</code> policy employs a Transaction Manager that commits
or rolls back the transaction on individual
slices using a two-phase commit protocol. The prerequisite for this scheme
is, of course, that all slices are configured to use an
XA-compliant JDBC driver.
</para> | |
<para> | |
The <code>jndi</code> policy obtains a Transaction Manager by looking it up in the
JNDI context. The prerequisite for this transaction
manager is, of course, that all slices are configured to use an
XA-compliant JDBC driver.
<warning>This JNDI based policy is not available currently.</warning> | |
</para> | |
</section> | |
</section> | |
<section> | |
<title>Per-Slice Properties</title> | |
<para> | |
Any OpenJPA property can be configured for each individual slice. The property name
is of the form <code>openjpa.slice.[Logical slice name].[OpenJPA Property Name]</code>.
For example, in <code>openjpa.slice.One.ConnectionURL</code>, <code>One</code>
is the logical slice name and <code>ConnectionURL</code> is an OpenJPA property
name.
</para> | |
<para> | |
If a property is not configured for a specific slice, it defaults to the
value of the corresponding <code>openjpa.*</code> property.
</para> | |
</section> | |
</chapter> | |