<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="ref_guide_slice">
<title>
Distributed Persistence
</title>
<para>
The standard JPA runtime environment works with a <emphasis>single</emphasis>
database instance. OpenJPA can be extended via a plug-in to work with
multiple databases within the same transaction without any change to the
existing application. This capability of OpenJPA for a distributed
database environment is called <emphasis>Slice</emphasis> and is explained in
the following sections.
</para>
<section id="slice_overview">
<title>Overview</title>
<para>
Enterprise applications are increasingly deployed in distributed database
environments. The reasons for a distributed, often horizontally-partitioned,
database environment include countering massive data growth,
supporting multiple external clients on a hosted platform, and many other
practical scenarios that can benefit from data partitioning.
</para>
<para>
Any JPA-based user application that interacts directly with a set of physical
databases within a single transaction has to address serious technical and
conceptual challenges.
Slice encapsulates the complexity of a distributed database environment
via the abstraction of a <emphasis>virtual</emphasis> database, which internally
manages multiple physical databases. We refer to each physical database instance
as a <emphasis>slice</emphasis>.
<emphasis>Virtualization</emphasis> of distributed databases
lets the OpenJPA object management kernel and
the user application work the same way as they do with a single physical
database.
</para>
</section>
<section id="Features and Limitations">
<title>Salient Features</title>
<section><title>Transparency</title>
<para>
Neither the existing application nor its persistent domain model requires
<emphasis>any change</emphasis> to upgrade from a single database
to a distributed database environment.
</para>
</section>
<section><title>Custom Distribution Policy</title>
<para>
The user application decides how newly persistent instances are
distributed across the database slices. The data
distribution policy may be based on attributes
of the data itself. For example, all Customers whose first names begin with
characters 'A' to 'M' can be stored in one slice, while those whose names
begin with 'N' to 'Z' are stored in another slice.
</para>
<para>
The application specifies this custom data distribution policy by implementing the
<classname>org.apache.openjpa.slice.DistributionPolicy</classname>
interface.
</para>
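<para>
As an illustration, the Customer example above could be expressed as a policy
along the following lines. This is a minimal sketch, not part of Slice: the
<code>Customer</code> class and its <code>getFirstName()</code> accessor are
hypothetical, the sketch declares a local copy of the single-method
<code>DistributionPolicy</code> interface so that it compiles on its own, and it
assumes that the first configured slice holds names from 'A' to 'M' while the
second holds all other names.
</para>
<programlisting>
<![CDATA[import java.util.List;

// Local stand-in for org.apache.openjpa.slice.DistributionPolicy,
// declared here only so that the sketch compiles without OpenJPA.
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

// Hypothetical entity used only for illustration.
class Customer {
    private final String firstName;
    Customer(String firstName) { this.firstName = firstName; }
    String getFirstName() { return firstName; }
}

// Customers whose first name starts with A..M go to the first slice;
// all other names go to the second slice.
class FirstNamePolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        if (pc instanceof Customer) {
            String name = ((Customer) pc).getFirstName();
            if (name != null && !name.isEmpty()) {
                char first = Character.toUpperCase(name.charAt(0));
                return (first >= 'A' && first <= 'M') ? slices.get(0) : slices.get(1);
            }
        }
        // Assumption: other types and empty names default to the first slice.
        return slices.get(0);
    }
}
]]>
</programlisting>
<para>
The sketch picks slices by their position in the <code>slices</code> argument
rather than by hard-coded names, so it relies on the order in which the slice
names are configured.
</para>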
<para>
Slice tracks the original database slice of each existing instance. When
an application issues a query, the resulting instances can be loaded
from different slices. This tracking is important because a subsequent
update to any of these instances must be committed to the
slice from which the instance was originally loaded.
</para>
<note>
<para>
You can find the original slice of an instance <code>pc</code> with
the static utility method
<methodname>SlicePersistence.getSlice(pc)</methodname>.
This method returns the slice identifier string associated with the
given <emphasis>managed</emphasis> instance. If the instance is not
managed, the method returns null, because an unmanaged or
detached instance is not associated with any slice.
</para>
</note>
<para>
<warning>Currently, there is no provision for migrating an
existing instance from one slice to another.
</warning>
</para>
</section>
<section><title>Heterogeneous Database</title>
<para>
Each slice can be configured independently with its own JDBC
driver and other connection parameters. Hence the target database
environment can consist of heterogeneous databases.
</para>
</section>
<section><title>Parallel Execution</title>
<para>
All database operations, such as query, commit or flush, operate
in parallel across the database slices. The execution threading
policy is configurable.
</para>
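<para>
Conceptually, a parallel operation submits one task per slice to a thread pool
and waits for all of them to finish. The following sketch shows that pattern
using only standard <classname>java.util.concurrent</classname> classes; the
<code>ParallelSlices</code> helper and the per-slice operation are hypothetical
and do not reproduce Slice's internal code.
</para>
<programlisting>
<![CDATA[import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelSlices {
    // Runs the same operation against every slice concurrently and
    // returns the per-slice results in slice order.
    static <R> List<R> runOnAllSlices(ExecutorService pool, List<String> slices,
                                      Function<String, R> operation) {
        List<Callable<R>> tasks = new ArrayList<>();
        for (String slice : slices) {
            tasks.add(() -> operation.apply(slice));   // one task per slice
        }
        try {
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task has completed
            for (Future<R> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for slices", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("operation failed on a slice", e.getCause());
        }
    }
}
]]>
</programlisting>
<para>
Because <code>invokeAll</code> returns its futures in the order of the submitted
tasks, the results come back in slice order regardless of which slice finishes first.
</para>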
</section>
<section><title>Distributed Query</title>
<para>
The queries are executed across all slices and the results are
merged into a single list. The results of a query that includes an
<code>ORDER BY</code> clause are sorted correctly by merging the
results from each individual slice.
</para>
<para>
Queries that specify an aggregate projection such as
<code>COUNT()</code>, <code>MAX()</code>, <code>MIN()</code>
and <code>SUM()</code>
are correctly evaluated <emphasis>only if</emphasis> they
return a single result.
</para>
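<para>
Merging ordered results can be done without re-sorting everything: if each slice
returns its rows already sorted by the query's <code>ORDER BY</code>, a k-way merge
with a priority queue yields one globally sorted list. The sketch below assumes
exactly that; the <code>SortedMerge</code> class is illustrative and is not Slice's
own merge code.
</para>
<programlisting>
<![CDATA[import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class SortedMerge {
    // Current head row of one slice's result plus an iterator over the rest.
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;
        Cursor(T head, Iterator<T> rest) { this.head = head; this.rest = rest; }
    }

    // k-way merge of per-slice lists that are each already sorted by `order`.
    static <T> List<T> merge(List<List<T>> perSlice, Comparator<? super T> order) {
        PriorityQueue<Cursor<T>> heap =
                new PriorityQueue<Cursor<T>>((a, b) -> order.compare(a.head, b.head));
        for (List<T> rows : perSlice) {
            Iterator<T> it = rows.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor<>(it.next(), it));
            }
        }
        List<T> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor<T> c = heap.poll();          // smallest head across all slices
            merged.add(c.head);
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c);                    // re-insert with its next row
            }
        }
        return merged;
    }
}
]]>
</programlisting>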
<para>
<warning>
The aggregate operation <code>AVG()</code> is not supported.
</warning>
</para>
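<para>
The arithmetic below, using made-up values for one numeric column spread over two
slices, suggests why averages are harder to combine than the other aggregates:
per-slice counts and sums can simply be added and per-slice maxima compared again,
whereas averaging the per-slice averages gives the wrong answer whenever slices hold
different numbers of rows. This is an illustration only; it does not describe how
Slice evaluates aggregate queries internally, and the <code>PartialAggregates</code>
class is hypothetical.
</para>
<programlisting>
<![CDATA[import java.util.Arrays;

// Each int[] holds the column values stored on one slice (assumed non-empty).
class PartialAggregates {
    // COUNT: add the per-slice counts.
    static long count(int[]... slices) {
        long n = 0;
        for (int[] s : slices) n += s.length;
        return n;
    }

    // SUM: add the per-slice sums.
    static long sum(int[]... slices) {
        long total = 0;
        for (int[] s : slices) total += Arrays.stream(s).asLongStream().sum();
        return total;
    }

    // MAX: the maximum of the per-slice maxima (MIN works the same way).
    static int max(int[]... slices) {
        int m = Integer.MIN_VALUE;
        for (int[] s : slices) m = Math.max(m, Arrays.stream(s).max().getAsInt());
        return m;
    }

    // Naive merge: the average of per-slice averages, wrong in general.
    static double averageOfAverages(int[]... slices) {
        double acc = 0;
        for (int[] s : slices) acc += Arrays.stream(s).average().getAsDouble();
        return acc / slices.length;
    }

    // Correct global average: needs both SUM and COUNT from every slice.
    static double trueAverage(int[]... slices) {
        return (double) sum(slices) / count(slices);
    }
}
]]>
</programlisting>
<para>
For two slices holding 10, 20, 30 and 100 respectively, the average of the per-slice
averages is (20 + 100) / 2 = 60, while the true average is 160 / 4 = 40.
</para>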
</section>
<section><title>Targeted Query</title>
<para>
You can target a query to a subset of slices rather than
all slices by setting a <emphasis>hint</emphasis>. The hint key
<code>openjpa.hint.slice.Target</code> can be set on any query, and the
hint value is a
comma-separated list of slice identifiers. The following
example shows how to target a query only to slice <code>"One"</code>:
<programlisting>
<![CDATA[import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.Query;
import org.apache.openjpa.slice.SlicePersistence;
import static junit.framework.Assert.assertEquals;

EntityManager em = ...;
em.getTransaction().begin();
String hint = "openjpa.hint.slice.Target";
Query query = em.createQuery("SELECT p FROM PObject p").setHint(hint, "One");
List result = query.getResultList();
// verify that each instance originates from the given slice
for (Object pc : result) {
    String sliceOrigin = SlicePersistence.getSlice(pc);
    assertEquals("One", sliceOrigin);
}
]]>
</programlisting>
</para>
</section>
<section><title>Distributed Transaction</title>
<para>
The database slices participate in a global transaction provided
each slice is configured with an XA-compliant JDBC driver, even
when the persistence unit is configured for <code>RESOURCE_LOCAL</code>
transactions.
</para>
<para>
<warning>
If any of the configured slices is not XA-compliant <emphasis>and</emphasis>
the persistence unit is configured for <code>RESOURCE_LOCAL</code>
transactions, then each slice is committed without a two-phase
commit protocol. If the commit on any slice fails, then the atomicity of
the transaction is not guaranteed.
</warning>
</para>
</section>
<section id="collocation_constraint"><title>Collocation Constraint</title>
<para>
No relationship can exist across database slices. In O-R mapping parlance,
this condition translates to the limitation that the closure of an object graph must be
<emphasis>collocated</emphasis> in the same database.
For example, consider a domain model where Person relates to Address.
Person X refers to Address A while Person Y refers to Address B.
The collocation constraint means that <emphasis>both</emphasis> X and A
must be stored in the same
database slice. Similarly, Y and B must be stored in a single slice.
</para>
<para>
Slice, however, helps to maintain the collocation constraint automatically.
All instances in the closure of a newly persistent instance that are
reachable via cascaded relationships are stored in the same slice.
The user-defined distribution policy needs to supply the slice
for the root instance only.
</para>
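<para>
The automatic collocation described above can be pictured as a graph walk: starting
from the root instance, every instance reachable through cascaded relationships
receives the root's slice. The sketch below uses a hypothetical <code>Node</code>
type with an explicit list of cascaded references; it illustrates the idea only and
is not Slice's implementation, which works on OpenJPA's managed state.
</para>
<programlisting>
<![CDATA[import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical persistent object with its cascaded (persist) relationships.
class Node {
    final String name;
    final List<Node> cascaded = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class Collocation {
    // Assigns rootSlice to the root and to every instance reachable
    // from it via cascaded relationships.
    static Map<Node, String> assign(Node root, String rootSlice) {
        Map<Node, String> sliceOf = new IdentityHashMap<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (sliceOf.containsKey(n)) {
                continue;                  // already visited; also breaks cycles
            }
            sliceOf.put(n, rootSlice);
            for (Node next : n.cascaded) {
                pending.push(next);
            }
        }
        return sliceOf;
    }
}
]]>
</programlisting>
<para>
The visited check keeps bidirectional relationships, such as a Person and an Address
that refer to each other, from looping forever, and instances not reachable from the
root are left for the distribution policy to place.
</para>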
</section>
</section>
<section id="slice_configuration">
<title>Usage</title>
<para>
Slice is activated via the following property settings:
</para>
<section>
<title>How to activate Slice Runtime?</title>
<para>
The basic configuration property is
<programlisting>
<![CDATA[ <property name="openjpa.BrokerFactory" value="slice"/>]]>
</programlisting>
This critical configuration activates a specialized factory class aliased
as <code>slice</code> to create an object management kernel that
can work against multiple databases.
</para>
</section>
<section>
<title>How to configure each database slice?</title>
<para>
Each database slice is identified by a logical name that is unique within a
persistence unit. The list of slices is specified by the
<code>openjpa.slice.Names</code> property.
For example, specify three slices named <code>"One"</code>,
<code>"Two"</code> and <code>"Three"</code> as follows:
<programlisting>
<![CDATA[ <property name="openjpa.slice.Names" value="One, Two, Three"/>]]>
</programlisting>
</para>
<para>
This property is not mandatory. If it is not specified, then
the configuration is scanned for logical slice names: any property
<code>"abc"</code> of the form <code>openjpa.slice.XYZ.abc</code>
registers a slice with the logical
name <code>"XYZ"</code>.
</para>
<para>
The order of the names is significant when no <code>openjpa.slice.Master</code>
property is specified, because the first slice in the list then serves as
the master slice. If <code>openjpa.slice.Names</code> is not specified, the
persistence unit is scanned to find all configured slice names, and they are
ordered alphabetically.
</para>
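<para>
These naming rules can be summarized in a few lines of code. The following is a
minimal sketch, assuming the persistence-unit properties are available as a plain
map of strings; the <code>SliceNames</code> class is hypothetical and only mirrors
the documented behavior: an explicit <code>openjpa.slice.Names</code> list is used as
given, in order; otherwise names are collected from keys of the form
<code>openjpa.slice.XYZ.abc</code> and sorted alphabetically, and without
<code>openjpa.slice.Master</code> the first name becomes the master.
</para>
<programlisting>
<![CDATA[import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class SliceNames {
    private static final String PREFIX = "openjpa.slice.";

    static List<String> resolve(Map<String, String> props) {
        List<String> names = new ArrayList<>();
        String explicit = props.get(PREFIX + "Names");
        if (explicit != null) {
            // Explicit list: keep the configured order, trim spaces around commas.
            for (String n : explicit.split(",")) {
                String trimmed = n.trim();
                if (!trimmed.isEmpty()) names.add(trimmed);
            }
            return names;
        }
        // No explicit list: collect XYZ from keys openjpa.slice.XYZ.abc.
        TreeSet<String> found = new TreeSet<>();   // sorted alphabetically
        for (String key : props.keySet()) {
            if (!key.startsWith(PREFIX)) continue;
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            // Keys without a further dot (e.g. openjpa.slice.Master) are global.
            if (dot > 0 && dot < rest.length() - 1) {
                found.add(rest.substring(0, dot));
            }
        }
        names.addAll(found);
        return names;
    }

    // Without openjpa.slice.Master the first resolved name is the master
    // (assumes at least one slice is configured).
    static String master(Map<String, String> props) {
        String configured = props.get(PREFIX + "Master");
        return configured != null ? configured : resolve(props).get(0);
    }
}
]]>
</programlisting>
<para>
Because the explicit list is returned as given, a slice that is configured in the
persistence unit but left out of <code>openjpa.slice.Names</code> does not appear in
the result.
</para>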
<para>
The properties of each database slice can be configured independently.
For example, the
following configuration will register two slices with logical names
<code>One</code> and <code>Two</code>.
<programlisting>
<![CDATA[<property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"/>
<property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://localhost/slice2"/>]]>
</programlisting>
</para>
<para>
Any OpenJPA-specific property can be configured on a per-slice basis.
For example, the following configuration will use two different JDBC
drivers for slices <code>One</code> and <code>Two</code>.
<programlisting>
<![CDATA[<property name="openjpa.slice.One.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
<property name="openjpa.slice.Two.ConnectionDriverName" value="com.mysql.jdbc.jdbc2.optional.MysqlXADataSource"/>]]>
</programlisting>
</para>
<para>
Any property that is not specified for a particular slice defaults to the
corresponding OpenJPA property. For example, consider the following three slices:
<programlisting>
<![CDATA[<property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"/>
<property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://localhost/slice2"/>
<property name="openjpa.slice.Three.ConnectionURL" value="jdbc:oracle:thin:@localhost:1521:slice3"/>
<property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
<property name="openjpa.slice.Three.ConnectionDriverName" value="oracle.jdbc.OracleDriver"/>]]>
</programlisting>
In this example, <code>Three</code> will use the slice-specific
<code>oracle.jdbc.OracleDriver</code> driver, while slices
<code>One</code> and <code>Two</code> will use
the driver <code>com.mysql.jdbc.Driver</code> as
specified by the <code>openjpa.ConnectionDriverName</code>
property value.
</para>
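<para>
The defaulting rule amounts to a two-step lookup. A minimal sketch, assuming the
configuration is available as a plain map of property names to values; the
<code>SliceProperties</code> class is hypothetical and is not OpenJPA's
configuration API.
</para>
<programlisting>
<![CDATA[import java.util.Map;

class SliceProperties {
    // Returns openjpa.slice.<slice>.<property> if it is set, otherwise the
    // global openjpa.<property>; null if neither is configured.
    static String resolve(Map<String, String> props, String slice, String property) {
        String perSlice = props.get("openjpa.slice." + slice + "." + property);
        return perSlice != null ? perSlice : props.get("openjpa." + property);
    }
}
]]>
</programlisting>
<para>
For the three slices above, looking up <code>ConnectionDriverName</code> for
<code>One</code> or <code>Two</code> falls back to the global value, while
<code>Three</code> finds its own slice-specific entry.
</para>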
</section>
<section id="distribution_policy">
<title>Implement DistributionPolicy interface</title>
<para>
Slice needs to determine which slice will persist a new instance.
Only the application can decide this policy (for example,
all PurchaseOrders dated before April 30 go to slice <code>One</code>
and all the rest go to slice <code>Two</code>). This is why
the application has to implement
<code>org.apache.openjpa.slice.DistributionPolicy</code> and
specify the implementation class in the configuration:
<programlisting>
<![CDATA[ <property name="openjpa.slice.DistributionPolicy" value="com.acme.foo.MyOptimalDistributionPolicy"/>]]>
</programlisting>
</para>
<para>
The interface <code>org.apache.openjpa.slice.DistributionPolicy</code>
is simple with a single method. The complete listing of the
documented interface follows:
<programlisting>
<![CDATA[
package org.apache.openjpa.slice;

import java.util.List;

public interface DistributionPolicy {
    /**
     * Gets the name of the slice where a given instance will be stored.
     *
     * @param pc The newly persistent or to-be-merged object.
     * @param slices name of the configured slices.
     * @param context persistence context managing the given instance.
     *
     * @return identifier of the slice. This name must match one of the
     * configured slice names.
     * @see DistributedConfiguration#getSliceNames()
     */
    String distribute(Object pc, List<String> slices, Object context);
}
]]>
</programlisting>
</para>
<para>
While implementing a distribution policy, the most important thing to
remember is the <link linkend="collocation_constraint">collocation constraint</link>.
Because Slice cannot establish or query any cross-database relationship, all
related instances must be stored in the same database slice.
Slice can determine the closure of a root object by traversing
cascaded relationships. Hence the user-defined policy only has to decide the
database for the root instance, that is, the explicit argument to the
<methodname>EntityManager.persist()</methodname> call.
Slice will ensure that all other related instances that get persisted by cascade
are assigned to the same database slice as the root instance.
However, the user-defined distribution policy must return the
same slice identifier for instances that are logically related but
not cascaded for persist.
</para>
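<para>
One way to meet this requirement is to derive the slice from a key that logically
related instances share, so the policy gives the same answer for each of them. The
sketch below assumes hypothetical <code>Customer</code> and <code>PurchaseOrder</code>
classes, where an order carries its customer's id but is not persisted by cascade; it
declares a local copy of the interface so that it compiles standalone. Mapping the key
with a modulo is only one possible choice.
</para>
<programlisting>
<![CDATA[import java.util.List;

// Local stand-in for org.apache.openjpa.slice.DistributionPolicy.
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

// Hypothetical entities: an order refers to its customer only by id.
class Customer {
    final long id;
    Customer(long id) { this.id = id; }
}

class PurchaseOrder {
    final long customerId;
    PurchaseOrder(long customerId) { this.customerId = customerId; }
}

class CustomerKeyPolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        long key;
        if (pc instanceof Customer) {
            key = ((Customer) pc).id;
        } else if (pc instanceof PurchaseOrder) {
            key = ((PurchaseOrder) pc).customerId;
        } else {
            return slices.get(0);   // assumption: other types go to the first slice
        }
        // Same customer id -> same index -> same slice for a customer and its orders.
        return slices.get((int) Math.floorMod(key, (long) slices.size()));
    }
}
]]>
</programlisting>
<para>
A modulo mapping sends keys to different slices if the list of configured slices
changes, and there is currently no provision for migrating existing instances between
slices, so this scheme treats the slice list as fixed.
</para>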
</section>
</section>
<section>
<title>Configuration Properties</title>
<para>
The properties to configure Slice can be classified into two broad groups.
The <emphasis>global</emphasis> properties apply to all the slices, for example,
the thread pool used to execute queries in parallel or the transaction
manager used to coordinate a transaction across multiple slices.
The <emphasis>per-slice</emphasis> properties apply to an individual slice, for example,
the JDBC connection URL of a slice.
</para>
</section>
<section>
<title>Global Properties</title>
<section>
<title>openjpa.slice.DistributionPolicy</title>
<para>
This <emphasis>mandatory</emphasis> plug-in property determines how newly
persistent instances are distributed across individual slices.
The value of this property is a fully-qualified class name that implements
<ulink url="../javadoc/org/apache/openjpa/slice/DistributionPolicy.html">
<classname>org.apache.openjpa.slice.DistributionPolicy</classname>
</ulink> interface.
</para>
</section>
<section><title>openjpa.slice.Lenient</title>
<para>
This boolean plug-in property controls the behavior when one or more slices
cannot be connected to or are unavailable for some other reason.
If <code>true</code>, the unreachable slices are ignored. If
<code>false</code>, then any unreachable slice will raise an exception
during startup.
</para>
<para>
By default this value is set to <code>false</code>, i.e. all configured
slices must be available.
</para>
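<para>
The effect of this flag can be sketched as a startup check. The
<code>SliceStartup</code> class and the reachability test passed to it are
hypothetical; how Slice actually probes a slice's connection is not shown here.
</para>
<programlisting>
<![CDATA[import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class SliceStartup {
    // Returns the slices that will be used, honoring the lenient flag.
    static List<String> activeSlices(List<String> configured,
                                     Predicate<String> reachable,
                                     boolean lenient) {
        List<String> active = new ArrayList<>();
        for (String slice : configured) {
            if (reachable.test(slice)) {
                active.add(slice);
            } else if (!lenient) {
                // openjpa.slice.Lenient=false (the default): fail at startup
                throw new IllegalStateException("Slice " + slice + " is not available");
            }
            // lenient: the unreachable slice is skipped
        }
        return active;
    }
}
]]>
</programlisting>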
</section>
<section>
<title>openjpa.slice.Master</title>
<para>
This plug-in property can be used to identify the name of the master slice.
The master slice is used when a primary key is to be generated from a database sequence.
</para>
<para>
By default the master slice is the first slice in the list of configured slice names.
</para>
<para>
<warning>
Currently, there is no provision to use sequences from
multiple database slices.
</warning>
</para>
</section>
<section>
<title>openjpa.slice.Names</title>
<para>
This plug-in property can be used to register the logical slice names.
The value of this property is a comma-separated list of slice names.
The ordering of the names in this list is
<emphasis>significant</emphasis> because
<link linkend="distribution_policy">DistributionPolicy</link> receives
the input argument of the slice names in the same order.
</para>
<para>
If logical slice names are not registered explicitly via this property,
then all logical slice names available in the persistence unit are
registered. The ordering of the slice names in this case is alphabetical.
</para>
<para>
If logical slice names are registered explicitly via this property, then
any logical slice that is available in the persistence unit but excluded
from this list is ignored.
</para>
</section>
<section>
<title>openjpa.slice.ThreadingPolicy</title>
<para>
This plug-in property determines the nature of the thread pool used
for database operations such as query or flush on individual slices.
The value of the property is a
fully-qualified class name that implements the
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ExecutorService.html">
<classname>java.util.concurrent.ExecutorService</classname>
</ulink> interface.
Two pre-defined pools can be chosen via their aliases, namely
<code>fixed</code> and <code>cached</code>.
</para>
<para>
The pre-defined alias <code>cached</code> activates a
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool()">cached thread pool</ulink>.
A cached thread pool creates new threads as needed, but will reuse
previously constructed threads when they are available. This pool
is suitable in scenarios that execute many short-lived asynchronous tasks.
The way Slice uses the thread pool to execute database operations is
akin to such a scenario, and hence <code>cached</code> is the default
value for this plug-in property.
</para>
<para>
The <code>fixed</code> alias activates a
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)">fixed thread pool</ulink>.
The fixed thread pool can be further parameterized with
<code>CorePoolSize</code>, <code>MaximumPoolSize</code>,
<code>KeepAliveTime</code> and <code>RejectedExecutionHandler</code>.
The meaning of these parameters is described in the
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html">JavaDoc</ulink>.
Users can exercise finer control over thread pool behavior via these
parameters.
By default, the core pool size is <code>10</code>, the maximum pool size is
also <code>10</code>, the keep-alive time is <code>60</code> seconds and
rejected executions are
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.AbortPolicy.html">aborted</ulink>.
</para>
<para>
Both of the pre-defined aliases can be parameterized with a fully-qualified
class name that implements
<ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadFactory.html">
<classname>java.util.concurrent.ThreadFactory</classname>
</ulink> interface.
</para>
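<para>
In plain <classname>java.util.concurrent</classname> terms, the two aliases with their
documented defaults correspond roughly to the executors below. This is a sketch of
equivalent JDK calls, not Slice's own factory code; in particular, the unbounded work
queue of the fixed pool and the naming scheme of the hypothetical
<code>SliceThreadFactory</code> are assumptions.
</para>
<programlisting>
<![CDATA[import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ThreadFactory: daemon threads with recognizable names.
class SliceThreadFactory implements ThreadFactory {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task, "slice-worker-" + counter.incrementAndGet());
        t.setDaemon(true);
        return t;
    }
}

class SlicePools {
    // "cached": creates threads on demand and reuses idle ones.
    static ExecutorService cached(ThreadFactory factory) {
        return Executors.newCachedThreadPool(factory);
    }

    // "fixed" with the documented defaults: core 10, max 10,
    // keep-alive 60 seconds, abort on rejected execution.
    static ThreadPoolExecutor fixed(ThreadFactory factory) {
        return new ThreadPoolExecutor(10, 10, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(),   // assumption: unbounded queue
                factory,
                new ThreadPoolExecutor.AbortPolicy());
    }
}
]]>
</programlisting>
<para>
With an unbounded work queue, as assumed here, the fixed pool never grows beyond ten
threads and only rejects tasks after it has been shut down.
</para>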
</section>
<section>
<title>openjpa.slice.TransactionPolicy</title>
<para>
This plug-in property determines the policy for transaction commit
across multiple slices. The value of this property is a fully-qualified
class name that implements
<ulink url="http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/transaction/TransactionManager.html">
<classname>javax.transaction.TransactionManager</classname>
</ulink> interface.
</para>
<para>
Three pre-defined policies can be chosen
by their aliases, namely <code>default</code>,
<code>xa</code> and <code>jndi</code>.
</para>
<para>
The <code>default</code> policy employs
a Transaction Manager that commits or rolls back a transaction on individual
slices <emphasis>without</emphasis> a two-phase commit protocol.
It does <emphasis>not</emphasis>
guarantee the atomicity of a transaction across all the slices because, if
one or more slices fail to commit, there is no way to roll back the transaction
on the other slices that committed successfully.
</para>
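<para>
The lack of atomicity is easy to see in a toy model. In the sketch below, the
<code>Participant</code> interface is a hypothetical stand-in for one slice's local
transaction, and commits are issued one slice at a time. The sketch assumes the loop
stops at the first failure; whether the default policy keeps trying the remaining
slices is not modeled.
</para>
<programlisting>
<![CDATA[import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one slice's local transaction.
interface Participant {
    String name();
    void commit() throws Exception;
}

class OnePhaseCommit {
    // Commits each slice in turn and returns the slices that committed.
    static List<String> commitAll(List<Participant> slices) {
        List<String> committed = new ArrayList<>();
        for (Participant p : slices) {
            try {
                p.commit();
                committed.add(p.name());
            } catch (Exception e) {
                // Slices committed so far stay committed: the transaction
                // is only partially applied and cannot be rolled back.
                break;
            }
        }
        return committed;
    }
}
]]>
</programlisting>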
<para>
The <code>xa</code> policy employs a Transaction Manager that commits
or rolls back a transaction on individual
slices using a two-phase commit protocol. The prerequisite to use this scheme
is, of course, that all the slices must be configured to use an
XA-compliant JDBC driver.
</para>
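<para>
By contrast, a two-phase commit first asks every participant to prepare and commits
only if all of them vote yes; otherwise every participant rolls back. The toy
coordinator below shows only that voting logic, using a hypothetical
<code>XaParticipant</code> interface; a real XA transaction manager also logs its
decision and recovers from crashes, which is omitted here.
</para>
<programlisting>
<![CDATA[import java.util.List;

// Hypothetical stand-in for an XA-capable slice.
interface XaParticipant {
    boolean prepare();   // phase 1: vote yes (true) or no (false)
    void commit();       // phase 2 after a unanimous yes
    void rollback();     // phase 2 otherwise
}

class TwoPhaseCommit {
    // Returns true if the transaction committed on every participant.
    static boolean commitAll(List<XaParticipant> slices) {
        boolean allPrepared = true;
        for (XaParticipant p : slices) {
            if (!p.prepare()) {
                allPrepared = false;
                break;                 // a single "no" vote aborts the transaction
            }
        }
        // Phase 2: the same outcome is applied to every participant.
        for (XaParticipant p : slices) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }
}
]]>
</programlisting>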
<para>
The <code>jndi</code> policy employs a Transaction Manager obtained by looking it
up in the JNDI context. The prerequisite to use this transaction
manager is, of course, that all the slices must be configured to use an
XA-compliant JDBC driver.
<warning>This JNDI based policy is not available currently.</warning>
</para>
</section>
</section>
<section>
<title>Per-Slice Properties</title>
<para>
Any OpenJPA property can be configured for each individual slice. The property name
is of the form <code>openjpa.slice.[Logical slice name].[OpenJPA Property Name]</code>.
For example, <code>openjpa.slice.One.ConnectionURL</code> where <code>One</code>
is the logical slice name and <code>ConnectionURL</code> is an OpenJPA property
name.
</para>
<para>
If a property is not configured for a specific slice, then the value of
the corresponding <code>openjpa.*</code> property is used.
</para>
</section>
</chapter>