 <?xml version="1.0" encoding="UTF-8"?>
 <!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
 -->
 <chapter id="ref_guide_slice">
   <title>
     Distributed Persistence
   </title>
   <para>
   The standard JPA runtime environment works with a <emphasis>single</emphasis>
   database instance. OpenJPA can be extended via a plug-in to work with
   multiple databases within the same transaction without any change to the
   existing application. This OpenJPA capability for distributed
   database environments is called <emphasis>Slice</emphasis> and is explained in
   the following sections.
   </para>

   <section id="slice_overview">
     <title>Overview</title>
     <para>
     Enterprise applications are increasingly deployed in distributed database
     environments. A distributed, often horizontally partitioned,
     database environment may be needed to counter massive data growth, to
     support multiple external clients on a hosted platform, or for many other
     practical scenarios that benefit from data partitioning.
     </para>

     <para>
     Any JPA-based user application has to address serious technical and conceptual
     challenges to interact directly with a set of physical databases
     within a single transaction.
     Slice encapsulates the complexity of a distributed database environment
     via the abstraction of a <emphasis>virtual</emphasis> database which internally
     manages multiple physical databases. We refer to each physical database instance
     as a <emphasis>slice</emphasis>.
     <emphasis>Virtualization</emphasis> of distributed databases
     lets the OpenJPA object management kernel and
     the user application work in the same way as with a single physical
     database.
     </para>
   </section>

     <section id="Features and Limitations">
        <title>Salient Features</title>
          <section><title>Transparency</title>
             <para>
               The existing application or the persistent domain model requires
               <emphasis>no change</emphasis> to upgrade from a single database
               to a distributed database environment.
             </para>
          </section>

          <section><title>Custom Distribution Policy</title>
             <para>
              The user application decides how newly persistent instances are
              distributed across the database slices. The data
              distribution policy across the slices may be based on an attribute
              of the data itself. For example, all Customers whose first name begins with
              a character from 'A' to 'M' will be stored in one slice while those whose names
              begin with 'N' to 'Z' will be stored in another slice.
              </para>
              <para>
              The user application specifies this custom data distribution policy by implementing the
              <classname>org.apache.openjpa.slice.DistributionPolicy</classname>
              interface.
              </para>

              <para>
              Slice tracks the original database for existing instances. When
              an application issues a query, the resultant instances can be loaded
              from different slices. This tracking is important because any subsequent
              update to one of these instances is committed to its
              original database slice.
             </para>

             <note>
             <para>
             You can find the original slice of an instance <code>pc</code> with
             the static utility method
             <methodname>SlicePersistence.getSlice(pc)</methodname>.
             This method returns the slice identifier string associated with the
             given <emphasis>managed</emphasis> instance. If the instance is not
             being managed then the method returns null, because an unmanaged or
             detached instance is not associated with any slice.
             </para>
             </note>

             <para>
             <warning>Currently, there is no provision for migrating an
             existing instance from one slice to another.
             </warning>
             </para>
          </section>

          <section><title>Heterogeneous Database</title>
             <para>
               Each slice can be configured independently with its own JDBC
               driver and other connection parameters. Hence the target database
               environment can consist of heterogeneous databases.
             </para>
         </section>

         <section><title>Parallel Execution</title>
             <para>
               All database operations such as query, commit or flush operate
               in parallel across the database slices. The execution threading
               policy is configurable.
             </para>
          </section>

          <section><title>Distributed Query</title>
             <para>
             Queries are executed across all slices and the results are
             merged into a single list. The result of a query that includes an
             <code>ORDER BY</code> clause is sorted correctly by merging the
             results from each individual slice.
             </para>
             <para>
             Queries that specify an aggregate projection such as
             <code>COUNT()</code>, <code>MAX()</code>, <code>MIN()</code>
             and <code>SUM()</code>
             are correctly evaluated <emphasis>only if</emphasis> they
             return a single result.
             </para>
             <para>
             <warning>
             The aggregate operation <code>AVG()</code> is not supported.
             </warning>
             </para>
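             <para>
             The merging described in this section can be pictured with the
             following sketch. It is not the Slice implementation: the class
             <code>SliceResultMerger</code> and its methods are hypothetical,
             and plain lists stand in for per-slice query results. It shows a
             k-way merge of lists that each slice has already sorted, and how
             single-valued <code>COUNT()</code>, <code>SUM()</code> and
             <code>MAX()</code> results combine. Per-slice averages, by contrast,
             cannot be combined this way without also knowing the per-slice counts.
             <programlisting>
<![CDATA[
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper; plain lists stand in for per-slice query results. */
final class SliceResultMerger {

    /** Cursor over one slice's already-sorted result list. */
    private static final class Cursor<T> {
        final List<T> rows;
        int next;
        Cursor(List<T> rows) { this.rows = rows; }
        T head() { return rows.get(next); }
    }

    /** k-way merge of per-slice lists, each already sorted by the given order. */
    static <T> List<T> mergeSorted(List<List<T>> perSlice, Comparator<? super T> order) {
        PriorityQueue<Cursor<T>> heap =
            new PriorityQueue<>((a, b) -> order.compare(a.head(), b.head()));
        for (List<T> rows : perSlice) {
            if (!rows.isEmpty()) heap.add(new Cursor<>(rows));
        }
        List<T> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor<T> c = heap.poll();      // cursor with the smallest head row
            merged.add(c.head());
            c.next++;
            if (c.next < c.rows.size()) heap.add(c);
        }
        return merged;
    }

    /** COUNT() and SUM() combine by adding the per-slice values. */
    static long combineSum(List<Long> perSlice) {
        long total = 0;
        for (long v : perSlice) total += v;
        return total;
    }

    /** MAX() combines by taking the largest per-slice value. */
    static long combineMax(List<Long> perSlice) {
        long max = Long.MIN_VALUE;
        for (long v : perSlice) max = Math.max(max, v);
        return max;
    }
}
```
]]>
             </programlisting>
             </para>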

          </section>

          <section><title>Targeted Query</title>
             <para>
             You can target a query to a subset of slices rather than
             all slices by setting a <emphasis>hint</emphasis>. The hint key
             <code>openjpa.hint.slice.Target</code> can be set on any query, and
             the hint value is a
             comma-separated list of slice identifiers. The following
             example shows how to target a query only to slice <code>"One"</code>:

             <programlisting>
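               <![CDATA[// assumes javax.persistence.*, java.util.List,
               // org.apache.openjpa.slice.SlicePersistence and JUnit's assertEquals are imported
               EntityManager em = ...;
               em.getTransaction().begin();
               String hint = "openjpa.hint.slice.Target";
               // the identification variable p must be declared in the FROM clause
               Query query = em.createQuery("SELECT p FROM PObject p").setHint(hint, "One");
               List<?> result = query.getResultList();
               // verify that each instance originates from the targeted slice
               for (Object pc : result) {
                  String sliceOrigin = SlicePersistence.getSlice(pc);
                  assertEquals("One", sliceOrigin);
               }
               ]]>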
             </programlisting>
             </para>
          </section>


          <section><title>Distributed Transaction</title>
             <para>
             The database slices participate in a global transaction provided
             each slice is configured with an XA-compliant JDBC driver, even
             when the persistence unit is configured for <code>RESOURCE_LOCAL</code>
             transaction.
             </para>
             <para>
             <warning>
             If any of the configured slices is not XA-compliant <emphasis>and</emphasis>
             the persistence unit is configured for <code>RESOURCE_LOCAL</code>
             transaction then each slice is committed without any two-phase
             commit protocol. If the commit on any slice fails, the atomic nature of
             the transaction is not ensured.
             </warning>
             </para>
           </section>


          <section id="collocation_constraint"><title>Collocation Constraint</title>
             <para>
             No relationship can exist across database slices. In O-R mapping parlance,
             this condition translates to the limitation that the closure of an object graph must be
             <emphasis>collocated</emphasis> in the same database.
             For example, consider a domain model where Person relates to Address.
             Person X refers to Address A while Person Y refers to Address B.
             The collocation constraint means that <emphasis>both</emphasis> X and A
             must be stored in the same
             database slice. Similarly, Y and B must be stored in a single slice.
             </para>
             <para>
             Slice, however, helps to maintain the collocation constraint automatically.
             The instances in the closure set of any newly persistent instance
             that are reachable via cascaded relationships are stored in the same slice.
             The user-defined distribution policy is required to supply the slice
             for the root instance only.
             </para>
          </section>
     </section>

   <section id="slice_configuration">
     <title>Usage</title>
     <para>
      Slice is activated via the following property settings:
     </para>
     <section>
       <title>How to activate Slice Runtime?</title>
       <para>
        The basic configuration property is
        <programlisting>
         <![CDATA[ <property name="openjpa.BrokerFactory" value="slice"/>]]>
        </programlisting>
        This critical configuration activates a specialized factory class aliased
        as <code>slice</code> to create an object management kernel that
        can work against multiple databases.
       </para>
     </section>

     <section>
       <title>How to configure each database slice?</title>
       <para>
       Each database slice is identified by a logical name that is unique within a
       persistence unit. The list of slices is specified by the
       <code>openjpa.slice.Names</code> property.
       For example, specify three slices named <code>"One"</code>,
       <code>"Two"</code> and <code>"Three"</code> as follows:
       <programlisting>
       <![CDATA[ <property name="openjpa.slice.Names" value="One, Two, Three"/>]]>
       </programlisting>
       </para>
       <para>
       This property is not mandatory. If this property is not specified then
       the configuration is scanned for logical slice names. Any property
       <code>"abc"</code> of the form <code>openjpa.slice.XYZ.abc</code> will
       register a slice with logical
       name <code>"XYZ"</code>.
       </para>
       <para>
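       The order of the names is significant when the <code>openjpa.slice.Master</code>
       property is not specified, because the first slice in the list then
       becomes the master slice. If <code>openjpa.slice.Names</code> is not
       specified either, the persistence unit is scanned to find
       all configured slice names and they are ordered alphabetically.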
       </para>
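       <para>
       The scanning rule can be illustrated with a small sketch. The class
       <code>SliceNameScanner</code> is hypothetical and works on a plain
       <code>Map</code> of property names; the way OpenJPA itself scans the
       configuration may differ in detail. It collects the
       <code>XYZ</code> segment of every key of the form
       <code>openjpa.slice.XYZ.abc</code> and returns the names in the natural
       <code>String</code> ordering, which is assumed here to be what
       "alphabetical" means.
       <programlisting>
<![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper that derives slice names from property keys. */
final class SliceNameScanner {
    private static final String PREFIX = "openjpa.slice.";

    /** Collects XYZ from keys of the form openjpa.slice.XYZ.abc, sorted by name. */
    static List<String> scan(Map<String, String> properties) {
        TreeSet<String> names = new TreeSet<>();
        for (String key : properties.keySet()) {
            if (!key.startsWith(PREFIX)) continue;
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            // Assumption: keys without a further segment (for example
            // openjpa.slice.Names) are global and do not register a slice.
            if (dot > 0) names.add(rest.substring(0, dot));
        }
        return new ArrayList<>(names);
    }
}
```
]]>
       </programlisting>
       With keys for slices <code>Two</code> and <code>One</code>, the sketch
       returns <code>One</code> before <code>Two</code>, so <code>One</code>
       would be first in the list.
       </para>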

       <para>
        The properties of each database slice can be configured independently.
        For example, the
        following configuration will register two slices with logical names
        <code>One</code> and <code>Two</code>.
        <programlisting>
         <![CDATA[<property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"/>
         <property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://localhost/slice2"/>]]>
        </programlisting>
       </para>

       <para>
        Any OpenJPA-specific property can be configured on a per-slice basis.
        For example, the following configuration will use two different JDBC
        drivers for slices <code>One</code> and <code>Two</code>.
        <programlisting>
         <![CDATA[<property name="openjpa.slice.One.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
         <property name="openjpa.slice.Two.ConnectionDriverName" value="com.mysql.jdbc.jdbc2.optional.MysqlXADataSource"/>]]>
        </programlisting>
       </para>

       <para>
         Any property that is not specified for a particular slice defaults to the
         corresponding OpenJPA property. For example, consider the following three slices:
         <programlisting>
          <![CDATA[<property name="openjpa.slice.One.ConnectionURL"          value="jdbc:mysql://localhost/slice1"/>
          <property name="openjpa.slice.Two.ConnectionURL"          value="jdbc:mysql://localhost/slice2"/>
          <property name="openjpa.slice.Three.ConnectionURL"        value="jdbc:oracle:thin:@//localhost/slice3"/>

          <property name="openjpa.ConnectionDriverName"     value="com.mysql.jdbc.Driver"/>
          <property name="openjpa.slice.Three.ConnectionDriverName" value="oracle.jdbc.OracleDriver"/>]]>
         </programlisting>
         In this example, slice <code>Three</code> will use the slice-specific
         <code>oracle.jdbc.OracleDriver</code> driver while slices
         <code>One</code> and <code>Two</code> will use
         the driver <code>com.mysql.jdbc.Driver</code> as
         specified by the <code>openjpa.ConnectionDriverName</code>
         property value.
       </para>
     </section>

     <section id="distribution_policy">
        <title>Implement DistributionPolicy interface</title>
        <para>
         Slice needs to determine which slice will persist a new instance.
         Only the application can decide this policy (for example,
         all PurchaseOrders before April 30 go to slice <code>One</code>,
         all the rest go to slice <code>Two</code>). This is why
         the application has to implement
         <code>org.apache.openjpa.slice.DistributionPolicy</code> and
         specify the implementation class in the configuration:
         <programlisting>
          <![CDATA[ <property name="openjpa.slice.DistributionPolicy" value="com.acme.foo.MyOptimalDistributionPolicy"/>]]>
         </programlisting>
        </para>

        <para>
         The interface <code>org.apache.openjpa.slice.DistributionPolicy</code>
         is simple with a single method. The complete listing of the
         documented interface follows:
        <programlisting>
        <![CDATA[
 package org.apache.openjpa.slice;

 import java.util.List;

 public interface DistributionPolicy {
     /**
      * Gets the name of the slice where a given instance will be stored.
      *
      * @param pc The newly persistent or to-be-merged object.
      * @param slices names of the configured slices.
      * @param context persistence context managing the given instance.
      *
      * @return identifier of the slice. This name must match one of the
      * configured slice names.
      * @see DistributedConfiguration#getSliceNames()
      */
     String distribute(Object pc, List<String> slices, Object context);
 }
 ]]>
        </programlisting>
         </para>

         <para>
         While implementing a distribution policy, the most important thing to
         remember is the <link linkend="collocation_constraint">collocation constraint</link>.
         Because Slice cannot establish or query any cross-database relationship, all
         related instances must be stored in the same database slice.

         Slice can determine the closure of a root object by traversing
         cascaded relationships. Hence the user-defined policy only has to decide the
         database for the root instance, that is, the explicit argument to the
         <methodname>EntityManager.persist()</methodname> call.
         Slice will ensure that all other related instances that get persisted by cascade
         are assigned to the same database slice as the root instance.
         However, the user-defined distribution policy must return the
         same slice identifier for instances that are logically related but
         not cascaded for persist.
         </para>
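         <para>
         As an illustration only, the following sketch implements the
         name-based example policy in which first names beginning with 'A' to 'M'
         go to one slice and the rest to another. So that it compiles without
         OpenJPA on the classpath, it declares a local stand-in interface with
         the same method signature as the listing above. In a real application
         the class would implement
         <code>org.apache.openjpa.slice.DistributionPolicy</code> instead. The
         <code>Customer</code> and <code>NameRangePolicy</code> classes are
         hypothetical.
         <programlisting>
<![CDATA[
```java
import java.util.List;

/** Local stand-in with the same signature as the documented interface. */
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

/** Hypothetical root entity; only the first name matters here. */
final class Customer {
    final String firstName;
    Customer(String firstName) { this.firstName = firstName; }
}

/** Hypothetical policy: first names A-M go to the first slice, others to the second. */
final class NameRangePolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        if (slices.size() < 2) {
            throw new IllegalArgumentException("expected at least two slices");
        }
        if (!(pc instanceof Customer)) {
            // Assumption of this sketch: other root types go to the first slice.
            // A real policy must keep logically related instances together.
            return slices.get(0);
        }
        String name = ((Customer) pc).firstName;
        char initial = (name == null || name.isEmpty())
            ? 'A'
            : Character.toUpperCase(name.charAt(0));
        return initial <= 'M' ? slices.get(0) : slices.get(1);
    }
}
```
]]>
         </programlisting>
         The returned value is always one of the names passed in through
         <code>slices</code>, as the interface contract requires.
         </para>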
     </section>

   </section>

   <title>Configuration Properties</title>
     <para>
     The properties to configure Slice can be classified into two broad groups.
     The <emphasis>global</emphasis> properties apply to all the slices, for example,
     the thread pool used to execute the queries in parallel or the transaction
     manager used to coordinate transactions across multiple slices.
     The <emphasis>per-slice</emphasis> properties apply to an individual slice, for example,
     the JDBC connection URL of a slice.
    </para>

    <section>
      <title>Global Properties</title>

      <section>
         <title>openjpa.slice.DistributionPolicy</title>
         <para>
          This <emphasis>mandatory</emphasis> plug-in property determines how newly
          persistent instances are distributed across individual slices.
          The value of this property is the fully-qualified name of a class that implements the
          <ulink url="../javadoc/org/apache/openjpa/slice/DistributionPolicy.html">
          <classname>org.apache.openjpa.slice.DistributionPolicy</classname>
          </ulink> interface.
         </para>
      </section>

      <section><title>openjpa.slice.Lenient</title>
       <para>
         This boolean plug-in property controls the behavior when one or more slices
         cannot be connected to or are unavailable for some other reason.
         If <code>true</code>, the unreachable slices are ignored. If
         <code>false</code>, any unreachable slice will raise an exception
         during startup.
         </para>
         <para>
         By default this value is set to <code>false</code>, i.e. all configured
         slices must be available.
         </para>
      </section>

      <section>
       <title>openjpa.slice.Master</title>
       <para>
        This plug-in property can be used to identify the name of the master slice.
        The master slice is used when a primary key is to be generated from a database sequence.
        </para>
        <para>
         By default the master slice is the first slice in the list of configured slice names.
        </para>
        <para>
               <warning>
               Currently, there is no provision to use sequences from
               multiple database slices.
               </warning>
        </para>
      </section>

      <section>
         <title>openjpa.slice.Names</title>
         <para>
          This plug-in property can be used to register the logical slice names.
          The value of this property is a comma-separated list of slice names.
          The ordering of the names in this list is
          <emphasis>significant</emphasis> because
          <link linkend="distribution_policy">DistributionPolicy</link> receives
          the slice names as an input argument in the same order.
         </para>
         <para>
         If logical slice names are not registered explicitly via this property,
         then all logical slice names available in the persistence unit are
         registered. The ordering of the slice names in this case is alphabetical.
         </para>
         <para>
         If logical slice names are registered explicitly via this property, then
         any logical slice that is available in the persistence unit but excluded
         from this list is ignored.
         </para>
      </section>

      <section>
         <title>openjpa.slice.ThreadingPolicy</title>
         <para>
         This plug-in property determines the kind of thread pool used
         for database operations such as query or flush on individual slices.
         The value of the property is the
         fully-qualified name of a class that implements the
         <ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ExecutorService.html">
         <classname>java.util.concurrent.ExecutorService</classname>
         </ulink> interface.
         Two pre-defined pools can be chosen via their aliases, namely
         <code>fixed</code> or <code>cached</code>.
         </para>
         <para>
         The pre-defined alias <code>cached</code> activates a
         <ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool()">cached thread pool</ulink>.
         A cached thread pool creates new threads as needed, but will reuse
         previously constructed threads when they are available. This pool
         is suitable in scenarios that execute many short-lived asynchronous tasks.
         The way Slice uses the thread pool to execute database operations is
         akin to such a scenario and hence <code>cached</code> is the default
         value for this plug-in property.
         </para>
         <para>
         The <code>fixed</code> alias activates a
         <ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)">fixed thread pool</ulink>.
         The fixed thread pool can be further parameterized with
         <code>CorePoolSize</code>, <code>MaximumPoolSize</code>,
         <code>KeepAliveTime</code> and <code>RejectedExecutionHandler</code>.
         The meaning of these parameters is described in the
         <ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html">JavaDoc</ulink>.
         Users can exercise finer control over thread pool behavior via these
         parameters.
         By default, the core pool size is <code>10</code>, the maximum pool size is
         also <code>10</code>, the keep-alive time is <code>60</code> seconds and
         a rejected execution is
         <ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.AbortPolicy.html">aborted</ulink>.
         </para>
         <para>
         Both of the pre-defined aliases can be parameterized with the fully-qualified
         name of a class that implements the
         <ulink url="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadFactory.html">
         <classname>java.util.concurrent.ThreadFactory</classname>
         </ulink> interface.
         </para>
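         <para>
         For readers less familiar with <code>java.util.concurrent</code>, the
         sketch below builds pools with plain JDK calls that correspond to the
         descriptions above. It is not how Slice wires its pools internally. The
         class <code>SlicePools</code>, the unbounded work queue and the daemon
         thread factory are assumptions made for illustration.
         <programlisting>
<![CDATA[
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper built only on JDK classes. */
final class SlicePools {

    /** A cached pool: threads are created on demand and reused when idle. */
    static ExecutorService cached(ThreadFactory factory) {
        return Executors.newCachedThreadPool(factory);
    }

    /** A pool with the documented defaults: core 10, max 10, 60 s keep-alive, abort. */
    static ThreadPoolExecutor fixedWithDefaults(ThreadFactory factory) {
        return new ThreadPoolExecutor(
            10, 10, 60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(),   // assumption: unbounded queue
            factory,
            new ThreadPoolExecutor.AbortPolicy());
    }

    /** Hypothetical factory that names worker threads and marks them as daemons. */
    static ThreadFactory namedDaemonFactory(final String prefix) {
        final AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }
}
```
]]>
         </programlisting>
         </para>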
      </section>

      <section>
       <title>openjpa.slice.TransactionPolicy</title>
       <para>
       This plug-in property determines the policy for transaction commit
       across multiple slices. The value of this property is the fully-qualified
       name of a class that implements the
       <ulink url="http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/transaction/TransactionManager.html">
       <classname>javax.transaction.TransactionManager</classname>
       </ulink> interface.
       </para>
       <para>
       Three pre-defined policies can be chosen
       by their aliases, namely <code>default</code>,
       <code>xa</code> and <code>jndi</code>.
       </para>
       <para>
       The <code>default</code> policy employs
       a Transaction Manager that commits or rolls back the transaction on individual
       slices <emphasis>without</emphasis> a two-phase commit protocol.
       It does <emphasis>not</emphasis>
       guarantee the atomic nature of a transaction across all the slices because if
       one or more slices fail to commit, there is no way to roll back the transaction
       on other slices that committed successfully.
       </para>
       <para>
       The <code>xa</code> policy employs a Transaction Manager that commits
       or rolls back the transaction on individual
       slices using a two-phase commit protocol. The prerequisite to use this scheme
       is, of course, that all the slices must be configured to use an
       XA-compliant JDBC driver.
       </para>
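       <para>
       The difference between committing each slice independently and using a
       two-phase protocol can be sketched as follows. This is a simplified
       simulation, not the Slice or XA implementation: the
       <code>SliceResource</code> interface, the <code>FakeSlice</code> class
       and the <code>CommitSketch</code> methods are hypothetical, and a real
       two-phase commit also has to handle failures during the second phase.
       <programlisting>
<![CDATA[
```java
import java.util.List;

/** Hypothetical stand-in for the transactional resource of one slice. */
interface SliceResource {
    boolean prepare();   // phase one: true if this slice promises it can commit
    void commit();       // throws if the slice cannot commit
    void rollback();
}

/** Minimal in-memory resource that records its final state. */
final class FakeSlice implements SliceResource {
    private final boolean healthy;
    String state = "active";
    FakeSlice(boolean healthy) { this.healthy = healthy; }
    public boolean prepare() { return healthy; }
    public void commit() {
        if (!healthy) throw new IllegalStateException("commit failed");
        state = "committed";
    }
    public void rollback() { state = "rolled back"; }
}

final class CommitSketch {
    /** No two-phase protocol: slices commit one by one; false on the first failure. */
    static boolean commitEach(List<? extends SliceResource> slices) {
        for (SliceResource s : slices) {
            try {
                s.commit();
            } catch (RuntimeException e) {
                // Slices committed earlier stay committed, so atomicity is lost.
                return false;
            }
        }
        return true;
    }

    /** Two-phase protocol: commit only if every slice votes yes while preparing. */
    static boolean twoPhaseCommit(List<? extends SliceResource> slices) {
        for (SliceResource s : slices) {
            if (!s.prepare()) {
                for (SliceResource r : slices) r.rollback();
                return false;
            }
        }
        for (SliceResource s : slices) s.commit();
        return true;
    }
}
```
]]>
       </programlisting>
       With one healthy and one failing slice, <code>commitEach</code> leaves
       the healthy slice committed, whereas <code>twoPhaseCommit</code> rolls
       both back.
       </para>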
       <para>
       The <code>jndi</code> policy employs a Transaction Manager obtained by looking up the
       JNDI context. The prerequisite to use this transaction
       manager is, of course, that all the slices must be configured to use an
       XA-compliant JDBC driver.
       <warning>This JNDI based policy is not available currently.</warning>
       </para>
     </section>
    </section>

    <section>
      <title>Per-Slice Properties</title>
      <para>
      Any OpenJPA property can be configured for each individual slice. The property name
      is of the form <code>openjpa.slice.[Logical slice name].[OpenJPA Property Name]</code>.
      For example, <code>openjpa.slice.One.ConnectionURL</code> where <code>One</code>
      is the logical slice name and <code>ConnectionURL</code> is an OpenJPA property
      name.
      </para>
      <para>
      If a property is not configured for a specific slice, then its value
      defaults to that of the corresponding <code>openjpa.*</code> property.
      </para>
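      <para>
      The fallback rule amounts to a two-step lookup, sketched below on a plain
      <code>Map</code> of property names. The class
      <code>SlicePropertyResolver</code> is hypothetical and only illustrates
      the naming convention. It does not reproduce how OpenJPA parses its
      configuration.
      <programlisting>
<![CDATA[
```java
import java.util.Map;

/** Hypothetical helper illustrating the per-slice fallback rule. */
final class SlicePropertyResolver {

    /**
     * Returns the value of openjpa.slice.[slice].[property] if it is set,
     * otherwise the value of openjpa.[property], otherwise null.
     */
    static String resolve(Map<String, String> properties, String slice, String property) {
        String specific = properties.get("openjpa.slice." + slice + "." + property);
        return specific != null ? specific : properties.get("openjpa." + property);
    }
}
```
]]>
      </programlisting>
      </para>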
    </section>

 </chapter>