docs/isolation.html - accumulo - Git at Google

 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
 <html>
 <head>
 <title>Accumulo Isolation</title>
 <link rel='stylesheet' type='text/css' href='documentation.css' media='screen'/>
 </head>
 <body>

 <h1>Apache Accumulo Documentation : Isolation</h1>

 <h3>Scanning</h3>

 <p>Accumulo supports the ability to present an isolated view of rows when scanning.  There are three possible ways that a row could change in accumulo :
 <ul>
  <li>a mutation applied to a table
  <li>iterators executed as part of a minor or major compaction
  <li>bulk import of new files
 </ul>
 Isolation guarantees that either all or none of the changes made by these operations on a row are seen.  Use the <a href='apidocs/org/apache/accumulo/core/client/IsolatedScanner.html'>IsolatedScanner</a> to obtain an isolated view of an accumulo table.  When using the regular scanner it is possible to see a non isolated view of a row.  For example if a mutation modifies three columns, it is possible that you will only see two of those modifications.  With the isolated scanner either all three of the changes are seen or none.  For an example of this try running the <a href='apidocs/org/apache/accumulo/examples/isolation/InterferenceTest.html'>InterferenceTest</a> example.

 <p>At this time there is no client side isolation support for the <a href='apidocs/org/apache/accumulo/core/client/BatchScanner.html'>BatchScanner</a>.  You may consider using the <a href='apidocs/org/apache/accumulo/core/iterators/WholeRowIterator.html'>WholeRowIterator</a> with the  <a href='apidocs/org/apache/accumulo/core/client/BatchScanner.html'>BatchScanner</a> to achieve isolation though. This drawback of doing this is that entire rows are read into memory on the server side.  If a row is too big, it may crash a tablet server.  The <a href='apidocs/org/apache/accumulo/core/client/IsolatedScanner.html'>IsolatedScanner</a> buffers rows on the client side so a large row will not crash a tablet server.

 <h3>Iterators</h3>
 <p>When writing server side iterators for accumulo isolation is something to be aware of.  A scan time iterator in accumulo reads from a set of data sources.  While an iterator is reading data it has an isolated view.  However, after it returns a key/value it is possible that accumulo may switch data sources and re-seek the iterator.  This is done so that resources may be reclaimed.  When the user does not request isolation this can occur after any key is returned.  When a user request isolation this will only occur after a new row is returned, in which case it will reseek to the very beginning of the next possible row.
	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->
	<html>
	<head>
	<title>Accumulo Isolation</title>
	<link rel='stylesheet' type='text/css' href='documentation.css' media='screen'/>
	</head>
	<body>

	<h1>Apache Accumulo Documentation : Isolation</h1>

	<h3>Scanning</h3>

	<p>Accumulo supports the ability to present an isolated view of rows when scanning. There are three possible ways that a row could change in accumulo :
	<ul>
	<li>a mutation applied to a table
	<li>iterators executed as part of a minor or major compaction
	<li>bulk import of new files
	</ul>
	Isolation guarantees that either all or none of the changes made by these operations on a row are seen. Use the <a href='apidocs/org/apache/accumulo/core/client/IsolatedScanner.html'>IsolatedScanner</a> to obtain an isolated view of an accumulo table. When using the regular scanner it is possible to see a non isolated view of a row. For example if a mutation modifies three columns, it is possible that you will only see two of those modifications. With the isolated scanner either all three of the changes are seen or none. For an example of this try running the <a href='apidocs/org/apache/accumulo/examples/isolation/InterferenceTest.html'>InterferenceTest</a> example.

	<p>At this time there is no client side isolation support for the <a href='apidocs/org/apache/accumulo/core/client/BatchScanner.html'>BatchScanner</a>. You may consider using the <a href='apidocs/org/apache/accumulo/core/iterators/WholeRowIterator.html'>WholeRowIterator</a> with the <a href='apidocs/org/apache/accumulo/core/client/BatchScanner.html'>BatchScanner</a> to achieve isolation though. This drawback of doing this is that entire rows are read into memory on the server side. If a row is too big, it may crash a tablet server. The <a href='apidocs/org/apache/accumulo/core/client/IsolatedScanner.html'>IsolatedScanner</a> buffers rows on the client side so a large row will not crash a tablet server.

	<h3>Iterators</h3>
	<p>When writing server side iterators for accumulo isolation is something to be aware of. A scan time iterator in accumulo reads from a set of data sources. While an iterator is reading data it has an isolated view. However, after it returns a key/value it is possible that accumulo may switch data sources and re-seek the iterator. This is done so that resources may be reclaimed. When the user does not request isolation this can occur after any key is returned. When a user request isolation this will only occur after a new row is returned, in which case it will reseek to the very beginning of the next possible row.