
 <!DOCTYPE html>
 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
 <html>
   <head>
     <title>Platform - Apache Blur (Incubator) Documentation</title>
     <meta name="viewport" content="width=device-width, initial-scale=1.0">
     <!-- Bootstrap -->
     <link href="resources/css/bootstrap.min.css" rel="stylesheet" media="screen">
     <link href="resources/css/bs-docs.css" rel="stylesheet" media="screen">
   </head>
   <body>
     <div class="navbar navbar-inverse navbar-fixed-top">
       <div class="container">
         <div class="navbar-header">
           <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
             <span class="icon-bar"></span>
             <span class="icon-bar"></span>
             <span class="icon-bar"></span>
           </button>
           <a class="navbar-brand" href="http://incubator.apache.org/blur">Apache Blur (Incubator)</a>
         </div>
         <div class="collapse navbar-collapse">
           <ul class="nav navbar-nav">
             <li><a href="index.html">Main</a></li>
             <li><a href="getting-started.html">Getting Started</a></li>
             <li class="active"><a href="platform.html">Platform</a></li>
             <li><a href="data-model.html">Data Model</a></li>
             <li><a href="cluster-setup.html">Cluster Setup</a></li>
             <li><a href="using-blur.html">Using Blur</a></li>
             <li><a href="Blur.html">Blur API</a></li>
             <li><a href="console.html">Console</a></li>
           </ul>
         </div>
       </div>
     </div>
     <div class="container bs-docs-container">
       <div class="row">
         <div class="col-md-3">
           <div class="bs-sidebar hidden-print affix" role="complementary">
             <ul class="nav bs-sidenav">
             <li><a href="#intro">Introduction</a></li>
             <li><a href="#motivation">Motivation</a></li>
             <li><a href="#arch">Blur Architecture Review</a></li>
             <li><a href="#affordances">Platform Affordances</a></li>
             <li><a href="#commands">Command Overview</a></li>
             <li><a href="#arguments">Arguments</a></li>
             <li><a href="#installation">Installation</a></li>
             <li><a href="#docs">Documentation</a></li>
             <li><a href="#cli">CLI</a></li>
             </ul>
           </div>
         </div>
         <div class="col-md-9" role="main">
         <!-- In some places, we need to describe both where the system is heading and
            describe its current limitations.  For ease of documentation maintenance, let
            us try to write in a way that disclaimers are in their own paragraphs that can
            easily be stripped out when they no longer apply, thus not requiring a bunch
            of re-writing.
         -->
           <section>
             <div class="page-header">
               <h1 id="intro">Introduction</h1>
             </div>
             <p class="lead">
       				While many users of Blur will find the search system sufficient for their
       				needs out of the box, the Blur platform exposes a simple set of lower-level
       				primitives that allow the user to easily and quickly introduce new system behavior.
           </p>
           <!-- Disclaimer para for 0.2.4 -->
           <p class="lead">
           With this release, we expose the initial read-only constructs for the platform.
           Future releases will introduce richer read-write constructs.
           </p>
           <!-- Disclaimer para for 0.2.4 -->
           <p class="alert">
           <strong>NOTE:</strong> In 0.2.4, the platform capability described here exists,
           but existing functionality of Blur has not yet been ported to use it.
           </p>
           </section>
           <section>
             <div class="page-header">
               <h2 id="motivation">Motivation</h2>
             </div>
             <p>
 							Most modern open source search platforms have Lucene at their core, with
 							a monolithic application stack built on top of it to handle distributed
 							indexing, searching, failure handling, features, and so on. Indeed, this was
 							true of Blur as well.
 						</p>
 						<p>We wanted more flexibility: the ability to introduce brand new features
 							into the system rapidly.  So we supposed it would help to introduce an
 							intermediate abstraction that provides the primitives for a distributed
 							Lucene server, on which specific search applications could be built.
           	</p>
           	<p>Some specific goals we had in mind:</p>
           	<ul>
           	  <li>Allow indexing and searching based on new data models (that is,
           	  		more than just the Row/Record constructs).</li>
 							<li>Allow implementations to build whole new APIs given direct access to the Lucene primitives.</li>
 							<li>Allow flexibility to build totally custom applications.</li>
 							<li>Remove the complexities of threading, networking and concurrency from
 									new feature creation.</li>
  						</ul>
           </section>
           <section>
             <div class="page-header">
               <h2 id="arch">Blur Architecture Review</h2>
             </div>
           	<p>
           		The Blur platform provides a set of <code>Command</code> classes that can
 							be implemented to achieve new functionality.  A basic understanding of how Blur
 							works will greatly help in understanding how to implement commands.  So let's
 							take a moment to review.
           	</p>
           	<!--
           		@TODO: Does this content exist somewhere we can just point to?
           	  @TODO: If the answer is no, we should beef up this quick-n-dirty explanation.
           	-->
           	<p>In Blur, we refer to a logical Lucene index as a table.  Tables are typically
           	very large, so for scalability we divide them into 'shards'.  Each shard is served
           	by a Shard Server, which acts as a container of shards.  The Shard Servers are
           	organized into a cluster that works together to make all the shards of the tables
           	available.  We then put another type of server, called a Controller, in front of
           	the cluster to present all the shards as a single logical table.
           	</p>
           	<!--
           	  @TODO: Find a graphic of the architecture so the bevy of words above can be simplified.
              -->
           	<p>For the Controller to present all the shards as a single index, it needs to accept
           	a request, scatter that request to all the Shard Servers, combine the results in
           	some meaningful way, and send the combined result back to the client.
           	</p>
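<p>The scatter/combine flow described above can be sketched in plain Java. This is only an
illustration of the pattern, not Blur code: <code>ScatterGatherSketch</code>, its <code>Shard</code>
interface and the <code>scatterGather</code> method are hypothetical names, and real Shard Servers
are remote processes rather than in-process lambdas.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: these types are stand-ins, not Blur classes.
class ScatterGatherSketch {

  /** Stand-in for one shard: answers a request for its slice of the table. */
  interface Shard {
    long handle(String request);
  }

  /** Plays the Controller's role: scatter the request, wait, combine the answers. */
  static long scatterGather(List<Shard> shards, String request) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
    try {
      // Scatter: submit the same request to every shard.
      List<Future<Long>> pending = new ArrayList<>();
      for (Shard shard : shards) {
        Callable<Long> task = () -> shard.handle(request);
        pending.add(pool.submit(task));
      }
      // Gather: combine the per-shard answers (here, by summing them).
      long total = 0;
      for (Future<Long> future : pending) {
        try {
          total += future.get();
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException("shard request failed", e);
        }
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<Shard> shards = new ArrayList<>();
    shards.add(request -> 3L);
    shards.add(request -> 4L);
    shards.add(request -> 5L);
    System.out.println(scatterGather(shards, "count")); // prints 12
  }
}
```

<p>Summing is just one possible way to combine; the point of the sketch is that the per-shard work
and the combining step are separate pieces of logic.</p>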
           </section>
           <section>
             <div class="page-header">
               <h2 id="affordances">Platform Affordances</h2>
             </div>
             <p>
 							<code>@TODO</code>
 						</p>

           </section>
           <section>
             <div class="page-header">
               <h2 id="commands">Command Overview</h2>
             </div>
             <p>
 						As we've gathered from above, the heart of a distributed search system is the ability
 						to execute some function across a set of indices and combine the results in a logical
 						way to be returned to the user. Not surprisingly, this is also at the heart of the
 						Blur Platform.  As an introduction, we'll look at how to find the number
 						of documents that contain a particular term across all shards in a table.
 						</p>
 						<p>Our first step will be to find the answer for a single shard/index.  Lucene's
 						<code>IndexReader</code>, to which we'll have access in our command, conveniently
 						gives us that.  Getting the answer for a single index requires implementing an <code>execute</code>
 						method.
 						</p>
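 						<pre>@Override
 public Long execute(IndexContext context) throws IOException {
   // docFreq returns the number of documents in this shard that contain the term.
   // fieldName and term are argument fields of the command (names assumed here).
   return (long) context.getIndexReader().docFreq(new Term(fieldName, term));
 }</pre>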
 						<p>We'll learn where the field name and term are defined later in the Arguments
 						 section. Inside of the <code>execute</code> method, we're focused on finding the answer for
 						 a single shard/index.  To find our answer, we're given an <code>IndexContext</code> which
 						 provides us access to the underlying Lucene index, so for our trivial command we can simply
 						 return the answer directly from the IndexReader.
 						 </p>
 						 <p>Now we need to let Blur know how to combine the results from the individual shards
 						 into a single logical response.  We do this by implementing the <code>combine</code> method.
 						 </p>
 						 <pre>@Override
 public Long combine(CombiningContext context, Map&lt;? extends Location&lt;?&gt;, Long&gt; results) throws IOException {
   long total = 0;
   for (Long l : results.values()) {
     total += l;
   }
   return total;
 }</pre>
 						 <p>
 						 Again, we're given some execution context (which we don't need for our sample command), and we're
 						 given a <code>Map&lt;? extends Location&lt;?&gt;, Long&gt;</code> of result values.
 						 </p>
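<p>To see the two steps work together without a running cluster, here is a self-contained sketch
that mimics them with plain Java collections. Everything in it is an assumption made for
illustration: <code>ShardIndex</code> stands in for the Lucene index behind an
<code>IndexContext</code>, plain <code>String</code> shard names stand in for
<code>Location</code> keys, and <code>run</code> plays the part of the platform calling
<code>execute</code> on each shard and then <code>combine</code>.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: ShardIndex and the String shard names are stand-ins,
// not Blur's IndexContext or Location types.
class ExecuteCombineSketch {

  /** Stand-in for one shard's index: each document is a list of terms. */
  static final class ShardIndex {
    final List<List<String>> docs;

    ShardIndex(List<List<String>> docs) {
      this.docs = docs;
    }
  }

  /** Per-shard step: number of documents in this shard that contain the term. */
  static Long execute(ShardIndex shard, String term) {
    long count = 0;
    for (List<String> doc : shard.docs) {
      if (doc.contains(term)) {
        count++;
      }
    }
    return count;
  }

  /** Combining step: sum the per-shard counts into one answer. */
  static Long combine(Map<String, Long> results) {
    long total = 0;
    for (Long l : results.values()) {
      total += l;
    }
    return total;
  }

  /** Plays the platform's role: run execute on every shard, then combine. */
  static Long run(Map<String, ShardIndex> table, String term) {
    Map<String, Long> results = new LinkedHashMap<>();
    for (Map.Entry<String, ShardIndex> entry : table.entrySet()) {
      results.put(entry.getKey(), execute(entry.getValue(), term));
    }
    return combine(results);
  }

  public static void main(String[] args) {
    Map<String, ShardIndex> table = new LinkedHashMap<>();
    table.put("shard-0", new ShardIndex(List.of(List.of("blur", "lucene"), List.of("hadoop"))));
    table.put("shard-1", new ShardIndex(List.of(List.of("lucene"), List.of("lucene", "blur"))));
    System.out.println(run(table, "lucene")); // prints 3
  }
}
```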

           </section>
           <section>
             <div class="page-header">
               <h2 id="arguments">Arguments</h2>
             </div>
             <p>
 							Recall from above that in the execute method we were able to use some member variables that
 							were treated like arguments to the command.  Now, let's take a closer look at how they are provided.
 						</p>
 						<p>
 							We've kept it very simple for you to declare arguments and for your users to provide them. We provide
 							two simple annotations that you can place right on your member field declarations to indicate whether
 							they are required or optional.  You can (and are encouraged to) provide some helpful documentation
 							on the intent of the argument.  As an example, by extending the <code>TableReadCommand</code> you get
 							the required <code>table</code> argument for free.  Let's look at how it's declared:
 						</p>
 						<pre>
 @RequiredArgument("The name of the table.")
 private String table;
 						</pre>
 						<p>Naturally, we can also declare optional arguments:</p>
 						<pre>
 @OptionalArgument("The number of results to be returned. default=10")
 private short size = 10;
 						</pre>
 						<p>
 						  By annotating your parameters, you let the Blur Platform do the basic requirement checking for you,
 						  which keeps the inside of your execute/combine methods free of argument validation.
 						</p>
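<p>As a rough idea of how annotation-driven requirement checking can work, the sketch below uses
reflection to reject a command whose required field was never set. It is not Blur's implementation:
the two annotations are re-declared locally as hypothetical stand-ins, and
<code>ExampleCommand</code> and <code>validate</code> are names invented for this example.</p>

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

class ArgumentCheckSketch {

  // Hypothetical stand-ins for the annotations; not Blur's definitions.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface RequiredArgument {
    String value() default "";
  }

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface OptionalArgument {
    String value() default "";
  }

  /** Example command with one required and one optional argument. */
  static class ExampleCommand {
    @RequiredArgument("The name of the table.")
    private String table;

    @OptionalArgument("The number of results to be returned. default=10")
    private short size = 10;

    ExampleCommand(String table) {
      this.table = table;
    }
  }

  /** Throws if any field marked @RequiredArgument is still null. */
  static void validate(Object command) {
    for (Field field : command.getClass().getDeclaredFields()) {
      RequiredArgument required = field.getAnnotation(RequiredArgument.class);
      if (required == null) {
        continue; // optional or unannotated fields are not checked
      }
      field.setAccessible(true);
      Object value;
      try {
        value = field.get(command);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
      if (value == null) {
        throw new IllegalArgumentException(
            "Missing required argument '" + field.getName() + "': " + required.value());
      }
    }
  }

  public static void main(String[] args) {
    try {
      validate(new ExampleCommand(null));
    } catch (IllegalArgumentException e) {
      // prints: Missing required argument 'table': The name of the table.
      System.out.println(e.getMessage());
    }
    validate(new ExampleCommand("test_table")); // passes without throwing
  }
}
```

<p>Because the check runs before <code>execute</code> is called, the command body can assume its
required fields are present.</p>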
           </section>
           <section>
             <div class="page-header">
               <h2 id="installation">Installation</h2>
             </div>
             <p>
 							<code>@TODO</code>
 						</p>

           </section>
           <section>
             <div class="page-header">
               <h2 id="docs">Documentation</h2>
             </div>
             <p>
 							Commands should be self-documenting, starting with a good name.  But a good name alone
 							is not sufficient, so Blur offers a <code>@Description</code> annotation for
 							expressing what your command does.  It is used like so:
 						</p>
 						<pre>
 @Description("Returns the number of documents containing the term in the given field.")
 public class DocFreqCommand extends TableReadCommand&lt;Long&gt; {
   ...
 }
 						</pre>
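<p>A description attached this way can be read back at runtime, for example to build a help
listing. The sketch below shows one way that could be done with reflection; the
<code>Description</code> annotation is re-declared locally as a hypothetical stand-in, and
<code>describe</code> and <code>Undocumented</code> are invented for the example rather than taken
from Blur.</p>

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class DescriptionSketch {

  // Hypothetical stand-in for the annotation; not Blur's definition.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Description {
    String value();
  }

  @Description("Returns the number of documents containing the term in the given field.")
  static class DocFreqCommand { }

  /** A command class that carries no description. */
  static class Undocumented { }

  /** Builds one help line for a command class, with a fallback when undocumented. */
  static String describe(Class<?> commandClass) {
    Description description = commandClass.getAnnotation(Description.class);
    String text = (description == null) ? "(no description)" : description.value();
    return commandClass.getSimpleName() + " - " + text;
  }

  public static void main(String[] args) {
    System.out.println(describe(DocFreqCommand.class));
    System.out.println(describe(Undocumented.class)); // prints: Undocumented - (no description)
  }
}
```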

           </section>
           <section>
             <div class="page-header">
               <h2 id="cli">CLI</h2>
             </div>
             <p>
 See Using Blur -&gt; Shell -&gt; <a href="using-blur.html#shell_platform_commands">Platform Commands</a>.
 						</p>

           </section>
         </div>
       </div>
     </div>

     <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
     <script src="resources/js/jquery-2.0.3.min.js"></script>
     <!-- Include all compiled plugins (below), or include individual files as needed -->
     <script src="resources/js/bootstrap.min.js"></script>
     <!-- Enable responsive features in IE8 with Respond.js (https://github.com/scottjehl/Respond) -->
     <script src="resources/js/respond.min.js"></script>
     <script src="resources/js/docs.js"></script>
   </body>
 </html>