blob: 6d6418bdfe7f3907abfda988153309de3029af25 [file] [log] [blame]
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<title>Storm Solr Integration</title>
<!-- Bootstrap core CSS -->
<link href="/assets/css/bootstrap.min.css" rel="stylesheet">
<!-- Bootstrap theme -->
<link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
<link href="/css/style.css" rel="stylesheet">
<link href="/assets/css/owl.theme.css" rel="stylesheet">
<link href="/assets/css/owl.carousel.css" rel="stylesheet">
<script type="text/javascript" src="/assets/js/jquery.min.js"></script>
<script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
<script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
<script type="text/javascript" src="/assets/js/storm.js"></script>
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<div class="container-fluid">
<div class="row">
<div class="col-md-10">
<a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
</div>
<div class="col-md-2">
<a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
</div>
</div>
</div>
</header>
<!--Header End-->
<!--Navigation Begin-->
<div class="navbar" role="banner">
<div class="container-fluid">
<div class="navbar-header">
<button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/index.html" id="home">Home</a></li>
<li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
<li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
<li><a href="/documentation.html" id="documentation">Documentation</a></li>
<li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
<li><a href="/contribute/People.html">People</a></li>
<li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
</ul>
</li>
<li><a href="/2015/11/05/storm096-released.html" id="news">News</a></li>
</ul>
</nav>
</div>
</div>
<div class="container-fluid">
<h1 class="page-title">Storm Solr Integration</h1>
<div class="row">
<div class="col-md-12">
<!-- Documentation -->
<p class="post-meta"></p>
<p>Storm and Trident integration for Apache Solr. This package includes a bolt and a trident state that enable a Storm topology
stream the contents of storm tuples to index Solr collections.</p>
<h1 id="index-storm-tuples-into-a-solr-collection">Index Storm tuples into a Solr collection</h1>
<p>The The bolt and trident state provided use one of the supplied mappers to build a <code>SolrRequest</code> object that is
responsible for making the update calls to Solr, thus updating the index of the collection specified.</p>
<h1 id="usage-examples">Usage Examples</h1>
<p>In this section we provide some simple code snippets on how to build Storm and Trident topologies to index Solr. In subsequent sections we
describe in detail the two key components of the Storm Solr integration, the <code>SolrUpdateBolt</code>, and the <code>Mappers</code>, <code>SolrFieldsMapper</code>, and <code>SolrJsonMapper</code>.</p>
<h2 id="storm-bolt-with-json-mapper-and-count-based-commit-strategy">Storm Bolt With JSON Mapper and Count Based Commit Strategy</h2>
<div class="highlight"><pre><code class="language-java" data-lang="java"> <span class="k">new</span> <span class="nf">SolrUpdateBolt</span><span class="p">(</span><span class="n">solrConfig</span><span class="o">,</span> <span class="n">solrMapper</span><span class="o">,</span> <span class="n">solrCommitStgy</span><span class="o">)</span>
<span class="c1">// zkHostString for Solr 'gettingstarted' example</span>
<span class="n">SolrConfig</span> <span class="n">solrConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SolrConfig</span><span class="o">(</span><span class="s">"127.0.0.1:9983"</span><span class="o">);</span>
<span class="c1">// JSON Mapper used to generate 'SolrRequest' requests to update the "gettingstarted" Solr collection with JSON content declared the tuple field with name "JSON"</span>
<span class="n">SolrMapper</span> <span class="n">solrMapper</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SolrJsonMapper</span><span class="o">.</span><span class="na">Builder</span><span class="o">(</span><span class="s">"gettingstarted"</span><span class="o">,</span> <span class="s">"JSON"</span><span class="o">).</span><span class="na">build</span><span class="o">();</span>
<span class="c1">// Acks every other five tuples. Setting to null acks every tuple</span>
<span class="n">SolrCommitStrategy</span> <span class="n">solrCommitStgy</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CountBasedCommit</span><span class="o">(</span><span class="mi">5</span><span class="o">);</span>
</code></pre></div>
<h2 id="trident-topology-with-fields-mapper">Trident Topology With Fields Mapper</h2>
<div class="highlight"><pre><code class="language-java" data-lang="java"> <span class="k">new</span> <span class="nf">SolrStateFactory</span><span class="p">(</span><span class="n">solrConfig</span><span class="o">,</span> <span class="n">solrMapper</span><span class="o">);</span>
<span class="c1">// zkHostString for Solr 'gettingstarted' example</span>
<span class="n">SolrConfig</span> <span class="n">solrConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SolrConfig</span><span class="o">(</span><span class="s">"127.0.0.1:9983"</span><span class="o">);</span>
<span class="cm">/* Solr Fields Mapper used to generate 'SolrRequest' requests to update the "gettingstarted" Solr collection. The Solr index is updated using the field values of the tuple fields that match static or dynamic fields declared in the schema object build using schemaBuilder */</span>
<span class="n">SolrMapper</span> <span class="n">solrMapper</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SolrFieldsMapper</span><span class="o">.</span><span class="na">Builder</span><span class="o">(</span><span class="n">schemaBuilder</span><span class="o">,</span> <span class="s">"gettingstarted"</span><span class="o">).</span><span class="na">build</span><span class="o">();</span>
<span class="c1">// builds the Schema object from the JSON representation of the schema as returned by the URL http://localhost:8983/solr/gettingstarted/schema/ </span>
<span class="n">SchemaBuilder</span> <span class="n">schemaBuilder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RestJsonSchemaBuilder</span><span class="o">(</span><span class="s">"localhost"</span><span class="o">,</span> <span class="s">"8983"</span><span class="o">,</span> <span class="s">"gettingstarted"</span><span class="o">)</span>
</code></pre></div>
<h2 id="solrupdatebolt">SolrUpdateBolt</h2>
<p><code>SolrUpdateBolt</code> streams tuples directly into Apache Solr. The Solr index is updated using<code>SolrRequest</code> requests.
The <code>SolrUpdateBolt</code> is configurable using implementations of <code>SolrConfig</code>, <code>SolrMapper</code>, and optionally <code>SolrCommitStrategy</code>.</p>
<p>The data to stream onto Solr is extracted from the tuples using the strategy defined in the <code>SolrMapper</code> implementation.</p>
<p>The <code>SolrRquest</code> can be sent every tuple, or according to a strategy defined by <code>SolrCommitStrategy</code> implementations.
If a <code>SolrCommitStrategy</code> is in place and one of the tuples in the batch fails, the batch is not committed, and all the tuples in that
batch are marked as Fail, and retried. On the other hand, if all tuples succeed, the <code>SolrRequest</code> is committed and all tuples are successfully acked.</p>
<p><code>SolrConfig</code> is the class containing Solr configuration to be made available to Storm Solr bolts. Any configuration needed in the bolts should be put in this class.</p>
<h2 id="solrmapper">SolrMapper</h2>
<p><code>SorlMapper</code> implementations define the strategy to extract information from the tuples. The public method
<code>toSolrRequest</code> receives a tuple or a list of tuples and returns a <code>SolrRequest</code> object that is used to update the Solr index.</p>
<h3 id="solrjsonmapper">SolrJsonMapper</h3>
<p>The <code>SolrJsonMapper</code> creates a Solr update request that is sent to the URL endpoint defined by Solr as the resource
destination for requests in JSON format. </p>
<p>To create a <code>SolrJsonMapper</code> the client must specify the name of the collection to update as well as the
tuple field that contains the JSON object used to update the Solr index. If the tuple does not contain the field specified,
a <code>SolrMapperException</code> will be thrown when the method <code>toSolrRequest</code>is called. If the field exists, its value can either
be a String with the contents in JSON format, or a Java object that will be serialized to JSON</p>
<p>Code snippet illustrating how to create a <code>SolrJsonMapper</code> object to update the <code>gettingstarted</code> Solr collection with JSON content
declared in the tuple field with name &quot;JSON&quot;
<code>java
SolrMapper solrMapper = new SolrJsonMapper.Builder(&quot;gettingstarted&quot;, &quot;JSON&quot;).build();
</code></p>
<h3 id="solrfieldsmapper">SolrFieldsMapper</h3>
<p>The <code>SolrFieldsMapper</code> creates a Solr update request that is sent to the Solr URL endpoint that handles the updates of <code>SolrInputDocument</code> objects.</p>
<p>To create a <code>SolrFieldsMapper</code> the client must specify the name of the collection to update as well as the <code>SolrSchemaBuilder</code>.
The Solr <code>Schema</code> is used to extract information about the Solr schema fields and corresponding types. This metadata is used
to get the information from the tuples. Only tuple fields that match a static or dynamic Solr fields are added to the document. Tuple fields
that do not match the schema are not added to the <code>SolrInputDocument</code> being prepared for indexing. A debug log message is printed for the
tuple fields that do not match the schema and hence are not indexed.</p>
<p>The <code>SolrFieldsMapper</code> supports multivalue fields. A multivalue tuple field must be tokenized. The default token is |. Any
arbitrary token can be specified by calling the method <code>org.apache.storm.solr.mapper.SolrFieldsMapper.Builder.setMultiValueFieldToken</code>
that is part of the <code>SolrFieldsMapper.Builder</code> builder class. </p>
<p>Code snippet illustrating how to create a <code>SolrFieldsMapper</code> object to update the <code>gettingstarted</code> Solr collection. The multivalue
field separates each value with the token % instead of the default | . To use the default token you can ommit the call to the method
<code>setMultiValueFieldToken</code>.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"> <span class="k">new</span> <span class="n">SolrFieldsMapper</span><span class="o">.</span><span class="na">Builder</span><span class="o">(</span>
<span class="k">new</span> <span class="n">RestJsonSchemaBuilder</span><span class="o">(</span><span class="s">"localhost"</span><span class="o">,</span> <span class="s">"8983"</span><span class="o">,</span> <span class="s">"gettingstarted"</span><span class="o">),</span> <span class="s">"gettingstarted"</span><span class="o">)</span>
<span class="o">.</span><span class="na">setMultiValueFieldToken</span><span class="o">(</span><span class="s">"%"</span><span class="o">).</span><span class="na">build</span><span class="o">();</span>
</code></pre></div>
<h1 id="build-and-run-bundled-examples">Build And Run Bundled Examples</h1>
<p>To be able to run the examples you must first build the java code in the package <code>storm-solr</code>,
and then generate an uber jar with all the dependencies.</p>
<h2 id="build-the-storm-apache-solr-integration-code">Build the Storm Apache Solr Integration Code</h2>
<p><code>mvn clean install -f REPO_HOME/storm/external/storm-solr/pom.xml</code></p>
<h2 id="use-the-maven-shade-plugin-to-build-the-uber-jar">Use the Maven Shade Plugin to Build the Uber Jar</h2>
<p>Add the following to <code>REPO_HOME/storm/external/storm-solr/pom.xml</code></p>
<div class="highlight"><pre><code class="language-" data-lang=""> &lt;plugin&gt;
&lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
&lt;artifactId&gt;maven-shade-plugin&lt;/artifactId&gt;
&lt;version&gt;2.4.1&lt;/version&gt;
&lt;executions&gt;
&lt;execution&gt;
&lt;phase&gt;package&lt;/phase&gt;
&lt;goals&gt;
&lt;goal&gt;shade&lt;/goal&gt;
&lt;/goals&gt;
&lt;configuration&gt;
&lt;transformers&gt;
&lt;transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"&gt;
&lt;mainClass&gt;org.apache.storm.solr.topology.SolrJsonTopology&lt;/mainClass&gt;
&lt;/transformer&gt;
&lt;/transformers&gt;
&lt;/configuration&gt;
&lt;/execution&gt;
&lt;/executions&gt;
&lt;/plugin&gt;
</code></pre></div>
<p>create the uber jar by running the commmand:</p>
<p><code>mvn package -f REPO_HOME/storm/external/storm-solr/pom.xml</code></p>
<p>This will create the uber jar file with the name and location matching the following pattern:</p>
<p><code>REPO_HOME/storm/external/storm/target/storm-solr-0.11.0-SNAPSHOT.jar</code></p>
<h2 id="run-examples">Run Examples</h2>
<p>Copy the file <code>REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT.jar</code> to <code>STORM_HOME/extlib</code></p>
<p><strong>The code examples provided require that you first run the <a href="http://lucene.apache.org/solr/quickstart.html">Solr gettingstarted</a> example</strong> </p>
<h3 id="run-storm-topology">Run Storm Topology</h3>
<p>STORM_HOME/bin/storm jar REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT-tests.jar org.apache.storm.solr.topology.SolrFieldsTopology</p>
<p>STORM_HOME/bin/storm jar REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT-tests.jar org.apache.storm.solr.topology.SolrJsonTopology</p>
<h3 id="run-trident-topology">Run Trident Topology</h3>
<p>STORM_HOME/bin/storm jar REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT-tests.jar org.apache.storm.solr.trident.SolrFieldsTridentTopology</p>
<p>STORM_HOME/bin/storm jar REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT-tests.jar org.apache.storm.solr.trident.SolrJsonTridentTopology</p>
<h3 id="verify-results">Verify Results</h3>
<p>The aforementioned Storm and Trident topologies index the Solr <code>gettingstarted</code> collection with objects that have the following <code>id</code> pattern:</p>
<p>*id_fields_test_val* for <code>SolrFieldsTopology</code> and <code>SolrFieldsTridentTopology</code></p>
<p>*json_test_val* for <code>SolrJsonTopology</code> and <code>SolrJsonTridentTopology</code></p>
<p>Querying Solr for these patterns, you will see the values that have been indexed by the Storm Apache Solr integration: </p>
<p>curl -X GET -H &quot;Content-type:application/json&quot; -H &quot;Accept:application/json&quot; <a href="http://localhost:8983/solr/gettingstarted_shard1_replica2/select?q=*id_fields_test_val*&amp;wt=json&amp;indent=true">http://localhost:8983/solr/gettingstarted_shard1_replica2/select?q=*id_fields_test_val*&amp;wt=json&amp;indent=true</a></p>
<p>curl -X GET -H &quot;Content-type: application/json&quot; -H &quot;Accept: application/json&quot; <a href="http://localhost:8983/solr/gettingstarted_shard1_replica2/select?q=*id_fields_test_val*&amp;wt=json&amp;indent=true">http://localhost:8983/solr/gettingstarted_shard1_replica2/select?q=*id_fields_test_val*&amp;wt=json&amp;indent=true</a></p>
<p>You can also see the results by opening the Apache Solr UI and pasting the <code>id</code> pattern in the <code>q</code> textbox in the queries page</p>
<p><a href="http://localhost:8983/solr/#/gettingstarted_shard1_replica2/query">http://localhost:8983/solr/#/gettingstarted_shard1_replica2/query</a></p>
</div>
</div>
</div>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-3">
<div class="footer-widget">
<h5>Meetups</h5>
<ul class="latest-news">
<li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>
<li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li>
<li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>
<li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>
<li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>
<li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>
<!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>About Storm</h5>
<p>Storm integrates with any queueing system and any database system. Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Storm with database systems is easy.</p>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>First Look</h5>
<ul class="footer-list">
<li><a href="/documentation/Rationale.html">Rationale</a></li>
<li><a href="/tutorial.html">Tutorial</a></li>
<li><a href="/documentation/Setting-up-development-environment.html">Setting up development environment</a></li>
<li><a href="/documentation/Creating-a-new-Storm-project.html">Creating a new Storm project</a></li>
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>Documentation</h5>
<ul class="footer-list">
<li><a href="/doc-index.html">Index</a></li>
<li><a href="/documentation.html">Manual</a></li>
<li><a href="https://storm.apache.org/javadoc/apidocs/index.html">Javadoc</a></li>
<li><a href="/documentation/FAQ.html">FAQ</a></li>
</ul>
</div>
</div>
</div>
<hr/>
<div class="row">
<div class="col-md-12">
<p align="center">Copyright © 2015 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
<br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
<br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
</div>
</div>
</footer>
<!--Footer End-->
<!-- Scroll to top -->
<span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>
</body>
</html>