<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="author" content="dev@gora.apache.org" />
<meta name="Description" content="Apache Gora -- Gora Core Module" />
<meta name="Keywords" content="Apache Gora NoSQL Framework" />
<meta name="Owner" content="dev@gora.apache.org" />
<meta name="Robots" content="index, follow" />
<meta name="Security" content="Public" />
<meta name="Source" content="wiki template" />
<meta
name="DC.Rights"
content="Copyright 2010-2024, The Apache Software Foundation"
/>
<link href="/resources/css/bootstrap.min.css" rel="stylesheet" />
<!-- Fav and touch icons -->
<link
rel="apple-touch-icon-precomposed"
sizes="144x144"
href="http://twitter.github.com/bootstrap/assets/ico/apple-touch-icon-144-precomposed.png"
/>
<link
rel="apple-touch-icon-precomposed"
sizes="114x114"
href="http://twitter.github.com/bootstrap/assets/ico/apple-touch-icon-114-precomposed.png"
/>
<link
rel="apple-touch-icon-precomposed"
sizes="72x72"
href="http://twitter.github.com/bootstrap/assets/ico/apple-touch-icon-72-precomposed.png"
/>
<link
rel="apple-touch-icon-precomposed"
href="http://twitter.github.com/bootstrap/assets/ico/apple-touch-icon-57-precomposed.png"
/>
<link rel="shortcut icon" href="/resources/img/feather-small.png" />
<title>Apache Gora&trade; - Gora Core Module</title>
</head>
<body style="padding-top: 100px">
<nav class="navbar navbar-expand-lg navbar-dark bg-dark fixed-top shadow-lg">
<div class="container-fluid">
<a class="navbar-brand" href="/index.html"
><img
src="/resources/img/gora-logo.png"
alt="Apache Gora"
title="Apache Gora"
height="50px"
/></a>
<button
class="navbar-toggler"
type="button"
data-bs-toggle="collapse"
data-bs-target="#navbarNav"
aria-controls="navbarNav"
aria-expanded="false"
aria-label="Toggle navigation"
>
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarNav">
<ul class="navbar-nav me-auto">
<li class="nav-item">
<a class="nav-link" href="/downloads.html">Downloads</a>
</li>
<li class="nav-item dropdown">
<a
class="nav-link dropdown-toggle"
href="#"
id="navbarDropdown1"
role="button"
data-bs-toggle="dropdown"
aria-expanded="false"
>Community</a
>
<ul class="dropdown-menu" aria-labelledby="navbarDropdown1">
<li>
<a
class="dropdown-item"
href="https://whimsy.apache.org/board/minutes/Gora.html"
>Board Reporting</a
>
</li>
<li>
<a class="dropdown-item" href="/contribute.html"
>Contribute</a
>
</li>
<li>
<a class="dropdown-item" href="/mailing_lists.html"
>Mailing Lists</a
>
</li>
<li>
<a class="dropdown-item" href="/credits.html">People</a>
</li>
<li>
<a class="dropdown-item" href="/related.html"
>Related Projects</a
>
</li>
</ul>
</li>
<li class="nav-item dropdown">
<a
class="nav-link dropdown-toggle"
href="#"
id="navbarDropdown2"
role="button"
data-bs-toggle="dropdown"
aria-expanded="false"
>Documentation</a
>
<ul class="dropdown-menu" aria-labelledby="navbarDropdown2">
<li><a class="dropdown-item" href="/about.html">About</a></li>
<li>
<a class="dropdown-item" href="/current/index.html"
>Current Documentation</a
>
</li>
<li>
<a class="dropdown-item" href="/current/api/javadoc.html"
>JavaDoc Documentation</a
>
</li>
<li>
<a class="dropdown-item" href="/current/tutorial.html"
>Gora Tutorial</a
>
</li>
<li>
<a
class="dropdown-item"
href="https://cwiki.apache.org/confluence/display/GORA/"
>Gora Wiki</a
>
</li>
</ul>
</li>
<li class="nav-item dropdown">
<a
class="nav-link dropdown-toggle"
href="#"
id="navbarDropdown3"
role="button"
data-bs-toggle="dropdown"
aria-expanded="false"
>Development</a
>
<ul class="dropdown-menu" aria-labelledby="navbarDropdown3">
<li>
<a
class="dropdown-item"
href="https://issues.apache.org/jira/browse/GORA"
>Issue Tracking</a
>
</li>
<li>
<a class="dropdown-item" href="/mailing_lists.html"
>Mailing Lists</a
>
</li>
<li>
<a class="dropdown-item" href="/version_control.html"
>Version Control</a
>
</li>
<li>
<a class="dropdown-item" href="/roadmap.html">Roadmap</a>
</li>
</ul>
</li>
<li class="nav-item dropdown">
<a
class="nav-link dropdown-toggle"
href="#"
id="navbarDropdown4"
role="button"
data-bs-toggle="dropdown"
aria-expanded="false"
>
<img
src="/resources/img/feather-small.png"
alt="Apache"
title="Apache"
/>
</a>
<ul class="dropdown-menu" aria-labelledby="navbarDropdown4">
<li>
<a class="dropdown-item" href="http://www.apache.org"
>Apache Home</a
>
</li>
<li>
<a
class="dropdown-item"
href="http://www.apache.org/licenses/"
>Apache License</a
>
</li>
<li>
<a
class="dropdown-item"
href="http://www.apache.org/security/"
>Security</a
>
</li>
<li>
<a
class="dropdown-item"
href="http://www.apache.org/foundation/sponsorship.html"
>Support</a
>
</li>
<li>
<a
class="dropdown-item"
href="http://www.apache.org/foundation/thanks.html"
>Thanks</a
>
</li>
</ul>
</li>
</ul>
</div>
</div>
</nav>
<div class="container top-buffer" id="Gora_Gora Core Module">
<h1 id="overview">Overview<a class="headerlink" href="#overview" title="Permalink">&para;</a></h1>
<p>This is the main documentation for the DataStores contained within the
<code>gora-core</code> module which, as its name implies,
holds most of the core functionality of the Gora project.</p>
<p>Every module
in Gora depends on <code>gora-core</code>, so most of the generic documentation
about the project is gathered here, along with the documentation for <code>AvroStore</code>,
<code>DataFileAvroStore</code> and <code>MemStore</code>. In addition, <code>gora-core</code> holds all of the
core <strong>MapReduce</strong>, <strong>GoraSparkEngine</strong>, <strong>Persistency</strong>, <strong>Query</strong>, <strong>DataStoreBase</strong> and <strong>Utility</strong> functionality.</p>
<div id="toc"><ul><li><a class="toc-href" href="#avrostore" title="AvroStore">AvroStore</a><ul><li><a class="toc-href" href="#description" title="Description">Description</a></li><li><a class="toc-href" href="#goraproperties" title="gora.properties">gora.properties</a></li><li><a class="toc-href" href="#avrostore-xml-mappings" title="AvroStore XML mappings">AvroStore XML mappings</a></li></ul></li><li><a class="toc-href" href="#datafileavrostore" title="DataFileAvroStore">DataFileAvroStore</a><ul><li><a class="toc-href" href="#description_1" title="Description">Description</a></li><li><a class="toc-href" href="#goraproperties_1" title="gora.properties">gora.properties</a></li><li><a class="toc-href" href="#gora-core-mappings" title="Gora Core mappings">Gora Core mappings</a></li></ul></li><li><a class="toc-href" href="#memstore" title="MemStore">MemStore</a><ul><li><a class="toc-href" href="#description_2" title="Description">Description</a></li><li><a class="toc-href" href="#goraproperties_2" title="gora.properties">gora.properties</a></li><li><a class="toc-href" href="#memstore-xml-mappings" title="MemStore XML mappings">MemStore XML mappings</a></li></ul></li><li><a class="toc-href" href="#gorasparkengine" title="GoraSparkEngine">GoraSparkEngine</a><ul><li><a class="toc-href" href="#description_3" title="Description">Description</a></li></ul></li></ul></div>
<h1 id="avrostore">AvroStore<a class="headerlink" href="#avrostore" title="Permalink">&para;</a></h1>
<h2 id="description">Description<a class="headerlink" href="#description" title="Permalink">&para;</a></h2>
<p>AvroStore can be used for binary-compatible Avro serializations. It supports Binary and JSON serializations.</p>
<h2 id="goraproperties">gora.properties<a class="headerlink" href="#goraproperties" title="Permalink">&para;</a></h2>
<table class="table">
<thead>
<tr>
<th align="left">Property Key</th>
<th align="left">Property Value</th>
<th align="left">Required</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>gora.datastore.default=</td>
<td>org.apache.gora.avro.store.AvroStore</td>
<td>Yes</td>
<td>Implementation of the persistent Java storage class</td>
</tr>
<tr>
<td>gora.avrostore.input.path=</td>
<td><em>hdfs://uri/path/to/hdfs/input/path</em> || <em>file:///uri/path/to/local/input/path</em></td>
<td>Yes</td>
<td>This value should point to the input directory on HDFS (if running Gora in a distributed Hadoop environment) or to an input directory on the local file system (if running Gora locally).</td>
</tr>
<tr>
<td>gora.avrostore.output.path=</td>
<td><em>hdfs://uri/path/to/hdfs/output/path</em> || <em>file:///uri/path/to/local/output/path</em></td>
<td>Yes</td>
<td>This value should point to the output directory on HDFS (if running Gora in a distributed Hadoop environment) or to an output directory on the local file system (if running Gora locally).</td>
</tr>
<tr>
<td>gora.avrostore.codec.type=</td>
<td>BINARY || JSON</td>
<td>No</td>
<td>The Avro encoder/decoder type to use. Can take the values <code>BINARY</code> or <code>JSON</code>; defaults to <code>BINARY</code> if none is supplied.</td>
</tr>
</tbody></table>
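<p>As an illustration of the keys in the table above, the following is a minimal sketch that parses a sample <code>gora.properties</code> with plain <code>java.util.Properties</code>. It does not use Gora's own configuration loading. The class name <code>AvroStoreConfigSketch</code>, its helper methods and the <code>file:///tmp/gora/...</code> paths are hypothetical placeholders, and the <code>BINARY</code> fallback simply mirrors the default described for <code>gora.avrostore.codec.type</code>.</p>
<pre><code>import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class AvroStoreConfigSketch {
    // Sample gora.properties content; the file: paths are placeholders (assumption).
    static final String SAMPLE =
        "gora.datastore.default=org.apache.gora.avro.store.AvroStore\n"
      + "gora.avrostore.input.path=file:///tmp/gora/input\n"
      + "gora.avrostore.output.path=file:///tmp/gora/output\n";

    // Parses properties text with the standard JDK loader (not Gora's loader).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Mirrors the documented default: BINARY when no codec type is supplied.
    static String codecType(Properties props) {
        return props.getProperty("gora.avrostore.codec.type", "BINARY");
    }

    public static void main(String[] args) {
        Properties props = parse(SAMPLE);
        System.out.println(props.getProperty("gora.datastore.default"));
        System.out.println(props.getProperty("gora.avrostore.input.path"));
        System.out.println(codecType(props));
    }
}
</code></pre>
<p>In a real deployment the same three keys would live in the <code>gora.properties</code> file on the classpath, with <code>gora.avrostore.codec.type</code> added only when JSON encoding is wanted.</p>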
<h2 id="avrostore-xml-mappings">AvroStore XML mappings<a class="headerlink" href="#avrostore-xml-mappings" title="Permalink">&para;</a></h2>
<p>In the stores covered within the gora-core module, no physical mappings are required.</p>
<h1 id="datafileavrostore">DataFileAvroStore<a class="headerlink" href="#datafileavrostore" title="Permalink">&para;</a></h1>
<h2 id="description_1">Description<a class="headerlink" href="#description_1" title="Permalink">&para;</a></h2>
<p>DataFileAvroStore is a file-based store which extends <code>AvroStore</code> to use Avro's <code>DataFile{Writer,Reader}</code> as a backend.
This datastore supports MapReduce.</p>
<h2 id="goraproperties_1">gora.properties<a class="headerlink" href="#goraproperties_1" title="Permalink">&para;</a></h2>
<p>DataFileAvroStore is configured exactly as AvroStore above, with the following exception:</p>
<table class="table">
<thead>
<tr>
<th align="left">Property Key</th>
<th align="left">Property Value</th>
<th align="left">Required</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>gora.datastore.default=</td>
<td>org.apache.gora.avro.store.DataFileAvroStore</td>
<td>Yes</td>
<td>Implementation of the persistent Java storage class</td>
</tr>
</tbody></table>
<h2 id="gora-core-mappings">Gora Core mappings<a class="headerlink" href="#gora-core-mappings" title="Permalink">&para;</a></h2>
<p>In the stores covered within the gora-core module, no physical mappings are required.</p>
<h1 id="memstore">MemStore<a class="headerlink" href="#memstore" title="Permalink">&para;</a></h1>
<h2 id="description_2">Description<a class="headerlink" href="#description_2" title="Permalink">&para;</a></h2>
<p>Essentially, this store is a <code>ConcurrentSkipListMap</code>, in which operations have the following expected costs:</p>
<ul>
<li><code>put(K key, T obj)</code> - expected average log(n) time</li>
<li><code>get(K key, String[] fields)</code> - expected average log(n) time</li>
<li><code>delete(K key)</code> - expected average log(n) time</li>
</ul>
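<p>The behaviour of the underlying map can be sketched with the JDK class directly. This is only an illustration of <code>java.util.concurrent.ConcurrentSkipListMap</code>, not of MemStore's own code; the class name <code>SkipListMapSketch</code> and the sample keys and values are hypothetical.</p>
<pre><code>import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListMapSketch {
    // Builds a small sorted, thread-safe map; keys are inserted out of order on purpose.
    static ConcurrentSkipListMap&lt;Long, String&gt; sample() {
        ConcurrentSkipListMap&lt;Long, String&gt; map = new ConcurrentSkipListMap&lt;&gt;();
        map.put(3L, "c");
        map.put(1L, "a");
        map.put(2L, "b");
        return map;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap&lt;Long, String&gt; map = sample();
        System.out.println(map.get(2L));    // lookup by key
        System.out.println(map.firstKey()); // smallest key, because entries stay sorted
        map.remove(3L);                     // delete by key
        System.out.println(map.size());
    }
}
</code></pre>
<p>Because the map keeps its keys sorted, range-style access comes at no extra cost, which is one reason a skip list is a reasonable fit for an in-memory store.</p>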
<h2 id="goraproperties_2">gora.properties<a class="headerlink" href="#goraproperties_2" title="Permalink">&para;</a></h2>
<p>MemStore is configured exactly as AvroStore above, with the following exception:</p>
<table class="table">
<thead>
<tr>
<th align="left">Property Key</th>
<th align="left">Property Value</th>
<th align="left">Required</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>gora.datastore.default=</td>
<td>org.apache.gora.memory.store.MemStore</td>
<td>Yes</td>
<td>Implementation of the Java class used to hold data in memory</td>
</tr>
</tbody></table>
<h2 id="memstore-xml-mappings">MemStore XML mappings<a class="headerlink" href="#memstore-xml-mappings" title="Permalink">&para;</a></h2>
<p>In the stores covered within the gora-core module, no physical mappings are required.</p>
<h1 id="gorasparkengine">GoraSparkEngine<a class="headerlink" href="#gorasparkengine" title="Permalink">&para;</a></h1>
<h2 id="description_3">Description<a class="headerlink" href="#description_3" title="Permalink">&para;</a></h2>
<p>GoraSparkEngine is the Spark backend of Gora. Assume that the input and output data stores are:</p>
<pre><code>DataStore&lt;K1, V1&gt; inStore;
DataStore&lt;K2, V2&gt; outStore;
</code></pre>
<p>The first step in using GoraSparkEngine is to initialize it:</p>
<pre><code>GoraSparkEngine&lt;K1, V1&gt; goraSparkEngine = new GoraSparkEngine&lt;&gt;(K1.class, V1.class);
</code></pre>
<p>Next, construct a <code>JavaSparkContext</code>, registering the input data store&rsquo;s value class as a Kryo class:</p>
<pre><code>// Requires org.apache.spark.SparkConf and org.apache.spark.api.java.JavaSparkContext.
SparkConf sparkConf = new SparkConf()
    .setAppName("Gora Spark Integration Application")
    .setMaster("local");
// Register the input store's value class with Kryo.
sparkConf.registerKryoClasses(new Class&lt;?&gt;[] { inStore.getPersistentClass() });
JavaSparkContext sc = new JavaSparkContext(sparkConf);
</code></pre>
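<p>A <code>JavaPairRDD</code> can then be retrieved from the input data store (in this example, <code>K1</code> is <code>Long</code> and <code>V1</code> is <code>Pageview</code>):</p>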
<pre><code>JavaPairRDD&lt;Long, Pageview&gt; goraRDD = goraSparkEngine.initialize(sc, inStore);
</code></pre>
<p>After that, all Spark functionality can be applied. For example, the records can be counted as follows:</p>
<pre><code>long count = goraRDD.count();
</code></pre>
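<p>Map and reduce functions can be run on a <code>JavaPairRDD</code> as well. Assume that this is the variable after map/reduce is applied (in this example, <code>K2</code> is <code>String</code> and <code>V2</code> is <code>MetricDatum</code>):</p>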
<pre><code>JavaPairRDD&lt;String, MetricDatum&gt; mapReducedGoraRdd;
</code></pre>
<p>The result can be written to the output data store as follows:</p>
<pre><code>Configuration sparkHadoopConf = goraSparkEngine.generateOutputConf(outStore);
mapReducedGoraRdd.saveAsNewAPIHadoopDataset(sparkHadoopConf);
</code></pre>
</div>
<!-- /container (main block) -->
<hr />
<div class="container">
<footer>
<p>
Copyright © 2010-2024 The Apache Software Foundation.
Licensed under
<a href="http://www.apache.org/licenses/LICENSE-2.0"
>Apache License 2.0</a
>.
</p>
<p>
Apache Gora, Gora, Apache, the Apache feather logo, and the Apache
Gora project logo are trademarks of The Apache Software Foundation.
</p>
</footer>
</div>
<!-- /container -->
<script src="/resources/js/bootstrap.bundle.min.js"></script>
<script src="//cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.0.1/build/highlight.min.js"></script>
<script>
hljs.highlightAll();
</script>
</body>
</html>