Title: Gora Core Module

Overview

This is the main documentation for the DataStores contained within the gora-core module, which (as its name implies) holds most of the core functionality for the Gora project.

Every module in Gora depends on gora-core, so most of the generic documentation about the project is gathered here, as well as the documentation for AvroStore, DataFileAvroStore and MemStore. In addition, gora-core holds all of the core MapReduce, GoraSparkEngine, Persistency, Query, DataStoreBase and Utility functionality.

AvroStore

Description

AvroStore can be used for binary-compatible Avro serializations. It supports both binary and JSON serialization.

gora.properties
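AvroStore is configured through gora.properties like any other store. A minimal sketch of the relevant entries, assuming the usual gora.<store>.<key> naming convention read by DataStoreFactory; the paths and codec type below are illustrative values, not defaults:

gora.datastore.default=org.apache.gora.avro.store.AvroStore
gora.avrostore.input.path=file:///tmp/gora-avrostore-input
gora.avrostore.output.path=file:///tmp/gora-avrostore-output
gora.avrostore.codec.type=BINARY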

AvroStore XML mappings

In the stores covered within the gora-core module, no physical mappings are required.

DataFileAvroStore

Description

DataFileAvroStore is a file-based store which extends AvroStore to use Avro's DataFileWriter and DataFileReader as a backend. This datastore supports MapReduce.
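As an illustration, a minimal sketch of writing through this store via the generic DataStoreFactory API; the Pageview persistent class and the Long key type are assumptions for the example:

import org.apache.gora.avro.store.DataFileAvroStore;
import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.hadoop.conf.Configuration;

DataStore<Long, Pageview> store = DataStoreFactory.getDataStore(
    DataFileAvroStore.class, Long.class, Pageview.class, new Configuration());

Pageview view = store.newPersistent(); // empty persistent instance to populate
store.put(1L, view);                   // written through Avro's DataFileWriter
store.flush();                         // push buffered records to the data file
store.close();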

gora.properties

DataFileAvroStore is configured exactly the same as AvroStore above, with the following exception:
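gora.datastore.default=org.apache.gora.avro.store.DataFileAvroStore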

DataFileAvroStore XML mappings

In the stores covered within the gora-core module, no physical mappings are required.

MemStore

Description

Essentially this store is an in-memory ConcurrentSkipListMap, which makes it useful mainly for testing; its operations run as follows (see the sketch after this list)

  • put(K key, T obj) - expected average O(log n)
  • get(K key, String[] fields) - expected average O(log n)
  • delete(K key) - expected average O(log n)
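A minimal sketch of these operations, assuming a hypothetical WebPage persistent class with a "url" field; because everything lives in memory, no external storage is needed:

import org.apache.gora.memory.store.MemStore;
import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.hadoop.conf.Configuration;

DataStore<String, WebPage> store = DataStoreFactory.getDataStore(
    MemStore.class, String.class, WebPage.class, new Configuration());

WebPage page = store.newPersistent();         // empty persistent instance
store.put("com.example/index", page);         // O(log n) insert into the skip list
WebPage got = store.get("com.example/index",  // O(log n) lookup of selected fields
    new String[] { "url" });                  // "url" is a hypothetical field
store.delete("com.example/index");            // O(log n) removal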

gora.properties

MemStore is configured exactly the same as AvroStore above, with the following exception:
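gora.datastore.default=org.apache.gora.memory.store.MemStore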

MemStore XML mappings

In the stores covered within the gora-core module, no physical mappings are required.

GoraSparkEngine

Description

GoraSparkEngine is the Spark backend of Gora. Assume that the input and output data stores are:

DataStore<K1, V1> inStore;
DataStore<K2, V2> outStore;

The first step in using GoraSparkEngine is to initialize it:

GoraSparkEngine<K1, V1> goraSparkEngine = new GoraSparkEngine<>(K1.class, V1.class);

Construct a JavaSparkContext and register the input data store's value class as a Kryo class:

SparkConf sparkConf = new SparkConf().setAppName("Gora Spark Integration Application").setMaster("local");
Class[] c = new Class[1];
c[0] = inStore.getPersistentClass();
sparkConf.registerKryoClasses(c);
JavaSparkContext sc = new JavaSparkContext(sparkConf);

A JavaPairRDD can be retrieved from the input data store:

JavaPairRDD<Long, Pageview> goraRDD = goraSparkEngine.initialize(sc, inStore);

After that, all Spark functionality can be applied. For example, a count can be run as follows:

long count = goraRDD.count();

Map and reduce functions can be run on a JavaPairRDD as well. Assume that this is the variable holding the result after map and reduce have been applied:

JavaPairRDD<String, MetricDatum> mapReducedGoraRdd;
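For illustration, one hedged way such an RDD might be produced with standard Spark transformations (scala.Tuple2 is assumed to be imported); Pageview's getUrl() accessor and MetricDatum's setMetricDimension/setMetric setters are assumptions based on the Gora tutorial schema:

JavaPairRDD<String, Long> counts = goraRDD
    .mapToPair(pair -> new Tuple2<>(pair._2().getUrl().toString(), 1L)) // key by URL
    .reduceByKey((a, b) -> a + b);                                      // sum pageview counts

mapReducedGoraRdd = counts.mapToPair(pair -> {
  MetricDatum datum = new MetricDatum();
  datum.setMetricDimension(pair._1()); // dimension: the URL
  datum.setMetric(pair._2());          // metric: the count
  return new Tuple2<>(pair._1(), datum);
});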

The result can be written back to the output data store as follows:

Configuration sparkHadoopConf = goraSparkEngine.generateOutputConf(outStore);
mapReducedGoraRdd.saveAsNewAPIHadoopDataset(sparkHadoopConf);
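Here generateOutputConf produces a Hadoop Configuration pointing at the output store, so that saveAsNewAPIHadoopDataset can write the resulting records back through Gora's Hadoop output format.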