Title: Gora Core Module
This is the main documentation for the DataStores contained within the gora-core module which (as its name implies) holds most of the core functionality for the Gora project.
Every module in Gora depends on gora-core, so most of the generic documentation about the project is gathered here, as well as the documentation for AvroStore, DataFileAvroStore and MemStore. In addition, gora-core holds all of the core MapReduce, GoraSparkEngine, Persistency, Query, DataStoreBase and Utility functionality.
AvroStore can be used for binary-compatible Avro serializations. It supports Binary and JSON serializations.
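A minimal gora.properties sketch for AvroStore might look like the following (the output path is a placeholder, and the codec type may be BINARY or JSON):

gora.datastore.default=org.apache.gora.avro.store.AvroStore
gora.avrostore.codec.type=BINARY
gora.avrostore.output.path=hdfs://uri/of/output/path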
In the stores covered within the gora-core module, no physical mappings are required.
DataFileAvroStore is a file-based store which extends AvroStore to use Avro's DataFileWriter and DataFileReader as a backend. This datastore supports MapReduce.
DataFileAvroStore would be configured exactly the same as AvroStore above, with the following exception:
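Only the default datastore class changes:

gora.datastore.default=org.apache.gora.avro.store.DataFileAvroStore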
MemStore is a memory-based store, intended mainly for testing. Essentially this store is a ConcurrentSkipListMap in which operations run roughly as in the sketch below.
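A minimal illustrative sketch (not Gora's actual source; class and method names are simplified):

import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative only: a MemStore-like datastore delegating its
// operations to a ConcurrentSkipListMap.
public class MemStoreSketch<K, T> {
  private final ConcurrentSkipListMap<K, T> map = new ConcurrentSkipListMap<>();

  public T get(K key) { return map.get(key); }                      // read
  public void put(K key, T obj) { map.put(key, obj); }              // create/update
  public boolean delete(K key) { return map.remove(key) != null; }  // delete
}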
MemStore would be configured exactly the same as AvroStore above, with the following exception:
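Again, only the default datastore class changes:

gora.datastore.default=org.apache.gora.memory.store.MemStore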
GoraSparkEngine is the Spark backend of Gora. Assume that the input and output data stores are:
DataStore<K1, V1> inStore;
DataStore<K2, V2> outStore;
The first step of using GoraSparkEngine is to initialize it:
GoraSparkEngine<K1, V1> goraSparkEngine = new GoraSparkEngine<>(K1.class, V1.class);
Construct a JavaSparkContext. Register the input data store's value class as a Kryo class:
SparkConf sparkConf = new SparkConf().setAppName("Gora Spark Integration Application").setMaster("local");
Class[] c = new Class[1];
c[0] = inStore.getPersistentClass();
sparkConf.registerKryoClasses(c);
JavaSparkContext sc = new JavaSparkContext(sparkConf);
A JavaPairRDD can be retrieved from the input data store:
JavaPairRDD<Long, Pageview> goraRDD = goraSparkEngine.initialize(sc, inStore);
After that, all Spark functionality can be applied. For example, running a count can be done as follows:
long count = goraRDD.count();
Map and Reduce functions can be run on a JavaPairRDD as well. Assume that this is the variable after map/reduce is applied:
JavaPairRDD<String, MetricDatum> mapReducedGoraRdd;
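For illustration, such a variable could be produced with ordinary Spark transformations. The sketch below is hypothetical: it assumes tutorial-style Pageview and MetricDatum persistent classes with a getUrl() getter and setMetricDimension()/setMetric() setters, and it requires scala.Tuple2 to be imported:

// Hypothetical map/reduce: count page views per URL, then wrap each
// total in a MetricDatum keyed by URL.
JavaPairRDD<String, Long> counts = goraRDD
    .mapToPair(pair -> new Tuple2<>(pair._2().getUrl().toString(), 1L))
    .reduceByKey((a, b) -> a + b);

JavaPairRDD<String, MetricDatum> mapReducedGoraRdd = counts.mapToPair(t -> {
  MetricDatum datum = new MetricDatum();
  datum.setMetricDimension(t._1());
  datum.setMetric(t._2());
  return new Tuple2<>(t._1(), datum);
});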
The result can be written back to the output store as follows:
Configuration sparkHadoopConf = goraSparkEngine.generateOutputConf(outStore);
mapReducedGoraRdd.saveAsNewAPIHadoopDataset(sparkHadoopConf);