In this tutorial page we describe how to execute SAMOA on top of Apache S4.
The following dependencies are needed to run SAMOA smoothly on Apache S4:
Gradle is a build automation tool and is used to build Apache S4. The full installation guide is available on the Gradle website. The following instructions are a simplified installation guide.
wget http://services.gradle.org/distributions/gradle-1.6-bin.zip
unzip gradle-1.6-bin.zip
export GRADLE_HOME=/foo/bar/gradle-1.6
export PATH=$PATH:$GRADLE_HOME/bin
gradle
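If the final gradle call fails, the usual cause is that the PATH change did not take effect in the current shell. The sketch below shows one way to check this explicitly. The helper name require_tool is hypothetical and not part of Gradle or SAMOA.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is reachable through PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check gradle; on failure, print a hint instead of aborting the script.
require_tool gradle || echo "check GRADLE_HOME and PATH, then open a new shell"
```

The same helper can be reused later for other tools this tutorial calls, such as git or mvn.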
Now you are all set to install Apache S4.
S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. The installation process is as follows:
wget http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip
or clone from git:
git clone https://git-wip-us.apache.org/repos/asf/incubator-s4.git
unzip apache-s4-0.6.0-incubating-src.zip
or go into the cloned directory.
export S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src
export PATH=$PATH:$S4_HOME
gradle tasks
gradle wrapper
In the S4_HOME directory:
gradle install
gradle s4-tools::installApp
Done. Now you can configure and run your Apache S4 cluster.
Once the S4 dependencies are installed, you can simply clone the repository and install SAMOA.
git clone http://git.apache.org/incubator-samoa.git
cd incubator-samoa
mvn -Ps4 package
The deployable jars for SAMOA will be in target/SAMOA-<variant>-<version>-SNAPSHOT.jar. For example, in our case for S4: target/SAMOA-S4-0.3.0-SNAPSHOT.jar.
This section goes through the bin/samoa-s4.properties file and how to configure it. For SAMOA to run correctly in a distributed environment, some variables need to be defined. Since Apache S4 uses ZooKeeper for cluster management, we need to define where ZooKeeper is running.
# Zookeeper Server
zookeeper.server=localhost
zookeeper.port=2181
Apache S4 also distributes the application via HTTP, so the server and port that host the S4 application must be provided.
# Simple HTTP Server providing the packaged S4 jar
http.server.ip=localhost
http.server.port=8000
Apache S4 uses the concept of logical clusters to define a group of machines, which are identified by an ID and start serving on a specific port.
# Name of the S4 cluster
cluster.name=cluster
cluster.port=12000
SAMOA can be deployed on a single machine using only one resource or in a cluster environment. The following property defines whether to deploy as a local application or on a cluster.
# Deployment strategy
samoa.deploy.mode=local
To deploy SAMOA in a distributed environment you MUST configure the bin/samoa-s4.properties file correctly. If you are running locally, modifying the properties file is optional.
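Before a distributed deployment it can help to print the values the properties file currently holds. The following is a minimal sketch, assuming the file contains only plain key=value lines and # comments (no line continuations, no spaces around the equals sign). The helper name get_prop and the PROPS variable are hypothetical and not part of SAMOA.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a simple properties file.
# Usage: get_prop <file> <key>
get_prop() {
  awk -F '=' -v key="$2" '
    /^[ \t]*#/ { next }                           # skip comment lines
    $1 == key  { sub(/^[^=]*=/, ""); print; exit } # exact key match: print the value
  ' "$1"
}

# PROPS is an assumed variable; it defaults to the file described above.
PROPS=${PROPS:-bin/samoa-s4.properties}
if [ -f "$PROPS" ]; then
  echo "ZooKeeper:   $(get_prop "$PROPS" zookeeper.server):$(get_prop "$PROPS" zookeeper.port)"
  echo "HTTP server: $(get_prop "$PROPS" http.server.ip):$(get_prop "$PROPS" http.server.port)"
  echo "Cluster:     $(get_prop "$PROPS" cluster.name) on port $(get_prop "$PROPS" cluster.port)"
  echo "Deploy mode: $(get_prop "$PROPS" samoa.deploy.mode)"
fi
```

A key that is missing from the file yields an empty value, which makes an incomplete configuration easy to spot in the printed summary.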
The deployment is done by running the SAMOA execution script bin/samoa with some additional parameters. The execution syntax is as follows:
bin/samoa <platform> <jar-location> <task & options>
Example:
bin/samoa S4 target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"
The <platform> can be s4 or storm.
The <jar-location> must be the absolute path to the platform-specific jar file.
The <task & options> should be the name of a known task and the options belonging to that task.
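Because <jar-location> has to be an absolute path, a small wrapper can resolve it before the script is called. The sketch below only prints the resulting command line instead of executing it. The helper names abs_path and samoa_command and the JAR variable are hypothetical; the jar name is the example build output mentioned earlier.

```shell
#!/bin/sh
# Hypothetical helper: turn a path to a file in an existing directory into an absolute path.
abs_path() {
  dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Hypothetical helper: print the bin/samoa command line for a platform, jar, and task.
# Usage: samoa_command <platform> <jar> <task & options>
samoa_command() {
  platform=$1
  jar=$(abs_path "$2") || return 1
  task=$3
  printf 'bin/samoa %s %s "%s"\n' "$platform" "$jar" "$task"
}

# JAR is an assumed variable; override it to match your build output.
JAR=${JAR:-target/SAMOA-S4-0.3.0-SNAPSHOT.jar}
if [ -f "$JAR" ]; then
  samoa_command s4 "$JAR" "ClusteringEvaluation"
else
  echo "jar not found: $JAR (build it with mvn -Ps4 package first)" >&2
fi
```

Printing the command first makes it easy to inspect the resolved jar path before running the deployment by hand.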