---
layout: page
title: Deploy Samza Job To CDH
---

This tutorial assumes you have successfully run hello-samza and now want to deploy the job to a cluster running CDH (Cloudera's Distribution Including Apache Hadoop). It is based on CDH 5.4.0 and uses hello-samza as the example job.

Compile Package for CDH 5.4.0

We need a specific compile option to build the hello-samza package for CDH 5.4.0:

{% highlight bash %}
mvn clean package -Dhadoop.version=cdh5.4.0
{% endhighlight %}

Upload Package to Cluster

There are a few ways to upload the package to the cluster's HDFS. If the job package is not yet on a cluster node, scp it from your local machine to the cluster first. Then run:

{% highlight bash %}
hadoop fs -put path/to/hello-samza-1.1.0-dist.tar.gz /path/for/tgz
{% endhighlight %}
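
The upload can also be wrapped in a small function that fails early when the local tarball is missing and confirms that the file exists in HDFS afterwards. This is only a sketch: `upload_package` is a hypothetical helper name, not something shipped with Samza or Hadoop, and it assumes the `hadoop` client is on the `PATH`.

```shell
# Hypothetical helper: upload a job package to HDFS and confirm it landed.
# Usage: upload_package path/to/package.tar.gz /path/for/tgz
upload_package() {
  pkg="$1"        # local path to the dist tarball
  hdfs_dir="$2"   # target directory in HDFS

  if [ ! -f "$pkg" ]; then
    echo "package not found: $pkg" >&2
    return 1
  fi

  # Create the target directory if needed, then copy the tarball
  # (-f overwrites an existing copy).
  hadoop fs -mkdir -p "$hdfs_dir" || return 1
  hadoop fs -put -f "$pkg" "$hdfs_dir/" || return 1

  # -test -e exits with 0 only if the path exists in HDFS.
  hadoop fs -test -e "$hdfs_dir/$(basename "$pkg")"
}
```

The function returns a non-zero status if any step fails, so it can be chained with `&&` in a deployment script.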

Get Deployment Scripts

Untar the job package to get the deployment scripts. This assumes you will run the job from the current directory:

{% highlight bash %}
tar -xvf path/to/hello-samza-1.1.0-dist.tar.gz -C ./
{% endhighlight %}
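
Before moving on, it can help to confirm that the extraction produced the two files the remaining steps rely on. The following is a minimal sketch; `check_job_package` is a hypothetical helper name, and the file list only covers the paths used later in this tutorial.

```shell
# Hypothetical helper: confirm the extracted package contains the files
# used in the later steps of this tutorial.
# Usage: check_job_package [directory]   (defaults to the current directory)
check_job_package() {
  dir="${1:-.}"
  missing=0
  for f in bin/run-app.sh config/wikipedia-parser.properties; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_job_package` with no arguments checks the current directory and prints each missing file to standard error.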

Add Package Path to Properties File

{% highlight bash %}
vim config/wikipedia-parser.properties
{% endhighlight %}

Set the YARN package path to the HDFS location of the uploaded package:

{% highlight jproperties %}
yarn.package.path=hdfs://<namenode-host>:<namenode-port>/path/to/tgz
{% endhighlight %}
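
If you prefer not to edit the file by hand, the same change can be scripted. The sketch below assumes the properties file uses plain `key=value` lines with no spaces around `=` and that the value contains no backslashes; `set_property` is a hypothetical helper name, not a Samza tool.

```shell
# Hypothetical helper: set key=value in a Java properties file.
# An existing line for the key is replaced in place; otherwise the
# pair is appended at the end. Commented-out lines are left untouched.
# Usage: set_property FILE KEY VALUE
set_property() {
  file="$1"; key="$2"; value="$3"
  tmp="$file.tmp.$$"

  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "="; done = 0 }
    $1 == k { print k "=" v; done = 1; next }   # replace existing entry
    { print }                                   # keep every other line
    END { if (!done) print k "=" v }            # append if key was absent
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `set_property config/wikipedia-parser.properties yarn.package.path "$PACKAGE_URI"`, where `PACKAGE_URI` is an illustrative variable holding the `hdfs://` URI of the package for your cluster.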

Set YARN Environment Variable

{% highlight bash %}
export HADOOP_CONF_DIR=/etc/hadoop/conf
{% endhighlight %}
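
A quick sanity check of the configuration directory can catch a wrong path before the job is submitted. This sketch assumes that a usable Hadoop client configuration directory contains at least `core-site.xml` and `yarn-site.xml`; `check_hadoop_conf_dir` is a hypothetical helper name.

```shell
# Hypothetical helper: verify that a directory looks like a Hadoop
# client configuration directory.
# Usage: check_hadoop_conf_dir [directory]   (defaults to $HADOOP_CONF_DIR)
check_hadoop_conf_dir() {
  dir="${1:-${HADOOP_CONF_DIR:-}}"

  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "HADOOP_CONF_DIR is unset or not a directory: '$dir'" >&2
    return 1
  fi

  # Assumption: these two files are expected in a client config directory.
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}
```

Calling `check_hadoop_conf_dir` with no arguments checks whatever `HADOOP_CONF_DIR` currently points to.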

Run Samza Job

{% highlight bash %}
bin/run-app.sh --config-path=$PWD/config/wikipedia-parser.properties
{% endhighlight %}