There are a couple of ways to use Samza SQL.
The Samza SQL console tool documented here uses Samza standalone to run Samza SQL on your local machine. This is the quickest way to try out Samza SQL. Please follow the instructions here to get access to the Samza tools on your machine.
Please follow the instructions from the Kafka quickstart to start the ZooKeeper and Kafka servers.
The SQL statements below require a topic named ProfileChangeStream to exist on the Kafka broker. You can follow the instructions in the Kafka quickstart guide to create a topic named “ProfileChangeStream”.
{% highlight bash %}
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ProfileChangeStream
{% endhighlight %}
Use generate-kafka-events from Samza tools to generate events into the ProfileChangeStream topic.
{% highlight bash %}
cd samza-tools-
./scripts/generate-kafka-events.sh -t ProfileChangeStream -e ProfileChange
{% endhighlight %}
Below are some of the SQL queries that you can execute using the samza-sql-console tool from the Samza tools package.
{% highlight bash %}
./scripts/samza-sql-console.sh --sql "insert into log.consoleoutput select * from kafka.ProfileChangeStream"
./scripts/samza-sql-console.sh --sql "insert into log.consoleoutput select Name, OldCompany, NewCompany from kafka.ProfileChangeStream"
./scripts/samza-sql-console.sh --sql "insert into log.consoleoutput select Name as key, Name, NewCompany, RegexMatch('.*soft', OldCompany) from kafka.ProfileChangeStream where NewCompany = 'LinkedIn'"
{% endhighlight %}
The hello-samza project is an example project designed to help you run your first Samza application. It has examples of applications using the low-level task API, the high-level API, and Samza SQL.
This tutorial demonstrates a simple Samza application that uses SQL to perform stream processing.
Please follow the instructions from hello-samza-high-level-yarn on how to build the hello-samza repository and start the YARN grid.
Please follow the steps in the sections “Create ProfileChangeStream Kafka topic” and “Generate events into ProfileChangeStream topic” above.
Before you can run a Samza application, you need to build a package for it. Please follow the instructions from hello-samza-high-level-yarn on how to build the hello-samza application package.
After you've built your Samza package, you can start the app on the grid using the run-app.sh script.
{% highlight bash %}
./deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/page-view-filter-sql.properties
{% endhighlight %}
The app executes the following SQL command:
{% highlight sql %}
insert into kafka.NewLinkedInEmployees select Name from ProfileChangeStream where NewCompany = 'LinkedIn'
{% endhighlight %}
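This SQL reads events from ProfileChangeStream, keeps only those whose NewCompany is 'LinkedIn', and writes the Name of each matching event to the Kafka topic NewLinkedInEmployees.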
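As a rough illustration of the filter and projection in that query, the sketch below applies the same `where` and `select` logic to a few tab-separated sample records with awk. It is not how Samza runs the query: the column layout (Name, OldCompany, NewCompany), the sample rows, and the function names `filter_new_linkedin_employees` and `sample_events` are all assumptions made for the example.

```shell
# Illustration only: emulates the WHERE and SELECT of the query with awk.
# The tab-separated layout (Name, OldCompany, NewCompany) is an assumption,
# not the actual schema of the ProfileChangeStream events.

filter_new_linkedin_employees() {
  # Keep rows whose third column (NewCompany) is exactly "LinkedIn"
  # and print only the first column (Name).
  awk -F '\t' '$3 == "LinkedIn" { print $1 }'
}

# Hypothetical sample events standing in for ProfileChangeStream.
sample_events() {
  printf 'Alice\tMicrosoft\tLinkedIn\n'
  printf 'Bob\tLinkedIn\tGoogle\n'
  printf 'Carol\tYahoo\tLinkedIn\n'
}

# Prints the names of the members whose new company is LinkedIn.
sample_events | filter_new_linkedin_employees
```

With these sample rows, only Alice and Carol pass the filter; Bob is dropped because LinkedIn is his old company, not his new one.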
Give the job a minute to start up, and then tail the Kafka topic:
{% highlight bash %}
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic NewLinkedInEmployees
{% endhighlight %}
Congratulations! You've now set up a local grid that includes YARN, Kafka, and ZooKeeper, and run a Samza SQL application on it.
To shut down the app, use the same run-app.sh script with an extra --operation=kill argument:
{% highlight bash %}
./deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/page-view-filter-sql.properties --operation=kill
{% endhighlight %}
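Because the start and kill commands differ only in the final argument, one way to avoid retyping them is a small shell helper that builds the command line for either case. The following is a minimal sketch: `samza_app_cmd` is a hypothetical function, not part of Samza, and it only prints the command so you can review it before running it. It assumes you call it from the hello-samza directory and that the path contains no spaces.

```shell
# Hypothetical helper (not part of Samza): prints the run-app.sh command
# for starting or killing the SQL app, without executing it.
samza_app_cmd() {
  # $1 is the action: "start" or "kill".
  _samza_cmd="./deploy/samza/bin/run-app.sh"
  _samza_cmd="$_samza_cmd --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory"
  _samza_cmd="$_samza_cmd --config-path=file://$PWD/deploy/samza/config/page-view-filter-sql.properties"
  case "$1" in
    start) ;;                                          # no extra argument
    kill)  _samza_cmd="$_samza_cmd --operation=kill" ;; # same command plus kill
    *)     echo "usage: samza_app_cmd start|kill" >&2; return 1 ;;
  esac
  printf '%s\n' "$_samza_cmd"
}

# Show the command that would shut the app down.
samza_app_cmd kill
```

Printing rather than executing keeps the sketch safe to run outside a grid; once the output looks right, it can be pasted into the terminal.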
Please follow the instructions from Hello Samza High Level API - YARN Deployment on how to shut down and clean up the app.