Please visit the [Download page](/startup/download) to download the Samza tools package.
{% highlight bash %}
tar -xvzf samza-tools-*.tgz
cd samza-tools-
{% endhighlight %}
The generate-kafka-events tool inserts Avro-serialized events into Kafka topics. It currently supports two event types: PageViewEvent and ProfileChangeEvent.
Before you can generate Kafka events, please follow the instructions here to start the ZooKeeper and Kafka servers on your local machine.
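As a quick sanity check before running the tool, you may want to confirm that something is listening on the broker port. The sketch below is not part of the Samza tools package: `broker_reachable` is a hypothetical helper, it assumes `bash` (for `/dev/tcp` redirection) and the coreutils `timeout` command are available, and it uses the default endpoint `localhost:9092` from the tool's usage text. It only tests that the TCP port accepts connections, not that Kafka itself is healthy.

```shell
# Hypothetical helper, not shipped with samza-tools.
# Succeeds (exit status 0) if a TCP connection to host:port can be opened within 2 seconds.
broker_reachable() {
  host=${1:-localhost}
  port=${2:-9092}
  # bash's /dev/tcp pseudo-device opens a TCP connection; timeout bounds the wait.
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

if broker_reachable localhost 9092; then
  echo "Kafka broker port is open"
else
  echo "Nothing is listening on localhost:9092 yet"
fi
```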
The usage of the generate-kafka-events tool and two example invocations are shown below.
{% highlight bash %}
./scripts/generate-kafka-events.sh
usage: Error: Missing required options: t, e
generate-kafka-events.sh
 -b,--broker                       Kafka broker endpoint Default (localhost:9092).
 -n,--numEvents <NUM_EVENTS>       Number of events to be produced,
                                   Default - Produces events continuously every second.
 -p,--partitions <NUM_PARTITIONS>  Number of partitions in the topic, Default (4).
 -t,--topic <TOPIC_NAME>           Name of the topic to write events to.
 -e,--eventtype <EVENT_TYPE>       Type of the event values can be (PageView|ProfileChange).

./scripts/generate-kafka-events.sh -t PageViewStream -e PageView -n 100

./scripts/generate-kafka-events.sh -t ProfileChangeStream -e ProfileChange
{% endhighlight %}
Once the events have been generated into the Kafka topic, you can use the samza-sql-console tool to process them.
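There are two ways to use the tool: pass a single SQL statement directly as an argument with `--sql`, or write the SQL statement(s) into a file and pass the file path with `--file`. The second option allows you to execute multiple SQL statements, whereas the first lets you execute one at a time.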
Samza SQL requires all events in a topic to share a uniform schema, and it needs access to the schema for those events. Typically, an organization deploys a schema registry that maps topics to schemas.
In the absence of a schema registry, the Samza SQL console tool uses a naming convention to identify the schema associated with a topic: if the topic name contains the string "page", it assumes the topic holds PageViewEvents; otherwise it assumes ProfileChangeEvents.
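The naming convention above can be illustrated with a few lines of shell. This is only a sketch of the rule as described, not the tool's actual implementation: `schema_for_topic` is a hypothetical helper, and the case-insensitive match is an assumption (suggested by the example topic `PageViewStream`, which contains "Page" with a capital P).

```shell
# Hypothetical helper mirroring the documented convention; not part of samza-tools.
schema_for_topic() {
  # Assumption: the match ignores case, so the topic name is lowercased first.
  topic_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$topic_lower" in
    *page*) echo "PageViewEvent" ;;
    *)      echo "ProfileChangeEvent" ;;
  esac
}

schema_for_topic PageViewStream       # prints PageViewEvent
schema_for_topic ProfileChangeStream  # prints ProfileChangeEvent
```

One practical consequence is that topic names should be chosen with this rule in mind: under the convention, any topic without "page" in its name is treated as carrying ProfileChangeEvents.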
{% highlight bash %}
./scripts/samza-sql-console.sh
usage: Error: One of the (f or s) options needs to be set
samza-sql-console.sh
 -f,--file <SQL_FILE>   Path to the SQL file to execute.
 -s,--sql <SQL_STMT>    SQL statement to execute.
{% endhighlight %}
You can run the SQL command below using the Samza SQL console. Make sure the generate-kafka-events tool is running and generating events into ProfileChangeStream before you run it.
{% highlight bash %}
./scripts/samza-sql-console.sh --sql "Insert into log.consoleOutput select Name, OldCompany from kafka.ProfileChangeStream where NewCompany = 'LinkedIn'"
{% endhighlight %}
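The file-based mode can be sketched in the same way. The example below writes two statements into a file and passes it with `--file`. Several details are assumptions rather than documented behavior: `queries.sql` is a hypothetical file name, the one-statement-per-line layout is a guess at the expected file format, and the second statement assumes that `select *` works against `kafka.PageViewStream` and that events are already being generated into that topic.

```shell
# queries.sql is a hypothetical file name; the one-statement-per-line layout is an assumption.
cat > queries.sql <<'EOF'
insert into log.consoleOutput select Name, OldCompany from kafka.ProfileChangeStream where NewCompany = 'LinkedIn'
insert into log.consoleOutput select * from kafka.PageViewStream
EOF

# Run the console only when the script is present, i.e. from the extracted samza-tools directory.
if [ -x ./scripts/samza-sql-console.sh ]; then
  ./scripts/samza-sql-console.sh --file queries.sql
else
  echo "samza-sql-console.sh not found; run this from the extracted samza-tools directory" >&2
fi
```

Keeping the statements in a file also makes it easier to version them alongside the rest of a project than passing long quoted strings on the command line.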