---
layout: page
title: Configuration
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
All Samza jobs have a configuration file that defines the job. A very basic configuration file looks like this:
{% highlight jproperties %}
# Job
job.factory.class=org.apache.samza.job.local.ThreadJobFactory
job.name=hello-world
# Task
task.class=samza.task.example.MyJavaStreamerTask
task.inputs=example-system.example-stream
# Serializers
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
# Systems
systems.example-system.samza.factory=samza.stream.example.ExampleConsumerFactory
systems.example-system.samza.key.serde=string
systems.example-system.samza.msg.serde=json
{% endhighlight %}
There are four major sections to a configuration file:
1. The job section defines properties such as the name of the job and which job factory to use: `YarnJobFactory`, `ProcessJobFactory`, or `ThreadJobFactory` (see the `job.factory.class` property in the [Configuration Table](configuration-table.html)).
2. The task section is where you specify the class name for your [StreamTask](../api/overview.html). It's also where you define the [input streams](../container/streams.html) for your task.
3. The serializers section defines the [serde](../container/serialization.html) classes used to serialize and deserialize the objects that are sent and received on different streams.
4. The system section defines the systems that your StreamTask can read from, along with the serdes used for the keys and messages of each system. If you're reading from Kafka, you'll define a Kafka system, but you can also specify your own Samza-compatible system implementation. See the Wikipedia system in the [hello-samza example project](/startup/hello-samza/{{site.version}}) for a good example of a self-implemented system.
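
As a rough sketch of what the system section might look like for a Kafka-backed job: the system name `kafka`, the host names, and the ports below are placeholders chosen for illustration, and the serde names assume the `string` and `json` entries registered in the example above. Check the exact keys against the [Configuration Table](configuration-table.html) for your Samza version.

{% highlight jproperties %}
# Systems (illustrative; "kafka" is an arbitrary system name)
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.key.serde=string
systems.kafka.samza.msg.serde=json
# Placeholder addresses for a local ZooKeeper and Kafka broker
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.producer.bootstrap.servers=localhost:9092
{% endhighlight %}

With a system defined this way, `task.inputs` would refer to its streams using the same `system-name.stream-name` form as in the basic example.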
### Required Configuration
Configuration keys that absolutely must be defined for a Samza job are:
* `job.factory.class`
* `job.name`
* `task.class`
* `task.inputs`
### Configuration Keys
A complete list of configuration keys can be found on the [Configuration Table](configuration-table.html) page. Note
that configuration keys prefixed with `sensitive.` are treated specially: their values are masked in logs and in
Samza's YARN ApplicationMaster UI. This only guards against accidental disclosure; the values are not encrypted.
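
For illustration, a masked key might be written as follows. The key name after the `sensitive.` prefix and the value are invented for this sketch and do not correspond to a real Samza setting.

{% highlight jproperties %}
# Hypothetical key: the "sensitive." prefix causes the value to be masked
# in logs and in the YARN ApplicationMaster UI. The value itself is still
# stored in plain text in the configuration file.
sensitive.example.password=changeme
{% endhighlight %}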
## [Packaging &raquo;](packaging.html)