---
layout: page
title: Configuration
---

Every Samza application has a properties-format file that defines its configuration. A complete list of configuration keys can be found on the Samza Configurations Table page.

A very basic configuration file looks like this:

{% highlight jproperties %}
# Application Configurations
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
app.name=hello-world
job.default.system=example-system
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

# Systems & Streams Configurations
systems.example-system.samza.factory=samza.stream.example.ExampleConsumerFactory
systems.example-system.samza.key.serde=string
systems.example-system.samza.msg.serde=json

# Checkpointing
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory

# State Storage
stores.example-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.example-store.key.serde=string
stores.example-store.value.serde=json

# Metrics
metrics.reporter.example-reporter.class=org.apache.samza.metrics.reporter.JmxReporterFactory
metrics.reporters=example-reporter
{% endhighlight %}

There are six sections to a configuration file:

  1. The Application section defines things like the name of the job, the job factory (see the job.factory.class property in the Configuration Table), the class name of your StreamTask, and the serialization and deserialization of objects received from and sent to different streams.
  2. The Systems & Streams section defines the systems that your StreamTask can read from, along with the serdes used for the keys and messages of each system. You can use any of the predefined systems that ship with Samza, or specify your own Samza-compatible system implementation. See the hello-samza example project's Wikipedia system for a good example of a self-implemented system.
  3. The Checkpointing section defines how message processing state is saved, which provides fault-tolerant stream processing (see Checkpointing for more details).
  4. The State Storage section defines the stateful stream processing settings for Samza.
  5. The Deployment section defines how the Samza application will be deployed (to a cluster manager such as YARN, or as a standalone library), as well as the settings for each option. See Deployment Models for more details.
  6. The Metrics section defines how the Samza application's metrics will be monitored and collected (see Monitoring).
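
The basic example above has no Deployment entries and does not name a task class. As a minimal sketch, those entries might look like the following for a YARN deployment. The task class name and the package path are hypothetical placeholders, and the key names should be checked against the Configuration Table for your Samza version:

{% highlight jproperties %}
# Application: the StreamTask implementation to run (hypothetical class name)
task.class=com.example.HelloWorldTask

# Deployment (YARN); the package path below is a hypothetical placeholder
yarn.package.path=hdfs://namenode.example.com/path/to/hello-world-dist.tar.gz
job.container.count=2
cluster-manager.container.memory.mb=1024
{% endhighlight %}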

Note that configuration keys prefixed with `sensitive.` are treated specially: the values associated with such keys are masked in logs and in Samza's YARN ApplicationMaster UI. This only prevents accidental disclosure; no encryption is done.
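
For illustration, a sensitive value might be declared as follows. The key name after the prefix and the value are hypothetical, not keys that Samza itself defines:

{% highlight jproperties %}
# Hypothetical credential: masked in logs and in the ApplicationMaster UI,
# but still stored unencrypted in this file
sensitive.example-service.api.token=replace-with-real-token
{% endhighlight %}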