Spark provides three locations to configure the system:

* Spark properties control most application settings and are configured separately for each application.
* Environment variables can be used to set per-machine settings, such as the IP address, through the `conf/spark-env.sh` script on each node.
* Logging can be configured through `log4j.properties`.

Spark properties can be set directly on a `SparkConf` object passed to your `SparkContext`. `SparkConf` allows you to configure some of the common properties (e.g. master URL and application name), as well as arbitrary key-value pairs through the `set()` method. For example, we could initialize an application as follows:
{% highlight scala %}
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("CountingSheep")
  .set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)
{% endhighlight %}
In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For instance, if you'd like to run the same application with different masters or different amounts of memory, Spark allows you to simply create an empty conf:
{% highlight scala %}
val sc = new SparkContext(new SparkConf())
{% endhighlight %}
Then, you can supply configuration values at runtime:

{% highlight bash %}
./bin/spark-submit --name "My fancy app" --master local[4] myApp.jar
{% endhighlight %}
The Spark shell and the `spark-submit` tool support two ways to load configurations dynamically. The first is command line options, such as `--master`, as shown above. Running `./bin/spark-submit --help` will show the entire list of options.
`bin/spark-submit` will also read configuration options from `conf/spark-defaults.conf`, in which each line consists of a key and a value separated by whitespace. For example:
    spark.master            spark://5.6.7.8:7077
    spark.executor.memory   512m
    spark.eventLog.enabled  true
    spark.serializer        org.apache.spark.serializer.KryoSerializer
Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through `SparkConf`. Properties set directly on the `SparkConf` take highest precedence, then flags passed to `spark-submit` or `spark-shell`, then options in the `spark-defaults.conf` file.
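To make this precedence concrete, here is a minimal sketch; it assumes the `spark.executor.memory` entry from the `spark-defaults.conf` example above and that the application is launched through `spark-submit`, so a master URL is supplied outside the code:

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// Assume conf/spark-defaults.conf contains: spark.executor.memory 512m
// A value set directly on SparkConf wins over both spark-defaults.conf
// and any flags passed to spark-submit.
val conf = new SparkConf()
  .setAppName("PrecedenceSketch")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)

// The value the application actually runs with comes from SparkConf.
println(sc.getConf.get("spark.executor.memory")) // prints "2g"
{% endhighlight %}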
The application web UI at `http://<driver>:4040` lists Spark properties in the "Environment" tab. This is a useful place to check to make sure that your properties have been set correctly. Note that only values explicitly specified through either `spark-defaults.conf` or `SparkConf` will appear. For all other configuration properties, you can assume the default value is used.
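You can also inspect the resolved configuration from the driver program itself. A small sketch, assuming a running `SparkContext` named `sc`:

{% highlight scala %}
// Lists every explicitly set property, one per line; as in the
// "Environment" tab, defaults that were never set do not appear.
println(sc.getConf.toDebugString)

// Look up a single key; getOption returns None if the key was never
// set explicitly, meaning Spark's built-in default is in effect.
println(sc.getConf.getOption("spark.executor.memory")
  .getOrElse("not set explicitly; default in use"))
{% endhighlight %}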
Most of the properties that control internal settings have reasonable default values. Some of the most common options to set are:
NOTE: In Spark 1.0 and later this will be overridden by the `SPARK_LOCAL_DIRS` (Standalone, Mesos) or `LOCAL_DIRS` (YARN) environment variables set by the cluster manager.
Apart from these, the following properties are also available, and may be useful in some situations:
Each cluster manager in Spark has additional configuration options. Configurations can be found on the pages for each mode:
Certain Spark settings can be configured through environment variables, which are read from the `conf/spark-env.sh` script in the directory where Spark is installed (or `conf/spark-env.cmd` on Windows). In Standalone and Mesos modes, this file can give machine-specific information such as hostnames. It is also sourced when running local Spark applications or submission scripts.
Note that `conf/spark-env.sh` does not exist by default when Spark is installed. However, you can copy `conf/spark-env.sh.template` to create it. Make sure you make the copy executable.
The following variables can be set in `spark-env.sh`:
In addition to the above, there are also options for setting up the Spark standalone cluster scripts, such as number of cores to use on each machine and maximum memory.
Since `spark-env.sh` is a shell script, some of these can be set programmatically; for example, you might compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface.
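A minimal sketch of what that might look like in `conf/spark-env.sh`; the interface name `eth0` and the availability of the `ip` utility are assumptions about your nodes, not requirements of Spark:

{% highlight bash %}
# conf/spark-env.sh
# Bind Spark to the address of a specific interface instead of relying on
# hostname resolution. "eth0" is only an example interface name.
export SPARK_LOCAL_IP="$(ip -4 addr show eth0 | awk '/inet /{sub(/\/.*/, "", $2); print $2; exit}')"
{% endhighlight %}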
Spark uses log4j for logging. You can configure it by adding a `log4j.properties` file in the `conf` directory. One way to start is to copy the existing `log4j.properties.template` located there.
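For example, assuming you are in the Spark installation directory:

{% highlight bash %}
# Create an editable log4j configuration from the bundled template.
cp conf/log4j.properties.template conf/log4j.properties

# Then adjust conf/log4j.properties as needed; for instance, raising the
# root logger level (e.g. from INFO to WARN) cuts down Spark's own console output.
{% endhighlight %}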