The spark task allows you to submit a Spark application.
    - task: my_spark_app
      type: spark
      executor: emr_executor
      application_source: 'my_app.py'
      class: 'com.mycompany.MySparkApp'
      master: 'yarn'
      conf:
        spark.driver.memory: '1g'
        spark.driver.maxResultSize: '1g'
        spark.yarn.executor.memoryOverhead: '500M'
      application_arguments:
        --query: "select * from my_db.my_input_table where my_date_col >= '{{yesterday_ds}}'"
        --output: 'my_output_table'
task: name of your task (alphanumeric, dash, and/or underscore characters only).
executor: name of the executor that runs the task (see supported executors).
application_source: location of the Spark application (a jar file or a Python file).
class: the entry point for your application.
master: the cluster manager to connect to. See the list of allowed master URLs. (Optional)
conf: arbitrary Spark configuration properties. (Optional)
application_arguments: arguments passed to the main method of your main class. (Optional)
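
To make the role of each field more concrete, here is a minimal sketch of how the task definition above could map onto a `spark-submit` command line. The helper `build_spark_submit` is hypothetical and is not part of this project's API; the actual translation performed by the executor may differ. It assumes the task has already been parsed into a Python dict and relies only on the standard `spark-submit` options `--master`, `--class`, and `--conf`.

```python
import shlex


def build_spark_submit(task):
    """Hypothetical helper: turn a parsed spark task dict into a spark-submit argv list."""
    cmd = ["spark-submit"]
    # master and class are passed through only when present
    if "master" in task:
        cmd += ["--master", task["master"]]
    if "class" in task:
        cmd += ["--class", task["class"]]
    # each conf entry becomes one --conf key=value pair
    for key, value in task.get("conf", {}).items():
        cmd += ["--conf", f"{key}={value}"]
    # the application itself comes after all spark-submit options
    cmd.append(task["application_source"])
    # application arguments are appended after the application source
    for flag, value in task.get("application_arguments", {}).items():
        cmd += [flag, str(value)]
    return cmd


# The example task from this page, as a dict (template left unrendered).
task = {
    "task": "my_spark_app",
    "type": "spark",
    "executor": "emr_executor",
    "application_source": "my_app.py",
    "class": "com.mycompany.MySparkApp",
    "master": "yarn",
    "conf": {
        "spark.driver.memory": "1g",
        "spark.driver.maxResultSize": "1g",
        "spark.yarn.executor.memoryOverhead": "500M",
    },
    "application_arguments": {
        "--query": "select * from my_db.my_input_table "
                   "where my_date_col >= '{{yesterday_ds}}'",
        "--output": "my_output_table",
    },
}

command = build_spark_submit(task)
print(shlex.join(command))  # shell-quoted form, for inspection only
```

In this sketch, everything under conf is forwarded to Spark as configuration, while application_arguments reach the application's own main method unchanged.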