spark task

The spark task allows you to submit a Spark application:

  - task: my_spark_app
    type: spark
    executor: emr_executor
    application_source: 'my_app.py'
    class: 'com.mycompany.MySparkApp'
    master: 'yarn'
    conf:
      spark.driver.memory: '1g'
      spark.driver.maxResultSize: '1g'
      spark.yarn.executor.memoryOverhead: '500M'
    application_arguments:
      --query: "select * from my_db.my_input_table where my_date_col >= '{{yesterday_ds}}'"
      --output: 'my_output_table'
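
To make the mapping concrete, here is a minimal Python sketch of how a task like the one above could translate into a `spark-submit` command line. The helper `build_spark_submit` is hypothetical and is not taken from this project; how the task actually assembles its command is an assumption. Only the `spark-submit` options `--master`, `--class` and `--conf` are standard Spark.

```python
import shlex


def build_spark_submit(task):
    """Hypothetical helper: turn a spark task definition into an argument list."""
    cmd = ["spark-submit"]
    # Optional attributes are only added when present in the task definition.
    if "master" in task:
        cmd += ["--master", task["master"]]
    if "class" in task:
        cmd += ["--class", task["class"]]
    # Each conf entry becomes one "--conf key=value" pair.
    for key, value in task.get("conf", {}).items():
        cmd += ["--conf", f"{key}={value}"]
    # The application file comes after all spark-submit options...
    cmd.append(task["application_source"])
    # ...and everything after it is passed to the application itself.
    for name, value in task.get("application_arguments", {}).items():
        cmd += [name, str(value)]
    return cmd


# The task from the example, written as a plain dict (templating left unrendered).
task = {
    "task": "my_spark_app",
    "type": "spark",
    "executor": "emr_executor",
    "application_source": "my_app.py",
    "class": "com.mycompany.MySparkApp",
    "master": "yarn",
    "conf": {
        "spark.driver.memory": "1g",
        "spark.driver.maxResultSize": "1g",
        "spark.yarn.executor.memoryOverhead": "500M",
    },
    "application_arguments": {
        "--query": "select * from my_db.my_input_table "
                   "where my_date_col >= '{{yesterday_ds}}'",
        "--output": "my_output_table",
    },
}

command = build_spark_submit(task)
# Quote each part so the printed command can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

The sketch keeps options, application file and application arguments in the order `spark-submit` expects, so anything listed under `application_arguments` reaches the application rather than Spark.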

attributes

task: name of your task (may contain only alphanumeric characters, dashes and underscores).

executor: name of the executor that runs the task (see supported executors)

application_source: the location of the Spark application (path to a jar or Python file)

class: the entry point for your application

master: the cluster manager to connect to. See the list of allowed master URLs (Optional)

conf: arbitrary Spark configuration properties (Optional)

application_arguments: arguments passed to the main method of your main class (Optional)
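
As an illustration of the receiving side, the following sketch shows how a Python application such as `my_app.py` might read the `--query` and `--output` arguments from the example. It assumes the arguments arrive as ordinary command-line flags; the function name `parse_args` and the help texts are illustrative and do not come from this project.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments declared under application_arguments."""
    parser = argparse.ArgumentParser(description="Example Spark application")
    parser.add_argument("--query", required=True,
                        help="SQL query that selects the input rows")
    parser.add_argument("--output", required=True,
                        help="name of the table to write the result to")
    # argv=None makes argparse fall back to sys.argv[1:] when run by Spark.
    return parser.parse_args(argv)


# Demo with explicit values; a real application would call parse_args()
# and then, for example, pass args.query to SparkSession.sql.
args = parse_args(["--query", "select 1", "--output", "my_output_table"])
print(args.query)
print(args.output)
```

Marking both flags as required makes a misconfigured task fail at startup with a usage message instead of partway through the job.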

supported executors