| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| Tutorial |
| ======== |
| |
| This tutorial shows you the basic concepts of *PyDolphinScheduler* and covers |
| everything you should know before you submit or run your first workflow. If you |
| have not yet installed *PyDolphinScheduler* and started Apache DolphinScheduler, |
| see :ref:`getting started with PyDolphinScheduler <start:getting started>`. |
| |
| Overview of Tutorial |
| -------------------- |
| |
| Here is an overview of the tutorial. It may look a little complex, but do not |
| worry: we explain this example below in as much detail as possible. |
| |
| .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py |
| :start-after: [start tutorial] |
| :end-before: [end tutorial] |
| |
| Import Necessary Module |
| ----------------------- |
| |
| First of all, we import the modules we will use later, just as we would with any |
| other Python package. This is a minimal demo, so we only import |
| :class:`pydolphinscheduler.core.process_definition` and |
| :class:`pydolphinscheduler.tasks.shell`. |
| |
| .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py |
| :start-after: [start package_import] |
| :end-before: [end package_import] |
| |
| If you want to use other task types, you can |
| :doc:`see all tasks we support <tasks/index>`. |
| |
| Process Definition Declaration |
| ------------------------------ |
| |
| After importing the classes in `import necessary module`_, we instantiate them. |
| Here we declare the basic arguments for the process definition (also known as a |
| workflow). We define the name of the process definition using a |
| `Python context manager`_; the name is **the only required argument** of a |
| process definition. Besides that, we declare three more arguments: `schedule` and |
| `start_time`, which set the workflow's schedule interval and schedule start time, |
| and `tenant`, which sets the user that runs the workflow's tasks on the worker. |
| The :ref:`tenant section <concept:tenant>` of the *PyDolphinScheduler* |
| :doc:`concept` page has more detailed information. |
| |
| .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py |
| :start-after: [start workflow_declare] |
| :end-before: [end workflow_declare] |
| |
| You can find more detail about process definitions in |
| :ref:`concept about process definition <concept:process definition>` if you are interested. |
| For all arguments of the process definition object, see the |
| :class:`pydolphinscheduler.core.process_definition` API documentation. |
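
To make the context manager pattern more concrete, here is a minimal, library-independent sketch of how a `with` block can collect the tasks declared inside it. The classes `ToyWorkflow` and `ToyTask` are hypothetical stand-ins written for this illustration; they are not the *PyDolphinScheduler* classes and make no claim about how the library is implemented.

```python
class ToyWorkflow:
    """Hypothetical stand-in for a workflow object (illustration only)."""

    _current = None  # the workflow whose ``with`` block is currently active

    def __init__(self, name):
        self.name = name
        self.tasks = []

    def __enter__(self):
        # Entering the ``with`` block marks this workflow as the active one.
        ToyWorkflow._current = self
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block clears the active workflow; exceptions propagate.
        ToyWorkflow._current = None
        return False


class ToyTask:
    """Hypothetical task that registers itself with the active workflow."""

    def __init__(self, name):
        self.name = name
        if ToyWorkflow._current is not None:
            ToyWorkflow._current.tasks.append(self)


with ToyWorkflow(name="tutorial") as workflow:
    ToyTask(name="task_parent")
    ToyTask(name="task_child_one")

# A task created outside the block is not attached to any workflow.
orphan = ToyTask(name="outside")

registered = [task.name for task in workflow.tasks]
print(registered)
```

In this sketch only the two tasks created inside the `with` block end up in `workflow.tasks`, which is the convenience the context manager style is meant to provide.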
| |
| Task Declaration |
| ---------------- |
| |
| Here we declare four tasks, all of which are simple |
| :class:`pydolphinscheduler.tasks.shell` tasks that run an `echo` command in the terminal. |
| Besides the argument `command`, we also need to set the argument `name` for each task *(not |
| only shell tasks; `name` is required for every type of task)*. |
| |
| .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py |
| :dedent: 0 |
| :start-after: [start task_declare] |
| :end-before: [end task_declare] |
| |
| Besides the shell task, *PyDolphinScheduler* supports many other task types, |
| which you can find in :doc:`tasks/index`. |
| |
| Setting Task Dependence |
| ----------------------- |
| |
| After we declare both the process definition and the tasks, we have one workflow |
| with four tasks, but all tasks are independent, so they would run in parallel. |
| We should set the order and the dependence of the tasks. This is useful when we |
| need to run a preparation task before the actual task, or when tasks must run |
| according to a specific rule. We support both the methods `set_downstream` and |
| `set_upstream` and the bitwise operators `>>` and `<<`. |
| |
| In this example, we set task `task_parent` as the upstream task of tasks |
| `task_child_one` and `task_child_two`, and task `task_union` as the downstream |
| task of both of these tasks. |
| |
| .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py |
| :dedent: 0 |
| :start-after: [start task_relation_declare] |
| :end-before: [end task_relation_declare] |
| |
| Note that we can group tasks and set their dependence together if they share the |
| same downstream or upstream. Here we declare tasks `task_child_one` and `task_child_two` |
| as a group named `task_group`, and set task `task_parent` as the upstream of |
| both of them. You can find more detail in the :ref:`concept:Tasks Dependence` section of the |
| concept documentation. |
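
As an illustration of how `>>`, `<<`, `set_downstream`, and `set_upstream` can relate to each other, the following library-independent sketch overloads the two operators on a hypothetical `ToyTask` class and accepts either a single task or a list of tasks. It is an assumption-laden model for explanation only, not the *PyDolphinScheduler* implementation.

```python
class ToyTask:
    """Hypothetical task used only to illustrate dependence operators."""

    def __init__(self, name):
        self.name = name
        self.upstream = set()    # names of tasks that must run before this one
        self.downstream = set()  # names of tasks that run after this one

    @staticmethod
    def _as_list(tasks):
        # Accept a single task or a group (list/tuple) of tasks.
        return list(tasks) if isinstance(tasks, (list, tuple)) else [tasks]

    def set_downstream(self, tasks):
        for task in self._as_list(tasks):
            self.downstream.add(task.name)
            task.upstream.add(self.name)

    def set_upstream(self, tasks):
        for task in self._as_list(tasks):
            task.set_downstream(self)

    def __rshift__(self, other):
        # ``self >> other`` means "other runs after self".
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # ``self << other`` means "other runs before self".
        self.set_upstream(other)
        return other


task_parent = ToyTask("task_parent")
task_child_one = ToyTask("task_child_one")
task_child_two = ToyTask("task_child_two")
task_union = ToyTask("task_union")

task_group = [task_child_one, task_child_two]
task_parent >> task_group   # equivalent to task_parent.set_downstream(task_group)
task_union << task_group    # equivalent to task_union.set_upstream(task_group)

print(sorted(task_parent.downstream))
print(sorted(task_union.upstream))
```

The operator form and the method form record the same relation in this sketch; which one to use is a matter of readability.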
| |
| Submit Or Run Workflow |
| ---------------------- |
| |
| Now we have finished our workflow definition, with tasks and task dependence, but |
| all of this exists only locally. We need to let the Apache DolphinScheduler daemon |
| know how we defined our workflow, so the last thing to do is submit the workflow |
| to the Apache DolphinScheduler daemon. |
| |
| In this example we use the `ProcessDefinition` method `run` to submit the workflow |
| to the daemon and set the schedule time we declared in `process definition declaration`_. |
| |
| Now we can run the Python code like any other Python script. For basic usage, run |
| :code:`python tutorial.py` to trigger and run it. |
| |
| .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py |
| :dedent: 0 |
| :start-after: [start submit_or_run] |
| :end-before: [end submit_or_run] |
| |
| If you have not started your Apache DolphinScheduler server, see |
| :ref:`start:start Python gateway server`, which has more detail about starting the |
| related servers. Besides `run`, the `ProcessDefinition` object also has a `submit` |
| method, which only submits the workflow to the daemon without setting the schedule |
| information. For more detail, see :ref:`concept:process definition`. |
| |
| DAG Graph After Tutorial Run |
| ---------------------------- |
| |
| After running the tutorial code, you can log in to the Apache DolphinScheduler web UI |
| and go to the `DolphinScheduler project page`_. There is a new process definition |
| named "Tutorial", created by *PyDolphinScheduler*, and its DAG graph is shown below. |
| |
| .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py |
| :language: text |
| :lines: 24-28 |
| |
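
The shape of this DAG also determines the order in which tasks can run. The sketch below is independent of *PyDolphinScheduler* and only assumes the dependence described in this tutorial: `task_parent` is upstream of the two child tasks, and both are upstream of `task_union`. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to group the tasks into stages, where tasks in the same stage have no dependence on each other. How the DolphinScheduler daemon actually schedules tasks is not modelled here.

```python
from graphlib import TopologicalSorter

# Map each task name to the set of its upstream tasks, as set in this tutorial.
upstream = {
    "task_parent": set(),
    "task_child_one": {"task_parent"},
    "task_child_two": {"task_parent"},
    "task_union": {"task_child_one", "task_child_two"},
}

sorter = TopologicalSorter(upstream)
sorter.prepare()

stages = []
while sorter.is_active():
    # Every task returned here has all of its upstream tasks marked as done.
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

print(stages)
# [['task_parent'], ['task_child_one', 'task_child_two'], ['task_union']]
```

The middle stage contains both child tasks, which matches the statement above that independent tasks can run in parallel.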
| .. _`DolphinScheduler project page`: https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/project.html |
| .. _`Python context manager`: https://docs.python.org/3/library/stdtypes.html#context-manager-types |