| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # Tuning Guide |
| |
| ## Partitions and Parallelism |
| |
| The goal of any distributed compute engine is to parallelize work as much as possible, allowing the work to scale |
| by adding more compute resources. |
| |
| The basic unit of concurrency and parallelism in Ballista is the partition. The leaf nodes of a query |
| are typically table scans that read from files on disk, and Ballista currently treats each file within a table as a |
| single partition (splitting a single file into multiple partitions is planned but not yet implemented). |
| |
| For example, if the table "customer" consists of 200 Parquet files, the table scan will naturally have 200 partitions, |
| and so will certain subsequent operations. Conversely, if the table has only a single Parquet file, there will be a |
| single partition and the work will not scale, even if the cluster has resources available. |
| |
| Ballista supports repartitioning within a query to improve parallelism. Set the configuration setting |
| `ballista.shuffle.partitions` to the desired number of partitions. This is currently a global setting for the entire |
| context. The default value is 16. |
| |
| Note that Ballista will never decrease the number of partitions based on this setting and will only repartition if |
| the source operation has fewer partitions than this setting. |
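
The rule above can be modelled as taking the larger of the two values. The following is a minimal sketch of that rule only, not Ballista's planner code; the function name `effective_partitions` is hypothetical and is used purely for illustration.

```rust
/// Hypothetical helper (not part of the Ballista API) that models the rule
/// described above: the setting can raise the partition count of a source
/// operation but never lowers it.
fn effective_partitions(source_partitions: usize, shuffle_partitions: usize) -> usize {
    // Repartition only when the source has fewer partitions than the setting.
    source_partitions.max(shuffle_partitions)
}

fn main() {
    // (source partitions, value of ballista.shuffle.partitions)
    let cases = [(1, 16), (200, 16), (8, 200)];
    for (source, setting) in cases {
        println!(
            "source = {source}, setting = {setting}, result = {}",
            effective_partitions(source, setting)
        );
    }
}
```

Under this model, a single-file table with the default setting of 16 is repartitioned to 16 partitions, while a 200-file table keeps its 200 partitions.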
| |
| Example: Setting the desired number of shuffle partitions when creating a context. |
| |
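| ```rust |
| use ballista::prelude::*; |
| |
| // Ask Ballista to repartition up to 200 partitions when a source has fewer. |
| let config = BallistaConfig::builder() |
|     .set("ballista.shuffle.partitions", "200") |
|     .build()?; |
| |
| // Connect to a scheduler listening on localhost:50050. |
| let ctx = BallistaContext::remote("localhost", 50050, &config).await?; |
| ``` |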
| |
| ## Configuring Executor Concurrency Levels |
| |
| Each executor instance has a fixed number of tasks that it can process concurrently. This is specified by passing a |
| `concurrent_tasks` command-line parameter. The default setting is to use all available CPU cores. |
| |
| Increasing this setting lets each executor run more tasks concurrently, but it also means that the executor will |
| use more memory. If executors are failing due to out-of-memory errors, decreasing the number of concurrent tasks |
| may help. |
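
One way to choose a starting value is to cap the task count by both CPU cores and a memory budget. The sketch below is a back-of-the-envelope heuristic, not something Ballista computes: the function `suggested_concurrent_tasks` and the per-task memory estimate are assumptions that you would need to supply from your own measurements.

```rust
/// Hypothetical sizing heuristic (not part of Ballista): run at most one task
/// per CPU core, and no more tasks than the memory budget can hold.
fn suggested_concurrent_tasks(
    cpu_cores: usize,
    executor_memory_bytes: u64,
    estimated_task_memory_bytes: u64,
) -> usize {
    // Without a usable estimate, fall back to the documented default of
    // one task per available core.
    if estimated_task_memory_bytes == 0 {
        return cpu_cores.max(1);
    }
    let by_memory = (executor_memory_bytes / estimated_task_memory_bytes) as usize;
    // Always allow at least one task so the executor can make progress.
    by_memory.min(cpu_cores).max(1)
}

fn main() {
    const GIB: u64 = 1024 * 1024 * 1024;
    // Assumed figures: 16 cores, 32 GiB for the executor, 4 GiB per task.
    let tasks = suggested_concurrent_tasks(16, 32 * GIB, 4 * GIB);
    println!("suggested concurrent_tasks: {tasks}");
}
```

With these assumed figures, memory rather than core count is the limiting factor, so the suggestion is lower than the default of one task per core.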
| |
| In the future, Ballista will have better support for tracking memory usage and allocating tasks based on available |
| memory, as well as supporting spill-to-disk to reduce memory pressure. |
| |
| ## Push-based vs Pull-based Task Scheduling |
| |
| Ballista supports both push-based and pull-based task scheduling. It is recommended that you try both to determine |
| which works best for your use case. |
| |
| Pull-based scheduling works in a similar way to Apache Spark, whereas push-based scheduling can result in lower latency. |
| |
| The scheduling policy can be specified in the `--scheduler_policy` parameter when starting the scheduler and executor |
| processes. The default is `pull-based`. |
| |
| ## Viewing Query Plans and Metrics |
| |
| The scheduler provides a web user interface as well as a REST API for monitoring jobs. See the |
| [scheduler documentation](scheduler.md) for more information. |
| |
| To download a query plan in DOT format from the scheduler, submit a request to the following API endpoint: |
| |
| ``` |
| http://localhost:50050/api/job/{job_id}/dot |
| ``` |
| |
| The resulting file can be converted into an image using the Graphviz `dot` command: |
| |
| ```bash |
| dot -Tpng query.dot > query.png |
| ``` |
| |
| Here is an example query plan: |
| |
|  |