blob: aac0710dadf77d8ad6f9991999d064ea726d8155 [file] [log] [blame] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Roadmap and Improvement Proposals
The [project introduction](../user-guide/introduction) explains the
overview and goals of DataFusion, and our development efforts largely
align to that vision.
## Planning `EPIC`s
DataFusion uses [GitHub issues] to track planned work. We collect related
tickets using tracking issues marked with the `EPIC` label, containing
discussion and links to more detailed items:
[github issues]: https://github.com/apache/datafusion/issues
- [The current list of `EPIC`s can be found here.](https://github.com/apache/datafusion/issues?q=is%3Aissue%20state%3Aopen%20label%3AEPIC)
- [The current list of `PROPOSAL EPIC` (that are not yet underway) can be found here.](https://github.com/apache/datafusion/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22PROPOSAL%20EPIC%22)
Epics offer a high level roadmap of what the DataFusion community is thinking
about. The epics are not meant to restrict possibilities, but rather help
organize the community and make it easier to see where development is headed,
align our work, and inspire additional contributions.
We also welcome contributions for items not covered by epics. However, before
submitting a large PR, we strongly suggest and request you start a conversation as described in [Discussing New Features](#discussing-new-features) below.
[dev@arrow.apache.org]: mailto:dev@arrow.apache.org
## Quarterly Roadmap
The DataFusion roadmap is driven by the priorities of contributors rather than
any single organization or coordinating committee. We typically discuss our
roadmap using GitHub issues, approximately quarterly, and invite you to join the
discussion.
For more information:
1. [Search for issues labeled `roadmap`](https://github.com/apache/datafusion/issues?q=is%3Aissue%20%20%20roadmap)
2. [DataFusion Road Map: Q1 2026](https://github.com/apache/datafusion/issues/18494)
3. [DataFusion Road Map: Q3-Q4 2025](https://github.com/apache/datafusion/issues/15878)
4. [2024 Q4 / 2025 Q1 Roadmap](https://github.com/apache/datafusion/issues/13274)
## Improvement Proposals
### Discussing New Features
If you plan to work on a new feature that doesn't have an existing ticket, it is
a good idea to open one for discussion. Advanced discussion helps avoid wasted
effort by determining if the feature is a good fit for DataFusion before too
much time is invested. Discussion on a ticket can help gather feedback from the
community and is likely easier to discuss than a 1000 line PR.
Maintainers will mark major proposals as `PROPOSED EPIC` to make them more
visible, but we are very limited on review bandwidth. If you open a ticket and it
doesn't get any response, try `@`-mentioning recently active community members
in the ticket, or [posting to the mailing list or Discord](communication.md).
### Supervising Maintainers
We have found that most successful epics have one or more "supervising
maintainers", a committer ([see here for current list]) who take the lead on
reviewing and committing PRs, helps with design, and coordinates and
communicates with the community. If you want to ship a large feature, we
recommend finding such maintainer upfront; otherwise, your PRs may
remain unreviewed for a very long time.
Supervising maintainers have no additional formal authority and there is
currently no formal process for appointing, approving or tracking who has that
role for a given epic. Instead, we rely on discussion on the ticket or PR.
Helping complete an epic is a significant time commitment, so maintainers are
more likely to help features they are particularly interested in or align with
their own project's use of DataFusion.
If you are willing to be a supervising maintainer for a feature, please say so
explicitly. If you are unsure, we suggest asking directly who is willing to take
the role, as it can be hard to tell sometimes whether a committer is simply
participating and giving general feedback.
[see here for current list]: governance.md
### What Contributions are Good Fits?
DataFusion is designed to be highly extensible, and many features can be
implemented as extensions without changes or additions to the core. Support for
new functions, data formats, and similar functionality can be added using those
extension APIs, and there are already many existing community supported
extensions listed in the [extensions list].
Query engines are complex pieces of software to develop and maintain. Given our
limited maintenance bandwidth, we try to keep the DataFusion core as simple and
focused as possible, while still satisfying the [design goal] of an easy to
start initial experience.
With that in mind, contributions that meet the following criteria are more likely
to be accepted:
1. Bug fixes for existing features
2. Test coverage for existing features
3. Documentation improvements / examples
4. Performance improvements to existing features (with benchmarks)
5. "Small" functional improvements to existing features (if they don't change existing behavior)
6. Additional APIs for extending DataFusion's capabilities
7. CI improvements
Contributions that will likely involve more discussion (see Discussing New
Features above) prior to acceptance include:
1. Major new functionality (even if it is part of the "standard SQL")
2. New functions, especially if they aren't part of "standard SQL"
3. New data sources (e.g. support for Apache ORC)
[extensions list]: ../library-user-guide/extensions.md
[design goal]: https://docs.rs/datafusion/latest/datafusion/index.html#design-goals
### Design Build vs. Big Up Front Design
Typically, the DataFusion community attacks large problems by solving them bit
by bit and refining a solution iteratively on the `main` branch as a series of
Pull Requests. This is different from projects which front-load the effort
with a more comprehensive design process.
By "advancing the front" the community always makes tangible progress, and the strategy is
especially effective in a project that relies on individual contributors who may
not have the time or resources to invest in a large upfront design effort.
However, this "bit by bit approach" doesn't always succeed, and sometimes we get
stuck or go down the wrong path and then change directions.
Our process necessarily results in imperfect solutions being the "state of the
code" in some cases, and larger visions are not yet fully realized. However, the
community is good at driving things to completion in the long run. If you see
something that needs improvement or an area that is not yet fully realized,
please consider submitting an issue or PR to improve it. We are always looking
for more contributions.