commit | 28836dfa71d0bf9baac5512fb86bd40c5ea51c46 | |
---|---|---|
author | Gian Merlino <gianmerlino@gmail.com> | Sun Aug 14 23:34:36 2022 -0700 |
committer | GitHub <noreply@github.com> | Sun Aug 14 23:34:36 2022 -0700 |
tree | 2d34606b833b6ad83e893203c6e1ab1d5197cd2f | |
parent | 6c5a43106a1f7d0b0258b943c51b93c52563e9ed | |

Fix race in TaskQueue.notifyStatus. (#12901)

* Fix race in TaskQueue.notifyStatus.

It was possible for manageInternal to relaunch a task while it was being cleaned up, due to a race that happens when notifyStatus is called to clean up a successful task:

1) In a critical section, notifyStatus removes the task from "tasks".
2) Outside a critical section, notifyStatus calls taskRunner.shutdown to let the task runner know it can clear out its data structures.
3) In a critical section, syncFromStorage adds the task back to "tasks", because it is still present in metadata storage.
4) In a critical section, manageInternalCritical notices that the task is in "tasks" and is not running in the taskRunner, so it launches it again.
5) In a (different) critical section, notifyStatus updates the metadata store to set the task status to SUCCESS.
6) The task continues running even though it should not be.

The possibility for this race was introduced in #12099, which shrunk the critical section in notifyStatus. Prior to that patch, a single critical section encompassed (1), (2), and (5), so the ordering above was not possible.

This patch does the following:

1) Fixes the race by adding a recentlyCompletedTasks set that prevents the main management loop from doing anything with tasks that are currently being cleaned up.
2) Switches the order of the critical sections in notifyStatus, so metadata store updates happen first. This is useful in case of server failures: it ensures that if the Overlord fails in the midst of notifyStatus, then completed-task statuses are still available in ZK or on MMs for the next Overlord. (Those are cleaned up by taskRunner.shutdown, which formerly ran first.) This isn't related to the race described above, but is fixed opportunistically as part of the same patch.
3) Changes the "tasks" list to a map. Many operations require retrieval or removal of individual tasks; those are now O(1) instead of O(N) in the number of running tasks.
4) Changes various log messages to use task ID instead of full task payload, to make the logs more readable.

* Fix format string.

* Update comment.

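
The fix described in this commit message can be illustrated with a minimal Java sketch. This is not Druid's actual code: the class `TaskQueueSketch`, the string-based metadata store, the `runningInRunner` set, the `launches` counter, and the `interleaved` hook are all hypothetical stand-ins chosen for illustration. Only the names `tasks`, `recentlyCompletedTasks`, `notifyStatus`, `syncFromStorage`, and `manageInternalCritical` come from the message. The sketch assumes the cleanup order is: mark the task as recently completed, update the metadata store, shut the task down in the runner, then remove it from `tasks`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the queue described above; not Druid's actual code.
class TaskQueueSketch {
  private final Object giant = new Object();                           // one lock for every "critical section"
  private final Map<String, String> tasks = new LinkedHashMap<>();     // task ID -> payload (a map, so O(1) lookup)
  private final Set<String> recentlyCompletedTasks = new HashSet<>();  // tasks currently being cleaned up
  private final Map<String, String> metadataStore = new HashMap<>();   // hypothetical store: task ID -> status
  private final Set<String> runningInRunner = new HashSet<>();         // hypothetical task runner state
  int launches = 0;                                                    // how many times any task was launched

  void add(String id, String payload) {
    synchronized (giant) {
      tasks.put(id, payload);
      metadataStore.put(id, "RUNNING");
    }
  }

  // Re-adds any task that metadata storage still reports as active.
  void syncFromStorage() {
    synchronized (giant) {
      for (Map.Entry<String, String> e : metadataStore.entrySet()) {
        if ("RUNNING".equals(e.getValue())) {
          tasks.putIfAbsent(e.getKey(), "payload reloaded from storage");
        }
      }
    }
  }

  // Launches tracked tasks that are not running, skipping tasks being cleaned up.
  void manageInternalCritical() {
    synchronized (giant) {
      for (String id : tasks.keySet()) {
        if (recentlyCompletedTasks.contains(id)) {
          continue; // the guard: never touch a task that notifyStatus is cleaning up
        }
        if (runningInRunner.add(id)) {
          launches++;
        }
      }
    }
  }

  boolean isTracked(String id) {
    synchronized (giant) {
      return tasks.containsKey(id);
    }
  }

  // 'interleaved' is a hypothetical hook that simulates the management loop
  // running in the middle of cleanup.
  void notifyStatus(String id, String status, Runnable interleaved) {
    synchronized (giant) {
      if (!tasks.containsKey(id)) {
        return;
      }
      recentlyCompletedTasks.add(id);
    }
    synchronized (giant) {
      metadataStore.put(id, status); // metadata store update happens first
    }
    // Stand-in for taskRunner.shutdown. In the scenario above this call happens
    // outside a critical section; the lock here only keeps the sketch thread-safe.
    synchronized (giant) {
      runningInRunner.remove(id);
    }
    interleaved.run(); // management loop may run here
    synchronized (giant) {
      tasks.remove(id);
      recentlyCompletedTasks.remove(id);
    }
  }

  public static void main(String[] args) {
    TaskQueueSketch q = new TaskQueueSketch();
    q.add("t1", "payload");
    q.manageInternalCritical(); // first (legitimate) launch
    q.notifyStatus("t1", "SUCCESS", () -> {
      q.syncFromStorage();
      q.manageInternalCritical();
    });
    System.out.println("launches = " + q.launches); // prints "launches = 1"
  }
}
```

In this sketch, the window between the runner shutdown and the removal from `tasks` is exactly where a relaunch could occur without the guard: the task is still tracked but no longer running. The `recentlyCompletedTasks` check makes the management loop skip it, so the launch count stays at one.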
Druid is a high-performance, real-time analytics database. Druid's main value add is to reduce time to insight and action.
Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.
You can get started with Druid with our local or Docker quickstart.
Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console.
Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.
Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. Each view is powered by SQL system tables, so you can see the underlying query behind it.
Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most of Druid.
You can find the documentation for the latest Druid release on the project website.
If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.
Community support is available on the druid-user mailing list, which is hosted at Google Groups.
Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.
Chat with Druid committers and users in real time on the Apache Druid Slack channel. Please use this invitation link to join and invite others.
Please note that JDK 8 or JDK 11 is required to build Druid.
See the latest build guide for instructions on building Apache Druid from source.
Please follow the community guidelines for contributing.
For instructions on setting up IntelliJ, see dev/intellij-setup.md.