| --- |
| id: data-management-api |
| title: Data management API |
| sidebar_label: Data management |
| --- |
| |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| import Tabs from '@theme/Tabs'; |
| import TabItem from '@theme/TabItem'; |
| |
| This topic describes the data management API endpoints for Apache Druid. |
| This includes information on how to mark segments as used or unused and delete them from Druid. |
| |
| In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. |
| Replace it with the information for your deployment. |
| For example, use `http://localhost:8888` for quickstart deployments. |
| |
| :::info |
| - Coordinator APIs for data management are now deprecated. Use new APIs served by the Overlord instead. |
| - Do not use these APIs while an indexing task or kill task is in progress for the same datasource and interval. |
| ::: |
| |
| ## Segment management |
| |
| You can mark segments as used by sending POST requests to the datasource, but the Coordinator may subsequently mark segments as unused if they meet any configured [drop rules](../operations/rule-configuration.md#drop-rules). |
| Even if these API requests update segments to used, you still need to configure a [load rule](../operations/rule-configuration.md#load-rules) to load them onto Historical processes. |
| |
| When you use these APIs concurrently with an indexing task or a kill task, the behavior is undefined. |
| Druid might terminate some segments and mark others as used. |
| Furthermore, all segments could be marked unused while an indexing task can still read data from them and complete successfully. |
| |
| All of the following APIs, except [Segment deletion](#segment-deletion), are served by the Overlord because it is the service responsible for performing actions on segment metadata on behalf of indexing tasks. |
| This makes the Overlord the single source of truth for segment metadata, which ensures a consistent view across the Druid cluster and allows the Overlord to cache metadata to improve performance. |
| |
| ### Segment IDs |
| |
| You must provide segment IDs when using many of the endpoints described in this topic. |
| For information on segment IDs, see [Segment identification](../design/segments.md#segment-identification). |
| For information on finding segment IDs in the web console, see [Segments](../operations/web-console.md#segments). |
| |
| ### Mark a single segment unused |
| |
| Marks the state of a segment as unused, using the segment ID. |
| This is a "soft delete" of the segment from Historicals. |
| To undo this action, [mark the segment used](#mark-a-single-segment-as-used). |
| |
| Note that this endpoint returns an HTTP `200 OK` response code even if the segment ID or datasource doesn't exist. |
| Check the response payload to confirm if any segment was actually updated. |
| |
| #### URL |
| |
| `DELETE` `/druid/indexer/v1/datasources/{datasource}/segments/{segmentId}` |
| |
| #### Header |
| |
| The following headers are required for this request: |
| |
| ```json |
| Content-Type: application/json |
| Accept: application/json, text/plain |
| ``` |
| |
| #### Responses |
| |
| <Tabs> |
| |
| <TabItem value="1" label="200 SUCCESS"> |
| |
| |
| *Successfully updated segment* |
| |
| </TabItem> |
| </Tabs> |
| |
| --- |
| |
| #### Sample request |
| |
| The following example marks the segment `wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z` from datasource `wikipedia_hour` as unused. |
| |
| <Tabs> |
| |
| <TabItem value="2" label="cURL"> |
| |
| |
| ```shell |
| curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z" \ |
| --header 'Content-Type: application/json' \ |
| --header 'Accept: application/json, text/plain' |
| ``` |
| |
| </TabItem> |
| <TabItem value="3" label="HTTP"> |
| |
| |
| ```HTTP |
| DELETE /druid/indexer/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z HTTP/1.1 |
| Host: ROUTER_IP:ROUTER_PORT |
| Content-Type: application/json |
| Accept: application/json, text/plain |
| ``` |
| |
| </TabItem> |
| </Tabs> |
| |
| #### Sample response |
| |
| <details> |
| <summary>View the response</summary> |
| |
| ```json |
| { |
| "segmentStateChanged": true, |
| "numChangedSegments": 1 |
| } |
| ``` |
| </details> |
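
The note above about checking the response payload can be sketched in Python as follows. This is a minimal illustration, not part of the Druid API: the helper names `count_changed_segments` and `segment_was_updated` are hypothetical, and the sketch assumes the response body is JSON that carries the `numChangedSegments` field shown in the sample response.

```python
import json


def count_changed_segments(response_body: str) -> int:
    """Return the number of segments the request reported as changed."""
    payload = json.loads(response_body)
    # Assumption: treat a missing field as zero changed segments.
    return int(payload.get("numChangedSegments", 0))


def segment_was_updated(response_body: str) -> bool:
    """Return True only if at least one segment changed state."""
    return count_changed_segments(response_body) > 0


# Body copied from the sample response above.
sample_body = '{"segmentStateChanged": true, "numChangedSegments": 1}'
print(segment_was_updated(sample_body))
```

Because the endpoint returns `200 OK` even for a segment ID that doesn't exist, a payload check like this is one way to tell a real update from a no-op.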
| |
| ### Mark a single segment as used |
| |
| Marks the state of a segment as used, using the segment ID. |
| |
| #### URL |
| |
| `POST` `/druid/indexer/v1/datasources/{datasource}/segments/{segmentId}` |
| |
| #### Header |
| |
| The following headers are required for this request: |
| |
| ```json |
| Content-Type: application/json |
| Accept: application/json, text/plain |
| ``` |
| |
| #### Responses |
| |
| <Tabs> |
| |
| <TabItem value="4" label="200 SUCCESS"> |
| |
| |
| *Successfully updated segments* |
| |
| </TabItem> |
| </Tabs> |
| |
| --- |
| |
| #### Sample request |
| |
| The following example updates the segment with ID `wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z` to used. |
| |
| <Tabs> |
| |
| <TabItem value="5" label="cURL"> |
| |
| |
| ```shell |
| curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z" \ |
| --header 'Content-Type: application/json' \ |
| --header 'Accept: application/json, text/plain' |
| ``` |
| |
| </TabItem> |
| <TabItem value="6" label="HTTP"> |
| |
| |
| ```HTTP |
| POST /druid/indexer/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z HTTP/1.1 |
| Host: ROUTER_IP:ROUTER_PORT |
| Content-Type: application/json |
| Accept: application/json, text/plain |
| ``` |
| |
| </TabItem> |
| </Tabs> |
| |
| #### Sample response |
| |
| <details> |
| <summary>View the response</summary> |
| |
| ```json |
| { |
| "segmentStateChanged": true, |
| "numChangedSegments": 1 |
| } |
| ``` |
| </details> |
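
The two single-segment endpoints share a URL and differ only in the HTTP method: `DELETE` marks the segment unused and `POST` marks it used. The following Python sketch builds the method and URL for either operation without sending a request. The function name `single_segment_request` and the percent-encoding step are assumptions made for illustration; the examples in this topic pass the segment ID unencoded.

```python
from urllib.parse import quote


def single_segment_request(base_url, datasource, segment_id, used):
    """Return (method, url) for marking one segment used or unused."""
    # POST marks the segment used; DELETE marks it unused.
    method = "POST" if used else "DELETE"
    # Assumption: percent-encode path parts as a precaution, keeping ":"
    # so that typical segment IDs stay readable.
    path = "/druid/indexer/v1/datasources/{}/segments/{}".format(
        quote(datasource, safe=""),
        quote(segment_id, safe=":"),
    )
    return method, base_url.rstrip("/") + path


segment_id = (
    "wikipedia_hour_2015-09-12T18:00:00.000Z_"
    "2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z"
)
method, url = single_segment_request(
    "http://localhost:8888", "wikipedia_hour", segment_id, used=True
)
print(method, url)
```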
| |
| ### Mark a group of segments unused |
| |
| Marks the state of a group of segments as unused, using an array of segment IDs or an interval. |
| Pass the array of segment IDs or interval as a JSON object in the request body. |
| |
| For the interval, specify the start and end times as ISO 8601 strings to identify segments inclusive of the start time and exclusive of the end time. |
| Optionally, specify an array of segment versions along with the interval. |
| Druid updates only the segments that are completely contained within the specified interval and that match the optional list of versions; partially overlapping segments are not affected. |
| |
| #### URL |
| |
| `POST` `/druid/indexer/v1/datasources/{datasource}/markUnused` |
| |
| #### Request body |
| |
| The group of segments is sent as a JSON request payload that accepts the following properties: |
| |
| |Property|Description|Required|Example| |
| |--------|-----------|--------|-------| |
| |`interval`|ISO 8601 segments interval.|Yes, if `segmentIds` is not specified.|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`| |
| |`segmentIds`|List of segment IDs.|Yes, if `interval` is not specified.|`["segmentId1", "segmentId2"]`| |
| |`versions`|List of segment versions. Must be provided with `interval`.|No.|`["2024-03-14T16:00:04.086Z", "2024-03-12T16:00:04.086Z"]`| |
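
As a sketch of how the properties in the table combine, the following Python helper builds a request body and rejects the combinations that the table rules out. The function name `build_mark_payload` is hypothetical, and the checks cover only the rules stated in the table; Druid may apply additional validation on the server side.

```python
import json


def build_mark_payload(interval=None, segment_ids=None, versions=None):
    """Build a JSON body from an interval, segment IDs, and versions."""
    # The table requires `interval` or `segmentIds`.
    if interval is None and not segment_ids:
        raise ValueError("Specify either interval or segmentIds")
    # The table allows `versions` only together with `interval`.
    if versions is not None and interval is None:
        raise ValueError("versions must be provided together with interval")

    payload = {}
    if interval is not None:
        payload["interval"] = interval
    if segment_ids:
        payload["segmentIds"] = list(segment_ids)
    if versions is not None:
        payload["versions"] = list(versions)
    return json.dumps(payload)


# Interval-based request restricted to one segment version.
body = build_mark_payload(
    interval="2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z",
    versions=["2024-03-14T16:00:04.086Z"],
)
print(body)
```

The resulting string can be passed as the `--data` argument of a cURL request to this endpoint.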
| |
| #### Responses |
| |
| <Tabs> |
| |
| <TabItem value="7" label="200 SUCCESS"> |
| |
| |
| *Successfully updated segments* |
| |
| </TabItem> |
| <TabItem value="8" label="204 NO CONTENT"> |
| |
| |
| *Invalid datasource name* |
| |
| </TabItem> |
| <TabItem value="9" label="400 BAD REQUEST"> |
| |
| |
| *Invalid request payload* |
| |
| </TabItem> |
| </Tabs> |
| |
| --- |
| |
| #### Sample request |
| |
| The following example marks two segments from the `wikipedia_hour` datasource unused based on their segment IDs. |
| |
| <Tabs> |
| |
| <TabItem value="10" label="cURL"> |
| |
| |
| ```shell |
| curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/markUnused" \ |
| --header 'Content-Type: application/json' \ |
| --data '{ |
| "segmentIds": [ |
| "wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z", |
| "wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z" |
| ] |
| }' |
| ``` |
| |
| </TabItem> |
| <TabItem value="11" label="HTTP"> |
| |
| |
| ```HTTP |
| POST /druid/indexer/v1/datasources/wikipedia_hour/markUnused HTTP/1.1 |
| Host: ROUTER_IP:ROUTER_PORT |
| Content-Type: application/json |
| Content-Length: 230 |
| |
| { |
| "segmentIds": [ |
| "wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z", |
| "wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z" |
| ] |
| } |
| ``` |
| |
| </TabItem> |
| </Tabs> |
| |
| #### Sample response |
| |
| <details> |
| <summary>View the response</summary> |
| |
| ```json |
| { |
| "numChangedSegments": 2 |
| } |
| ``` |
| </details> |
| |
| ### Mark a group of segments used |
| |
| Marks the state of a group of segments as used, using an array of segment IDs or an interval. |
| Pass the array of segment IDs or interval as a JSON object in the request body. |
| |
| For the interval, specify the start and end times as ISO 8601 strings to identify segments inclusive of the start time and exclusive of the end time. |
| Optionally, specify an array of segment versions along with the interval. |
| Druid updates only the segments that are completely contained within the specified interval and that match the optional list of versions; partially overlapping segments are not affected. |
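
To illustrate the containment rule, the following Python sketch checks whether a segment's interval lies completely within a requested interval, treating both as inclusive of the start time and exclusive of the end time. It is one reading of the rule described above, not Druid's implementation, and the helper names are hypothetical. The sketch assumes that all timestamps use the same UTC `Z` form.

```python
from datetime import datetime
from typing import Tuple


def parse_timestamp(value: str) -> datetime:
    # Older Python versions of fromisoformat() reject a trailing "Z",
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_interval(interval: str) -> Tuple[datetime, datetime]:
    """Split an ISO 8601 'start/end' interval into two datetimes."""
    start, end = interval.split("/")
    return parse_timestamp(start), parse_timestamp(end)


def is_fully_contained(segment_interval: str, request_interval: str) -> bool:
    """Return True if the segment interval lies inside the request interval."""
    seg_start, seg_end = parse_interval(segment_interval)
    req_start, req_end = parse_interval(request_interval)
    # With half-open intervals, a segment that ends exactly at the
    # requested end time still counts as contained.
    return req_start <= seg_start and seg_end <= req_end


request = "2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"
inside = "2015-09-12T04:00:00.000Z/2015-09-12T05:00:00.000Z"
overlapping = "2015-09-12T04:00:00.000Z/2015-09-12T06:00:00.000Z"
print(is_fully_contained(inside, request))
print(is_fully_contained(overlapping, request))
```

Under this reading, the first segment would be updated and the second, which only partially overlaps the requested interval, would be left unchanged.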
| |
| #### URL |
| |
| `POST` `/druid/indexer/v1/datasources/{datasource}/markUsed` |
| |
| #### Request body |
| |
| The group of segments is sent as a JSON request payload that accepts the following properties: |
| |
| |Property|Description|Required|Example| |
| |--------|-----------|--------|-------| |
| |`interval`|ISO 8601 segments interval.|Yes, if `segmentIds` is not specified.|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`| |
| |`segmentIds`|List of segment IDs.|Yes, if `interval` is not specified.|`["segmentId1", "segmentId2"]`| |
| |`versions`|List of segment versions. Must be provided with `interval`.|No.|`["2024-03-14T16:00:04.086Z", "2024-03-12T16:00:04.086Z"]`| |
| |
| #### Responses |
| |
| <Tabs> |
| |
| <TabItem value="12" label="200 SUCCESS"> |
| |
| |
| *Successfully updated segments* |
| |
| </TabItem> |
| <TabItem value="13" label="204 NO CONTENT"> |
| |
| |
| *Invalid datasource name* |
| |
| </TabItem> |
| <TabItem value="14" label="400 BAD REQUEST"> |
| |
| |
| *Invalid request payload* |
| |
| </TabItem> |
| </Tabs> |
| |
| --- |
| |
| #### Sample request |
| |
| The following example marks two segments from the `wikipedia_hour` datasource used based on their segment IDs. |
| |
| <Tabs> |
| |
| <TabItem value="15" label="cURL"> |
| |
| |
| ```shell |
| curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour/markUsed" \ |
| --header 'Content-Type: application/json' \ |
| --data '{ |
| "segmentIds": [ |
| "wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z", |
| "wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z" |
| ] |
| }' |
| ``` |
| |
| </TabItem> |
| <TabItem value="16" label="HTTP"> |
| |
| |
| ```HTTP |
| POST /druid/indexer/v1/datasources/wikipedia_hour/markUsed HTTP/1.1 |
| Host: ROUTER_IP:ROUTER_PORT |
| Content-Type: application/json |
| Content-Length: 230 |
| |
| { |
| "segmentIds": [ |
| "wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z", |
| "wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z" |
| ] |
| } |
| ``` |
| |
| </TabItem> |
| </Tabs> |
| |
| #### Sample response |
| |
| <details> |
| <summary>View the response</summary> |
| |
| ```json |
| { |
| "numChangedSegments": 2 |
| } |
| ``` |
| </details> |
| |
| ### Mark all segments unused |
| |
| Marks the state of all segments of a datasource as unused. |
| This action performs a "soft delete" of the segments from Historicals. |
| |
| Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist. |
| Check the response payload to confirm if any segment was actually updated. |
| |
| #### URL |
| |
| `DELETE` `/druid/indexer/v1/datasources/{datasource}` |
| |
| #### Responses |
| |
| <Tabs> |
| |
| <TabItem value="17" label="200 SUCCESS"> |
| |
| |
| *Successfully updated segments* |
| |
| </TabItem> |
| </Tabs> |
| |
| --- |
| |
| #### Sample request |
| |
| The following example marks all segments of the `wikipedia_hour` datasource as unused. |
| |
| <Tabs> |
| |
| <TabItem value="18" label="cURL"> |
| |
| |
| ```shell |
| curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour" |
| ``` |
| |
| </TabItem> |
| <TabItem value="19" label="HTTP"> |
| |
| |
| ```HTTP |
| DELETE /druid/indexer/v1/datasources/wikipedia_hour HTTP/1.1 |
| Host: ROUTER_IP:ROUTER_PORT |
| ``` |
| |
| </TabItem> |
| </Tabs> |
| |
| #### Sample response |
| |
| <details> |
| <summary>View the response</summary> |
| |
| ```json |
| { |
| "numChangedSegments": 24 |
| } |
| ``` |
| </details> |
| |
| ### Mark all non-overshadowed segments used |
| |
| Marks all unused segments of a datasource as used, provided that they are not overshadowed by other segments. |
| The endpoint returns the number of changed segments. |
| |
| Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist. |
| Check the response payload to get the number of segments actually updated. |
| |
| #### URL |
| |
| `POST` `/druid/indexer/v1/datasources/{datasource}` |
| |
| #### Header |
| |
| The following headers are required for this request: |
| |
| ```json |
| Content-Type: application/json |
| Accept: application/json, text/plain |
| ``` |
| |
| #### Responses |
| |
| <Tabs> |
| |
| <TabItem value="20" label="200 SUCCESS"> |
| |
| |
| *Successfully updated segments* |
| |
| </TabItem> |
| </Tabs> |
| |
| --- |
| |
| #### Sample request |
| |
| The following example updates all unused segments of `wikipedia_hour` to used. |
| `wikipedia_hour` contains one unused segment eligible to be marked as used. |
| |
| <Tabs> |
| |
| <TabItem value="21" label="cURL"> |
| |
| |
| ```shell |
| curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/datasources/wikipedia_hour" \ |
| --header 'Content-Type: application/json' \ |
| --header 'Accept: application/json, text/plain' |
| ``` |
| |
| </TabItem> |
| <TabItem value="22" label="HTTP"> |
| |
| |
| ```HTTP |
| POST /druid/indexer/v1/datasources/wikipedia_hour HTTP/1.1 |
| Host: ROUTER_IP:ROUTER_PORT |
| Content-Type: application/json |
| Accept: application/json, text/plain |
| ``` |
| |
| </TabItem> |
| </Tabs> |
| |
| #### Sample response |
| |
| <details> |
| <summary>View the response</summary> |
| |
| ```json |
| { |
| "numChangedSegments": 1 |
| } |
| ``` |
| </details> |
| |
| ## Segment deletion |
| |
| ### Permanently delete segments |
| |
| The DELETE endpoint sends a [kill task](../ingestion/tasks.md) for a given interval and datasource. |
| The interval value is an ISO 8601 interval in which the start and end times are delimited by `_` instead of `/`. |
| This request permanently deletes all metadata for unused segments and removes them from deep storage. |
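
The `_` delimiter takes the place of the `/` in a standard ISO 8601 interval so that the interval fits in a single URL path segment. The following Python sketch shows that conversion. `kill_task_url` is a hypothetical helper name, and the function only builds the URL without sending the request.

```python
def kill_task_url(base_url, datasource, interval):
    """Build the segment deletion URL from a standard 'start/end' interval."""
    start, end = interval.split("/")
    # The endpoint expects the interval as 'start_end' in the URL path.
    return "{}/druid/coordinator/v1/datasources/{}/intervals/{}_{}".format(
        base_url.rstrip("/"), datasource, start, end
    )


print(kill_task_url("http://localhost:8888", "wikipedia_hour", "2015-09-12/2015-09-13"))
```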
| |
| Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist. |
| |
| This endpoint supersedes the deprecated endpoint: `DELETE /druid/coordinator/v1/datasources/{datasource}?kill=true&interval={interval}` |
| |
| #### URL |
| |
| `DELETE` `/druid/coordinator/v1/datasources/{datasource}/intervals/{interval}` |
| |
| #### Responses |
| |
| <Tabs> |
| |
| <TabItem value="23" label="200 SUCCESS"> |
| |
| |
| *Successfully sent kill task* |
| |
| </TabItem> |
| </Tabs> |
| |
| --- |
| |
| #### Sample request |
| |
| The following example sends a kill task to permanently delete unused segments of the datasource `wikipedia_hour` in the interval `2015-09-12` to `2015-09-13`. |
| |
| <Tabs> |
| |
| <TabItem value="24" label="cURL"> |
| |
| |
| ```shell |
| curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour/intervals/2015-09-12_2015-09-13" |
| ``` |
| |
| </TabItem> |
| <TabItem value="25" label="HTTP"> |
| |
| |
| ```HTTP |
| DELETE /druid/coordinator/v1/datasources/wikipedia_hour/intervals/2015-09-12_2015-09-13 HTTP/1.1 |
| Host: ROUTER_IP:ROUTER_PORT |
| ``` |
| |
| </TabItem> |
| </Tabs> |
| |
| #### Sample response |
| |
| A successful request returns an HTTP `200 OK` and an empty response body. |