HTTP table-valued-function (TVF) allows users to read data returned from any HTTP endpoint as if accessing relational table format data. As long as the returned data is in a supported format, it can be queried and analyzed directly via SQL. Currently supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc data formats.
Typical use cases include:
:::note Supported since version 4.0.2. :::
HTTP( "uri" = "<uri>", "format" = "<format>" [, "<optional_property_key>" = "<optional_property_value>" [, ...] ] )
| Parameter | Description |
|---|---|
| uri | The HTTP address to access. Supports http, https, and hf protocols. Can be a URL to a data file or an API endpoint that returns data. |
| format | Data format, i.e., how the content returned by the HTTP endpoint is parsed. Supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc. |
For hf:// (Hugging Face), please refer to Analyzing Hugging Face Data.
| Parameter | Description | Notes |
|---|---|---|
http.header.xxx | Used to specify arbitrary HTTP Headers, which are passed directly to the HTTP Client. | e.g., "http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb...", the resulting Header will be Authorization: Bearer hf_MWYzOJJoZEymb... |
http.enable.range.request | Whether to use range requests to access the HTTP service. Default is true. | |
http.max.request.size.bytes | Maximum access size limit when using non-range request mode. Default is 100 MB. |
When http.enable.range.request is true, the system will first attempt to access the HTTP service using range requests. If the HTTP service does not support range requests, it will automatically fall back to non-range request mode. The maximum data access size is limited by http.max.request.size.bytes.
Read CSV data from GitHub
SELECT COUNT(*) FROM HTTP( "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv", "format" = "csv", "column_separator" = "," );
Read Parquet data from GitHub
SELECT arr_map, id FROM HTTP( "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet", "format" = "parquet" );
Read JSON data from GitHub and use DESC FUNCTION to view the schema
DESC FUNCTION HTTP( "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json", "format" = "json", "strip_outer_array" = "true" );
You can use the HTTP table function to directly query HTTP API endpoints that return JSON-formatted data. For example, querying a REST API that returns JSON data:
SELECT * FROM HTTP( "uri" = "https://api.example.com/v1/data", "format" = "json", "http.header.Authorization" = "Bearer your_token", "strip_outer_array" = "true" );
:::tip For HTTP API endpoints that do not support Range Requests, the system will automatically fall back to non-Range Request mode. You can manually disable Range Requests via the http.enable.range.request parameter. :::