---
{
"title": "HTTP",
"language": "en",
"description": "The HTTP table-valued function (TVF) lets users read file content over HTTP as if it were relational table data."
}
---
The HTTP table-valued function (TVF) lets users read file content over HTTP as if it were relational table data. It currently supports the `csv`, `csv_with_names`, `csv_with_names_and_types`, `json`, `parquet`, and `orc` file formats.
:::note
Supported since 4.0.2
:::
## Syntax
```sql
HTTP(
"uri" = "<uri>",
"format" = "<format>"
[, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
)
```
### Required Parameters
| Parameter | Description |
|-------------------|------------------------------|
| uri | The HTTP address to access. Supports the `http`, `https`, and `hf` protocols. |
| format | File format. Supported values: `csv`, `csv_with_names`, `csv_with_names_and_types`, `json`, `parquet`, `orc`. |
For `hf://` (Hugging Face) URIs, see [Analyzing Hugging Face Data](../../../lakehouse/huggingface.md).
### Optional Parameters
| Parameter | Description | Notes |
|-------|-----------|------------------------|
| `http.header.xxx` | Specifies an arbitrary HTTP header, which is passed directly to the HTTP client. | For example, `"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."` results in the header `Authorization: Bearer hf_MWYzOJJoZEymb...`. |
| `http.enable.range.request` | Whether to use range requests to access the HTTP service. | Default: `true`. |
| `http.max.request.size.bytes` | Maximum amount of data that can be read in non-range request mode. | Default: 100 MB. |
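As a sketch of how `http.header.xxx` might be used to reach a protected endpoint, the query below attaches an `Authorization` header. The URL and token are hypothetical placeholders, not real resources; substitute your own endpoint and credential.

```sql
-- Hypothetical URL and token: replace them with a real endpoint and credential.
-- "http.header.Authorization" is sent to the server as:
--   Authorization: Bearer <your_token>
SELECT * FROM
HTTP(
    "uri" = "https://example.com/private/data.parquet",
    "format" = "parquet",
    "http.header.Authorization" = "Bearer <your_token>"
)
LIMIT 10;
```

Because any key with the `http.header.` prefix is forwarded as a header, additional headers should be settable the same way by adding more `http.header.xxx` properties.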
When `http.enable.range.request` is `true`, the system first tries to access the HTTP service with range requests. If the service does not support range requests, the system automatically falls back to non-range request mode, in which the maximum amount of data that can be read is limited by `http.max.request.size.bytes`.
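The following is a minimal sketch, assuming a server that is known not to support range requests. It skips the range-request attempt and raises the non-range size limit to 200 MiB (200 × 1024 × 1024 = 209715200 bytes). The URL is a hypothetical placeholder, and the byte value is assumed to be passed as a quoted string like the other properties on this page.

```sql
-- Hypothetical URL: a server assumed not to support HTTP range requests.
-- Disable range requests and allow up to 200 MiB to be read in non-range mode.
SELECT COUNT(*) FROM
HTTP(
    "uri" = "https://example.com/exports/large.csv",
    "format" = "csv",
    "column_separator" = ",",
    "http.enable.range.request" = "false",
    "http.max.request.size.bytes" = "209715200"
);
```

Leaving `http.enable.range.request` at its default is usually sufficient, since the fallback happens automatically; disabling it explicitly only avoids the initial range-request attempt.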
## Examples
- Read CSV data from GitHub
```sql
SELECT COUNT(*) FROM
HTTP(
"uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
"format" = "csv",
"column_separator" = ","
);
```
- Access Parquet data from GitHub
```sql
SELECT arr_map, id FROM
HTTP(
"uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
"format" = "parquet"
);
```
- Inspect the schema of JSON data from GitHub with `DESC FUNCTION`
```sql
DESC FUNCTION
HTTP(
"uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
"format" = "json",
"strip_outer_array" = "true"
);
```