blob: 3bbdb0aa7d38f6a2b5f30ad144b8f9eb78bf76cb [file] [view]
---
title: Load Data Using the File Protocol
---
# Load Data into Apache Cloudberry Using the `file://` Protocol
The `file://` protocol is a Apache Cloudberry-specific protocol that allows you to load data from a local segment host server file into Apache Cloudberry.
The `file://` protocol is used in a URI that specifies the location of an operating system file. External tables that you create that specify the `file://` protocol are read-only tables.
The URI includes the host name, port, and path to the file. Each file must locate on a segment host in a location accessible by the Apache Cloudberry superuser (`gpadmin`). The host name used in the URI must match a segment host name registered in the `gp_segment_configuration` system catalog table.
The `LOCATION` clause can have multiple URIs, as shown in [Usage examples](#usage-examples).
The number of URIs you specify in the `LOCATION` clause is the number of segment instances that will work in parallel to access the external table. For each URI, Apache Cloudberry assigns a primary segment on the specified host to the file. For maximum parallelism when loading data, divide the data into as many equally sized files as you have primary segments. This ensures that all segments participate in the load. The number of external files per segment host cannot exceed the number of primary segment instances on that host. For example, if your array has 4 primary segment instances per segment host, you can place 4 external files on each segment host. Tables based on the `file://` protocol can only be readable tables.
The system view `pg_max_external_files` shows how many external table files are permitted per external table. This view lists the available file slots per segment host when using the `file://` protocol. The view is only applicable for the `file://` protocol. For example:
```sql
SELECT * FROM pg_max_external_files;
```
## Usage examples
Load multiple files in CSV format with header rows:
Creates a readable external table, `ext_expenses`, using the `file` protocol. The files are CSV format and have a header row.
```sql
=# CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('file://filehost/data/international/*',
'file://filehost/data/regional/*',
'file://filehost/data/supplement/*.csv')
FORMAT 'CSV' (HEADER);
```
```sql
CREATE EXTERNAL TABLE ext_expenses (
name text, date date, amount float4, category text, desc1 text )
LOCATION ('file://host1:5432/data/expense/*.csv',
'file://host2:5432/data/expense/*.csv',
'file://host3:5432/data/expense/*.csv')
FORMAT 'CSV' (HEADER);
```