 ---
 title: "Python API"
 url: python-api-intro
 aliases:
     - "python/api-intro"
 menu:
     main:
         parent: "API"
         weight: 500
 ---
 <!--
  - Licensed to the Apache Software Foundation (ASF) under one or more
  - contributor license agreements.  See the NOTICE file distributed with
  - this work for additional information regarding copyright ownership.
  - The ASF licenses this file to You under the Apache License, Version 2.0
  - (the "License"); you may not use this file except in compliance with
  - the License.  You may obtain a copy of the License at
  -
  -   http://www.apache.org/licenses/LICENSE-2.0
  -
  - Unless required by applicable law or agreed to in writing, software
  - distributed under the License is distributed on an "AS IS" BASIS,
  - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  - See the License for the specific language governing permissions and
  - limitations under the License.
  -->

 # Iceberg Python API

 Much of the Python API conforms to the Java API. See the [Java API documentation](../api) for more detail.

 ## Catalog

 The Catalog interface, like its Java counterpart, provides search and management operations for tables.

 To create a catalog:

 ``` python
 from iceberg.hive import HiveTables

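 # Instantiate Hive Tables. hms_host, hms_port, and tmpdir are placeholders
 # that must be defined for your environment before this runs.
 conf = {"hive.metastore.uris": f"thrift://{hms_host}:{hms_port}",
         "hive.metastore.warehouse.dir": tmpdir}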
 tables = HiveTables(conf)
 ```

 To create a table from a catalog:

 ``` python
 from iceberg.api.schema import Schema
 from iceberg.api.types import TimestampType, DoubleType, StringType, NestedField
 from iceberg.api.partition_spec import PartitionSpecBuilder

 schema = Schema(NestedField.optional(1, "DateTime", TimestampType.with_timezone()),
                 NestedField.optional(2, "Bid", DoubleType.get()),
                 NestedField.optional(3, "Ask", DoubleType.get()),
                 NestedField.optional(4, "symbol", StringType.get()))
 partition_spec = PartitionSpecBuilder(schema).add(1, 1000, "DateTime_day", "day").build()

 tables.create(schema, "test.test_123", partition_spec)
 ```


 ## Tables

 The Table interface provides access to table metadata:

 + schema returns the current table `Schema`
 + spec returns the current table `PartitionSpec`
 + properties returns a map of key-value `TableProperties`
 + current_snapshot returns the current table `Snapshot`
 + snapshots returns all valid snapshots for the table
 + snapshot(id) returns a specific snapshot by ID
 + location returns the table’s base location

 Tables also provide `refresh` to update the table to the latest version.

 ### Scanning
 Iceberg table scans start by creating a `TableScan` object with `new_scan`.

 ``` python
 scan = table.new_scan()
 ```

 To configure a scan, call `filter` and `select` on the `TableScan` to get a new `TableScan` with those changes.

 ``` python
 from iceberg.api.expressions import Expressions

 filtered_scan = scan.filter(Expressions.equal("id", 5))
 ```

 String expressions can also be passed to the filter method.

 ``` python
 filtered_scan = scan.filter("id=5")
 ```

 `Schema` projections can be applied to a `TableScan` by passing a list of column names to `select`.

 ``` python
 filtered_scan = scan.select(["col_1", "col_2", "col_3"])
 ```

 Because some data types cannot be read using the Python library, a convenience method for excluding columns from the projection is provided.

 ``` python
 filtered_scan = scan.select_except(["unsupported_col_1", "unsupported_col_2"])
 ```


 Calls to configuration methods create a new `TableScan` so that each `TableScan` is immutable.
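
The immutability described above can be illustrated with a small stand-alone sketch. `SketchScan` is a hypothetical stand-in invented for this example, not the library's `TableScan`; it only shows how each configuration call can return a new object while leaving the original untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchScan:
    """Hypothetical stand-in for a scan; not part of the iceberg package."""
    row_filter: Optional[str] = None
    columns: Tuple[str, ...] = ()

    def filter(self, expr: str) -> "SketchScan":
        # Return a copy with the filter set; self is left unchanged.
        return replace(self, row_filter=expr)

    def select(self, columns) -> "SketchScan":
        # Return a copy with the projection set; self is left unchanged.
        return replace(self, columns=tuple(columns))


base = SketchScan()
configured = base.filter("id=5").select(["id", "data"])
```

Because `base` is never modified, it can be reused as the starting point for several differently configured scans.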

 When a scan is configured, `plan_files`, `plan_tasks`, and `schema` are used to return files, tasks, and the read projection.

 ``` python
 scan = table.new_scan() \
     .filter("id=5") \
     .select(["id", "data"])

 projection = scan.schema
 for task in scan.plan_tasks():
     print(task)
 ```

 ## Types

 Iceberg data types are located in `iceberg.api.types.types`.

 ### Primitives

 Primitive type instances are available from static methods in each type class. Types without parameters use `get`, and types like `DecimalType` use factory methods:

 ```python
 IntegerType.get()    # int
 DoubleType.get()     # double
 DecimalType.of(9, 2) # decimal(9, 2)
 ```

 ### Nested types
 Structs, maps, and lists are created using factory methods in type classes.

 Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](https://iceberg.apache.org/evolution/#correctness) and nullability.

 Struct fields are created using `NestedField.optional` or `NestedField.required`. Map value and list element nullability is set in the map and list factory methods.

 ```python
 # struct<1 id: int, 2 data: optional string>
 struct = StructType.of([NestedField.required(1, "id", IntegerType.get()),
                         NestedField.optional(2, "data", StringType.get())])
 ```
 ```python
 # map<1 key: int, 2 value: optional string>
 map_var = MapType.of_optional(1, IntegerType.get(),
                               2, StringType.get())
 ```
 ```python
 # array<1 element: int>
 list_var = ListType.of_required(1, IntegerType.get())
 ```
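
To make the field ID and nullability bookkeeping concrete, here is a minimal stand-alone sketch. `SketchField` and `describe` are hypothetical names invented for this illustration and are not part of the `iceberg` package; the sketch only reproduces the notation used in the comments of the examples above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SketchField:
    """Hypothetical nested field: an ID, a name, a type name, and nullability."""
    field_id: int
    name: str
    type_name: str
    required: bool

    def __str__(self) -> str:
        prefix = "" if self.required else "optional "
        return f"{self.field_id} {self.name}: {prefix}{self.type_name}"


def describe(kind: str, fields: List[SketchField]) -> str:
    # Render a nested type in the same notation as the comments above.
    return f"{kind}<" + ", ".join(str(f) for f in fields) + ">"


struct_fields = [SketchField(1, "id", "int", True),
                 SketchField(2, "data", "string", False)]
# Map keys are always required; only the value carries a nullability choice.
map_fields = [SketchField(1, "key", "int", True),
              SketchField(2, "value", "string", False)]
# A list has a single nested field for its element.
list_fields = [SketchField(1, "element", "int", True)]

print(describe("struct", struct_fields))
print(describe("map", map_fields))
print(describe("array", list_fields))
```

Each nested position, whether a struct field, a map key or value, or a list element, gets its own ID, which is what allows a column to be renamed or reordered without being confused with another.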

 ## Expressions
 Iceberg’s `Expressions` are used to configure table scans. To create `Expressions`, use the factory methods in `Expressions`.

 Supported `Predicate` expressions are:

 + `is_null`
 + `not_null`
 + `equal`
 + `not_equal`
 + `less_than`
 + `less_than_or_equal`
 + `greater_than`
 + `greater_than_or_equal`

 Supported expression `Operations` are:

 + `and`
 + `or`
 + `not`

 Constant expressions are:

 + `always_true`
 + `always_false`
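
As a rough illustration of how predicates, operations, and constants compose, the sketch below models an expression as a plain function of a row. It is a conceptual stand-in only: the library builds expression objects rather than closures, and every name here (including the trailing underscores in `and_`, `or_`, and `not_`, used to avoid Python keywords) is an assumption made for this example.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Expr = Callable[[Row], bool]  # in this sketch, an expression is a function of a row


def is_null(name: str) -> Expr:
    return lambda row: row.get(name) is None


def equal(name: str, value: Any) -> Expr:
    return lambda row: row.get(name) == value


def greater_than(name: str, value: Any) -> Expr:
    # A null value never satisfies a comparison in this sketch.
    return lambda row: row.get(name) is not None and row[name] > value


def and_(left: Expr, right: Expr) -> Expr:
    return lambda row: left(row) and right(row)


def or_(left: Expr, right: Expr) -> Expr:
    return lambda row: left(row) or right(row)


def not_(child: Expr) -> Expr:
    return lambda row: not child(row)


def always_true() -> Expr:
    return lambda row: True


# symbol == "ABC" AND (Bid > 10.0 OR Ask IS NULL)
expr = and_(equal("symbol", "ABC"),
            or_(greater_than("Bid", 10.0), is_null("Ask")))

rows = [
    {"symbol": "ABC", "Bid": 12.5, "Ask": 12.6},
    {"symbol": "ABC", "Bid": 9.0, "Ask": None},
    {"symbol": "XYZ", "Bid": 12.5, "Ask": None},
]
matches = [row for row in rows if expr(row)]
```

Here `matches` keeps the first two rows: both have the symbol `ABC`, and each satisfies one branch of the `or_`.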