{
"cells": [
{
"cell_type": "markdown",
"id": "191f31b2",
"metadata": {},
"source": [
"# Spark Integration Example\n",
"\n",
"This notebook demonstrates how to connect to a Spark Connect server and explore Iceberg tables with Spark SQL.\n",
"\n",
"## Prerequisites\n",
"\n",
"**⚠️ This notebook requires the integration test infrastructure to be running.**\n",
"\n",
"To start the infrastructure, use one of these commands:\n",
"- `make test-integration-setup` - Start just the infrastructure\n",
"- `make notebook-infra` - Start infrastructure and launch JupyterLab\n",
"\n",
"The infrastructure includes:\n",
"- Spark Connect server (port 15002)\n",
"- Iceberg REST catalog\n",
"- S3-compatible storage (MinIO)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6cc20c0",
"metadata": {},
"outputs": [],
"source": [
"# Import required libraries\n",
"from pyspark.sql import SparkSession"
]
},
{
"cell_type": "markdown",
"id": "8c1a3fad",
"metadata": {},
"source": [
"## Connecting to Spark\n",
"\n",
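"Create a `SparkSession` against the Spark Connect server on port 15002, then list the available catalogs, namespaces, and tables to confirm the connection works."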
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28bf42fc",
"metadata": {},
"outputs": [],
"source": [
"# Create SparkSession against the remote Spark Connect server\n",
"spark = SparkSession.builder.remote(\"sc://localhost:15002\").getOrCreate()\n",
"# List the available catalogs to verify the connection\n",
"spark.sql(\"SHOW CATALOGS\").show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "550f2a5c",
"metadata": {},
"outputs": [],
"source": [
"# Show available namespaces/databases\n",
"spark.sql(\"SHOW NAMESPACES\").show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9971fd4a",
"metadata": {},
"outputs": [],
"source": [
"# Show tables in the default namespace\n",
"spark.sql(\"SHOW TABLES FROM default\").show()"
]
},
{
"cell_type": "markdown",
"id": "2a8d3463",
"metadata": {},
"source": [
"## Exploring Iceberg Tables\n",
"\n",
"Use Spark SQL commands to explore Iceberg table structure and metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b9d7dba",
"metadata": {},
"outputs": [],
"source": [
"# Describe the columns and data types of default.test_all_types\n",
"spark.sql(\"DESCRIBE TABLE default.test_all_types\").show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}