| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "id": "191f31b2", |
| "metadata": {}, |
| "source": [ |
| "# Spark Integration Example\n", |
| "\n", |
| "This notebook demonstrates how to connect to Spark through Spark Connect and inspect Iceberg tables with Spark SQL.\n", |
| "\n", |
| "## Prerequisites\n", |
| "\n", |
| "**⚠️ This notebook requires the integration test infrastructure to be running.**\n", |
| "\n", |
| "To start the infrastructure, use one of these commands:\n", |
| "- `make test-integration-setup` - Start just the infrastructure\n", |
| "- `make notebook-infra` - Start infrastructure and launch JupyterLab\n", |
| "\n", |
| "The infrastructure includes:\n", |
| "- Spark Connect server (port 15002)\n", |
| "- Iceberg REST catalog\n", |
| "- S3-compatible storage (MinIO)" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "c6cc20c0", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Import required libraries\n", |
| "from pyspark.sql import SparkSession" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "8c1a3fad", |
| "metadata": {}, |
| "source": [ |
| "## Connecting to Spark\n", |
| "\n", |
| "Connect to the running Spark Connect server (port 15002) to create a remote `SparkSession`." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "28bf42fc", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Create a SparkSession against the remote Spark Connect server\n", |
| "spark = SparkSession.builder.remote(\"sc://localhost:15002\").getOrCreate()\n", |
| "\n", |
| "# List the catalogs visible to this session\n", |
| "spark.sql(\"SHOW CATALOGS\").show()" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "550f2a5c", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Show the namespaces (databases) in the current catalog\n", |
| "spark.sql(\"SHOW NAMESPACES\").show()" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "9971fd4a", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Show tables in the default namespace\n", |
| "spark.sql(\"SHOW TABLES FROM default\").show()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "id": "2a8d3463", |
| "metadata": {}, |
| "source": [ |
| "## Exploring Iceberg Tables\n", |
| "\n", |
| "Use Spark SQL commands to explore Iceberg table structure and metadata." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "id": "1b9d7dba", |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Describe the columns and data types of default.test_all_types\n", |
| "spark.sql(\"DESCRIBE TABLE default.test_all_types\").show()" |
| ] |
| } |
| ], |
| "metadata": { |
| "kernelspec": { |
| "display_name": "Python 3 (ipykernel)", |
| "language": "python", |
| "name": "python3" |
| }, |
| "language_info": { |
| "codemirror_mode": { |
| "name": "ipython", |
| "version": 3 |
| }, |
| "file_extension": ".py", |
| "mimetype": "text/x-python", |
| "name": "python", |
| "nbconvert_exporter": "python", |
| "pygments_lexer": "ipython3", |
| "version": "3.12.9" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 5 |
| } |