Spark-Pinot connector to read data from Pinot.

Detailed read model documentation is available here: Spark-Pinot Connector Read Model
```scala
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession
  .builder()
  .appName("spark-pinot-connector-test")
  .master("local")
  .getOrCreate()

import spark.implicits._

val data = spark.read
  .format("pinot")
  .option("table", "airlineStats")
  .option("tableType", "offline")
  .load()
  .filter($"DestStateName" === "Florida")

data.show(100)
```
More examples are included in src/test/scala/.../ExampleSparkPinotConnectorTest.scala. You can run the examples locally (e.g. using your IDE) in standalone mode by first starting a local Pinot cluster. See: https://docs.pinot.apache.org/basics/getting-started/running-pinot-locally
You can also run the tests in cluster mode using the following command:
```shell
export SPARK_CLUSTER=<YOUR_YARN_OR_SPARK_CLUSTER>

# Edit ExampleSparkPinotConnectorTest to remove `.master("local")` and
# rebuild the jar before running this command
spark-submit \
  --class org.apache.pinot.connector.spark.v3.datasource.ExampleSparkPinotConnectorTest \
  --jars ./target/pinot-spark-3-connector-0.13.0-SNAPSHOT-shaded.jar \
  --master $SPARK_CLUSTER \
  --deploy-mode cluster \
  ./target/pinot-spark-3-connector-0.13.0-SNAPSHOT-tests.jar
```
The Spark-Pinot connector is built on the Spark DataSourceV2 API. For background on the DataSourceV2 API, please check the Databricks presentation.
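As a rough illustration of the DataSourceV2 reading model the connector builds on: a scan is planned as a set of input partitions, and each partition is consumed by its own reader, which is what lets Spark executors read from Pinot servers in parallel. The sketch below uses mock traits that mirror only the *shape* of Spark's `InputPartition`/`PartitionReader` contract, not the real `org.apache.spark.sql.connector` interfaces, and the segment names are purely hypothetical:

```scala
// Conceptual sketch only: mock traits mirroring the shape of Spark's
// DataSourceV2 read path, not the real Spark interfaces.
trait InputPartition
trait PartitionReader[T] {
  def next(): Boolean
  def get(): T
}

// Hypothetical Pinot-style partition, e.g. one per server/segment group.
case class SegmentPartition(rows: Seq[String]) extends InputPartition

class SegmentReader(partition: SegmentPartition) extends PartitionReader[String] {
  private var i = -1
  def next(): Boolean = { i += 1; i < partition.rows.length }
  def get(): String = partition.rows(i)
}

object Demo {
  // Plan a scan as partitions, then drain each reader independently,
  // the way Spark executors would in parallel.
  def run(): String = {
    val partitions = Seq(
      SegmentPartition(Seq("row1", "row2")),
      SegmentPartition(Seq("row3"))
    )
    val rows = partitions.flatMap { p =>
      val reader = new SegmentReader(p)
      val buf = scala.collection.mutable.ArrayBuffer.empty[String]
      while (reader.next()) buf += reader.get()
      buf
    }
    rows.mkString(",")
  }

  def main(args: Array[String]): Unit = println(run())
}
```

In the real connector, each partition would carry the Pinot query and routing information for one group of segments, and the reader would stream rows back from the corresponding Pinot server.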