Sub-module in Apache Ignite ML: Spark ML model parser
----------------------------
Supported and tested Spark version: 2.3.0
It may also work with other Spark versions: 2.1, 2.2, 2.3, 2.4
This module supports loading machine learning models generated by Spark ML.
To get a model from Spark ML, save the model produced by training in Spark ML in parquet format,
as in the example below:
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// TitanicUtils is a helper that is not part of this listing.
val spark: SparkSession = TitanicUtils.getSparkSession
val passengers = TitanicUtils.readPassengersWithCasting(spark)
    .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")

// Step 1: Assemble the dataframe's columns into a single vector column using VectorAssembler
val assembler = new VectorAssembler()
    .setInputCols(Array("pclass", "sibsp", "parch", "survived"))
    .setOutputCol("features")

// Step 2: Drop rows with missing values, then transform the dataframe into a vectorized one
val output = assembler.transform(
    passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
).select("features", "age")

// Step 3: Configure a linear regression that predicts "age" from the assembled features
val lr = new LinearRegression()
    .setMaxIter(100)
    .setRegParam(0.1)
    .setElasticNetParam(0.1)
    .setLabelCol("age")
    .setFeaturesCol("features")

// Fit the model
val model = lr.fit(output)

// Save the trained model, overwriting any previous output at this path
model.write.overwrite().save("/home/models/titanic/linreg")
This code listing was used to produce the Spark ML models for the examples in spark-model-parser.
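Before passing the saved path to Ignite, it can help to confirm that the save step really produced a model directory. The sketch below is a minimal check, assuming the layout that Spark 2.x ML writers normally produce: a "metadata" sub-directory and a "data" sub-directory holding the parquet files. SavedModelCheck and missingParts are hypothetical names used only for illustration; they are not part of Spark or Ignite.

```scala
import java.nio.file.{Files, Path, Paths}

object SavedModelCheck {
  // Assumption: a saved Spark ML model directory contains these sub-directories.
  private val expectedParts = Seq("metadata", "data")

  /** Returns the expected sub-directories that are missing under modelDir. */
  def missingParts(modelDir: Path): Seq[String] =
    expectedParts.filterNot(name => Files.isDirectory(modelDir.resolve(name)))

  def main(args: Array[String]): Unit = {
    // Defaults to the path used in the listing above.
    val dir = Paths.get(args.headOption.getOrElse("/home/models/titanic/linreg"))
    val missing = missingParts(dir)
    if (missing.isEmpty) println(s"$dir looks like a saved Spark ML model")
    else println(s"$dir is missing: ${missing.mkString(", ")}")
  }
}
```

An empty result only means the directory has the expected shape; it does not guarantee that the model type is one the parser supports.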
This module supports the models common to both Ignite ML and Spark ML:
- LogisticRegression
- LinearRegression
- LinearSVC
- DecisionTreeClassifier
- RandomForestClassifier
- GBTClassifier
- KMeans
- DecisionTreeRegressor
- RandomForestRegressor
- GBTRegressor
To load a model in Ignite ML, call the parse() method of the SparkModelParser class, as in the examples at https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/inference/spark/modelparser
NOTE: loading from a Spark PipelineModel is not supported.
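If the model was trained as part of a Spark pipeline, one possible workaround is to save the final fitted stage on its own instead of the whole PipelineModel. The sketch below assumes the last stage is a LinearRegressionModel; ExportLastStage and exportLinReg are hypothetical names, and whether the resulting directory suits a given Ignite version should be verified. Only the final model is exported, so any preprocessing done by earlier pipeline stages has to be reproduced on the Ignite side.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.regression.LinearRegressionModel

object ExportLastStage {
  /** Saves the last stage of a fitted pipeline as a standalone model directory. */
  def exportLinReg(pipelineModel: PipelineModel, targetDir: String): Unit =
    pipelineModel.stages.lastOption match {
      // Assumption: the pipeline ends with a fitted linear regression.
      case Some(m: LinearRegressionModel) =>
        m.write.overwrite().save(targetDir)
      case other =>
        throw new IllegalArgumentException(
          s"Last pipeline stage is not a LinearRegressionModel: $other")
    }
}
```

The same pattern applies to the other supported model types by matching on the corresponding Spark model class.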