| Sub-module in Apache Ignite ML: Spark ML model parser |
| ---------------------------- |
| |
| Supported and tested Spark version: 2.3.0 |
| It may also work with other Spark versions: 2.1, 2.2, 2.3, 2.4 |
| |
| This module supports loading machine learning models generated by Spark ML. |
| |
| To get a model from Spark ML, save the model produced by training in Spark ML in parquet format, |
| as in the example below: |
| |
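| import org.apache.spark.ml.feature.VectorAssembler |
| import org.apache.spark.ml.regression.LinearRegression |
| import org.apache.spark.sql.SparkSession |
| |
| // TitanicUtils is a project-specific helper, not part of Spark |
| val spark: SparkSession = TitanicUtils.getSparkSession |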
| |
| val passengers = TitanicUtils.readPassengersWithCasting(spark) |
| .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age") |
| |
| // Step - 1: Make vectors from the dataframe's columns using VectorAssembler |
| val assembler = new VectorAssembler() |
| .setInputCols(Array("pclass", "sibsp", "parch", "survived")) |
| .setOutputCol("features") |
| |
| // Step - 2: Transform the dataframe into a vectorized dataframe, dropping rows with missing values |
| val output = assembler.transform( |
| passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age")) |
| ).select("features", "age") |
| |
| val lr = new LinearRegression() |
| .setMaxIter(100) |
| .setRegParam(0.1) |
| .setElasticNetParam(0.1) |
| .setLabelCol("age") |
| .setFeaturesCol("features") |
| |
| // Fit the model |
| val model = lr.fit(output) |
| // Save the fitted model, overwriting any model already stored at this path |
| model.write.overwrite().save("/home/models/titanic/linreg") |
| |
| This code listing was used to produce the Spark ML models for the examples in spark-model-parser. |
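Spark ML saves a model as a directory rather than a single file. The following is a minimal sketch, assuming the layout written by Spark 2.x (a `metadata` subdirectory and a `data` subdirectory holding the parquet files), of a sanity check that can be run on a path before it is handed to the parser. The object and method names are hypothetical and are not part of Spark or Ignite.

```scala
import java.nio.file.{Files, Path}

object SparkModelDirCheck {
  // Assumption: a model saved by Spark ML 2.x is a directory that contains
  // "metadata" and "data" subdirectories. Tree ensembles may contain
  // additional subdirectories, which this check ignores.
  def looksLikeSparkModelDir(dir: Path): Boolean =
    Files.isDirectory(dir) &&
      Files.isDirectory(dir.resolve("metadata")) &&
      Files.isDirectory(dir.resolve("data"))
}
```

For the listing above, the check could be called as `SparkModelDirCheck.looksLikeSparkModelDir(java.nio.file.Paths.get("/home/models/titanic/linreg"))`.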
| |
| This module supports the models common to both Ignite ML and Spark ML: |
| |
| - LogisticRegression |
| - LinearRegression |
| - LinearSVC |
| - DecisionTreeClassifier |
| - RandomForestClassifier |
| - GBTClassifier |
| - KMeans |
| - DecisionTreeRegressor |
| - RandomForestRegressor |
| - GBTRegressor |
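The names in the list above are Spark ML estimator class names. As a rough sketch, a guard on the Spark side could compare an estimator's simple class name with this list before training and saving a model. `SupportedSparkEstimators` and `isSupported` are hypothetical helper names, not part of Ignite or Spark, and the set only mirrors the list in this README.

```scala
object SupportedSparkEstimators {
  // Copied from the list of supported models in this README.
  val names: Set[String] = Set(
    "LogisticRegression", "LinearRegression", "LinearSVC",
    "DecisionTreeClassifier", "RandomForestClassifier", "GBTClassifier",
    "KMeans",
    "DecisionTreeRegressor", "RandomForestRegressor", "GBTRegressor"
  )

  // Returns true when the given simple class name is in the list above.
  def isSupported(estimatorSimpleName: String): Boolean =
    names.contains(estimatorSimpleName)
}
```

With the listing earlier in this file, the call would be `SupportedSparkEstimators.isSupported(lr.getClass.getSimpleName)`.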
| |
| To load a model into Ignite ML, call the parse() method of the SparkModelParser class, as in the examples at https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/inference/spark/modelparser |
| |
| NOTE: loading from a Spark PipelineModel is not supported. |
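One possible workaround, shown here only as a sketch, is to save the final stage of a fitted pipeline on its own, in the same way as a model trained without a pipeline. This assumes the last stage is a LinearRegressionModel and that the feature preparation done by the earlier stages is reproduced separately, since those stages are not exported. The object and method names are hypothetical.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.regression.LinearRegressionModel

object PipelineStageExport {
  // Hypothetical helper: saves only the last stage of a fitted pipeline,
  // assuming that stage is a LinearRegressionModel.
  def saveLastLinearRegression(pipelineModel: PipelineModel, path: String): Unit =
    pipelineModel.stages.lastOption match {
      case Some(lr: LinearRegressionModel) =>
        // Same save call as for a model trained without a pipeline.
        lr.write.overwrite().save(path)
      case other =>
        throw new IllegalArgumentException(
          s"Last pipeline stage is not a LinearRegressionModel: $other")
    }
}
```

The same pattern would apply to the other model types by matching on the corresponding Spark model class.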