| // Licensed to the Apache Software Foundation (ASF) under one or more |
| // contributor license agreements. See the NOTICE file distributed with |
| // this work for additional information regarding copyright ownership. |
| // The ASF licenses this file to You under the Apache License, Version 2.0 |
| // (the "License"); you may not use this file except in compliance with |
| // the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| = Import Model from Apache Spark |
| |
| Starting with Ignite 2.8, you can import the following Apache Spark ML models: |
| |
| - Logistic regression (`org.apache.spark.ml.classification.LogisticRegressionModel`) |
| - Linear regression (`org.apache.spark.ml.regression.LinearRegressionModel`) |
| - Decision tree (`org.apache.spark.ml.classification.DecisionTreeClassificationModel`) |
| - Support Vector Machine (`org.apache.spark.ml.classification.LinearSVCModel`) |
| - Random forest (`org.apache.spark.ml.classification.RandomForestClassificationModel`) |
| - K-Means (`org.apache.spark.ml.clustering.KMeansModel`) |
| - Decision tree regression (`org.apache.spark.ml.regression.DecisionTreeRegressionModel`) |
| - Random forest regression (`org.apache.spark.ml.regression.RandomForestRegressionModel`) |
| - Gradient boosted trees regression (`org.apache.spark.ml.regression.GBTRegressionModel`) |
| - Gradient boosted trees (`org.apache.spark.ml.classification.GBTClassificationModel`) |
| |
| This feature works with models saved in _snappy.parquet_ files. |
| |
| The supported and tested Spark version is 2.3.0. |
| It might also work with Spark versions 2.1, 2.2, 2.3, and 2.4. |
| |
| To export a model from Spark ML, save the trained model to a parquet file, as in the following example: |
| |
| |
| [source, scala] |
| ---- |
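| import org.apache.spark.ml.feature.VectorAssembler |
| import org.apache.spark.ml.regression.LinearRegression |
| import org.apache.spark.sql.SparkSession |
| |
| // TitanicUtils is a helper class defined elsewhere (not shown here). |
| val spark: SparkSession = TitanicUtils.getSparkSession |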
| |
| val passengers = TitanicUtils.readPassengersWithCasting(spark) |
| .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age") |
| |
| // Step 1: Assemble the selected dataframe columns into a single feature vector with VectorAssembler |
| val assembler = new VectorAssembler() |
| .setInputCols(Array("pclass", "sibsp", "parch", "survived")) |
| .setOutputCol("features") |
| |
| // Step 2: Drop rows with missing values, then transform the dataframe into a vectorized dataframe |
| val output = assembler.transform( |
| passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age")) |
| ).select("features", "age") |
| |
| |
| val lr = new LinearRegression() |
| .setMaxIter(100) |
| .setRegParam(0.1) |
| .setElasticNetParam(0.1) |
| .setLabelCol("age") |
| .setFeaturesCol("features") |
| |
| // Fit the model and save it, overwriting any model already stored at that path |
| val model = lr.fit(output) |
| model.write.overwrite().save("/home/models/titanic/linreg") |
| ---- |
| |
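Spark typically writes a saved model as a directory rather than a single file, usually with a `metadata` subdirectory and a `data` subdirectory that holds `part-*.snappy.parquet` files when the default Snappy compression is in effect. The following Java sketch lists such files so you can confirm that the export produced them. `SparkModelFiles` and `findSnappyParquetFiles` are hypothetical names, not part of Ignite or Spark, and the default path simply repeats the one from the Scala example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of Ignite or Spark. */
public class SparkModelFiles {
    /** Returns every *.snappy.parquet file found under the given directory, sorted by path. */
    public static List<Path> findSnappyParquetFiles(Path modelDir) throws IOException {
        // Files.walk visits the directory recursively; the stream must be closed after use.
        try (Stream<Path> files = Files.walk(modelDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".snappy.parquet"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the default path is the one used in the Scala example.
        Path modelDir = Paths.get(args.length > 0 ? args[0] : "/home/models/titanic/linreg");

        if (!Files.isDirectory(modelDir)) {
            System.out.println("Not a directory: " + modelDir);
            return;
        }

        List<Path> parquetFiles = findSnappyParquetFiles(modelDir);

        if (parquetFiles.isEmpty())
            System.out.println("No .snappy.parquet files found under " + modelDir);
        else
            parquetFiles.forEach(System.out::println);
    }
}
```

If the list is empty, the model was probably saved with a different compression codec or to a different location.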
| |
| To load the model in Ignite ML, call the `parse()` method of the `SparkModelParser` class: |
| |
| |
| [source, java] |
| ---- |
| // SPARK_MDL_PATH is the path to the model saved by Spark. |
| DecisionTreeNode mdl = (DecisionTreeNode)SparkModelParser.parse( |
| SPARK_MDL_PATH, |
| SupportedSparkModels.DECISION_TREE |
| ); |
| ---- |
| |
| More examples of this API are available in the examples module, in the `org.apache.ignite.examples.ml.inference.spark.modelparser` package. |
| |
| NOTE: Loading from a Spark `PipelineModel` is not supported. |
| Intermediate feature transformers from Spark are not supported either, because preprocessing works differently in Ignite and Spark. |
| |
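Because Spark feature transformers such as `VectorAssembler` are not imported, the application has to build each feature vector itself, in the same column order that was used for training in Spark. The following is a minimal Java sketch, assuming the column order from the Scala example above. `FeatureOrder` and `toFeatures` are hypothetical names, not Ignite API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Ignite or Spark. */
public class FeatureOrder {
    /** Assumption: same column order as setInputCols(...) in the Spark training code. */
    public static final List<String> FEATURE_COLUMNS =
        Arrays.asList("pclass", "sibsp", "parch", "survived");

    /** Builds the feature array in training order; fails fast if a column is missing. */
    public static double[] toFeatures(Map<String, Double> row) {
        double[] features = new double[FEATURE_COLUMNS.size()];

        for (int i = 0; i < features.length; i++) {
            String col = FEATURE_COLUMNS.get(i);
            Double val = row.get(col);

            if (val == null)
                throw new IllegalArgumentException("Missing feature: " + col);

            features[i] = val;
        }

        return features;
    }

    public static void main(String[] args) {
        // Made-up sample values; the insertion order of the map does not matter.
        Map<String, Double> row = new HashMap<>();
        row.put("survived", 1.0);
        row.put("pclass", 3.0);
        row.put("parch", 0.0);
        row.put("sibsp", 2.0);

        System.out.println(Arrays.toString(toFeatures(row)));
    }
}
```

The resulting array can then be wrapped in an Ignite vector (for example with `VectorUtils.of`, assuming that helper is available in your Ignite version) before it is passed to the imported model's `predict` method.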