docs/_docs/machine-learning/importing-model/model-import-from-apache-spark.adoc - ignite - Git at Google

 // Licensed to the Apache Software Foundation (ASF) under one or more
 // contributor license agreements.  See the NOTICE file distributed with
 // this work for additional information regarding copyright ownership.
 // The ASF licenses this file to You under the Apache License, Version 2.0
 // (the "License"); you may not use this file except in compliance with
 // the License.  You may obtain a copy of the License at
 //
 // http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 = Import Model from Apache Spark

 Starting with Ignite 2.8,  it's possible to import the following models of Apache Spark ML:

 - Logistic regression (`org.apache.spark.ml.classification.LogisticRegressionModel`)
 - Linear regression (`org.apache.spark.ml.classification.LogisticRegressionModel`)
 - Decision tree (`org.apache.spark.ml.classification.DecisionTreeClassificationModel`)
 - Support Vector Machine (`org.apache.spark.ml.classification.LinearSVCModel`)
 - Random forest (`org.apache.spark.ml.classification.RandomForestClassificationModel`)
 - K-Means (`org.apache.spark.ml.clustering.KMeansModel`)
 - Decision tree regression (`org.apache.spark.ml.regression.DecisionTreeRegressionModel`)
 - Random forest regression (`org.apache.spark.ml.regression.RandomForestRegressionModel`)
 - Gradient boosted trees regression (`org.apache.spark.ml.regression.GBTRegressionModel`)
 - Gradient boosted trees (`org.apache.spark.ml.classification.GBTClassificationModel`)

 This feature works with models saved in _snappy.parquet_ files.

 Supported and tested Spark version: 2.3.0
 Possibly might work with next Spark versions: 2.1, 2.2, 2.3, 2.4

 To get the model from Spark ML you should save the model built as a result of training in Spark ML to the parquet file like in example below:


 [source, scala]
 ----
 val spark: SparkSession = TitanicUtils.getSparkSession

 val passengers = TitanicUtils.readPassengersWithCasting(spark)
     .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")

 // Step - 1: Make Vectors from dataframe's columns using special VectorAssmebler
 val assembler = new VectorAssembler()
     .setInputCols(Array("pclass", "sibsp", "parch", "survived"))
     .setOutputCol("features")

 // Step - 2: Transform dataframe to vectorized dataframe with dropping rows
 val output = assembler.transform(
     passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
 ).select("features", "age")


 val lr = new LinearRegression()
     .setMaxIter(100)
     .setRegParam(0.1)
     .setElasticNetParam(0.1)
     .setLabelCol("age")
     .setFeaturesCol("features")

 // Fit the model
 val model = lr.fit(output)
 model.write.overwrite().save("/home/models/titanic/linreg")
 ----


 To load in Ignite ML you should use SparkModelParser class via method parse() call


 [source, java]
 ----
 DecisionTreeNode mdl = (DecisionTreeNode)SparkModelParser.parse(
    SPARK_MDL_PATH,
    SupportedSparkModels.DECISION_TREE
 );
 ----

 You can see more examples of using this API in the examples module in the package: `org.apache.ignite.examples.ml.inference.spark.modelparser`

 NOTE: It does not support loading from PipelineModel in Spark.
 It does not support intermediate feature transformers from Spark due to different nature of preprocessing on Ignite and Spark side.
	// Licensed to the Apache Software Foundation (ASF) under one or more
	// contributor license agreements. See the NOTICE file distributed with
	// this work for additional information regarding copyright ownership.
	// The ASF licenses this file to You under the Apache License, Version 2.0
	// (the "License"); you may not use this file except in compliance with
	// the License. You may obtain a copy of the License at
	//
	// http://www.apache.org/licenses/LICENSE-2.0
	//
	// Unless required by applicable law or agreed to in writing, software
	// distributed under the License is distributed on an "AS IS" BASIS,
	// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	// See the License for the specific language governing permissions and
	// limitations under the License.
	= Import Model from Apache Spark

	Starting with Ignite 2.8, it's possible to import the following models of Apache Spark ML:

	- Logistic regression (`org.apache.spark.ml.classification.LogisticRegressionModel`)
	- Linear regression (`org.apache.spark.ml.classification.LogisticRegressionModel`)
	- Decision tree (`org.apache.spark.ml.classification.DecisionTreeClassificationModel`)
	- Support Vector Machine (`org.apache.spark.ml.classification.LinearSVCModel`)
	- Random forest (`org.apache.spark.ml.classification.RandomForestClassificationModel`)
	- K-Means (`org.apache.spark.ml.clustering.KMeansModel`)
	- Decision tree regression (`org.apache.spark.ml.regression.DecisionTreeRegressionModel`)
	- Random forest regression (`org.apache.spark.ml.regression.RandomForestRegressionModel`)
	- Gradient boosted trees regression (`org.apache.spark.ml.regression.GBTRegressionModel`)
	- Gradient boosted trees (`org.apache.spark.ml.classification.GBTClassificationModel`)

	This feature works with models saved in _snappy.parquet_ files.

	Supported and tested Spark version: 2.3.0
	Possibly might work with next Spark versions: 2.1, 2.2, 2.3, 2.4

	To get the model from Spark ML you should save the model built as a result of training in Spark ML to the parquet file like in example below:


	[source, scala]
	----
	val spark: SparkSession = TitanicUtils.getSparkSession

	val passengers = TitanicUtils.readPassengersWithCasting(spark)
	.select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")

	// Step - 1: Make Vectors from dataframe's columns using special VectorAssmebler
	val assembler = new VectorAssembler()
	.setInputCols(Array("pclass", "sibsp", "parch", "survived"))
	.setOutputCol("features")

	// Step - 2: Transform dataframe to vectorized dataframe with dropping rows
	val output = assembler.transform(
	passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
	).select("features", "age")


	val lr = new LinearRegression()
	.setMaxIter(100)
	.setRegParam(0.1)
	.setElasticNetParam(0.1)
	.setLabelCol("age")
	.setFeaturesCol("features")

	// Fit the model
	val model = lr.fit(output)
	model.write.overwrite().save("/home/models/titanic/linreg")
	----


	To load in Ignite ML you should use SparkModelParser class via method parse() call


	[source, java]
	----
	DecisionTreeNode mdl = (DecisionTreeNode)SparkModelParser.parse(
	SPARK_MDL_PATH,
	SupportedSparkModels.DECISION_TREE
	);
	----

	You can see more examples of using this API in the examples module in the package: `org.apache.ignite.examples.ml.inference.spark.modelparser`

	NOTE: It does not support loading from PipelineModel in Spark.
	It does not support intermediate feature transformers from Spark due to different nature of preprocessing on Ignite and Spark side.