
 Sub-module in Apache Ignite ML: Spark ML model parser
 ----------------------------

 Supported and tested Spark version: 2.3.0
 May also work with other Spark versions (not tested): 2.1, 2.2, 2.3, 2.4

 This module loads machine learning models trained with Spark ML into Ignite ML.

 To use a Spark ML model, first save the model produced by training in Spark ML (Spark stores the model data in
 Parquet format), as in the example below:

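         import org.apache.spark.ml.feature.VectorAssembler
         import org.apache.spark.ml.regression.LinearRegression
         import org.apache.spark.sql.SparkSession

         // TitanicUtils is a helper used to prepare these examples (not part of Spark):
         // it creates the Spark session and reads the Titanic passengers dataset.
         val spark: SparkSession = TitanicUtils.getSparkSession

         val passengers = TitanicUtils.readPassengersWithCasting(spark)
             .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")

         // Step 1: Combine the feature columns into a single vector column with VectorAssembler
         val assembler = new VectorAssembler()
             .setInputCols(Array("pclass", "sibsp", "parch", "survived"))
             .setOutputCol("features")

         // Step 2: Drop rows with missing values in the used columns, then assemble the features
         val output = assembler.transform(
             passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
         ).select("features", "age")

         // Step 3: Configure a linear regression that predicts "age" from the feature vector
         val lr = new LinearRegression()
             .setMaxIter(100)
             .setRegParam(0.1)
             .setElasticNetParam(0.1)
             .setLabelCol("age")
             .setFeaturesCol("features")

         // Step 4: Fit the model and save it, overwriting any previous copy at this path
         val model = lr.fit(output)
         model.write.overwrite().save("/home/models/titanic/linreg")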

 This code was used to produce the Spark ML models used by the spark-model-parser examples.
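 Spark ML saves a model as a directory, not a single file. As we understand Spark's persistence format (an assumption,
 not something this module documents), the directory holds a metadata/ part with one JSON line that includes a "class"
 field naming the fitted model class, plus a data/ part with the Parquet files. A minimal sketch of a hypothetical
 helper, modelClassOf, that extracts that class name from the metadata line, which can be handy for checking what was
 saved before passing the path to Ignite ML:

```scala
// Hypothetical helper, not part of Spark or Ignite ML.
// Assumes the metadata line of a saved Spark ML model is a JSON object containing
// a field such as "class":"org.apache.spark.ml.regression.LinearRegressionModel".
object SavedModelMetadata {
  // Matches the first "class" field and captures its string value.
  private val ClassField = "\"class\"\\s*:\\s*\"([^\"]+)\"".r

  // Returns the saved model's class name, or None if no "class" field is found.
  def modelClassOf(metadataJson: String): Option[String] =
    ClassField.findFirstMatchIn(metadataJson).map(_.group(1))
}
```

 For the example above, the metadata line would typically be read from a file such as
 /home/models/titanic/linreg/metadata/part-00000 (for instance with scala.io.Source).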

 The module supports the following models, which are common to both Ignite ML and Spark ML:

 - LogisticRegression
 - LinearRegression
 - LinearSVC
 - DecisionTreeClassifier
 - RandomForestClassifier
 - GBTClassifier
 - KMeans
 - DecisionTreeRegressor
 - RandomForestRegressor
 - GBTRegressor
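 The names above are Spark ML estimator names. What gets saved is the fitted model, whose class name differs (for
 example, LinearRegression produces a LinearRegressionModel and DecisionTreeClassifier produces a
 DecisionTreeClassificationModel). A small sketch mapping the fitted Spark model classes to the estimators listed above;
 the object SupportedModels and its supportedEstimator method are hypothetical and only mirror this list, they are not
 how the module itself dispatches:

```scala
// Hypothetical lookup table, not part of Ignite ML: mirrors the list of supported models.
object SupportedModels {
  // Fitted Spark ML model class -> estimator name as listed in this README.
  val byModelClass: Map[String, String] = Map(
    "org.apache.spark.ml.classification.LogisticRegressionModel" -> "LogisticRegression",
    "org.apache.spark.ml.regression.LinearRegressionModel" -> "LinearRegression",
    "org.apache.spark.ml.classification.LinearSVCModel" -> "LinearSVC",
    "org.apache.spark.ml.classification.DecisionTreeClassificationModel" -> "DecisionTreeClassifier",
    "org.apache.spark.ml.classification.RandomForestClassificationModel" -> "RandomForestClassifier",
    "org.apache.spark.ml.classification.GBTClassificationModel" -> "GBTClassifier",
    "org.apache.spark.ml.clustering.KMeansModel" -> "KMeans",
    "org.apache.spark.ml.regression.DecisionTreeRegressionModel" -> "DecisionTreeRegressor",
    "org.apache.spark.ml.regression.RandomForestRegressionModel" -> "RandomForestRegressor",
    "org.apache.spark.ml.regression.GBTRegressionModel" -> "GBTRegressor"
  )

  // Returns the estimator name if the fitted model class is in the supported list.
  def supportedEstimator(modelClass: String): Option[String] = byModelClass.get(modelClass)
}
```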

 To load a saved model into Ignite ML, call the parse() method of the SparkModelParser class, as shown in the examples at
 https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/inference/spark/modelparser

 NOTE: the module does not support loading a Spark PipelineModel.
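 If the model was trained as part of a Spark Pipeline, the saved object is a PipelineModel (class
 org.apache.spark.ml.PipelineModel), which cannot be loaded. A minimal sketch of a pre-check, written as a hypothetical
 helper checkLoadable (not part of Ignite ML), that takes the saved model's class name and the set of supported classes
 and returns either an error message or the class name to load. One possible workaround, outside the scope of this
 sketch, is to save the fitted model stage itself (an element of PipelineModel.stages) instead of the whole pipeline.

```scala
// Hypothetical pre-check, not part of Ignite ML.
object LoadCheck {
  val PipelineModelClass = "org.apache.spark.ml.PipelineModel"

  // Left(reason) if the saved model cannot be loaded, Right(className) otherwise.
  def checkLoadable(modelClass: String, supported: Set[String]): Either[String, String] =
    if (modelClass == PipelineModelClass)
      Left("PipelineModel is not supported; save the fitted stage model instead")
    else if (!supported.contains(modelClass))
      Left(s"Unsupported model class: $modelClass")
    else
      Right(modelClass)
}
```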