
 Customized loss function
 ======================================

 This tutorial provides guidelines for using a customized loss function in network construction.

 Model Training Example
 ----------------------

 Let's begin with a small regression example. We can build and train a regression model with the following code:

 ``` r
 # Load the Boston housing data and convert factor columns to numeric.
 data(BostonHousing, package = "mlbench")
 BostonHousing[, sapply(BostonHousing, is.factor)] <-
   as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
 # Standardize every column, including the target in column 14.
 BostonHousing <- data.frame(scale(BostonHousing))

 test.ind = seq(1, 506, 5)    # 1 pt in 5 used for testing
 train.x = data.matrix(BostonHousing[-test.ind, -14])
 train.y = BostonHousing[-test.ind, 14]
 test.x = data.matrix(BostonHousing[test.ind, -14])
 test.y = BostonHousing[test.ind, 14]

 require(mxnet)
 ```

     ## Loading required package: mxnet
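
The split above picks every fifth row for testing with `seq(1, 506, 5)` and uses negative indexing to keep the remaining rows for training. As a minimal sketch of the same pattern, the following uses a small made-up data frame (`toy` and the `toy.*` names are hypothetical and exist only for illustration) and then checks the resulting set sizes for a 506-row data set.

``` r
# Toy stand-in for the scaled BostonHousing data frame (made-up values).
toy <- data.frame(x1 = 1:10, x2 = (1:10)^2, y = 10:1)

# Every fifth row goes to the test set, as in the tutorial.
toy.test.ind <- seq(1, nrow(toy), 5)    # rows 1 and 6

# Negative indices drop rows/columns; positive indices keep them.
toy.train.x <- data.matrix(toy[-toy.test.ind, -3])
toy.train.y <- toy[-toy.test.ind, 3]
toy.test.x  <- data.matrix(toy[toy.test.ind, -3])
toy.test.y  <- toy[toy.test.ind, 3]

dim(toy.train.x)   # 8 rows, 2 columns
dim(toy.test.x)    # 2 rows, 2 columns

# The same arithmetic for a 506-row data set:
n.test  <- length(seq(1, 506, 5))
n.train <- 506 - n.test
c(test = n.test, train = n.train)   # 102 test rows, 404 training rows
```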

 ``` r
 data <- mx.symbol.Variable("data")
 label <- mx.symbol.Variable("label")
 fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
 tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
 fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
 lro <- mx.symbol.LinearRegressionOutput(fc2, name = "lro")

 mx.set.seed(0)
 model <- mx.model.FeedForward.create(lro, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 5,
                                      array.batch.size = 60,
                                      optimizer = "rmsprop",
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)
 ```

     ## Start training with 1 devices

 ``` r
 pred <- predict(model, test.x)
 ```

     ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..

 ``` r
 sum((test.y - pred[1,])^2) / length(test.y)
 ```

     ## [1] 0.2485236
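
The value above is the mean squared error on the test set. As a small sketch in base R, the two error measures used in this tutorial can be written as helper functions; `mse`, `mae`, and the sample vectors are illustrative names and values, not part of mxnet.

``` r
# Mean squared error between labels and predictions.
mse <- function(y, pred) mean((y - pred)^2)

# Mean absolute error between labels and predictions.
mae <- function(y, pred) mean(abs(y - pred))

y.true <- c(1, 2, 3, 4)     # made-up labels
y.hat  <- c(1.5, 2, 2, 6)   # made-up predictions

mse(y.true, y.hat)   # (0.25 + 0 + 1 + 4) / 4 = 1.3125
mae(y.true, y.hat)   # (0.5 + 0 + 1 + 2) / 4 = 0.875
```

With the objects defined in this tutorial, `mse(test.y, pred[1,])` would compute the same quantity as `sum((test.y - pred[1,])^2) / length(test.y)`.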

 Besides `LinearRegressionOutput`, we also provide `LogisticRegressionOutput` and `MAERegressionOutput`. However, these built-in outputs might not be enough for real-world models. You can provide your own loss function by using `mx.symbol.MakeLoss` when constructing the network.

 How to Use Your Own Loss Function
 ---------------------------------

 We reuse the previous example, but this time we use `mx.symbol.MakeLoss` to minimize `(pred-label)^2`.

 ``` r
 data <- mx.symbol.Variable("data")
 label <- mx.symbol.Variable("label")
 fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
 tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
 fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
 lro2 <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc2, shape = 0) - label), name="lro2")
 ```
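
Gradient-based training of a squared-error loss relies on its derivative with respect to the prediction, which is `2 * (pred - label)`. The following base R sketch is independent of MXNet and only illustrates that math: it compares the analytic derivative with a central finite difference on made-up numbers. The names `sq.loss`, `sq.loss.grad`, `pred.toy`, and `label.toy` are hypothetical.

``` r
# Per-sample squared error, the same expression wrapped by MakeLoss above.
sq.loss <- function(pred, label) (pred - label)^2

# Analytic derivative of the squared error with respect to the prediction.
sq.loss.grad <- function(pred, label) 2 * (pred - label)

pred.toy  <- c(0.2, -1.0, 1.5)   # made-up predictions
label.toy <- c(0.0, -0.5, 2.0)   # made-up labels

# Central finite difference as an independent numerical check.
eps <- 1e-6
numeric.grad <- (sq.loss(pred.toy + eps, label.toy) -
                 sq.loss(pred.toy - eps, label.toy)) / (2 * eps)

# TRUE when the numerical and analytic derivatives agree.
all.equal(numeric.grad, sq.loss.grad(pred.toy, label.toy), tolerance = 1e-6)
```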

 Then we can train the network just as usual.

 ``` r
 mx.set.seed(0)
 model2 <- mx.model.FeedForward.create(lro2, X = train.x, y = train.y,
                                       ctx = mx.cpu(),
                                       num.round = 5,
                                       array.batch.size = 60,
                                       optimizer = "rmsprop",
                                       verbose = TRUE,
                                       array.layout = "rowmajor",
                                       batch.end.callback = NULL,
                                       epoch.end.callback = NULL)
 ```

     ## Start training with 1 devices

 We would expect very similar results because we are actually minimizing the same loss function. However, the error computed from the output of `predict` is quite different.

 ``` r
 pred2 <- predict(model2, test.x)
 ```

     ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..

 ``` r
 sum((test.y - pred2)^2) / length(test.y)
 ```

     ## [1] 1.234584

 This is because the output of `mx.symbol.MakeLoss` is the gradient of the loss with respect to the input data, not the prediction of the network. We can get the real prediction from the internal `fc2` output, as shown below.

 ``` r
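 # List every internal symbol of the trained network and pick the fc2 output.
 all.symbols = internals(model2$symbol)
 fc_symbol = all.symbols[[match("fc2_output", outputs(all.symbols))]]

 # Build a model that ends at fc2 but reuses the trained parameters.
 model3 <- list(symbol = fc_symbol,
                arg.params = model2$arg.params,
                aux.params = model2$aux.params)

 class(model3) <- "MXFeedForwardModel"

 pred3 <- predict(model3, test.x)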
 ```

     ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..

 ``` r
 sum((test.y - pred3[1,])^2) / length(test.y)
 ```

     ## [1] 0.248294
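
The extraction steps above can be wrapped in a small function so they do not have to be repeated for every model trained with `mx.symbol.MakeLoss`. This is only a sketch: `extract.output.model` is a hypothetical helper, not part of mxnet, and it assumes the mxnet package is loaded so that `internals()` and `outputs()` are available as used earlier in this tutorial.

``` r
# Hypothetical helper (not part of mxnet): build a prediction model that
# ends at a named internal output and reuses the trained parameters.
# Assumes mxnet is loaded so that internals() and outputs() exist.
extract.output.model <- function(model, output.name) {
  all.symbols <- internals(model$symbol)
  idx <- match(output.name, outputs(all.symbols))
  if (is.na(idx)) {
    stop("No internal output named '", output.name, "'")
  }
  structure(list(symbol = all.symbols[[idx]],
                 arg.params = model$arg.params,
                 aux.params = model$aux.params),
            class = "MXFeedForwardModel")
}
```

Under these assumptions, `predict(extract.output.model(model2, "fc2_output"), test.x)` would mirror the computation of `pred3` above.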

 Many other operations on symbols are available. For example, the absolute error `|pred-label|` can be used as the loss:

 ``` r
 lro_abs <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))
 mx.set.seed(0)
 model4 <- mx.model.FeedForward.create(lro_abs, X = train.x, y = train.y,
                                       ctx = mx.cpu(),
                                       num.round = 20,
                                       array.batch.size = 60,
                                       optimizer = "sgd",
                                       learning.rate = 0.001,
                                       verbose = TRUE,
                                       array.layout = "rowmajor",
                                       batch.end.callback = NULL,
                                       epoch.end.callback = NULL)
 ```

     ## Start training with 1 devices

 ``` r
 # As before, take the fc2 output to obtain the actual predictions.
 all.symbols = internals(model4$symbol)
 fc_symbol = all.symbols[[match("fc2_output", outputs(all.symbols))]]

 model5 <- list(symbol = fc_symbol,
                arg.params = model4$arg.params,
                aux.params = model4$aux.params)

 class(model5) <- "MXFeedForwardModel"

 pred5 <- predict(model5, test.x)
 ```

     ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..

 ``` r
 sum(abs(test.y - pred5[1,])) / length(test.y)
 ```

     ## [1] 0.7056902

 For comparison, the built-in `mx.symbol.MAERegressionOutput` trained with the same settings gives the same mean absolute error:

 ``` r
 lro_mae <- mx.symbol.MAERegressionOutput(fc2, name = "lro")
 mx.set.seed(0)
 model6 <- mx.model.FeedForward.create(lro_mae, X = train.x, y = train.y,
                                       ctx = mx.cpu(),
                                       num.round = 20,
                                       array.batch.size = 60,
                                       optimizer = "sgd",
                                       learning.rate = 0.001,
                                       verbose = TRUE,
                                       array.layout = "rowmajor",
                                       batch.end.callback = NULL,
                                       epoch.end.callback = NULL)
 ```

     ## Start training with 1 devices

 ``` r
 pred6 <- predict(model6, test.x)
 ```

     ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..

 ``` r
 sum(abs(test.y - pred6[1,])) / length(test.y)
 ```

     ## [1] 0.7056902


 ## Next Steps
 * [Neural Networks with MXNet in Five Minutes](http://mxnet.io/tutorials/r/fiveMinutesNeuralNetwork.html)
 * [Classify Real-World Images with a PreTrained Model](http://mxnet.io/tutorials/r/classifyRealImageWithPretrainedModel.html)
 * [Handwritten Digits Classification Competition](http://mxnet.io/tutorials/r/mnistCompetition.html)
 * [Character Language Model Using RNN](http://mxnet.io/tutorials/r/charRnnModel.html)