| # Customized loss function |
| |
This tutorial provides guidelines for using a customized loss function in network construction.
| |
| ## Model Training Example |
| |
| Let's begin with a small regression example. We can build and train a regression model with the following code: |
| |
| ```{r} |
| data(BostonHousing, package = "mlbench") |
| BostonHousing[, sapply(BostonHousing, is.factor)] <- |
| as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)])) |
| BostonHousing <- data.frame(scale(BostonHousing)) |
| |
| test.ind = seq(1, 506, 5) # 1 pt in 5 used for testing |
| train.x = data.matrix(BostonHousing[-test.ind,-14]) |
| train.y = BostonHousing[-test.ind, 14] |
test.x = data.matrix(BostonHousing[test.ind,-14])
test.y = BostonHousing[test.ind, 14]
| |
| require(mxnet) |
| |
| data <- mx.symbol.Variable("data") |
| label <- mx.symbol.Variable("label") |
| fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1") |
| tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1") |
| fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2") |
| lro <- mx.symbol.LinearRegressionOutput(fc2, name = "lro") |
| |
| mx.set.seed(0) |
| model <- mx.model.FeedForward.create(lro, X = train.x, y = train.y, |
| ctx = mx.cpu(), |
| num.round = 5, |
| array.batch.size = 60, |
| optimizer = "rmsprop", |
| verbose = TRUE, |
| array.layout = "rowmajor", |
| batch.end.callback = NULL, |
| epoch.end.callback = NULL) |
| |
| pred <- predict(model, test.x) |
| sum((test.y - pred[1,])^2) / length(test.y) |
| ``` |
| |
Besides `LinearRegressionOutput`, we also provide `LogisticRegressionOutput` and `MAERegressionOutput`.
However, these built-in losses might not be enough for real-world models. You can provide your own loss function
by using `mx.symbol.MakeLoss` when constructing the network.
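
For instance, swapping in one of the other built-in output layers only changes the final symbol. Here is a minimal
sketch reusing the `fc2` symbol defined above; note that `LogisticRegressionOutput` applies a sigmoid to the
prediction and is meant for targets in [0, 1], so it is shown only for illustration:

```{r}
# Illustrative only: the final output symbol is the only part that changes.
lro_logistic <- mx.symbol.LogisticRegressionOutput(fc2, name = "lro_logistic")
```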
| |
| ## How to Use Your Own Loss Function |
| |
We still use our previous example, but this time we use `mx.symbol.MakeLoss` to minimize `(pred - label)^2`:
| |
| ```{r} |
| data <- mx.symbol.Variable("data") |
| label <- mx.symbol.Variable("label") |
| fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1") |
| tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1") |
| fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2") |
lro2 <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc2, shape = 0) - label), name = "lro2")
| ``` |
| |
| Then we can train the network just as usual. |
| |
| ```{r} |
| mx.set.seed(0) |
| model2 <- mx.model.FeedForward.create(lro2, X = train.x, y = train.y, |
| ctx = mx.cpu(), |
| num.round = 5, |
| array.batch.size = 60, |
| optimizer = "rmsprop", |
| verbose = TRUE, |
| array.layout = "rowmajor", |
| batch.end.callback = NULL, |
| epoch.end.callback = NULL) |
| ``` |
| |
Since we are minimizing the same loss function, we would expect a result very similar to the previous one.
The prediction, however, turns out to be quite different:
| |
| ```{r} |
| pred2 <- predict(model2, test.x) |
sum((test.y - pred2[1,])^2) / length(test.y)
| ``` |
| |
This is because the output of `mx.symbol.MakeLoss` is the gradient of the loss with respect to the input data.
We can recover the real prediction as shown below.
| |
| ```{r} |
# Locate the output of the last fully connected layer among the internal
# symbols of the trained network; this is the real prediction.
internals = internals(model2$symbol)
fc_symbol = internals[[match("fc2_output", outputs(internals))]]

# Rebuild a feed-forward model around that symbol, reusing the trained weights.
model3 <- list(symbol = fc_symbol,
               arg.params = model2$arg.params,
               aux.params = model2$aux.params)

class(model3) <- "MXFeedForwardModel"
| |
| pred3 <- predict(model3, test.x) |
| sum((test.y - pred3[1,])^2) / length(test.y) |
| ``` |
| |
We provide many operations on symbols. An example using the absolute loss `|pred - label|` can be found below:
| |
| ```{r} |
| lro_abs <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label)) |
| mx.set.seed(0) |
| model4 <- mx.model.FeedForward.create(lro_abs, X = train.x, y = train.y, |
| ctx = mx.cpu(), |
| num.round = 20, |
| array.batch.size = 60, |
| optimizer = "sgd", |
| learning.rate = 0.001, |
| verbose = TRUE, |
| array.layout = "rowmajor", |
| batch.end.callback = NULL, |
| epoch.end.callback = NULL) |
| |
# Extract the real prediction from the trained network, as before
internals = internals(model4$symbol)
fc_symbol = internals[[match("fc2_output", outputs(internals))]]
| |
| model5 <- list(symbol = fc_symbol, |
| arg.params = model4$arg.params, |
| aux.params = model4$aux.params) |
| |
| class(model5) <- "MXFeedForwardModel" |
| |
| pred5 <- predict(model5, test.x) |
| sum(abs(test.y - pred5[1,])) / length(test.y) |
| ``` |

For comparison, we can train the same network with the built-in `MAERegressionOutput`:
| ```{r} |
| lro_mae <- mx.symbol.MAERegressionOutput(fc2, name = "lro") |
| mx.set.seed(0) |
| model6 <- mx.model.FeedForward.create(lro_mae, X = train.x, y = train.y, |
| ctx = mx.cpu(), |
| num.round = 20, |
| array.batch.size = 60, |
| optimizer = "sgd", |
| learning.rate = 0.001, |
| verbose = TRUE, |
| array.layout = "rowmajor", |
| batch.end.callback = NULL, |
| epoch.end.callback = NULL) |
| pred6 <- predict(model6, test.x) |
| sum(abs(test.y - pred6[1,])) / length(test.y) |
| ``` |
| |
We get the same result, as expected.
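
As a quick sanity check (assuming `pred5` and `pred6` are still in scope), the two MAE-trained models were fitted
with the same seed and the same effective loss, so their predictions should agree closely:

```{r}
# Largest absolute difference between the two models' predictions;
# this should be close to zero.
max(abs(pred5[1, ] - pred6[1, ]))
```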
| |