Customized Loss Function
======================================
This tutorial provides guidelines for using a customized loss function in network construction.
Model Training Example
----------------------
Let's begin with a small regression example. We can build and train a regression model with the following code:
``` r
data(BostonHousing, package = "mlbench")
BostonHousing[, sapply(BostonHousing, is.factor)] <-
  as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
BostonHousing <- data.frame(scale(BostonHousing))
test.ind = seq(1, 506, 5)  # hold out every fifth point for testing
train.x = data.matrix(BostonHousing[-test.ind, -14])
train.y = BostonHousing[-test.ind, 14]
test.x = data.matrix(BostonHousing[test.ind, -14])
test.y = BostonHousing[test.ind, 14]
require(mxnet)
```
## Loading required package: mxnet
``` r
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
lro <- mx.symbol.LinearRegressionOutput(fc2, name = "lro")
mx.set.seed(0)
model <- mx.model.FeedForward.create(lro, X = train.x, y = train.y,
                                     ctx = mx.cpu(),
                                     num.round = 5,
                                     array.batch.size = 60,
                                     optimizer = "rmsprop",
                                     verbose = TRUE,
                                     array.layout = "rowmajor",
                                     batch.end.callback = NULL,
                                     epoch.end.callback = NULL)
```
## Start training with 1 devices
``` r
pred <- predict(model, test.x)
```
## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
``` r
sum((test.y - pred[1,])^2) / length(test.y)
```
## [1] 0.2485236
Besides `LinearRegressionOutput`, we also provide `LogisticRegressionOutput` and `MAERegressionOutput`. However, these built-in losses might not be enough for real-world models. You can provide your own loss function by using `mx.symbol.MakeLoss` when constructing the network.
How to Use Your Own Loss Function
---------------------------------
We still use our previous example, but this time we use `mx.symbol.MakeLoss` to minimize `(pred - label)^2` ourselves.
``` r
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
lro2 <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc2, shape = 0) - label), name="lro2")
```
Then we can train the network just as usual.
``` r
mx.set.seed(0)
model2 <- mx.model.FeedForward.create(lro2, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 5,
                                      array.batch.size = 60,
                                      optimizer = "rmsprop",
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)
```
## Start training with 1 devices
We would expect very similar results, since we are minimizing the same loss function. However, the prediction result turns out to be quite different.
``` r
pred2 <- predict(model2, test.x)
```
## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
``` r
sum((test.y - pred2)^2) / length(test.y)
```
## [1] 1.234584
This is because the output of `mx.symbol.MakeLoss` is the gradient of the loss with respect to the input data. We can recover the real prediction as shown below.
``` r
# Extract the symbol that produces the network output (fc2) and build a
# prediction-only model from the trained parameters
internals <- internals(model2$symbol)
fc_symbol <- internals[[match("fc2_output", outputs(internals))]]
model3 <- list(symbol = fc_symbol,
               arg.params = model2$arg.params,
               aux.params = model2$aux.params)
class(model3) <- "MXFeedForwardModel"
pred3 <- predict(model3, test.x)
```
## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
``` r
sum((test.y - pred3[1,])^2) / length(test.y)
```
## [1] 0.248294
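Since this symbol-extraction pattern recurs whenever `mx.symbol.MakeLoss` is used, it can be wrapped in a small helper function. The sketch below is not part of the mxnet package: the name `extract.prediction.model` and its `output.name` argument are hypothetical, and it assumes the model list structure used above.
``` r
# Hypothetical helper: build a prediction-only model from a model trained
# with mx.symbol.MakeLoss, so that predict() returns the network output
# rather than the loss gradient.
extract.prediction.model <- function(model, output.name = "fc2_output") {
  syms <- internals(model$symbol)
  pred.symbol <- syms[[match(output.name, outputs(syms))]]
  pred.model <- list(symbol = pred.symbol,
                     arg.params = model$arg.params,
                     aux.params = model$aux.params)
  class(pred.model) <- "MXFeedForwardModel"
  pred.model
}
```
With such a helper, `pred3` above could be computed as `predict(extract.prediction.model(model2), test.x)`.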
We provide many operations on symbols, which can be composed into other losses. An example using `|pred - label|` can be found below.
``` r
lro_abs <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))
mx.set.seed(0)
model4 <- mx.model.FeedForward.create(lro_abs, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 20,
                                      array.batch.size = 60,
                                      optimizer = "sgd",
                                      learning.rate = 0.001,
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)
```
## Start training with 1 devices
``` r
# Again extract the prediction symbol to evaluate the trained network
internals <- internals(model4$symbol)
fc_symbol <- internals[[match("fc2_output", outputs(internals))]]
model5 <- list(symbol = fc_symbol,
               arg.params = model4$arg.params,
               aux.params = model4$aux.params)
class(model5) <- "MXFeedForwardModel"
pred5 <- predict(model5, test.x)
```
## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
``` r
sum(abs(test.y - pred5[1,])) / length(test.y)
```
## [1] 0.7056902
For comparison, we can train the same network with the built-in `MAERegressionOutput` and identical settings:
``` r
lro_mae <- mx.symbol.MAERegressionOutput(fc2, name = "lro")
mx.set.seed(0)
model6 <- mx.model.FeedForward.create(lro_mae, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 20,
                                      array.batch.size = 60,
                                      optimizer = "sgd",
                                      learning.rate = 0.001,
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)
```
## Start training with 1 devices
``` r
pred6 <- predict(model6, test.x)
```
## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
``` r
sum(abs(test.y - pred6[1,])) / length(test.y)
```
## [1] 0.7056902
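Note that the custom `|pred - label|` loss reproduces the result of the built-in `MAERegressionOutput` exactly. Other losses can be composed from the same symbolic operations. As a final illustration, below is a sketch of a pseudo-Huber loss, which behaves like the squared loss for small residuals and like the absolute loss for large ones. The scale parameter `delta` is a hypothetical choice, `fc2` is the symbol defined above, and the sketch assumes scalar arithmetic on symbols works as in the examples above.
``` r
# Sketch of a pseudo-Huber loss: delta^2 * (sqrt(1 + (res / delta)^2) - 1).
# `delta` (hypothetical) controls where the loss transitions from roughly
# quadratic to roughly linear behavior.
delta <- 1
res <- mx.symbol.Reshape(fc2, shape = 0) - label
lro_huber <- mx.symbol.MakeLoss(
  delta^2 * (mx.symbol.sqrt(1 + mx.symbol.square(res / delta)) - 1),
  name = "lro_huber")
```
Training with `lro_huber` follows the same `mx.model.FeedForward.create` call as above, and the prediction is again extracted from `fc2_output`.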
Next Steps
----------
* [Neural Networks with MXNet in Five Minutes](http://mxnet.io/tutorials/r/fiveMinutesNeuralNetwork.html)
* [Classify Real-World Images with a PreTrained Model](http://mxnet.io/tutorials/r/classifyRealImageWithPretrainedModel.html)
* [Handwritten Digits Classification Competition](http://mxnet.io/tutorials/r/mnistCompetition.html)
* [Character Language Model Using RNN](http://mxnet.io/tutorials/r/charRnnModel.html)