# Customized loss function
This tutorial provides guidelines for using a customized loss function in network construction.
## Model Training Example
Let's begin with a small regression example on the Boston housing data. We can build and train a model with the following code:
```{r}
data(BostonHousing, package = "mlbench")

# Convert factor columns to numeric and standardize all columns
BostonHousing[, sapply(BostonHousing, is.factor)] <-
  as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
BostonHousing <- data.frame(scale(BostonHousing))

test.ind <- seq(1, 506, 5)  # every fifth point held out for testing
train.x <- data.matrix(BostonHousing[-test.ind, -14])
train.y <- BostonHousing[-test.ind, 14]
test.x <- data.matrix(BostonHousing[test.ind, -14])
test.y <- BostonHousing[test.ind, 14]

require(mxnet)
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
lro <- mx.symbol.LinearRegressionOutput(fc2, name = "lro")
mx.set.seed(0)
model <- mx.model.FeedForward.create(lro, X = train.x, y = train.y,
                                     ctx = mx.cpu(),
                                     num.round = 5,
                                     array.batch.size = 60,
                                     optimizer = "rmsprop",
                                     verbose = TRUE,
                                     array.layout = "rowmajor",
                                     batch.end.callback = NULL,
                                     epoch.end.callback = NULL)
pred <- predict(model, test.x)
sum((test.y - pred[1, ])^2) / length(test.y)  # mean squared error on the test set
```
Besides `LinearRegressionOutput`, MXNet also provides `LogisticRegressionOutput` and `MAERegressionOutput` as built-in loss layers (see the sketch below for the logistic case). These built-in losses may not be enough for real-world models, however; you can provide your own loss function
by using `mx.symbol.MakeLoss` when constructing the network.
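For instance, a minimal sketch of swapping in `LogisticRegressionOutput` for a binary target might look like the following; the 0/1 label `bin.label` is our own illustrative construction, not part of the original example:
```{r}
# Sketch: reuse the network above as a binary classifier.
bin.label <- as.numeric(train.y > 0)  # hypothetical 0/1 target
lro_logistic <- mx.symbol.LogisticRegressionOutput(fc2, name = "lro")
mx.set.seed(0)
model_cls <- mx.model.FeedForward.create(lro_logistic, X = train.x, y = bin.label,
                                         ctx = mx.cpu(),
                                         num.round = 5,
                                         array.batch.size = 60,
                                         optimizer = "rmsprop",
                                         array.layout = "rowmajor")
```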
## How to Use Your Own Loss Function
We still use our previous example, but this time we use `mx.symbol.MakeLoss` to minimize the squared error `(pred - label)^2` ourselves:
```{r}
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
# Squared-error loss built directly from symbolic operators
lro2 <- mx.symbol.MakeLoss(
  mx.symbol.square(mx.symbol.Reshape(fc2, shape = 0) - label),
  name = "lro2")
```
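Note the `mx.symbol.Reshape(fc2, shape = 0)`: a likely reading is that it flattens the output of `fc2` so that its shape matches that of `label` before the subtraction.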
Then we can train the network just as usual.
```{r}
mx.set.seed(0)
model2 <- mx.model.FeedForward.create(lro2, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 5,
                                      array.batch.size = 60,
                                      optimizer = "rmsprop",
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)
```
Since we are minimizing the same loss function, we should expect very similar results. The output of `predict`, however, is quite different:
```{r}
pred2 <- predict(model2, test.x)
sum((test.y - pred2)^2) / length(test.y)
```
This is because the output of `mx.symbol.MakeLoss` is the gradient of the loss with respect to the input data. To recover the real predictions, we extract the output of the `fc2` layer from the trained network:
```{r}
internals <- internals(model2$symbol)
fc_symbol <- internals[[match("fc2_output", outputs(internals))]]

# Rebuild a feed-forward model whose output is fc2 instead of the loss
model3 <- list(symbol = fc_symbol,
               arg.params = model2$arg.params,
               aux.params = model2$aux.params)
class(model3) <- "MXFeedForwardModel"

pred3 <- predict(model3, test.x)
sum((test.y - pred3[1, ])^2) / length(test.y)  # test MSE
```
MXNet provides many operations on symbols, so the loss can be almost any symbolic expression. As an example, here is the absolute-error loss `|pred - label|`:
```{r}
lro_abs <- mx.symbol.MakeLoss(
  mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))
mx.set.seed(0)
model4 <- mx.model.FeedForward.create(lro_abs, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 20,
                                      array.batch.size = 60,
                                      optimizer = "sgd",
                                      learning.rate = 0.001,
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)
internals <- internals(model4$symbol)
fc_symbol <- internals[[match("fc2_output", outputs(internals))]]

# As before, rebuild a model whose output is fc2 instead of the loss
model5 <- list(symbol = fc_symbol,
               arg.params = model4$arg.params,
               aux.params = model4$aux.params)
class(model5) <- "MXFeedForwardModel"

pred5 <- predict(model5, test.x)
sum(abs(test.y - pred5[1, ])) / length(test.y)  # test MAE
```
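For comparison, we can train the same network with the built-in `MAERegressionOutput`, which optimizes the same absolute-error loss: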
```{r}
lro_mae <- mx.symbol.MAERegressionOutput(fc2, name = "lro")
mx.set.seed(0)
model6 <- mx.model.FeedForward.create(lro_mae, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 20,
                                      array.batch.size = 60,
                                      optimizer = "sgd",
                                      learning.rate = 0.001,
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)
pred6 <- predict(model6, test.x)
sum(abs(test.y - pred6[1, ])) / length(test.y)  # test MAE
```
As expected, the two losses give essentially the same result.
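The same pattern extends to more elaborate losses. As a final sketch of our own (it assumes the mxnet R package's arithmetic overloads between scalars and symbols), here is a pseudo-Huber loss with `delta = 1`, which is quadratic near zero and linear in the tails:
```{r}
# Pseudo-Huber loss with delta = 1: sqrt(1 + err^2) - 1.
# Reuses fc2 and label from the network defined above; sketch only.
err <- mx.symbol.Reshape(fc2, shape = 0) - label
lro_huber <- mx.symbol.MakeLoss(
  mx.symbol.sqrt(1 + mx.symbol.square(err)) - 1,
  name = "lro_huber")
```
It can then be trained and evaluated exactly as `lro_abs` above.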