 <!--- Licensed to the Apache Software Foundation (ASF) under one -->
 <!--- or more contributor license agreements.  See the NOTICE file -->
 <!--- distributed with this work for additional information -->
 <!--- regarding copyright ownership.  The ASF licenses this file -->
 <!--- to you under the Apache License, Version 2.0 (the -->
 <!--- "License"); you may not use this file except in compliance -->
 <!--- with the License.  You may obtain a copy of the License at -->

 <!---   http://www.apache.org/licenses/LICENSE-2.0 -->

 <!--- Unless required by applicable law or agreed to in writing, -->
 <!--- software distributed under the License is distributed on an -->
 <!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
 <!--- KIND, either express or implied.  See the License for the -->
 <!--- specific language governing permissions and limitations -->
 <!--- under the License. -->

 Stochastic Depth
 ================

 This folder contains examples that implement the stochastic depth algorithm described in
 Huang, Gao, et al. ["Deep networks with stochastic depth."](https://arxiv.org/abs/1603.09382)
 arXiv preprint arXiv:1603.09382 (2016). The paper introduces a way to perturb networks during training
 in order to improve their performance. Stochastic depth (SD) is a method for residual networks
 that randomly deactivates residual blocks during training.

 The paper builds the network from residual blocks, each of which consists of a set of
 convolution layers and a bypass that passes the output of the previous layer through unchanged.
 With stochastic depth, the convolution branch of a block is randomly switched off during training, so the information
 flows through the bypass alone, effectively removing the block from the network for that training step.
 During testing, all blocks are kept and the output of each convolution branch is scaled by the block's survival probability.
 This is very similar to how dropout works, except that instead of dropping a single node in a layer,
 the entire layer is dropped.
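
The training and testing behavior described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in this folder: the names `residual_block`, `survival_prob`, and `double` are hypothetical, the residual function `f` stands in for the convolution layers, and the ReLU that the paper applies after the addition is omitted to keep the example short.

```python
import random


def residual_block(x, f, survival_prob, training, rng=random):
    """Apply one residual block with stochastic depth to a list of floats.

    x is the block input and f is a stand-in for the convolution branch
    (both are illustrative assumptions, not part of the example scripts).
    """
    if training:
        # Keep the residual branch with probability survival_prob.
        if rng.random() < survival_prob:
            return [xi + fi for xi, fi in zip(x, f(x))]
        # Branch dropped: only the identity bypass remains.
        return list(x)
    # Test time: every block is active, and the residual branch is
    # scaled by its survival probability.
    return [xi + survival_prob * fi for xi, fi in zip(x, f(x))]


def double(x):
    """Toy residual function used only for this illustration."""
    return [2.0 * v for v in x]


# Test-time pass: x + 0.5 * f(x)
print(residual_block([1.0, -2.0], double, 0.5, training=False))  # [2.0, -4.0]
```

With the ReLU left out, as it is here, the test-time output equals the expected value of the training-time output, which is the reason for scaling by the survival probability.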

 The main idea behind stochastic depth is relatively simple, but the results are surprisingly good.
 The authors evaluated the method on CIFAR-10, CIFAR-100, and the Street View House Numbers (SVHN) dataset.
 They reported the lowest published error at the time on CIFAR-10 and CIFAR-100, and the second lowest on SVHN.

 Files in this example folder:

 - `sd_mnist.py` is a small sample implementation of the algorithm, intended only as a sanity check.

 - `sd_cifar10.py` trains with the algorithm for 500 epochs on the CIFAR-10 dataset. After 500 epochs it reached ~9.4% error;
 more careful hyperparameter tuning may close the gap to the numbers reported in the paper.
 A sample result log is included at the top of `sd_cifar10.py`.