# Deep Learning in a Single File for Smart Devices
Deep learning (DL) systems are complex and often depend on a number of libraries.
Porting a DL library to different platforms can be painful, especially for smart devices.
One simple solution to this problem is to provide a light interface to the library, complete with all required code in a single file with minimal dependencies.
In this document, we explain how to amalgamate all necessary code into a single file,
and demonstrate the approach with an example in which we run image recognition on a mobile device.
## Amalgamation: Making the Whole System a Single File
The idea of amalgamation comes from SQLite,
which packs all the code needed to run a simple database into a single source file.
All that's necessary to create the library is to compile that single file.
This simplifies the problem of porting to various platforms.
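To make this concrete, here is a toy sketch of what an amalgamated source file looks like. The contents and file names are illustrative only, not MXNet's actual code:

```cpp
// amalgamation_demo.cc -- a toy stand-in for a generated amalgamated file.
// The generator script expands each source file inline, so compiling this
// one file (e.g., `c++ -c amalgamation_demo.cc`) builds the entire library.

// ---- originally math/add.h ----
namespace toylib {
float add(float a, float b);
}

// ---- originally math/add.cc ----
namespace toylib {
float add(float a, float b) { return a + b; }
}

// ---- originally c_api.cc: a C interface callable from any language's FFI ----
extern "C" float ToyAdd(float a, float b) { return toylib::add(a, b); }
```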
Thanks to [Jack Deng](https://github.com/jdeng),
MXNet provides an [amalgamation](https://github.com/dmlc/mxnet/tree/master/amalgamation) script
that compiles all code needed for prediction based on trained DL models into a single `.cc` file,
containing approximately 30K lines of code. This code only depends on the BLAS library.
We've also created an even more minimal version,
with the BLAS dependency removed.
You can compile the single file into JavaScript by using [emscripten](https://github.com/kripken/emscripten).
The compiled library can be used by any other programming language.
The `.h` file contains a light prediction API.
Porting to another language with a C foreign function interface requires little effort.
For examples, see the following projects on GitHub:
- Go: [https://github.com/jdeng/gomxnet](https://github.com/jdeng/gomxnet)
- Java: [https://github.com/dmlc/mxnet/tree/master/amalgamation/jni](https://github.com/dmlc/mxnet/tree/master/amalgamation/jni)
- Python: [https://github.com/dmlc/mxnet/tree/master/amalgamation/python](https://github.com/dmlc/mxnet/tree/master/amalgamation/python)
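To give a feel for the prediction API, here is a minimal C++ sketch of one forward pass. The `MXPred*` calls come from `c_predict_api.h`; the 224x224 input shape and the way the symbol JSON and parameter blob are obtained are assumptions for illustration, so check the header in your checkout for the authoritative signatures:

```cpp
#include <vector>
#include "c_predict_api.h"  // the light prediction API shipped with the amalgamation

// Run one forward pass. `symbol_json` and `param_bytes` hold the network
// definition and parameters exported when the model was trained.
std::vector<float> Predict(const char* symbol_json,
                           const char* param_bytes, int param_size) {
  PredictorHandle pred = nullptr;
  const char* input_keys[] = {"data"};
  const mx_uint shape_indptr[] = {0, 4};
  const mx_uint shape_data[] = {1, 3, 224, 224};  // batch, channels, height, width
  MXPredCreate(symbol_json, param_bytes, param_size,
               1 /* dev_type: cpu */, 0 /* dev_id */,
               1, input_keys, shape_indptr, shape_data, &pred);

  std::vector<float> image(1 * 3 * 224 * 224, 0.0f);  // preprocessed pixels go here
  MXPredSetInput(pred, "data", image.data(), static_cast<mx_uint>(image.size()));
  MXPredForward(pred);

  // Query the output shape, then copy out the class scores.
  mx_uint* out_shape = nullptr;
  mx_uint out_dim = 0;
  MXPredGetOutputShape(pred, 0, &out_shape, &out_dim);
  mx_uint out_size = 1;
  for (mx_uint i = 0; i < out_dim; ++i) out_size *= out_shape[i];

  std::vector<float> scores(out_size);
  MXPredGetOutput(pred, 0, scores.data(), out_size);
  MXPredFree(pred);
  return scores;  // class probabilities
}
```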
If you plan to amalgamate your system, there are a few guidelines you ought to observe when building the project:
- Minimize dependence on other libraries.
- Use namespaces to encapsulate the types and operators.
- Avoid statements such as `using namespace xyz` at global scope (see the sketch after this list).
- Avoid cyclic include dependencies.
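The namespace guideline matters because amalgamation concatenates thousands of files into one translation unit, so any name at global scope can collide. A minimal sketch (`mylib` is a placeholder name):

```cpp
// Bad: at global scope, these leak into every other file that gets
// concatenated into the amalgamated translation unit.
//   using namespace std;
//   struct Tensor { ... };

// Good: wrap types and operators in the project's own namespace,
// and qualify uses locally instead of importing whole namespaces.
namespace mylib {
namespace op {

struct Tensor {
  float* data;
  int size;
};

inline void Scale(Tensor& t, float alpha) {
  for (int i = 0; i < t.size; ++i) t.data[i] *= alpha;
}

}  // namespace op
}  // namespace mylib
```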
## Image Recognition Demo on Mobile Devices
With amalgamation, deploying the system on smart devices (such as Android or iOS) is simple. But there are two additional considerations:
- The model should be small enough to fit into the device's memory.
- The model shouldn't be too expensive to run given the relatively low computational power of these devices.
Let's use image recognition as an example.
We start with the state-of-the-art Inception model
and train it on the ImageNet dataset,
using multiple servers with GTX 980 cards.
The resulting model fits into memory,
but it's too expensive to run.
Removing some layers makes it cheaper to run, but then accuracy drops.
Finally, thanks to Leliana, we have an Android example, [https://github.com/Leliana/WhatsThis](https://github.com/Leliana/WhatsThis), that demonstrates how to run prediction on Android.
<img src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/apk/subinception.png" height="488" width="274">
By using amalgamation, we can easily port the prediction library to mobile devices, with almost no dependencies.
After compiling the library for the target platform, all that remains is to call the C API from the host language (Java or Swift).
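On Android, that call typically goes through a thin JNI bridge written in C++. The sketch below is hypothetical: the Java class name and the reuse of a globally stored predictor handle are assumptions for illustration, not how WhatsThis is structured:

```cpp
#include <jni.h>
#include <vector>
#include "c_predict_api.h"

// Assumes a predictor was created once at startup (see MXPredCreate above)
// and stored in this global handle.
static PredictorHandle g_pred = nullptr;

// Matches a hypothetical Java declaration:
//   package org.example.whatsthis;
//   public class Predictor { public static native float[] forward(float[] pixels); }
extern "C" JNIEXPORT jfloatArray JNICALL
Java_org_example_whatsthis_Predictor_forward(JNIEnv* env, jclass,
                                             jfloatArray pixels) {
  // Copy the preprocessed pixels from the Java array into native memory.
  jsize n = env->GetArrayLength(pixels);
  std::vector<float> input(n);
  env->GetFloatArrayRegion(pixels, 0, n, input.data());

  MXPredSetInput(g_pred, "data", input.data(), static_cast<mx_uint>(n));
  MXPredForward(g_pred);

  mx_uint* shape = nullptr;
  mx_uint dim = 0;
  MXPredGetOutputShape(g_pred, 0, &shape, &dim);
  mx_uint size = 1;
  for (mx_uint i = 0; i < dim; ++i) size *= shape[i];

  std::vector<float> scores(size);
  MXPredGetOutput(g_pred, 0, scores.data(), size);

  // Copy the class scores back into a Java float[].
  jfloatArray out = env->NewFloatArray(static_cast<jsize>(size));
  env->SetFloatArrayRegion(out, 0, static_cast<jsize>(size), scores.data());
  return out;
}
```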
<img src="https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/apk/poolnet.png" height="488" width="274">
In addition to the full pre-trained Inception-BatchNorm (InceptionBN) network, we provide two smaller pre-trained models.
We tested all three on a Nexus 5:
| Model            | Top-1 Validation Accuracy (ILSVRC2012) | Time  | App Size | Runtime Temp Memory |
| ---------------- | -------------------------------------- | ----- | -------- | ------------------- |
| FastPoorNet      | around 52%, similar to the 2011 winner | 1 s   | <10 MB   | <5 MB               |
| Sub InceptionBN  | around 64%, similar to the 2013 winner | 2.7 s | <40 MB   | <10 MB              |
| InceptionBN      | around 70%                             | 4-5 s | <60 MB   | 10 MB               |
These models are for demonstration only.
They aren't fine-tuned for mobile devices,
and there is definitely room for improvement.
We believe that making a lightweight, portable,
and fast deep learning library is fun and interesting,
and hope you enjoy using the library.
## Source Code
[https://github.com/Leliana/WhatsThis](https://github.com/Leliana/WhatsThis)
## Demo APK Download
- [FastPoorNet](https://github.com/dmlc/web-data/blob/master/mxnet/apk/fastpoornet.apk?raw=true)
- [SubInception](https://github.com/dmlc/web-data/blob/master/mxnet/apk/subinception.apk?raw=true)