<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# singa-incubating-0.1.0 Release Notes
---
SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is
designed with an intuitive programming model based on the layer abstraction. SINGA supports a wide variety of popular
deep learning models.

This release includes the following features:

* Job management
    * [SINGA-3](https://issues.apache.org/jira/browse/SINGA-3) Use Zookeeper to check stopping (finish) time of the system
    * [SINGA-16](https://issues.apache.org/jira/browse/SINGA-16) Runtime Process id Management
    * [SINGA-25](https://issues.apache.org/jira/browse/SINGA-25) Setup glog output path
    * [SINGA-26](https://issues.apache.org/jira/browse/SINGA-26) Run distributed training in a single command
    * [SINGA-30](https://issues.apache.org/jira/browse/SINGA-30) Enhance easy-to-use feature and support concurrent jobs
    * [SINGA-33](https://issues.apache.org/jira/browse/SINGA-33) Automatically launch a number of processes in the cluster
    * [SINGA-34](https://issues.apache.org/jira/browse/SINGA-34) Support external zookeeper service
    * [SINGA-38](https://issues.apache.org/jira/browse/SINGA-38) Support concurrent jobs
    * [SINGA-39](https://issues.apache.org/jira/browse/SINGA-39) Avoid ssh in scripts for single node environment
    * [SINGA-43](https://issues.apache.org/jira/browse/SINGA-43) Remove Job-related output from workspace
    * [SINGA-56](https://issues.apache.org/jira/browse/SINGA-56) No automatic launching of zookeeper service
    * [SINGA-73](https://issues.apache.org/jira/browse/SINGA-73) Refine the selection of available hosts from host list
* Installation with GNU Autotools
    * [SINGA-4](https://issues.apache.org/jira/browse/SINGA-4) Refine thirdparty-dependency installation
    * [SINGA-13](https://issues.apache.org/jira/browse/SINGA-13) Separate intermediate files of compilation from source files
    * [SINGA-17](https://issues.apache.org/jira/browse/SINGA-17) Add root permission within thirdparty/install.
    * [SINGA-27](https://issues.apache.org/jira/browse/SINGA-27) Generate python modules for proto objects
    * [SINGA-53](https://issues.apache.org/jira/browse/SINGA-53) Add lmdb compiling options
    * [SINGA-62](https://issues.apache.org/jira/browse/SINGA-62) Remove building scripts and auxiliary files
    * [SINGA-67](https://issues.apache.org/jira/browse/SINGA-67) Add singatest into build targets
* Distributed training
    * [SINGA-7](https://issues.apache.org/jira/browse/SINGA-7) Implement shared memory Hogwild algorithm
    * [SINGA-8](https://issues.apache.org/jira/browse/SINGA-8) Implement distributed Hogwild
    * [SINGA-19](https://issues.apache.org/jira/browse/SINGA-19) Slice large Param objects for load-balance
    * [SINGA-29](https://issues.apache.org/jira/browse/SINGA-29) Update NeuralNet class to enable layer partition type customization
    * [SINGA-24](https://issues.apache.org/jira/browse/SINGA-24) Implement Downpour training framework
    * [SINGA-32](https://issues.apache.org/jira/browse/SINGA-32) Implement AllReduce training framework
    * [SINGA-57](https://issues.apache.org/jira/browse/SINGA-57) Improve Distributed Hogwild
* Training algorithms for different model categories
    * [SINGA-9](https://issues.apache.org/jira/browse/SINGA-9) Add Support for Restricted Boltzmann Machine (RBM) model
    * [SINGA-10](https://issues.apache.org/jira/browse/SINGA-10) Add Support for Recurrent Neural Networks (RNN)
* Checkpoint and restore
    * [SINGA-12](https://issues.apache.org/jira/browse/SINGA-12) Support Checkpoint and Restore
* Unit test
    * [SINGA-64](https://issues.apache.org/jira/browse/SINGA-64) Add the test module for utils/common
* Programming model
    * [SINGA-36](https://issues.apache.org/jira/browse/SINGA-36) Refactor job configuration, driver program and scripts
    * [SINGA-37](https://issues.apache.org/jira/browse/SINGA-37) Enable users to set parameter sharing in model configuration
    * [SINGA-54](https://issues.apache.org/jira/browse/SINGA-54) Refactor job configuration to move fields in ModelProto out
    * [SINGA-55](https://issues.apache.org/jira/browse/SINGA-55) Refactor main.cc and singa.h
    * [SINGA-61](https://issues.apache.org/jira/browse/SINGA-61) Support user defined classes
    * [SINGA-65](https://issues.apache.org/jira/browse/SINGA-65) Add an example of writing user-defined layers
* Other features
    * [SINGA-6](https://issues.apache.org/jira/browse/SINGA-6) Implement thread-safe singleton
    * [SINGA-18](https://issues.apache.org/jira/browse/SINGA-18) Update API for displaying performance metric
    * [SINGA-77](https://issues.apache.org/jira/browse/SINGA-77) Integrate with Apache RAT

The following bugs were fixed during the development of this release:

* [SINGA-2](https://issues.apache.org/jira/browse/SINGA-2) Check failed: zsock_connect
* [SINGA-5](https://issues.apache.org/jira/browse/SINGA-5) Server terminates early when the zookeeper singa folder is not initially empty
* [SINGA-15](https://issues.apache.org/jira/browse/SINGA-15) Fix a bug in the ConnectStub function that gets stuck when connecting layer_dealer_
* [SINGA-22](https://issues.apache.org/jira/browse/SINGA-22) Cannot find openblas library when it is installed in default path
* [SINGA-23](https://issues.apache.org/jira/browse/SINGA-23) Libtool version mismatch error.
* [SINGA-28](https://issues.apache.org/jira/browse/SINGA-28) Fix a bug from topology sort of Graph
* [SINGA-42](https://issues.apache.org/jira/browse/SINGA-42) Issue when loading checkpoints
* [SINGA-44](https://issues.apache.org/jira/browse/SINGA-44) A bug when resetting metric values
* [SINGA-46](https://issues.apache.org/jira/browse/SINGA-46) Fix a bug in updater.cc to scale the gradients
* [SINGA-47](https://issues.apache.org/jira/browse/SINGA-47) Fix a bug in data layers that leads to out-of-memory when group size is too large
* [SINGA-48](https://issues.apache.org/jira/browse/SINGA-48) Fix a bug in trainer.cc that assigns the same NeuralNet instance to workers from different groups
* [SINGA-49](https://issues.apache.org/jira/browse/SINGA-49) Fix a bug in HandlePutMsg func that sets param fields to invalid values
* [SINGA-66](https://issues.apache.org/jira/browse/SINGA-66) Fix bugs in Worker::RunOneBatch function and ClusterProto
* [SINGA-79](https://issues.apache.org/jira/browse/SINGA-79) Fix a bug in singatool that cannot parse the -conf flag

Features planned for the next release:

* [SINGA-11](https://issues.apache.org/jira/browse/SINGA-11) Start SINGA using Mesos
* [SINGA-31](https://issues.apache.org/jira/browse/SINGA-31) Extend Blob to support xpu (cpu or gpu)
* [SINGA-35](https://issues.apache.org/jira/browse/SINGA-35) Add random number generators
* [SINGA-40](https://issues.apache.org/jira/browse/SINGA-40) Support sparse Param update
* [SINGA-41](https://issues.apache.org/jira/browse/SINGA-41) Support single node single GPU training