SINGA-201 Error when running Mesos

A bug was reported when launching SINGA on Mesos in fully
distributed mode.

The main cause was determined to be ZeroMQ binding to localhost. In fully
distributed mode, SINGA on each node should be passed a `-host` flag specifying
the public IP address of the local host.

The Mesos scheduler is modified accordingly:

1. When a Mesos slave starts connecting to the master, it passes the `--hostname` flag specifying its public IP address.

2. The scheduler now sends each executor a command of the form:

          `singa -conf ./job.conf -singa_conf ./singa.conf -singa_job XX -host XX`
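
As a rough illustration of the command format above, the sketch below assembles the executor command from a job ID and a host address. The function name `build_executor_cmd` and the sample values are hypothetical; the actual scheduler change is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: compose the per-executor command described above.
# $1 = job ID (the first XX), $2 = public IP address of the node (the second XX).
build_executor_cmd() {
    job_id=$1
    host=$2
    printf 'singa -conf ./job.conf -singa_conf ./singa.conf -singa_job %s -host %s\n' \
        "$job_id" "$host"
}

# Example values only; real job IDs and addresses come from the scheduler.
build_executor_cmd 1 10.0.0.5
```

The point of the sketch is only that every executor receives its own `-host` value, so ZeroMQ no longer falls back to localhost.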
1 file changed

#Apache SINGA

Distributed deep learning system

##Project Website

All the details can be found on the Project Website, including the following instructions.

##Mailing Lists

##Dependencies

The current code depends on the following external libraries:

  • glog (New BSD)
  • google-protobuf (New BSD)
  • openblas (New BSD)

###Optional dependencies

For advanced features, the following libraries are needed:

  • zeromq (LGPLv3 + static link exception), czmq (Mozilla Public License Version 2.0) and zookeeper (Apache 2.0), for distributed training with multiple processes. Compile SINGA with --enable-dist to use them.
  • cuda (NVIDIA CUDA Toolkit EULA) for training using NVIDIA GPUs.
  • cudnn (NVIDIA CuDNN EULA) for training using NVIDIA's CuDNN library.
  • Apache Mesos (Apache 2.0)
  • Apache Hadoop (Apache 2.0)
  • libhdfs3 (Apache 2.0)
  • swig (GPL) for using the Python binding.

We have tested SINGA on Ubuntu 12.04, Ubuntu 14.04 and CentOS 6. You can install all dependencies (including optional dependencies) into the $PREFIX folder by running

./thirdparty/ all $PREFIX

If $PREFIX is not a system path (e.g., /usr/local/), export the following variables before continuing with the build instructions:

$ export PATH=$PREFIX/bin:$PATH
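
Only the PATH export is shown above. As a sketch, assuming the libraries and headers were installed under the conventional $PREFIX/lib and $PREFIX/include subfolders (an assumption, not something stated here), the other variables mentioned in the FAQ could be set in the same way. The fallback value `$HOME/local` is a hypothetical placeholder.

```shell
# Assumed layout: libraries in $PREFIX/lib, headers in $PREFIX/include.
# $HOME/local is only a placeholder used when PREFIX is not already set.
PREFIX=${PREFIX:-$HOME/local}

# ${VAR:+:$VAR} appends the old value only if it is non-empty, which avoids
# a trailing ':' (an empty entry is treated as the current directory).
export LD_LIBRARY_PATH=$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}            # runtime library search path
export LIBRARY_PATH=$PREFIX/lib${LIBRARY_PATH:+:$LIBRARY_PATH}                     # link-time library search path
export CPLUS_INCLUDE_PATH=$PREFIX/include${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}  # C++ header search path
export PATH=$PREFIX/bin${PATH:+:$PATH}
```

Adding these lines to a shell startup file such as ~/.bashrc keeps them in effect for later sessions.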


Full documentation is available online at Official Documentation.

##Building SINGA

Please make sure you have g++ >= 4.8.1 before building SINGA.

$ ./
# refer to the FAQs below for errors during configure, including blas_segmm() error
$ ./configure
# refer to the FAQs below for error during make
$ make
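
The g++ >= 4.8.1 requirement can be checked before running configure. The sketch below is one way to do it; the helper name `version_at_least` is made up for this example, and it assumes a `sort` that supports `-V` (version sort), as in GNU coreutils.

```shell
#!/bin/sh
# Return success if version $1 is at least version $2 (dotted numeric versions).
# The smaller of the two versions sorts first, so $1 >= $2 exactly when
# the first line of the sorted pair is $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v g++ >/dev/null 2>&1; then
    gxx_version=$(g++ -dumpversion)
    if version_at_least "$gxx_version" 4.8.1; then
        echo "g++ $gxx_version is new enough"
    else
        echo "g++ $gxx_version appears older than 4.8.1" >&2
    fi
else
    echo "g++ not found in PATH" >&2
fi
```

Some distributions make `g++ -dumpversion` print a shortened version (for example only the major and minor numbers), in which case the check may be too strict; `g++ --version` shows the full version string.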

To compile with GPU support, you should run:

$ ./configure --enable-cuda --with-cuda=/CUDA/PATH --enable-cudnn --with-cudnn=/CUDNN/PATH

--with-cuda and --with-cudnn are optional, as by default the script searches the system paths. We have tested with CUDA V7.0 and V7.5, and CUDNN V3 and V4. Please set the proper environment variables (LD_LIBRARY_PATH, LIBRARY_PATH, etc.) when you run the code.
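
One way to set LD_LIBRARY_PATH and LIBRARY_PATH without adding the same folder twice is a small helper like the one below. This is a sketch: `prepend_path` is a hypothetical function, `CUDA_HOME` is a variable introduced only for this example, and the `lib64` subfolder is an assumption about where the CUDA libraries live on a 64-bit Linux installation.

```shell
#!/bin/sh
# Prepend directory $2 to the colon-separated variable named $1,
# unless that directory is already listed there.
prepend_path() {
    var=$1
    dir=$2
    eval "current=\${$var:-}"            # read the current value of the named variable
    case ":$current:" in
        *":$dir:"*) ;;                   # already present, nothing to do
        *) eval "export $var=\"\$dir\${current:+:\$current}\"" ;;
    esac
}

# /CUDA/PATH is the same placeholder used in the configure command above.
CUDA_HOME=${CUDA_HOME:-/CUDA/PATH}
prepend_path LD_LIBRARY_PATH "$CUDA_HOME/lib64"
prepend_path LIBRARY_PATH "$CUDA_HOME/lib64"
```

The same helper can be reused for the cuDNN folder if it is installed in a separate location.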

To compile with HDFS support, you should run:

$ ./configure --enable-hdfs --with-libhdfs=/PATH/TO/HDFS3

--with-libhdfs is optional as by default the path is /usr/local/.

To compile with python wrappers, you should run:

$ ./tool/python/singa/
$ ./configure --enable-python --with-python=/PATH/TO/Python.h

--with-python is optional as by default the path is /usr/local/include.

You can also run the following command for further configuration.

$ ./configure --help

##Running Examples

Let us train the CNN model over the CIFAR-10 dataset without parallelism as an example. The hyper-parameters are set following cuda-convnet. More details about this example are available at CNN example.

First, download the dataset and create data shards:

$ cd examples/cifar10/
$ cp Makefile.example Makefile
$ make download
$ make create

If it reports errors due to missing libraries, e.g., libopenblas or libprotobuf, export the environment variables shown in the Dependencies section and continue with the following instructions:

# delete the newly created folders
$ rm -rf cifar10_t*
$ make create

Next, start the training:

$ cd ../../
$ ./singa -conf examples/cifar10/job.conf

For GPU training or distributed training, please refer to the online guide.


Apache SINGA is licensed under the Apache License, Version 2.0.

For additional information, see the LICENSE and NOTICE files.


  • Q1: I get the error ./configure --> cannot find blas_segmm() function even though I have installed OpenBLAS.

    A1: This means the compiler cannot find the OpenBLAS library. If you have installed OpenBLAS via apt-get install, then export the path to $LD_LIBRARY_PATH (e.g. /usr/lib/openblas-base). If you installed it with ./thirdparty/, then export the correct path based on $PREFIX (e.g. /opt/OpenBLAS/lib):

    # using apt-get install for openblas (example path from above)
    $ export LIBRARY_PATH=/usr/lib/openblas-base:$LIBRARY_PATH
    # using ./thirdparty/ for openblas:
    $ export LIBRARY_PATH=/opt/OpenBLAS/lib:$LIBRARY_PATH
  • Q2: I get error cblas.h no such file or directory exists.

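    A2: You need to add the folder containing cblas.h to $CPLUS_INCLUDE_PATH, e.g.,

    # e.g., (replace /PATH/TO/CBLAS/INCLUDE with the folder containing cblas.h)
    $ export CPLUS_INCLUDE_PATH=/PATH/TO/CBLAS/INCLUDE:$CPLUS_INCLUDE_PATH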
    # then reconfigure and make SINGA
    $ ./configure
    $ make
  • Q3: When compiling, I get error SSE2 instruction set not enabled

    A3: You can try the following command:

    $ make CFLAGS='-msse2' CXXFLAGS='-msse2'
  • Q4: I get ImportError: cannot import name enum_type_wrapper from google.protobuf.internal when I try to import .py files.

    A4: After installing protobuf with make install, you also need to install the Python runtime libraries. Go to the protobuf source directory and run:

    $ cd python
    $ python build
    $ python install

    You may need sudo when you try to install python runtime libraries in the system folder.

  • Q5: I get a linking error caused by gflags.

    A5: SINGA does not depend on gflags, but you may have installed glog with gflags. In that case you can reinstall glog using thirdparty/ into another folder and export $LDFLAGS and $CPPFLAGS to include that folder.

  • Q6: While compiling SINGA and installing glog on Mac OS X, I get fatal error 'ext/slist' file not found

    A6: We have not tested thoroughly on Mac OS. If you want to install glog, please go to the glog folder and try:

    $ make CFLAGS='-stdlib=libstdc++' CXXFLAGS='-stdlib=libstdc++'
  • Q7: When I start a training job, it reports error related to ZOO_ERROR...zk retcode=-4....

    A7: This is because zookeeper is not started. Please start the service

    $ ./bin/ start

    If the error persists, you probably do not have Java installed. You can check by running

    $ java -version
  • Q8: When I build OpenBLAS from source, I am told that I need a fortran compiler.

    A8: You can compile OpenBLAS by

    $ make ONLY_CBLAS=1

    or install it using

    $ sudo apt-get install libopenblas-dev

    or

    $ sudo yum install openblas-devel

    It is worth noting that you need root access to run the last two commands. Remember to set the environment variables to include the header and library paths of OpenBLAS after installation (please refer to the Dependencies section).

  • Q9: When I build protocol buffer, it reports that GLIBCXX_3.4.20 not found in /usr/lib64/

    A9: This means the linker found a libstdc++ library, but that library belongs to an older version of GCC than the one used to compile and link the program. The program depends on code defined in the newer libstdc++ that belongs to the newer version of GCC, so the linker must be told how to find the newer libstdc++ shared library. The simplest way to fix this is to find the correct libstdc++ and export its folder to $LD_LIBRARY_PATH. For example, if GLIBCXX_3.4.20 is listed in the output of the following command,

    $ strings /usr/local/lib64/ | grep GLIBCXX

    then just set your environment variable as

    $ export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH
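
The check described in A9 can also be scripted. libstdc++ records its supported versions as tags of the form GLIBCXX_<version>, and the sketch below tests whether a given library file contains a specific tag. The function name `has_glibcxx` and the variable `LIBSTDCXX` are hypothetical; set `LIBSTDCXX` to the path of the libstdc++ shared library you want to inspect.

```shell
#!/bin/sh
# Return success if the file $1 contains the version tag GLIBCXX_$2.
# grep -a reads the binary as text and -o prints each matched tag on its own
# line; the second grep then requires an exact whole-line match, so that
# 3.4.2 does not match 3.4.20.
has_glibcxx() {
    grep -a -o 'GLIBCXX_[0-9.]*' "$1" | grep -q -x -F "GLIBCXX_$2"
}

# Usage (only runs when LIBSTDCXX is set by the caller):
if [ -n "${LIBSTDCXX:-}" ]; then
    if has_glibcxx "$LIBSTDCXX" 3.4.20; then
        echo "GLIBCXX_3.4.20 is provided by $LIBSTDCXX"
    else
        echo "GLIBCXX_3.4.20 is missing from $LIBSTDCXX" >&2
    fi
fi
```

If the tag is present, adding the folder that contains the library to $LD_LIBRARY_PATH, as shown above, lets the runtime linker pick it up.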