Submarine is a new subproject of Apache Hadoop.
Submarine is a project that allows infrastructure engineers and data scientists to run unmodified TensorFlow or PyTorch programs on YARN or Kubernetes.
Goals of Submarine:
Submarine Workbench is a web-based system in which algorithm engineers can manage the complete lifecycle of machine learning jobs.
Manage machine learning jobs through projects.
Perform data processing, data conversion, feature engineering, and similar tasks in the Workbench.
Run data processing, algorithm development, and model training as job runs within a machine learning job.
Handle algorithm selection, parameter adjustment, model training, model release, and model serving.
Automate the complete life cycle of machine learning operations by scheduling workflows for data processing, model training, and model publishing.
Support team development, code sharing, comments, and version management for code and models.
Submarine Core is the execution engine of the system and has the following features:
Support for multiple machine learning frameworks, such as TensorFlow and PyTorch.
Integration with an externally deployed Spark compute engine for data processing.
Support for Python, Scala, and R for algorithm development. An SDK is provided to help developers use Submarine's internal data caching, data exchange, and task tracking, making the development and execution of machine learning tasks more efficient.
Compatibility with the underlying hybrid scheduling system of YARN and Kubernetes (k8s) for unified task scheduling and resource management, so that users do not need to know which scheduler runs their jobs.
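The idea of unified scheduling over YARN and Kubernetes can be illustrated with a small sketch. This is not Submarine's actual API: `JobSpec`, `Scheduler`, `YarnScheduler`, `KubernetesScheduler`, and `submit_job` are all hypothetical names, and the backends only return placeholder identifiers instead of contacting a real cluster. It only shows how a single job description might be routed to either backend without the caller changing anything.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical, scheduler-agnostic description of a training job."""
    name: str
    framework: str  # e.g. "tensorflow" or "pytorch"
    workers: int
    memory_mb: int


class Scheduler(ABC):
    """Hypothetical backend interface; an assumption, not Submarine's real API."""

    @abstractmethod
    def submit(self, spec: JobSpec) -> str:
        """Submit the job and return a backend-specific job identifier."""


class YarnScheduler(Scheduler):
    def submit(self, spec: JobSpec) -> str:
        # A real backend would talk to the YARN ResourceManager here.
        return f"yarn:{spec.name}"


class KubernetesScheduler(Scheduler):
    def submit(self, spec: JobSpec) -> str:
        # A real backend would create Kubernetes resources here.
        return f"k8s:{spec.name}"


# Registry of available backends, keyed by a configuration value.
BACKENDS = {"yarn": YarnScheduler, "k8s": KubernetesScheduler}


def submit_job(spec: JobSpec, backend: str) -> str:
    """Route a job to the configured backend; callers only deal with JobSpec."""
    try:
        scheduler = BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return scheduler.submit(spec)


if __name__ == "__main__":
    spec = JobSpec(name="mnist", framework="tensorflow", workers=2, memory_mb=4096)
    print(submit_job(spec, "yarn"))
    print(submit_job(spec, "k8s"))
```

In this sketch the backend is chosen by a plain string, which stands in for whatever cluster configuration a deployment would use; the job description itself stays identical across schedulers.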
You can use mini-submarine to try out Submarine quickly.
It is a Docker image built for Submarine development and quick-start testing.
Read the Quick Start Guide
Read the Apache Hadoop Submarine Community Guide
To learn how to contribute, read the Contributing Guide
The Apache Hadoop Submarine project is licensed under the Apache 2.0 License. See the LICENSE file for details.