tag | a456dd1d12b3acaeccb8755b26a6cde1927a106b |
---|---|
tagger | Wanqiang Ji <jiwq@apache.org> | Fri Jun 26 19:18:28 2020 +0800 |
object | 1ae00ff6eb328040fbf8e5ccb2ff49de1efd7714 |

Release candidate - 0.4.0-RC0
commit | 1ae00ff6eb328040fbf8e5ccb2ff49de1efd7714 |
---|---|
author | Wanqiang Ji <jiwq@apache.org> | Fri Jun 26 19:12:54 2020 +0800 |
committer | Wanqiang Ji <jiwq@apache.org> | Fri Jun 26 19:12:54 2020 +0800 |
tree | e0ceb0752d2524d33590b117fa93bd18fea6c99d |
parent | f0b3110bebfaef1326b3000a8a72f5d62d6ed11a |

Preparing for 0.4.0 release
Apache Submarine (Submarine for short) is the ONE PLATFORM that allows Data Scientists to create end-to-end machine learning workflows. ONE PLATFORM means it lets Data Scientists finish their jobs on the same platform without frequently switching toolsets: from dataset exploration and data pipeline creation, through model training (experiments), to pushing models to production (model serving and monitoring). All these steps can be completed within the ONE PLATFORM.
There are already many open-source and commercial projects trying to build an end-to-end machine-learning/deep-learning platform, so what is the vision of Submarine?
Theodore Levitt once said:
“People don’t want to buy a quarter-inch drill. They want a quarter-inch hole.”
As mentioned above, Submarine aims to bring Data-Scientist-friendly user interfaces to make their lives easier. Here are some examples of Submarine user interfaces.
```python
import submarine
# Assumed import path for the spec classes; check the Submarine SDK docs
from submarine import Environment, ExperimentMeta, ExperimentTaskSpec, ExperimentSpec

# Create a submarine client pointing at the submarine server
submarine_client = submarine.ExperimentClient(host='http://localhost:8080')

# The experiment's environment; could be Docker-image or Conda-environment based
environment = Environment(image='gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0')

# Specify the experiment's name, the framework it uses, the namespace it will
# run in, and the entry point. It can also accept environment variables, etc.
# For a PyTorch job, the framework should be 'Pytorch'.
experiment_meta = ExperimentMeta(name='mnist-dist',
                                 namespace='default',
                                 framework='Tensorflow',
                                 cmd='python /var/tf_dist_mnist/dist_mnist.py --train_steps=100')

# 1 PS task of 2 CPU, 1 GB
ps_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M', replicas=1)
# 1 Worker task
worker_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M', replicas=1)

# Wrap the meta, environment and task specs into an experiment.
# For a PyTorch job, the specs would be 'Master' and 'Worker'.
experiment_spec = ExperimentSpec(meta=experiment_meta,
                                 environment=environment,
                                 spec={'Ps': ps_spec, 'Worker': worker_spec})

# Submit the experiment to the submarine server
experiment = submarine_client.create_experiment(experiment_spec=experiment_spec)
# Get the experiment ID
id = experiment['experimentId']
```
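The `resources` string in the task specs above follows a simple comma-separated `key=value` format. As an illustration only, here is a minimal sketch of parsing such a string; `parse_resources` is a hypothetical helper, not part of the Submarine SDK:

```python
def parse_resources(spec):
    """Parse a Submarine-style resource string such as 'cpu=2,memory=1024M'
    into a dict. Hypothetical helper for illustration; not part of the SDK."""
    resources = {}
    for item in spec.split(','):
        key, _, value = item.partition('=')
        resources[key.strip()] = value.strip()
    return resources

print(parse_resources('cpu=2,memory=1024M'))
# → {'cpu': '2', 'memory': '1024M'}
```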
```python
# Get the experiment's detailed info
submarine_client.get_experiment(id)
# Wait for the experiment to finish
submarine_client.wait_for_finish(id)
# Fetch the experiment's log
submarine_client.get_log(id)
# List all running experiments
submarine_client.list_experiments(status='running')
```
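The monitoring calls above suggest a simple poll-until-done pattern. Below is a minimal sketch of such a loop against a stub client; `StubClient`, `wait_until_done`, and the status values are assumptions for illustration, standing in for `submarine_client` and the server's real status strings:

```python
import time

class StubClient:
    """Stub standing in for submarine_client: reports 'running' twice,
    then 'succeeded'. Hypothetical; for illustration only."""
    def __init__(self):
        self._polls = 0

    def get_experiment(self, experiment_id):
        self._polls += 1
        status = 'succeeded' if self._polls >= 3 else 'running'
        return {'experimentId': experiment_id, 'status': status}

def wait_until_done(client, experiment_id, interval=0.01,
                    terminal=('succeeded', 'failed')):
    # Poll the experiment until it reaches a terminal status
    while True:
        experiment = client.get_experiment(experiment_id)
        if experiment['status'].lower() in terminal:
            return experiment
        time.sleep(interval)

result = wait_until_done(StubClient(), 'experiment-1')
print(result['status'])  # → succeeded
```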
For a quick-start, see Submarine On K8s
(Available on 0.6.0, see Roadmap)
If you want to know more about Submarine's architecture, components, requirements and design docs, they can be found in Architecture-and-requirement
Detailed design documentation and implementation notes can be found at: Implementation notes
Read the Apache Submarine Community Guide
How to contribute: Contributing Guide
Issue Tracking: https://issues.apache.org/jira/projects/SUBMARINE
See the Developer Guide Home Page
Want to know more about what's coming for Submarine? Please check out the roadmap: https://cwiki.apache.org/confluence/display/SUBMARINE/Roadmap
The Apache Submarine project is licensed under the Apache 2.0 License. See the LICENSE file for details.