commit | 929294341ef5c9d9a423fad6f75da3406086efec | |
---|---|---|
author | Kai-Hsun Chen <b03901153@ntu.edu.tw> | Thu Jul 22 11:25:26 2021 +0800 |
committer | Kevin <pingsutw@apache.org> | Mon Jul 26 13:22:01 2021 +0000 |
tree | 12d71046c58ed8d824657859724f0ea89282abc3 | |
parent | 6fe72e294afed2711556837c38457d26bad8f95d | |
SUBMARINE-942. Make experiment ID consistent with TFJob and PyTorch Job

### What is this PR for?
1. Please refer to the following two JIRA issues.
    * https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-886?filter=reportedbyme
    * https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-880?filter=reportedbyme

    In this JIRA issue, we need to make the experiment ID consistent with the name of TFJob and PyTorch Job. The difference is caused by this [link](https://github.com/apache/submarine/commit/8da9f478de9323dd098d06bbd52afdda3a27ce07#diff-c1cad0951ad3663eb075d964171d74ae94a35bec2b79f8c9894030c0446a6b99R119). Update ExperimentId.java.
2. Update [IntegrationTestK8s.md](http://submarine.apache.org/docs/devDocs/IntegrationTestK8s)
    * "submarine-server-core" is a dependency package specified in pom.xml of test-k8s.
    * In the document [BuildFromCode.md](http://submarine.apache.org/docs/devDocs/BuildFromCode), the command to build Submarine is `mvn clean package -DskipTests`. However, the package will be installed into the local repository at **install phase**, a later phase than both **package and verify phases**.
    * Hence, we need to execute `mvn install -DskipTests` to ensure that the **test-k8s** module uses the latest "submarine-server-core" module.
3. Remove the field `name` in Experiment.java because the value of `name` is the same as `experimentId`.

### What type of PR is it?
[Improvement]

### Todos

### What is the Jira issue?
https://issues.apache.org/jira/browse/SUBMARINE-942

### How should this be tested?

### Screenshots (if appropriate)

### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need new documentation? No

Author: Kai-Hsun Chen <b03901153@ntu.edu.tw>

Signed-off-by: Kevin <pingsutw@apache.org>

Closes #683 from kevin85421/SUBMARINE-942 and squashes the following commits:

594e46ff [Kai-Hsun Chen] Remove experiment name
f051ff87 [Kai-Hsun Chen] Update IntegrationTestK8s.md
c88a384a [Kai-Hsun Chen] Refactor
b81c2daa [Kai-Hsun Chen] Refactor
495ceb32 [Kai-Hsun Chen] SUBMARINE-942. Make experiment ID consistent with TFJob and PyTorch Job
ffd2c168 [Kai-Hsun Chen] SUBMARINE-942. Make experiment ID consistent with TFJob and PyTorch Job

(cherry picked from commit 610535806c6208e53dbe0c2889cc944cf32835c5)
Signed-off-by: Kevin <pingsutw@apache.org>
Apache Submarine (Submarine for short) is an end-to-end machine learning platform that lets data scientists create complete machine learning workflows. On Submarine, data scientists can complete each stage of the ML model lifecycle, including data exploration, data pipeline creation, model training, serving, and monitoring.
Some open-source and commercial projects are trying to build an end-to-end ML platform. What's the vision of Submarine?
Theodore Levitt once said:
“People don’t want to buy a quarter-inch drill. They want a quarter-inch hole.”
[...] `experiment` on prem or cloud via easy-to-use UI/API/SDK.
[...] `experiment` and dependencies of `environment`.

As mentioned above, Submarine aims to provide a data-scientist-friendly UI that gives data scientists a good user experience. Here are some examples.
```python
# Import paths are assumed from the Submarine Python SDK layout of this
# release; adjust them to match your installed version.
import submarine
from submarine.experiment.models.environment_spec import EnvironmentSpec
from submarine.experiment.models.experiment_meta import ExperimentMeta
from submarine.experiment.models.experiment_spec import ExperimentSpec
from submarine.experiment.models.experiment_task_spec import ExperimentTaskSpec

# Create a client for the Submarine server
submarine_client = submarine.ExperimentClient(host='http://localhost:8080')

# The experiment's environment; it can be based on a Docker image or a Conda environment
environment = EnvironmentSpec(image='apache/submarine:tf-dist-mnist-test-1.0')

# Specify the experiment's name, the framework it uses, the namespace it runs in,
# and the entry point. It can also accept environment variables, etc.
# For a PyTorch job, the framework should be 'Pytorch'.
experiment_meta = ExperimentMeta(name='mnist-dist',
                                 namespace='default',
                                 framework='Tensorflow',
                                 cmd='python /var/tf_dist_mnist/dist_mnist.py --train_steps=100')

# 1 PS task with 2 CPUs and 1 GB of memory
ps_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M', replicas=1)

# 1 Worker task
worker_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M', replicas=1)

# Wrap up the meta, environment and task specs into an experiment.
# For a PyTorch job, the specs would be "Master" and "Worker".
experiment_spec = ExperimentSpec(meta=experiment_meta,
                                 environment=environment,
                                 spec={'Ps': ps_spec, 'Worker': worker_spec})

# Submit the experiment to the Submarine server
experiment = submarine_client.create_experiment(experiment_spec=experiment_spec)

# Get the experiment ID
id = experiment['experimentId']

# Query a specific experiment
submarine_client.get_experiment(id)

# Wait for the experiment to finish
submarine_client.wait_for_finish(id)

# Get the experiment's log
submarine_client.get_log(id)

# List all running experiments
submarine_client.list_experiments(status='running')
```
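The comments in the example above note that a PyTorch job uses 'Pytorch' as the framework and "Master" and "Worker" task specs instead of "Ps" and "Worker". The following is a minimal sketch of that mapping in plain Python, so it runs without a Submarine server. `ROLES_BY_FRAMEWORK`, `parse_resources`, and `build_task_specs` are hypothetical names used only for illustration; they are not part of the Submarine SDK, and the plain dicts stand in for `ExperimentTaskSpec` objects.

```python
# Hypothetical helpers, not part of the Submarine SDK. They only illustrate
# how the keys of the `spec` mapping differ between frameworks.
ROLES_BY_FRAMEWORK = {
    "Tensorflow": ("Ps", "Worker"),
    "Pytorch": ("Master", "Worker"),
}


def parse_resources(resources):
    """Split a string such as 'cpu=2,memory=1024M' into a dict of strings."""
    parsed = {}
    for item in resources.split(","):
        key, _, value = item.strip().partition("=")
        if not key or not value:
            raise ValueError(f"malformed resource entry: {item!r}")
        parsed[key] = value
    return parsed


def build_task_specs(framework, resources="cpu=2,memory=1024M", replicas=1):
    """Return a role -> settings mapping shaped like the `spec` argument."""
    try:
        roles = ROLES_BY_FRAMEWORK[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    return {
        role: {"resources": parse_resources(resources), "replicas": replicas}
        for role in roles
    }


if __name__ == "__main__":
    # Show the task roles a PyTorch experiment would need.
    print(build_task_specs("Pytorch"))
```

With the real SDK, each value in the returned mapping would presumably be replaced by an `ExperimentTaskSpec(resources=..., replicas=...)` object before being passed as `spec`.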
For a quick-start, see Submarine On K8s
(Available on 0.6.0, see Roadmap)
To learn more about Submarine's architecture, components, requirements, and design, see Architecture-and-requirement
Detailed design documentation and implementation notes can be found at: Implementation notes
Read the Apache Submarine Community Guide
How to contribute: Contributing Guide
Issue Tracking: https://issues.apache.org/jira/projects/SUBMARINE
Want to know more about what's coming for Submarine? Please check out the roadmap: https://cwiki.apache.org/confluence/display/SUBMARINE/Roadmap
The Apache Submarine project is licensed under the Apache 2.0 License. See the LICENSE file for details.