commit | b2e65c7d0efbfac1ba96051d8f8f7c967b07917a | |
---|---|---|
author | cdmikechen <cdmikechen@apache.org> | Wed Apr 05 19:38:07 2023 +0800 |
committer | cdmikechen <cdmikechen@apache.org> | Sun Apr 30 12:43:48 2023 +0800 |
tree | 98a800dc086e9c9073af866df487e5538b314dcf | |
parent | 22df22f1b1952babea6de5eebae64d8b0c28f7e3 | |
SUBMARINE-1327. Upgrade K8s to support the latest 3 releases (1.22-1.24)

### What is this PR for?
Update k8s to 1.22, 1.23 and 1.24. PodSecurityPolicy `policy/v1beta1` is deprecated in v1.21+ and unavailable in v1.25+, so we will support 1.22-1.24 for the time being and then try to upgrade to 1.25 or 1.26 once all the related issues have been resolved.

### What type of PR is it?
Feature

### Todos
* [x] - Update Notebook-Controller-Operator to 1.7.0
* [x] - Update Training-Operator to 1.6.0
* [x] - Update Seldon-Core-Operator to 1.15.1
* [x] - Update Submarine CRD/Operator-API to v1
* [x] - Git Action update
* [x] - Test Notebook (blocking issue solved by PR https://github.com/apache/submarine/pull/1058)
* [x] - Test Tf Experiment
* [x] - Test PyTorch Experiment
* [x] - Test Serving
* [x] - Update documents

Some issues were found during testing but remain unresolved; they need to be assessed to decide whether they should be resolved in 0.8.0:
* [ ] - Training-operator changes the `job-name` pod label to `training.kubeflow.org/job-name` after the upgrade from 1.3.0 to 1.6.0.

### What is the Jira issue?
https://issues.apache.org/jira/browse/SUBMARINE-1327

### How should this be tested?
GitHub CI

### Screenshots (if appropriate)
No

### Questions:
* Do the license files need updating? Yes
* Are there breaking changes for older versions? Yes
* Does this need new documentation? Yes

Author: cdmikechen <cdmikechen@apache.org>

Signed-off-by: cdmikechen <cdmikechen@apache.org>

Closes #1060 from cdmikechen/SUBMARINE-1327 and squashes the following commits:

9b0f2f02 [cdmikechen] Update serving document
f6f25a38 [cdmikechen] Fix training-operator job name
46945971 [cdmikechen] Support for notebook-controller 1.6.0+
478c0415 [cdmikechen] Replace v2 to v3
df1ee518 [cdmikechen] Fix version
962cf848 [cdmikechen] Down to 1.17.2
bdb3a90a [cdmikechen] Update Helm CR version to v1; update istio to 1.17.1; update Go to 1.19.7
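The supported-version window described above (1.22 through 1.24, with PodSecurityPolicy `policy/v1beta1` gone in 1.25) can be expressed as a small pre-flight check. This is a minimal sketch, assuming the cluster's version string (for example the `gitVersion` field returned by the Kubernetes API server's `/version` endpoint) is already available; the function names and constants are hypothetical and are not part of Submarine.

```python
import re
from typing import Tuple

# Hypothetical pre-flight check, not part of Submarine.
# Supported window taken from the commit message: Kubernetes 1.22 through 1.24.
MIN_SUPPORTED = (1, 22)
MAX_SUPPORTED = (1, 24)


def parse_k8s_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.24.3' or '1.23.0-gke.100'."""
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported_k8s(version: str) -> bool:
    """Return True if the cluster's minor version falls in the 1.22-1.24 window."""
    return MIN_SUPPORTED <= parse_k8s_version(version) <= MAX_SUPPORTED


print(is_supported_k8s("v1.24.3"))  # True
print(is_supported_k8s("v1.25.0"))  # False
```

Comparing `(major, minor)` tuples keeps the check independent of patch numbers and vendor suffixes such as `-gke.100`.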
Apache Submarine (Submarine for short) is an End-to-End Machine Learning Platform to allow data scientists to create end-to-end machine learning workflows. On Submarine, data scientists can finish each stage in the ML model lifecycle, including data exploration, data pipeline creation, model training, serving, and monitoring.
Some open-source and commercial projects are trying to build an end-to-end ML platform. What's the vision of Submarine?
Theodore Levitt once said:
“People don’t want to buy a quarter-inch drill. They want a quarter-inch hole.”
Submarine lets data scientists run and track distributed training `experiment`s on prem or in the cloud via an easy-to-use UI/API/SDK, and manage the versions of each `experiment` and the dependencies of its `environment`.
As mentioned above, Submarine aims to provide a data-scientist-friendly UI that gives data scientists a good user experience. Here are some examples.
```python
import submarine

# EnvironmentSpec, ExperimentMeta, ExperimentTaskSpec and ExperimentSpec are model
# classes from the Submarine Python SDK (pysubmarine); import them from the models
# module of the SDK version you have installed.

# Create a client connected to the Submarine server
submarine_client = submarine.ExperimentClient(host='http://localhost:8080')

# The experiment's environment, which can be based on a Docker image or a Conda environment
environment = EnvironmentSpec(image='apache/submarine:tf-dist-mnist-test-1.0')

# Specify the experiment's name, the framework it uses, the namespace it runs in,
# and the entry point. It can also accept environment variables, etc.
# For a PyTorch job, the framework should be 'Pytorch'.
experiment_meta = ExperimentMeta(name='mnist-dist',
                                 namespace='default',
                                 framework='Tensorflow',
                                 cmd='python /var/tf_dist_mnist/dist_mnist.py --train_steps=100')

# 1 PS task with 2 CPUs and 1 GB of memory
ps_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M', replicas=1)
# 1 Worker task with the same resources
worker_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M', replicas=1)

# Wrap up the meta, environment and task specs into an experiment.
# For a PyTorch job, the specs would be "Master" and "Worker".
experiment_spec = ExperimentSpec(meta=experiment_meta,
                                 environment=environment,
                                 spec={'Ps': ps_spec, 'Worker': worker_spec})

# Submit the experiment to the Submarine server
experiment = submarine_client.create_experiment(experiment_spec=experiment_spec)

# Get the experiment ID
experiment_id = experiment['experimentId']

# Query a specific experiment
submarine_client.get_experiment(experiment_id)

# Wait for the experiment to finish
submarine_client.wait_for_finish(experiment_id)

# Get the experiment's log
submarine_client.get_log(experiment_id)

# List all running experiments
submarine_client.list_experiments(status='running')
```
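The comments in the SDK example note that TensorFlow experiments use `Ps` and `Worker` task specs while PyTorch experiments use `Master` and `Worker`, and that resources are passed as a string such as `'cpu=2,memory=1024M'`. The helpers below are hypothetical (they are not part of the Submarine SDK) and only sketch how those two conventions could be assembled before constructing task specs; they return plain Python values so they run without the SDK installed.

```python
from typing import Dict, Tuple

# Hypothetical helpers, not part of the Submarine SDK.
# Task role names per framework value, following the comments in the SDK example:
# 'Tensorflow' jobs use Ps/Worker, 'Pytorch' jobs use Master/Worker.
TASK_ROLES: Dict[str, Tuple[str, ...]] = {
    "Tensorflow": ("Ps", "Worker"),
    "Pytorch": ("Master", "Worker"),
}


def format_resources(cpu: int, memory_mb: int) -> str:
    """Build a resource string in the 'cpu=N,memory=NM' form used by the example."""
    if cpu <= 0 or memory_mb <= 0:
        raise ValueError("cpu and memory_mb must be positive")
    return f"cpu={cpu},memory={memory_mb}M"


def task_layout(framework: str, cpu: int, memory_mb: int,
                replicas: int = 1) -> Dict[str, dict]:
    """Map each role of the framework to keyword arguments for a task spec."""
    try:
        roles = TASK_ROLES[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    resources = format_resources(cpu, memory_mb)
    return {role: {"resources": resources, "replicas": replicas} for role in roles}


layout = task_layout("Pytorch", cpu=2, memory_mb=1024)
print(layout)
# {'Master': {'resources': 'cpu=2,memory=1024M', 'replicas': 1}, 'Worker': {'resources': 'cpu=2,memory=1024M', 'replicas': 1}}
```

With the SDK installed, each entry could then be unpacked into a task spec, e.g. `{role: ExperimentTaskSpec(**kwargs) for role, kwargs in layout.items()}`, which matches the `resources` and `replicas` keyword arguments used in the example.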
For a quick start, see Submarine On K8s.
(Available in 0.5.0; see Roadmap.)
Details of Submarine's architecture, components, requirements, and design can be found in Architecture-and-requirement.
Detailed design documentation and implementation notes can be found in Implementation notes.
Read the Apache Submarine Community Guide
To learn how to contribute, see the Contributing Guide.
Join the Submarine Slack channel: https://join.slack.com/t/asf-submarine/shared_invite
Issue Tracking: https://issues.apache.org/jira/projects/SUBMARINE
Want to know more about what's coming for Submarine? Check out the roadmap: https://cwiki.apache.org/confluence/display/SUBMARINE/Roadmap
From here, you can find the changelog and issue tracker for each version of Apache Submarine.
Apache Submarine: A Unified Machine Learning Platform Made Simple (EuroMLSys '22)
The Apache Submarine project is licensed under the Apache 2.0 License. See the LICENSE file for details.