[{"title":"Running Submarine on YARN","type":0,"sectionRef":"#","url":"docs/0.6.0/adminDocs/yarn/","content":"","keywords":""},{"title":"Hadoop version​","type":1,"pageTitle":"Running Submarine on YARN","url":"docs/0.6.0/adminDocs/yarn/#hadoop-version","content":"Must: Apache Hadoop version newer than 2.7.3 Optional: To use the GPU-on-YARN feature with Submarine, make sure Hadoop is at least 2.10.0 (or 3.1.0), and follow Enable GPU on YARN 2.10.0+ to enable the GPU-on-YARN feature. To run training jobs in Docker containers, make sure Hadoop is at least 2.8.2, and follow Enable Docker on YARN 2.8.2+ to enable the Docker-on-YARN feature. "},{"title":"Submarine YARN Runtime Guide​","type":1,"pageTitle":"Running Submarine on YARN","url":"docs/0.6.0/adminDocs/yarn/#submarine-yarn-runtime-guide","content":"The YARN Runtime Guide describes how to use Submarine to run jobs on YARN, with or without Docker. "},{"title":"README","type":0,"sectionRef":"#","url":"docs/0.6.0/adminDocs/yarn/workbench/","content":"","keywords":""},{"title":"Register​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#register","content":"Anyone who needs to use Submarine for machine learning algorithm development can visit the Submarine Workbench web homepage. On the homepage, click the registration link and fill in a user name, email address, and password to register. At this point, the user's status is waiting for approval. After receiving the registration request in Submarine Workbench, the administrator sets the user's operation permissions according to the user's needs, assigns the user's organization and resources, and sets the user's status to approved. Only then can the user log in to Submarine Workbench. Different users have different permissions. 
"},{"title":"Login​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#login","content":"Each Submarine user logs in to the Home page of Submarine Workbench by entering their username and password on the Login page. "},{"title":"Home​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#home","content":"On the Submarine Workbench Home page, four charts at the top show the user's resource usage and task execution. The Quick Start list shows the most commonly used feature links in the Workbench so that users can get to work quickly. The Open Recent list shows the nine items that the user has used most recently, so you can get back to work quickly. The What's New? list shows some of the latest features and project information released by Submarine, to help you follow the latest developments in the Submarine project. "},{"title":"Workspace​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#workspace","content":"Workspace consists primarily of five tab pages, and the title of each tab page shows the total number of items it contains. "},{"title":"Project​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#project","content":"On the Project page, all the projects created by the user are displayed as cards.  Each Project card consists of the following sections: Project Type：Submarine currently supports six types of machine learning algorithm frameworks and development languages: Notebook, Python, R, Scala, Tensorflow, and PyTorch, which are identified by corresponding icons in the project card. Project Tags：Users can tag each Project with different tags for easy searching and management. Github/Gitlab integration：Submarine Workbench is integrated with Github/Gitlab, and each Project supports Watch, Star, Fork, and Comment operations in Workbench. Watch：[TODO] Star：[TODO] Fork：[TODO] Comment：Users can comment on the project. 
Edit：Users can open a project in Notebook and develop algorithms by double-clicking the project or clicking the Edit button. Download：Users can download the project package locally by clicking the Download button. Setting：Edit project information such as the project name, profile, visibility level, and permissions. Delete：Delete the project and all files it includes. Add New Project​ Clicking the Add New Project button on the project page displays the guide page for creating a project, and you can create a new project in just three steps. Step 1: Fill in the project name and project description in the Base Information step.  Visibility: Set the external visibility level of the project. Private: (Default) Set as a private project. None of the files included in the project are publicly displayed, but the execution results of the project can be individually exposed in Notebook so that others can view the visual report of the project. Team: Set as a team project. Select the team name in the team selection box, and other members of the team can access the project according to the configured permissions. Public: Set as a public project. All users in Workbench can view this project through search. Permission: Set the external access rights of the project. The permission setting interface appears only when the Visibility of the project is set to Team or Public. Can View When the project's Visibility is set to Team, other members of the team can only view the files for this project. When the project's Visibility is set to Public, other members of the Workbench can only view the files for this project. Can Edit When the project's Visibility is set to Team, other members of the team can view and edit the files for this project. When the project's Visibility is set to Public, other members of the Workbench can view and edit the files for this project. 
Can Execute When the project's Visibility is set to Team, other members of the team can view, edit, and execute the project's files. When the project's Visibility is set to Public, other members of the Workbench can view, edit, and execute the project's files. Step 2: In the Initial Project step, Workbench provides four ways to initialize the project. Template: Workbench has built-in project templates for several development languages and algorithm frameworks. You can choose any template to initialize your project and execute it directly in Notebook without any modification. This is especially suitable for newcomers who want to try Workbench quickly. Blank：Create a blank project; you can manually add files to the project in Notebook later. Upload: Initialize your project by uploading a file in notebook format that is compatible with the Jupyter Notebook and Zeppelin Notebook file formats. Git Repo: Fork the files of a repository in your Github/Gitlab account to initialize the project. Step 3：Preview the files included in the project.  Save: Save the project to Workspace. Open In Notebook: Save the project to Workspace and open the project with Notebook. 
"},{"title":"Release​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#release","content":"[TODO] "},{"title":"Training​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#training","content":"[TODO] "},{"title":"Team​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#team","content":"[TODO] "},{"title":"Shared​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#shared","content":"[TODO] "},{"title":"Interpreters​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#interpreters","content":"[TODO] "},{"title":"Job​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#job","content":"[TODO] "},{"title":"Data​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#data","content":"[TODO] "},{"title":"Model​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#model","content":"[TODO] "},{"title":"Manager​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#manager","content":""},{"title":"User​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#user","content":"[TODO] "},{"title":"Team​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#team-1","content":"[TODO] "},{"title":"Data Dict​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#data-dict","content":"[TODO] "},{"title":"Department​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#department","content":"[TODO] "},{"title":"How to run workbench​","type":1,"pageTitle":"README","url":"docs/0.6.0/adminDocs/yarn/workbench/#how-to-run-workbench","content":"How To Run Submarine Workbench Guide "},{"title":"Test and Troubleshooting","type":0,"sectionRef":"#","url":"docs/0.6.0/adminDocs/yarn/TestAndTroubleshooting","content":"","keywords":""},{"title":"Test with a tensorflow job​","type":1,"pageTitle":"Test and 
Troubleshooting","url":"docs/0.6.0/adminDocs/yarn/TestAndTroubleshooting#test-with-a-tensorflow-job","content":"Distributed-shell + GPU + cgroup  ... \\ job run \\ --env DOCKER_JAVA_HOME=/opt/java \\ --env DOCKER_HADOOP_HDFS_HOME=/hadoop-current --name distributed-tf-gpu \\ --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \\ --worker_docker_image tf-1.13.1-gpu:0.0.1 \\ --ps_docker_image tf-1.13.1-cpu:0.0.1 \\ --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \\ --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-distributed-checkpoint \\ --num_ps 0 \\ --ps_resources memory=4G,vcores=2,gpu=0 \\ --ps_launch_cmd &quot;python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --num-gpus=0&quot; \\ --worker_resources memory=4G,vcores=2,gpu=1 --verbose \\ --num_workers 1 \\ --worker_launch_cmd &quot;python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=1&quot;  "},{"title":"Issues:​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/adminDocs/yarn/TestAndTroubleshooting#issues","content":""},{"title":"Issue 1: Fail to start nodemanager after system reboot​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/adminDocs/yarn/TestAndTroubleshooting#issue-1-fail-to-start-nodemanager-after-system-reboot","content":"2018-09-20 18:54:39,785 ERROR org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems! 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Unexpected: Cannot create yarn cgroup Subsystem:cpu Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:377) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:98) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:87) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:389) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:929) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:997) 2018-09-20 18:54:39,789 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED  Solution: Grant user yarn the access to /sys/fs/cgroup/cpu,cpuacct, which is the subfolder of cgroup mount destination. 
chown :yarn -R /sys/fs/cgroup/cpu,cpuacct chmod g+rwx -R /sys/fs/cgroup/cpu,cpuacct  If GPUs are used, access to the cgroup devices folder is needed as well: chown :yarn -R /sys/fs/cgroup/devices chmod g+rwx -R /sys/fs/cgroup/devices  "},{"title":"Issue 2: container-executor permission denied​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/adminDocs/yarn/TestAndTroubleshooting#issue-2-container-executor-permission-denied","content":"2018-09-21 09:36:26,102 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: IOException executing command: java.io.IOException: Cannot run program &quot;/etc/yarn/sbin/Linux-amd64-64/container-executor&quot;: error=13, Permission denied at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) at org.apache.hadoop.util.Shell.runCommand(Shell.java:938) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)  Solution: The permissions of /etc/yarn/sbin/Linux-amd64-64/container-executor should be 6050. "},{"title":"Issue 3：How to get docker service log​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/adminDocs/yarn/TestAndTroubleshooting#issue-3how-to-get-docker-service-log","content":"Solution: Get the Docker service log with the following command: journalctl -u docker  "},{"title":"Issue 4：docker can't remove containers with errors like device or resource busy​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/adminDocs/yarn/TestAndTroubleshooting#issue-4docker-cant-remove-containers-with-errors-like-device-or-resource-busy","content":"$ docker rm 0bfafa146431 Error response from daemon: Unable to remove filesystem for 0bfafa146431771f6024dcb9775ef47f170edb2f1852f71916ba44209ca6120a: remove /app/docker/containers/0bfafa146431771f6024dcb9775ef47f170edb2f152f71916ba44209ca6120a/shm: device or resource busy  Solution: to find which process leads to a device or 
resource busy, we can add a shell script, named find-busy-mnt.sh #!/usr/bin/env bash # A simple script to get information about mount points and pids and their # mount namespaces. if [ $# -ne 1 ];then echo &quot;Usage: $0 &lt;devicemapper-device-id&gt;&quot; exit 1 fi ID=$1 MOUNTS=`find /proc/*/mounts | xargs grep $ID 2&gt;/dev/null` [ -z &quot;$MOUNTS&quot; ] &amp;&amp; echo &quot;No pids found&quot; &amp;&amp; exit 0 printf &quot;PID\\tNAME\\t\\tMNTNS\\n&quot; echo &quot;$MOUNTS&quot; | while read LINE; do PID=`echo $LINE | cut -d &quot;:&quot; -f1 | cut -d &quot;/&quot; -f3` # Ignore self and thread-self if [ &quot;$PID&quot; == &quot;self&quot; ] || [ &quot;$PID&quot; == &quot;thread-self&quot; ]; then continue fi NAME=`ps -q $PID -o comm=` MNTNS=`readlink /proc/$PID/ns/mnt` printf &quot;%s\\t%s\\t\\t%s\\n&quot; &quot;$PID&quot; &quot;$NAME&quot; &quot;$MNTNS&quot; done  Kill the process using the PID found by the script: $ chmod +x find-busy-mnt.sh ./find-busy-mnt.sh 0bfafa146431771f6024dcb9775ef47f170edb2f152f71916ba44209ca6120a # PID NAME MNTNS # 5007 ntpd mnt:[4026533598] $ kill -9 5007  "},{"title":"Issue 5：Yarn failed to start containers​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/adminDocs/yarn/TestAndTroubleshooting#issue-5yarn-failed-to-start-containers","content":"If the number of GPUs required by applications is larger than the number of GPUs in the cluster, some containers cannot be created. "},{"title":"HowToRun","type":0,"sectionRef":"#","url":"docs/0.6.0/adminDocs/yarn/workbench/HowToRun","content":"","keywords":""},{"title":"Two versions of Submarine Workbench​","type":1,"pageTitle":"HowToRun","url":"docs/0.6.0/adminDocs/yarn/workbench/HowToRun#two-versions-of-submarine-workbench","content":"Angular (default) Vue (This is the old version; it will be replaced by the Angular version in the future.) 
(WARNING: Please open a new incognito window when you switch between versions of Submarine Workbench)​ "},{"title":"Launch the Submarine Workbench(Angular)​","type":1,"pageTitle":"HowToRun","url":"docs/0.6.0/adminDocs/yarn/workbench/HowToRun#launch-the-submarine-workbenchangular","content":"Because Submarine Workbench depends on the Submarine database, you need to run the Docker container of the Submarine database first. docker run -it -p 3306:3306 -d --name submarine-database -e MYSQL_ROOT_PASSWORD=password apache/submarine:database-&lt;REPLACE_VERSION&gt; docker run -it -p 8080:8080 -d --link=submarine-database:submarine-database --name submarine-server apache/submarine:server-&lt;REPLACE_VERSION&gt;  The login page of Submarine Workbench will be available at http://127.0.0.1:8080. "},{"title":"Check the data in the submarine-database​","type":1,"pageTitle":"HowToRun","url":"docs/0.6.0/adminDocs/yarn/workbench/HowToRun#check-the-data-in-the-submarine-database","content":"Step1: Enter the submarine-database container docker exec -it submarine-database bash  Step2: Enter MySQL database mysql -uroot -ppassword  Step3: List the data in the table // list all databases show databases; // choose a database use ${target_database}; // list all tables show tables; // list the data in the table select * from ${target_table};  Run Submarine Workbench without docker "},{"title":"Run Submarine Workbench​","type":1,"pageTitle":"HowToRun","url":"docs/0.6.0/adminDocs/yarn/workbench/HowToRun#run-submarine-workbench","content":"cd submarine ./bin/submarine-daemon.sh [start|stop|restart]  To start the workbench server for the first time, you need to download the MySQL JDBC jar and put it in workbench/lib. Alternatively, add the getMysqlJar parameter to download the MySQL jar automatically. 
cd submarine ./bin/submarine-daemon.sh start getMysqlJar  "},{"title":"submarine-env.sh​","type":1,"pageTitle":"HowToRun","url":"docs/0.6.0/adminDocs/yarn/workbench/HowToRun#submarine-envsh","content":"submarine-env.sh is automatically executed each time the submarine-daemon.sh script is executed, so we can use submarine-env.sh to configure the submarine-daemon.sh script and to set environment variables for the SubmarineServer process. Name\tVariable JAVA_HOME\tSet your java home path, default is java. SUBMARINE_JAVA_OPTS\tSet the JAVA OPTS parameter when the Submarine Workbench process starts. If you need to debug the Submarine Workbench process, you can set it to -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 SUBMARINE_MEM\tSet the java memory parameter when the Submarine Workbench process starts. MYSQL_JAR_URL\tThe customized URL to download MySQL jdbc jar. MYSQL_VERSION\tThe version of MySQL jdbc jar to download. The default value is 5.1.39. It's used to generate the default value of MYSQL_JDBC_URL "},{"title":"submarine-site.xml​","type":1,"pageTitle":"HowToRun","url":"docs/0.6.0/adminDocs/yarn/workbench/HowToRun#submarine-sitexml","content":"submarine-site.xml is the configuration file for the entire Submarine system. Name\tVariable submarine.server.addr\tSubmarine server address, default is 0.0.0.0 submarine.server.port\tSubmarine server port, default 8080 submarine.ssl\tWhether the Submarine servers use SSL, default false submarine.server.ssl.port\tServer ssl port. (used when ssl property is set to true), default 8483 submarine.ssl.client.auth\tWhether client authentication is used for SSL connections submarine.ssl.keystore.path\tPath to keystore relative to Submarine configuration directory submarine.ssl.keystore.type\tThe format of the given keystore (e.g. JKS or PKCS12) submarine.ssl.keystore.password\tKeystore password. Can be obfuscated by the Jetty Password tool submarine.ssl.key.manager.password\tKey Manager password. 
Defaults to keystore password. Can be obfuscated. submarine.ssl.truststore.path\tPath to truststore relative to Submarine configuration directory. Defaults to the keystore path submarine.ssl.truststore.type\tThe format of the given truststore (e.g. JKS or PKCS12). Defaults to the same type as the keystore type submarine.ssl.truststore.password\tTruststore password. Can be obfuscated by the Jetty Password tool. Defaults to the keystore password workbench.web.war\tSubmarine Workbench web war file path. "},{"title":"setup-jupyter","type":0,"sectionRef":"#","url":"docs/0.6.0/adminDocs/yarn/workbench/notebook/setup-jupyter","content":"","keywords":""},{"title":"Experiment environment​","type":1,"pageTitle":"setup-jupyter","url":"docs/0.6.0/adminDocs/yarn/workbench/notebook/setup-jupyter#experiment-environment","content":""},{"title":"Setup Kubernetes​","type":1,"pageTitle":"setup-jupyter","url":"docs/0.6.0/adminDocs/yarn/workbench/notebook/setup-jupyter#setup-kubernetes","content":"We recommend using kind to set up a Kubernetes cluster on a local machine. You can use Extra mounts to mount a host path into the kind node and Extra port mappings to forward ports to the kind nodes. Please refer to kind configuration for more details. You need to create a kind config file. 
The following is an example: kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 nodes: - role: control-plane extraMounts: # add a mount from /tmp/submarine on the host to /tmp/submarine on the node - hostPath: /tmp/submarine containerPath: /tmp/submarine extraPortMappings: - containerPort: 80 hostPort: 80 protocol: TCP # exposing additional ports to be used for NodePort services - containerPort: 30070 hostPort: 8888 protocol: TCP  Run the following commands: kind create cluster --image kindest/node:v1.15.6 --config &lt;path-to-kind-config&gt; --name k8s-submarine kubectl create namespace submarine  "},{"title":"Deploy Jupyter Notebook​","type":1,"pageTitle":"setup-jupyter","url":"docs/0.6.0/adminDocs/yarn/workbench/notebook/setup-jupyter#deploy-jupyter-notebook","content":"Once you have a running Kubernetes cluster, you can write a YAML file to deploy a Jupyter notebook. In this example YAML, we use jupyter/minimal-notebook to run a single notebook on the kind node. kubectl apply -f jupyter.yaml --namespace submarine  Once the Jupyter notebook is running, you can access the notebook server from a browser on the local machine at http://localhost:8888. You can enter and store a password for your notebook server with: kubectl exec -it &lt;jupyter-pod-name&gt; --namespace submarine -- jupyter notebook password  After restarting the notebook server, you can log in to Jupyter Notebook with your new password. 
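If you want to use JupyterLab: http://localhost:8888/lab  "},{"title":"README.zh-CN","type":0,"sectionRef":"#","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN","content":"","keywords":""},{"title":"Register​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#register","content":"Every user who needs to use Submarine for machine learning algorithm development can visit the Submarine Workbench web homepage. On the homepage, click the registration link and fill in a user name, registration email address, and password to complete registration. At this point, the user's status is Waiting for approval. After receiving the user's registration request in Submarine Workbench, the administrator sets the user's operation permissions, organization and department, and allocated resources. Only after the administrator sets the user's status to Approved can the user log in to Submarine Workbench. "},{"title":"Login​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#login","content":"Each Submarine user enters their username and password on the Login page to log in to the Home page of Submarine Workbench. "},{"title":"Home​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#home","content":"On the Home page of Submarine Workbench, four charts at the top show the user's resource usage and task execution. The Quick Start list shows the most commonly used feature links in the Workbench, so that users can get to work quickly. The Open Recent list shows the nine projects that the user has used most recently, so that you can get back to work quickly. The What's New? list shows some of the latest features and project information released by Submarine, so that you can follow the latest progress of the Submarine project. "},{"title":"Workspace​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#workspace","content":"Workspace consists mainly of five tab pages, and the title of each tab page shows the total number of items it contains. "},{"title":"Project​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#project","content":"On the Project page, all Projects created by the user are displayed as cards.  Each Project card consists of the following parts: Project type: Submarine currently supports six types of machine learning algorithm frameworks and development languages: Notebook, Python, R, Scala, Tensorflow, and PyTorch, which are identified by corresponding icons on the project card. Project Tags: Users can add different Tags to each Project for easy searching and management. Github/Gitlab integration: Submarine Workbench is integrated with Github/Gitlab, and each Project supports Watch, Star, Fork, and Comment operations in Workbench. Watch: [TODO] Star: [TODO] Fork: [TODO] Comment: Users can comment on the project. Edit: Users can open a project in Notebook for algorithm development and other operations by double-clicking the project or clicking the Edit button. Download: Users can package the project and download it locally by clicking the Download button. Setting: Edit project information, such as the project's name, description, sharing level, and permissions. Delete: Delete all files included in the project. 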
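Add New Project​ Clicking the Add New Project button on the project page displays a guide page for creating a project; a new project can be created in just three steps. Step 1: Fill in the project name and project description in the Base Information step.  Visibility: Set the external visibility level of the project. Private: (Default) Set as a private project. None of the files in the project are made public, but the project's execution results can be individually made public in Notebook, so that others can view the project's visual report. Team: Set as a team project. Select the team name in the team selection box, and other members of the team can access the project according to the configured permissions. Public: Set as a public project. All users in Workbench can find and view this project through search. Permission: Set the external access permissions of the project. The permission settings appear only when the project's Visibility is set to Team or Public. Can View When the project's Visibility is set to Team, other members of the team can only view the files of this project. When the project's Visibility is set to Public, other members of the Workbench can only view the files of this project. Can Edit When the project's Visibility is set to Team, other members of the team can view and edit the files of this project. When the project's Visibility is set to Public, other members of the Workbench can view and edit the files of this project. Can Execute When the project's Visibility is set to Team, other members of the team can view, edit, and execute the files of this project. When the project's Visibility is set to Public, other members of the Workbench can view, edit, and execute the files of this project. Step 2: In the Initial Project step, Workbench provides four ways to initialize the project. Template: Workbench has built-in project templates for several development languages and algorithm frameworks. You can choose any template to initialize your project and run it directly in Notebook without any modification, which is especially suitable for newcomers who want to try it out quickly. Blank: Create a blank project; files can be added to the project manually in Notebook later. Upload: Initialize your project by uploading a file in notebook format; the notebook format is compatible with the Jupyter Notebook and Zeppelin Notebook file formats. Git Repo: Fork the file contents of a repository in your Github/Gitlab account to initialize the project. Step 3: Preview the files included in the project.  Save: Save the project to Workspace. Open In Notebook: Save the project to Workspace and open it with Notebook. "},{"title":"Release​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#release","content":"[TODO] "},{"title":"Training​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#training","content":"[TODO] "},{"title":"Team​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#team","content":"[TODO] "},{"title":"Shared​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#shared","content":"[TODO] "},{"title":"Interpreters​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#interpreters","content":"[TODO] 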
"},{"title":"Job​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#job","content":"[TODO] "},{"title":"Data​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#data","content":"[TODO] "},{"title":"Model​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#model","content":"[TODO] "},{"title":"Manager​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#manager","content":""},{"title":"User​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#user","content":"[TODO] "},{"title":"Team​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#team-1","content":"[TODO] "},{"title":"Data Dict​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#data-dict","content":"[TODO] "},{"title":"Department​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#department","content":"[TODO] "},{"title":"How to run workbench​","type":1,"pageTitle":"README.zh-CN","url":"docs/0.6.0/adminDocs/yarn/workbench/README.zh-CN#how-to-run-workbench","content":"How To Run Submarine Workbench Guide "},{"title":"Environment REST API","type":0,"sectionRef":"#","url":"docs/0.6.0/api/environment","content":"","keywords":""},{"title":"Create Environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/api/environment#create-environment","content":"POST /api/v1/environment Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : 
[&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, &quot;anaconda-client=1.7.2=py37_0&quot;, &quot;anaconda-navigator=1.9.12=py37_0&quot;], &quot;pipDependencies&quot; : [&quot;apache-submarine==0.5.0&quot;, &quot;pyarrow==0.17.0&quot;] } } ' http://127.0.0.1:32080/api/v1/environment  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;environmentId&quot;: &quot;environment_1586156073228_0001&quot;, &quot;environmentSpec&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, &quot;anaconda-client=1.7.2=py37_0&quot;, &quot;anaconda-navigator=1.9.12=py37_0&quot;], &quot;pipDependencies&quot; : [&quot;apache-submarine==0.5.0&quot;, &quot;pyarrow==0.17.0&quot;] } } } }  "},{"title":"List environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/api/environment#list-environment","content":"GET /api/v1/environment Example Request: curl -X GET http://127.0.0.1:32080/api/v1/environment  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: [ { &quot;environmentId&quot;: &quot;environment_1586156073228_0001&quot;, &quot;environmentSpec&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, 
&quot;anaconda-client=1.7.2=py37_0&quot;, &quot;anaconda-navigator=1.9.12=py37_0&quot;], &quot;pipDependencies&quot; : [&quot;apache-submarine==0.5.0&quot;, &quot;pyarrow==0.17.0&quot;] } } }, { &quot;environmentId&quot;: &quot;environment_1586156073228_0002&quot;, &quot;environmentSpec&quot;: { &quot;name&quot;: &quot;my-submarine-env-2&quot;, &quot;dockerImage&quot; : &quot;continuumio/miniconda&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_miniconda_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;], &quot;pipDependencies&quot; : [] } } } ] }  "},{"title":"Get environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/api/environment#get-environment","content":"GET /api/v1/environment/{name} Example Request: curl -X GET http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;environmentId&quot;: &quot;environment_1586156073228_0001&quot;, &quot;environmentSpec&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, &quot;anaconda-client=1.7.2=py37_0&quot;, &quot;anaconda-navigator=1.9.12=py37_0&quot;], &quot;pipDependencies&quot; : [&quot;apache-submarine==0.5.0&quot;, &quot;pyarrow==0.17.0&quot;] } } } }  "},{"title":"Patch environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/api/environment#patch-environment","content":"PATCH /api/v1/environment/{name} Example Request: curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: 
&quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7_updated&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;], &quot;pipDependencies&quot; : [] } } ' http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;success&quot;: true, &quot;result&quot;: { &quot;environmentId&quot;: &quot;environment_1586156073228_0001&quot;, &quot;environmentSpec&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7_updated&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;], &quot;pipDependencies&quot; : [] } } } }  dockerImage, &quot;name&quot; (of kernelSpec), &quot;channels&quot;, &quot;condaDependencies&quot;, &quot;pipDependencies&quot;, etc. can be updated using this API. Updating the &quot;name&quot; of EnvironmentSpec is not supported. 
"},{"title":"Delete environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/api/environment#delete-environment","content":"DELETE /api/v1/environment/{name} Example Request: curl -X DELETE http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;environmentId&quot;: &quot;environment_1586156073228_0001&quot;, &quot;environmentSpec&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7_updated&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;], &quot;pipDependencies&quot; : [] } } } }  "},{"title":"Experiment REST API","type":0,"sectionRef":"#","url":"docs/0.6.0/api/experiment","content":"","keywords":""},{"title":"Create Experiment (Using Anonymous/Embedded Environment)​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/api/experiment#create-experiment-using-anonymousembedded-environment","content":"POST /api/v1/experiment Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } 
} } ' http://127.0.0.1:32080/api/v1/experiment  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;experimentId&quot;: &quot;experiment_1586156073228_0001&quot;, &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;uid&quot;: &quot;28e39dcd-77d4-11ea-8dbb-0242ac110003&quot;, &quot;status&quot;: &quot;Accepted&quot;, &quot;acceptedTime&quot;: &quot;2020-06-13T22:59:29.000+08:00&quot;, &quot;spec&quot;: { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } } }  "},{"title":"Create Experiment (Using Pre-defined/Stored Environment)​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/api/experiment#create-experiment-using-pre-definedstored-environment","content":"POST /api/v1/experiment Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: 
&quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment  The above example assumes that the environment &quot;my-submarine-env&quot; already exists in Submarine. Please refer to the Environment API Reference doc for the Create/Update/Delete/List Environment REST APIs. Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;experimentId&quot;: &quot;experiment_1586156073228_0001&quot;, &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;uid&quot;: &quot;28e39dcd-77d4-11ea-8dbb-0242ac110003&quot;, &quot;status&quot;: &quot;Accepted&quot;, &quot;acceptedTime&quot;: &quot;2020-06-13T22:59:29.000+08:00&quot;, &quot;spec&quot;: { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } } }  "},{"title":"List experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/api/experiment#list-experiment","content":"GET /api/v1/experiment Example Request: curl -X GET http://127.0.0.1:32080/api/v1/experiment  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: [ { &quot;experimentId&quot;: &quot;experiment_1592057447228_0001&quot;, &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;uid&quot;: &quot;28e39dcd-77d4-11ea-8dbb-0242ac110003&quot;, &quot;status&quot;: 
&quot;Accepted&quot;, &quot;acceptedTime&quot;: &quot;2020-06-13T22:59:29.000+08:00&quot;, &quot;spec&quot;: { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } }, { &quot;experimentId&quot;: &quot;experiment_1592057447228_0002&quot;, &quot;name&quot;: &quot;mnist&quot;, &quot;uid&quot;: &quot;38e39dcd-77d4-11ea-8dbb-0242ac110003&quot;, &quot;status&quot;: &quot;Accepted&quot;, &quot;acceptedTime&quot;: &quot;2020-06-13T22:19:29.000+08:00&quot;, &quot;spec&quot;: { &quot;meta&quot;: { &quot;name&quot;: &quot;pytorch-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;PyTorch&quot;, &quot;cmd&quot;: &quot;python /var/mnist.py --backend gloo&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:pytorch-dist-mnist-1.0&quot; }, &quot;spec&quot;: { &quot;Master&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } } } } ] }  "},{"title":"Get experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/api/experiment#get-experiment","content":"GET /api/v1/experiment/{id} Example Request: curl -X GET http://127.0.0.1:32080/api/v1/experiment/experiment_1592057447228_0001  
Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;experimentId&quot;: &quot;experiment_1592057447228_0001&quot;, &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;uid&quot;: &quot;28e39dcd-77d4-11ea-8dbb-0242ac110003&quot;, &quot;status&quot;: &quot;Accepted&quot;, &quot;acceptedTime&quot;: &quot;2020-06-13T22:59:29.000+08:00&quot;, &quot;spec&quot;: { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } } }  "},{"title":"Patch experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/api/experiment#patch-experiment","content":"PATCH /api/v1/experiment/{id} Example Request: curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 2, 
&quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment/experiment_1592057447228_0001  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;success&quot;: true, &quot;result&quot;: { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 2, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } }  "},{"title":"Delete experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/api/experiment#delete-experiment","content":"DELETE /api/v1/experiment/{id} Example Request: curl -X DELETE http://127.0.0.1:32080/api/v1/experiment/experiment_1592057447228_0001  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;experimentId&quot;: &quot;experiment_1586156073228_0001&quot;, &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;uid&quot;: &quot;28e39dcd-77d4-11ea-8dbb-0242ac110003&quot;, &quot;status&quot;: &quot;Accepted&quot;, &quot;acceptedTime&quot;: &quot;2020-06-13T22:59:29.000+08:00&quot;, &quot;spec&quot;: { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, 
&quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 2, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } } }  "},{"title":"List experiment Log​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/api/experiment#list-experiment-log","content":"GET /api/v1/experiment/logs Example Request: curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;success&quot;: null, &quot;message&quot;: null, &quot;result&quot;: [ { &quot;experimentId&quot;: &quot;experiment_1589199154923_0001&quot;, &quot;logContent&quot;: [ { &quot;podName&quot;: &quot;mnist-worker-0&quot;, &quot;podLog&quot;: null } ] }, { &quot;experimentId&quot;: &quot;experiment_1589199154923_0002&quot;, &quot;logContent&quot;: [ { &quot;podName&quot;: &quot;pytorch-dist-mnist-gloo-master-0&quot;, &quot;podLog&quot;: null }, { &quot;podName&quot;: &quot;pytorch-dist-mnist-gloo-worker-0&quot;, &quot;podLog&quot;: null } ] } ], &quot;attributes&quot;: {} }  "},{"title":"Get experiment Log​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/api/experiment#get-experiment-log","content":"GET /api/v1/experiment/logs/{id} Example Request: curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs/experiment_1589199154923_0002  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;success&quot;: null, &quot;message&quot;: null, &quot;result&quot;: { &quot;experimentId&quot;: &quot;experiment_1589199154923_0002&quot;, &quot;logContent&quot;: [ { &quot;podName&quot;: &quot;pytorch-dist-mnist-gloo-master-0&quot;, &quot;podLog&quot;: &quot;Using distributed PyTorch with gloo backend\\nDownloading 
http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\\nDownloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\\nDownloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\\nDownloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\\nProcessing...\\nDone!\\nTrain Epoch: 1 [0/60000 (0%)]\\tloss=2.3000\\nTrain Epoch: 1 [640/60000 (1%)]\\tloss=2.2135\\nTrain Epoch: 1 [1280/60000 (2%)]\\tloss=2.1704\\nTrain Epoch: 1 [1920/60000 (3%)]\\tloss=2.0766\\nTrain Epoch: 1 [2560/60000 (4%)]\\tloss=1.8679\\nTrain Epoch: 1 [3200/60000 (5%)]\\tloss=1.4135\\nTrain Epoch: 1 [3840/60000 (6%)]\\tloss=1.0003\\nTrain Epoch: 1 [4480/60000 (7%)]\\tloss=0.7762\\nTrain Epoch: 1 [5120/60000 (9%)]\\tloss=0.4598\\nTrain Epoch: 1 [5760/60000 (10%)]\\tloss=0.4860\\nTrain Epoch: 1 [6400/60000 (11%)]\\tloss=0.4389\\nTrain Epoch: 1 [7040/60000 (12%)]\\tloss=0.4084\\nTrain Epoch: 1 [7680/60000 (13%)]\\tloss=0.4602\\nTrain Epoch: 1 [8320/60000 (14%)]\\tloss=0.4289\\nTrain Epoch: 1 [8960/60000 (15%)]\\tloss=0.3990\\nTrain Epoch: 1 [9600/60000 (16%)]\\tloss=0.3852\\n&quot; }, { &quot;podName&quot;: &quot;pytorch-dist-mnist-gloo-worker-0&quot;, &quot;podLog&quot;: &quot;Using distributed PyTorch with gloo backend\\nDownloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\\nDownloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\\nDownloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\\nDownloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\\nProcessing...\\nDone!\\nTrain Epoch: 1 [0/60000 (0%)]\\tloss=2.3000\\nTrain Epoch: 1 [640/60000 (1%)]\\tloss=2.2135\\nTrain Epoch: 1 [1280/60000 (2%)]\\tloss=2.1704\\nTrain Epoch: 1 [1920/60000 (3%)]\\tloss=2.0766\\nTrain Epoch: 1 [2560/60000 (4%)]\\tloss=1.8679\\nTrain Epoch: 1 [3200/60000 (5%)]\\tloss=1.4135\\nTrain Epoch: 1 [3840/60000 (6%)]\\tloss=1.0003\\nTrain Epoch: 1 [4480/60000 (7%)]\\tloss=0.7762\\nTrain Epoch: 1 [5120/60000 
(9%)]\\tloss=0.4598\\nTrain Epoch: 1 [5760/60000 (10%)]\\tloss=0.4860\\nTrain Epoch: 1 [6400/60000 (11%)]\\tloss=0.4389\\nTrain Epoch: 1 [7040/60000 (12%)]\\tloss=0.4084\\nTrain Epoch: 1 [7680/60000 (13%)]\\tloss=0.4602\\nTrain Epoch: 1 [8320/60000 (14%)]\\tloss=0.4289\\nTrain Epoch: 1 [8960/60000 (15%)]\\tloss=0.3990\\nTrain Epoch: 1 [9600/60000 (16%)]\\tloss=0.3852\\n&quot; } ] }, &quot;attributes&quot;: {} }  "},{"title":"Experiment Template REST API","type":0,"sectionRef":"#","url":"docs/0.6.0/api/experiment-template","content":"","keywords":""},{"title":"Create experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/api/experiment-template#create-experiment-template","content":"POST /api/v1/template Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-tf-mnist-template&quot;, &quot;author&quot;: &quot;author&quot;, &quot;description&quot;: &quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;: &quot;learning_rate&quot;, &quot;value&quot;: 0.1, &quot;required&quot;: true, &quot;description&quot;: &quot;This is learning_rate of training.&quot; }, { &quot;name&quot;: &quot;batch_size&quot;, &quot;value&quot;: 150, &quot;required&quot;: true, &quot;description&quot;: &quot;This is batch_size of training.&quot; }, { &quot;name&quot;: &quot;experiment_name&quot;, &quot;value&quot;: &quot;tf-mnist1&quot;, &quot;required&quot;: true, &quot;description&quot;: &quot;the name of experiment.&quot; } ], &quot;experimentSpec&quot;: { &quot;meta&quot;: { &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate={{learning_rate}} --batch_size={{batch_size}}&quot;, &quot;name&quot;: &quot;{{experiment_name}}&quot;, &quot;envVars&quot;: { &quot;ENV1&quot;: &quot;ENV1&quot; }, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;namespace&quot;: &quot;default&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { 
&quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; } } } ' http://127.0.0.1:32080/api/v1/template  "},{"title":"List experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/api/experiment-template#list-experiment-template","content":"GET /api/v1/template Example Request: curl -X GET http://127.0.0.1:32080/api/v1/template  "},{"title":"Get experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/api/experiment-template#get-experiment-template","content":"GET /api/v1/template/{name} Example Request: curl -X GET http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  "},{"title":"Patch template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/api/experiment-template#patch-template","content":"PATCH /api/v1/template/{name} curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-tf-mnist-template&quot;, &quot;author&quot;: &quot;author-new&quot;, &quot;description&quot;: &quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;: &quot;learning_rate&quot;, &quot;value&quot;: 0.1, &quot;required&quot;: true, &quot;description&quot;: &quot;This is learning_rate of training.&quot; }, { &quot;name&quot;: &quot;batch_size&quot;, &quot;value&quot;: 150, &quot;required&quot;: true, &quot;description&quot;: &quot;This is batch_size of training.&quot; }, { &quot;name&quot;: &quot;experiment_name&quot;, &quot;value&quot;: &quot;tf-mnist1&quot;, &quot;required&quot;: true, &quot;description&quot;: &quot;the name of experiment.&quot; } ], &quot;experimentSpec&quot;: { &quot;meta&quot;: { &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log 
--learning_rate={{learning_rate}} --batch_size={{batch_size}}&quot;, &quot;name&quot;: &quot;{{experiment_name}}&quot;, &quot;envVars&quot;: { &quot;ENV1&quot;: &quot;ENV1&quot; }, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;namespace&quot;: &quot;default&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; } } } ' http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  &quot;description&quot;, &quot;parameters&quot;, &quot;experimentSpec&quot;, &quot;author&quot;, etc. can be updated using this API. Updating the &quot;name&quot; of an experiment template is not supported. "},{"title":"Delete template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/api/experiment-template#delete-template","content":"DELETE /api/v1/template/{name} Example Request: curl -X DELETE http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  "},{"title":"Use template to create a experiment​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/api/experiment-template#use-template-to-create-a-experiment","content":"POST /api/v1/experiment/{template_name} Example Request: curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;tf-mnist&quot;, &quot;params&quot;: { &quot;learning_rate&quot;:&quot;0.01&quot;, &quot;batch_size&quot;:&quot;150&quot;, &quot;experiment_name&quot;:&quot;newexperiment1&quot; } } ' http://127.0.0.1:32080/api/v1/experiment/my-tf-mnist-template  "},{"title":"Apache Submarine Community","type":0,"sectionRef":"#","url":"docs/0.6.0/community/","content":"","keywords":""},{"title":"Communicating​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/0.6.0/community/#communicating","content":"You can reach out to 
the community members via any one of the following ways: Slack Developer: https://the-asf.slack.com/submarine-dev/ Slack User: https://the-asf.slack.com/submarine-user/ Zoom: https://cloudera.zoom.us/j/97264903288 Sync Up: https://docs.google.com/document/d/16pUO3TP4SxSeLduG817GhVAjtiph9HYpRHo_JgduDvw/edit "},{"title":"Your First Contribution​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/0.6.0/community/#your-first-contribution","content":"You can start by finding an existing issue with the https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE?filter=allopenissues label. These issues are well suited for new contributors. Once a PR (Pull Request) that you submit to the Submarine GitHub projects is approved and merged, you become a Submarine Contributor. If you want to work on a new idea of relatively small scope: Submit an issue describing your proposed change to the repo in question. The repo owners will respond to your issue promptly. Submit a pull request to Submarine containing a tested change. Contributions are welcomed and greatly appreciated. See CONTRIBUTING for details on submitting patches and the contribution workflow. "},{"title":"How Do I Become a Committer?​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/0.6.0/community/#how-do-i-become-a-committer","content":"First of all, you need to get involved and be a Contributor. Per Apache code, PMCs vote on committership; based on your track record as a contributor, the PMC may invite you to be a committer (after we've called a vote). When that happens, if you accept, the following process kicks into place... Note that becoming a committer is not just about submitting some patches; it's also about helping out on the developer and user Slack channels, helping with documentation and the issues. 
"},{"title":"How to commit​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/0.6.0/community/#how-to-commit","content":"See How to commit for a helper doc for Submarine committers. "},{"title":"Communication​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/0.6.0/community/#communication","content":"Communication within the Submarine community abides by Apache’s Code of Conduct. "},{"title":"Mailing lists​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/0.6.0/community/#mailing-lists","content":"Get help using Apache Submarine or contribute to the project on our mailing lists: Users : subscribe, unsubscribe, archives for usage questions, help, and announcements. Dev : subscribe, unsubscribe, archives for people wanting to contribute to the project. Commits : subscribe, unsubscribe, archives for commit messages and patches. "},{"title":"License​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/0.6.0/community/#license","content":"Submarine source code is under the Apache 2.0 license. See the LICENSE file for details. "},{"title":"How To Contribute to Submarine","type":0,"sectionRef":"#","url":"docs/0.6.0/community/contributing","content":"","keywords":""},{"title":"Preface​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#preface","content":"Apache Submarine is Apache 2.0 licensed software. Contributing to Submarine means you agree to the Apache 2.0 License. Please read the Code of Conduct carefully. The document How It Works can help you understand the Apache Software Foundation further. "},{"title":"Build Submarine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#build-submarine","content":"Build From Code "},{"title":"Creating patches​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#creating-patches","content":"Submarine follows the Fork &amp; Pull model. 
"},{"title":"Step1: Fork apache/submarine github repository (first time)​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step1-fork-apachesubmarine-github-repository-first-time","content":"Visit https://github.com/apache/submarine, then click the Fork button to create a fork of the repository. "},{"title":"Step2: Clone the Submarine to your local machine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step2-clone-the-submarine-to-your-local-machine","content":"# USERNAME – your Github user account name. git clone git@github.com:${USERNAME}/submarine.git # or: git clone https://github.com/${USERNAME}/submarine.git cd submarine # set upstream git remote add upstream git@github.com:apache/submarine.git # or: git remote add upstream https://github.com/apache/submarine.git # Don't push to the upstream master. git remote set-url --push upstream no_push # Check upstream/origin: # origin git@github.com:${USERNAME}/submarine.git (fetch) # origin git@github.com:${USERNAME}/submarine.git (push) # upstream git@github.com:apache/submarine.git (fetch) # upstream no_push (push) git remote -v  "},{"title":"Step3: Create a new Jira in Submarine project​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step3-create-a-new-jira-in-submarine-project","content":"New contributors need permission to create Jira issues. Please email kaihsun@apache.org with your Jira username; the email title should be &quot;[New Submarine Contributor]&quot;. Check the Jira issue tracker for existing issues. Create a new Jira issue in the Submarine project. When the issue is created, a Jira number (e.g. SUBMARINE-748) will be assigned to the issue automatically. 
"},{"title":"Step4: Create a local branch for your contribution​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step4-create-a-local-branch-for-your-contribution","content":"cd submarine # Make your local master up-to-date git checkout master git fetch upstream git rebase upstream/master # Create a new branch for issue SUBMARINE-${jira_number} git checkout -b SUBMARINE-${jira_number} # Example: git checkout -b SUBMARINE-748  "},{"title":"Step5: Develop & Create commits​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step5-develop--create-commits","content":"You can edit the code on the SUBMARINE-${jira_number} branch. (Coding Style: Code Convention) Create commits: git add ${edited files} git commit -m &quot;SUBMARINE-${jira_number}. ${Commit Message}&quot; # Example: git commit -m &quot;SUBMARINE-748. Update Contributing guide&quot;  "},{"title":"Step6: Syncing your local branch with upstream/master​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step6-syncing-your-local-branch-with-upstreammaster","content":"# On SUBMARINE-${jira_number} branch git fetch upstream git rebase upstream/master  Please do not use git pull to synchronize your local branch, because git pull creates merge commits that make the commit history messy. 
"},{"title":"Step7: Push your local branch to your personal fork​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step7-push-your-local-branch-to-your-personal-fork","content":"git push origin SUBMARINE-${jira_number}  "},{"title":"Step8: Check GitHub Actions status of your personal commit​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step8-check-github-actions-status-of-your-personal-commit","content":"Visit https://github.com/${USERNAME}/submarine/actions and make sure your new commits pass all workflows before creating a pull request.  "},{"title":"Step9: Create a pull request on github UI​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step9-create-a-pull-request-on-github-ui","content":"Visit your fork at https://github.com/${USERNAME}/submarine.git, then click the Compare &amp; Pull Request button to create a pull request. Pull Request template​ Pull request template: Filling in the template thoroughly can speed up the review process. Example:   "},{"title":"Step10: Check GitHub Actions status of your pull request in apache/submarine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step10-check-github-actions-status-of-your-pull-request-in-apachesubmarine","content":"Visit https://github.com/apache/submarine/actions and make sure your pull request passes all workflows.  "},{"title":"Step11: The Review Process​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step11-the-review-process","content":"Anyone can be a reviewer and comment on the pull requests. A reviewer can indicate that a patch looks suitable for merging with a comment such as: &quot;Looks good&quot;, &quot;LGTM&quot;, &quot;+1&quot;. (PS: LGTM = Looks Good To Me) At least one indication of suitability (e.g. 
&quot;LGTM&quot;) from a committer is required before a pull request can be merged. A committer can then initiate lazy consensus (&quot;Merge if there is no more discussion&quot;), after which the code can be merged once a set period (usually 24 hours) has passed with no further reviews. Contributors can ping reviewers (including committers) by commenting 'Ready to review'. "},{"title":"Step12: Address review comments​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#step12-address-review-comments","content":"Push new commits to the SUBMARINE-${jira_number} branch. The pull request will update automatically. After you address all review comments, committers will merge the pull request. "},{"title":"Code convention​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/0.6.0/community/contributing#code-convention","content":"We follow the Google code style: Java style, Shell style. There are plugins to format and lint your code in an IDE (use dev-support/maven-config/checkstyle.xml as the rules): Checkstyle plugin for IntelliJ (Setting Guide), Checkstyle plugin for Eclipse (Setting Guide). "},{"title":"Guide for Apache Submarine Committers","type":0,"sectionRef":"#","url":"docs/0.6.0/community/HowToCommit","content":"","keywords":""},{"title":"New committers​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/0.6.0/community/HowToCommit#new-committers","content":"New committers are encouraged to first read Apache's generic committer documentation: Apache New Committer Guide, Apache Committer FAQ. The first act of a new core committer is typically to add their name to the credits page. This requires changing the site source in https://github.com/apache/submarine-site/blob/master/community/member.md. Once done, update the Submarine website as described here (TL;DR: don't forget to regenerate the site with hugo, and commit the generated results, too). 
"},{"title":"Review​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/0.6.0/community/HowToCommit#review","content":"Submarine committers should, as often as possible, attempt to review patches submitted by others. Ideally every submitted patch will get reviewed by a committer within a few days. If a committer reviews a patch they've not authored, and believe it to be of sufficient quality, then they can commit the patch, otherwise the patch should be cancelled with a clear explanation for why it was rejected. The list of submitted patches can be found in the GitHubPull Requests page. Committers should scan the list from top-to-bottom, looking for patches that they feel qualified to review and possibly commit. For non-trivial changes, it is best to get another committer to review &amp; approve your own patches before commit. "},{"title":"Reject​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/0.6.0/community/HowToCommit#reject","content":"Patches should be rejected which do not adhere to the guidelines inContribution Guidelines. Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. If a committer wishes to improve an unacceptable patch, then it should first be rejected, and a new patch should be attached by the committer for review. "},{"title":"Commit individual patches​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/0.6.0/community/HowToCommit#commit-individual-patches","content":"Submarine uses git for source code version control. The writable repo is at -https://gitbox.apache.org/repos/asf/submarine.git It is strongly recommended to use the cicd script to merge the PRs. 
See the instructions at https://github.com/apache/submarine/tree/master/dev-support/cicd "},{"title":"Adding Contributors role​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/0.6.0/community/HowToCommit#adding-contributors-role","content":"There are three roles (Administrators, Committers, Contributors) in the project. Users with the Contributors role can be assigned issues in the project. Committers with the Committers role can additionally set arbitrary roles. Committers with the Administrators role can additionally edit or delete all comments, and even delete issues. How to set roles: 1) Log in to ASF JIRA. 2) Go to the project page (e.g. https://issues.apache.org/jira/browse/SUBMARINE ). 3) Click the &quot;Administration&quot; tab. 4) Click the &quot;Roles&quot; tab on the left side. 5) Add the Administrators/Committers/Contributors role. "},{"title":"Notebook REST API","type":0,"sectionRef":"#","url":"docs/0.6.0/api/notebook","content":"","keywords":""},{"title":"Create a notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/api/notebook#create-a-notebook-instance","content":"POST /api/v1/notebook Example Request: curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;test-nb&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;ownerId&quot;: &quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;: { &quot;name&quot;: &quot;notebook-env&quot; }, &quot;spec&quot;: { &quot;envVars&quot;: { &quot;TEST_ENV&quot;: &quot;test&quot; }, &quot;resources&quot;: &quot;cpu=1,memory=1.0Gi&quot; } } ' http://127.0.0.1:32080/api/v1/notebook  Example Response: { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1597931805405_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, 
&quot;uid&quot;:&quot;5a94c01d-6a92-4222-bc66-c610c277546d&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/&quot;, &quot;status&quot;:&quot;creating&quot;, &quot;reason&quot;:&quot;The notebook instance is creating&quot;, &quot;createdTime&quot;:&quot;2020-08-20T21:58:27.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.5.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;: &quot;team_default_python_3.7&quot;, &quot;channels&quot;: [ &quot;defaults&quot; ], &quot;dependencies&quot;: [ &quot;&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu=1,memory=1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"List notebook instances which belong to user​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/api/notebook#list-notebook-instances-which-belong-to-user","content":"GET /api/v1/notebook Example Request: curl -X GET http://127.0.0.1:32080/api/v1/notebook?id={user_id}  Example Response: { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;List all notebook instances&quot;, &quot;result&quot;:[ { &quot;notebookId&quot;:&quot;notebook_1597931805405_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;5a94c01d-6a92-4222-bc66-c610c277546d&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/&quot;, &quot;status&quot;: &quot;running&quot;, &quot;reason&quot;: &quot;The notebook instance is running&quot;, &quot;createdTime&quot;:&quot;2020-08-20T21:58:27.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ 
&quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.5.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;: &quot;team_default_python_3.7&quot;, &quot;channels&quot;: [ &quot;defaults&quot; ], &quot;dependencies&quot;: [ &quot;&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu=1,memory=1.0Gi&quot; } } } ], &quot;attributes&quot;:{} }  "},{"title":"Get the notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/api/notebook#get-the-notebook-instance","content":"GET /api/v1/notebook/{id} Example Request: curl -X GET http://127.0.0.1:32080/api/v1/notebook/{id}  Example Response: { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Get the notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1597931805405_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;5a94c01d-6a92-4222-bc66-c610c277546d&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/&quot;, &quot;status&quot;:&quot;running&quot;, &quot;reason&quot;:&quot;The notebook instance is running&quot;, &quot;createdTime&quot;:&quot;2020-08-20T21:58:27.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.5.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;: &quot;team_default_python_3.7&quot;, 
&quot;channels&quot;: [ &quot;defaults&quot; ], &quot;dependencies&quot;: [ &quot;&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu=1,memory=1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"Delete the notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/api/notebook#delete-the-notebook-instance","content":"DELETE /api/v1/notebook/{id} Example Request: curl -X DELETE http://127.0.0.1:32080/api/v1/notebook/{id}  Example Response: { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;success&quot;: true, &quot;message&quot;: &quot;Delete the notebook instance&quot;, &quot;result&quot;: { &quot;notebookId&quot;: &quot;notebook_1597931805405_0001&quot;, &quot;name&quot;: &quot;test-nb&quot;, &quot;uid&quot;: &quot;5a94c01d-6a92-4222-bc66-c610c277546d&quot;, &quot;url&quot;: &quot;/notebook/default/test-nb/&quot;, &quot;status&quot;: &quot;terminating&quot;, &quot;reason&quot;: &quot;The notebook instance is terminating&quot;, &quot;createdTime&quot;: &quot;2020-08-22T14:03:19.000+08:00&quot;, &quot;deletedTime&quot;: &quot;2020-08-22T14:46:28+0800&quot;, &quot;spec&quot;: { &quot;meta&quot;: { &quot;name&quot;: &quot;test-nb&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;: { &quot;name&quot;: &quot;notebook-env&quot;, &quot;dockerImage&quot;: &quot;apache/submarine:jupyter-notebook-0.5.0&quot;, &quot;kernelSpec&quot;: { &quot;name&quot;: &quot;team_default_python_3.7&quot;, &quot;channels&quot;: [ &quot;defaults&quot; ], &quot;dependencies&quot;: [ &quot;&quot; ] }, &quot;description&quot;: null, &quot;image&quot;: null }, &quot;spec&quot;: { &quot;envVars&quot;: { &quot;TEST_ENV&quot;: &quot;test&quot; }, &quot;resources&quot;: &quot;cpu=1,memory=1.0Gi&quot; } } }, &quot;attributes&quot;: {} }  
"},{"title":"Environments Implementation","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/environments-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Environments Implementation","url":"docs/0.6.0/designDocs/environments-implementation#overview","content":"Environment profiles (or environment for short) defines a set of libraries and when Docker is being used, a Docker image in order to run an experiment or a notebook. Docker and/or VM-image (such as, VirtualBox/VMWare images, Amazon Machine Images - AMI, Or custom image of Azure VM) defines the base layer of the environment. Please note that VM-image is different from VM instance type, On top of that, users can define a set of libraries (such as Python/R) to install, we call it kernel. Example of Environment  +-------------------+ |+-----------------+| || Python=3.7 || || Tensorflow=2.0 || |+---Exp Dependency+| |+-----------------+| ||OS=Ubuntu16.04 || ||CUDA=10.2 || ||GPU_Driver=375.. || |+---Base Library--+| +-------------------+  As you can see, There're base libraries, such as what OS, CUDA version, GPU driver, etc. They can be achieved by specifying a VM-image / Docker image. On top of that, user can bring their dependencies, such as different version of Python, Tensorflow, Pandas, etc. How users use environment? Users can save different environment configs which can be also shared across the platform. Environment profiles can be used to run a notebook (e.g. by choosing different kernel from Jupyter), or an experiment. Predefined experiment library includes what environment to use so users don't have to choose which environment to use.  +-------------------+ |+-----------------+| +------------+ || Python=3.7 || |User1 | || Tensorflow=2.0 || +------------+ |+---Kernel -------+| +------------+ |+-----------------+|&lt;----+ |User2 | ||OS=Ubuntu16.04 || + +------------+ ||CUDA=10.2 || | +------------+ ||GPU_Driver=375.. 
|| | |User3 | |+---Base Library--+| | +------------+ +-----Default-Env---+ | | | +-------------------+ | |+-----------------+| | || Python=3.3 || | || Tensorflow=2.0 || | |+---kernel--------+| | |+-----------------+| | ||OS=Ubuntu16.04 || | ||CUDA=10.3 ||&lt;----+ ||GPU_Driver=375.. || |+---Base Library--+| +-----My-Customized-+  There are two environments in the above diagram, &quot;Default-Env&quot; and &quot;My-Customized&quot;, which can have different combinations of libraries for different experiments/notebooks. Users can choose different environments for different experiments as they want. Environments can be added/listed/deleted/selected through CLI/SDK/UI. Implementation "},{"title":"Environment API definition​","type":1,"pageTitle":"Environments Implementation","url":"docs/0.6.0/designDocs/environments-implementation#environment-api-definition","content":"Let's look at the object definition of an environment. The environment API looks like:  name: &quot;my_submarine_env&quot;, vm-image: &quot;...&quot;, docker-image: &quot;...&quot;, kernel: &lt;object of kernel&gt; description: &quot;this is the most common env used by team ABC&quot;  vm-image is optional if we don't need to launch a new VM (like running a training job on a cloud-remote machine). docker-image is required. kernel could be optional if the kernel is already included in the vm-image or docker-image. The name of the environment should be unique in the system, so users can reference it when creating a new experiment/notebook. "},{"title":"VM-image and Docker-image​","type":1,"pageTitle":"Environments Implementation","url":"docs/0.6.0/designDocs/environments-implementation#vm-image-and-docker-image","content":"Docker images and VM images should be prepared by system admins / SREs, because it is hard for data scientists to write an error-proof Dockerfile and to push/manage Docker images. 
This is one of the reasons we hide the Docker image inside the &quot;environment&quot;: we encourage users to customize their kernels if needed, without having to touch a Dockerfile or build/push/manage new Docker images. As a project, we will document best practices and example Dockerfiles. The Dockerfile should include a proper ENTRYPOINT definition that points to our default script, so that whether it is a notebook or an experiment, we will set up the kernel (see below) and other environment variables properly. "},{"title":"Kernel Implementation​","type":1,"pageTitle":"Environments Implementation","url":"docs/0.6.0/designDocs/environments-implementation#kernel-implementation","content":"After investigating different alternatives (such as pipenv, venv, etc.), we decided to use Conda environments, which nicely replace Python virtual envs and pip, and can also support other languages. More details can be found at: https://medium.com/@krishnaregmi/pipenv-vs-virtualenv-vs-conda-environment-3dde3f6869ed With Conda, users can easily add or remove dependencies of a Conda environment. Users can also easily export an environment to a yaml file. 
The yaml file of a Conda environment produced by conda env export looks like: name: base channels: - defaults dependencies: - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0 - alabaster=0.7.12=py37_0 - anaconda=2020.02=py37_0 - anaconda-client=1.7.2=py37_0 - anaconda-navigator=1.9.12=py37_0 - anaconda-project=0.8.4=py_0 - applaunchservices=0.2.1=py_0  Including Conda kernel, the environment object may look like: name: &quot;my_submarine_env&quot;, vm-image: &quot;...&quot;, docker-image: &quot;...&quot;, kernel: name: team_default_python_3.7 channels: - defaults dependencies: - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0 - alabaster=0.7.12=py37_0 - anaconda=2020.02=py37_0 - anaconda-client=1.7.2=py37_0 - anaconda-navigator=1.9.12=py37_0  When launching a new experiment / notebook session using my_submarine_env, Submarine server will use the defined Docker image and Conda kernel to launch the container. "},{"title":"Storage of Environment​","type":1,"pageTitle":"Environments Implementation","url":"docs/0.6.0/designDocs/environments-implementation#storage-of-environment","content":"A Submarine environment is just a simple text file, so it will be persisted in the Submarine metastore, which is ideally a database. Docker images are stored in a regular Docker registry, which is handled outside of the system. Conda dependencies are stored in Conda channels (where the referenced packages are stored), which are handled/set up separately. (Popular Conda channels are defaults and conda-forge.) For more detailed discussion about storage-related implementations, please refer to storage-implementation. "},{"title":"How to implement to make user can easily use Submarine environments?​","type":1,"pageTitle":"Environments Implementation","url":"docs/0.6.0/designDocs/environments-implementation#how-to-implement-to-make-user-can-easily-use-submarine-environments","content":"We like simplicity, and we don't want to leak implementation complexity to users. 
To make that happen, we have to do some work to hide the complexity. There are two primary uses of environments: experiments and notebooks. For both of them, users should not have to explicitly call conda activate $env_name to activate environments. To achieve this, we can include the following parts in the Dockerfile: FROM ubuntu:18.04 &lt;Include whatever base-libraries like CUDA, etc.&gt; &lt;Make sure conda (with our preferred version) is installed&gt; &lt;Make sure Jupyter (with our preferred version) is installed&gt; # This is just a sample of Dockerfile, users can do more customizations if needed ENTRYPOINT [&quot;/submarine-bootstrap.sh&quot;]  When Submarine Server launches an experiment or notebook (this is an implementation detail of Submarine Server; users will not see it at all), it will invoke the following docker run command (or an equivalent, such as a K8s spec): docker run &lt;submarine_docker_image&gt; --kernel &lt;kernel_name&gt; -- .... python train.py --batch_size 5 (and other parameters)  Similarly, to launch a notebook: docker run &lt;submarine_docker_image&gt; --kernel &lt;kernel_name&gt; -- .... jupyter  submarine-bootstrap.sh is part of the Submarine repo and handles the --kernel argument, invoking conda activate $kernel_name before anything else (such as running the training job). "},{"title":"Architecture and Requirment","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/architecture-and-requirements","content":"","keywords":""},{"title":"Terminology​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#terminology","content":"Term\tDescription User\tA single data scientist/data engineer. A user has a resource quota and credentials. Team\tA user belongs to one or more teams; teams have ACLs for sharing artifacts such as notebook content, models, etc. Admin\tAlso called SRE; manages users' quotas, credentials, teams, and other components. 
"},{"title":"Background​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#background","content":"Everybody talks about machine learning today, and lots of companies are trying to leverage machine learning to push the business to the next level. Nowadays, as more and more developers, infrastructure software companies coming to this field, machine learning becomes more and more achievable. In the last decade, the software industry has built many open source tools for machine learning to solve the pain points: It was not easy to build machine learning algorithms manually, such as logistic regression, GBDT, and many other algorithms:Answer to that: Industries have open sourced many algorithm libraries, tools, and even pre-trained models so that data scientists can directly reuse these building blocks to hook up to their data without knowing intricate details inside these algorithms and models. It was not easy to achieve &quot;WYSIWYG, what you see is what you get&quot; from IDEs: not easy to get output, visualization, troubleshooting experiences at the same place.Answer to that: Notebooks concept was added to this picture, notebook brought the experiences of interactive coding, sharing, visualization, debugging under the same user interface. There're popular open-source notebooks like Apache Zeppelin/Jupyter. It was not easy to manage dependencies: ML applications can run on one machine is hard to deploy on another machine because it has lots of libraries dependencies.Answer to that: Containerization becomes popular and a standard to packaging dependencies to make it easier to &quot;build once, run anywhere&quot;. Fragmented tools, libraries were hard for ML engineers to learn. Experiences learned in one company are not naturally migratable to another company.Answer to that: A few dominant open-source frameworks reduced the overhead of learning too many different frameworks, concepts. 
Data scientists who learn a few libraries such as Tensorflow/PyTorch and a few high-level wrappers like Keras will be able to create machine learning applications from other open-source building blocks. Similarly, models built by one library (such as libsvm) were hard to integrate into a machine learning pipeline since there was no standard format. Answer to that: The industry has built successful open-source standard machine learning frameworks such as Tensorflow/PyTorch/Keras, so their formats can be easily shared. There are also efforts to build an even more general model format such as ONNX. It was hard to build a data pipeline that flows/transforms data from a raw data source to whatever is required by ML applications. Answer to that: The open source big data industry plays an important role in providing, simplifying, and unifying processes and building blocks for data flows, transformations, etc. The machine learning industry is on the right track to solve major roadblocks. So what are the pain points now for companies which have machine learning needs? How can we help here? To answer this question, let's look at the machine learning workflow first. "},{"title":"Machine Learning Workflows & Pain points​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#machine-learning-workflows--pain-points","content":"1) From different data sources such as edge, clickstream, logs, etc. =&gt; Land to data lakes 2) From data lake, data transformation: =&gt; Data transformations: Cleanup, remove invalid rows/columns, select columns, sampling, split train/test data-set, join table, etc. =&gt; Data prepared for training. 3) From prepared data: =&gt; Training, model hyper-parameter tuning, cross-validation, etc. =&gt; Models saved to storage. 4) From saved models: =&gt; Model assurance, deployment, A/B testing, etc. =&gt; Model deployed for online serving or offline scoring.  
Typically, data scientists are responsible for items 2)-4), while 1) is typically handled by a different team (called the Data Engineering team in many companies; some Data Engineering teams are also responsible for part of the data transformation). "},{"title":"Pain #1 Complex workflow/steps from raw data to model, different tools needed by different steps, hard to make changes to workflow, and not error-proof​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#pain-1-complex-workflowsteps-from-raw-data-to-model-different-tools-needed-by-different-steps-hard-to-make-changes-to-workflow-and-not-error-proof","content":"The workflow from raw data to usable models is complex. After talking to many different data scientists, we have learned that a typical procedure to train a new model and push it to production can take from months to 1-2 years. This workflow also requires a wide skill set. For example, data transformation needs tools like Spark/Hive at large scale and tools like Pandas at small scale. Model training needs to switch between XGBoost, Tensorflow, Keras, and PyTorch. Building a data pipeline requires Apache Airflow or Oozie. Yes, there are great, standardized open-source tools built for many of these purposes. But what about changes that need to be made to a particular part of the data pipeline? What about adding a few columns to the training data for experiments? What about training models and pushing them to validation and A/B testing before rolling out to production? All these steps require jumping between different tools and UIs, changes are very hard to make, and the procedures are not error-proof. 
"},{"title":"Pain #2 Dependencies of underlying resource management platform​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#pain-2-dependencies-of-underlying-resource-management-platform","content":"To make jobs/services required by a machine learning platform to be able to run, we need an underlying resource management platform. There're some choices of resource management platform, and they have distinct advantages and disadvantages. For example, there're many machine learning platform built on top of K8s. It is relatively easy to get a K8s from a cloud vendor, easy to orchestrate machine learning required services/daemons run on K8s. However, K8s doesn't offer good support jobs like Spark/Flink/Hive. So if your company has Spark/Flink/Hive running on YARN, there're gaps and a significant amount of work to move required jobs from YARN to K8s. Maintaining a separate K8s cluster is also overhead to Hadoop-based data infrastructure. Similarly, if your company's data pipelines are mostly built on top of cloud resources and SaaS offerings, asking you to install a separate YARN cluster to run a new machine learning platform doesn't make a lot of sense. "},{"title":"Pain #3 Data scientist are forced to interact with lower-level platform components​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#pain-3-data-scientist-are-forced-to-interact-with-lower-level-platform-components","content":"In addition to the above pain, we do see Data Scientists are forced to learn underlying platform knowledge to be able to build a real-world machine learning workflow. For most of the data scientists we talked with, they're experts of ML algorithms/libraries, feature engineering, etc. They're also most familiar with Python, R, and some of them understand Spark, Hive, etc. 
If they're asked to interact with lower-level components, such as fine-tuning a Spark job's performance, troubleshooting a job that failed to launch because of resource constraints, or writing a K8s/YARN job spec with properly mounted volumes and network settings, they will scratch their heads and typically cannot perform these operations efficiently. "},{"title":"Pain #4 Comply with data security/governance requirements​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#pain-4-comply-with-data-securitygovernance-requirements","content":"TODO: Add more details. "},{"title":"Pain #5 No good way to reduce routine ML code development​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#pain-5-no-good-way-to-reduce-routine-ml-code-development","content":"After the data is prepared, the data scientist needs to do several routine tasks to build the ML pipeline. To get a sense of the existing data set, they usually need to split the data set and compute its statistics. These tasks share a lot of duplicated code, which reduces the efficiency of data scientists. An abstraction layer/framework that helps developers speed up ML pipeline development could be valuable. It would be better if developers only needed to fill in callback functions to focus on their key logic. Submarine "},{"title":"Overview​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#overview","content":""},{"title":"A little bit history​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#a-little-bit-history","content":"Initially, Submarine was built to solve the problems of running deep learning jobs like Tensorflow/PyTorch on Apache Hadoop YARN, allowing admins to monitor launched deep learning jobs and manage generated models. 
It was part of YARN initially, and the code resided under hadoop-yarn-applications. Later, the community decided to convert it into a subproject within Hadoop (a sibling project of YARN, HDFS, etc.) because we wanted to support other resource management platforms like K8s. Finally, we reconsidered Submarine's charter, and the Hadoop community voted that it was time to move Submarine to a separate Apache TLP. "},{"title":"Why Submarine?​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#why-submarine","content":"ONE PLATFORM Submarine is the ONE PLATFORM that allows data scientists to create end-to-end machine learning workflows. ONE PLATFORM means it supports data scientists and data engineers in finishing their jobs on the same platform without frequently switching their toolsets. From dataset exploration and data pipeline creation to model training, tuning, and pushing models to production, all these steps can be completed within the ONE PLATFORM. Resource Management Independent It is also designed to be resource management independent: no matter if you have Apache Hadoop YARN, K8s, or just a container service, you will be able to run Submarine on top of it. "},{"title":"Requirements and non-requirements​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#requirements-and-non-requirements","content":""},{"title":"Notebook​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#notebook","content":"1) Users should be able to create, edit, and delete a notebook. (P0) 2) Notebooks can be persisted to storage and can be recovered if a failure happens. (P0) 3) Users can trace back to historical versions of a notebook. (P1) 4) Notebooks can be shared with different users. 
(P1) 5) Users can define a list of parameters of a notebook (similar to parameters of the notebook's main function) to allow executing a notebook like a job. (P1) 6) Different users can collaborate on the same notebook at the same time. (P2) A running notebook instance is called a notebook session (or session for short). "},{"title":"Experiment​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#experiment","content":"An experiment in Submarine is an offline task. It could be a shell command, a Python command, a Spark job, a SQL query, or even a workflow. The primary purpose of experiments in Submarine's context is to run training tasks, offline scoring, etc. However, experiments can be generalized to do other tasks as well. Major requirements of experiments: 1) Experiments can be submitted from UI/CLI/SDK. 2) Experiments can be monitored/managed from UI/CLI/SDK. 3) Experiments should not be bound to one resource management platform (K8s/YARN). Type of experiments​  There are two types of experiments: Adhoc experiments: these include a Python/R/notebook, or even an adhoc Tensorflow/PyTorch task, etc. Predefined experiment library: these are specialized experiments, which include developed libraries such as CTR, BERT, etc. Users are only required to specify a few parameters such as input, output, hyper parameters, etc., instead of worrying about where the training script/dependencies are located. Adhoc experiment​ Requirements: Allow running adhoc scripts. Allow model engineers and data scientists to run Tensorflow/PyTorch programs on YARN/K8s/Container-cloud. Allow jobs to easily access data/models in HDFS/S3, etc. Support running distributed Tensorflow/PyTorch jobs with simple configs. Support running user-specified Docker images. Support specifying GPU and other resources. 
Predefined experiment library​ Here's an example of a predefined experiment library used to train a deepfm model: { &quot;input&quot;: { &quot;train_data&quot;: [&quot;hdfs:///user/submarine/data/tr.libsvm&quot;], &quot;valid_data&quot;: [&quot;hdfs:///user/submarine/data/va.libsvm&quot;], &quot;test_data&quot;: [&quot;hdfs:///user/submarine/data/te.libsvm&quot;], &quot;type&quot;: &quot;libsvm&quot; }, &quot;output&quot;: { &quot;save_model_dir&quot;: &quot;hdfs:///user/submarine/deepfm&quot;, &quot;metric&quot;: &quot;auc&quot; }, &quot;training&quot;: { &quot;batch_size&quot; : 512, &quot;field_size&quot;: 39, &quot;num_epochs&quot;: 3, &quot;feature_size&quot;: 117581, ... } }  Predefined experiment libraries can be shared across users on the same platform. Users can also add new or modified predefined experiment libraries via UI/REST API. We will also model AutoML and automatic hyper-parameter tuning as predefined experiment libraries. Pipeline​ Pipeline is a special kind of experiment: A pipeline is a DAG of experiments and can also be treated as a special kind of experiment. Users can submit/terminate a pipeline. Pipelines can be created/submitted via UI/API. "},{"title":"Environment Profiles​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#environment-profiles","content":"An environment profile (or environment for short) defines a set of libraries and, when Docker is being used, a Docker image needed to run an experiment or a notebook. A Docker or VM image (such as an AMI: Amazon Machine Image) defines the base layer of the environment. On top of that, users can define a set of libraries (such as Python/R) to install. Users can save different environment configs, which can also be shared across the platform. Environment profiles can be used to run a notebook (e.g. by choosing a different kernel from Jupyter) or an experiment. 
A predefined experiment library specifies which environment to use, so users don't have to choose one. Environments can be added/listed/deleted/selected through CLI/SDK. "},{"title":"Model​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#model","content":"Model management​ Model artifacts are generated by experiments or notebooks. A model consists of artifacts from one or multiple files. Users can choose to save, tag, and version a produced model. Once the model is saved, users can do online model serving or offline scoring of the model. Model serving​ After a model is saved, users can specify a serving script and a model, and create a web service to serve the model. We call the web service an &quot;endpoint&quot;. Users can manage (add/stop) model serving endpoints via CLI/API/UI. "},{"title":"Metrics for training job and model​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#metrics-for-training-job-and-model","content":"Submarine-SDK provides tracking/metrics APIs, which allow developers to add tracking/metrics and view them from the Submarine Workbench UI. "},{"title":"Deployment​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#deployment","content":"Submarine Services (see the architecture overview below) should be easy to deploy on-prem / on-cloud. Since there are more and more public cloud offerings for compute/storage management, we need to support deploying Submarine compute-related workloads (such as notebook sessions, experiments, etc.) to cloud-managed clusters. This also means Submarine may need to take input parameters from customers and create/manage clusters if needed. It is also a common requirement to use a hybrid of on-prem/on-cloud clusters. 
"},{"title":"Security / Access Control / User Management / Quota Management​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#security--access-control--user-management--quota-management","content":"There're 4 kinds of objects need access-control: Assets belong to Submarine system, which includes notebook, experiments and results, models, predefined experiment libraries, environment profiles.Data security. (Who owns what data, and what data can be accessed by each users). User credentials. (Such as LDAP).Other security, such as Git repo access, etc. For the data security / user credentials / other security, it will be delegated to 3rd libraries such as Apache Ranger, IAM roles, etc. Assets belong to Submarine system will be handled by Submarine itself. Here're operations which Submarine admin can do for users / teams which can be used to access Submarine's assets. Operations for admins Admin uses &quot;User Management System&quot; to onboard new users, upload user credentials, assign resource quotas, etc. Admins can create new users, new teams, update user/team mappings. Or remove users/teams. Admin can set resource quotas (if different from system default), permissions, upload/update necessary credentials (like Kerberos keytab) of a user.A DE/DS can also be an admin if the DE/DS has admin access. (Like a privileged user). This will be useful when a cluster is exclusively shared by a user or only shared by a small team.Resource Quota Management System helps admin to manage resources quotas of teams, organizations. Resources can be machine resources like CPU/Memory/Disk, etc. It can also include non-machine resources like $$-based budgets. "},{"title":"Dataset​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#dataset","content":"There's also need to tag dataset which will be used for training and shared across the platform by different users. 
Like mentioned above, access to the actual data will be handled by 3rd party system like Apache Ranger / Hive Metastore which is out of the Submarine's scope. "},{"title":"Architecture Overview​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#architecture-overview","content":""},{"title":"Architecture Diagram​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/0.6.0/designDocs/architecture-and-requirements#architecture-diagram","content":" +-----------------------------------------------------------------+ | Submarine UI / CLI / REST API / SDK | | Mini-Submarine | +-----------------------------------------------------------------+ +--------------------Submarine Server-----------------------------+ | +---------+ +---------+ +----------+ +----------+ +------------+| | |Data set | |Notebooks| |Experiment| |Models | |Servings || | +---------+ +---------+ +----------+ +----------+ +------------+| |-----------------------------------------------------------------| | | | +-----------------+ +-----------------+ +---------------------+ | | |Experiment | |Compute Resource | |Other Management | | | |Manager | | Manager | |Services | | | +-----------------+ +-----------------+ +---------------------+ | | Spark, template YARN/K8s/Docker | | TF, PyTorch, pipeline | | | + +-----------------+ + | |Submarine Meta | | | | Store | | | +-----------------+ | | | +-----------------------------------------------------------------+ (You can use http://stable.ascii-flow.appspot.com/#Draw to draw such diagrams)  Compute Resource Manager Helps to manage compute resources on-prem/on-cloud, this module can also handle cluster creation / management, etc. Experiment Manager Work with &quot;Compute Resource Manager&quot; to submit different kinds of workloads such as (distributed) Tensorflow / Pytorch, etc. Submarine SDK provides Java/Python/REST API to allow DS or other engineers to integrate into Submarine services. 
It also includes a mini-submarine component that launches Submarine components from a single Docker container (or a VM image). Details of the Submarine Server design can be found at submarine-server-design. "},{"title":"Implementation Notes","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/implementation-notes","content":"Implementation Notes Before digging into the details of the implementation, you should read architecture-and-requirements first to understand the overall requirements and architecture. Here are the sub-topics of the Submarine implementation: Submarine Storage: How to store metadata, logs, metrics, etc. of Submarine. Submarine Environment: How environments are created, managed, and stored in Submarine. Submarine Experiment: How experiments are managed and stored, and how the predefined experiment template works. Submarine Notebook: How experiments are managed and stored, and how the predefined experiment template works. Submarine Server: How Submarine server is designed, its architecture, implementation notes, etc. Work-in-progress designs: Below are designs that are still work in progress; we will move them to the upper section once design &amp; review is finished: Submarine HA Design: How Submarine HA can be achieved, using RAFT, etc. Submarine services deployment module: How to deploy Submarine services to K8s, YARN or cloud.","keywords":""},{"title":"Notebook Implementation","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/notebook-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#overview","content":""},{"title":"User's interaction​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#users-interaction","content":"Users can start N (N &gt;= 0) notebook sessions; a notebook session is a running notebook instance. A notebook session can be launched from Submarine UI (P0) and Submarine CLI (P2). 
When launching a notebook session, users can choose the T-shirt size of the notebook session (how much mem/cpu/gpu resources, or a resource profile such as small, medium, large, etc.). (P0) Users can choose an environment for the notebook. For more details, please refer to environment implementation. (P0) When starting a notebook, users can choose what code is initialized, similar to an experiment. (P1) Optionally, users can choose to attach a persistent volume to a notebook session. (P2) Users can get a list of the notebook sessions that belong to them and connect to a notebook session. Users can choose to terminate a running notebook session. "},{"title":"Admin's interaction​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#admins-interaction","content":"The number of concurrent notebook sessions each user can launch is determined by the user's resource quota limits and by the maximum number of concurrent notebook sessions allowed per user. (P2) "},{"title":"Relationship with other components​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#relationship-with-other-components","content":""},{"title":"Metadata store​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#metadata-store","content":"Metadata of running notebook sessions needs to be persisted in Submarine's metadata store (Database). 
"},{"title":"Submarine Server​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#submarine-server","content":" +--------------+ +--------Submarine Server--------------------+ |Submarine UI | | +-------------------+ | | |+---&gt; Submarine | | | Notebook | | | Notebook REST API| | +--------------+ | | | | | +--------+----------+ +--------------+ | | | +-&gt;|Metastore | | | +--------v----------+ | |DB | | | | Submarine +--+ +--------------+ | | | Notebook Mgr | | | | | | | | | | | +--------+----------+ | | | | +----------|---------------------------------+ | +--------------+ +--------v---------+ | Notebook Session | | | | instance | | | +------------------+  Once user use Submarine UI to launch a notebook session, Submarine notebook manager inside Submarine Server will persistent notebook session's metadata, and launch a new notebook session instance. "},{"title":"Resource manager​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#resource-manager","content":"When using K8s as resource manager, Submarine notebook session will run as a new POD. "},{"title":"Storage​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#storage","content":"There're several different types of storage requirements for Submarine notebook. For code, environment, etc, storage, please refer to storage implementation, check &quot;Localization of experiment/notebook/model-serving code&quot;. When there're needs to attach volume (such as user's home folder) to Submarine notebook session, please check storage implementation, check &quot;Attachable volume&quot;. "},{"title":"Environment​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#environment","content":"Submarine notebook's environment should be used to run experiment, model serving, etc. Please check environment implementation. 
(For details specific to notebooks, please check &quot;How to implement to make user can easily use Submarine environments&quot;.) Please note that a notebook's Environment should include the right version of the notebook libraries, and admins should follow the guidance to build the correct Docker image and Conda libraries so that the notebook runs correctly. "},{"title":"Submarine SDK (For Experiment, etc.)​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#submarine-sdk-for-experiment-etc","content":"Users can run new experiments, access metrics information, or do model operations using Submarine SDK. Submarine SDK is a Python library that talks to Submarine Server, which requires Submarine Server's endpoint as well as user credentials. To ensure a better experience, we recommend always installing the proper version of Submarine SDK in the environment so that users can use Submarine SDK directly from the command line. (We as the Submarine community can provide a sample Dockerfile or Conda environment with the correct base libraries installed for Submarine SDK.) The Submarine Server IP will be configured automatically by Submarine Server and added as an environment variable when a Submarine notebook session is launched. "},{"title":"Security​","type":1,"pageTitle":"Notebook Implementation","url":"docs/0.6.0/designDocs/notebook-implementation#security","content":"Please refer to Security Implementation. Once a user has access to a running notebook session, the user can also access the notebook's resources, submit new experiments, and access data. This is very dangerous, so we have to protect it. A simple solution is to use token-based authentication https://jupyter-notebook.readthedocs.io/en/stable/security.html. A more common way is to use solutions like KNOX to support SSO. We need to expand this section with more details. (TODO). 
"},{"title":"Experiment Implementation","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/experiment-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#overview","content":"This document talks about implementation of experiment, flows and design considerations. Experiment consists of following components, also interact with other Submarine or 3rd-party components, showing below:  +---------------------------------------+ +----------+ | Experiment Tasks | |Run | | | |Configs | | +----------------------------------+ | +----------+ | | Experiment Runnable Code | | +-----------------+ +----------+ | | | | |Output Artifacts | |Input Data| | | (Like train-job.py) | | |(Models, etc.) | | | | +----------------------------------+ | +-----------------+ | | | +----------------------------------+ | +----------+ | | Experiment Deps (Like Python) | | +-------------+ | +----------------------------------+ | |Logs/Metrics | | +----------------------------------+ | | | | | OS, Base Libaries (Like CUDA) | | +-------------+ | +----------------------------------+ | +---------------------------------------+ ^ | (Launch Task with resources) + +---------------------------------+ |Resource Manager (K8s/YARN/Cloud)| +---------------------------------+  As showing in the above diagram, Submarine experiment consists of the following items: On the left side, there're input data and run configs. In the middle box, they're experiment tasks, it could be multiple tasks when we run distributed training, pipeline, etc. There're main runnable code, such as train.py for the training main entry point. The two boxes below: experiment dependencies and OS/Base libraries we called Submarine Environment Profile or Environment for short. Which defined what is the basic libraries to run the main experiment code. 
Experiment tasks are launched by a Resource Manager, such as K8s/YARN/Cloud, or just launched locally. There are resource constraints for each experiment task (e.g. how much memory, cores, GPU, disk, etc. can be used by the task). On the right side are the artifacts generated by experiments: Output artifacts: the main output of the experiment, which could be model(s), or output data when we do batch prediction. Logs/Metrics: for further troubleshooting or understanding of the experiment's quality. For the rest of the design doc, we will talk about how we handle environment and code, and how we manage output/logs, etc. "},{"title":"API of Experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#api-of-experiment","content":"This is not a full definition of an experiment; for more details, please refer to the experiment API. Here's just an example of an experiment object, which helps developers understand what is included in an experiment. experiment: name: &quot;abc&quot;, type: &quot;script&quot;, environment: &quot;team-default-ml-env&quot; code: sync_mode: s3 url: &quot;s3://bucket/training-job.tar.gz&quot; parameter: &gt; python training.py --iteration 10 --input=s3://bucket/input output=s3://bucket/output resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=2&quot; timeout: &quot;30 mins&quot;  This defines a &quot;script&quot; experiment named &quot;abc&quot;; the name can be used to track the experiment. The environment &quot;team-default-ml-env&quot; is specified to make sure the dependencies of the job can be downloaded properly before executing the job. The code field defines where the experiment code is downloaded from; we will support several sync_mode values such as s3 (or abfs/hdfs), git, etc. 
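As a rough illustration of the example spec above, here is a minimal Python sketch that models the experiment object and parses the resource_constraint string. The class names, field names and the parsing helper are assumptions made for this sketch only; they are not part of any official Submarine API.

```python
from dataclasses import dataclass


@dataclass
class ResourceConstraint:
    memory: str
    vcores: int
    gpus: int


def parse_resource_constraint(res):
    # Split a string such as 'mem=20gb, vcore=3, gpu=2' into key/value pairs.
    pairs = dict(item.strip().split('=', 1) for item in res.split(','))
    return ResourceConstraint(
        memory=pairs['mem'],
        vcores=int(pairs['vcore']),
        gpus=int(pairs.get('gpu', 0)),  # assume no GPU when the key is absent
    )


@dataclass
class ExperimentSpec:
    # Hypothetical in-memory form of the spec shown above.
    name: str
    type: str
    environment: str
    code_sync_mode: str
    code_url: str
    parameter: str
    resources: ResourceConstraint
    timeout: str


spec = ExperimentSpec(
    name='abc',
    type='script',
    environment='team-default-ml-env',
    code_sync_mode='s3',
    code_url='s3://bucket/training-job.tar.gz',
    parameter='python training.py --iteration 10 --input=s3://bucket/input output=s3://bucket/output',
    resources=parse_resource_constraint('mem=20gb, vcore=3, gpu=2'),
    timeout='30 mins',
)
print(spec.resources)
```

Parsing the constraint into typed fields early is one possible design choice; it lets a server reject malformed resource requests before anything is submitted to a cluster.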
Different types of experiments will have different specs. For example, a distributed Tensorflow spec may look like: experiment: name: &quot;abc-distributed-tf&quot;, type: &quot;distributed-tf&quot;, ps: environment: &quot;team-default-ml-cpu&quot; resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=0&quot; worker: environment: &quot;team-default-ml-gpu&quot; resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=2&quot; code: sync_mode: git url: &quot;https://foo.com/training-job.git&quot; parameter: &gt; python /code/training-job/training.py --iteration 10 --input=s3://bucket/input output=s3://bucket/output tensorboard: enabled timeout: &quot;30 mins&quot;  Since we have different Docker images, one using GPU and one not using GPU, we can specify a different environment and resource constraint for each role. "},{"title":"Manage environments for experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#manage-environments-for-experiment","content":"Please refer to environment-implementation.md for more details. "},{"title":"Manage storages for experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#manage-storages-for-experiment","content":"There are different types of storage, such as logs, metrics, and dependencies (environments). Please refer to storage-implementations for more details. This also covers how to manage the code of an experiment. 
"},{"title":"Manage Pre-defined experiment libraries​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#manage-pre-defined-experiment-libraries","content":""},{"title":"Flow: Submit an experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#flow-submit-an-experiment","content":""},{"title":"Submit via SDK Flows.​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#submit-via-sdk-flows","content":"To better understand experiment implementation, It will be good to understand what is the steps of experiment submission. Please note that below code is just pseudo code, not official APIs. "},{"title":"Specify what environment to use​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#specify-what-environment-to-use","content":"Before submit the environment, you have to choose what environment to choose. Environment defines dependencies, etc. of an experiment or a notebook. might looks like below: conda_environment = &quot;&quot;&quot; name: conda-env channels: - defaults dependencies: - asn1crypto=1.3.0=py37_0 - blas=1.0=mkl - ca-certificates=2020.1.1=0 - certifi=2020.4.5.1=py37_0 - cffi=1.14.0=py37hb5b8e2f_0 - chardet=3.0.4=py37_1003 prefix: /opt/anaconda3/envs/conda-env &quot;&quot;&quot; # This environment can be different from notebook's own environment environment = create_environment { DockerImage = &quot;ubuntu:16&quot;, CondaEnvironment = conda_environment }  To better understand how environment works, please refer to environment-implementation. 
"},{"title":"Create experiment, specify where's training code located, and parameters.​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#create-experiment-specify-wheres-training-code-located-and-parameters","content":"For ad-hoc experiment (code located at S3), assume training code is part of the training-job.tar.gz and main class is train.py. When the job is launched, whatever specified in the localize_artifacts will be downloaded. experiment = create_experiment { Environment = environment, ExperimentConfig = { type = &quot;adhoc&quot;, localize_artifacts = [ &quot;s3://bucket/training-job.tar.gz&quot; ], name = &quot;abc&quot;, parameter = &quot;python training.py --iteration 10 --input=&quot;s3://bucket/input output=&quot;s3://bucket/output&quot;, } } experiment.run() experiment.wait_for_finish(print_output=True)  Run notebook file in offline mode​ It is possible we want to run a notebook file in offline mode, to do that, here's code to use to run a notebook code experiment = create_experiment { Environment = environment, ExperimentConfig = { type = &quot;adhoc&quot;, localize_artifacts = [ &quot;s3://bucket/folder/notebook-123.ipynb&quot; ], name = &quot;abc&quot;, parameter = &quot;runipy training.ipynb --iteration 10 --input=&quot;s3://bucket/input output=&quot;s3://bucket/output&quot;, } } experiment.run() experiment.wait_for_finish(print_output=True)  Run pre-defined experiment library​ experiment = create_experiment { # Here you can use default environment of library Environment = environment, ExperimentConfig = { type = &quot;template&quot;, name = &quot;abc&quot;, # A unique name of template template = &quot;deepfm_ctr&quot;, # yaml file defined what is the parameters need to be specified. parameter = { Input: &quot;S3://.../input&quot;, Output: &quot;S3://.../output&quot; Training: { &quot;batch_size&quot;: 512, &quot;l2_reg&quot;: 0.01, ... 
} } } } experiment.run() experiment.wait_for_finish(print_output=True)  "},{"title":"Summarize: Experiment v.s. Notebook session​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#summarize-experiment-vs-notebook-session","content":"There's a common misunderstanding about the differences between running an experiment and running a task from a notebook session. We will talk about the differences and commonalities: Differences \tExperiment\tNotebook SessionRun mode\tOffline\tInteractive Output Artifacts (a.k.a model)\tPersisted in a shared storage (like S3/NFS)\tLocal in the notebook session container, could be ephemeral Run history (meta, logs, metrics)\tMeta/logs/metrics can be traced from experiment UI (or corresponding API)\tNo run history can be traced from Submarine UI/API. Can view the current running paragraph's log/metrics, etc. What to run?\tCode from Docker image or shared storage (like Tarball on S3, Github, etc.)\tLocal in the notebook's paragraph Commonalities \tExperiment &amp; Notebook SessionEnvironment\tThey can share the same Environment configuration "},{"title":"Experiment-related modules inside Submarine-server​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#experiment-related-modules-inside-submarine-server","content":"(Please refer to architecture of submarine server for more details) "},{"title":"Experiment Manager​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#experiment-manager","content":"The experiment manager receives experiment requests, persists the experiment metadata in a database (e.g. MySQL), and invokes the subsequent modules to submit and monitor the experiment's execution. 
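The responsibilities described above can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions: the in-memory store stands in for the database, and the class names, method names and status strings are all hypothetical rather than taken from the Submarine code base.

```python
import uuid


class InMemoryMetaStore:
    # Stand-in for the database (e.g. MySQL) mentioned above.
    def __init__(self):
        self.rows = {}

    def save(self, experiment_id, meta):
        self.rows[experiment_id] = dict(meta)


class EchoSubmitter:
    # Hypothetical submitter; a real one would talk to K8s or YARN.
    def validate(self, spec):
        if not spec.get('name'):
            raise ValueError('experiment spec needs a name')

    def submit(self, experiment_id, spec):
        return 'Running'


class ExperimentManager:
    def __init__(self, store, submitter):
        self.store = store
        self.submitter = submitter

    def handle_request(self, spec):
        # 1. Validate the request before anything is persisted.
        self.submitter.validate(spec)
        # 2. Persist the experiment metadata.
        experiment_id = uuid.uuid4().hex
        self.store.save(experiment_id, {'spec': spec, 'status': 'Accepted'})
        # 3. Hand over to the submitter and record the resulting status.
        status = self.submitter.submit(experiment_id, spec)
        self.store.save(experiment_id, {'spec': spec, 'status': status})
        return experiment_id


store = InMemoryMetaStore()
manager = ExperimentManager(store, EchoSubmitter())
experiment_id = manager.handle_request({'name': 'abc', 'type': 'script'})
print(store.rows[experiment_id]['status'])
```

Persisting the metadata before submission is one possible ordering; it means a crash between the two steps leaves a record that a monitor could later reconcile.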
"},{"title":"Compute Cluster Manager​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#compute-cluster-manager","content":"After experiment accepted by experiment manager, based on which cluster the experiment intended to run (like mentioned in the previous sections, Submarine supports to manage multiple compute clusters), compute cluster manager will returns credentials to access the compute cluster. It will also be responsible to create a new compute cluster if needed. For most of the on-prem use cases, there's only one cluster involved, for such cases, ComputeClusterManager returns credentials to access local cluster if needed. "},{"title":"Experiment Submitter​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#experiment-submitter","content":"Experiment Submitter handles different kinds of experiments to run (e.g. ad-hoc script, distributed TF, MPI, pre-defined templates, Pipeline, AutoML, etc.). And such experiments can be managed by different resource management systems (e.g. K8s, YARN, container cloud, etc.) To meet the requirements to support variant kinds of experiments and resource managers, we choose to use plug-in modules to support different submitters (which requires jars to submarine-server’s classpath). To avoid jars and dependencies of plugins break the submarine-server, the plug-ins manager, or both. To solve this issue, we can instantiate submitter plug-ins using a classloader that is different from the system classloader. Submitter Plug-ins​ Each plug-in uses a separate module under the server-submitter module. As the default implements, we provide for YARN and K8s. For YARN cluster, we provide the submitter-yarn and submitter-yarnservice plug-ins. The submitter-yarn plug-in used the TonY as the runtime to run the training job, and the submitter-yarnservice plug-in direct use the YARN Service which supports Hadoop v3.1 above. 
The submitter-k8s plug-in is used to submit the job to Kubernetes cluster and use the operator as the runtime. The submitter-k8s plug-in implements the operation of CRD object and provides the java interface. In the beginning, we use the tf-operator for the TensorFlow. If Submarine want to support the other resource management system in the future, such as submarine-docker-cluster (submarine uses the Raft algorithm to create a docker cluster on the docker runtime environment on multiple servers, providing the most lightweight resource scheduling system for small-scale users). We should create a new plug-in module named submitter-docker under the server-submitter module. "},{"title":"Experiment Monitor​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#experiment-monitor","content":"The monitor tracks the experiment life cycle and records the main events and key info in runtime. As the experiment run progresses, the metrics are needed for evaluation of the ongoing success or failure of the execution progress. Due to adapt the different cluster resource management system, so we need a generic metric info structure and each submitter plug-in should inherit and complete it by itself. "},{"title":"Invoke flows of experiment-related components​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#invoke-flows-of-experiment-related-components","content":" +-----------------+ +----------------+ +----------------+ +-----------------+ |Experiments | |Compute Cluster | |Experiment | | Experiment | |Mgr | |Mgr | |Submitter | | Monitor | +-----------------+ +----------------+ +----------------+ +-----------------+ + + + + User | | | | Submit |+-------------------------------------&gt;+ + Xperiment| Use submitter.validate(spec) | | | to validate spec and create | | | experiment object (state- | | | machine). 
| | | | | | The experiment manager will | | | persist meta-data to Database| | | | | | | | + + |+-----------------&gt; + | | | Submit Experiments| | | | To ComputeCluster| | | | Mgr, get existing|+----------------&gt;| | | cluster, or | Use Submitter | | | create a new one.| to submit |+---------------&gt; | | | Different kinds | Once job is | | | of experiments | submitted, use |+----+ | | to k8s/yarn, etc| monitor to get | | | | | status updates | | | | | | | Monitor | | | | | Xperiment | | | | | status | | | | | |&lt;--------------------------------------------------------+| | | | | | | | Update Status back to Experiment | | | | Manager | |&lt;----+ | | | | | | | | | | | | v v v v  TODO: add more details about template, environment, etc. "},{"title":"Common modules of experiment/notebook-session/model-serving​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#common-modules-of-experimentnotebook-sessionmodel-serving","content":"Experiment/notebook-session/model-serving share a lot of commonalities, all of them are: Some workloads running on YARN/K8s.Need persist meta data to DB. Need monitor task/service running status from resource management system.  We need to make their implementation are loose-coupled, but at the same time, share some building blocks as much as possible (e.g. submit PodSpecs to K8s, monitor status, get logs, etc.) to reduce duplications. "},{"title":"Support Predefined-experiment-templates​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#support-predefined-experiment-templates","content":"Predefined Experiment Template is just a way to save data-scientists time to repeatedly entering parameters which is not error-proof and user experience is also bad. 
"},{"title":"Predefined-experiment-template API to run experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#predefined-experiment-template-api-to-run-experiment","content":"Predefined experiment template consists a list of parameters, each of the parameter has 4 properties: Key\tRequired\tDefault Value\tDescriptionName of the key\ttrue/false\tWhen required = false, a default value can be provided by the template\tDescription of the parameter For the example of deepfm CTR training experiment mentioned in the architecture-and-requirements.md { &quot;input&quot;: { &quot;train_data&quot;: [&quot;hdfs:///user/submarine/data/tr.libsvm&quot;], &quot;valid_data&quot;: [&quot;hdfs:///user/submarine/data/va.libsvm&quot;], &quot;test_data&quot;: [&quot;hdfs:///user/submarine/data/te.libsvm&quot;], &quot;type&quot;: &quot;libsvm&quot; }, &quot;output&quot;: { &quot;save_model_dir&quot;: &quot;hdfs:///user/submarine/deepfm&quot;, &quot;metric&quot;: &quot;auc&quot; }, &quot;training&quot;: { &quot;batch_size&quot; : 512, &quot;field_size&quot;: 39, &quot;num_epochs&quot;: 3, &quot;feature_size&quot;: 117581, ... } }  The template will be (in yaml format): # deepfm.ctr template name: deepfm.ctr author: description: &gt; This is a template to run CTR training using deepfm algorithm, by default it runs single node TF job, you can also overwrite training parameters to use distributed training. parameters: - name: input.train_data required: true description: &gt; train data is expected in SVM format, and can be stored in HDFS/S3 ... - name: training.batch_size required: false default: 32 description: This is batch size of training  The batch format can be used in UI/API. 
"},{"title":"Handle Predefined-experiment-template from server side​","type":1,"pageTitle":"Experiment Implementation","url":"docs/0.6.0/designDocs/experiment-implementation#handle-predefined-experiment-template-from-server-side","content":"Please note that, the conversion of predefined-experiment-template will be always handled by server. The invoke flow looks like:  +------------Submarine Server -----------------------+ +--------------+ | +-----------------+ | |Client |+-------&gt;|Experimment Mgr | | | | | | | | +--------------+ | +-----------------+ | | + | Submit | +-------v---------+ Get Experiment Template | Template | |Experiment |&lt;-----+From pre-registered | Parameters | |Template Registry| Templates | to Submarine | +-------+---------+ | Server | | | | +-------v---------+ +-----------------+ | | |Deepfm CTR Templ-| |Experiment- | | | |ate Handler +------&gt;|Tensorflow | | | +-----------------+ +--------+--------+ | | | | | | | | +--------v--------+ | | |Experiment | | | |Submitter | | | +--------+--------+ | | | | | | | | +--------v--------+ | | | | | | | ...... | | | +-----------------+ | | | +----------------------------------------------------+  Basically, from Client, it submitted template parameters to Submarine Server, inside submarine server, it finds the corresponding template handler based on the name. And the template handler converts input parameters to an actual experiment, such as a distributed TF experiment. After that, it goes the similar route to validate experiment spec, compute cluster manager, etc. to get the experiment submitted and monitored. 
Predefined-experiment-template is able to create any kind of experiment, it could be a pipeline:  +-----------------+ +------------------+ |Template XYZ | | XYZ Template | | |+---------------&gt; | Handler | +-----------------+ +------------------+ + | | | | v +--------------------+ +------------------+ | +-----------------+| | Predefined | | | Split Train/ ||&lt;----+| Pipeline | | | Test data || +------------------+ | +-------+---------+| | | | | +-------v---------+| | | Spark Job ETL || | | || | +-------+---------+| | | | | +-------v---------+| | | Train using || | | XGBoost || | +-------+---------+| | | | | +-------v---------+| | | Validate Train || | | Results || | +-----------------+| | | +--------------------+  Template can be also chained to reuse other template handlers  +-----------------+ +------------------+ |Template XYZ | | XYZ Template | | |+---------------&gt; | Handler | +-----------------+ +------------------+ + | v +------------------+ +------------------+ |Distributed | | ABC Template | |TF Experiment |&lt;----+| Handler | +------------------+ +------------------+  Template Handler is a callable class inside Submarine Server with a standard interface defined like. interface ExperimentTemplateHandler { ExperimentSpec createExperiment(TemplatedExperimentParameters param) }  We should avoid users to do coding when they want to add new template, we should have several standard template handler to deal with most of the template handling. Experiment templates can be registered/updated/deleted via Submarine Server's REST API, which need to be discussed separately in the doc. 
(TODO) "},{"title":"Storage Implementation","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/storage-implementation","content":"","keywords":""},{"title":"ML-related objects and their storages​","type":1,"pageTitle":"Storage Implementation","url":"docs/0.6.0/designDocs/storage-implementation#ml-related-objects-and-their-storages","content":"First let's look at what user will interact for most of the time: Notebook ExperimentModel Servings  +---------+ +------------+ |Logs |&lt;--+|Notebook | +----------+ +---------+ +------------+ +----------------+ |Trackings | &lt;-+|Experiment |&lt;--+&gt;|Model Artifacts | +----------+ +-----------------+ +------------+ +----------------+ +----------+&lt;---+|ML-related Metric|&lt;--+Servings | |tf.events | +-----------------+ +------------+ +----------+ ^ +-----------------+ + | Environments | +----------------------+ | | +-----------------+ | Submarine Metastore | | Dependencies | |Code | +----------------------+ | | +-----------------+ |Experiment Meta | | Docker Images | +----------------------+ +-----------------+ |Model Store Meta | +----------------------+ |Model Serving Meta | +----------------------+ |Notebook meta | +----------------------+ |Experiment Templates | +----------------------+ |Environments Meta | +----------------------+  First of all, all the notebook-sessions / experiments / model-serving instances) are more or less interact with following storage objects: Logs for these tasks for troubleshooting. ML-related metrics such as loss, epoch, etc. (in contrast of system metrics such as CPU/memory usage, etc.) There're different types of ML-related metrics, for Tensorflow/pytorch, they can use tf.events and get visualizations on tensorboard. Or they can use tracking APIs (such as Submarine tracking, mlflow tracking, etc.) to output customized tracking results for non TF/Pytorch workloads. 
Training jobs of an experiment typically generate model artifacts (files) that need to be persisted, and both notebooks and model serving need to load model artifacts from persistent storage. There are various kinds of meta information, such as experiment meta, model registry, model serving, notebook, environment, etc. We need to be able to read this meta information back. We also have code for experiments (like training/batch-prediction), notebooks (ipynb), and model serving. Notebooks/experiments/model-serving also depend on environments (dependencies such as pip packages, and Docker images). "},{"title":"Implementation considerations for ML-related objects​","type":1,"pageTitle":"Storage Implementation","url":"docs/0.6.0/designDocs/storage-implementation#implementation-considerations-for-ml-related-objects","content":"Object Type\tCharacteristics\tWhere to storeMetrics: tf.events\tTime series data with k/v, appendable to file\tLocal/EBS, HDFS, Cloud Blob Storage Metrics: other tracking metrics\tTime series data with k/v, appendable to file\tLocal, HDFS, Cloud Blob Storage, Database Logs\tLarge volumes, #files are potentially huge.\tLocal (temporary), HDFS (need aggregation), Cloud Blob Storage Submarine Metastore\tCRUD operations for small meta data.\tDatabase Model Artifacts\tSize varies for model (from KBs to GBs). #files are potentially huge.\tHDFS, Cloud Blob Storage Code\tNeed version control. (Please find detailed discussions below for code storage and localization)\tTarball on HDFS/Cloud Blob Storage, or Git Environment (Dependencies, Docker Image) Public/private environment repo (like Conda channel), Docker registry. 
"},{"title":"Detailed discussions​","type":1,"pageTitle":"Storage Implementation","url":"docs/0.6.0/designDocs/storage-implementation#detailed-discussions","content":"Store code for experiment/notebook/model-serving​ There're following ways to get experiment code: 1) Code is part of Git repo: (Recommended) This is our recommended approach, once code is part of Git, it will be stored in version control, any change will be tracked, and much easier for users to trace back what change triggered a new bug, etc. 2) Code is part of Docker image: This is an anti-pattern and we will NOT recommend you to use it, Docker image can be used to include ANYTHING, like dependencies, the code you will execute, or even data. But this doesn't mean you should do it. We recommend to use Docker image ONLY for libraries/dependencies. Making code to be part of Docker image makes hard to edit code (if you want to update a value in your Python file, you will have to recreate the Docker image, push it and rerun it). 3) Code is part of S3/HDFS/ABFS: User may want to store their training code to a tarball on a shared storage. Submarine need to download code from remote storage to the launched container before running the code. Localization of experiment/notebook/model-serving code​ To make user experiences keeps same across different environment, we will localize code to a same folder after the container is launched, preferably /code For example, there's a git repo need to be synced up for an experiment/notebook/model-serving (example above): experiment: #Or notebook, model-serving name: &quot;abc&quot;, environment: &quot;team-default-ml-env&quot; ... (other fields) code: sync_mode: git url: &quot;https://foo.com/training-job.git&quot;  After localize, training-job/ will be placed under /code When we running on K8s environment, we can use K8s's initContainer and emptyDir to do these things for us. 
K8s Pod spec (generated by the Submarine server rather than by the user; users should NEVER edit the K8s spec, which is too unfriendly to data scientists): apiVersion: v1 kind: Pod metadata: name: experiment-abc spec: containers: - name: experiment-task image: training-job volumeMounts: - name: code-dir mountPath: /code initContainers: - name: git-localize image: git-sync command: &quot;git clone .. /code/&quot; volumeMounts: - name: code-dir mountPath: /code volumes: - name: code-dir emptyDir: {}  The above K8s spec creates a code-dir volume and mounts it at /code in the launched containers. The initContainer git-localize uses https://github.com/kubernetes/git-sync to do the sync. (If other storage such as S3 is used, a similar initContainer approach can download the contents.) "},{"title":"System-related metrics/logs and their storages​","type":1,"pageTitle":"Storage Implementation","url":"docs/0.6.0/designDocs/storage-implementation#system-related-metricslogs-and-their-storages","content":"Other than ML-related objects, we have system-related objects, including: Daemon logs (like logs of Submarine server). Logs for other dependency components (like Kubernetes logs when running on K8s). System metrics (Physical resource usages by daemons, launched training containers, etc.).  All of this information should be handled by 3rd party systems, such as Grafana, Prometheus, etc. System admins are responsible for setting up this infrastructure and its dashboards. Users of Submarine should NOT interact with system-related metrics/logs. That is the system admin's responsibility. "},{"title":"Attachable Volumes​","type":1,"pageTitle":"Storage Implementation","url":"docs/0.6.0/designDocs/storage-implementation#attachable-volumes","content":"Users may need an attachable volume for their experiment / notebook. This is especially useful for notebook storage, since the contents of a notebook can be saved automatically, and the volume can be used as the user's home folder. 
The downside of an attachable volume is that it is not versioned. Even though notebooks are mainly used for ad hoc exploration, an unversioned notebook file can lead to maintenance issues in the future. Since this is a common requirement, we can consider supporting attachable volumes in Submarine in the long run, but with relatively lower priority. "},{"title":"In-scope / Out-of-scope​","type":1,"pageTitle":"Storage Implementation","url":"docs/0.6.0/designDocs/storage-implementation#in-scope--out-of-scope","content":"Describe what the Submarine project should own and what the Submarine project should NOT own. "},{"title":"Generic Expeiment Spec","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/submarine-server/experimentSpec","content":"","keywords":""},{"title":"Motivation​","type":1,"pageTitle":"Generic Expeiment Spec","url":"docs/0.6.0/designDocs/submarine-server/experimentSpec#motivation","content":"As a machine learning platform, Submarine should support multiple machine learning frameworks, such as Tensorflow, Pytorch, etc. But different frameworks have different distributed components for the training experiment. We therefore designed a generic experiment spec to abstract the training experiment across different frameworks. In this way, the submarine-server can hide the complexity of underlying infrastructure differences and provide a cleaner interface to manage experiments. "},{"title":"Proposal​","type":1,"pageTitle":"Generic Expeiment Spec","url":"docs/0.6.0/designDocs/submarine-server/experimentSpec#proposal","content":"Considering the Tensorflow and Pytorch frameworks, we propose a single spec that consists of a library spec, a submitter spec, task specs, etc. 
Such as: name: &quot;mnist&quot; librarySpec: name: &quot;TensorFlow&quot; version: &quot;2.1.0&quot; image: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; cmd: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot; envVars: ENV_1: &quot;ENV1&quot; submitterSpec: type: &quot;k8s&quot; namespace: &quot;submarine&quot; taskSpecs: Ps: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot; Worker: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot;  "},{"title":"Library Spec​","type":1,"pageTitle":"Generic Expeiment Spec","url":"docs/0.6.0/designDocs/submarine-server/experimentSpec#library-spec","content":"The library spec describes the info about machine learning framework. All the fields as below: field\ttype\toptional\tdescriptionname\tstring\tNO\tMachine Learning Framework name. Only &quot;tensorflow&quot; and &quot;pytorch&quot; is supported. It doesn't matter if the value is uppercase or lowercase. version\tstring\tNO\tThe version of ML framework. Such as: 2.1.0 image\tstring\tNO\tThe public image used for each task if not specified. Such as: apache/submarine cmd\tstring\tYES\tThe public entry cmd for the task if not specified. envVars\tkey/value\tYES\tThe public env vars for the task if not specified. "},{"title":"Submitter Spec​","type":1,"pageTitle":"Generic Expeiment Spec","url":"docs/0.6.0/designDocs/submarine-server/experimentSpec#submitter-spec","content":"It describes the info of submitter which the user specified, such as yarn, yarnservice or k8s. All the fields as below: field\ttype\toptional\tdescriptiontype\tstring\tNO\tThe submitter type, supports k8s now configPath\tstring\tYES\tThe config path of the specified resource manager. You can set it in submarine-site.xml if run submarine-server locally namespace\tstring\tNO\tIt's known as queue in Apache Hadoop YARN and namespace in Kubernetes. 
kind\tstring\tYES\tUsed by the k8s submitter; supports TFJob and PyTorchJob apiVersion\tstring\tYES\tMust match the kind; for example, the API version of TFJob is kubeflow.org/v1 "},{"title":"Task Spec​","type":1,"pageTitle":"Generic Expeiment Spec","url":"docs/0.6.0/designDocs/submarine-server/experimentSpec#task-spec","content":"It describes the task info. The tasks make up the experiment, so they must be specified when submitting the experiment. All tasks should be put into a key/value collection. For example: taskSpecs: Ps: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot; Worker: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot;  All the fields are listed below: field\ttype\toptional\tdescriptionname\tstring\tYES\tThe experiment name; if not specified, the library name is used image\tstring\tYES\tThe experiment docker image cmd\tstring\tYES\tThe entry command for running the task envVars\tkey/value\tYES\tThe environment variables for the task resources\tstring\tNO\tThe resource limit for the task. 
Formatter: cpu=%s,memory=%s,nvidia.com/gpu=%s "},{"title":"Implements​","type":1,"pageTitle":"Generic Expeiment Spec","url":"docs/0.6.0/designDocs/submarine-server/experimentSpec#implements","content":"For more info see SUBMARINE-321 "},{"title":"Submarine Server Implementation","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/submarine-server/architecture","content":"","keywords":""},{"title":"Architecture Overview​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#architecture-overview","content":" +---------------Submarine Server ---+ | | | +------------+ +------------+ | | |Web Svc/Prxy| |Backend Svc | | +--Submarine Asset + | +------------+ +------------+ | |Project/Notebook | | ^ ^ | |Model/Metrics | +---|---------|---------------------+ |Libraries/Dataset | | | +------------------+ | | | +--|-Compute Cluster 1---+ +--Image Registry--+ + | | | | User's Images | User / | + | | | Admin | User Notebook Instance | +------------------+ | Experiment Runs | +------------------------+ +-Data Storage-----+ | S3/HDFS, etc. | +----Compute Cluster 2---+ | | +------------------+ ...  Here's a diagram to illustrate the Submarine's deployment. Submarine Server consists of web service/proxy, and backend services. They're like &quot;control planes&quot; of Submarine, and users will interact with these services.Submarine server could be a microservice architecture and can be deployed to one of the compute clusters. (see below, this will be useful when we only have one cluster). There're multiple compute clusters that could be used by Submarine service. For user's running notebook instance, jobs, etc. they will be placed to one of the compute clusters by user's preference or defined policies.Submarine's asset includes project/notebook(content)/models/metrics/dataset-meta, etc. can be stored inside Submarine's own database.Datasets can be stored in various locations such as S3/HDFS. 
Users can push container (such as Docker) images to a preconfigured registry in Submarine, so Submarine service can know how to pull required container images.Image Registry/Data-Storage, etc. are outside of Submarine server's scope and should be managed by 3rd party applications. "},{"title":"Submarine Server and its APIs​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#submarine-server-and-its-apis","content":"Submarine server is designed to allow data scientists to access notebooks, submit/manage jobs, manage models, create model training workflows, access datasets, etc. Submarine Server exposed UI and REST API. Users can also use CLI / SDK to manage assets inside Submarine Server.  +----------+ | CLI |+---+ +----------+ v +----------------+ +--------------+ | Submarine | +----------+ | REST API | | | | SDK |+&gt;| |+&gt; Server | +----------+ +--------------+ | | ^ +----------------+ +----------+ | | UI |+---+ +----------+  REST API will be used by the other 3 approaches. (CLI/SDK/UI) The REST API Service handles HTTP requests and is responsible for authentication. It acts as the caller for the JobManager component. The REST component defines the generic job spec which describes the detailed info about job. For more details, refer to here. (Please note that we're converting REST endpoint description from Java-based REST API to swagger definition, once that is done, we should replace the link with swagger definition spec). 
"},{"title":"Proposal​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#proposal","content":" +---------------------+ +-----------+ | +--------+ +----+ | | | | |runtime1+--&gt;+job1| | | workbench +---+ +----------------------------------+ | +--------+ +----+ | | | | | +------+ +---------------------+ | +--&gt;+ +--------+ +----+ | +-----------+ | | | | | +------+ +-------+ | | | | |runtime2+--&gt;+job2| | | | | | | | YARN | | K8s | | | | | +--------+ +----+ | +-----------+ | | | | | +------+ +-------+ | | | | YARN Cluster | | | | | | | | submitter | | | +---------------------+ | CLI +------&gt;+ | REST | +---------------------+ +---+ | | | | | | +---------------------+ | | +---------------------+ +-----------+ | | | | | +-------+ +-------+ | | | | +--------+ +----+ | | | | | | |PlugMgr| |monitor| | | | | | +--&gt;+job1| | +-----------+ | | | | | +-------+ +-------+ | | | | | | +----+ | | | | | | | | JobManager | | +--&gt;+ |operator| +----+ | | SDK +---+ | +------+ +---------------------+ | | | +--&gt;+job2| | | | +----------------------------------+ | +--------+ +----+ | +-----------+ | K8s Cluster | client server +---------------------+  We propose to split the original core module in the old layout into two modules, CLI and server as shown in FIG. The submarine-client calls the REST APIs to submit and retrieve the job info. The submarine-server provides the REST service, job management, submitting the job to cluster, and running job in different clusters through the corresponding runtime. 
"},{"title":"Submarine Server Components​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#submarine-server-components","content":" +----------------------Submarine Server--------------------------------+ | +-----------------+ +------------------+ +--------------------+ | | | Experiment | |Notebook Session | |Environment Mgr | | | | Mgr | |Mgr | | | | | +-----------------+ +------------------+ +--------------------+ | | | | +-----------------+ +------------------+ +--------------------+ | | | Model Registry | |Model Serving Mgr | |Compute Cluster Mgr | | | | | | | | | | | +-----------------+ +------------------+ +--------------------+ | | | | +-----------------+ +------------------+ +--------------------+ | | | DataSet Mgr | |User/Team | |Metadata Mgr | | | | | |Permission Mgr | | | | | +-----------------+ +------------------+ +--------------------+ | +----------------------------------------------------------------------+  "},{"title":"Experiment Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#experiment-manager","content":"TODO "},{"title":"Notebook Sessions Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#notebook-sessions-manager","content":"TODO "},{"title":"Environment Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#environment-manager","content":"TODO "},{"title":"Model Registry​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#model-registry","content":"TODO "},{"title":"Model Serving Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#model-serving-manager","content":"TODO "},{"title":"Compute Cluster 
Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#compute-cluster-manager","content":"TODO "},{"title":"Dataset Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#dataset-manager","content":"TODO "},{"title":"User/team permissions manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#userteam-permissions-manager","content":"TODO "},{"title":"Metadata Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#metadata-manager","content":"TODO "},{"title":"Components/services outside of Submarine Server's scope​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/0.6.0/designDocs/submarine-server/architecture#componentsservices-outside-of-submarine-servers-scope","content":"TODO: Describe the out-of-scope components, which should be handled and managed outside of Submarine server. Candidates are: Identity management, data storage, metastore storage, etc. "},{"title":"Security Implementation","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/wip-designs/security-implementation","content":"","keywords":""},{"title":"Handle User's Credential​","type":1,"pageTitle":"Security Implementation","url":"docs/0.6.0/designDocs/wip-designs/security-implementation#handle-users-credential","content":"Users' credentials include Kerberos keytabs, Docker registry credentials, GitHub SSH keys, etc. Users' credentials must be stored securely, for example, via Keycloak or K8s Secrets. 
(More details TODO) "},{"title":"Cluster Server Design - High-Availability","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer","content":"","keywords":""},{"title":"Below is existing proposal:​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#below-is-existing-proposal","content":""},{"title":"Introduction​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#introduction","content":"The Submarine system contains two daemon services, Submarine Server and Workbench Server. Submarine Server mainly provides job submission, job scheduling, job status monitoring, and online model serving for Submarine. Workbench Server mainly provides algorithm users with algorithm development, Python/Spark interpreter operation, and other services through Notebook. The goal of the Submarine project is to provide highly available and highly reliable services for big data processing, algorithm development, job scheduling, online model serving, and batch and incremental model updates. In addition to the high availability of the big data and machine learning frameworks, the high availability of Submarine Server and Workbench Server themselves is a key consideration. "},{"title":"Requirement​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#requirement","content":""},{"title":"Cluster Metadata Center​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#cluster-metadata-center","content":"Multiple Submarine (or Workbench) Server processes create a Submarine Cluster through the RAFT algorithm library. The cluster internally maintains a metadata center. All servers can operate on the metadata. 
The RAFT algorithm ensures that when multiple processes modify the same data at the same time, problems such as overwriting each other's changes or dirty data do not occur. This metadata center stores data as key-value pairs. It can store a variety of data, but note that the metadata center is only suitable for storing small amounts of data and cannot be used to replace data storage. "},{"title":"Service discovery​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#service-discovery","content":"By storing the information of a service or process in the metadata center, we can easily find the information of the service or process we need from anywhere. For example, the IP address and port of a Python interpreter process are stored in the metadata, and other services can look up this process information by process ID and connect to it, which provides service discovery capabilities. "},{"title":"Cluster event​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#cluster-event","content":"Across the entire Submarine cluster, the servers can communicate with each other and with child processes by sending cluster events. The receiving service or process runs the corresponding handler for each cluster event. For example, the Workbench Server can send a shutdown event to a Python interpreter process, which makes it possible to control the operation of the services and individual subprocesses throughout the cluster. Cluster events support both broadcast and separate delivery capabilities. 
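The two delivery modes can be sketched in Python as follows. This is only an in-process illustration: the ClusterEventBus class, the process ids, and the event names are hypothetical, and a real cluster would deliver events over the network.

```python
class ClusterEventBus:
    # Hypothetical event dispatcher; real delivery would go over the network.

    def __init__(self):
        self.handlers = {}  # maps a process id to its event handler

    def register(self, process_id, handler):
        self.handlers[process_id] = handler

    def send(self, process_id, event):
        # Separate delivery: only the addressed process handles the event.
        self.handlers[process_id](event)

    def broadcast(self, event):
        # Broadcast: every registered process handles the event.
        for handler in list(self.handlers.values()):
            handler(event)


received = {'server-1': [], 'interpreter-1': []}
bus = ClusterEventBus()
for pid in received:
    bus.register(pid, received[pid].append)

bus.send('interpreter-1', 'SHUTDOWN')
bus.broadcast('NOTE_UPDATED')
print(received)
```

Only the interpreter sees the shutdown event, while both processes see the broadcast event. 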
"},{"title":"Independence​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#independence","content":"We implement Submarine's clustering capabilities through the RAFT algorithm library, without relying on any external services (e.g. Zookeeper, Etcd, etc.) "},{"title":"Disadvantages​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#disadvantages","content":"Because the RAFT algorithm requires more than half of the servers available to ensure the normality of the RAFT algorithm, if we need to turn on the clustering capabilities of Submarine (Workbench) Server, when more than half of the servers are unavailable, some programs may appear abnormal. Of course, we also detected this in the system, downgrading the system or refusing to provide service status. "},{"title":"System design​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#system-design","content":""},{"title":"Universal design​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#universal-design","content":"Modular design, Submarine (Workbench) Server exists in the Submarine system, these two services need to provide clustering capabilities, so we abstract the cluster function into a separate module for development so that Submarine (Workbench) Server can reuse the cluster function module. 
"},{"title":"ClusterConfigure​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#clusterconfigure","content":"Add a submarine.server.addr and workbench.server.addr configuration items in submarine-site.xml, submarine.server.addr=ip1, ip2, ip3, through the IP list, the RAFT algorithm module in the server process can Cluster with other server processes. "},{"title":"ClusterServer​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#clusterserver","content":"The ClusterServer module encapsulates the RAFT algorithm module, which can create a service cluster and read and write metadata based on the two configuration items submarine.server.addr or workbench.server.addr. The cluster management service runs in each submarine server; The cluster management service establishes a cluster by using the atomix RaftServer class of the Raft algorithm library, maintains the ClusterStateMachine, and manages the service state metadata of each submarine server through the PutCommand, GetQuery, and DeleteCommand operation commands. "},{"title":"ClusterClient​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#clusterclient","content":"The ClusterClient module encapsulates the RAFT algorithm client module, which can communicate with the cluster according to the two configuration items submarine.server.addr or workbench.server.addr, read and write metadata, and write the IP and port information of the client process. Into the cluster's metadata center. 
The cluster management client runs in each Submarine Server and Submarine Interpreter process. The cluster management client manages the Submarine Server and Submarine Interpreter process state (metadata information) in the ClusterStateMachine by using the atomix RaftClient class of the Raft library to connect to the atomix RaftServer. When the Submarine Server and Submarine Interpreter processes are started, they are added to the ClusterStateMachine, and they are removed from the ClusterStateMachine when the Submarine Server and Submarine Interpreter processes are closed. "},{"title":"ClusterMetadata​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#clustermetadata","content":"Metadata is stored as KV (key-value) pairs. ServerMeta: key='host:port', value={SERVER_HOST=..., SERVER_PORT=..., ...} Name\tDescriptionSUBMARINE_SERVER_HOST\tSubmarine server IP SUBMARINE_SERVER_PORT\tSubmarine server port WORKBENCH_SERVER_HOST\tSubmarine workbench server IP WORKBENCH_SERVER_PORT\tSubmarine workbench server port InterpreterMeta: key=InterpreterGroupId, value={INTP_TSERVER_HOST=..., ...} Name\tDescriptionINTP_TSERVER_HOST\tSubmarine Interpreter Thrift IP INTP_TSERVER_PORT\tSubmarine Interpreter Thrift port INTP_START_TIME\tSubmarine Interpreter start time HEARTBEAT\tSubmarine Interpreter heartbeat time "},{"title":"Network fault tolerance​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#network-fault-tolerance","content":"In a distributed environment, there may be network anomalies, network delays, or service exceptions. After submitting metadata to the cluster, the client checks whether the submission succeeded. If the submission fails, the metadata is saved in a local message queue. 
A separate commit thread then retries the submission. "},{"title":"Cluster monitoring​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#cluster-monitoring","content":"The cluster needs to monitor whether the Submarine Server and Submarine Interpreter processes are working properly. The Submarine Server and Submarine Interpreter processes periodically send heartbeats to update their own timestamps in the cluster metadata. The Submarine Server with the Leader identity periodically checks these timestamps and clears services and processes that have timed out. The cluster monitoring module runs in each Submarine Server and Submarine Interpreter process and periodically sends heartbeat data of the service or process to the cluster. When the cluster monitoring module runs in Submarine Server, it sends the heartbeat to the cluster's ClusterStateMachine. If the cluster does not receive heartbeat information for a long time, the service or process is considered abnormal and unavailable. Resource usage statistics strategy: to smooth out momentary peaks and troughs in server load, cluster monitoring reports the average resource usage over the most recent period, so that server resources are used as reasonably and effectively as possible. When the cluster monitoring module runs in the Submarine Server, it checks the heartbeat data of each Submarine Server and Submarine Interpreter process. If a heartbeat times out, it considers the service or process unavailable and removes it from the cluster. 
"},{"title":"Atomix Raft algorithm library​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#atomix-raft-algorithm-library","content":"In order to reduce the deployment complexity of distributed mode, submarine server does not use Zookeeper to build a distributed cluster. Multiple submarine server groups are built into distributed clusters by using the Raft algorithm in submarine server. The Raft algorithm is involved by atomix lib of atomix that has passed Jepsen consistency verification. "},{"title":"Synchronize workbench notes​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#synchronize-workbench-notes","content":"In cluster mode, the user creates, modifies, and deletes the note on any of the servers. All need to be notified to all the servers in the cluster to synchronize the update of Notebook. Failure to do so will result in the user not being able to continue while switching to another server. "},{"title":"Listen for note update events​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#listen-for-note-update-events","content":"Listen for the NEW_NOTE, DEL_NOTE, REMOVE_NOTE_TO_TRASH ... event of the notebook in the NotebookServer#onMessage() function. "},{"title":"Broadcast note update event​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/0.6.0/designDocs/wip-designs/submarine-clusterServer#broadcast-note-update-event","content":"The note is refreshed by notifying the event to all Submarine servers in the cluster via messaging Service. 
"},{"title":"Submarine Launcher","type":0,"sectionRef":"#","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#introduction","content":"Submarine is built and run in Cloud Native, taking advantage of the cloud computing model. To give full play to the advantages of cloud computing. These applications are characterized by rapid and frequent build, release, and deployment. Combined with the features of cloud computing, they are decoupled from the underlying hardware and operating system, and can easily meet the requirements of scalability, availability, and portability. And provide better economy. In the enterprise data center, submarine can support k8s/yarn/docker three resource scheduling systems; in the public cloud environment, submarine can support these cloud services in GCE/AWS/Azure; "},{"title":"Requirement​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#requirement","content":""},{"title":"Cloud-Native Service​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#cloud-native-service","content":"The submarine server is a long-running services in the daemon mode. The submarine server is mainly used by algorithm engineers to provide online front-end functions such as algorithm development, algorithm debugging, data processing, and workflow scheduling. And submarine server also mainly used for back-end functions such as scheduling and execution of jobs, tracking of job status, and so on. Through the ability of rolling upgrades, we can better provide system stability. For example, we can upgrade or restart the workbench server without affecting the normal operation of submitted jobs. You can also make full use of system resources. 
For example, when the number of current developers or job tasks increases, the number of submarine server instances can be adjusted dynamically. In addition, submarine will provide each user with a completely independent workspace container. This workspace container comes with the development tools and library files commonly used by algorithm engineers, including their operating environment, already deployed. Algorithm engineers can work in our prepared workspaces without any extra work. Each user's workspace can also be run through a cloud service. "},{"title":"Service discovery​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#service-discovery","content":"With the cluster function of submarine, each service only needs to run in a container, and it will automatically register itself in the submarine cluster center. Submarine cluster management will automatically maintain the relationships between services, and between services and users. "},{"title":"Design​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#design","content":" "},{"title":"Launcher​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#launcher","content":"The submarine launcher module defines the complete interface. By using this interface, you can run the submarine server and workspace in k8s / yarn / docker / AWS / GCE / Azure. "},{"title":"Launcher On Docker​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#launcher-on-docker","content":"In order to allow some small and medium-sized users without k8s/yarn to use submarine, we support running the submarine system in docker mode. Users only need to provide several servers with a docker runtime environment. 
The submarine system can automatically group these servers into a cluster, manage all the hardware resources of the cluster, and run the service or workspace containers in this cluster through scheduling algorithms. "},{"title":"Launcher On Kubernetes​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#launcher-on-kubernetes","content":"submarine operator "},{"title":"Launcher On Yarn​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#launcher-on-yarn","content":"[TODO] "},{"title":"Launcher On AWS​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#launcher-on-aws","content":"[TODO] "},{"title":"Launcher On GCP​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#launcher-on-gcp","content":"[TODO] "},{"title":"Launcher On Azure​","type":1,"pageTitle":"Submarine Launcher","url":"docs/0.6.0/designDocs/wip-designs/submarine-launcher#launcher-on-azure","content":"[TODO] "},{"title":"How to Build Submarine","type":0,"sectionRef":"#","url":"docs/0.6.0/devDocs/BuildFromCode","content":"","keywords":""},{"title":"Prerequisites​","type":1,"pageTitle":"How to Build Submarine","url":"docs/0.6.0/devDocs/BuildFromCode#prerequisites","content":"JDK 1.8, Maven 3.3 or later (&lt; 3.8.1), Docker "},{"title":"Quick Start​","type":1,"pageTitle":"How to Build Submarine","url":"docs/0.6.0/devDocs/BuildFromCode#quick-start","content":""},{"title":"Build Your Custom Submarine Docker Images​","type":1,"pageTitle":"How to Build Submarine","url":"docs/0.6.0/devDocs/BuildFromCode#build-your-custom-submarine-docker-images","content":"Submarine provides default Docker images in the release artifacts, but sometimes you may want to modify them. You can rebuild the Docker images after you make changes. 
Note that the images built above must be accessible from k8s. Usually this means renaming them and pushing them to a suitable Docker registry. mvn clean package -DskipTests  Build submarine server image: ./dev-support/docker-images/submarine/build.sh  Build submarine database image: ./dev-support/docker-images/database/build.sh  "},{"title":"Building source code / binary distribution​","type":1,"pageTitle":"How to Build Submarine","url":"docs/0.6.0/devDocs/BuildFromCode#building-source-code--binary-distribution","content":"Checking releases for licenses mvn clean org.apache.rat:apache-rat-plugin:check  Create binary distribution with default hadoop version mvn clean package -DskipTests  Create binary distribution with hadoop-2.9.x version mvn clean package -DskipTests -Phadoop-2.9  Create binary distribution with hadoop-2.10.x version mvn clean package -DskipTests -Phadoop-2.10  Create binary distribution with hadoop-3.1.x version mvn clean package -DskipTests -Phadoop-3.1  Create binary distribution with hadoop-3.2.x version mvn clean package -DskipTests -Phadoop-3.2  Create source code distribution mvn clean package -DskipTests -Psrc  "},{"title":"Building source code / binary distribution with Maven Wrapper​","type":1,"pageTitle":"How to Build Submarine","url":"docs/0.6.0/devDocs/BuildFromCode#building-source-code--binary-distribution-with-maven-wrapper","content":"Maven Wrapper (Optional): Maven Wrapper can help you avoid dependency problems related to the Maven version. # Setup Maven Wrapper (Maven 3.6.1) mvn -N io.takari:maven:0.7.7:wrapper -Dmaven=3.6.1 # Check Maven Wrapper ./mvnw -version # Replace 'mvn' with 'mvnw'. Example: ./mvnw clean package -DskipTests  "},{"title":"Project Architecture","type":0,"sectionRef":"#","url":"docs/0.6.0/devDocs/","content":"","keywords":""},{"title":"1. 
Introduction​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#1-introduction","content":"This document describes the structure of each module of the Submarine project and how each module is developed and tested. "},{"title":"2. Submarine Project Structure​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#2-submarine-project-structure","content":""},{"title":"2.1. submarine-client​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#21-submarine-client","content":"Provide the CLI interface for Submarine users. (Currently only supports the YARN service.) "},{"title":"2.2. submarine-cloud-v2​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#22-submarine-cloud-v2","content":"The operator for Submarine application. For details, please see the README on GitHub. "},{"title":"2.3. submarine-commons​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#23-submarine-commons","content":"Define utility functions used in multiple packages, mainly related to Hadoop. "},{"title":"2.4. submarine-dist​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#24-submarine-dist","content":"Store the pre-release files. "},{"title":"2.5. submarine-sdk​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#25-submarine-sdk","content":"Provide the Python SDK for Submarine users. "},{"title":"2.6. submarine-security​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#26-submarine-security","content":"Provide authorization for Apache Spark to talk to Ranger Admin. "},{"title":"2.7. submarine-server​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#27-submarine-server","content":"Include the core server, RESTful API, and k8s/YARN submitter. "},{"title":"2.8. 
submarine-test​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#28-submarine-test","content":"Provide end-to-end and k8s tests for Submarine. "},{"title":"2.9. submarine-workbench​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#29-submarine-workbench","content":"workbench-server: a Jetty-based web server. It provides a RESTful interface and a WebSocket interface. The RESTful interface gives workbench-web management capabilities for database entities such as project, department, user, and role. workbench-web: a web front-end service based on the Angular.js framework. With workbench-web, users can manage Submarine projects, departments, users, and roles through a browser. You can also use the notebook to develop machine learning algorithms, release models, and handle other lifecycle management tasks. "},{"title":"2.10 dev-support​","type":1,"pageTitle":"Project Architecture","url":"docs/0.6.0/devDocs/#210-dev-support","content":"mini-submarine: by using the Docker image provided by Submarine, you can experience all the functions of Submarine in a single Docker environment. mini-submarine also provides developers with a development and testing environment, avoiding the hassle of installing and deploying the runtime environment. submarine-installer: the Submarine runtime environment installation tool for yarn-3.1+. With submarine-installer, it is easy to install and deploy the system services required by yarn-3.1+, such as docker, nvidia-docker, nvidia driver, ETCD, and Calico network. 
"},{"title":"Dependencies for Submarine","type":0,"sectionRef":"#","url":"docs/0.6.0/devDocs/Dependencies","content":"","keywords":""},{"title":"Kubernetes​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/0.6.0/devDocs/Dependencies#kubernetes","content":"Kubernetes Version\tSupport?1.13.x (or earlier)\tX 1.14.x\t√ 1.15.x\t√ 1.16.x\t√ 1.17.x\tTo be verified 1.18.x\tTo be verified "},{"title":"KinD​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/0.6.0/devDocs/Dependencies#kind","content":"KinD Version\tSupport?0.5.x (or earlier)\tX 0.6.x\t√ 0.7.x\t√ 0.8.x\t√ 0.9.x\t√ 0.10.x\t√ 0.11.x\t√ "},{"title":"Java​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/0.6.0/devDocs/Dependencies#java","content":"TODO "},{"title":"Maven​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/0.6.0/devDocs/Dependencies#maven","content":"TODO "},{"title":"Docker​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/0.6.0/devDocs/Dependencies#docker","content":"TODO "},{"title":"How to Run Frontend Integration Test","type":0,"sectionRef":"#","url":"docs/0.6.0/devDocs/IntegrationTestE2E","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/0.6.0/devDocs/IntegrationTestE2E#introduction","content":"The test cases under the directory test-e2e are integration tests to ensure the correctness of the Submarine Workbench. These test cases can be run either locally or on GitHub Actions. "},{"title":"Run E2E test locally​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/0.6.0/devDocs/IntegrationTestE2E#run-e2e-test-locally","content":"Ensure you have setup the submarine locally. If not, you can refer to Submarine Local Deployment. Forward port kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80 Modify run_frontend_e2e.sh You need to modify the port and the URL in this script to where you run the workbench on. 
Example: If your Submarine workbench is running on 127.0.0.1:4200, you should change WORKBENCH_PORT to 4200. # at submarine-test/test-e2e/run_frontend_e2e.sh ... # ======= Modifiable Variables ======= # # Note: URL must start with &quot;http&quot; # (Ref: https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/WebDriver.html#get(java.lang.String)) WORKBENCH_PORT=8080 #&lt;= modify this URL=&quot;http://127.0.0.1&quot; #&lt;=modify this # ==================================== # ... Run run_frontend_e2e.sh (Run a specific test case) This script checks whether the port can be accessed and then runs the test case. # at submarine-test/test-e2e ./run_frontend_e2e.sh ${TESTCASE} # TESTCASE is the IT you want to run, ex: loginIT, experimentIT... Run all test cases The following commands compile all files and run every test whose name ends with &quot;IT&quot; in the directory. # Make sure the Submarine workbench is running on 127.0.0.1:8080 cd submarine/submarine-test/test-e2e # Method 1: mvn verify # Method 2: mvn clean install -U  "},{"title":"Run E2E test in GitHub Actions​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/0.6.0/devDocs/IntegrationTestE2E#run-e2e-test-in-github-actions","content":"Each time a commit is pushed, GitHub Actions will be triggered automatically. "},{"title":"Add a new frontend E2E test case​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/0.6.0/devDocs/IntegrationTestE2E#add-a-new-frontend-e2e-test-case","content":"WARNING You MUST read the document carefully, and understand the difference between explicit wait, implicit wait, and fluent wait. Do not mix implicit and explicit waits. Doing so can cause unpredictable wait times. We define many useful functions in AbstractSubmarineIT.java. 
"},{"title":"Development Guide","type":0,"sectionRef":"#","url":"docs/0.6.0/devDocs/Development","content":"","keywords":""},{"title":"Video​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#video","content":"From this Video, you will know how to deal with the configuration of Submarine and be able to contribute to it via Github. "},{"title":"Develop server​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#develop-server","content":""},{"title":"Prerequisites​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#prerequisites","content":"JDK 1.8Maven 3.3 or later ( &lt; 3.8.1 )Docker "},{"title":"Setting up checkstyle in IDE​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#setting-up-checkstyle-in-ide","content":"Checkstyle plugin may help to detect violations directly from the IDE. Install Checkstyle+IDEA plugin from Preference -&gt; PluginsOpen Preference -&gt; Tools -&gt; Checkstyle -&gt; Set Checkstyle version: Checkstyle version: 8.0 Add (+) a new Configuration File Description: SubmarineUse a local checkstyle ${SUBMARINE_HOME}/dev-support/maven-config/checkstyle.xml Open the Checkstyle Tool Window, select the Submarine rule and execute the check "},{"title":"Testing​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#testing","content":"Unit Test For each class, there is a corresponding testClass. For example, SubmarineServerTest is used for testing SubmarineServer. Whenever you add a funtion in classes, you must write a unit test to test it. Integration Test: IntegrationTestK8s.md "},{"title":"Build from source​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#build-from-source","content":"Before building We assume the developer use minikube as a local kubernetes cluster.Make sure you have installed the submarine helm-chart in the cluster. 
Package the Submarine server into a new jar file mvn package -DskipTests Build the new server docker image in minikube # switch to minikube docker daemon to build image directly in minikube eval $(minikube docker-env) # run docker build ./dev-support/docker-images/submarine/build.sh # exit minikube docker daemon eval $(minikube docker-env -u) Update server pod helm upgrade --set submarine.server.dev=true submarine ./helm-charts/submarine Set submarine.server.dev to true, enabling the server pod to be launched with the new docker image. "},{"title":"Develop workbench​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#develop-workbench","content":"Deploy the Submarine Follow Getting Started/Quickstart, and make sure you can connect to http://localhost:32080 in the browser. Install the dependencies cd submarine-workbench/workbench-web npm install Run the workbench based on proxy server npm run start The request sent to http://localhost:4200 will be redirected to http://localhost:32080.Open http://localhost:4200 in browser to see the real-time change of workbench. Frontend E2E test: IntegrationTestE2E.md "},{"title":"Develop database​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#develop-database","content":"Build the docker image # switch to minikube docker daemon to build image directly in minikube eval $(minikube docker-env) # run docker build ./dev-support/docker-images/database/build.sh # exit minikube docker daemon eval $(minikube docker-env -u) Deploy new pods in the cluster helm upgrade --set submarine.database.dev=true submarine ./helm-charts/submarine Develop operator​ Before building We assume the developer use minikube as a local kubernetes cluster.Make sure you have NOT installed the submarine helm-chart in the cluster. 
Start the minikube cluster minikube start --vm-driver=docker --kubernetes-version v1.15.11 Install the dependencies cd submarine-cloud-v2/ cp -r ../helm-charts/submarine/charts ./helm-charts/submarine-operator/ go mod vendor helm install --set dev=true submarine-operator ./helm-charts/submarine-operator/ Run the operator out-of-cluster make ./submarine-operator Deploy a Submarine kubectl apply -f artifacts/examples/crd.yaml kubectl create ns submarine-user-test kubectl apply -n submarine-user-test -f artifacts/examples/example-submarine.yaml Exposing service # Method1 -- use minikube ip minikube ip # you'll get the IP address of minikube, ex: 192.168.49.2 # Method2 -- use port-forwarding kubectl port-forward --address 0.0.0.0 -n submarine-user-test service/traefik 32080:80 View workbench If you use method 1 in step 5, please go to http://{minikube ip}:32080, ex: http://192.168.49.2:32080 If you use method 2 in step 5, please go to http://127.0.0.1:32080 Delete submarine kubectl delete submarine example-submarine -n submarine-user-test Stop the operator Press ctrl+c to stop the operator. Uninstall helm chart dependencies helm delete submarine-operator  For other details, please check out the README and Developer Guide on GitHub. "},{"title":"Develop Submarine Website​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#develop-submarine-website","content":"Submarine website is built using Docusaurus 2, a modern static website generator. We store all the website content in markdown format in the submarine/website/docs. When committing a new patch to the submarine repo, Docusaurus will help us generate the html and javascript files and push them to https://github.com/apache/submarine-site/tree/asf-site. To update the website, click “Edit this page” on the website.  
"},{"title":"Add a new page​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#add-a-new-page","content":"If you want to add a new page to the website, make sure to add the file path to sidebars.js. "},{"title":"Installation​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#installation","content":"We use the yarn package manager to install all dependencies for the website yarn install  "},{"title":"Build​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#build","content":"Make sure you can successfully build the website before creating a pull request. yarn build  "},{"title":"Local Development​","type":1,"pageTitle":"Development Guide","url":"docs/0.6.0/devDocs/Development#local-development","content":"This command starts a local development server and open up a browser window. Most changes are reflected live without having to restart the server. yarn start  "},{"title":"Download Apache Submarine","type":0,"sectionRef":"#","url":"docs/0.6.0/download","content":"","keywords":""},{"title":"Verify the integrity of the files​","type":1,"pageTitle":"Download Apache Submarine","url":"docs/0.6.0/download#verify-the-integrity-of-the-files","content":"It is essential that you verify the integrity of the downloaded files using the PGP or MD5 signatures. This signature should be matched against the KEYS file. 
gpg --import KEYS gpg --verify submarine-dist-X.Y.Z-src.tar.gz.asc  "},{"title":"Old releases​","type":1,"pageTitle":"Download Apache Submarine","url":"docs/0.6.0/download#old-releases","content":"Apache Submarine 0.4.0 released on Jul 05, 2020 (release notes) (git tag) Binary package with submarine:submarine-dist-0.4.0-hadoop-2.9.tar.gz (550 MB,checksum,signature)Source:submarine-dist-0.4.0-src.tar.gz (6 MB,checksum,signature)Docker images:mini-submarine (guide) Apache Submarine 0.3.0 released on Feb 01, 2020 (release notes) (git tag) Binary package with submarine:submarine-dist-0.3.0-hadoop-2.9.tar.gz (550 MB,checksum,signature)Source:submarine-dist-0.3.0-src.tar.gz (6 MB,checksum,signature)Docker images:mini-submarine (guide) Apache Submarine 0.2.0 released on Jul 2, 2019 Binary package with submarine:hadoop-submarine-0.2.0.tar.gz (111 MB,checksum,signature,Announcement) Source:hadoop-submarine-0.2.0-src.tar.gz (1.4 MB,checksum,signature) Apache Submarine 0.1.0 released on Jan 16, 2019 Binary package with submarine:submarine-0.2.0-bin-all.tgz (97 MB,checksum,signature,Announcement) Source:submarine-hadoop-3.2.0-src.tar.gz (1.1 MB,checksum,signature) "},{"title":"How to Run Integration K8s Test","type":0,"sectionRef":"#","url":"docs/0.6.0/devDocs/IntegrationTestK8s","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/0.6.0/devDocs/IntegrationTestK8s#introduction","content":"The test cases under the directory test-k8s are integration tests to ensure the correctness of the Submarine RESTful API. You can run these tests either locally or on GitHub Actions. Before running the tests, the minikube (KinD) cluster must be created. Then, compile and package the submarine project in submarine-dist directory for building a docker image. In addition, the 8080 port in submarine-traefik should be forwarded. 
"},{"title":"Run k8s test locally​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/0.6.0/devDocs/IntegrationTestK8s#run-k8s-test-locally","content":"Ensure you have setup the KinD cluster or minikube cluster. If you haven't, follow this minikube tutorial Build the submarine from source and upgrade the server pod through this guide Forward port kubectl port-forward --address 0.0.0.0 service/submarine-traefik 8080:80 Install the latest package &quot;submarine-server-core&quot; into the local repository, for use as a dependency in the module test-k8s mvn install -DskipTests Execute the test command mvn verify -DskipRat -pl :submarine-test-k8s -Phadoop-2.9 -B   "},{"title":"Run k8s test in GitHub Actions​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/0.6.0/devDocs/IntegrationTestK8s#run-k8s-test-in-github-actions","content":"Each time a code is submitted, GitHub Actions is triggered automatically. "},{"title":"WriteDockerfileKaldi","type":0,"sectionRef":"#","url":"docs/0.6.0/ecosystem/kaldi/WriteDockerfileKaldi","content":"","keywords":""},{"title":"Creating Docker Images for Running Kaldi on YARN​","type":1,"pageTitle":"WriteDockerfileKaldi","url":"docs/0.6.0/ecosystem/kaldi/WriteDockerfileKaldi#creating-docker-images-for-running-kaldi-on-yarn","content":""},{"title":"How to create docker images to run Kaldi on YARN​","type":1,"pageTitle":"WriteDockerfileKaldi","url":"docs/0.6.0/ecosystem/kaldi/WriteDockerfileKaldi#how-to-create-docker-images-to-run-kaldi-on-yarn","content":"Dockerfile to run Kaldi on YARN need two part: Base libraries which Kaldi depends on 1) OS base image, for example nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 2) Kaldi depended libraries and packages. For example python, g++, make. For GPU support, need cuda, cudnn, etc. 3) Kaldi compile. 
Libraries to access HDFS 1) JDK 2) Hadoop Here's an example of a base image (with GPU support) to install Kaldi: FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 RUN apt-get clean &amp;&amp; \\ apt-get update &amp;&amp; \\ apt-get install -y --no-install-recommends \\ sudo \\ openjdk-8-jdk \\ iputils-ping \\ g++ \\ make \\ automake \\ autoconf \\ bzip2 \\ unzip \\ wget \\ sox \\ libtool \\ git \\ subversion \\ python2.7 \\ python3 \\ zlib1g-dev \\ ca-certificates \\ patch \\ ffmpeg \\ vim &amp;&amp; \\ rm -rf /var/lib/apt/lists/* &amp;&amp; \\ ln -s /usr/bin/python2.7 /usr/bin/python RUN git clone --depth 1 https://github.com/kaldi-asr/kaldi.git /opt/kaldi &amp;&amp; \\ cd /opt/kaldi &amp;&amp; \\ cd /opt/kaldi/tools &amp;&amp; \\ ./extras/install_mkl.sh &amp;&amp; \\ make -j $(nproc) &amp;&amp; \\ cd /opt/kaldi/src &amp;&amp; \\ ./configure --shared --use-cuda &amp;&amp; \\ make depend -j $(nproc) &amp;&amp; \\ make -j $(nproc)  On top of the above image, add files and install packages to access HDFS: RUN apt-get update &amp;&amp; apt-get install -y openjdk-8-jdk wget # Install hadoop ENV HADOOP_VERSION=&quot;3.2.1&quot; ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 RUN wget https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz &amp;&amp; \\ tar zxf hadoop-${HADOOP_VERSION}.tar.gz &amp;&amp; \\ ln -s hadoop-${HADOOP_VERSION} hadoop-current &amp;&amp; \\ rm hadoop-${HADOOP_VERSION}.tar.gz  Build and push to your own docker registry: Use docker build ... and docker push ... to finish this step. "},{"title":"Use examples to build your own Kaldi docker images​","type":1,"pageTitle":"WriteDockerfileKaldi","url":"docs/0.6.0/ecosystem/kaldi/WriteDockerfileKaldi#use-examples-to-build-your-own-kaldi-docker-images","content":"We provide the following examples for you to build Kaldi docker images. For latest Kaldi *base/ubuntu-18.04/Dockerfile.gpu.kaldi_latest: Latest Kaldi that supports GPU, which is prebuilt to CUDA10, with models. 
"},{"title":"Build Docker images​","type":1,"pageTitle":"WriteDockerfileKaldi","url":"docs/0.6.0/ecosystem/kaldi/WriteDockerfileKaldi#build-docker-images","content":"Manually build Docker image:​ Under docker/ directory,The CLUSTER_NAME can be modified in build-all.sh to have installation permissions, run build-all.sh to build Docker images. It will build following images: kaldi-latest-gpu-base:0.0.1 for base Docker image which includes Hadoop, Kaldi, GPU base libraries, which includes thchs30 model. Use prebuilt images​ (No liability) You can also use prebuilt images for convenience in the docker hub: hadoopsubmarine/kaldi-latest-gpu-base:0.0.1 "},{"title":"RunningDistributedThchs30KaldiJobs","type":0,"sectionRef":"#","url":"docs/0.6.0/ecosystem/kaldi/RunningDistributedThchs30KaldiJobs","content":"","keywords":""},{"title":"Prepare data for training​","type":1,"pageTitle":"RunningDistributedThchs30KaldiJobs","url":"docs/0.6.0/ecosystem/kaldi/RunningDistributedThchs30KaldiJobs#prepare-data-for-training","content":"Thchs30 is a common benchmark in machine learning for speech data and transcripts. Below example is based on Thchs30 dataset. 
1) Download the gz files: THCHS30_PATH=/data/hdfs1/nfs/aisearch/kaldi/thchs30 mkdir -p $THCHS30_PATH/data &amp;&amp; cd $THCHS30_PATH/data wget http://www.openslr.org/resources/18/data_thchs30.tgz wget http://www.openslr.org/resources/18/test-noise.tgz wget http://www.openslr.org/resources/18/resource.tgz  2) Check out https://github.com/apache/submarine.git: git clone https://github.com/apache/submarine.git  3) Go to submarine/docker/ecosystem/: cp -r ./kaldi/sge $THCHS30_PATH/sge  4) Optional: modify /opt/kaldi/egs/thchs30/s5/cmd.sh in the container. This queue is used by default: export train_cmd=&quot;queue.pl -q all.q&quot;  Warning: YARN service doesn't allow multiple services with the same name, so run the following command yarn application -destroy &lt;service-name&gt;  to delete a service if you want to reuse its name. "},{"title":"Prepare Docker images​","type":1,"pageTitle":"RunningDistributedThchs30KaldiJobs","url":"docs/0.6.0/ecosystem/kaldi/RunningDistributedThchs30KaldiJobs#prepare-docker-images","content":"Refer to Write Dockerfile to build a Docker image, or use the prebuilt one: hadoopsubmarine/kaldi-latest-gpu-base:0.0.1 "},{"title":"Run Kaldi jobs​","type":1,"pageTitle":"RunningDistributedThchs30KaldiJobs","url":"docs/0.6.0/ecosystem/kaldi/RunningDistributedThchs30KaldiJobs#run-kaldi-jobs","content":""},{"title":"Run distributed training​","type":1,"pageTitle":"RunningDistributedThchs30KaldiJobs","url":"docs/0.6.0/ecosystem/kaldi/RunningDistributedThchs30KaldiJobs#run-distributed-training","content":"# Change the variables according to your needs SUBMARINE_VERSION=3.3.0-SNAPSHOT WORKER_NUM=2 SGE_CFG_PATH=/cfg THCHS30_PATH=/data/hdfs1/nfs/aisearch/kaldi/thchs30 DOCKER_HADOOP_HDFS_HOME=/app/${SUBMARINE_VERSION} # Depends on RegistryDNS; you must fill in &lt;your RegistryDNSIP&gt; in resolv.conf yarn jar /usr/local/matrix/share/hadoop/yarn/${SUBMARINE_VERSION}.jar \\ job run --name kaldi-thchs30-distributed \\ --env 
DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \\ --env DOCKER_HADOOP_HDFS_HOME=$DOCKER_HADOOP_HDFS_HOME \\ --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \\ --env PYTHONUNBUFFERED=&quot;0&quot; \\ --env TZ=&quot;Asia/Shanghai&quot; \\ --env YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=${THCHS30_PATH}/sge/resolv.conf:/etc/resolv.conf,\\ ${THCHS30_PATH}/sge/passwd:/etc/passwd:rw,\\ ${THCHS30_PATH}/sge/group:/etc/group:rw,\\ ${THCHS30_PATH}/sge:$SGE_CFG_PATH,\\ ${THCHS30_PATH}/data:/opt/kaldi/egs/thchs30,\\ ${THCHS30_PATH}/mul/s5:/opt/kaldi/egs/mul-thchs30/s5 \\ --input_path /opt/kaldi/egs/thchs30/data \\ --docker_image hadoopsubmarine/kaldi-latest-gpu-base:0.0.1 \\ --num_workers $WORKER_NUM \\ --worker_resources memory=64G,vcores=32,gpu=1 \\ --worker_launch_cmd &quot;sudo mkdir -p /opt/kaldi/egs/mul-thchs30/s5 &amp;&amp; \\ sudo cp /opt/kaldi/egs/thchs30/s5/* /opt/kaldi/egs/mul-thchs30/s5 -r &amp;&amp; \\ cluster_user=`whoami` domain_suffix=&quot;ml.com&quot; &amp;&amp; \\ cd /cfg &amp;&amp; bash sge_run.sh $WORKER_NUM $SGE_CFG_PATH &amp;&amp; \\ if [ $(echo $HOST_NAME |grep &quot;^master-&quot;) ]; then sleep 2m &amp;&amp; cd /opt/kaldi/egs/mul-thchs30/s5 &amp;&amp; ./run.sh; fi&quot; \\ --verbose  Explanations: num_workers &gt; 1 indicates distributed training. Parameters, resources, and the Docker image of the parameter server can be specified separately; in many cases, the parameter server doesn't require a GPU. We don't need a parameter server here. For the meaning of the individual parameters, see the QuickStart page. Outputs of distributed training Sample output of master: ... Reading package lists... Building dependency tree... Reading state information... 
The following additional packages will be installed: bsd-mailx cpio gridengine-common ifupdown iproute2 isc-dhcp-client isc-dhcp-common libatm1 libdns-export162 libisc-export160 liblockfile-bin liblockfile1 libmnl0 libxmuu1 libxtables11 ncurses-term netbase openssh-client openssh-server openssh-sftp-server postfix python3-chardet python3-pkg-resources python3-requests python3-six python3-urllib3 ssh-import-id ssl-cert tcsh xauth Suggested packages: libarchive1 gridengine-qmon ppp rdnssd iproute2-doc resolvconf avahi-autoipd isc-dhcp-client-ddns apparmor ssh-askpass libpam-ssh keychain monkeysphere rssh molly-guard ufw procmail postfix-mysql postfix-pgsql postfix-ldap postfix-pcre sasl2-bin libsasl2-modules dovecot-common postfix-cdb postfix-doc python3-setuptools python3-ndg-httpsclient python3-openssl python3-pyasn1 openssl-blacklist The following NEW packages will be installed: bsd-mailx cpio gridengine-client gridengine-common gridengine-exec gridengine-master ifupdown iproute2 isc-dhcp-client isc-dhcp-common libatm1 libdns-export162 libisc-export160 liblockfile-bin liblockfile1 libmnl0 libxmuu1 libxtables11 ncurses-term netbase openssh-client openssh-server openssh-sftp-server postfix python3-chardet python3-pkg-resources python3-requests python3-six python3-urllib3 ssh-import-id ssl-cert tcsh xauth 0 upgraded, 33 newly installed, 0 to remove and 30 not upgraded. Need to get 12.1 MB of archives. After this operation, 65.8 MB of additional disk space will be used. 
Get:1 http://archive.ubuntu.com/ubuntu xenial/main amd64 libatm1 amd64 1:2.5.1-1.5 [24.2 kB] Get:2 http://archive.ubuntu.com/ubuntu xenial/main amd64 libmnl0 amd64 1.0.3-5 [12.0 kB] Get:3 http://archive.ubuntu.com/ubuntu xenial/main amd64 liblockfile-bin amd64 1.09-6ubuntu1 [10.8 kB] Get:4 http://archive.ubuntu.com/ubuntu xenial/main amd64 liblockfile1 amd64 1.09-6ubuntu1 [8056 B] Get:5 http://archive.ubuntu.com/ubuntu xenial/main amd64 cpio amd64 2.11+dfsg-5ubuntu1 [74.8 kB] Get:6 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 iproute2 amd64 4.3.0-1ubuntu3.16.04.5 [523 kB] Get:7 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 ifupdown amd64 0.8.10ubuntu1.4 [54.9 kB] Get:8 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libisc-export160 amd64 1:9.10.3.dfsg.P4-8ubuntu1.15 [153 kB] Get:9 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libdns-export162 amd64 1:9.10.3.dfsg.P4-8ubuntu1.15 [665 kB] Get:10 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 isc-dhcp-client amd64 4.3.3-5ubuntu12.10 [224 kB] Get:11 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 isc-dhcp-common amd64 4.3.3-5ubuntu12.10 [105 kB] Get:12 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxtables11 amd64 1.6.0-2ubuntu3 [27.2 kB] Get:13 http://archive.ubuntu.com/ubuntu xenial/main amd64 netbase all 5.3 [12.9 kB] Get:14 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxmuu1 amd64 2:1.1.2-2 [9674 B] Get:15 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-client amd64 1:7.2p2-4ubuntu2.8 [590 kB] Get:16 http://archive.ubuntu.com/ubuntu xenial/main amd64 xauth amd64 1:1.0.9-1ubuntu2 [22.7 kB] Get:17 http://archive.ubuntu.com/ubuntu xenial/main amd64 ssl-cert all 1.0.37 [16.9 kB] Get:18 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 postfix amd64 3.1.0-3ubuntu0.3 [1152 kB] Get:19 http://archive.ubuntu.com/ubuntu xenial/main amd64 bsd-mailx amd64 8.1.2-0.20160123cvs-2 [63.7 kB] Get:20 
http://archive.ubuntu.com/ubuntu xenial/universe amd64 gridengine-common all 6.2u5-7.4 [156 kB] Get:21 http://archive.ubuntu.com/ubuntu xenial/universe amd64 gridengine-client amd64 6.2u5-7.4 [3394 kB] Get:22 http://archive.ubuntu.com/ubuntu xenial/universe amd64 tcsh amd64 6.18.01-5 [410 kB] Get:23 http://archive.ubuntu.com/ubuntu xenial/universe amd64 gridengine-exec amd64 6.2u5-7.4 [990 kB] Get:24 http://archive.ubuntu.com/ubuntu xenial/universe amd64 gridengine-master amd64 6.2u5-7.4 [2429 kB] Get:25 http://archive.ubuntu.com/ubuntu xenial/main amd64 ncurses-term all 6.0+20160213-1ubuntu1 [249 kB] Get:26 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-sftp-server amd64 1:7.2p2-4ubuntu2.8 [38.9 kB] Get:27 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-server amd64 1:7.2p2-4ubuntu2.8 [335 kB] Get:28 http://archive.ubuntu.com/ubuntu xenial/main amd64 python3-pkg-resources all 20.7.0-1 [79.0 kB] Get:29 http://archive.ubuntu.com/ubuntu xenial/main amd64 python3-chardet all 2.3.0-2 [96.2 kB] Get:30 http://archive.ubuntu.com/ubuntu xenial/main amd64 python3-six all 1.10.0-3 [11.0 kB] Get:31 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 python3-urllib3 all 1.13.1-2ubuntu0.16.04.3 [58.5 kB] Get:32 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 python3-requests all 2.9.1-3ubuntu0.1 [55.8 kB] Get:33 http://archive.ubuntu.com/ubuntu xenial/main amd64 ssh-import-id all 5.5-0ubuntu1 [10.2 kB] Fetched 12.1 MB in 0s (15.0 MB/s) Selecting previously unselected package libatm1:amd64. (Reading database ... (Reading database ... 5% (Reading database ... 10% (Reading database ... 15% (Reading database ... 20% (Reading database ... 25% (Reading database ... 30% (Reading database ... 35% (Reading database ... 40% (Reading database ... 45% (Reading database ... 50% (Reading database ... 55% (Reading database ... 60% (Reading database ... 65% (Reading database ... 70% (Reading database ... 75% (Reading database ... 
80% (Reading database ... 85% (Reading database ... 90% (Reading database ... 95% (Reading database ... 100% (Reading database ... 21398 files and directories currently installed.) Preparing to unpack .../libatm1_1%3a2.5.1-1.5_amd64.deb ... Unpacking libatm1:amd64 (1:2.5.1-1.5) ... Selecting previously unselected package libmnl0:amd64. Preparing to unpack .../libmnl0_1.0.3-5_amd64.deb ... Unpacking libmnl0:amd64 (1.0.3-5) ... Selecting previously unselected package liblockfile-bin. Preparing to unpack .../liblockfile-bin_1.09-6ubuntu1_amd64.deb ... Unpacking liblockfile-bin (1.09-6ubuntu1) ... Selecting previously unselected package liblockfile1:amd64. Preparing to unpack .../liblockfile1_1.09-6ubuntu1_amd64.deb ... Unpacking liblockfile1:amd64 (1.09-6ubuntu1) ... Selecting previously unselected package cpio. Preparing to unpack .../cpio_2.11+dfsg-5ubuntu1_amd64.deb ... Unpacking cpio (2.11+dfsg-5ubuntu1) ... Selecting previously unselected package iproute2. Preparing to unpack .../iproute2_4.3.0-1ubuntu3.16.04.5_amd64.deb ... Unpacking iproute2 (4.3.0-1ubuntu3.16.04.5) ... Selecting previously unselected package ifupdown. Preparing to unpack .../ifupdown_0.8.10ubuntu1.4_amd64.deb ... Unpacking ifupdown (0.8.10ubuntu1.4) ... Selecting previously unselected package libisc-export160. Preparing to unpack .../libisc-export160_1%3a9.10.3.dfsg.P4-8ubuntu1.15_amd64.deb ... Unpacking libisc-export160 (1:9.10.3.dfsg.P4-8ubuntu1.15) ... Selecting previously unselected package libdns-export162. Preparing to unpack .../libdns-export162_1%3a9.10.3.dfsg.P4-8ubuntu1.15_amd64.deb ... Unpacking libdns-export162 (1:9.10.3.dfsg.P4-8ubuntu1.15) ... Selecting previously unselected package isc-dhcp-client. Preparing to unpack .../isc-dhcp-client_4.3.3-5ubuntu12.10_amd64.deb ... Unpacking isc-dhcp-client (4.3.3-5ubuntu12.10) ... Selecting previously unselected package isc-dhcp-common. Preparing to unpack .../isc-dhcp-common_4.3.3-5ubuntu12.10_amd64.deb ... 
Unpacking isc-dhcp-common (4.3.3-5ubuntu12.10) ... Selecting previously unselected package libxtables11:amd64. Preparing to unpack .../libxtables11_1.6.0-2ubuntu3_amd64.deb ... Unpacking libxtables11:amd64 (1.6.0-2ubuntu3) ... Selecting previously unselected package netbase. Preparing to unpack .../archives/netbase_5.3_all.deb ... Unpacking netbase (5.3) ... Selecting previously unselected package libxmuu1:amd64. Preparing to unpack .../libxmuu1_2%3a1.1.2-2_amd64.deb ... Unpacking libxmuu1:amd64 (2:1.1.2-2) ... Selecting previously unselected package openssh-client. Preparing to unpack .../openssh-client_1%3a7.2p2-4ubuntu2.8_amd64.deb ... Unpacking openssh-client (1:7.2p2-4ubuntu2.8) ... Selecting previously unselected package xauth. Preparing to unpack .../xauth_1%3a1.0.9-1ubuntu2_amd64.deb ... Unpacking xauth (1:1.0.9-1ubuntu2) ... Selecting previously unselected package ssl-cert. Preparing to unpack .../ssl-cert_1.0.37_all.deb ... Unpacking ssl-cert (1.0.37) ... Selecting previously unselected package postfix. Preparing to unpack .../postfix_3.1.0-3ubuntu0.3_amd64.deb ... Unpacking postfix (3.1.0-3ubuntu0.3) ... Selecting previously unselected package bsd-mailx. Preparing to unpack .../bsd-mailx_8.1.2-0.20160123cvs-2_amd64.deb ... Unpacking bsd-mailx (8.1.2-0.20160123cvs-2) ... Selecting previously unselected package gridengine-common. Preparing to unpack .../gridengine-common_6.2u5-7.4_all.deb ... Unpacking gridengine-common (6.2u5-7.4) ... Selecting previously unselected package gridengine-client. Preparing to unpack .../gridengine-client_6.2u5-7.4_amd64.deb ... Unpacking gridengine-client (6.2u5-7.4) ... Selecting previously unselected package tcsh. Preparing to unpack .../tcsh_6.18.01-5_amd64.deb ... Unpacking tcsh (6.18.01-5) ... Selecting previously unselected package gridengine-exec. Preparing to unpack .../gridengine-exec_6.2u5-7.4_amd64.deb ... Unpacking gridengine-exec (6.2u5-7.4) ... Selecting previously unselected package gridengine-master. 
Preparing to unpack .../gridengine-master_6.2u5-7.4_amd64.deb ... Unpacking gridengine-master (6.2u5-7.4) ... Selecting previously unselected package ncurses-term. Preparing to unpack .../ncurses-term_6.0+20160213-1ubuntu1_all.deb ... Unpacking ncurses-term (6.0+20160213-1ubuntu1) ... Selecting previously unselected package openssh-sftp-server. Preparing to unpack .../openssh-sftp-server_1%3a7.2p2-4ubuntu2.8_amd64.deb ... Unpacking openssh-sftp-server (1:7.2p2-4ubuntu2.8) ... Selecting previously unselected package openssh-server. Preparing to unpack .../openssh-server_1%3a7.2p2-4ubuntu2.8_amd64.deb ... Unpacking openssh-server (1:7.2p2-4ubuntu2.8) ... Selecting previously unselected package python3-pkg-resources. Preparing to unpack .../python3-pkg-resources_20.7.0-1_all.deb ... Unpacking python3-pkg-resources (20.7.0-1) ... Selecting previously unselected package python3-chardet. Preparing to unpack .../python3-chardet_2.3.0-2_all.deb ... Unpacking python3-chardet (2.3.0-2) ... Selecting previously unselected package python3-six. Preparing to unpack .../python3-six_1.10.0-3_all.deb ... Unpacking python3-six (1.10.0-3) ... Selecting previously unselected package python3-urllib3. Preparing to unpack .../python3-urllib3_1.13.1-2ubuntu0.16.04.3_all.deb ... Unpacking python3-urllib3 (1.13.1-2ubuntu0.16.04.3) ... Selecting previously unselected package python3-requests. Preparing to unpack .../python3-requests_2.9.1-3ubuntu0.1_all.deb ... Unpacking python3-requests (2.9.1-3ubuntu0.1) ... Selecting previously unselected package ssh-import-id. Preparing to unpack .../ssh-import-id_5.5-0ubuntu1_all.deb ... Unpacking ssh-import-id (5.5-0ubuntu1) ... Processing triggers for systemd (229-4ubuntu21.22) ... Processing triggers for libc-bin (2.23-0ubuntu11) ... Setting up libatm1:amd64 (1:2.5.1-1.5) ... Setting up libmnl0:amd64 (1.0.3-5) ... Setting up liblockfile-bin (1.09-6ubuntu1) ... Setting up liblockfile1:amd64 (1.09-6ubuntu1) ... Setting up cpio (2.11+dfsg-5ubuntu1) ... 
update-alternatives: using /bin/mt-gnu to provide /bin/mt (mt) in auto mode Setting up iproute2 (4.3.0-1ubuntu3.16.04.5) ... Setting up ifupdown (0.8.10ubuntu1.4) ... Creating /etc/network/interfaces. Setting up libisc-export160 (1:9.10.3.dfsg.P4-8ubuntu1.15) ... Setting up libdns-export162 (1:9.10.3.dfsg.P4-8ubuntu1.15) ... Setting up isc-dhcp-client (4.3.3-5ubuntu12.10) ... Setting up isc-dhcp-common (4.3.3-5ubuntu12.10) ... Setting up libxtables11:amd64 (1.6.0-2ubuntu3) ... Setting up netbase (5.3) ... Setting up libxmuu1:amd64 (2:1.1.2-2) ... Setting up openssh-client (1:7.2p2-4ubuntu2.8) ... Setting up xauth (1:1.0.9-1ubuntu2) ... Setting up ssl-cert (1.0.37) ... Setting up postfix (3.1.0-3ubuntu0.3) ... Creating /etc/postfix/dynamicmaps.cf setting myhostname: master-0.XXX setting alias maps setting alias database changing /etc/mailname to master-0.XXX setting myorigin setting destinations: $myhostname, master-0.XXX, localhost.XXX, , localhost setting relayhost: setting mynetworks: 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128 setting mailbox_size_limit: 0 setting recipient_delimiter: + setting inet_interfaces: all setting inet_protocols: all /etc/aliases does not exist, creating it. WARNING: /etc/aliases exists, but does not have a root alias. Postfix is now set up with a default configuration. If you need to make changes, edit /etc/postfix/main.cf (and others) as needed. To view Postfix configuration values, see postconf(1). After modifying main.cf, be sure to run '/etc/init.d/postfix reload'. Running newaliases invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of restart. Setting up bsd-mailx (8.1.2-0.20160123cvs-2) ... update-alternatives: using /usr/bin/bsd-mailx to provide /usr/bin/mailx (mailx) in auto mode Setting up gridengine-common (6.2u5-7.4) ... Creating config file /etc/default/gridengine with new version Setting up gridengine-client (6.2u5-7.4) ... Setting up tcsh (6.18.01-5) ... 
update-alternatives: using /bin/tcsh to provide /bin/csh (csh) in auto mode Setting up gridengine-exec (6.2u5-7.4) ... invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of start. Setting up gridengine-master (6.2u5-7.4) ... su: Authentication failure (Ignored) Initializing cluster with the following parameters: =&gt; SGE_ROOT: /var/lib/gridengine =&gt; SGE_CELL: default =&gt; Spool directory: /var/spool/gridengine/spooldb =&gt; Initial manager user: sgeadmin Initializing spool (/var/spool/gridengine/spooldb) Initializing global configuration based on /usr/share/gridengine/default-configuration Initializing complexes based on /usr/share/gridengine/centry Initializing usersets based on /usr/share/gridengine/usersets Adding user sgeadmin as a manager Cluster creation complete invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of start. Setting up ncurses-term (6.0+20160213-1ubuntu1) ... Setting up openssh-sftp-server (1:7.2p2-4ubuntu2.8) ... Setting up openssh-server (1:7.2p2-4ubuntu2.8) ... Creating SSH2 RSA key; this may take some time ... 2048 SHA256:hfQpES1aS4cjF8AOCIParZR6342vdwutoyITru0wtuE root@master-0.XXX (RSA) Creating SSH2 DSA key; this may take some time ... 1024 SHA256:gOsPMVgwXBHJzixN/gtJAG+hVCHqw8t7Fhy4nsx8od0 root@master-0.XXX (DSA) Creating SSH2 ECDSA key; this may take some time ... 256 SHA256:3D5SNniUb4z+/BuqXheFgG+DfjsxXqTT/zwWAqdX4jM root@master-0.XXX (ECDSA) Creating SSH2 ED25519 key; this may take some time ... 256 SHA256:SwyeV9iSqOW4TKLi4Wvc0zD8lWtupHCJpDu8oWBwbfU root@master-0.XXX (ED25519) invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of start. Setting up python3-pkg-resources (20.7.0-1) ... Setting up python3-chardet (2.3.0-2) ... Setting up python3-six (1.10.0-3) ... Setting up python3-urllib3 (1.13.1-2ubuntu0.16.04.3) ... Setting up python3-requests (2.9.1-3ubuntu0.1) ... 
Setting up ssh-import-id (5.5-0ubuntu1) ... Processing triggers for libc-bin (2.23-0ubuntu11) ... Processing triggers for systemd (229-4ubuntu21.22) ... Reading package lists... Building dependency tree... Reading state information... 0 upgraded, 0 newly installed, 0 to remove and 30 not upgraded.  cat $SGE_CFG_PATH/setcfg.log finish master add worker node worker-0.XXX  Sample output of worker: please wait Reading package lists... Building dependency tree... Reading state information... The following additional packages will be installed: bsd-mailx cpio gridengine-common ifupdown iproute2 isc-dhcp-client isc-dhcp-common libatm1 libdns-export162 libisc-export160 liblockfile-bin liblockfile1 libmnl0 libxmuu1 libxtables11 ncurses-term netbase openssh-client openssh-server openssh-sftp-server postfix python3-chardet python3-pkg-resources python3-requests python3-six python3-urllib3 ssh-import-id ssl-cert tcsh xauth Suggested packages: libarchive1 gridengine-qmon ppp rdnssd iproute2-doc resolvconf avahi-autoipd isc-dhcp-client-ddns apparmor ssh-askpass libpam-ssh keychain monkeysphere rssh molly-guard ufw procmail postfix-mysql postfix-pgsql postfix-ldap postfix-pcre sasl2-bin libsasl2-modules dovecot-common postfix-cdb postfix-doc python3-setuptools python3-ndg-httpsclient python3-openssl python3-pyasn1 openssl-blacklist The following NEW packages will be installed: bsd-mailx cpio gridengine-client gridengine-common gridengine-exec ifupdown iproute2 isc-dhcp-client isc-dhcp-common libatm1 libdns-export162 libisc-export160 liblockfile-bin liblockfile1 libmnl0 libxmuu1 libxtables11 ncurses-term netbase openssh-client openssh-server openssh-sftp-server postfix python3-chardet python3-pkg-resources python3-requests python3-six python3-urllib3 ssh-import-id ssl-cert tcsh xauth 0 upgraded, 32 newly installed, 0 to remove and 30 not upgraded. Need to get 9633 kB of archives. After this operation, 51.2 MB of additional disk space will be used. 
Get:1 http://archive.ubuntu.com/ubuntu xenial/main amd64 libatm1 amd64 1:2.5.1-1.5 [24.2 kB] Get:2 http://archive.ubuntu.com/ubuntu xenial/main amd64 libmnl0 amd64 1.0.3-5 [12.0 kB] Get:3 http://archive.ubuntu.com/ubuntu xenial/main amd64 liblockfile-bin amd64 1.09-6ubuntu1 [10.8 kB] Get:4 http://archive.ubuntu.com/ubuntu xenial/main amd64 liblockfile1 amd64 1.09-6ubuntu1 [8056 B] Get:5 http://archive.ubuntu.com/ubuntu xenial/main amd64 cpio amd64 2.11+dfsg-5ubuntu1 [74.8 kB] Get:6 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 iproute2 amd64 4.3.0-1ubuntu3.16.04.5 [523 kB] Get:7 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 ifupdown amd64 0.8.10ubuntu1.4 [54.9 kB] Get:8 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libisc-export160 amd64 1:9.10.3.dfsg.P4-8ubuntu1.15 [153 kB] Get:9 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 libdns-export162 amd64 1:9.10.3.dfsg.P4-8ubuntu1.15 [665 kB] Get:10 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 isc-dhcp-client amd64 4.3.3-5ubuntu12.10 [224 kB] Get:11 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 isc-dhcp-common amd64 4.3.3-5ubuntu12.10 [105 kB] Get:12 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxtables11 amd64 1.6.0-2ubuntu3 [27.2 kB] Get:13 http://archive.ubuntu.com/ubuntu xenial/main amd64 netbase all 5.3 [12.9 kB] Get:14 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxmuu1 amd64 2:1.1.2-2 [9674 B] Get:15 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-client amd64 1:7.2p2-4ubuntu2.8 [590 kB] Get:16 http://archive.ubuntu.com/ubuntu xenial/main amd64 xauth amd64 1:1.0.9-1ubuntu2 [22.7 kB] Get:17 http://archive.ubuntu.com/ubuntu xenial/main amd64 ssl-cert all 1.0.37 [16.9 kB] Get:18 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 postfix amd64 3.1.0-3ubuntu0.3 [1152 kB] Get:19 http://archive.ubuntu.com/ubuntu xenial/main amd64 bsd-mailx amd64 8.1.2-0.20160123cvs-2 [63.7 kB] Get:20 
http://archive.ubuntu.com/ubuntu xenial/universe amd64 gridengine-common all 6.2u5-7.4 [156 kB] Get:21 http://archive.ubuntu.com/ubuntu xenial/universe amd64 gridengine-client amd64 6.2u5-7.4 [3394 kB] Get:22 http://archive.ubuntu.com/ubuntu xenial/universe amd64 tcsh amd64 6.18.01-5 [410 kB] Get:23 http://archive.ubuntu.com/ubuntu xenial/universe amd64 gridengine-exec amd64 6.2u5-7.4 [990 kB] Get:24 http://archive.ubuntu.com/ubuntu xenial/main amd64 ncurses-term all 6.0+20160213-1ubuntu1 [249 kB] Get:25 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-sftp-server amd64 1:7.2p2-4ubuntu2.8 [38.9 kB] Get:26 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-server amd64 1:7.2p2-4ubuntu2.8 [335 kB] Get:27 http://archive.ubuntu.com/ubuntu xenial/main amd64 python3-pkg-resources all 20.7.0-1 [79.0 kB] Get:28 http://archive.ubuntu.com/ubuntu xenial/main amd64 python3-chardet all 2.3.0-2 [96.2 kB] Get:29 http://archive.ubuntu.com/ubuntu xenial/main amd64 python3-six all 1.10.0-3 [11.0 kB] Get:30 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 python3-urllib3 all 1.13.1-2ubuntu0.16.04.3 [58.5 kB] Get:31 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 python3-requests all 2.9.1-3ubuntu0.1 [55.8 kB] Get:32 http://archive.ubuntu.com/ubuntu xenial/main amd64 ssh-import-id all 5.5-0ubuntu1 [10.2 kB] Fetched 9633 kB in 2s (4496 kB/s) Selecting previously unselected package libatm1:amd64. (Reading database ... (Reading database ... 5% (Reading database ... 10% (Reading database ... 15% (Reading database ... 20% (Reading database ... 25% (Reading database ... 30% (Reading database ... 35% (Reading database ... 40% (Reading database ... 45% (Reading database ... 50% (Reading database ... 55% (Reading database ... 60% (Reading database ... 65% (Reading database ... 70% (Reading database ... 75% (Reading database ... 80% (Reading database ... 85% (Reading database ... 90% (Reading database ... 95% (Reading database ... 
100% (Reading database ... 21398 files and directories currently installed.) Preparing to unpack .../libatm1_1%3a2.5.1-1.5_amd64.deb ... Unpacking libatm1:amd64 (1:2.5.1-1.5) ... Selecting previously unselected package libmnl0:amd64. Preparing to unpack .../libmnl0_1.0.3-5_amd64.deb ... Unpacking libmnl0:amd64 (1.0.3-5) ... Selecting previously unselected package liblockfile-bin. Preparing to unpack .../liblockfile-bin_1.09-6ubuntu1_amd64.deb ... Unpacking liblockfile-bin (1.09-6ubuntu1) ... Selecting previously unselected package liblockfile1:amd64. Preparing to unpack .../liblockfile1_1.09-6ubuntu1_amd64.deb ... Unpacking liblockfile1:amd64 (1.09-6ubuntu1) ... Selecting previously unselected package cpio. Preparing to unpack .../cpio_2.11+dfsg-5ubuntu1_amd64.deb ... Unpacking cpio (2.11+dfsg-5ubuntu1) ... Selecting previously unselected package iproute2. Preparing to unpack .../iproute2_4.3.0-1ubuntu3.16.04.5_amd64.deb ... Unpacking iproute2 (4.3.0-1ubuntu3.16.04.5) ... Selecting previously unselected package ifupdown. Preparing to unpack .../ifupdown_0.8.10ubuntu1.4_amd64.deb ... Unpacking ifupdown (0.8.10ubuntu1.4) ... Selecting previously unselected package libisc-export160. Preparing to unpack .../libisc-export160_1%3a9.10.3.dfsg.P4-8ubuntu1.15_amd64.deb ... Unpacking libisc-export160 (1:9.10.3.dfsg.P4-8ubuntu1.15) ... Selecting previously unselected package libdns-export162. Preparing to unpack .../libdns-export162_1%3a9.10.3.dfsg.P4-8ubuntu1.15_amd64.deb ... Unpacking libdns-export162 (1:9.10.3.dfsg.P4-8ubuntu1.15) ... Selecting previously unselected package isc-dhcp-client. Preparing to unpack .../isc-dhcp-client_4.3.3-5ubuntu12.10_amd64.deb ... Unpacking isc-dhcp-client (4.3.3-5ubuntu12.10) ... Selecting previously unselected package isc-dhcp-common. Preparing to unpack .../isc-dhcp-common_4.3.3-5ubuntu12.10_amd64.deb ... Unpacking isc-dhcp-common (4.3.3-5ubuntu12.10) ... Selecting previously unselected package libxtables11:amd64. 
Preparing to unpack .../libxtables11_1.6.0-2ubuntu3_amd64.deb ... Unpacking libxtables11:amd64 (1.6.0-2ubuntu3) ... Selecting previously unselected package netbase. Preparing to unpack .../archives/netbase_5.3_all.deb ... Unpacking netbase (5.3) ... Selecting previously unselected package libxmuu1:amd64. Preparing to unpack .../libxmuu1_2%3a1.1.2-2_amd64.deb ... Unpacking libxmuu1:amd64 (2:1.1.2-2) ... Selecting previously unselected package openssh-client. Preparing to unpack .../openssh-client_1%3a7.2p2-4ubuntu2.8_amd64.deb ... Unpacking openssh-client (1:7.2p2-4ubuntu2.8) ... Selecting previously unselected package xauth. Preparing to unpack .../xauth_1%3a1.0.9-1ubuntu2_amd64.deb ... Unpacking xauth (1:1.0.9-1ubuntu2) ... Selecting previously unselected package ssl-cert. Preparing to unpack .../ssl-cert_1.0.37_all.deb ... Unpacking ssl-cert (1.0.37) ... Selecting previously unselected package postfix. Preparing to unpack .../postfix_3.1.0-3ubuntu0.3_amd64.deb ... Unpacking postfix (3.1.0-3ubuntu0.3) ... Selecting previously unselected package bsd-mailx. Preparing to unpack .../bsd-mailx_8.1.2-0.20160123cvs-2_amd64.deb ... Unpacking bsd-mailx (8.1.2-0.20160123cvs-2) ... Selecting previously unselected package gridengine-common. Preparing to unpack .../gridengine-common_6.2u5-7.4_all.deb ... Unpacking gridengine-common (6.2u5-7.4) ... Selecting previously unselected package gridengine-client. Preparing to unpack .../gridengine-client_6.2u5-7.4_amd64.deb ... Unpacking gridengine-client (6.2u5-7.4) ... Selecting previously unselected package tcsh. Preparing to unpack .../tcsh_6.18.01-5_amd64.deb ... Unpacking tcsh (6.18.01-5) ... Selecting previously unselected package gridengine-exec. Preparing to unpack .../gridengine-exec_6.2u5-7.4_amd64.deb ... Unpacking gridengine-exec (6.2u5-7.4) ... Selecting previously unselected package ncurses-term. Preparing to unpack .../ncurses-term_6.0+20160213-1ubuntu1_all.deb ... Unpacking ncurses-term (6.0+20160213-1ubuntu1) ... 
Selecting previously unselected package openssh-sftp-server. Preparing to unpack .../openssh-sftp-server_1%3a7.2p2-4ubuntu2.8_amd64.deb ... Unpacking openssh-sftp-server (1:7.2p2-4ubuntu2.8) ... Selecting previously unselected package openssh-server. Preparing to unpack .../openssh-server_1%3a7.2p2-4ubuntu2.8_amd64.deb ... Unpacking openssh-server (1:7.2p2-4ubuntu2.8) ... Selecting previously unselected package python3-pkg-resources. Preparing to unpack .../python3-pkg-resources_20.7.0-1_all.deb ... Unpacking python3-pkg-resources (20.7.0-1) ... Selecting previously unselected package python3-chardet. Preparing to unpack .../python3-chardet_2.3.0-2_all.deb ... Unpacking python3-chardet (2.3.0-2) ... Selecting previously unselected package python3-six. Preparing to unpack .../python3-six_1.10.0-3_all.deb ... Unpacking python3-six (1.10.0-3) ... Selecting previously unselected package python3-urllib3. Preparing to unpack .../python3-urllib3_1.13.1-2ubuntu0.16.04.3_all.deb ... Unpacking python3-urllib3 (1.13.1-2ubuntu0.16.04.3) ... Selecting previously unselected package python3-requests. Preparing to unpack .../python3-requests_2.9.1-3ubuntu0.1_all.deb ... Unpacking python3-requests (2.9.1-3ubuntu0.1) ... Selecting previously unselected package ssh-import-id. Preparing to unpack .../ssh-import-id_5.5-0ubuntu1_all.deb ... Unpacking ssh-import-id (5.5-0ubuntu1) ... Processing triggers for systemd (229-4ubuntu21.22) ... Processing triggers for libc-bin (2.23-0ubuntu11) ... Setting up libatm1:amd64 (1:2.5.1-1.5) ... Setting up libmnl0:amd64 (1.0.3-5) ... Setting up liblockfile-bin (1.09-6ubuntu1) ... Setting up liblockfile1:amd64 (1.09-6ubuntu1) ... Setting up cpio (2.11+dfsg-5ubuntu1) ... update-alternatives: using /bin/mt-gnu to provide /bin/mt (mt) in auto mode Setting up iproute2 (4.3.0-1ubuntu3.16.04.5) ... Setting up ifupdown (0.8.10ubuntu1.4) ... Creating /etc/network/interfaces. Setting up libisc-export160 (1:9.10.3.dfsg.P4-8ubuntu1.15) ... 
Setting up libdns-export162 (1:9.10.3.dfsg.P4-8ubuntu1.15) ... Setting up isc-dhcp-client (4.3.3-5ubuntu12.10) ... Setting up isc-dhcp-common (4.3.3-5ubuntu12.10) ... Setting up libxtables11:amd64 (1.6.0-2ubuntu3) ... Setting up netbase (5.3) ... Setting up libxmuu1:amd64 (2:1.1.2-2) ... Setting up openssh-client (1:7.2p2-4ubuntu2.8) ... Setting up xauth (1:1.0.9-1ubuntu2) ... Setting up ssl-cert (1.0.37) ... Setting up postfix (3.1.0-3ubuntu0.3) ... Creating /etc/postfix/dynamicmaps.cf setting myhostname: worker-0.XXX setting alias maps setting alias database changing /etc/mailname to worker-0.XXX setting myorigin setting destinations: $myhostname, worker-0.XXX, localhost.XXX, , localhost setting relayhost: setting mynetworks: 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128 setting mailbox_size_limit: 0 setting recipient_delimiter: + setting inet_interfaces: all setting inet_protocols: all /etc/aliases does not exist, creating it. WARNING: /etc/aliases exists, but does not have a root alias. Postfix is now set up with a default configuration. If you need to make changes, edit /etc/postfix/main.cf (and others) as needed. To view Postfix configuration values, see postconf(1). After modifying main.cf, be sure to run '/etc/init.d/postfix reload'. Running newaliases invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of restart. Setting up bsd-mailx (8.1.2-0.20160123cvs-2) ... update-alternatives: using /usr/bin/bsd-mailx to provide /usr/bin/mailx (mailx) in auto mode Setting up gridengine-common (6.2u5-7.4) ... Creating config file /etc/default/gridengine with new version Setting up gridengine-client (6.2u5-7.4) ... Setting up tcsh (6.18.01-5) ... update-alternatives: using /bin/tcsh to provide /bin/csh (csh) in auto mode Setting up gridengine-exec (6.2u5-7.4) ... invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of start. Setting up ncurses-term (6.0+20160213-1ubuntu1) ... 
Setting up openssh-sftp-server (1:7.2p2-4ubuntu2.8) ... Setting up openssh-server (1:7.2p2-4ubuntu2.8) ... Creating SSH2 RSA key; this may take some time ... 2048 SHA256:ok/TxzwtF5W8I55sDxrt4Agy4fuWn39BiSovvDObhVE root@worker-0.XXX (RSA) Creating SSH2 DSA key; this may take some time ... 1024 SHA256:4y48kVYt3mS3q1KgZzEoYMnS/2d/tA8TJUK5uNSaxZY root@worker-0.XXX (DSA) Creating SSH2 ECDSA key; this may take some time ... 256 SHA256:4D7zm4cD2IbDnHoXnzcIo3FISbvOW8eOstGBNf1/bvo root@worker-0.XXX (ECDSA) Creating SSH2 ED25519 key; this may take some time ... 256 SHA256:/HrA3xiZiH5CZkXwtcfE6GwcMM+hEhZzTdFHxj4PzDg root@worker-0.XXX (ED25519) invoke-rc.d: could not determine current runlevel invoke-rc.d: policy-rc.d denied execution of start. Setting up python3-pkg-resources (20.7.0-1) ... Setting up python3-chardet (2.3.0-2) ... Setting up python3-six (1.10.0-3) ... Setting up python3-urllib3 (1.13.1-2ubuntu0.16.04.3) ... Setting up python3-requests (2.9.1-3ubuntu0.1) ... Setting up ssh-import-id (5.5-0ubuntu1) ... Processing triggers for libc-bin (2.23-0ubuntu11) ... Processing triggers for systemd (229-4ubuntu21.22) ... Reading package lists... Building dependency tree... Reading state information... 0 upgraded, 0 newly installed, 0 to remove and 30 not upgraded.  cat $SGE_CFG_PATH/setcfg.log please wait Start SGE for worker is finished done for worker-0.XXX worker.  
Sample output of SGE:  "},{"title":"Deploy Submarine with Helm","type":0,"sectionRef":"#","url":"docs/0.6.0/gettingStarted/helm","content":"","keywords":""},{"title":"Prerequisite​","type":1,"pageTitle":"Deploy Submarine with Helm","url":"docs/0.6.0/gettingStarted/helm#prerequisite","content":"Install Helm v3: https://helm.sh/docs/intro/install/ A Kubernetes environment (e.g. minikube or kind) "},{"title":"Deploy Submarine to Kubernetes​","type":1,"pageTitle":"Deploy Submarine with Helm","url":"docs/0.6.0/gettingStarted/helm#deploy-submarine-to-kubernetes","content":"git clone https://github.com/apache/submarine.git cd submarine helm install submarine ./helm-charts/submarine  With these commands, the Submarine service will be deployed to the &quot;default&quot; namespace. The first installation takes about 10 minutes because the Docker images are pulled from apache/submarine on Docker Hub. "},{"title":"Verify installation​","type":1,"pageTitle":"Deploy Submarine with Helm","url":"docs/0.6.0/gettingStarted/helm#verify-installation","content":"kubectl get all  TODO: screenshot "},{"title":"Uninstall Submarine​","type":1,"pageTitle":"Deploy Submarine with Helm","url":"docs/0.6.0/gettingStarted/helm#uninstall-submarine","content":"helm uninstall submarine # Check helm ls  Helm chart configuration (values.yaml) "},{"title":"Volume Type​","type":1,"pageTitle":"Deploy Submarine with Helm","url":"docs/0.6.0/gettingStarted/helm#volume-type","content":"Submarine supports several volume types, currently hostPath (the default) and NFS. Configure the volume type in ./helm-charts/submarine/values.yaml, or override the default values in values.yaml from the Helm CLI. 
hostPath​ With hostPath, data is stored directly on your node. Usage: Configure the settings in ./helm-charts/submarine/values.yaml. To enable hostPath storage, set .storage.type to host. To set the root path for your storage, set .storage.host.root to &lt;any-path&gt;. Example: # ./helm-charts/submarine/values.yaml storage: type: host host: root: /tmp  NFS (Network File System)​ NFS allows multiple clients to access a shared space. Prerequisite: A pre-existing NFS server. You have two options. Create an NFS server: kubectl create -f ./dev-support/nfs-server/nfs-server.yaml This creates an nfs-server pod in the Kubernetes cluster and exposes the nfs-server IP at 10.96.0.2. Use your own NFS server. Install the NFS dependencies on your nodes: Ubuntu apt-get install -y nfs-common CentOS yum install nfs-utils Usage: Configure the settings in ./helm-charts/submarine/values.yaml. To enable NFS storage, set .storage.type to nfs. To set the IP of the NFS server, set .storage.nfs.ip to &lt;any-ip&gt;. Example: # ./helm-charts/submarine/values.yaml storage: type: nfs nfs: ip: 10.96.0.2  "},{"title":"Access to Submarine Server​","type":1,"pageTitle":"Deploy Submarine with Helm","url":"docs/0.6.0/gettingStarted/helm#access-to-submarine-server","content":"By default, the Submarine server exposes port 8080 within the K8s cluster. Starting with Submarine v0.5, Traefik is used as the reverse proxy by default. If you don't want to use Traefik, set the value below to false in ./helm-charts/submarine/values.yaml. # Use Traefik by default traefik: enabled: true  To access the server from outside the cluster, we use the Traefik ingress controller and NodePort. Please refer to ./helm-charts/submarine/charts/traefik/values.yaml and the Traefik docs for more details if you want to customize the default values for Traefik. Notice: If you use kind to run a local Kubernetes cluster, please refer to these docs and set the &quot;extraPortMappings&quot; configuration when creating the k8s cluster. 
kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 nodes: - role: control-plane extraPortMappings: - containerPort: 32080 hostPort: [the port you want to access]  # Use nodePort and Traefik ingress controller by default. # To access the submarine server, open the following URL in your browser. http://127.0.0.1:32080  If minikube is installed, use the following command to find the URL to the Submarine server. $ minikube service submarine-traefik --url  "},{"title":"Setup a Kubernetes cluster using KinD","type":0,"sectionRef":"#","url":"docs/0.6.0/gettingStarted/kind","content":"","keywords":""},{"title":"Create Kubernetes cluster with KinD​","type":1,"pageTitle":"Setup a Kubernetes cluster using KinD","url":"docs/0.6.0/gettingStarted/kind#create-kubernetes-cluster-with-kind","content":"We recommend that users develop Submarine with minikube. However, KinD is also an option for setting up a Kubernetes cluster on your local machine. Run the following commands, setting the KinD version and Kubernetes version as needed. 
# Download the specific version of KinD (must be &gt;= v0.6.0) export KIND_VERSION=v0.11.1 curl -Lo ./kind https://github.com/kubernetes-sigs/kind/releases/download/${KIND_VERSION}/kind-linux-amd64 # Make the binary executable chmod +x ./kind # Move the binary to your executable path sudo mv ./kind /usr/local/bin/ # Create cluster with specific version of kubernetes export KUBE_VERSION=v1.15.12 kind create cluster --image kindest/node:${KUBE_VERSION}  "},{"title":"Kubernetes Dashboard (optional)​","type":1,"pageTitle":"Setup a Kubernetes cluster using KinD","url":"docs/0.6.0/gettingStarted/kind#kubernetes-dashboard-optional","content":""},{"title":"Deploy​","type":1,"pageTitle":"Setup a Kubernetes cluster using KinD","url":"docs/0.6.0/gettingStarted/kind#deploy","content":"To deploy Dashboard, execute the following command: kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml  "},{"title":"Create RBAC​","type":1,"pageTitle":"Setup a Kubernetes cluster using KinD","url":"docs/0.6.0/gettingStarted/kind#create-rbac","content":"Run the following commands to grant the dashboard access to the cluster: kubectl create serviceaccount dashboard-admin-sa kubectl create clusterrolebinding dashboard-admin-sa --clusterrole=cluster-admin --serviceaccount=default:dashboard-admin-sa  "},{"title":"Get access token (optional)​","type":1,"pageTitle":"Setup a Kubernetes cluster using KinD","url":"docs/0.6.0/gettingStarted/kind#get-access-token-optional","content":"If you want to log in to the dashboard with a token, run the following commands to get it: kubectl get secrets # select the right dashboard-admin-sa-token to describe the secret kubectl describe secret dashboard-admin-sa-token-6nhkx  "},{"title":"Start dashboard service​","type":1,"pageTitle":"Setup a Kubernetes cluster using KinD","url":"docs/0.6.0/gettingStarted/kind#start-dashboard-service","content":"kubectl proxy  Now access Dashboard at: 
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ Dashboard screenshot:  "},{"title":"Submarine Local Deployment","type":0,"sectionRef":"#","url":"docs/0.6.0/gettingStarted/localDeployment","content":"","keywords":""},{"title":"Prerequisite​","type":1,"pageTitle":"Submarine Local Deployment","url":"docs/0.6.0/gettingStarted/localDeployment#prerequisite","content":"kubectl, helm (Helm v3 is the minimum requirement), and minikube. "},{"title":"Deploy Kubernetes Cluster​","type":1,"pageTitle":"Submarine Local Deployment","url":"docs/0.6.0/gettingStarted/localDeployment#deploy-kubernetes-cluster","content":"$ minikube start --vm-driver=docker --cpus 8 --memory 4096 --disk-size=20G --kubernetes-version v1.15.11  "},{"title":"Install Submarine on Kubernetes​","type":1,"pageTitle":"Submarine Local Deployment","url":"docs/0.6.0/gettingStarted/localDeployment#install-submarine-on-kubernetes","content":"$ git clone https://github.com/apache/submarine.git $ cd submarine $ helm install submarine ./helm-charts/submarine  NAME: submarine LAST DEPLOYED: Fri Jan 29 05:35:36 2021 NAMESPACE: default STATUS: deployed REVISION: 1 TEST SUITE: None  "},{"title":"Verify installation​","type":1,"pageTitle":"Submarine Local Deployment","url":"docs/0.6.0/gettingStarted/localDeployment#verify-installation","content":"Once Submarine is installed, check it with the command below; you should see similar output: $ kubectl get pods  NAME READY STATUS RESTARTS AGE notebook-controller-deployment-5db8b6cbf7-k65jm 1/1 Running 0 5s pytorch-operator-7ff5d96d59-gx7f5 1/1 Running 0 5s submarine-database-8d95d74f7-ntvqp 1/1 Running 0 5s submarine-server-b6cd4787b-7bvr7 1/1 Running 0 5s submarine-traefik-9bb6f8577-66sx6 1/1 Running 0 5s tf-job-operator-7844656dd-lfgmd 1/1 Running 0 5s  Warning: you may encounter the following error during installation: Error: rendered manifests contain a resource that already exists. 
Unable to continue with install: existing resource conflict: namespace: , name: podgroups.scheduling.incubator.k8s.io, existing_kind: apiextensions.k8s.io/v1beta1, Kind=CustomResourceDefinition, new_kind: apiextensions.k8s.io/v1beta1, Kind=CustomResourceDefinition  This is likely caused by resources left over from a previously installed Submarine chart. Fix it by running: $ kubectl delete crd/tfjobs.kubeflow.org &amp;&amp; kubectl delete crd/podgroups.scheduling.incubator.k8s.io &amp;&amp; kubectl delete crd/pytorchjobs.kubeflow.org  "},{"title":"Access Submarine in a Cluster​","type":1,"pageTitle":"Submarine Local Deployment","url":"docs/0.6.0/gettingStarted/localDeployment#access-submarine-in-a-cluster","content":"# Listen on port 32080 on all addresses, forwarding to 80 in the pod # Method1 -- using minikube ip + NodePort $ minikube ip # you'll get the IP address of minikube, ex: 192.168.49.2 # Method2 -- using port-forwarding $ kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80  "},{"title":"Open Workbench in the browser.​","type":1,"pageTitle":"Submarine Local Deployment","url":"docs/0.6.0/gettingStarted/localDeployment#open-workbench-in-the-browser","content":"Open http://{minikube ip}:32080 (from Method 1), e.g. http://192.168.49.2:32080, or http://127.0.0.1:32080 (from Method 2). The default username and password are both admin.  "},{"title":"Uninstall Submarine​","type":1,"pageTitle":"Submarine Local Deployment","url":"docs/0.6.0/gettingStarted/localDeployment#uninstall-submarine","content":"$ helm delete submarine  "},{"title":"Jupyter Notebook","type":0,"sectionRef":"#","url":"docs/0.6.0/gettingStarted/notebook","content":"","keywords":""},{"title":"Working with notebooks​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/0.6.0/gettingStarted/notebook#working-with-notebooks","content":"We recommend using the Web UI to manage notebooks. 
"},{"title":"Notebooks Web UI​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/0.6.0/gettingStarted/notebook#notebooks-web-ui","content":"Notebooks can be started from the Web UI. You can click the “Notebook” tab in the left-hand panel to manage your notebooks.  To create a new notebook server, click “New Notebook”. You should see a form for entering details of your new notebook server. Notebook Name : Name of the notebook server. It should follow the rules below. Contain at most 63 characters.Contain only lowercase alphanumeric characters or '-'.Start with an alphabetic character.End with an alphanumeric character. Environment : It defines a set of libraries and docker image.CPU and MemoryGPU (optional)EnvVar (optional) : Injects environment variables into the notebook. If you’re not sure which environment you need, please choose the environment “notebook-env” for the new notebook.  You should see your new notebook server. Click the name of your notebook server to connect to it.  "},{"title":"Experiment with your notebook​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/0.6.0/gettingStarted/notebook#experiment-with-your-notebook","content":"The environment “notebook-env” includes Submarine Python SDK which can talk to Submarine Server to create experiments, as the example below: from __future__ import print_function import submarine from submarine.experiment.models.environment_spec import EnvironmentSpec from submarine.experiment.models.experiment_spec import ExperimentSpec from submarine.experiment.models.experiment_task_spec import ExperimentTaskSpec from submarine.experiment.models.experiment_meta import ExperimentMeta from submarine.experiment.models.code_spec import CodeSpec # Create Submarine Client submarine_client = submarine.ExperimentClient() # Define TensorFlow experiment spec environment = EnvironmentSpec(image='apache/submarine:tf-dist-mnist-test-1.0') experiment_meta = ExperimentMeta(name='mnist-dist', namespace='default', 
framework='Tensorflow', cmd='python /var/tf_dist_mnist/dist_mnist.py --train_steps=100', env_vars={'ENV1': 'ENV1'}) worker_spec = ExperimentTaskSpec(resources='cpu=1,memory=1024M', replicas=1) ps_spec = ExperimentTaskSpec(resources='cpu=1,memory=1024M', replicas=1) code_spec = CodeSpec(sync_mode='git', url='https://github.com/apache/submarine.git') experiment_spec = ExperimentSpec(meta=experiment_meta, environment=environment, code=code_spec, spec={'Ps' : ps_spec,'Worker': worker_spec}) # Create experiment experiment = submarine_client.create_experiment(experiment_spec=experiment_spec)  You can create a new notebook, paste the above code, and run it. Alternatively, open the notebook submarine_experiment_sdk.ipynb inside the launched notebook session and try it out. After the experiment is submitted to the Submarine server, you can find the experiment jobs in the UI. "},{"title":"Submarine Python SDK","type":0,"sectionRef":"#","url":"docs/0.6.0/gettingStarted/python-sdk","content":"","keywords":""},{"title":"Prepare Python Environment to run Submarine SDK​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/0.6.0/gettingStarted/python-sdk#prepare-python-environment-to-run-submarine-sdk","content":"Submarine SDK requires Python 3.7+. To avoid disturbing an existing Python environment, it is best to try the SDK in a new environment created with Anaconda or Python virtualenv. A sample Python virtual env can be set up like this: wget https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz tar xf virtualenv-16.0.0.tar.gz # Make sure to install using Python 3 python3 virtualenv-16.0.0/virtualenv.py venv . 
venv/bin/activate  "},{"title":"Install Submarine SDK​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/0.6.0/gettingStarted/python-sdk#install-submarine-sdk","content":""},{"title":"Install SDK from pypi.org (recommended)​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/0.6.0/gettingStarted/python-sdk#install-sdk-from-pypiorg-recommended","content":"Starting from 0.4.0, Submarine provides a Python SDK. Replace &lt;REPLACE_VERSION&gt; below with the version you need. More detail: https://pypi.org/project/apache-submarine/ # Install latest stable version pip install apache-submarine # Install specific version pip install apache-submarine==&lt;REPLACE_VERSION&gt;  "},{"title":"Install SDK from source code​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/0.6.0/gettingStarted/python-sdk#install-sdk-from-source-code","content":"First clone the code from GitHub, or go to http://submarine.apache.org/download.html to download the released source code. git clone https://github.com/apache/submarine.git cd submarine # (optional) check out a specific branch or release git checkout &lt;correct release tag/branch&gt; cd submarine-sdk/pysubmarine pip install .  "},{"title":"Manage Submarine Experiment​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/0.6.0/gettingStarted/python-sdk#manage-submarine-experiment","content":"Assuming you've installed Submarine on K8s and forwarded the traefik service to localhost, you can now open a Python shell, a Jupyter notebook, or any tool with the Submarine SDK installed. Follow the SDK experiment example to run an experiment. "},{"title":"Training a DeepFM model​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/0.6.0/gettingStarted/python-sdk#training-a-deepfm-model","content":"Submarine also lets users train an easy-to-use CTR model with a few lines of code and a configuration file, so they don’t need to reimplement the model themselves. 
In addition, they can train the model both locally and on distributed systems such as Hadoop or Kubernetes. Follow the SDK DeepFM example to try the model. "},{"title":"Quickstart","type":0,"sectionRef":"#","url":"docs/0.6.0/gettingStarted/quickstart","content":"","keywords":""},{"title":"Installation​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#installation","content":""},{"title":"Prepare a Kubernetes cluster​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#prepare-a-kubernetes-cluster","content":"Prerequisite: check the dependency page for compatible versions. You need kubectl, helm (Helm v3 is the minimum requirement), and minikube. Start minikube cluster $ minikube start --vm-driver=docker --cpus 8 --memory 4096 --kubernetes-version v1.15.11  "},{"title":"Launch submarine in the cluster​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#launch-submarine-in-the-cluster","content":"Clone the project $ git clone https://github.com/apache/submarine.git  Install the resources with the Helm chart $ cd submarine $ helm install submarine ./helm-charts/submarine  "},{"title":"Ensure submarine is ready​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#ensure-submarine-is-ready","content":"Use kubectl to query the status of pods $ kubectl get pods  Make sure each pod is Running NAME READY STATUS RESTARTS AGE notebook-controller-deployment-5d4f5f874c-vwds8 1/1 Running 0 3h33m pytorch-operator-844c866d54-q5ztd 1/1 Running 0 3h33m submarine-database-674987ff7d-r8zqs 1/1 Running 0 3h33m submarine-minio-5fdd957785-xd987 1/1 Running 0 3h33m submarine-mlflow-76bbf5c7b-g2ntd 1/1 Running 0 3h33m submarine-server-66f7b8658b-sfmv8 1/1 Running 0 3h33m submarine-tensorboard-6c44944dfb-tvbr9 1/1 Running 0 3h33m submarine-traefik-7cbcfd4bd9-4bczn 1/1 Running 0 3h33m tf-job-operator-6bb69fd44-mc8ww 1/1 Running 0 3h33m  "},{"title":"Connect to 
workbench​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#connect-to-workbench","content":"Port-forwarding # using port-forwarding $ kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80  Open http://0.0.0.0:32080  "},{"title":"Example: Submit a mnist distributed example​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#example-submit-a-mnist-distributed-example","content":"We put the code of this example here. train.py is our training script, and build.sh is the script to build a docker image. "},{"title":"1. Write a python script for distributed training​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#1-write-a-python-script-for-distributed-training","content":"Take a simple mnist tensorflow script as an example. We choose MultiWorkerMirroredStrategy as our distributed strategy. &quot;&quot;&quot; ./dev-support/examples/quickstart/train.py Reference: https://github.com/kubeflow/tf-operator/blob/master/examples/v1/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py &quot;&quot;&quot; import tensorflow_datasets as tfds import tensorflow as tf from tensorflow.keras import layers, models from submarine import ModelsClient def make_datasets_unbatched(): BUFFER_SIZE = 10000 # Scaling MNIST data from (0, 255] to (0., 1.] 
def scale(image, label): image = tf.cast(image, tf.float32) image /= 255 return image, label datasets, _ = tfds.load(name='mnist', with_info=True, as_supervised=True) return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE) def build_and_compile_cnn_model(): model = models.Sequential() model.add( layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation='relu')) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation='relu')) model.add(layers.Flatten()) model.add(layers.Dense(64, activation='relu')) model.add(layers.Dense(10, activation='softmax')) model.summary() model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) return model def main(): strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy( communication=tf.distribute.experimental.CollectiveCommunication.AUTO) BATCH_SIZE_PER_REPLICA = 4 BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync with strategy.scope(): ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat() options = tf.data.Options() options.experimental_distribute.auto_shard_policy = \\ tf.data.experimental.AutoShardPolicy.DATA ds_train = ds_train.with_options(options) # Model building/compiling need to be within `strategy.scope()`. multi_worker_model = build_and_compile_cnn_model() class MyCallback(tf.keras.callbacks.Callback): def on_epoch_end(self, epoch, logs=None): # monitor the loss and accuracy print(logs) modelClient.log_metrics({&quot;loss&quot;: logs[&quot;loss&quot;], &quot;accuracy&quot;: logs[&quot;accuracy&quot;]}, epoch) with modelClient.start() as run: multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70, callbacks=[MyCallback()]) if __name__ == '__main__': modelClient = ModelsClient() main()  "},{"title":"2. 
Prepare an environment compatible with the training​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#2-prepare-an-environment-compatible-with-the-training","content":"Build a docker image equipped with the requirement of the environment. $ ./dev-support/examples/quickstart/build.sh  "},{"title":"3. Submit the experiment​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#3-submit-the-experiment","content":"Open submarine workbench and click + New Experiment Fill the form accordingly. Here we set 3 workers. Step 1Step 2Step 3The experiment is successfully submitted "},{"title":"4. Monitor the process (modelClient)​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#4-monitor-the-process-modelclient","content":"In our code, we use modelClient from submarine-sdk to record the metrics. To see the result, click MLflow UI in the workbench. To compare the metrics of each worker, you can select all workers and then click compare "},{"title":"5. Serve the model (In development)​","type":1,"pageTitle":"Quickstart","url":"docs/0.6.0/gettingStarted/quickstart#5-serve-the-model-in-development","content":""},{"title":"Environment REST API","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/api/environment","content":"","keywords":""},{"title":"Create Environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#create-environment","content":"POST /api/v1/environment  "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#parameters","content":"Put EnvironmentSpec in request body. EnvironmentSpec​ Field Name\tType\tDescriptionname\tString\tEnvironment name. dockerImage\tString\tDocker image name. kernelSpec\tKernelSpec\tEnvironment spec. description\tString\tDescription of environment. KernelSpec​ Field Name\tType\tDescriptionname\tString\tKernel name. 
channels\tList&lt;String&gt;\tNames of the channels. condaDependencies\tList&lt;String&gt;\tList of kernel conda dependencies. pipDependencies\tList&lt;String&gt;\tList of kernel pip dependencies. "},{"title":"Code Example​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#code-example","content":"shell curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, &quot;anaconda-client=1.7.2=py37_0&quot;, &quot;anaconda-navigator=1.9.12=py37_0&quot;], &quot;pipDependencies&quot; : [&quot;apache-submarine==0.6.0&quot;, &quot;pyarrow==0.17.0&quot;] } } ' http://127.0.0.1:32080/api/v1/environment  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1646619331994_0001&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[ &quot;apache-submarine\\u003d\\u003d0.6.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null } }, 
&quot;attributes&quot;:{} }  "},{"title":"List Environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#list-environment","content":"GET /api/v1/environment  "},{"title":"Code Example​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#code-example-1","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/environment  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[ { &quot;environmentId&quot;:&quot;environment_1600862964725_0002&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;notebook-gpu-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-gpu-0.6.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null } }, { &quot;environmentId&quot;:&quot;environment_1626160071451_0001&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;: [&quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot;], &quot;pipDependencies&quot;: [&quot;apache-submarine\\u003d\\u003d0.5.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot;] }, &quot;description&quot;:null, &quot;image&quot;:null } }, { &quot;environmentId&quot;:&quot;environment_1600862964725_0001&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, 
&quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.6.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null } } ], &quot;attributes&quot;:{} }  "},{"title":"Get Environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#get-environment","content":"GET /api/v1/environment/{name}  "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#parameters-1","content":"Field Name\tType\tIn\tDescriptionname\tString\tpath\tEnvironment name. "},{"title":"Code Example​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#code-example-2","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/environment/my-submarine-env  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1626160071451_0001&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;: [&quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot;], &quot;pipDependencies&quot;: [&quot;apache-submarine\\u003d\\u003d0.5.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot;] }, &quot;description&quot;:null, &quot;image&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Patch 
Environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#patch-environment","content":"PATCH /api/v1/environment/{name}  "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#parameters-2","content":"Field Name\tType\tIn\tDescriptionname\tString\tpath and body\tEnvironment name. dockerImage\tString\tbody\tDocker image name. kernelSpec\tKernelSpec\tbody\tEnvironment spec. description\tString\tbody\tDescription of environment. This field is optional. "},{"title":"Code Example​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#code-example-3","content":"shell curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7_updated&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;], &quot;pipDependencies&quot; : [] } } ' http://127.0.0.1:32080/api/v1/environment/my-submarine-env  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1626160071451_0003&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7_updated&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;: [&quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Delete 
Environment​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#delete-environment","content":"DELETE /api/v1/environment/{name}  "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#parameters-3","content":"Field Name\tType\tIn\tDescriptionname\tString\tpath\tEnvironment name. "},{"title":"Code Example​","type":1,"pageTitle":"Environment REST API","url":"docs/0.6.0/userDocs/api/environment#code-example-4","content":"shell curl -X DELETE http://127.0.0.1:32080/api/v1/environment/my-submarine-env  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1626160071451_0001&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;: [&quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot;], &quot;pipDependencies&quot;: [&quot;apache-submarine\\u003d\\u003d0.5.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot;] }, &quot;description&quot;:null, &quot;image&quot;:null } },&quot;attributes&quot;:{} }  "},{"title":"Experiment REST API","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/api/experiment","content":"","keywords":""},{"title":"Create Experiment (Using Anonymous/Embedded Environment)​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#create-experiment-using-anonymousembedded-environment","content":"POST /api/v1/experiment  
"},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#parameters","content":"Put ExperimentSpec in request body. ExperimentSpec​ Field Name\tType\tDescriptionmeta\tExperimentMeta\tMeta data of the experiment template. environment\tEnvironmentSpec\tEnvironment of the experiment template. spec\tMap&lt;String, ExperimentTaskSpec&gt;\tSpec of pods. code\tCodeSpec\tExperiment codespec. ExperimentMeta​ Field Name\tType\tDescriptionname\tString\tExperiment name. namespace\tString\tExperiment namespace. framework\tString\tExperiemnt framework. cmd\tString\tCommand. envVars\tMap&lt;String, String&gt;\tEnvironmental variables. EnvironmentSpec​ There are two types of environment: Anonymous and Predefined. Anonymous environment: only specify dockerImage in environment spec. The container will be built on the docker image.Embedded environment: specify name in environment spec. The container will be built on the existing environment (including dockerImage and kernalSpec). See more details in environment api. ExperimentTaskSpec​ Field Name\tType\tDescriptionreplicas\tInteger\tNumbers of replicas. resoureces\tString\tResouces of the task name\tString\tTask name. image\tString\tImage name. cmd\tString\tCommand. envVars\tMap&lt;String, String&gt;\tEnvironmental variables. CodeSpec​ Currently only support pulling from github. HDFS, NFS and s3 are in development Field Name\tType\tDescriptionsyncMode\tString (git|hdfs|nfs|s3)\tsync mode of code spec. url\tString\turl of code spec. 
"},{"title":"Code Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#code-example","content":"shell curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0002&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;uid&quot;:&quot;5a6ec922-6c90-43d4-844f-039f6804ed36&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:47:51.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{&quot;ENV_1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, 
&quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;2048M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Create Experiment (Using Pre-defined/Stored Environment)​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#create-experiment-using-pre-definedstored-environment","content":"POST /api/v1/experiment  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#parameters-1","content":"Put ExperimentSpec in request body. 
"},{"title":"Code Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#code-example-1","content":"shell curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment  Above example assume environment &quot;my-submarine-env&quot; already exists in Submarine. Please refer Environment API Reference doc to environment rest api. 
response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0005&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;uid&quot;:&quot;4944c603-0f21-49e5-826a-2ff820bb4d93&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:57:27.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{&quot;ENV_1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;2048M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"List Experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#list-experiment","content":"GET /api/v1/experiment  "},{"title":"Code Example​","type":1,"pageTitle":"Experiment REST 
API","url":"docs/0.6.0/userDocs/api/experiment#code-example-2","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/experiment  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;: [{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0001&quot;, &quot;name&quot;:&quot;newexperiment1&quot;, &quot;uid&quot;:&quot;b895985c-411c-4e89-90e0-c60a2a8a4235&quot;, &quot;status&quot;:&quot;Succeeded&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:21:31.000+08:00&quot;, &quot;createdTime&quot;:&quot;2021-07-13T16:21:31.000+08:00&quot;, &quot;runningTime&quot;:&quot;2021-07-13T16:21:46.000+08:00&quot;, &quot;finishedTime&quot;:&quot;2021-07-13T16:26:54.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;newexperiment1&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, 
&quot;code&quot;:null } }, { &quot;experimentId&quot;:&quot;experiment_1626160071451_0005&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;uid&quot;:&quot;4944c603-0f21-49e5-826a-2ff820bb4d93&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:57:27.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{&quot;ENV_1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;2048M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } }], &quot;attributes&quot;:{} }  "},{"title":"Get Experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#get-experiment","content":"GET /api/v1/experiment/{id}  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#parameters-2","content":"Field 
Name\tType\tIn\tDescriptionid\tString\tpath\tExperiment id. "},{"title":"Code Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#code-example-3","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/experiment/experiment_1626160071451_0005  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0005&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;uid&quot;:&quot;4944c603-0f21-49e5-826a-2ff820bb4d93&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:57:27.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{&quot;ENV_1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;2048M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, 
&quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Patch Experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#patch-experiment","content":"PATCH /api/v1/experiment/{id}  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#parameters-3","content":"Field Name\tType\tIn\tDescriptionid\tString\tpath\tExperiment id. meta\tExperimentMeta\tbody\tMeta data of the experiment template. environment\tEnvironmentSpec\tbody\tEnvironment of the experiment template. spec\tMap&lt;String, ExperimentTaskSpec&gt;\tbody\tSpec of pods. code\tCodeSpec\tbody\tTODO "},{"title":"Code Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#code-example-4","content":"shell curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 2, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment/experiment_1626160071451_0005  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0005&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;uid&quot;:&quot;4944c603-0f21-49e5-826a-2ff820bb4d93&quot;, 
&quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:57:27.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{&quot;ENV_1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:2, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;2048M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Delete Experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#delete-experiment","content":"DELETE /api/v1/experiment/{id}  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#parameters-4","content":"Field Name\tType\tIn\tDescriptionid\tString\tpath\tExperiment id. 
"},{"title":"Code Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#code-example-5","content":"shell curl -X DELETE http://127.0.0.1:32080/api/v1/experiment/experiment_1626160071451_0005  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0005&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;uid&quot;:&quot;4944c603-0f21-49e5-826a-2ff820bb4d93&quot;, &quot;status&quot;:&quot;Deleted&quot;, &quot;acceptedTime&quot;:null, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{&quot;ENV_1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:2, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;2048M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  
"},{"title":"List Experiment Log​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#list-experiment-log","content":"GET /api/v1/experiment/logs  "},{"title":"Code Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#code-example-6","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;: [{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0001&quot;, &quot;logContent&quot;: [{ &quot;podName&quot;:&quot;newexperiment1-ps-0&quot;, &quot;podLog&quot;:[] }, { &quot;podName&quot;:&quot;newexperiment1-worker-0&quot;, &quot;podLog&quot;:[] }] }], &quot;attributes&quot;:{} }  "},{"title":"Get Experiment Log​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#get-experiment-log","content":"GET /api/v1/experiment/logs/{id}  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#parameters-5","content":"Field Name\tType\tIn\tDescriptionid\tString\tpath\tExperiment id. 
"},{"title":"Code Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/0.6.0/userDocs/api/experiment#code-example-7","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs/experiment_1626160071451_0001  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0001&quot;, &quot;logContent&quot;: [{ &quot;podName&quot;:&quot;newexperiment1-ps-0&quot;, &quot;podLog&quot;:[] }, { &quot;podName&quot;:&quot;newexperiment1-worker-0&quot;, &quot;podLog&quot;:[] }] }, &quot;attributes&quot;:{} }  "},{"title":"Notebook REST API","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/api/notebook","content":"","keywords":""},{"title":"Create a Notebook Instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#create-a-notebook-instance","content":"POST /api/v1/notebook  "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#parameters","content":"NotebookSpec in request body. NotebookSpec​ Field Name\tType\tDescriptionmeta\tNotebookMeta\tMeta data of the notebook. environment\tEnvironmentSpec\tEnvironment of the experiment template. spec\tNotebookPodSpec\tSpec of the notebook pods. NotebookMeta​ Field Name\tType\tDescriptionname\tString\tNotebook name. namespace\tString\tNotebook namespace. ownerId\tString\tUser id. EnvironmentSpec​ See more details in environment api. NotebookPodSpec​ Field Name\tType\tDescriptionenvVars\tMap&lt;String, String&gt;\tEnvironmental variables. resources\tString\tResourecs of the pod. 
"},{"title":"Code Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#code-example","content":"shell curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;test-nb&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;ownerId&quot;: &quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;: { &quot;name&quot;: &quot;notebook-env&quot; }, &quot;spec&quot;: { &quot;envVars&quot;: { &quot;TEST_ENV&quot;: &quot;test&quot; }, &quot;resources&quot;: &quot;cpu=1,memory=1.0Gi&quot; } } ' http://127.0.0.1:32080/api/v1/notebook  response: { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1626160071451_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;a56713da-f2a3-40d0-ae2e-45fdc0bb15f5&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;creating&quot;, &quot;reason&quot;:&quot;The notebook instance is creating&quot;, &quot;createdTime&quot;:&quot;2021-07-13T16:23:38.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.6.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{&quot;TEST_ENV&quot;:&quot;test&quot;}, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} 
}  "},{"title":"List notebook instances which belong to user​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#list-notebook-instances-which-belong-to-user","content":"GET /api/v1/notebook  "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#parameters-1","content":"Field Name\tType\tIn\tDescriptionid\tString\tquery\tUser id. "},{"title":"Code Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#code-example-1","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/notebook?id=e9ca23d68d884d4ebb19d07889727dae  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;List all notebook instances&quot;, &quot;result&quot;: [{ &quot;notebookId&quot;:&quot;notebook_1626160071451_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;a56713da-f2a3-40d0-ae2e-45fdc0bb15f5&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;waiting&quot;, &quot;reason&quot;:&quot;ContainerCreating&quot;, &quot;createdTime&quot;:&quot;2021-07-13T16:23:38.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.6.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{&quot;TEST_ENV&quot;:&quot;test&quot;}, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }], &quot;attributes&quot;:{} }  
"},{"title":"Get the notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#get-the-notebook-instance","content":"GET /api/v1/notebook/{id}  "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#parameters-2","content":"Field Name\tType\tIn\tDescriptionid\tString\tpath\tNotebook id. "},{"title":"Code Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#code-example-2","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/notebook/notebook_1626160071451_0001  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Get the notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1626160071451_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;a56713da-f2a3-40d0-ae2e-45fdc0bb15f5&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;waiting&quot;, &quot;reason&quot;:&quot;ContainerCreating&quot;, &quot;createdTime&quot;:&quot;2021-07-13T16:23:38.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.6.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{&quot;TEST_ENV&quot;:&quot;test&quot;}, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"Delete the notebook 
instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#delete-the-notebook-instance","content":"DELETE /api/v1/notebook/{id}  "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#parameters-3","content":"Field Name\tType\tIn\tDescriptionid\tString\tpath\tNotebook id. "},{"title":"Code Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/0.6.0/userDocs/api/notebook#code-example-3","content":"shell curl -X DELETE http://127.0.0.1:32080/api/v1/notebook/notebook_1626160071451_0001  response: { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Delete the notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1626160071451_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;a56713da-f2a3-40d0-ae2e-45fdc0bb15f5&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;terminating&quot;, &quot;reason&quot;:&quot;The notebook instance is terminating&quot;, &quot;createdTime&quot;:&quot;2021-07-13T16:23:38.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.6.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[&quot;defaults&quot;], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{&quot;TEST_ENV&quot;:&quot;test&quot;}, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"MLflow 
UI","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/others/mlflow","content":"","keywords":""},{"title":"Usage​","type":1,"pageTitle":"MLflow UI","url":"docs/0.6.0/userDocs/others/mlflow#usage","content":"The MLflow UI shows the tracking results of experiments. Values logged with log_param or log_metric in the ModelsClient API can be viewed in the MLflow UI. The example below shows typical usage. "},{"title":"Example​","type":1,"pageTitle":"MLflow UI","url":"docs/0.6.0/userDocs/others/mlflow#example","content":"Run the following code in the cluster: from submarine import ModelsClient import random import time if __name__ == &quot;__main__&quot;: modelClient = ModelsClient() with modelClient.start() as run: modelClient.log_param(&quot;learning_rate&quot;, random.random()) for i in range(100): time.sleep(1) modelClient.log_metric(&quot;mse&quot;, random.random() * 100, i) modelClient.log_metric(&quot;acc&quot;, random.random(), i)  On the MLflow UI page, you can see the log_param and log_metric results. You can also compare training runs across different workers.  "},{"title":"Tensorboard","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/others/tensorboard","content":"","keywords":""},{"title":"Write to LogDirs by the environment variable​","type":1,"pageTitle":"Tensorboard","url":"docs/0.6.0/userDocs/others/tensorboard#write-to-logdirs-by-the-environment-variable","content":""},{"title":"Environment variable​","type":1,"pageTitle":"Tensorboard","url":"docs/0.6.0/userDocs/others/tensorboard#environment-variable","content":"SUBMARINE_TENSORBOARD_LOG_DIR: Exists in every experiment container. Direct your logs to $(SUBMARINE_TENSORBOARD_LOG_DIR) (NOTICE: use parentheses (), not braces {}), and you can inspect the training process on the TensorBoard webpage. 
"},{"title":"Example​","type":1,"pageTitle":"Tensorboard","url":"docs/0.6.0/userDocs/others/tensorboard#example","content":"{ &quot;meta&quot;: { &quot;name&quot;: &quot;tensorflow-tensorboard-dist-mnist&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=$(SUBMARINE_TENSORBOARD_LOG_DIR) --learning_rate=0.01 --batch_size=20&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=512M&quot; } } }  "},{"title":"Connect to the tensorboard webpage​","type":1,"pageTitle":"Tensorboard","url":"docs/0.6.0/userDocs/others/tensorboard#connect-to-the-tensorboard-webpage","content":"Open the experiment page in the workbench, and Click the TensorBoard button.  Inspect the process on tensorboard page.  "},{"title":"Experiment Template REST API","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/api/experiment-template","content":"","keywords":""},{"title":"Create Experiment Template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#create-experiment-template","content":"POST /api/v1/template  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#parameters","content":"Field Name\tType\tIn\tDescriptionname\tString\tbody\tExperiment template name. This is required. author\tString\tbody\tAuthor name. description\tString\tbody\tDescription of the experiment template. parameters\tList&lt;ExperimentTemplateParamSpec&gt;\tbody\tParameters of the experiment template. experimentSpec\tExperimentSpec\tbody\tSpec of the experiment template. 
ExperimentTemplateParamSpec​ Field Name\tType\tDescriptionname\tString\tParameter name. required\tBoolean\ttrue / false. Whether the parameter is required. description\tString\tDescription of the parameter. value\tString\tValue of the parameter. ExperimentSpec​ Field Name\tType\tDescriptionmeta\tExperimentMeta\tMetadata of the experiment template. environment\tEnvironmentSpec\tEnvironment of the experiment template. spec\tMap&lt;String, ExperimentTaskSpec&gt;\tSpec of pods. code\tCodeSpec\tExperiment code spec. ExperimentMeta​ Field Name\tType\tDescriptionname\tString\tExperiment name. namespace\tString\tExperiment namespace. framework\tString\tExperiment framework. cmd\tString\tCommand. envVars\tMap&lt;String, String&gt;\tEnvironment variables. EnvironmentSpec​ See more details in the environment API. ExperimentTaskSpec​ Field Name\tType\tDescriptionreplicas\tInteger\tNumber of replicas. resources\tString\tResources of the task. name\tString\tTask name. image\tString\tImage name. cmd\tString\tCommand. envVars\tMap&lt;String, String&gt;\tEnvironment variables. CodeSpec​ Field Name\tType\tDescriptionsyncMode\tString\tSync mode of the code spec. url\tString\tURL of the code spec. 
"},{"title":"Code Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#code-example","content":"shell curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-tf-mnist-template&quot;, &quot;author&quot;: &quot;author&quot;, &quot;description&quot;: &quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;: &quot;learning_rate&quot;, &quot;value&quot;: 0.1, &quot;required&quot;: true, &quot;description&quot;: &quot;This is learning_rate of training.&quot; }, { &quot;name&quot;: &quot;batch_size&quot;, &quot;value&quot;: 150, &quot;required&quot;: true, &quot;description&quot;: &quot;This is batch_size of training.&quot; }, { &quot;name&quot;: &quot;experiment_name&quot;, &quot;value&quot;: &quot;tf-mnist1&quot;, &quot;required&quot;: true, &quot;description&quot;: &quot;the name of experiment.&quot; } ], &quot;experimentSpec&quot;: { &quot;meta&quot;: { &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate={{learning_rate}} --batch_size={{batch_size}}&quot;, &quot;name&quot;: &quot;{{experiment_name}}&quot;, &quot;envVars&quot;: { &quot;ENV1&quot;: &quot;ENV1&quot; }, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;namespace&quot;: &quot;default&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; } } } ' http://127.0.0.1:32080/api/v1/template  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:1, &quot;serverTimestamp&quot;:1626160071451 }, 
&quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, 
&quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"List Experiment Template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#list-experiment-template","content":"GET /api/v1/template  "},{"title":"Code Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#code-example-1","content":"shell curl -X GET http://127.0.0.1:32080/api/v1/template  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:1, &quot;serverTimestamp&quot;:1626160071451 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author&quot;, 
&quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log 
--learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } }], &quot;attributes&quot;:{} }  "},{"title":"Patch Experiment Template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#patch-experiment-template","content":"PATCH /api/v1/template/{name}  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#parameters-1","content":"Field Name\tType\tIn\tDescriptionname\tString\tpath and body\tExperiment template name. This is required. author\tString\tbody\tAuthor name. description\tString\tbody\tDescription of the experiment template. parameters\tList&lt;ExperimentTemplateParamSpec&gt;\tbody\tParameters of the experiment template. experimentSpec\tExperimentSpec\tbody\tSpec of the experiment template. 
"},{"title":"Code Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#code-example-2","content":"shell curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-tf-mnist-template&quot;, &quot;author&quot;: &quot;author-new&quot;, &quot;description&quot;: &quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;: &quot;learning_rate&quot;, &quot;value&quot;: 0.1, &quot;required&quot;: true, &quot;description&quot;: &quot;This is learning_rate of training.&quot; }, { &quot;name&quot;: &quot;batch_size&quot;, &quot;value&quot;: 150, &quot;required&quot;: true, &quot;description&quot;: &quot;This is batch_size of training.&quot; }, { &quot;name&quot;: &quot;experiment_name&quot;, &quot;value&quot;: &quot;tf-mnist1&quot;, &quot;required&quot;: true, &quot;description&quot;: &quot;the name of experiment.&quot; } ], &quot;experimentSpec&quot;: { &quot;meta&quot;: { &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate={{learning_rate}} --batch_size={{batch_size}}&quot;, &quot;name&quot;: &quot;{{experiment_name}}&quot;, &quot;envVars&quot;: { &quot;ENV1&quot;: &quot;ENV1&quot; }, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;namespace&quot;: &quot;default&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; } } } ' http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:2, &quot;serverTimestamp&quot;:1626160071451 
}, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author-new&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, 
&quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"Delete Experiment Template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#delete-experiment-template","content":"DELETE /api/v1/template/{name}  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#parameters-2","content":"Field Name\tType\tIn\tDescriptionname\tString\tpath\tExperiment template name. This is required. 
"},{"title":"Code Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#code-example-3","content":"shell curl -X DELETE http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  reponse { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:2, &quot;serverTimestamp&quot;:1626160071451 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author-new&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { 
&quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"Use Template to Create a Experiment​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#use-template-to-create-a-experiment","content":"POST /api/v1/experiment/{template_name}  "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST 
API","url":"docs/0.6.0/userDocs/api/experiment-template#parameters-3","content":"Field Name\tType\tIn\tDescriptiontemplate_name\tString\tpath\tExperiment template name. name\tString\tbody\tExperiment template name. params\tMap&lt;String, String&gt;\tbody\tParameters of the experiment including experiment_name. "},{"title":"Code Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/0.6.0/userDocs/api/experiment-template#code-example-4","content":"shell curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;tf-mnist&quot;, &quot;params&quot;: { &quot;learning_rate&quot;:&quot;0.01&quot;, &quot;batch_size&quot;:&quot;150&quot;, &quot;experiment_name&quot;:&quot;newexperiment1&quot; } } ' http://127.0.0.1:32080/api/v1/experiment/my-tf-mnist-template  response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0001&quot;, &quot;name&quot;:&quot;newexperiment1&quot;, &quot;uid&quot;:&quot;b895985c-411c-4e89-90e0-c60a2a8a4235&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:21:31.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;newexperiment1&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, 
&quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Submarine-SDK","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/submarine-sdk/","content":"","keywords":""},{"title":"Summary​","type":1,"pageTitle":"Submarine-SDK","url":"docs/0.6.0/userDocs/submarine-sdk/#summary","content":"Supports Python, Scala, and R for algorithm development. Supports tracking/metrics APIs that let developers add tracking/metrics and view them in the Submarine Workbench UI. "},{"title":"Model Client","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/submarine-sdk/model-client","content":"","keywords":""},{"title":"class ModelClient()​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#class-modelclient","content":"The Submarine ModelsClient provides a high-level API for logging metrics / parameters and managing models. "},{"title":"ModelsClient(tracking_uri=None, registry_uri=None)->ModelsClient​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#modelsclienttracking_urinone-registry_urinone-modelsclient","content":"Initialize a ModelsClient instance. Parameters tracking_uri: If run in Submarine, you do not need to specify it. Otherwise, specify the external tracking_uri. registry_uri: If run in Submarine, you do not need to specify it. Otherwise, specify the external registry_uri. 
Returns ModelsClient instance Example from submarine import ModelsClient modelClient = ModelsClient(tracking_uri=&quot;0.0.0.0:4000&quot;, registry_uri=&quot;0.0.0.0:5000&quot;)  "},{"title":"ModelsClient.start()->[Active Run]​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#modelsclientstart-active-run","content":"For details of Active Run Start a new MLflow run, and direct the logging of the artifacts and metadata to the Run named &quot;worker_i&quot; under Experiment &quot;job_id&quot;. In distributed training, the worker and job id are parsed from environment variables. In local training, the worker and job id are generated. Returns Active Run "},{"title":"ModelsClient.log_param(key, value)->None​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#modelsclientlog_paramkey-value-none","content":"Log a parameter under the current run. Parameters key – Parameter name. value – Parameter value. Example from submarine import ModelsClient modelClient = ModelsClient() with modelClient.start() as run: modelClient.log_param(&quot;learning_rate&quot;, 0.01)  "},{"title":"ModelsClient.log_params(params)->None​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#modelsclientlog_paramsparams-none","content":"Log a batch of params for the current run. Parameters params – Dictionary of param_name: String -&gt; value Example from submarine import ModelsClient params = {&quot;learning_rate&quot;: 0.01, &quot;n_estimators&quot;: 10} modelClient = ModelsClient() with modelClient.start() as run: modelClient.log_params(params)  "},{"title":"ModelsClient.log_metric(self, key, value, step=None)->None​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#modelsclientlog_metricself-key-value-stepnone-none","content":"Log a metric under the current run. 
Parameters key – Metric name (string).value – Metric value (float).step – Metric step (int). Defaults to zero if unspecified. Example from submarine import ModelsClient modelClient = ModelsClient() with modelClient.start() as run: modelClient.log_metric(&quot;mse&quot;, 2500.00)  "},{"title":"ModelsClient.log_metrics(self, metrics, step=None)->None​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#modelsclientlog_metricsself-metrics-stepnone-none","content":"Log multiple metrics for the current run. Parameters metrics – Dictionary of metric_name: String -&gt; value: Float.step – A single integer step at which to log the specified Metrics. If unspecified, each metric is logged at step zero. Example from submarine import ModelsClient metrics = {&quot;mse&quot;: 2500.00, &quot;rmse&quot;: 50.00} modelClient = ModelsClient() with modelClient.start() as run: modelClient.log_metrics(metrics)  "},{"title":"(Beta) ModelsClient.save_model(self, model_type, model, artifact_path, registered_model_name=None)​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#beta-modelsclientsave_modelself-model_type-model-artifact_path-registered_model_namenone","content":"Save model to model registry. "},{"title":"(Beta) ModelsClient.load_model(self, name, version)->mlflow.pyfunc.PyFuncModel​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#beta-modelsclientload_modelself-name-version-mlflowpyfuncpyfuncmodel","content":"Load a model from model registry. "},{"title":"(Beta) ModelsClient.update_model(self, name, new_name)->None​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#beta-modelsclientupdate_modelself-name-new_name-none","content":"Update a model by new name. 
"},{"title":"(Beta) ModelsClient.delete_model(self, name, version)->None​","type":1,"pageTitle":"Model Client","url":"docs/0.6.0/userDocs/submarine-sdk/model-client#beta-modelsclientdelete_modelself-name-version-none","content":"Delete a model in model registry. "},{"title":"Experiment Client","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client","content":"","keywords":""},{"title":"class ExperimentClient()​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#class-experimentclient","content":"Client of a submarine server that creates and manages experients and logs. "},{"title":"create_experiment(experiment_spec: json) -> dict​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#create_experimentexperiment_spec-json---dict","content":"Create an experiment. Parameters experiment_spec: Submarine experiment spec. More detailed information can be found at Experiment API. Returns: The detailed info about the submarine experiment. 
Example from submarine import * client = ExperimentClient() client.create_experiment({ &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } } })  "},{"title":"patch_experiment(id: str, experiment_spec: json) -> dict​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#patch_experimentid-str-experiment_spec-json---dict","content":"Patch an experiment. Parameters id: Submarine experiment id. experiment_spec: Submarine experiment spec. More detailed information can be found at Experiment API. Returns The detailed info about the submarine experiment. 
Example client.patch_experiment(&quot;experiment_1626160071451_0008&quot;, { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Worker&quot;: { &quot;replicas&quot;: 2, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } } })  "},{"title":"get_experiment(id: str) -> dict​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#get_experimentid-str---dict","content":"Get the experiment's detailed info by id. Parameters id: Submarine experiment id. Returns The detailed info about the submarine experiment. Example experiment = client.get_experiment(&quot;experiment_1626160071451_0008&quot;)  "},{"title":"list_experiments(status: Optional[str]=None) -> list[dict]​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#list_experimentsstatus-optionalstrnone---listdict","content":"List all experiments for the user. Parameters status: One of Accepted, Created, Running, Succeeded, Deleted. Returns List of submarine experiments. Example experiments = client.list_experiments()  "},{"title":"delete_experiment(id: str) -> dict​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#delete_experimentid-str---dict","content":"Delete the submarine experiment. Parameters id: Submarine experiment id. Returns The detailed info about the deleted submarine experiment. 
Example client.delete_experiment(&quot;experiment_1626160071451_0008&quot;)  "},{"title":"get_log(id: str, onlyMaster: Optional[bool]=False) -> None​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#get_logid-str-onlymaster-optionalboolfalse---none","content":"Print the training logs of the experiment's pods. By default, the logs of all pods are printed. Parameters id: Submarine experiment id. onlyMaster: Whether to print only the log of the &quot;master&quot; pod, which might be the TensorFlow PS/Chief or PyTorch master. Defaults to False. Returns The info of pod logs. Example client.get_log(&quot;experiment_1626160071451_0009&quot;)  "},{"title":"list_log(status: str) -> list[dict]​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#list_logstatus-str---listdict","content":"List experiment logs. Parameters status: One of Accepted, Created, Running, Succeeded, Deleted. Returns List of submarine experiment logs. Example logs = client.list_log(&quot;Succeeded&quot;)  "},{"title":"wait_for_finish(id: str, polling_interval: Optional[int]=10) -> dict​","type":1,"pageTitle":"Experiment Client","url":"docs/0.6.0/userDocs/submarine-sdk/experiment-client#wait_for_finishid-str-polling_interval-optionalint10---dict","content":"Wait until the experiment has finished or failed. Parameters id: Submarine experiment id. polling_interval: Number of seconds between two polls for the status of the experiment. Returns Submarine experiment logs. 
Example logs = client.wait_for_finish(&quot;experiment_1626160071451_0009&quot;, 5)  "},{"title":"Tracking","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/submarine-sdk/tracking","content":"","keywords":""},{"title":"Functions​","type":1,"pageTitle":"Tracking","url":"docs/0.6.0/userDocs/submarine-sdk/tracking#functions","content":""},{"title":"submarine.get_tracking_uri() -> str​","type":1,"pageTitle":"Tracking","url":"docs/0.6.0/userDocs/submarine-sdk/tracking#submarineget_tracking_uri---str","content":"Get the tracking URI. If none has been specified, check the environment variables. If the URI is still unset, return the default Submarine JDBC URL. Returns The tracking URI. "},{"title":"submarine.set_tracking_uri(uri: str) -> None​","type":1,"pageTitle":"Tracking","url":"docs/0.6.0/userDocs/submarine-sdk/tracking#submarineset_tracking_uriuri-str---none","content":"Set the tracking URI. You can also set the SUBMARINE_TRACKING_URI environment variable to have Submarine find a URI from there. The URI should be a database connection string. Parameters uri - Submarine records data to a MySQL server. The database URL is expected in the format &lt;dialect&gt;+&lt;driver&gt;://&lt;username&gt;:&lt;password&gt;@&lt;host&gt;:&lt;port&gt;/&lt;database&gt;. By default it's mysql+pymysql://submarine:password@submarine-database:3306/submarine. More detail: SQLAlchemy docs "},{"title":"submarine.log_param(key: str, value: str) -> None​","type":1,"pageTitle":"Tracking","url":"docs/0.6.0/userDocs/submarine-sdk/tracking#submarinelog_paramkey-str-value-str---none","content":"Log a single key-value parameter. The key and value are both strings. Parameters key - Parameter name. value - Parameter value. "},{"title":"submarine.log_metric(key: str, value: float, step=0) -> None​","type":1,"pageTitle":"Tracking","url":"docs/0.6.0/userDocs/submarine-sdk/tracking#submarinelog_metrickey-str-value-float-step0---none","content":"Log a single key-value metric. The value must always be a number. 
Parameters key - Metric name. value - Metric value. step - A single integer step at which to log the specified metric; defaults to 0. "},{"title":"Python SDK Development","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/submarine-sdk/pysubmarine/development","content":"","keywords":""},{"title":"Prerequisites​","type":1,"pageTitle":"Python SDK Development","url":"docs/0.6.0/userDocs/submarine-sdk/pysubmarine/development#prerequisites","content":"This is required for developing &amp; testing changes. We recommend installing pysubmarine in its own conda environment by running the following: conda create --name submarine-dev python=3.6 conda activate submarine-dev # Install auto-format and lints (lint-requirements.txt is in ./dev-support/style-check/python) pip install -r lint-requirements.txt # Install mypy (mypy-requirements.txt is in ./dev-support/style-check/python) pip install -r mypy-requirements.txt # test-requirements.txt is in ./submarine-sdk/pysubmarine/github-actions pip install -r test-requirements.txt # Installs pysubmarine from current checkout pip install ./submarine-sdk/pysubmarine  "},{"title":"PySubmarine Docker​","type":1,"pageTitle":"Python SDK Development","url":"docs/0.6.0/userDocs/submarine-sdk/pysubmarine/development#pysubmarine-docker","content":"We also use Docker to provide build environments for CI, development, and generating the Python SDK from Swagger. 
./run-pysubmarine-ci.sh  The script does the following things: Start an interactive bash session. Mount the submarine directory to /workspace and set it as home. Switch to the same user that calls run-pysubmarine-ci.sh. "},{"title":"Coding Style​","type":1,"pageTitle":"Python SDK Development","url":"docs/0.6.0/userDocs/submarine-sdk/pysubmarine/development#coding-style","content":"Use isort to sort the Python imports and black to format Python code. Both styles are configured in pyproject.toml. To autoformat code: ./dev-support/style-check/python/auto-format.sh  Use flake8 as the linter; its configuration is in .flake8. We also use mypy to check static types in submarine-sdk/pysubmarine/submarine. Verify that the linter passes before submitting a pull request by running: ./dev-support/style-check/python/lint.sh  If you encounter an unexpected format, use the following method: # fmt: off &quot;Unexpected format, formatted by yourself&quot; # fmt: on  "},{"title":"Unit Testing​","type":1,"pageTitle":"Python SDK Development","url":"docs/0.6.0/userDocs/submarine-sdk/pysubmarine/development#unit-testing","content":"We use pytest to develop our unit test suite. After building the project (see below) you can run its unit tests like so: cd submarine-sdk/pysubmarine  Run unit tests: pytest --cov=submarine -vs -m &quot;not e2e&quot;  Run integration tests: pytest --cov=submarine -vs -m &quot;e2e&quot;  Before running the integration tests locally, make sure the Submarine server is running. "},{"title":"Generate python SDK from swagger​","type":1,"pageTitle":"Python SDK Development","url":"docs/0.6.0/userDocs/submarine-sdk/pysubmarine/development#generate-python-sdk-from-swagger","content":"We use open-api generator to generate the pysubmarine client API that is used to communicate with the Submarine server. If you change the files below, please run ./dev-support/pysubmarine/gen-sdk.sh to generate the latest version of the SDK. 
Bootstrap.java, ExperimentRestApi.java "},{"title":"Model Management Model Development​","type":1,"pageTitle":"Python SDK Development","url":"docs/0.6.0/userDocs/submarine-sdk/pysubmarine/development#model-management-model-development","content":"For local development, we can easily access the cluster's services thanks to telepresence. That is, we can develop the SDK locally while reaching the MLflow server through a proxy. Install telepresence by following its installation instructions. Start the proxy pod: telepresence --new-deployment submarine-dev  You can then develop as if you were in the cluster. "},{"title":"Upload package to PyPi​","type":1,"pageTitle":"Python SDK Development","url":"docs/0.6.0/userDocs/submarine-sdk/pysubmarine/development#upload-package-to-pypi","content":"For Apache Submarine committers and PMC members making a new release. Change the version from 0.x.x-SNAPSHOT to 0.x.x in setup.py. Install Python packages: cd submarine-sdk/pysubmarine pip install -r github-actions/pypi-requirements.txt  Compile your package. This creates build, dist, and project.egg.info in your local directory: python setup.py bdist_wheel  Upload the Python package to TestPyPI for testing: python -m twine upload --repository testpypi dist/*  Upload the Python package to PyPI: python -m twine upload --repository-url https://upload.pypi.org/legacy/ dist/*  "},{"title":"Building Submarine Spark Security Plugin","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/submarine-security/spark-security/build-submarine-spark-security-plugin","content":"Building Submarine Spark Security Plugin Submarine Spark Security Plugin is built using Apache Maven. To build it, cd to the root directory of the Submarine project and run: mvn clean package -Dmaven.javadoc.skip=true -DskipTests -pl :submarine-spark-security By default, Submarine Spark Security Plugin is built against Apache Spark 2.3.x and Apache Ranger 1.1.0, which may be incompatible with other Apache Spark or Apache Ranger releases. 
Currently, available profiles are: Spark: -Pspark-2.3, -Pspark-2.4, -Pspark-3.0 Ranger: -Pranger-1.2, -Pranger-2.0","keywords":""},{"title":"Submarine Spark Security Plugin","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/submarine-security/spark-security/","content":"","keywords":""},{"title":"Build​","type":1,"pageTitle":"Submarine Spark Security Plugin","url":"docs/0.6.0/userDocs/submarine-security/spark-security/#build","content":"Please refer to the online documentation - Building Submarine Spark Security Plugin "},{"title":"Quick Start​","type":1,"pageTitle":"Submarine Spark Security Plugin","url":"docs/0.6.0/userDocs/submarine-security/spark-security/#quick-start","content":"Three steps to integrate Apache Spark and Apache Ranger. "},{"title":"Installation​","type":1,"pageTitle":"Submarine Spark Security Plugin","url":"docs/0.6.0/userDocs/submarine-security/spark-security/#installation","content":"Place the submarine-spark-security-&lt;version&gt;.jar into $SPARK_HOME/jars. "},{"title":"Configurations​","type":1,"pageTitle":"Submarine Spark Security Plugin","url":"docs/0.6.0/userDocs/submarine-security/spark-security/#configurations","content":"Settings for Apache Ranger​ Create ranger-spark-security.xml in $SPARK_HOME/conf and add the following configurations to point to the right Apache Ranger admin server.  
&lt;configuration&gt; &lt;property&gt; &lt;name&gt;ranger.plugin.spark.policy.rest.url&lt;/name&gt; &lt;value&gt;ranger admin address like http://ranger-admin.org:6080&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;ranger.plugin.spark.service.name&lt;/name&gt; &lt;value&gt;a ranger hive service name&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;ranger.plugin.spark.policy.cache.dir&lt;/name&gt; &lt;value&gt;./a ranger hive service name/policycache&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;ranger.plugin.spark.policy.pollIntervalMs&lt;/name&gt; &lt;value&gt;5000&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;ranger.plugin.spark.policy.source.impl&lt;/name&gt; &lt;value&gt;org.apache.ranger.admin.client.RangerAdminRESTClient&lt;/value&gt; &lt;/property&gt; &lt;/configuration&gt;  Create ranger-spark-audit.xml in $SPARK_HOME/conf and add the following configurations to enable/disable auditing. &lt;configuration&gt; &lt;property&gt; &lt;name&gt;xasecure.audit.is.enabled&lt;/name&gt; &lt;value&gt;true&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;xasecure.audit.destination.db&lt;/name&gt; &lt;value&gt;false&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;xasecure.audit.destination.db.jdbc.driver&lt;/name&gt; &lt;value&gt;com.mysql.jdbc.Driver&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;xasecure.audit.destination.db.jdbc.url&lt;/name&gt; &lt;value&gt;jdbc:mysql://10.171.161.78/ranger&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;xasecure.audit.destination.db.password&lt;/name&gt; &lt;value&gt;rangeradmin&lt;/value&gt; &lt;/property&gt; &lt;property&gt; &lt;name&gt;xasecure.audit.destination.db.user&lt;/name&gt; &lt;value&gt;rangeradmin&lt;/value&gt; &lt;/property&gt; &lt;/configuration&gt;  Settings for Apache Spark​ You can configure spark.sql.extensions with the *Extension we provided. 
For example, spark.sql.extensions=org.apache.submarine.spark.security.api.RangerSparkAuthzExtension Currently, you can set the following options to spark.sql.extensions to choose authorization w/ or w/o extra functions. option\tauthorization\trow filtering\tdata maskingorg.apache.submarine.spark.security.api.RangerSparkAuthzExtension\t√\t×\t× org.apache.submarine.spark.security.api.RangerSparkSQLExtension\t√\t√\t√ "},{"title":"Write Dockerfiles for Submarine","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/yarn/Dockerfiles","content":"Write Dockerfiles for Submarine How to write Dockerfile for Submarine TensorFlow jobs How to write Dockerfile for Submarine PyTorch jobs How to write Dockerfile for Submarine MXNet jobs","keywords":""},{"title":"Test and Troubleshooting","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/yarn/TestAndTroubleshooting","content":"","keywords":""},{"title":"Test with a tensorflow job​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/userDocs/yarn/TestAndTroubleshooting#test-with-a-tensorflow-job","content":"Distributed-shell + GPU + cgroup  ... 
\\ job run \\ --env DOCKER_JAVA_HOME=/opt/java \\ --env DOCKER_HADOOP_HDFS_HOME=/hadoop-current --name distributed-tf-gpu \\ --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \\ --worker_docker_image tf-1.13.1-gpu:0.0.1 \\ --ps_docker_image tf-1.13.1-cpu:0.0.1 \\ --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \\ --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-distributed-checkpoint \\ --num_ps 0 \\ --ps_resources memory=4G,vcores=2,gpu=0 \\ --ps_launch_cmd &quot;python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --num-gpus=0&quot; \\ --worker_resources memory=4G,vcores=2,gpu=1 --verbose \\ --num_workers 1 \\ --worker_launch_cmd &quot;python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=1&quot;  "},{"title":"Issues:​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/userDocs/yarn/TestAndTroubleshooting#issues","content":""},{"title":"Issue 1: Fail to start nodemanager after system reboot​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/userDocs/yarn/TestAndTroubleshooting#issue-1-fail-to-start-nodemanager-after-system-reboot","content":"2018-09-20 18:54:39,785 ERROR org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems! 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Unexpected: Cannot create yarn cgroup Subsystem:cpu Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:377) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:98) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:87) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:389) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:929) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:997) 2018-09-20 18:54:39,789 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED  Solution: Grant user yarn the access to /sys/fs/cgroup/cpu,cpuacct, which is the subfolder of cgroup mount destination. 
chown :yarn -R /sys/fs/cgroup/cpu,cpuacct chmod g+rwx -R /sys/fs/cgroup/cpu,cpuacct  If GPUs are used, access to the cgroup devices folder is needed as well: chown :yarn -R /sys/fs/cgroup/devices chmod g+rwx -R /sys/fs/cgroup/devices  "},{"title":"Issue 2: container-executor permission denied​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/userDocs/yarn/TestAndTroubleshooting#issue-2-container-executor-permission-denied","content":"2018-09-21 09:36:26,102 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: IOException executing command: java.io.IOException: Cannot run program &quot;/etc/yarn/sbin/Linux-amd64-64/container-executor&quot;: error=13, Permission denied at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) at org.apache.hadoop.util.Shell.runCommand(Shell.java:938) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)  Solution: The permission of /etc/yarn/sbin/Linux-amd64-64/container-executor should be 6050. "},{"title":"Issue 3：How to get docker service log​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/userDocs/yarn/TestAndTroubleshooting#issue-3how-to-get-docker-service-log","content":"Solution: we can get the Docker service log with the following command: journalctl -u docker  "},{"title":"Issue 4：docker can't remove containers with errors like device or resource busy​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/userDocs/yarn/TestAndTroubleshooting#issue-4docker-cant-remove-containers-with-errors-like-device-or-resource-busy","content":"$ docker rm 0bfafa146431 Error response from daemon: Unable to remove filesystem for 0bfafa146431771f6024dcb9775ef47f170edb2f1852f71916ba44209ca6120a: remove /app/docker/containers/0bfafa146431771f6024dcb9775ef47f170edb2f152f71916ba44209ca6120a/shm: device or resource busy  Solution: to find which process leads to a device or
resource busy, we can add a shell script named find-busy-mnt.sh: #!/usr/bin/env bash # A simple script to get information about mount points and pids and their # mount namespaces. if [ $# -ne 1 ];then echo &quot;Usage: $0 &lt;devicemapper-device-id&gt;&quot; exit 1 fi ID=$1 MOUNTS=`find /proc/*/mounts | xargs grep $ID 2&gt;/dev/null` [ -z &quot;$MOUNTS&quot; ] &amp;&amp; echo &quot;No pids found&quot; &amp;&amp; exit 0 printf &quot;PID\\tNAME\\t\\tMNTNS\\n&quot; echo &quot;$MOUNTS&quot; | while read LINE; do PID=`echo $LINE | cut -d &quot;:&quot; -f1 | cut -d &quot;/&quot; -f3` # Ignore self and thread-self if [ &quot;$PID&quot; == &quot;self&quot; ] || [ &quot;$PID&quot; == &quot;thread-self&quot; ]; then continue fi NAME=`ps -q $PID -o comm=` MNTNS=`readlink /proc/$PID/ns/mnt` printf &quot;%s\\t%s\\t\\t%s\\n&quot; &quot;$PID&quot; &quot;$NAME&quot; &quot;$MNTNS&quot; done  Then kill the process using the PID reported by the script: $ chmod +x find-busy-mnt.sh ./find-busy-mnt.sh 0bfafa146431771f6024dcb9775ef47f170edb2f152f71916ba44209ca6120a # PID NAME MNTNS # 5007 ntpd mnt:[4026533598] $ kill -9 5007  "},{"title":"Issue 5：Yarn failed to start containers​","type":1,"pageTitle":"Test and Troubleshooting","url":"docs/0.6.0/userDocs/yarn/TestAndTroubleshooting#issue-5yarn-failed-to-start-containers","content":"If the number of GPUs required by applications is larger than the number of GPUs in the cluster, some containers cannot be created. 
"},{"title":"Docker Images for MXNet","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileMX","content":"","keywords":""},{"title":"How to create docker images to run MXNet on YARN​","type":1,"pageTitle":"Docker Images for MXNet","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileMX#how-to-create-docker-images-to-run-mxnet-on-yarn","content":"Dockerfile to run MXNet on YARN needs two parts: Base libraries which MXNet depends on 1) OS base image, for example ubuntu:18.04 2) MXNet dependent libraries and packages. \\ For example python, scipy. For GPU support, you also need cuda, cudnn, etc. 3) MXNet package. Libraries to access HDFS 1) JDK 2) Hadoop Here's an example of a base image (without GPU support) to install MXNet: FROM ubuntu:18.04 # Install some development tools and packages # MXNet 1.6 is going to be the last MXNet release to support Python2 RUN apt-get update &amp;&amp; DEBIAN_FRONTEND=noninteractive apt-get install -y tzdata git \\ wget zip python3 python3-pip python3-distutils libgomp1 libopenblas-dev libopencv-dev # Install latest MXNet using pip (without GPU support) RUN pip3 install mxnet RUN echo &quot;Install python related packages&quot; &amp;&amp; \\ pip3 install --user graphviz==0.8.4 ipykernel jupyter matplotlib numpy pandas scipy sklearn &amp;&amp; \\ python3 -m ipykernel.kernelspec  On top of above image, add files, install packages to access HDFS ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 RUN apt-get update &amp;&amp; apt-get install -y openjdk-8-jdk wget # Install hadoop ENV HADOOP_VERSION=&quot;3.1.2&quot; RUN wget https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz # If you are in mainland China, you can use the following command. 
# RUN wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz RUN tar zxf hadoop-${HADOOP_VERSION}.tar.gz RUN ln -s hadoop-${HADOOP_VERSION} hadoop-current RUN rm hadoop-${HADOOP_VERSION}.tar.gz  Build and push to your own docker registry: Use docker build ... and docker push ... to finish this step. "},{"title":"Use examples to build your own MXNet docker images​","type":1,"pageTitle":"Docker Images for MXNet","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileMX#use-examples-to-build-your-own-mxnet-docker-images","content":"We provided some example Dockerfiles for you to build your own MXNet docker images. For latest MXNet docker/mxnet/base/ubuntu-18.04/Dockerfile.cpu.mxnet_latest: Latest MXNet that supports CPUdocker/mxnet/base/ubuntu-18.04/Dockerfile.gpu.mxnet_latest: Latest MXNet that supports GPU, which is prebuilt to CUDA10. Build Docker images "},{"title":"Manually build Docker image:​","type":1,"pageTitle":"Docker Images for MXNet","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileMX#manually-build-docker-image","content":"Under docker/mxnet directory, run build-all.sh to build all Docker images. This command will build the following Docker images: mxnet-latest-cpu-base:0.0.1 for base Docker image which includes Hadoop, MXNetmxnet-latest-gpu-base:0.0.1 for base Docker image which includes Hadoop, MXNet, GPU base libraries. "},{"title":"Docker Images for PyTorch","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/yarn/WriteDockerfilePT","content":"","keywords":""},{"title":"How to create docker images to run PyTorch on YARN​","type":1,"pageTitle":"Docker Images for PyTorch","url":"docs/0.6.0/userDocs/yarn/WriteDockerfilePT#how-to-create-docker-images-to-run-pytorch-on-yarn","content":"Dockerfile to run PyTorch on YARN needs two parts: Base libraries which PyTorch depends on 1) OS base image, for example ubuntu:18.04 2) PyTorch dependent libraries and packages. For example python, scipy. 
For GPU support, you also need cuda, cudnn, etc. 3) PyTorch package. Libraries to access HDFS 1) JDK 2) Hadoop Here's an example of a base image (with GPU support) to install PyTorch: FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 ARG PYTHON_VERSION=3.6 RUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \\ build-essential \\ cmake \\ git \\ curl \\ vim \\ ca-certificates \\ libjpeg-dev \\ libpng-dev \\ wget &amp;&amp;\\ rm -rf /var/lib/apt/lists/* RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh &amp;&amp; \\ chmod +x ~/miniconda.sh &amp;&amp; \\ ~/miniconda.sh -b -p /opt/conda &amp;&amp; \\ rm ~/miniconda.sh &amp;&amp; \\ /opt/conda/bin/conda install -y python=$PYTHON_VERSION numpy pyyaml scipy ipython mkl mkl-include cython typing &amp;&amp; \\ /opt/conda/bin/conda install -y -c pytorch magma-cuda100 &amp;&amp; \\ /opt/conda/bin/conda clean -ya ENV PATH /opt/conda/bin:$PATH RUN pip install ninja # This must be done before pip so that requirements.txt is available WORKDIR /opt/pytorch RUN git clone https://github.com/pytorch/pytorch.git WORKDIR pytorch RUN git submodule update --init RUN TORCH_CUDA_ARCH_LIST=&quot;3.5 5.2 6.0 6.1 7.0+PTX&quot; TORCH_NVCC_FLAGS=&quot;-Xfatbin -compress-all&quot; \\ CMAKE_PREFIX_PATH=&quot;$(dirname $(which conda))/../&quot; \\ pip install -v . WORKDIR /opt/pytorch RUN git clone https://github.com/pytorch/vision.git &amp;&amp; cd vision &amp;&amp; pip install -v .  On top of above image, add files, install packages to access HDFS RUN apt-get update &amp;&amp; apt-get install -y openjdk-8-jdk wget # Install hadoop ENV HADOOP_VERSION=&quot;2.9.2&quot; RUN wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz RUN tar zxf hadoop-${HADOOP_VERSION}.tar.gz RUN ln -s hadoop-${HADOOP_VERSION} hadoop-current RUN rm hadoop-${HADOOP_VERSION}.tar.gz  Build and push to your own docker registry: Use docker build ... 
and docker push ... to finish this step. "},{"title":"Use examples to build your own PyTorch docker images​","type":1,"pageTitle":"Docker Images for PyTorch","url":"docs/0.6.0/userDocs/yarn/WriteDockerfilePT#use-examples-to-build-your-own-pytorch-docker-images","content":"We provide some example Dockerfiles for you to build your own PyTorch Docker images. For latest PyTorch: docker/pytorch/base/ubuntu-18.04/Dockerfile.gpu.pytorch_latest: Latest PyTorch that supports GPU, which is prebuilt against CUDA 10. docker/pytorch/with-cifar10-models/ubuntu-18.04/Dockerfile.gpu.pytorch_latest: Latest PyTorch that supports GPU, which is prebuilt against CUDA 10, with models. "},{"title":"Build Docker images​","type":1,"pageTitle":"Docker Images for PyTorch","url":"docs/0.6.0/userDocs/yarn/WriteDockerfilePT#build-docker-images","content":""},{"title":"Manually build Docker image:​","type":1,"pageTitle":"Docker Images for PyTorch","url":"docs/0.6.0/userDocs/yarn/WriteDockerfilePT#manually-build-docker-image","content":"Under the docker/pytorch directory, run build-all.sh to build all Docker images. This command will build the following Docker images: pytorch-latest-gpu-base:0.0.1, the base Docker image, which includes Hadoop, PyTorch, and GPU base libraries. pytorch-latest-gpu:0.0.1, which includes the cifar10 model as well. "},{"title":"Use prebuilt images​","type":1,"pageTitle":"Docker Images for PyTorch","url":"docs/0.6.0/userDocs/yarn/WriteDockerfilePT#use-prebuilt-images","content":"(No liability) You can also use prebuilt images for convenience: hadoopsubmarine/pytorch-latest-gpu-base:0.0.1 "},{"title":"README","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/","content":"","keywords":""},{"title":"Prerequisite​","type":1,"pageTitle":"README","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/#prerequisite","content":"Install TensorFlow version 1.2.1 or later. 
Download the CIFAR-10 dataset and generate TFRecord files using the provided script. The script and associated command below will download the CIFAR-10 dataset and then generate a TFRecord for the training, validation, and evaluation datasets. python generate_cifar10_tfrecords.py --data-dir=${PWD}/cifar-10-data  After running the command above, you should see the following files in the --data-dir (ls -R cifar-10-data): train.tfrecords, validation.tfrecords, eval.tfrecords "},{"title":"Training on a single machine with GPUs or CPU​","type":1,"pageTitle":"README","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/#training-on-a-single-machine-with-gpus-or-cpu","content":"Run the training on CPU only. After training, it runs the evaluation. python cifar10_main.py --data-dir=${PWD}/cifar-10-data \\ --job-dir=/tmp/cifar10 \\ --num-gpus=0 \\ --train-steps=1000  Run the model on 2 GPUs using CPU as parameter server. After training, it runs the evaluation. python cifar10_main.py --data-dir=${PWD}/cifar-10-data \\ --job-dir=/tmp/cifar10 \\ --num-gpus=2 \\ --train-steps=1000  Run the model on 2 GPUs using GPU as parameter server. It will run an experiment, which for a local setting basically means it will stop training a couple of times to perform evaluation. python cifar10_main.py --data-dir=${PWD}/cifar-10-data \\ --job-dir=/tmp/cifar10 \\ --variable-strategy GPU \\ --num-gpus=2  There are more command line flags to play with; run python cifar10_main.py --help for details. 
"},{"title":"Run distributed training​","type":1,"pageTitle":"README","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/#run-distributed-training","content":""},{"title":"(Optional) Running on Google Cloud Machine Learning Engine​","type":1,"pageTitle":"README","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/#optional-running-on-google-cloud-machine-learning-engine","content":"This example can be run on Google Cloud Machine Learning Engine (ML Engine), which will configure the environment and take care of running workers, parameter servers, and masters in a fault-tolerant way. To install the command line tool, and set up a project and billing, see the quickstart here. You'll also need a Google Cloud Storage bucket for the data. If you followed the instructions above, you can just run: MY_BUCKET=gs://&lt;my-bucket-name&gt; gsutil cp -r ${PWD}/cifar-10-data $MY_BUCKET/  Then run the following command from the tutorials/image directory of this repository (the parent directory of this README): gcloud ml-engine jobs submit training cifarmultigpu \\ --runtime-version 1.2 \\ --job-dir=$MY_BUCKET/model_dirs/cifarmultigpu \\ --config cifar10_estimator/cmle_config.yaml \\ --package-path cifar10_estimator/ \\ --module-name cifar10_estimator.cifar10_main \\ -- \\ --data-dir=$MY_BUCKET/cifar-10-data \\ --num-gpus=4 \\ --train-steps=1000  "},{"title":"Set TF_CONFIG​","type":1,"pageTitle":"README","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/#set-tf_config","content":"Assuming that you already have multiple hosts configured, all you need is a TF_CONFIG environment variable on each host. You can set up the hosts manually or check tensorflow/ecosystem for instructions about how to set up a cluster. 
The TF_CONFIG will be used by the RunConfig to know the existing hosts and their task: master, ps or worker. Here's an example of TF_CONFIG. cluster = {'master': ['master-ip:8000'], 'ps': ['ps-ip:8000'], 'worker': ['worker-ip:8000']} TF_CONFIG = json.dumps( {'cluster': cluster, 'task': {'type': 'master', 'index': 0}, 'model_dir': 'gs://&lt;bucket_path&gt;/&lt;dir_path&gt;', 'environment': 'cloud' })  Cluster A cluster spec, which is basically a dictionary that describes all of the tasks in the cluster. More about it here. In this cluster spec we are defining a cluster with 1 master, 1 ps and 1 worker. ps: saves the parameters among all workers. All workers can read/write/update the model parameters via ps. Because some models are extremely large, the parameters are split among the ps tasks (each ps stores a subset). worker: does the training. master: basically a special worker; it does training, but also restores and saves checkpoints and runs evaluation. Task The task defines the role of the current node. In this example the node is the master at index 0 of the cluster spec; the task is different for each node. An example of the TF_CONFIG for a worker would be: cluster = {'master': ['master-ip:8000'], 'ps': ['ps-ip:8000'], 'worker': ['worker-ip:8000']} TF_CONFIG = json.dumps( {'cluster': cluster, 'task': {'type': 'worker', 'index': 0}, 'model_dir': 'gs://&lt;bucket_path&gt;/&lt;dir_path&gt;', 'environment': 'cloud' })  Model_dir This is the path where the master will save the checkpoints, graph and TensorBoard files. For a multi-host environment you may want to use a distributed file system; Google Storage and DFS are supported. Environment By default the environment is local; for a distributed setting, change it to cloud. 
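As a rough sketch of the per-host setup described above, the snippet below builds the TF_CONFIG value for one node and exports it through os.environ before the training script would be started. The helper name build_tf_config and the gs://bucket/dir path are illustrative assumptions, not part of the example scripts; the host names are the same placeholders used above.

```python
import json
import os


def build_tf_config(task_type, task_index, model_dir):
    # Hypothetical helper: returns the JSON string for one node's TF_CONFIG.
    # The cluster layout mirrors the 1 master / 1 ps / 1 worker example above.
    cluster = {
        'master': ['master-ip:8000'],
        'ps': ['ps-ip:8000'],
        'worker': ['worker-ip:8000'],
    }
    return json.dumps({
        'cluster': cluster,
        # The task type is a string and must be a key of the cluster dict.
        'task': {'type': task_type, 'index': task_index},
        'model_dir': model_dir,
        'environment': 'cloud',
    })


# On the worker host: export the value so a process started afterwards can read it.
# The bucket path is a placeholder assumption.
os.environ['TF_CONFIG'] = build_tf_config('worker', 0, 'gs://bucket/dir')
print(os.environ['TF_CONFIG'])
```

Only the task entry changes from host to host; the cluster, model_dir and environment entries stay the same on every node.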
"},{"title":"Running script​","type":1,"pageTitle":"README","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/#running-script","content":"Once you have a TF_CONFIG configured properly on each host you're ready to run on distributed settings. Master​ Run this on master: Runs an Experiment in sync mode on 4 GPUs using CPU as parameter server for 40000 steps. It will run evaluation a couple of times during training. The num_workers argument is used only to update the learning rate correctly. Make sure the model_dir is the same as defined on the TF_CONFIG. python cifar10_main.py --data-dir=gs://path/cifar-10-data \\ --job-dir=gs://path/model_dir/ \\ --num-gpus=4 \\ --train-steps=40000 \\ --sync \\ --num-workers=2  Output: INFO:tensorflow:Using model_dir in TF_CONFIG: gs://path/model_dir/ INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 1, '_keep_checkpoint_max': 5, '_task_type': u'master', '_is_chief': True, '_cluster_spec': &lt;tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd16fb2be10&gt;, '_model_dir': 'gs://path/model_dir/', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': intra_op_parallelism_threads: 1 gpu_options { } allow_soft_placement: true , '_tf_random_seed': None, '_environment': u'cloud', '_num_worker_replicas': 1, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options { per_process_gpu_memory_fraction: 1.0 } , '_evaluation_master': '', '_master': u'grpc://master-ip:8000'} ... 
2017-08-01 19:59:26.496208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate (GHz) 0.8235 pciBusID 0000:00:04.0 Total memory: 11.17GiB Free memory: 11.09GiB 2017-08-01 19:59:26.775660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate (GHz) 0.8235 pciBusID 0000:00:05.0 Total memory: 11.17GiB Free memory: 11.10GiB ... 2017-08-01 19:59:29.675171: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:8000 INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_1/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_2/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_3/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_4/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_5/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_6/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1/avg_pool/: (?, 16, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_1/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_2/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_3/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_4/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_1/: (?, 32, 16, 16) INFO:tensorflow:image after unit 
resnet/tower_0/stage_1/residual_v1_2/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_3/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_4/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_5/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_6/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1/avg_pool/: (?, 32, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_1/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_2/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_3/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_4/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_5/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_6/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/global_avg_pool/: (?, 64) INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) INFO:tensorflow:SyncReplicasV2: replicas_to_aggregate=1; total_num_replicas=1 INFO:tensorflow:Create CheckpointSaverHook. INFO:tensorflow:Restoring parameters from gs://path/model_dir/model.ckpt-0 2017-08-01 19:59:37.560775: I tensorflow/core/distributed_runtime/master_session.cc:999] Start master session 156fcb55fe6648d6 with config: intra_op_parallelism_threads: 1 gpu_options { per_process_gpu_memory_fraction: 1 } allow_soft_placement: true INFO:tensorflow:Saving checkpoints for 1 into gs://path/model_dir/model.ckpt. 
INFO:tensorflow:loss = 1.20682, step = 1 INFO:tensorflow:loss = 1.20682, learning_rate = 0.1 INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_1/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_2/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_3/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_4/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_5/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_6/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1/avg_pool/: (?, 16, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_1/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_2/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_3/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_4/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_5/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_6/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1/avg_pool/: (?, 32, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_1/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_2/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_3/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_4/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_5/: (?, 
64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_6/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/global_avg_pool/: (?, 64) INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) INFO:tensorflow:SyncReplicasV2: replicas_to_aggregate=2; total_num_replicas=2 INFO:tensorflow:Starting evaluation at 2017-08-01-20:00:14 2017-08-01 20:00:15.745881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -&gt; (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0) 2017-08-01 20:00:15.745949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -&gt; (device: 1, name: Tesla K80, pci bus id: 0000:00:05.0) 2017-08-01 20:00:15.745958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:2) -&gt; (device: 2, name: Tesla K80, pci bus id: 0000:00:06.0) 2017-08-01 20:00:15.745964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:3) -&gt; (device: 3, name: Tesla K80, pci bus id: 0000:00:07.0) 2017-08-01 20:00:15.745969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:4) -&gt; (device: 4, name: Tesla K80, pci bus id: 0000:00:08.0) 2017-08-01 20:00:15.745975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:5) -&gt; (device: 5, name: Tesla K80, pci bus id: 0000:00:09.0) 2017-08-01 20:00:15.745987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:6) -&gt; (device: 6, name: Tesla K80, pci bus id: 0000:00:0a.0) 2017-08-01 20:00:15.745997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:7) -&gt; (device: 7, name: Tesla K80, pci bus id: 0000:00:0b.0) INFO:tensorflow:Restoring parameters from gs://path/model_dir/model.ckpt-10023 INFO:tensorflow:Evaluation [1/100] INFO:tensorflow:Evaluation [2/100] 
INFO:tensorflow:Evaluation [3/100] INFO:tensorflow:Evaluation [4/100] INFO:tensorflow:Evaluation [5/100] INFO:tensorflow:Evaluation [6/100] INFO:tensorflow:Evaluation [7/100] INFO:tensorflow:Evaluation [8/100] INFO:tensorflow:Evaluation [9/100] INFO:tensorflow:Evaluation [10/100] INFO:tensorflow:Evaluation [11/100] INFO:tensorflow:Evaluation [12/100] INFO:tensorflow:Evaluation [13/100] ... INFO:tensorflow:Evaluation [100/100] INFO:tensorflow:Finished evaluation at 2017-08-01-20:00:31 INFO:tensorflow:Saving dict for global step 1: accuracy = 0.0994, global_step = 1, loss = 630.425  Worker​ Run this on worker: Runs an Experiment in sync mode on 4 GPUs using CPU as parameter server for 40000 steps. It will run evaluation a couple of times during training. Make sure the model_dir is the same as defined on the TF_CONFIG. python cifar10_main.py --data-dir=gs://path/cifar-10-data \\ --job-dir=gs://path/model_dir/ \\ --num-gpus=4 \\ --train-steps=40000 \\ --sync  Output: INFO:tensorflow:Using model_dir in TF_CONFIG: gs://path/model_dir/ INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 1, '_keep_checkpoint_max': 5, '_task_type': u'worker', '_is_chief': False, '_cluster_spec': &lt;tensorflow.python.training.server_lib.ClusterSpec object at 0x7f6918438e10&gt;, '_model_dir': 'gs://&lt;path&gt;/model_dir/', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': intra_op_parallelism_threads: 1 gpu_options { } allow_soft_placement: true , '_tf_random_seed': None, '_environment': u'cloud', '_num_worker_replicas': 1, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options { per_process_gpu_memory_fraction: 1.0 } ... 
2017-08-01 19:59:26.496208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate (GHz) 0.8235 pciBusID 0000:00:04.0 Total memory: 11.17GiB Free memory: 11.09GiB 2017-08-01 19:59:26.775660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate (GHz) 0.8235 pciBusID 0000:00:05.0 Total memory: 11.17GiB Free memory: 11.10GiB ... 2017-08-01 19:59:29.675171: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:8000 INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_1/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_2/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_3/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_4/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_5/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage/residual_v1_6/: (?, 16, 32, 32) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1/avg_pool/: (?, 16, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_1/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_2/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_3/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_4/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_1/: (?, 32, 16, 16) INFO:tensorflow:image after unit 
resnet/tower_0/stage_1/residual_v1_2/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_3/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_4/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_5/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_1/residual_v1_6/: (?, 32, 16, 16) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1/avg_pool/: (?, 32, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_1/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_2/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_3/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_4/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_5/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/stage_2/residual_v1_6/: (?, 64, 8, 8) INFO:tensorflow:image after unit resnet/tower_0/global_avg_pool/: (?, 64) INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) INFO:tensorflow:SyncReplicasV2: replicas_to_aggregate=2; total_num_replicas=2 INFO:tensorflow:Create CheckpointSaverHook. 
2017-07-31 22:38:04.629150: I tensorflow/core/distributed_runtime/master.cc:209] CreateSession still waiting for response from worker: /job:master/replica:0/task:0 2017-07-31 22:38:09.263492: I tensorflow/core/distributed_runtime/master_session.cc:999] Start master session cc58f93b1e259b0c with config: intra_op_parallelism_threads: 1 gpu_options { per_process_gpu_memory_fraction: 1 } allow_soft_placement: true INFO:tensorflow:loss = 5.82382, step = 0 INFO:tensorflow:loss = 5.82382, learning_rate = 0.8 INFO:tensorflow:Average examples/sec: 1116.92 (1116.92), step = 10 INFO:tensorflow:Average examples/sec: 1233.73 (1377.83), step = 20 INFO:tensorflow:Average examples/sec: 1485.43 (2509.3), step = 30 INFO:tensorflow:Average examples/sec: 1680.27 (2770.39), step = 40 INFO:tensorflow:Average examples/sec: 1825.38 (2788.78), step = 50 INFO:tensorflow:Average examples/sec: 1929.32 (2697.27), step = 60 INFO:tensorflow:Average examples/sec: 2015.17 (2749.05), step = 70 INFO:tensorflow:loss = 37.6272, step = 79 (19.554 sec) INFO:tensorflow:loss = 37.6272, learning_rate = 0.8 (19.554 sec) INFO:tensorflow:Average examples/sec: 2074.92 (2618.36), step = 80 INFO:tensorflow:Average examples/sec: 2132.71 (2744.13), step = 90 INFO:tensorflow:Average examples/sec: 2183.38 (2777.21), step = 100 INFO:tensorflow:Average examples/sec: 2224.4 (2739.03), step = 110 INFO:tensorflow:Average examples/sec: 2240.28 (2431.26), step = 120 INFO:tensorflow:Average examples/sec: 2272.12 (2739.32), step = 130 INFO:tensorflow:Average examples/sec: 2300.68 (2750.03), step = 140 INFO:tensorflow:Average examples/sec: 2325.81 (2745.63), step = 150 INFO:tensorflow:Average examples/sec: 2347.14 (2721.53), step = 160 INFO:tensorflow:Average examples/sec: 2367.74 (2754.54), step = 170 INFO:tensorflow:loss = 27.8453, step = 179 (18.893 sec) ...  
PS​ Run this on ps: The ps will not do training so most of the arguments won't affect the execution python cifar10_main.py --job-dir=gs://path/model_dir/  Output: INFO:tensorflow:Using model_dir in TF_CONFIG: gs://path/model_dir/ INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 1, '_keep_checkpoint_max': 5, '_task_type': u'ps', '_is_chief': False, '_cluster_spec': &lt;tensorflow.python.training.server_lib.ClusterSpec object at 0x7f48f1addf90&gt;, '_model_dir': 'gs://path/model_dir/', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': intra_op_parallelism_threads: 1 gpu_options { } allow_soft_placement: true , '_tf_random_seed': None, '_environment': u'cloud', '_num_worker_replicas': 1, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options { per_process_gpu_memory_fraction: 1.0 } , '_evaluation_master': '', '_master': u'grpc://master-ip:8000'} 2017-07-31 22:54:58.928088: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job master -&gt; {0 -&gt; master-ip:8000} 2017-07-31 22:54:58.928153: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -&gt; {0 -&gt; localhost:8000} 2017-07-31 22:54:58.928160: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -&gt; {0 -&gt; worker-ip:8000} 2017-07-31 22:54:58.929873: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:8000  "},{"title":"Visualizing results with TensorBoard​","type":1,"pageTitle":"README","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/#visualizing-results-with-tensorboard","content":"When using Estimators you can also visualize your data in TensorBoard, with no changes in your code. 
You can use TensorBoard to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data like images that pass through it. You'll see something similar to this if you &quot;point&quot; TensorBoard to the job dir parameter you used to train or evaluate your model. Check TensorBoard during training or after it. Just point TensorBoard to the model_dir you chose on the previous step. tensorboard --logdir=&quot;&lt;job dir&gt;&quot;  "},{"title":"Warnings​","type":1,"pageTitle":"README","url":"docs/0.6.0/userDocs/yarn/docker/tensorflow/with-cifar10-models/ubuntu-18.04/cifar10_estimator_tf_1.13.1/#warnings","content":"When running cifar10_main.py with the --sync argument you may see an error similar to: File &quot;cifar10_main.py&quot;, line 538, in &lt;module&gt; tf.app.run() File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py&quot;, line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File &quot;cifar10_main.py&quot;, line 518, in main hooks), run_config=config) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py&quot;, line 210, in run return _execute_schedule(experiment, schedule) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py&quot;, line 47, in _execute_schedule return task() File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py&quot;, line 501, in train_and_evaluate hooks=self._eval_hooks) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py&quot;, line 681, in _call_evaluate hooks=hooks) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py&quot;, line 292, in evaluate name=name) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py&quot;, line 638, in _evaluate_model features, labels, 
model_fn_lib.ModeKeys.EVAL) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py&quot;, line 545, in _call_model_fn features=features, labels=labels, **kwargs) File &quot;cifar10_main.py&quot;, line 331, in _resnet_model_fn gradvars, global_step=tf.train.get_global_step()) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/sync_replicas_optimizer.py&quot;, line 252, in apply_gradients variables.global_variables()) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py&quot;, line 170, in wrapped return _add_should_use_warning(fn(*args, **kwargs)) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py&quot;, line 139, in _add_should_use_warning wrapped = TFShouldUseWarningWrapper(x) File &quot;/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py&quot;, line 96, in __init__ stack = [s.strip() for s in traceback.format_stack()]  This should not affect your training, and should be fixed in upcoming releases. "},{"title":"Docker Images for TensorFlow","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileTF","content":"","keywords":""},{"title":"How to create docker images to run Tensorflow on YARN​","type":1,"pageTitle":"Docker Images for TensorFlow","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileTF#how-to-create-docker-images-to-run-tensorflow-on-yarn","content":"A Dockerfile to run TensorFlow on YARN needs two parts: Base libraries which TensorFlow depends on 1) OS base image, for example ubuntu:18.04 2) Libraries and packages that TensorFlow depends on, for example python and scipy. GPU support also requires CUDA, cuDNN, etc. 3) TensorFlow package. 
Libraries to access HDFS 1) JDK 2) Hadoop Here's an example of a base image (w/o GPU support) to install Tensorflow: FROM ubuntu:18.04 # Pick up some TF dependencies RUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \\ build-essential \\ curl \\ libfreetype6-dev \\ libpng-dev \\ libzmq3-dev \\ pkg-config \\ python \\ python-dev \\ rsync \\ software-properties-common \\ unzip \\ &amp;&amp; \\ apt-get clean &amp;&amp; \\ rm -rf /var/lib/apt/lists/* RUN export DEBIAN_FRONTEND=noninteractive &amp;&amp; apt-get update &amp;&amp; apt-get install -yq krb5-user libpam-krb5 &amp;&amp; apt-get clean RUN curl -O https://bootstrap.pypa.io/get-pip.py &amp;&amp; \\ python get-pip.py &amp;&amp; \\ rm get-pip.py RUN pip --no-cache-dir install \\ Pillow \\ h5py \\ ipykernel \\ jupyter \\ matplotlib \\ numpy \\ pandas \\ scipy \\ sklearn \\ &amp;&amp; \\ python -m ipykernel.kernelspec RUN pip --no-cache-dir install \\ http://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.13.1-cp27-none-linux_x86_64.whl  On top of above image, add files, install packages to access HDFS RUN apt-get update &amp;&amp; apt-get install -y openjdk-8-jdk wget # Install hadoop ENV HADOOP_VERSION=&quot;2.9.2&quot; RUN wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz RUN tar zxf hadoop-${HADOOP_VERSION}.tar.gz RUN ln -s hadoop-${HADOOP_VERSION} hadoop-current RUN rm hadoop-${HADOOP_VERSION}.tar.gz  Build and push to your own docker registry: Use docker build ... and docker push ... to finish this step. "},{"title":"Use examples to build your own Tensorflow docker images​","type":1,"pageTitle":"Docker Images for TensorFlow","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileTF#use-examples-to-build-your-own-tensorflow-docker-images","content":"We provided following examples for you to build tensorflow docker images. 
For TensorFlow 1.13.1 (precompiled against CUDA 10.x): docker/tensorflow/base/ubuntu-18.04/Dockerfile.cpu.tf_1.13.1: TensorFlow 1.13.1 with CPU support only. docker/tensorflow/with-cifar10-models/ubuntu-18.04/Dockerfile.cpu.tf_1.13.1: TensorFlow 1.13.1 with CPU support only, with models included. docker/tensorflow/base/ubuntu-18.04/Dockerfile.gpu.tf_1.13.1: TensorFlow 1.13.1 with GPU support, prebuilt against CUDA 10. docker/tensorflow/with-cifar10-models/ubuntu-18.04/Dockerfile.gpu.tf_1.13.1: TensorFlow 1.13.1 with GPU support, prebuilt against CUDA 10, with models. "},{"title":"Build Docker images​","type":1,"pageTitle":"Docker Images for TensorFlow","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileTF#build-docker-images","content":""},{"title":"Manually build Docker image:​","type":1,"pageTitle":"Docker Images for TensorFlow","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileTF#manually-build-docker-image","content":"Under the docker/ directory, run build-all.sh to build Docker images. It builds the following images: tf-1.13.1-gpu-base:0.0.1, the base Docker image, which includes Hadoop, TensorFlow, and GPU base libraries. tf-1.13.1-cpu-base:0.0.1, the base Docker image, which includes Hadoop and TensorFlow. tf-1.13.1-gpu:0.0.1, which includes the cifar10 model. tf-1.13.1-cpu:0.0.1, which includes the cifar10 model (CPU only). 
"},{"title":"Use prebuilt images​","type":1,"pageTitle":"Docker Images for TensorFlow","url":"docs/0.6.0/userDocs/yarn/WriteDockerfileTF#use-prebuilt-images","content":"(No liability) You can also use prebuilt images for convenience: hadoopsubmarine/tf-1.13.1-gpu:0.0.1, hadoopsubmarine/tf-1.13.1-cpu:0.0.1 "},{"title":"YARN Runtime Quick Start Guide","type":0,"sectionRef":"#","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide","content":"","keywords":""},{"title":"Prerequisite​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#prerequisite","content":"Check out the Running Submarine on YARN "},{"title":"Build your own Docker image​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#build-your-own-docker-image","content":"If you want to build your own Docker image for TensorFlow/PyTorch/MXNet while following the documents below, check out Build your Docker image for more details. "},{"title":"Launch TensorFlow Application:​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#launch-tensorflow-application","content":""},{"title":"Without Docker​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#without-docker","content":"You need: a Python virtual environment with TensorFlow 1.13.1 installed, and a cluster with Hadoop 2.9 or above. "},{"title":"Building a Python virtual environment with TensorFlow​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#building-a-python-virtual-environment-with-tensorflow","content":"TonY requires a Python virtual environment zip with TensorFlow and any needed Python libraries already installed. 
wget https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz tar xf virtualenv-16.0.0.tar.gz # Make sure to install using Python 3, as TensorFlow only provides Python 3 artifacts python virtualenv-16.0.0/virtualenv.py venv . venv/bin/activate pip install tensorflow==1.13.1 zip -r myvenv.zip venv deactivate  The above commands will produce a myvenv.zip, which is used in the example below. There's no need to copy it to other nodes, and it is not needed when using Docker to run the job. Note: If you require a version of TensorFlow and TensorBoard prior to 1.13.1, take a look at this issue. "},{"title":"Get the training examples​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#get-the-training-examples","content":"Get mnist_distributed.py from https://github.com/linkedin/TonY/tree/master/tony-examples/mnist-tensorflow SUBMARINE_VERSION=&lt;REPLACE_VERSION&gt; SUBMARINE_HADOOP_VERSION=3.1 CLASSPATH=$(hadoop classpath --glob):path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar \\ java org.apache.submarine.client.cli.Cli job run --name tf-job-001 \\ --framework tensorflow \\ --verbose \\ --input_path &quot;&quot; \\ --num_workers 2 \\ --worker_resources memory=1G,vcores=1 \\ --num_ps 1 \\ --ps_resources memory=1G,vcores=1 \\ --worker_launch_cmd &quot;myvenv.zip/venv/bin/python mnist_distributed.py --steps 2 --data_dir /tmp/data --working_dir /tmp/mode&quot; \\ --ps_launch_cmd &quot;myvenv.zip/venv/bin/python mnist_distributed.py --steps 2 --data_dir /tmp/data --working_dir /tmp/mode&quot; \\ --insecure \\ --conf tony.containers.resources=path-to/myvenv.zip#archive,path-to/mnist_distributed.py,path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar  You should then be able to see links and status of the jobs from the command line: 2019-04-22 20:30:42,611 INFO tony.TonyClient: Tasks Status 
Updated: [TaskInfo] name: worker index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi status: RUNNING 2019-04-22 20:30:42,612 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 1 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi status: RUNNING 2019-04-22 20:30:42,612 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: ps index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi status: RUNNING 2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for ps 0 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi 2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for worker 0 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi 2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for worker 1 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi 2019-04-22 20:30:44,625 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: ps index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi status: FINISHED 2019-04-22 20:30:44,625 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi status: FINISHED 2019-04-22 20:30:44,626 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 1 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi status: FINISHED  "},{"title":"With Docker​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#with-docker","content":"SUBMARINE_VERSION=&lt;REPLACE_VERSION&gt; SUBMARINE_HADOOP_VERSION=3.1 CLASSPATH=$(hadoop classpath --glob):path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar \\ java org.apache.submarine.client.cli.Cli job run --name tf-job-001 \\ 
--framework tensorflow \\ --docker_image hadoopsubmarine/tf-1.8.0-cpu:0.0.1 \\ --input_path hdfs://pi-aw:9000/dataset/cifar-10-data \\ --worker_resources memory=3G,vcores=2 \\ --worker_launch_cmd &quot;export CLASSPATH=\\$(/hadoop-3.1.0/bin/hadoop classpath --glob) &amp;&amp; cd /test/models/tutorials/image/cifar10_estimator &amp;&amp; python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --train-steps=10000 --eval-batch-size=16 --train-batch-size=16 --variable-strategy=CPU --num-gpus=0 --sync&quot; \\ --env JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \\ --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \\ --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \\ --env HADOOP_HOME=/hadoop-3.1.0 \\ --env HADOOP_YARN_HOME=/hadoop-3.1.0 \\ --env HADOOP_COMMON_HOME=/hadoop-3.1.0 \\ --env HADOOP_HDFS_HOME=/hadoop-3.1.0 \\ --env HADOOP_CONF_DIR=/hadoop-3.1.0/etc/hadoop \\ --conf tony.containers.resources=path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar  Notes:​ 1) DOCKER_JAVA_HOME points to JAVA_HOME inside the Docker image. 2) DOCKER_HADOOP_HDFS_HOME points to HADOOP_HDFS_HOME inside the Docker image. The TonY submodule was removed in SUBMARINE-371, and Submarine now uses TonY as a direct dependency. After Submarine v0.2.0, an uber jar, submarine-all-${SUBMARINE_VERSION}-hadoop-${HADOOP_VERSION}.jar, is released together with submarine-core-${SUBMARINE_VERSION}.jar, submarine-yarnservice-runtime-${SUBMARINE_VERSION}.jar and submarine-tony-runtime-${SUBMARINE_VERSION}.jar.  "},{"title":"Launch PyTorch Application:​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#launch-pytorch-application","content":""},{"title":"Without Docker​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#without-docker-1","content":"You need: a Python virtual environment with PyTorch 0.4.0+ installed, and a cluster with Hadoop 2.9 or above. 
"},{"title":"Building a Python virtual environment with PyTorch​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#building-a-python-virtual-environment-with-pytorch","content":"TonY requires a Python virtual environment zip with PyTorch and any needed Python libraries already installed. wget https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz tar xf virtualenv-16.0.0.tar.gz python virtualenv-16.0.0/virtualenv.py venv . venv/bin/activate pip install pytorch==0.4.0 zip -r myvenv.zip venv deactivate  "},{"title":"Get the training examples​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#get-the-training-examples-1","content":"Get mnist_distributed.py from https://github.com/linkedin/TonY/tree/master/tony-examples/mnist-pytorch SUBMARINE_VERSION=&lt;REPLACE_VERSION&gt; SUBMARINE_HADOOP_VERSION=3.1 CLASSPATH=$(hadoop classpath --glob):path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar \\ java org.apache.submarine.client.cli.Cli job run --name PyTorch-job-001 \\ --framework pytorch --num_workers 2 \\ --worker_resources memory=3G,vcores=2 \\ --num_ps 2 \\ --ps_resources memory=3G,vcores=2 \\ --worker_launch_cmd &quot;myvenv.zip/venv/bin/python mnist_distributed.py&quot; \\ --ps_launch_cmd &quot;myvenv.zip/venv/bin/python mnist_distributed.py&quot; \\ --insecure \\ --conf tony.containers.resources=path-to/myvenv.zip#archive,path-to/mnist_distributed.py, \\ path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar  You should then be able to see links and status of the jobs from command line: 2019-04-22 20:30:42,611 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi status: RUNNING 2019-04-22 20:30:42,612 INFO 
tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 1 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi status: RUNNING 2019-04-22 20:30:42,612 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: ps index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi status: RUNNING 2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for ps 0 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi 2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for worker 0 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi 2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for worker 1 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi 2019-04-22 20:30:44,625 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: ps index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi status: FINISHED 2019-04-22 20:30:44,625 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi status: FINISHED 2019-04-22 20:30:44,626 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 1 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi status: FINISHED  "},{"title":"With Docker​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#with-docker-1","content":"SUBMARINE_VERSION=&lt;REPLACE_VERSION&gt; SUBMARINE_HADOOP_VERSION=3.1 CLASSPATH=$(hadoop classpath --glob):path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar \\ java org.apache.submarine.client.cli.Cli job run --name PyTorch-job-001 \\ --framework pytorch --docker_image pytorch-latest-gpu:0.0.1 \\ --input_path &quot;&quot; \\ --num_workers 1 \\ --worker_resources memory=3G,vcores=2 \\ 
--worker_launch_cmd &quot;cd /test/ &amp;&amp; python cifar10_tutorial.py&quot; \\ --env JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \\ --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.2 \\ --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \\ --env HADOOP_HOME=/hadoop-3.1.2 \\ --env HADOOP_YARN_HOME=/hadoop-3.1.2 \\ --env HADOOP_COMMON_HOME=/hadoop-3.1.2 \\ --env HADOOP_HDFS_HOME=/hadoop-3.1.2 \\ --env HADOOP_CONF_DIR=/hadoop-3.1.2/etc/hadoop \\ --conf tony.containers.resources=path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar  "},{"title":"Launch MXNet Application:​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#launch-mxnet-application","content":""},{"title":"Without Docker​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#without-docker-2","content":"You need: a Python virtual environment with MXNet installed, and a cluster with Hadoop 2.9 or above. "},{"title":"Building a Python virtual environment with MXNet​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#building-a-python-virtual-environment-with-mxnet","content":"TonY requires a Python virtual environment zip with MXNet and any needed Python libraries already installed. wget https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz tar xf virtualenv-16.0.0.tar.gz python virtualenv-16.0.0/virtualenv.py venv . 
venv/bin/activate pip install mxnet==1.5.1 zip -r myvenv.zip venv deactivate  "},{"title":"Get the training examples​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#get-the-training-examples-2","content":"Get image_classification.py from this link SUBMARINE_VERSION=&lt;REPLACE_VERSION&gt; SUBMARINE_HADOOP_VERSION=3.1 CLASSPATH=$(hadoop classpath --glob):path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar \\ java org.apache.submarine.client.cli.Cli job run --name MXNet-job-001 \\ --framework mxnet --input_path &quot;&quot; \\ --num_workers 2 \\ --worker_resources memory=3G,vcores=2 \\ --worker_launch_cmd &quot;myvenv.zip/venv/bin/python image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync&quot; \\ --num_ps 2 \\ --ps_resources memory=3G,vcores=2 \\ --ps_launch_cmd &quot;myvenv.zip/venv/bin/python image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync&quot; \\ --num_schedulers 1 \\ --scheduler_resources memory=1G,vcores=1 \\ --scheduler_launch_cmd &quot;myvenv.zip/venv/bin/python image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync&quot; \\ --insecure \\ --conf tony.containers.resources=path-to/myvenv.zip#archive,path-to/image_classification.py,path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar  You should then be able to see links and status of the jobs from the command line: 2020-04-16 20:23:43,834 INFO tony.TonyClient: Task status updated: [TaskInfo] name: server, index: 1, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000004/pi status: RUNNING 2020-04-16 20:23:43,834 INFO tony.TonyClient: Task status updated: [TaskInfo] name: server, index: 0, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000003/pi status: RUNNING 2020-04-16 20:23:43,834 INFO tony.TonyClient: Task status updated: [TaskInfo] 
name: worker, index: 1, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000006/pi status: RUNNING 2020-04-16 20:23:43,834 INFO tony.TonyClient: Task status updated: [TaskInfo] name: worker, index: 0, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000005/pi status: RUNNING 2020-04-16 20:23:43,834 INFO tony.TonyClient: Task status updated: [TaskInfo] name: scheduler, index: 0, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000002/pi status: RUNNING 2020-04-16 20:23:43,839 INFO tony.TonyClient: Logs for scheduler 0 at: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000002/pi 2020-04-16 20:23:43,839 INFO tony.TonyClient: Logs for server 0 at: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000003/pi 2020-04-16 20:23:43,840 INFO tony.TonyClient: Logs for server 1 at: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000004/pi 2020-04-16 20:23:43,840 INFO tony.TonyClient: Logs for worker 0 at: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000005/pi 2020-04-16 20:23:43,840 INFO tony.TonyClient: Logs for worker 1 at: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000006/pi 2020-04-16 21:02:09,723 INFO tony.TonyClient: Task status updated: [TaskInfo] name: scheduler, index: 0, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000002/pi status: SUCCEEDED 2020-04-16 21:02:09,736 INFO tony.TonyClient: Task status updated: [TaskInfo] name: worker, index: 0, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000005/pi status: SUCCEEDED 2020-04-16 21:02:09,737 INFO tony.TonyClient: Task status updated: [TaskInfo] name: server, index: 1, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000004/pi status: SUCCEEDED 2020-04-16 21:02:09,737 INFO tony.TonyClient: Task status updated: [TaskInfo] name: worker, index: 1, url: 
http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000006/pi status: SUCCEEDED 2020-04-16 21:02:09,737 INFO tony.TonyClient: Task status updated: [TaskInfo] name: server, index: 0, url: http://pi-aw:8042/node/containerlogs/container_1587037749540_0005_01_000003/pi status: SUCCEEDED  "},{"title":"With Docker​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#with-docker-2","content":"You could refer to this sample Dockerfile for building your own Docker image. SUBMARINE_VERSION=&lt;REPLACE_VERSION&gt; SUBMARINE_HADOOP_VERSION=3.1 CLASSPATH=$(hadoop classpath --glob):path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar \\ java org.apache.submarine.client.cli.Cli job run --name MXNet-job-001 \\ --framework mxnet --docker_image &lt;your_docker_image&gt; \\ --input_path &quot;&quot; \\ --num_schedulers 1 \\ --scheduler_resources memory=1G,vcores=1 \\ --scheduler_launch_cmd &quot;/usr/bin/python image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync&quot; \\ --num_workers 2 \\ --worker_resources memory=2G,vcores=1 \\ --worker_launch_cmd &quot;/usr/bin/python image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync&quot; \\ --num_ps 2 \\ --ps_resources memory=2G,vcores=1 \\ --ps_launch_cmd &quot;/usr/bin/python image_classification.py --dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync&quot; \\ --verbose \\ --insecure \\ --conf tony.containers.resources=path-to/image_classification.py,path-to/submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar  "},{"title":"Use YARN Service to run Submarine: Deprecated​","type":1,"pageTitle":"YARN Runtime Quick Start Guide","url":"docs/0.6.0/userDocs/yarn/YARNRuntimeGuide#use-yarn-service-to-run-submarine-deprecated","content":"Historically, Submarine supports to use YARN Service to submit deep learning jobs. 
This is no longer supported, because YARN Service is not actively developed by the community and its extra dependencies, such as RegistryDNS and ATS-v2, cause many setup issues. For now, you can still use YARN Service to run Submarine, but the code will be removed in a future release. Only TonY will be supported for running Submarine on YARN. "},{"title":"Environment REST API","type":0,"sectionRef":"#","url":"docs/next/api/environment","content":"","keywords":""},{"title":"Create Environment​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#create-environment","content":"POST /api/v1/environment "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#parameters","content":"Put the EnvironmentSpec in the request body. EnvironmentSpec​ Field Name\tType\tDescription\tRequiredname\tString\tEnvironment name.\to dockerImage\tString\tDocker image name.\to kernelSpec\tKernelSpec\tEnvironment spec.\to description\tString\tDescription of environment.\tx KernelSpec​ Field Name\tType\tDescription\tRequiredname\tString\tKernel name.\to channels\tList&lt;String&gt;\tNames of the channels.\to condaDependencies\tList&lt;String&gt;\tList of kernel conda dependencies.\to pipDependencies\tList&lt;String&gt;\tList of kernel pip dependencies.\to "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, &quot;anaconda-client=1.7.2=py37_0&quot;, &quot;anaconda-navigator=1.9.12=py37_0&quot;], 
&quot;pipDependencies&quot; : [&quot;apache-submarine==0.7.0&quot;, &quot;pyarrow==0.17.0&quot;] } } ' http://127.0.0.1:32080/api/v1/environment  Example Response { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;environmentId&quot;: &quot;environment_1586156073228_0001&quot;, &quot;environmentSpec&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, &quot;anaconda-client=1.7.2=py37_0&quot;, &quot;anaconda-navigator=1.9.12=py37_0&quot;], &quot;pipDependencies&quot; : [&quot;apache-submarine==0.7.0&quot;, &quot;pyarrow==0.17.0&quot;] } } } }  "},{"title":"List environment​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#list-environment","content":"GET /api/v1/environment "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/environment  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[ { &quot;environmentId&quot;:&quot;environment_1600862964725_0002&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;notebook-gpu-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-gpu-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null } }, { &quot;environmentId&quot;:&quot;environment_1647192232698_0003&quot;, 
&quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[ &quot;apache-submarine\\u003d\\u003d0.7.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null } }, { &quot;environmentId&quot;:&quot;environment_1600862964725_0001&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null } } ], &quot;attributes&quot;:{} }  "},{"title":"Get environment​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#get-environment","content":"GET /api/v1/environment/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tEnvironment name.\to "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ 
&quot;environmentId&quot;:&quot;environment_1647192232698_0003&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[ &quot;apache-submarine\\u003d\\u003d0.7.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Patch environment​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#patch-environment","content":"PATCH /api/v1/environment/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath and body\tEnvironment name.\to dockerImage\tString\tbody\tDocker image name.\to kernelSpec\tKernelSpec\tbody\tEnvironment spec.\to description\tString\tbody\tDescription of environment. 
This field is optional.\tx "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#example-3","content":"Example Request curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7_updated&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;], &quot;pipDependencies&quot; : [] } } ' http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1647192232698_0004&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7_updated&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null } }, &quot;attributes&quot;:{} }  Note: dockerImage, &quot;name&quot; (of kernelSpec), &quot;channels&quot;, &quot;condaDependencies&quot;, &quot;pipDependencies&quot;, etc. can be updated using this API. Updating the &quot;name&quot; of environmentSpec is not supported. 
"},{"title":"Delete environment​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#delete-environment","content":"GET /api/v1/environment/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tEnvironment name.\to "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/next/api/environment#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1647192232698_0003&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[ &quot;apache-submarine\\u003d\\u003d0.7.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Experiment Template REST API","type":0,"sectionRef":"#","url":"docs/next/api/experiment-template","content":"","keywords":""},{"title":"Create experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#create-experiment-template","content":"POST /api/v1/template "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST 
API","url":"docs/next/api/experiment-template#parameters","content":"Field Name\tType\tIn\tDescriptionname\tString\tbody\tExperiment template name. This is required. author\tString\tbody\tAuthor name. description\tString\tbody\tDescription of the experiment template. parameters\tList&lt;ExperimentTemplateParamSpec&gt;\tbody\tParameters of the experiment template. experimentSpec\tExperimentSpec\tbody\tSpec of the experiment template. ExperimentTemplateParamSpec​ Field Name\tType\tDescriptionname\tString\tParameter name. required\tBoolean\ttrue / false. Whether the parameter is required. description\tString\tDescription of the parameter. value\tString\tValue of the parameter. ExperimentSpec​ Field Name\tType\tDescriptionmeta\tExperimentMeta\tMetadata of the experiment template. environment\tEnvironmentSpec\tEnvironment of the experiment template. spec\tMap&lt;String, ExperimentTaskSpec&gt;\tSpec of pods. code\tCodeSpec\tExperiment code spec. ExperimentMeta​ Field Name\tType\tDescriptionname\tString\tExperiment name. namespace\tString\tExperiment namespace. framework\tString\tExperiment framework. cmd\tString\tCommand. envVars\tMap&lt;String, String&gt;\tEnvironment variables. EnvironmentSpec​ See the environment API for more details. ExperimentTaskSpec​ Field Name\tType\tDescriptionreplicas\tInteger\tNumber of replicas. resources\tString\tResources of the task. name\tString\tTask name. image\tString\tImage name. cmd\tString\tCommand. envVars\tMap&lt;String, String&gt;\tEnvironment variables. CodeSpec​ Field Name\tType\tDescriptionsyncMode\tString\tSync mode of the code spec. url\tString\tURL of the code spec. 
"},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-tf-mnist-template&quot;, &quot;author&quot;: &quot;author&quot;, &quot;description&quot;: &quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;: &quot;learning_rate&quot;, &quot;value&quot;: 0.1, &quot;required&quot;: true, &quot;description&quot;: &quot;This is learning_rate of training.&quot; }, { &quot;name&quot;: &quot;batch_size&quot;, &quot;value&quot;: 150, &quot;required&quot;: true, &quot;description&quot;: &quot;This is batch_size of training.&quot; }, { &quot;name&quot;: &quot;experiment_name&quot;, &quot;value&quot;: &quot;tf-mnist1&quot;, &quot;required&quot;: true, &quot;description&quot;: &quot;the name of experiment.&quot; } ], &quot;experimentSpec&quot;: { &quot;meta&quot;: { &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate={{learning_rate}} --batch_size={{batch_size}}&quot;, &quot;name&quot;: &quot;{{experiment_name}}&quot;, &quot;envVars&quot;: { &quot;ENV1&quot;: &quot;ENV1&quot; }, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;namespace&quot;: &quot;default&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; } } } ' http://127.0.0.1:32080/api/v1/template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:1, &quot;serverTimestamp&quot;:1626160071451 }, 
&quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, 
&quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"List experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#list-experiment-template","content":"GET /api/v1/template "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:1, &quot;serverTimestamp&quot;:1626160071451 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author&quot;, &quot;description&quot;:&quot;This is a 
template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} 
--batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } }], &quot;attributes&quot;:{} }  "},{"title":"Get experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#get-experiment-template","content":"GET /api/v1/template/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tExperiment template name.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:1, &quot;serverTimestamp&quot;:1650788898882 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, 
&quot;author&quot;:&quot;author&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;:[ { &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; } ], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:null, &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python 
/var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{ &quot;ENV1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"Patch template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#patch-template","content":"PATCH /api/v1/template/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath and body\tExperiment template name.\to author\tString\tbody\tAuthor name.\to description\tString\tbody\tDescription of the experiment template.\tx parameters\tList&lt;ExperimentTemplateParamSpec&gt;\tbody\tParameters of the experiment template.\to experimentSpec\tExperimentSpec\tbody\tSpec of the experiment template.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#example-3","content":"Example Request curl -X 
PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-tf-mnist-template&quot;, &quot;author&quot;: &quot;author-new&quot;, &quot;description&quot;: &quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;: &quot;learning_rate&quot;, &quot;value&quot;: 0.1, &quot;required&quot;: true, &quot;description&quot;: &quot;This is learning_rate of training.&quot; }, { &quot;name&quot;: &quot;batch_size&quot;, &quot;value&quot;: 150, &quot;required&quot;: true, &quot;description&quot;: &quot;This is batch_size of training.&quot; }, { &quot;name&quot;: &quot;experiment_name&quot;, &quot;value&quot;: &quot;tf-mnist1&quot;, &quot;required&quot;: true, &quot;description&quot;: &quot;the name of experiment.&quot; } ], &quot;experimentSpec&quot;: { &quot;meta&quot;: { &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate={{learning_rate}} --batch_size={{batch_size}}&quot;, &quot;name&quot;: &quot;{{experiment_name}}&quot;, &quot;envVars&quot;: { &quot;ENV1&quot;: &quot;ENV1&quot; }, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;namespace&quot;: &quot;default&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; } } } ' http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:2, &quot;serverTimestamp&quot;:1626160071451 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author-new&quot;, 
&quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log 
--learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  note &quot;description&quot;, &quot;parameters&quot;, &quot;experimentSpec&quot;, &quot;author&quot;, etc. can be updated using this API. Updating the &quot;name&quot; of an experiment template is not supported. 
"},{"title":"Delete template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#delete-template","content":"GET /api/v1/template/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tExperiment template name.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:2, &quot;serverTimestamp&quot;:1626160071451 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author-new&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, 
&quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, 
&quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"Use template to create an experiment​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#use-template-to-create-a-experiment","content":"POST /api/v1/experiment/{template_name} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#parameters-4","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath and body\tExperiment template name.\to params\tMap&lt;String, String&gt;\tbody\tParameters of the experiment, including experiment_name.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/next/api/experiment-template#example-5","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;tf-mnist&quot;, &quot;params&quot;: { &quot;learning_rate&quot;:&quot;0.01&quot;, &quot;batch_size&quot;:&quot;150&quot;, &quot;experiment_name&quot;:&quot;newexperiment1&quot; } } ' http://127.0.0.1:32080/api/v1/experiment/my-tf-mnist-template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0001&quot;, &quot;name&quot;:&quot;newexperiment1&quot;, &quot;uid&quot;:&quot;b895985c-411c-4e89-90e0-c60a2a8a4235&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:21:31.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;newexperiment1&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, 
&quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Experiment REST API","type":0,"sectionRef":"#","url":"docs/next/api/experiment","content":"","keywords":""},{"title":"Create Experiment (Using Anonymous/Embedded Environment)​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#create-experiment-using-anonymousembedded-environment","content":"POST /api/v1/experiment "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#parameters","content":"Put ExperimentSpec in request body. 
ExperimentSpec​ Field Name\tType\tDescription\tRequiredmeta\tExperimentMeta\tMeta data of the experiment template.\to environment\tEnvironmentSpec\tEnvironment of the experiment template.\to spec\tMap&lt;String, ExperimentTaskSpec&gt;\tSpec of pods.\to code\tCodeSpec\tExperiment codespec.\tx ExperimentMeta​ Field Name\tType\tDescription\tRequiredname\tString\tExperiment name.\to namespace\tString\tExperiment namespace.\to framework\tString\tExperiment framework.\to cmd\tString\tCommand.\to envVars\tMap&lt;String, String&gt;\tEnvironmental variables.\tx EnvironmentSpec​ There are two types of environment: Anonymous and Embedded. Anonymous environment: only specify dockerImage in environment spec. The container will be built on the docker image. Embedded environment: specify name in environment spec. The container will be built on the existing environment (including dockerImage and kernelSpec). See more details in environment api. ExperimentTaskSpec​ Field Name\tType\tDescription\tRequiredreplicas\tInteger\tNumber of replicas.\to resources\tString\tResources of the task.\to name\tString\tTask name.\to image\tString\tImage name.\to cmd\tString\tCommand.\tx envVars\tMap&lt;String, String&gt;\tEnvironmental variables.\tx CodeSpec​ Currently, only pulling from GitHub is supported. 
HDFS, NFS and s3 are in development Field Name\tType\tDescription\tRequiredsyncMode\tString (git|hdfs|nfs|s3)\tsync mode of code spec.\to url\tString\turl of code spec.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647192232698-0001&quot;, &quot;uid&quot;:&quot;b0ae271b-a01a-43ad-9877-4b8ecbc45de4&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2022-03-14T16:03:10.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647192232698-0001&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ 
&quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"List experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#list-experiment","content":"GET /api/v1/experiment "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/experiment  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[ { &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;uid&quot;:&quot;cf465781-6310-46d2-92b4-d20161c77d08&quot;, &quot;status&quot;:&quot;Running&quot;, &quot;acceptedTime&quot;:&quot;2022-03-18T15:51:04.000+08:00&quot;, &quot;createdTime&quot;:&quot;2022-03-18T15:51:05.000+08:00&quot;, &quot;runningTime&quot;:&quot;2022-03-18T15:51:17.000+08:00&quot;, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, 
&quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ &quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } ], &quot;attributes&quot;:{} }  "},{"title":"Get experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#get-experiment","content":"GET /api/v1/experiment/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tExperiment id.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/experiment/experiment-1647574374688-0002  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, 
&quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;uid&quot;:&quot;cf465781-6310-46d2-92b4-d20161c77d08&quot;, &quot;status&quot;:&quot;Running&quot;, &quot;acceptedTime&quot;:&quot;2022-03-18T15:51:04.000+08:00&quot;, &quot;createdTime&quot;:&quot;2022-03-18T15:51:05.000+08:00&quot;, &quot;runningTime&quot;:&quot;2022-03-18T15:51:17.000+08:00&quot;, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ &quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Patch experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#patch-experiment","content":"PATCH /api/v1/experiment/{id} 
"},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tExperiment id.\to meta\tExperimentMeta\tbody\tMeta data of the experiment template.\to environment\tEnvironmentSpec\tbody\tEnvironment of the experiment template.\to spec\tMap&lt;String, ExperimentTaskSpec&gt;\tbody\tSpec of pods.\to code\tCodeSpec\tbody\tTODO\tx "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#example-3","content":"Example Request curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 2, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment/experiment-1647574374688-0002  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;uid&quot;:&quot;b0ae271b-a01a-43ad-9877-4b8ecbc45de4&quot;, &quot;status&quot;:&quot;Succeeded&quot;, &quot;acceptedTime&quot;:&quot;2022-04-04T16:39:25.000+08:00&quot;, &quot;createdTime&quot;:&quot;2022-04-04T16:39:26.000+08:00&quot;, &quot;runningTime&quot;:&quot;2022-04-04T16:39:35.000+08:00&quot;, 
&quot;finishedTime&quot;:&quot;2022-04-04T16:42:25.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1649061491590-0002&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ &quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:2, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Delete experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#delete-experiment","content":"DELETE /api/v1/experiment/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tExperiment id.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#example-4","content":"Example Request curl -X DELETE 
http://127.0.0.1:32080/api/v1/experiment/experiment-1647574374688-0002  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;uid&quot;:&quot;b0ae271b-a01a-43ad-9877-4b8ecbc45de4&quot;, &quot;status&quot;:&quot;Deleted&quot;, &quot;acceptedTime&quot;:null, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ &quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:2, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"List experiment Log​","type":1,"pageTitle":"Experiment REST 
API","url":"docs/next/api/experiment#list-experiment-log","content":"GET /api/v1/experiment/logs "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#example-5","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[ { &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;logContent&quot;:[ { &quot;podName&quot;:&quot;experiment-1647574374688-0002-ps-0&quot;, &quot;podLog&quot;:[] }, { &quot;podName&quot;:&quot;experiment-1647574374688-0002-worker-0&quot;, &quot;podLog&quot;:[ ] } ] } ], &quot;attributes&quot;:{} }  "},{"title":"Get experiment Log​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#get-experiment-log","content":"GET /api/v1/experiment/logs/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#parameters-4","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tExperiment id.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/next/api/experiment#example-6","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs/experiment-1647574374688-0002  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;logContent&quot;:[ { &quot;podName&quot;:&quot;experiment-1647574374688-0002-ps-0&quot;, &quot;podLog&quot;:[ &quot;WARNING:tensorflow:From /var/tf_mnist/mnist_with_summaries.py:39: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use alternatives such as 
official/mnist/dataset.py from tensorflow/models.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please write your own downloading logic.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use urllib or similar directly.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use tf.data to implement this functionality.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use tf.data to implement this functionality.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: __init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use alternatives such as official/mnist/dataset.py from tensorflow/models.&quot;, &quot;2022-03-18 07:52:07.369276: I tensorflow/core/platform/cpu_feature_guard.cc:141] 
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA&quot;, &quot;Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz&quot;, &quot;Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz&quot;, &quot;Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz&quot;, &quot;Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz&quot;, &quot;Accuracy at step 0: 0.0893&quot;, &quot;Accuracy at step 10: 0.6851&quot;, &quot;Accuracy at step 20: 0.8255&quot;, &quot;Accuracy at step 30: 0.8969&quot;, &quot;Accuracy at step 40: 0.9009&quot;, &quot;Accuracy at step 50: 0.9185&quot;, &quot;Accuracy at step 60: 0.923&quot;, &quot;Accuracy at step 70: 0.9181&quot;, &quot;Accuracy at step 80: 0.9344&quot;, &quot;Accuracy at step 90: 0.9265&quot;, &quot;Adding run metadata for 99&quot;, &quot;Accuracy at step 100: 0.9375&quot;, &quot;Accuracy at step 110: 0.9414&quot;, &quot;Accuracy at step 120: 0.9402&quot;, &quot;Accuracy at step 130: 0.9466&quot;, &quot;Accuracy at step 140: 0.9412&quot;, &quot;Accuracy at step 150: 0.9497&quot;, &quot;Accuracy at step 160: 0.9477&quot;, &quot;Accuracy at step 170: 0.9465&quot;, &quot;Accuracy at step 180: 0.9546&quot;, &quot;Accuracy at step 190: 0.9485&quot;, &quot;Adding run metadata for 199&quot;, &quot;Accuracy at step 200: 0.9534&quot;, &quot;Accuracy at step 210: 0.9581&quot;, &quot;Accuracy at step 220: 0.9418&quot;, &quot;Accuracy at step 230: 0.9551&quot;, &quot;Accuracy at step 240: 0.9472&quot;, &quot;Accuracy at step 250: 0.9555&quot;, &quot;Accuracy at step 260: 0.9569&quot;, &quot;Accuracy at step 270: 0.9596&quot;, 
&quot;Accuracy at step 280: 0.9588&quot;, &quot;Accuracy at step 290: 0.9618&quot;, &quot;Adding run metadata for 299&quot;, &quot;Accuracy at step 300: 0.9589&quot;, &quot;Accuracy at step 310: 0.9603&quot;, &quot;Accuracy at step 320: 0.9632&quot;, &quot;Accuracy at step 330: 0.956&quot;, &quot;Accuracy at step 340: 0.9531&quot;, &quot;Accuracy at step 350: 0.9535&quot;, &quot;Accuracy at step 360: 0.9517&quot;, &quot;Accuracy at step 370: 0.9607&quot;, &quot;Accuracy at step 380: 0.9629&quot;, &quot;Accuracy at step 390: 0.9553&quot;, &quot;Adding run metadata for 399&quot;, &quot;Accuracy at step 400: 0.9623&quot;, &quot;Accuracy at step 410: 0.9627&quot;, &quot;Accuracy at step 420: 0.9614&quot;, &quot;Accuracy at step 430: 0.9604&quot;, &quot;Accuracy at step 440: 0.9663&quot;, &quot;Accuracy at step 450: 0.9665&quot;, &quot;Accuracy at step 460: 0.958&quot;, &quot;Accuracy at step 470: 0.9643&quot;, &quot;Accuracy at step 480: 0.9636&quot;, &quot;Accuracy at step 490: 0.9648&quot;, &quot;Adding run metadata for 499&quot;, &quot;Accuracy at step 500: 0.9638&quot;, &quot;Accuracy at step 510: 0.9629&quot;, &quot;Accuracy at step 520: 0.9661&quot;, &quot;Accuracy at step 530: 0.9633&quot;, &quot;Accuracy at step 540: 0.9669&quot;, &quot;Accuracy at step 550: 0.9659&quot;, &quot;Accuracy at step 560: 0.9652&quot;, &quot;Accuracy at step 570: 0.9675&quot;, &quot;Accuracy at step 580: 0.9602&quot;, &quot;Accuracy at step 590: 0.9641&quot;, &quot;Adding run metadata for 599&quot;, &quot;Accuracy at step 600: 0.9688&quot;, &quot;Accuracy at step 610: 0.9638&quot;, &quot;Accuracy at step 620: 0.9622&quot;, &quot;Accuracy at step 630: 0.9601&quot;, &quot;Accuracy at step 640: 0.9636&quot;, &quot;Accuracy at step 650: 0.9674&quot;, &quot;Accuracy at step 660: 0.9613&quot;, &quot;Accuracy at step 670: 0.9706&quot;, &quot;Accuracy at step 680: 0.9691&quot;, &quot;Accuracy at step 690: 0.9687&quot;, &quot;Adding run metadata for 699&quot;, &quot;Accuracy at step 700: 
0.9671&quot;, &quot;Accuracy at step 710: 0.9659&quot;, &quot;Accuracy at step 720: 0.9693&quot;, &quot;Accuracy at step 730: 0.9698&quot;, &quot;Accuracy at step 740: 0.9681&quot;, &quot;Accuracy at step 750: 0.9678&quot;, &quot;Accuracy at step 760: 0.9595&quot;, &quot;Accuracy at step 770: 0.9697&quot;, &quot;Accuracy at step 780: 0.9671&quot;, &quot;Accuracy at step 790: 0.9658&quot;, &quot;Adding run metadata for 799&quot;, &quot;Accuracy at step 800: 0.9658&quot;, &quot;Accuracy at step 810: 0.9702&quot;, &quot;Accuracy at step 820: 0.9662&quot;, &quot;Accuracy at step 830: 0.9671&quot;, &quot;Accuracy at step 840: 0.9731&quot;, &quot;Accuracy at step 850: 0.9699&quot;, &quot;Accuracy at step 860: 0.9702&quot;, &quot;Accuracy at step 870: 0.9686&quot;, &quot;Accuracy at step 880: 0.9729&quot;, &quot;Accuracy at step 890: 0.968&quot;, &quot;Adding run metadata for 899&quot;, &quot;Accuracy at step 900: 0.9655&quot;, &quot;Accuracy at step 910: 0.9731&quot;, &quot;Accuracy at step 920: 0.9676&quot;, &quot;Accuracy at step 930: 0.9667&quot;, &quot;Accuracy at step 940: 0.9659&quot;, &quot;Accuracy at step 950: 0.9689&quot;, &quot;Accuracy at step 960: 0.9653&quot;, &quot;Accuracy at step 970: 0.9675&quot;, &quot;Accuracy at step 980: 0.974&quot;, &quot;Accuracy at step 990: 0.9723&quot;, &quot;Adding run metadata for 999&quot; ] }, { &quot;podName&quot;:&quot;experiment-1647574374688-0002-worker-0&quot;, &quot;podLog&quot;:[ &quot;WARNING:tensorflow:From /var/tf_mnist/mnist_with_summaries.py:39: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use alternatives such as official/mnist/dataset.py from tensorflow/models.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) 
is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please write your own downloading logic.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use urllib or similar directly.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use tf.data to implement this functionality.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use tf.data to implement this functionality.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: __init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use alternatives such as official/mnist/dataset.py from tensorflow/models.&quot;, &quot;2022-03-18 07:52:07.369085: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA&quot;, &quot;Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz&quot;, 
&quot;Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz&quot;, &quot;Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz&quot;, &quot;Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz&quot;, &quot;Accuracy at step 0: 0.1348&quot;, &quot;Accuracy at step 10: 0.7419&quot;, &quot;Accuracy at step 20: 0.8574&quot;, &quot;Accuracy at step 30: 0.8959&quot;, &quot;Accuracy at step 40: 0.9135&quot;, &quot;Accuracy at step 50: 0.9187&quot;, &quot;Accuracy at step 60: 0.9276&quot;, &quot;Accuracy at step 70: 0.9332&quot;, &quot;Accuracy at step 80: 0.9399&quot;, &quot;Accuracy at step 90: 0.9376&quot;, &quot;Adding run metadata for 99&quot;, &quot;Accuracy at step 100: 0.9378&quot;, &quot;Accuracy at step 110: 0.9463&quot;, &quot;Accuracy at step 120: 0.9479&quot;, &quot;Accuracy at step 130: 0.9468&quot;, &quot;Accuracy at step 140: 0.9467&quot;, &quot;Accuracy at step 150: 0.9475&quot;, &quot;Accuracy at step 160: 0.947&quot;, &quot;Accuracy at step 170: 0.948&quot;, &quot;Accuracy at step 180: 0.9472&quot;, &quot;Accuracy at step 190: 0.954&quot;, &quot;Adding run metadata for 199&quot;, &quot;Accuracy at step 200: 0.9492&quot;, &quot;Accuracy at step 210: 0.9571&quot;, &quot;Accuracy at step 220: 0.954&quot;, &quot;Accuracy at step 230: 0.9557&quot;, &quot;Accuracy at step 240: 0.9557&quot;, &quot;Accuracy at step 250: 0.9591&quot;, &quot;Accuracy at step 260: 0.955&quot;, &quot;Accuracy at step 270: 0.9595&quot;, &quot;Accuracy at step 280: 0.9596&quot;, &quot;Accuracy at step 290: 0.9604&quot;, &quot;Adding run metadata for 299&quot;, &quot;Accuracy at step 300: 0.9622&quot;, &quot;Accuracy at step 310: 0.9529&quot;, &quot;Accuracy at step 320: 0.9609&quot;, &quot;Accuracy at step 330: 
0.9613&quot;, &quot;Accuracy at step 340: 0.9571&quot;, &quot;Accuracy at step 350: 0.9599&quot;, &quot;Accuracy at step 360: 0.9553&quot;, &quot;Accuracy at step 370: 0.9546&quot;, &quot;Accuracy at step 380: 0.962&quot;, &quot;Accuracy at step 390: 0.96&quot;, &quot;Adding run metadata for 399&quot;, &quot;Accuracy at step 400: 0.9593&quot;, &quot;Accuracy at step 410: 0.9641&quot;, &quot;Accuracy at step 420: 0.9628&quot;, &quot;Accuracy at step 430: 0.9622&quot;, &quot;Accuracy at step 440: 0.9639&quot;, &quot;Accuracy at step 450: 0.9592&quot;, &quot;Accuracy at step 460: 0.9651&quot;, &quot;Accuracy at step 470: 0.9658&quot;, &quot;Accuracy at step 480: 0.9668&quot;, &quot;Accuracy at step 490: 0.9641&quot;, &quot;Adding run metadata for 499&quot;, &quot;Accuracy at step 500: 0.9641&quot;, &quot;Accuracy at step 510: 0.9561&quot;, &quot;Accuracy at step 520: 0.9628&quot;, &quot;Accuracy at step 530: 0.964&quot;, &quot;Accuracy at step 540: 0.9663&quot;, &quot;Accuracy at step 550: 0.9681&quot;, &quot;Accuracy at step 560: 0.968&quot;, &quot;Accuracy at step 570: 0.967&quot;, &quot;Accuracy at step 580: 0.9663&quot;, &quot;Accuracy at step 590: 0.9679&quot;, &quot;Adding run metadata for 599&quot;, &quot;Accuracy at step 600: 0.9666&quot;, &quot;Accuracy at step 610: 0.9648&quot;, &quot;Accuracy at step 620: 0.9682&quot;, &quot;Accuracy at step 630: 0.9691&quot;, &quot;Accuracy at step 640: 0.9683&quot;, &quot;Accuracy at step 650: 0.966&quot;, &quot;Accuracy at step 660: 0.9668&quot;, &quot;Accuracy at step 670: 0.9658&quot;, &quot;Accuracy at step 680: 0.9709&quot;, &quot;Accuracy at step 690: 0.9632&quot;, &quot;Adding run metadata for 699&quot;, &quot;Accuracy at step 700: 0.9697&quot;, &quot;Accuracy at step 710: 0.9632&quot;, &quot;Accuracy at step 720: 0.9641&quot;, &quot;Accuracy at step 730: 0.9659&quot;, &quot;Accuracy at step 740: 0.9654&quot;, &quot;Accuracy at step 750: 0.9694&quot;, &quot;Accuracy at step 760: 0.968&quot;, &quot;Accuracy at step 
770: 0.9661&quot;, &quot;Accuracy at step 780: 0.969&quot;, &quot;Accuracy at step 790: 0.9663&quot;, &quot;Adding run metadata for 799&quot;, &quot;Accuracy at step 800: 0.9687&quot;, &quot;Accuracy at step 810: 0.9651&quot;, &quot;Accuracy at step 820: 0.9705&quot;, &quot;Accuracy at step 830: 0.9645&quot;, &quot;Accuracy at step 840: 0.9652&quot;, &quot;Accuracy at step 850: 0.9719&quot;, &quot;Accuracy at step 860: 0.9654&quot;, &quot;Accuracy at step 870: 0.964&quot;, &quot;Accuracy at step 880: 0.9645&quot;, &quot;Accuracy at step 890: 0.9615&quot;, &quot;Adding run metadata for 899&quot;, &quot;Accuracy at step 900: 0.9661&quot;, &quot;Accuracy at step 910: 0.9649&quot;, &quot;Accuracy at step 920: 0.9569&quot;, &quot;Accuracy at step 930: 0.9654&quot;, &quot;Accuracy at step 940: 0.9674&quot;, &quot;Accuracy at step 950: 0.971&quot;, &quot;Accuracy at step 960: 0.9684&quot;, &quot;Accuracy at step 970: 0.9648&quot;, &quot;Accuracy at step 980: 0.9693&quot;, &quot;Accuracy at step 990: 0.9627&quot;, &quot;Adding run metadata for 999&quot; ] } ] }, &quot;attributes&quot;:{} }  "},{"title":"Model Version REST API","type":0,"sectionRef":"#","url":"docs/next/api/model-version","content":"","keywords":""},{"title":"Create a model version​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#create-a-model-version","content":"POST /api/v1/model-version?baseDir={baseDir} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#parameters","content":"Field Name\tType\tIn\tDescription\tRequiredbaseDir\tString\tquery\tExperiment directory path.\to name\tString\tbody\tRegistered model name.\to experimentId\tString\tbody\tExperiment id.\to description\tString\tbody\tAdd description for the version of model.\tx tags\tList&lt;String&gt;\tbody\tAdd tags for the version of model.\tx "},{"title":"Example​","type":1,"pageTitle":"Model Version REST 
API","url":"docs/next/api/model-version#example","content":""},{"title":"List model versions under a registered model​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#list-model-versions-under-a-registered-model","content":"GET /api/v1/model-version/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tregistered model name.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/model-version/register  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;List all model version instances&quot;, &quot;result&quot; : [ { &quot;creationTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;currentStage&quot; : &quot;None&quot;, &quot;dataset&quot; : null, &quot;description&quot; : null, &quot;experimentId&quot; : &quot;experiment-1639276018590-0001&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;modelType&quot; : &quot;tensorflow&quot;, &quot;name&quot; : &quot;register&quot;, &quot;source&quot; : &quot;s3://submarine/experiment-1639276018590-0001/example/1&quot;, &quot;tags&quot; : [], &quot;userId&quot; : &quot;&quot;, &quot;version&quot; : 1 }, { &quot;creationTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;currentStage&quot; : &quot;None&quot;, &quot;dataset&quot; : null, &quot;description&quot; : null, &quot;experimentId&quot; : &quot;experiment-1639276018590-0001&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;modelType&quot; : &quot;tensorflow&quot;, &quot;name&quot; : &quot;register&quot;, &quot;source&quot; : &quot;s3://submarine/experiment-1639276018590-0001/example/2&quot;, &quot;tags&quot; : [], &quot;userId&quot; : &quot;&quot;, 
&quot;version&quot; : 2 } ], &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Get a model version​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#get-a-model-version","content":"GET /api/v1/model-version/{name}/{version} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tRegistered model name.\to version\tString\tpath\tRegistered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/model-version/register/1  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Get the model version instance&quot;, &quot;result&quot; : { &quot;creationTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;currentStage&quot; : &quot;None&quot;, &quot;dataset&quot; : null, &quot;description&quot; : null, &quot;experimentId&quot; : &quot;experiment-1639276018590-0001&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;modelType&quot; : &quot;tensorflow&quot;, &quot;name&quot; : &quot;register&quot;, &quot;source&quot; : &quot;s3://submarine/experiment-1639276018590-0001/example/1&quot;, &quot;tags&quot; : [], &quot;userId&quot; : &quot;&quot;, &quot;version&quot; : 1 }, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Patch a model version​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#patch-a-model-version","content":"PATCH /api/v1/model-version "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tbody\tRegistered model name.\to version\tString\tbody\tRegistered 
model version.\to description\tString\tbody\tNew description.\tx currentStage\tString\tbody\tStage of the model.\tx dataset\tString\tbody\tDataset used in the model.\tx "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#example-3","content":"Example Request curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;register&quot;, &quot;version&quot;: 1, &quot;description&quot;: &quot;new_description&quot;, &quot;currentStage&quot;: &quot;production&quot;, &quot;dataset&quot;: &quot;new_dataset&quot; }' http://127.0.0.1:32080/api/v1/model-version  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Update the model version instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Delete a model version​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#delete-a-model-version","content":"DELETE /api/v1/model-version/{name}/{version} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#parameters-4","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tRegistered model name.\to version\tString\tpath\tRegistered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/model-version/register/1  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Delete the model version instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Create a model version tag​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#create-a-model-version-tag","content":"POST 
/api/v1/model-version/tag?name={name}&amp;version={version}&amp;tag={tag} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#parameters-5","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tquery\tRegistered model name.\to version\tString\tquery\tRegistered model version.\to tag\tString\tquery\tTag of the registered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#example-5","content":"Example Request curl -X POST 'http://127.0.0.1:32080/api/v1/model-version/tag?name=register&amp;version=2&amp;tag=789'  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Create a model version tag instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Delete a model version tag​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#delete-a-model-version-tag","content":"DELETE /api/v1/model-version/tag?name={name}&amp;version={version}&amp;tag={tag} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#parameters-6","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tquery\tRegistered model name.\to version\tString\tquery\tRegistered model version.\to tag\tString\tquery\tTag of the registered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/next/api/model-version#example-6","content":"Example Request curl -X DELETE 'http://127.0.0.1:32080/api/v1/model-version/tag?name=register&amp;version=2&amp;tag=789'  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Delete a registered model tag instance&quot;, &quot;result&quot;:null, &quot;attributes&quot;:{} }  "},{"title":"Register Model REST 
API","type":0,"sectionRef":"#","url":"docs/next/api/register-model","content":"","keywords":""},{"title":"Create a registered model​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#create-a-registered-model","content":"POST /api/v1/registered-model "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#parameters","content":"Field Name\tType\tDescription\tRequiredname\tString\tRegistered model name.\to description\tString\tRegistered model description.\tx tags\tList&lt;String&gt;\tRegistered model tags.\tx "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;example_name&quot;, &quot;description&quot;: &quot;example_description&quot;, &quot;tags&quot;: [&quot;123&quot;, &quot;456&quot;] } ' http://127.0.0.1:32080/api/v1/registered-model  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a registered model instance&quot;, &quot;result&quot;:null, &quot;attributes&quot;:{} }  "},{"title":"List registered models​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#list-registered-models","content":"GET /api/v1/registered-model "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/registered-model  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;List all registered model instances&quot;, &quot;result&quot; : [ { &quot;creationTime&quot; : &quot;2021-12-16 10:14:06&quot;, &quot;description&quot; : &quot;example_description&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-16 10:14:06&quot;, &quot;name&quot; : 
&quot;example_name&quot;, &quot;tags&quot; : [ &quot;123&quot;, &quot;456&quot; ] }, { &quot;creationTime&quot; : &quot;2021-12-16 10:16:25&quot;, &quot;description&quot; : &quot;example_description&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-16 10:16:25&quot;, &quot;name&quot; : &quot;example_name1&quot;, &quot;tags&quot; : [ &quot;123&quot;, &quot;456&quot; ] }, { &quot;creationTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;description&quot; : null, &quot;lastUpdatedTime&quot; : &quot;2021-12-14 12:49:33&quot;, &quot;name&quot; : &quot;register&quot;, &quot;tags&quot; : [] } ], &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Get a registered model​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#get-a-registered-model","content":"GET /api/v1/registered-model/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tregistered model name.\to "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/registered-model/example_name  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Get the registered model instance&quot;, &quot;result&quot; : { &quot;creationTime&quot; : &quot;2021-12-16 10:14:06&quot;, &quot;description&quot; : &quot;example_description&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-16 10:14:06&quot;, &quot;name&quot; : &quot;example_name&quot;, &quot;tags&quot; : [ &quot;123&quot;, &quot;456&quot; ] }, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Patch a registered model​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#patch-a-registered-model","content":"PATCH 
/api/v1/registered-model/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tregistered model name.\to name\tString\tbody\tNew model name.\tx description\tString\tbody\tNew model description.\tx "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#example-3","content":"Example Request curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;new_name&quot;, &quot;description&quot;: &quot;new_description&quot; }' http://127.0.0.1:32080/api/v1/registered-model/example_name  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Update the registered model instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Delete a registered model​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#delete-a-registered-model","content":"DELETE /api/v1/registered-model/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tregistered model name.\to "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/registered-model/example_name  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Delete the registered model instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Create a registered model tag​","type":1,"pageTitle":"Register Model REST
API","url":"docs/next/api/register-model#create-a-registered-model-tag","content":"POST /api/v1/registered-model/tag?name={name}&amp;tag={tag} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#parameters-4","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tquery\tregistered model name.\to tag\tString\tquery\tTag to add to the registered model.\to "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#example-5","content":"Example Request curl -X POST 'http://127.0.0.1:32080/api/v1/registered-model/tag?name=example_name&amp;tag=example_tag'  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a registered model tag instance&quot;, &quot;result&quot;:null, &quot;attributes&quot;:{} }  "},{"title":"Delete a registered model tag​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#delete-a-registered-model-tag","content":"DELETE /api/v1/registered-model/tag?name={name}&amp;tag={tag} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#parameters-5","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tquery\tregistered model name.\to tag\tString\tquery\tTag to delete from the registered model.\to "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/next/api/register-model#example-6","content":"Example Request curl -X DELETE 'http://127.0.0.1:32080/api/v1/registered-model/tag?name=example_name&amp;tag=example_tag'  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Delete a registered model tag instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Notebook REST
API","type":0,"sectionRef":"#","url":"docs/next/api/notebook","content":"","keywords":""},{"title":"Create a notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#create-a-notebook-instance","content":"POST /api/v1/notebook "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#parameters","content":"NotebookSpec in request body. NotebookSpec​ Field Name\tType\tDescription\tRequiredmeta\tNotebookMeta\tMetadata of the notebook.\to environment\tEnvironmentSpec\tEnvironment of the notebook.\to spec\tNotebookPodSpec\tSpec of the notebook pods.\to NotebookMeta​ Field Name\tType\tDescription\tRequiredname\tString\tNotebook name.\to namespace\tString\tNotebook namespace.\to ownerId\tString\tUser id.\to EnvironmentSpec​ See more details in environment API. NotebookPodSpec​ Field Name\tType\tDescription\tRequiredenvVars\tMap&lt;String, String&gt;\tEnvironment variables.\tx resources\tString\tResources of the pod.\to "},{"title":"Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;test-nb&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;ownerId&quot;: &quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;: { &quot;name&quot;: &quot;notebook-env&quot; }, &quot;spec&quot;: { &quot;envVars&quot;: { &quot;TEST_ENV&quot;: &quot;test&quot; }, &quot;resources&quot;: &quot;cpu=1,memory=1.0Gi&quot; } } ' http://127.0.0.1:32080/api/v1/notebook  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1647574374688_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;4a839fef-b4c9-483a-b4e8-c17236588118&quot;,
&quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;creating&quot;, &quot;reason&quot;:&quot;The notebook instance is creating&quot;, &quot;createdTime&quot;:&quot;2022-03-18T16:13:16.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;labels&quot;:{ &quot;notebook-owner-id&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;notebook-id&quot;:&quot;notebook_1647574374688_0001&quot; } }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"List notebook instances which belong to user​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#list-notebook-instances-which-belong-to-user","content":"GET /api/v1/notebook?id={user_id} "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tquery\tUser id.\to "},{"title":"Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/notebook?id={user_id}  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;List all notebook instances&quot;, &quot;result&quot;:[ { 
&quot;notebookId&quot;:&quot;notebook_1647574374688_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:null, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;running&quot;, &quot;reason&quot;:&quot;The notebook instance is running&quot;, &quot;createdTime&quot;:&quot;2022-03-18T16:13:16.000+08:00&quot;, &quot;deletedTime&quot;:&quot;2022-03-18T16:13:21.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;labels&quot;:{ &quot;notebook-owner-id&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;notebook-id&quot;:&quot;notebook_1647574374688_0001&quot; } }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } } ], &quot;attributes&quot;:{} }  "},{"title":"Get the notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#get-the-notebook-instance","content":"GET /api/v1/notebook/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tNotebook id.\to "},{"title":"Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/notebook/{id}  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, 
&quot;success&quot;:true, &quot;message&quot;:&quot;Get the notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1647574374688_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;4a839fef-b4c9-483a-b4e8-c17236588118&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;running&quot;, &quot;reason&quot;:&quot;The notebook instance is running&quot;, &quot;createdTime&quot;:&quot;2022-03-18T16:13:16.000+08:00&quot;, &quot;deletedTime&quot;:&quot;2022-03-18T16:13:21.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;labels&quot;:{ &quot;notebook-owner-id&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;notebook-id&quot;:&quot;notebook_1647574374688_0001&quot; } }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"Delete the notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#delete-the-notebook-instance","content":"DELETE /api/v1/notebook/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/next/api/notebook#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tNotebook id.\to "},{"title":"Example​","type":1,"pageTitle":"Notebook REST 
API","url":"docs/next/api/notebook#example-3","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/notebook/{id}  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Delete the notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1647574374688_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;4a839fef-b4c9-483a-b4e8-c17236588118&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;terminating&quot;, &quot;reason&quot;:&quot;The notebook instance is terminating&quot;, &quot;createdTime&quot;:&quot;2022-03-18T16:13:16.000+08:00&quot;, &quot;deletedTime&quot;:&quot;2022-03-18T16:13:21.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;labels&quot;:{ &quot;notebook-owner-id&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;notebook-id&quot;:&quot;notebook_1647574374688_0001&quot; } }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"Serve REST API","type":0,"sectionRef":"#","url":"docs/next/api/serve","content":"","keywords":""},{"title":"Create a model serve​","type":1,"pageTitle":"Serve REST API","url":"docs/next/api/serve#create-a-model-serve","content":"POST /api/v1/serve
"},{"title":"Parameters​","type":1,"pageTitle":"Serve REST API","url":"docs/next/api/serve#parameters","content":"Field Name\tType\tDescription\tRequiredmodelName\tString\tRegistered model name.\to modelVersion\tString\tRegistered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Serve REST API","url":"docs/next/api/serve#example","content":"Note: Make sure there is a model named simple with version 1 in the database. Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;modelName&quot;: &quot;simple&quot;, &quot;modelVersion&quot;: 1 } ' http://127.0.0.1:32080/api/v1/serve  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a serve instance&quot;, &quot;result&quot;:{&quot;url&quot;:null}, &quot;attributes&quot;:{} }  "},{"title":"Delete the TensorFlow model serve​","type":1,"pageTitle":"Serve REST API","url":"docs/next/api/serve#delete-the-tensorflow-model-serve","content":"DELETE /api/v1/serve "},{"title":"Parameters​","type":1,"pageTitle":"Serve REST API","url":"docs/next/api/serve#parameters-1","content":"Field Name\tType\tDescription\tRequiredmodelName\tString\tRegistered model name.\to modelVersion\tString\tRegistered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Serve REST API","url":"docs/next/api/serve#example-1","content":"Example Request curl -X DELETE -H &quot;Content-Type: application/json&quot; -d ' { &quot;modelName&quot;: &quot;simple&quot;, &quot;modelVersion&quot;: 1 } ' http://127.0.0.1:32080/api/v1/serve  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Delete the model serve instance&quot;, &quot;result&quot;:null, &quot;attributes&quot;:{} }  "},{"title":"Apache Submarine Community","type":0,"sectionRef":"#","url":"docs/next/community/","content":"","keywords":""},{"title":"Communicating​","type":1,"pageTitle":"Apache
Submarine Community","url":"docs/next/community/#communicating","content":"You can reach out to the community members in any of the following ways: Slack Developer: https://join.slack.com/t/asf-submarine/shared_invite Info: After clicking the link above, you will join the ASF Submarine channel. Zoom: https://cloudera.zoom.us/j/97264903288 Sync Up: https://docs.google.com/document/d/16pUO3TP4SxSeLduG817GhVAjtiph9HYpRHo_JgduDvw/edit "},{"title":"Your First Contribution​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/next/community/#your-first-contribution","content":"You can start by browsing https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE?filter=allopenissues for an existing issue. These issues are well suited for new contributors. Once a pull request (PR) that you submit to the Submarine GitHub project is approved and merged, you become a Submarine Contributor. If you want to work on a new idea of relatively small scope: Submit an issue describing your proposed change to the repo in question. The repo owners will respond to your issue promptly. Submit a pull request to Submarine containing a tested change. Contributions are welcomed and greatly appreciated. See CONTRIBUTING for details on submitting patches and the contribution workflow. "},{"title":"How Do I Become a Committer?​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/next/community/#how-do-i-become-a-committer","content":"First of all, you need to get involved and be a Contributor. Based on your track record as a contributor, the PMC may invite you to become a committer; per Apache practice, the PMC votes on committership before the invitation is sent. When that happens, if you accept, the following process kicks into place... Note that becoming a committer is not just about submitting some patches; it's also about helping out on the development and user mailing lists, and helping with documentation and issues.
See How to become an Apache Submarine Committer and PMC member for more details. "},{"title":"How to commit​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/next/community/#how-to-commit","content":"See How to commit, a helper document for Submarine committers. "},{"title":"Communication​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/next/community/#communication","content":"Communication within the Submarine community abides by Apache’s Code of Conduct. "},{"title":"Mailing lists​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/next/community/#mailing-lists","content":"Get help using Apache Submarine or contribute to the project on our mailing lists: Users : subscribe, unsubscribe, archives, for usage questions, help, and announcements. Dev : subscribe, unsubscribe, archives, for people wanting to contribute to the project. Commits : subscribe, unsubscribe, archives, for commit messages and patches. To subscribe to the Dev list, for example, send an email to dev-subscribe@submarine.apache.org. Usually, this happens when you just click the &quot;subscribe&quot; link. If this does not work, simply copy the address and paste it into the &quot;To:&quot; field of a new message. After that, you will get an email from dev-help@submarine.apache.org. Reply as that email directs, and you will be subscribed to dev@submarine.apache.org. "},{"title":"License​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/next/community/#license","content":"Submarine source code is under the Apache 2.0 license. See the LICENSE file for details. "},{"title":"Bylaws","type":0,"sectionRef":"#","url":"docs/next/community/Bylaws","content":"Bylaws This document defines the bylaws under which the Apache Submarine project operates. It defines the roles and responsibilities of the project, who may vote, how voting works, how conflicts are resolved, etc. Submarine is a project of the Apache Software Foundation.
The foundation holds the trademark on the name “Submarine” and copyright on Apache code including the code in the Submarine codebase. The foundation FAQ explains the operation and background of the foundation. Submarine is typical of Apache projects in that it operates under a set of principles, known collectively as the “Apache Way”. If you are new to Apache development, please refer to the Incubator project for more information on how Apache projects operate. Roles and Responsibilities Apache projects define a set of roles with associated rights and responsibilities. These roles govern what tasks an individual may perform within the project. The roles are defined in the following sections Users The most important participants in the project are people who use our software. The majority of our developers start out as users and guide their development efforts from the user’s perspective. Users contribute to the Apache projects by providing feedback to developers in the form of bug reports and feature suggestions. As well, users participate in the Apache community by helping other users on mailing lists and user support forums. Contributors All of the volunteers who are contributing time, code, documentation, or resources to the Submarine Project. A contributor that makes sustained, welcome contributions to the project may be invited to become a Committer, though the exact timing of such invitations depends on many factors. Committers The project’s Committers are responsible for the project’s technical management. Committers have access to all subproject subversion repositories. Committers may cast binding votes on any technical discussion regarding any subproject. Committer access is by invitation only and must be approved by consensus approval of the active PMC members. A Committer is considered emeritus by their own declaration or by not contributing in any form to the project for over six months. 
An emeritus committer may request reinstatement of commit access from the PMC. Such reinstatement is subject to consensus approval of active PMC members. Significant, pervasive features are often developed in a speculative branch of the repository. The PMC may grant commit rights on the branch to its consistent contributors, while the initiative is active. Branch committers are responsible for shepherding their feature into an active release and do not cast binding votes or vetoes in the project. All Apache committers are required to have a signed Contributor License Agreement (CLA) on file with the Apache Software Foundation. There is a Committer FAQ which provides more details on the requirements for Committers A committer who makes a sustained contribution to the project may be invited to become a member of the PMC. The form of contribution is not limited to code. It can also include code review, helping out users on the mailing lists, documentation, testing, etc. Release Manager A Release Manager (RM) is a committer who volunteers to produce a Release Candidate according to HowToRelease. The RM shall publish a Release Plan on the common-dev@ list stating the branch from which they intend to make a Release Candidate, at least one week before they do so. The RM is responsible for building consensus around the content of the Release Candidate, in order to achieve a successful Product Release vote. Project Management Committee The Project Management Committee (PMC) for Apache Submarine was created by the Apache Board in October 2019 when Submarine moved out of Hadoop and became a top level project at Apache. The PMC is responsible to the board and the ASF for the management and oversight of the Apache Submarine codebase. The responsibilities of the PMC include Deciding what is distributed as products of the Apache Submarine project. 
In particular, all releases must be approved by the PMC. Maintaining the project’s shared resources, including the codebase repository, mailing lists, and websites. Speaking on behalf of the project. Resolving license disputes regarding products of the project. Nominating new PMC members and committers. Maintaining these bylaws and other guidelines of the project. Membership of the PMC is by invitation only and must be approved by a consensus approval of active PMC members. A PMC member is considered “emeritus” by their own declaration or by not contributing in any form to the project for over six months. An emeritus member may request reinstatement to the PMC. Such reinstatement is subject to consensus approval of the active PMC members. The chair of the PMC is appointed by the ASF board. The chair is an office holder of the Apache Software Foundation (Vice President, Apache Submarine) and has primary responsibility to the board for the management of the projects within the scope of the Submarine PMC. The chair reports to the board quarterly on developments within the Submarine project. The chair of the PMC is rotated annually. When the chair is rotated or if the current chair of the PMC resigns, the PMC votes to recommend a new chair using Single Transferable Vote (STV) voting. See https://wiki.apache.org/general/BoardVoting for specifics. The decision must be ratified by the Apache board. Decision Making Within the Submarine project, different types of decisions require different forms of approval. For example, the previous section describes several decisions which require “consensus approval”. This section defines how voting is performed, the types of approvals, and which types of decision require which type of approval. Voting Decisions regarding the project are made by votes on the primary project development mailing list (dev@submarine.apache.org). Where necessary, PMC voting may take place on the private Submarine PMC mailing list.
Votes are clearly indicated by a subject line starting with [VOTE]. Votes may contain multiple items for approval and these should be clearly separated. Voting is carried out by replying to the vote mail. Voting may take four flavors: +1: “Yes,” “Agree,” or “the action should be performed.” In general, this vote also indicates a willingness on the part of the voter to help in “making it happen”. +0: This vote indicates a willingness for the action under consideration to go ahead. The voter, however, will not be able to help. -0: This vote indicates that the voter does not, in general, agree with the proposed action but is not concerned enough to prevent the action going ahead. -1: This is a negative vote. On issues where consensus is required, this vote counts as a veto. All vetoes must contain an explanation of why the veto is appropriate. Vetoes with no explanation are void. It may also be appropriate for a -1 vote to include an alternative course of action. All participants in the Submarine project are encouraged to show their agreement or disagreement with a particular action by voting. For technical decisions, only the votes of active committers are binding. Non-binding votes are still useful for those with binding votes to understand the perception of an action in the wider Submarine community. For PMC decisions, only the votes of PMC members are binding. Voting can also be applied to changes made to the Submarine codebase. These typically take the form of a veto (-1) in reply to the commit message sent when the commit is made. Approvals These are the types of approvals that can be sought.
Different actions require different types of approvals. Consensus Approval - Consensus approval requires 3 binding +1 votes and no binding vetoes. Lazy Consensus - Lazy consensus requires no -1 votes (‘silence gives assent’). Lazy Majority - A lazy majority vote requires 3 binding +1 votes and more binding +1 votes than -1 votes. Lazy 2⁄3 Majority - A lazy 2⁄3 majority vote requires at least 3 votes and twice as many +1 votes as -1 votes. Vetoes A valid, binding veto cannot be overruled. If a veto is cast, it must be accompanied by a valid reason for the veto. The validity of a veto, if challenged, can be confirmed by anyone who has a binding vote. This does not necessarily signify agreement with the veto - merely that the veto is valid. If you disagree with a valid veto, you must lobby the person casting the veto to withdraw their veto. If a veto is not withdrawn, any action that has been vetoed must be reversed in a timely manner. Actions This section describes the various actions which are undertaken within the project, the corresponding approval required for that action and those who have binding votes over the action. Code Change A change made to a codebase of the project and committed by a committer. This includes source code, documentation, website content, etc. Consensus approval of active committers, but with a minimum of one +1. The code can be committed after the first +1, unless the code change represents a merge from a branch, in which case three +1s are required. Product Release When a release of one of the project’s products is ready, a vote is required to accept the release as an official release of the project. Lazy Majority of active PMC members Adoption of New Codebase When the codebase for an existing, released product is to be replaced with an alternative codebase. If such a vote fails to gain approval, the existing code base will continue.
This also covers the creation of new sub-projects within the project Lazy 2⁄3 majority of PMC members New Branch Committer When a branch committer is proposed for the PMC Lazy consensus of active PMC members New Committer When a new committer is proposed for the project Consensus approval of active PMC members New PMC Member When a committer is proposed for the PMC Consensus approval of active PMC members Branch Committer Removal When removal of commit privileges is sought or when the branch is merged to the mainline Lazy 2⁄3 majority of active PMC members Committer Removal When removal of commit privileges is sought. Note: Such actions will also be referred to the ASF board by the PMC chair Lazy 2⁄3 majority of active PMC members (excluding the committer in question if a member of the PMC). PMC Member Removal When removal of a PMC member is sought. Note: Such actions will also be referred to the ASF board by the PMC chair. Lazy 2⁄3 majority of active PMC members (excluding the member in question) Modifying Bylaws Modifying this document. Lazy majority of active PMC members Voting Timeframes Votes are open for a period of 7 days to allow all active voters time to consider the vote. Votes relating to code changes are not subject to a strict timetable but should be made as timely as possible. Product Release - Vote Timeframe Release votes, alone, run for a period of 5 days. All other votes are subject to the above timeframe of 7 days.","keywords":""},{"title":"How To Contribute to Submarine","type":0,"sectionRef":"#","url":"docs/next/community/contributing","content":"","keywords":""},{"title":"Preface​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#preface","content":"Apache Submarine is software licensed under the Apache 2.0 License. Contributing to Submarine means you agree to the Apache 2.0 License. Please read the Code of Conduct carefully. The document How It Works can help you understand the Apache Software Foundation further.
"},{"title":"Build Submarine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#build-submarine","content":"Build From Code "},{"title":"Creating patches​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#creating-patches","content":"Submarine follows the Fork &amp; Pull model. "},{"title":"Step1: Fork apache/submarine github repository (first time)​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step1-fork-apachesubmarine-github-repository-first-time","content":"Visit https://github.com/apache/submarine and click the Fork button to create a fork of the repository. "},{"title":"Step2: Clone the Submarine to your local machine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step2-clone-the-submarine-to-your-local-machine","content":"# USERNAME – your GitHub user account name. git clone git@github.com:${USERNAME}/submarine.git # or: git clone https://github.com/${USERNAME}/submarine.git cd submarine # set upstream git remote add upstream git@github.com:apache/submarine.git # or: git remote add upstream https://github.com/apache/submarine.git # Don't push to the upstream master. git remote set-url --push upstream no_push # Check upstream/origin: # origin git@github.com:${USERNAME}/submarine.git (fetch) # origin git@github.com:${USERNAME}/submarine.git (push) # upstream git@github.com:apache/submarine.git (fetch) # upstream no_push (push) git remote -v  "},{"title":"Step3: Create a new Jira in Submarine project​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step3-create-a-new-jira-in-submarine-project","content":"New contributors need permission to create Jira issues. Please email kaihsun@apache.org with your Jira username.
In addition, the email title should be &quot;[New Submarine Contributor]&quot;. Check the Jira issue tracker for existing issues. Create a new Jira issue in the Submarine project. When the issue is created, a Jira number (e.g. SUBMARINE-748) will be assigned to the issue automatically. "},{"title":"Step4: Create a local branch for your contribution​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step4-create-a-local-branch-for-your-contribution","content":"cd submarine # Make your local master up-to-date git checkout master git fetch upstream git rebase upstream/master # Create a new branch for issue SUBMARINE-${jira_number} git checkout -b SUBMARINE-${jira_number} # Example: git checkout -b SUBMARINE-748  "},{"title":"Step5: Develop & Create commits​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step5-develop--create-commits","content":"You can edit the code on the SUBMARINE-${jira_number} branch. (Coding Style: Code Convention) Create commits: git add ${edited files} git commit -m &quot;SUBMARINE-${jira_number}. ${Commit Message}&quot; # Example: git commit -m &quot;SUBMARINE-748. Update Contributing guide&quot;  "},{"title":"Step6: Syncing your local branch with upstream/master​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step6-syncing-your-local-branch-with-upstreammaster","content":"# On SUBMARINE-${jira_number} branch git fetch upstream git rebase upstream/master  Please do not use git pull to synchronize your local branch. By default, git pull performs a merge, and the resulting merge commits make the commit history messy. 
"},{"title":"Step7: Push your local branch to your personal fork​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step7-push-your-local-branch-to-your-personal-fork","content":"git push origin SUBMARINE-${jira_number}  "},{"title":"Step8: Check GitHub Actions status of your personal commit​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step8-check-github-actions-status-of-your-personal-commit","content":"Visit https://github.com/${USERNAME}/submarine/actionsPlease make sure your new commits can pass all workflows before creating a pull request.  "},{"title":"Step9: Create a pull request on github UI​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step9-create-a-pull-request-on-github-ui","content":"Visit your fork at https://github.com/${USERNAME}/submarine.gitClick Compare &amp; Pull Request button to create pull request. Pull Request template​ Pull request templateFilling the template thoroughly can improve the speed of the review process. Example:   "},{"title":"Step10: Check GitHub Actions status of your pull request in apache/submarine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step10-check-github-actions-status-of-your-pull-request-in-apachesubmarine","content":"Visit https://github.com/apache/submarine/actionsPlease make sure your pull request can pass all workflows.  "},{"title":"Step11: The Review Process​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step11-the-review-process","content":"Anyone can be a reviewer and comment on the pull requests.Reviewer can indicate that a patch looks suitable for merging with a comment such as: &quot;Looks good&quot;, &quot;LGTM&quot;, &quot;+1&quot;. (PS: LGTM = Looks Good To Me)At least one indication of suitability (e.g. 
&quot;LGTM&quot;) from a committer is required before a pull request can be merged. A committer can then initiate lazy consensus (&quot;Merge if there is no more discussion&quot;), after which the code can be merged after a set time (usually 24 hours) if there are no more reviews. Contributors can ping reviewers (including committers) by commenting 'Ready to review'. "},{"title":"Step12: Address review comments​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#step12-address-review-comments","content":"Push new commits to the SUBMARINE-${jira_number} branch. The pull request will update automatically. After you address all review comments, committers will merge the pull request. "},{"title":"Code convention​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/next/community/contributing#code-convention","content":"We follow the Google code style: Java style, Shell style. There are plugins to format and lint your code in the IDE (use dev-support/maven-config/checkstyle.xml as rules): Checkstyle plugin for Intellij (Setting Guide), Checkstyle plugin for Eclipse (Setting Guide) "},{"title":"How to become a Committer","type":0,"sectionRef":"#","url":"docs/next/community/HowToBecomeCommitter","content":"How to become a Committer Apache Submarine builds its community entirely according to Apache’s rules. Apache Committer is a term used in the ASF (Apache Software Foundation) for a person who has commit access to a specific project. An Apache Submarine Committer has write access to the Submarine codebase and can merge PRs. Anyone who has made enough contributions to the community and gained enough trust can become an Apache Submarine Committer. Anyone who contributes to the Submarine project is an officially recognized Contributor of the Submarine project. 
There is no exact standard for growing from Contributor to Committer, and there is no expected timetable, but Committer candidates are generally long-term active contributors. Becoming a Committer does not require a huge architectural contribution or a particular number of lines of code. Contributing to the codebase, contributing to the documents, participating in mailing list discussions, helping to answer questions, etc., are all ways to increase your influence. List of potential contributions (in no particular order): Submit the bugs, features, and improvements you find as issues. Update the official documents so that they stay current, and write best practices and other useful documents that help users understand the features. Perform tests and report test results. Actively participate in voting when a version is released. Participate in discussions on the mailing list; these mails usually start with [DISCUSS]. Answer questions from users or developers on the mailing list. Review the work of others (both code and non-code) and share your own suggestions. Review the issues on JIRA and keep their status up to date, such as closing outdated issues, updating an issue’s error information, etc. Guide new contributors and be familiar with the community process. Give talks and write blogs about Submarine, and add these to the official Submarine website. Any contribution that is beneficial to the development of the Submarine community ...... For more, refer to: ASF official documents. Not everyone can complete all (or even any) items on this list. If you want to contribute in other ways, then just do it (and add them to the list). Pleasant manners and dedication are all you need to have a positive impact on the Submarine project. An invitation to become a Committer is the result of your long-term and stable interaction with the community, and reflects the trust and recognition of the Submarine community. 
Committer is obliged to review and merge PRs submitted by others, test and vote on candidate versions when the version is released, participate in the discussion of feature design plans, and other types of project contributions. When you are active enough and make a bigger contribution to the community, you can be promoted to a PMC member of the Submarine project.","keywords":""},{"title":"How to vote a Committer or PMC","type":0,"sectionRef":"#","url":"docs/next/community/HowToVoteCommitterOrPMC","content":"","keywords":""},{"title":"The voting process of becoming a Submarine Committer or PMC​","type":1,"pageTitle":"How to vote a Committer or PMC","url":"docs/next/community/HowToVoteCommitterOrPMC#the-voting-process-of-becoming-a-submarine-committer-or-pmc","content":"After the PMC members of Submarine discover any valuable contributions from the community contributors and obtain the consent of the candidate, they initiate a discussion on the private mailing list of Submarine: [DISCUSS] YYYYY as a Submarine XXXXXX In the email, the source of the candidate’s contributions should be clearly stated, so that everyone can discuss and analyze. The discussion email will last at least 72 hours, and the project team members, including the mentors, will fully express their views on the proposed email. Regardless of whether there is a disagreement, after the discussion email, the vote initiator needs to initiate a Committer or PMC vote on the private mailing list of Submarine; [VOTE] YYYYY as a Submarine XXXXXX The voting mail should last for at least 72 hours, and there should be at least 3 +1 votes to pass the vote. If there are 0 votes or one -1 vote, the entire vote will fail. If voting -1, you need to clarify the question so that everyone can understand. After the voting email is over, the vote initiator should summarize it on the voting line, remind the end of voting, and send it to the voting summary email. 
[RESULTS][vote] YYYYY as a Submarine XXXXXX After the vote summary email is sent, if the vote passed, the vote initiator must send an invitation email to the candidate, and the invitation email needs the candidate to reply to accept or decline through the designated mailbox. [Invitation] Invitation to join Apache Submarine as a XXXXXX The email should be sent to the candidate, and the copy is sent to private@submarine.apache.org After the candidate accepts the invitation, if the candidate does not have an apache email account, the vote initiator needs to assist the candidate to create an apache account according to the guidelines. If the above content is completed, the vote initiator still needs to do the following two things: 6.1 Apply to the project leader to add project team members, and open the authority accounts for the jira and apache projects. 6.2 Send a notification email to the dev@submarine.apache.org mail group: [ANNOUNCE] New XXXXXX: YYYYY So far, the entire process is completed, then the candidate officially becomes the Committer or PMC of Submarine. "},{"title":"Guide for Apache Submarine Committers","type":0,"sectionRef":"#","url":"docs/next/community/HowToCommit","content":"","keywords":""},{"title":"New committers​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/next/community/HowToCommit#new-committers","content":"New committers are encouraged to first read Apache's generic committer documentation: Apache New Committer GuideApache Committer FAQ The first act of a new core committer is typically to add their name to the credits page. This requires changing the site source inhttps://github.com/apache/submarine-site/blob/master/community/member.md. Once done, update the Submarine website as describedhere(TLDR; don't forget to regenerate the site with hugo, and commit the generated results, too). 
"},{"title":"Review​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/next/community/HowToCommit#review","content":"Submarine committers should, as often as possible, attempt to review patches submitted by others. Ideally every submitted patch will get reviewed by a committer within a few days. If a committer reviews a patch they've not authored, and believe it to be of sufficient quality, then they can commit the patch, otherwise the patch should be cancelled with a clear explanation for why it was rejected. The list of submitted patches can be found in the GitHubPull Requests page. Committers should scan the list from top-to-bottom, looking for patches that they feel qualified to review and possibly commit. For non-trivial changes, it is best to get another committer to review &amp; approve your own patches before commit. "},{"title":"Reject​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/next/community/HowToCommit#reject","content":"Patches should be rejected which do not adhere to the guidelines inContribution Guidelines. Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. If a committer wishes to improve an unacceptable patch, then it should first be rejected, and a new patch should be attached by the committer for review. "},{"title":"Commit individual patches​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/next/community/HowToCommit#commit-individual-patches","content":"Submarine uses git for source code version control. The writable repo is at -https://gitbox.apache.org/repos/asf/submarine.git It is strongly recommended to use the cicd script to merge the PRs. 
See the instructions athttps://github.com/apache/submarine/tree/master/dev-support/cicd "},{"title":"Adding Contributors role​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/next/community/HowToCommit#adding-contributors-role","content":"There are three roles (Administrators, Committers, Contributors) in the project. Contributors who have Contributors role can become assignee of the issues in the project.Committers who have Committers role can set arbitrary roles in addition to Contributors role.Committers who have Administrators role can edit or delete all comments, or even delete issues in addition to Committers role. How to set roles Login to ASF JIRAGo to the project page (e.g. https://issues.apache.org/jira/browse/SUBMARINE )Hit &quot;Administration&quot; tabHit &quot;Roles&quot; tab in left sideAdd Administrators/Committers/Contributors role "},{"title":"Resources","type":0,"sectionRef":"#","url":"docs/next/community/Resources","content":"Resources This document contains some resources that may help you understand more about Submarine. Conferences Apache submarine: a unified machine learning platform made simple at EuroMLSys '22 ABSTRACT As machine learning is applied more widely, it is necessary to have a machine-learning platform for both infrastructure administrators and users including expert data scientists and citizen data scientists [24] to improve their productivity. However, existing machine-learning platforms are ill-equipped to address the &quot;Machine Learning tech debts&quot; [36] such as glue code, reproducibility, and portability. Furthermore, existing platforms only take expert data scientists into consideration, and thus they are inflexible for infrastructure administrators and non-user-friendly for citizen data scientists. We propose Submarine, a unified machine-learning platform, and takes all infrastructure administrators, expert data scientists, and citizen data scientists into consideration. 
Submarine has been widely used in many technology companies, including Ke.com and LinkedIn. We present two use cases in Section 5.","keywords":""},{"title":"Architecture and Requirment","type":0,"sectionRef":"#","url":"docs/next/designDocs/architecture-and-requirements","content":"","keywords":""},{"title":"Terminology​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#terminology","content":"Term\tDescriptionUser\tA single data-scientist/data-engineer. User has resource quota, credentials Team\tUser belongs to one or more teams, teams have ACLs for artifacts sharing such as notebook content, model, etc. Admin\tAlso called SRE, who manages user's quotas, credentials, team, and other components. "},{"title":"Background​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#background","content":"Everybody talks about machine learning today, and lots of companies are trying to leverage machine learning to push the business to the next level. Nowadays, as more and more developers, infrastructure software companies coming to this field, machine learning becomes more and more achievable. In the last decade, the software industry has built many open source tools for machine learning to solve the pain points: It was not easy to build machine learning algorithms manually, such as logistic regression, GBDT, and many other algorithms:Answer to that: Industries have open sourced many algorithm libraries, tools, and even pre-trained models so that data scientists can directly reuse these building blocks to hook up to their data without knowing intricate details inside these algorithms and models. 
It was not easy to achieve &quot;WYSIWYG, what you see is what you get&quot; from IDEs: it was hard to get output, visualization, and troubleshooting in the same place. Answer to that: The notebook concept was added to the picture. Notebooks brought interactive coding, sharing, visualization, and debugging together under the same user interface. There are popular open-source notebooks like Apache Zeppelin/Jupyter. It was not easy to manage dependencies: an ML application that runs on one machine is hard to deploy on another machine because it has many library dependencies. Answer to that: Containerization has become a popular, standard way to package dependencies, making it easier to &quot;build once, run anywhere&quot;. Fragmented tools and libraries were hard for ML engineers to learn. Experience gained in one company did not naturally transfer to another company. Answer to that: A few dominant open-source frameworks reduced the overhead of learning too many different frameworks and concepts. A data scientist who learns a few libraries such as Tensorflow/PyTorch and a few high-level wrappers like Keras can create a machine learning application from other open-source building blocks. Similarly, models built by one library (such as libsvm) were hard to integrate into a machine learning pipeline since there was no standard format. Answer to that: The industry has built successful open-source standard machine learning frameworks such as Tensorflow/PyTorch/Keras, so their formats can be easily shared. There are also efforts to build an even more general model format, such as ONNX. It was hard to build a data pipeline that moves/transforms data from a raw data source to whatever form ML applications require. Answer to that: The open-source big data industry plays an important role in providing, simplifying, and unifying processes and building blocks for data flows, transformations, etc. The machine learning industry is on the right track to solve major roadblocks. 
So what are the pain points now for companies which have machine learning needs? How can we help here? To answer this question, let's look at the machine learning workflow first. "},{"title":"Machine Learning Workflows & Pain points​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#machine-learning-workflows--pain-points","content":"1) From different data sources such as edge, clickstream, logs, etc. =&gt; Land to data lakes 2) From data lake, data transformation: =&gt; Data transformations: Cleanup, remove invalid rows/columns, select columns, sampling, split train/test data-set, join table, etc. =&gt; Data prepared for training. 3) From prepared data: =&gt; Training, model hyper-parameter tuning, cross-validation, etc. =&gt; Models saved to storage. 4) From saved models: =&gt; Model assurance, deployment, A/B testing, etc. =&gt; Model deployed for online serving or offline scoring.  Typically, data scientists are responsible for items 2)-4), while 1) is typically handled by a different team (called the Data Engineering team in many companies; some Data Engineering teams are also responsible for part of the data transformation). "},{"title":"Pain #1 Complex workflow/steps from raw data to model, different tools needed by different steps, hard to make changes to workflow, and not error-proof​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#pain-1-complex-workflowsteps-from-raw-data-to-model-different-tools-needed-by-different-steps-hard-to-make-changes-to-workflow-and-not-error-proof","content":"The workflow from raw data to usable models is complex. After talking to many different data scientists, we have learned that a typical procedure to train a new model and push it to production can take from months to 1-2 years. This workflow also requires a wide skill set. 
For example, data transformation needs tools like Spark/Hive at large scale and tools like Pandas at small scale. Model training requires switching between XGBoost, Tensorflow, Keras, and PyTorch. Building a data pipeline requires Apache Airflow or Oozie. Yes, there are great, standardized open-source tools built for many of these purposes. But what if changes need to be made to a particular part of the data pipeline? What about adding a few columns to the training data for experiments? What about training models and pushing them to validation and A/B testing before rolling out to production? All these steps require jumping between different tools and UIs, which makes changes very hard and the procedure error-prone. "},{"title":"Pain #2 Dependencies of underlying resource management platform​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#pain-2-dependencies-of-underlying-resource-management-platform","content":"For the jobs/services required by a machine learning platform to run, we need an underlying resource management platform. There are several choices of resource management platform, each with distinct advantages and disadvantages. For example, there are many machine learning platforms built on top of K8s. It is relatively easy to get a K8s cluster from a cloud vendor and to orchestrate the services/daemons that machine learning requires on K8s. However, K8s doesn't offer good support for jobs like Spark/Flink/Hive. So if your company has Spark/Flink/Hive running on YARN, there are gaps, and it takes a significant amount of work to move the required jobs from YARN to K8s. Maintaining a separate K8s cluster also adds overhead to a Hadoop-based data infrastructure. Similarly, if your company's data pipelines are mostly built on top of cloud resources and SaaS offerings, asking you to install a separate YARN cluster to run a new machine learning platform doesn't make a lot of sense. 
"},{"title":"Pain #3 Data scientist are forced to interact with lower-level platform components​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#pain-3-data-scientist-are-forced-to-interact-with-lower-level-platform-components","content":"In addition to the above pain, we see that Data Scientists are forced to learn about the underlying platform in order to build a real-world machine learning workflow. Most of the data scientists we talked with are experts in ML algorithms/libraries, feature engineering, etc. They are also most familiar with Python and R, and some of them understand Spark, Hive, etc. If they are asked to interact with lower-level components, such as fine-tuning a Spark job's performance, troubleshooting a job that failed to launch because of resource constraints, or writing a K8s/YARN job spec with properly mounted volumes and network settings, they will scratch their heads and typically cannot perform these operations efficiently. "},{"title":"Pain #4 Comply with data security/governance requirements​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#pain-4-comply-with-data-securitygovernance-requirements","content":"TODO: Add more details. "},{"title":"Pain #5 No good way to reduce routine ML code development​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#pain-5-no-good-way-to-reduce-routine-ml-code-development","content":"After the data is prepared, the data scientist needs to do several routine tasks to build the ML pipeline. To get a sense of the existing data set, they usually need to split it and compute its statistics. These tasks share a lot of duplicated code, which reduces the efficiency of data scientists. An abstraction layer/framework that helps the developer speed up ML pipeline development could be valuable. 
Ideally, the developer only needs to fill in callback functions and can focus on their key logic. Submarine "},{"title":"Overview​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#overview","content":""},{"title":"A little bit history​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#a-little-bit-history","content":"Initially, Submarine was built to solve the problems of running deep learning jobs like Tensorflow/PyTorch on Apache Hadoop YARN, and to allow admins to monitor launched deep learning jobs and manage generated models. It was initially part of YARN, with its code residing under hadoop-yarn-applications. Later, the community decided to convert it into a subproject within Hadoop (a sibling project of YARN, HDFS, etc.) because we wanted to support other resource management platforms like K8s. Finally, we reconsidered Submarine's charter, and the Hadoop community voted that it was time to move Submarine to a separate Apache TLP. "},{"title":"Why Submarine?​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#why-submarine","content":"ONE PLATFORM Submarine is the ONE PLATFORM that allows Data Scientists to create end-to-end machine learning workflows. ONE PLATFORM means it lets Data Scientists and data engineers finish their jobs on the same platform without frequently switching toolsets. From dataset exploration and data pipeline creation to model training, tuning, and pushing models to production, all these steps can be completed within the ONE PLATFORM. Resource Management Independent It is also designed to be resource management independent: no matter if you have Apache Hadoop YARN, K8s, or just a container service, you will be able to run Submarine on top of it. 
"},{"title":"Requirements and non-requirements​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#requirements-and-non-requirements","content":""},{"title":"Notebook​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#notebook","content":"1) Users should be able to create, edit, delete a notebook. (P0) 2) Notebooks can be persisted to storage and can be recovered if failure happens. (P0) 3) Users can trace back to history versions of a notebook. (P1) 4) Notebooks can be shared with different users. (P1) 5) Users can define a list of parameters of a notebook (looks like parameters of the notebook's main function) to allow executing a notebook like a job. (P1) 6) Different users can collaborate on the same notebook at the same time. (P2) A running notebook instance is called notebook session (or session for short). "},{"title":"Experiment​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#experiment","content":"Experiments of Submarine is an offline task. It could be a shell command, a Python command, a Spark job, a SQL query, or even a workflow. The primary purposes of experiments under Submarine's context is to do training tasks, offline scoring, etc. However, experiment can be generalized to do other tasks as well. Major requirement of experiment: 1) Experiments can be submitted from UI/CLI/SDK. 2) Experiments can be monitored/managed from UI/CLI/SDK. 3) Experiments should not bind to one resource management platform (K8s). Type of experiments​  There're two types of experiments:Adhoc experiments: which includes a Python/R/notebook, or even an adhoc Tensorflow/PyTorch task, etc. Predefined experiment library: This is specialized experiments, which including developed libraries such as CTR, BERT, etc. 
Users are only required to specify a few parameters such as input, output, hyper parameters, etc., instead of worrying about where the training scripts/dependencies are located. Adhoc experiment​ Requirements: Allow running adhoc scripts. Allow model engineers and data scientists to run Tensorflow/Pytorch programs on K8s/Container-cloud. Allow jobs to easily access data/models in HDFS/s3, etc. Support running distributed Tensorflow/Pytorch jobs with simple configs. Support running user-specified Docker images. Support specifying GPU and other resources. Predefined experiment library​ Here's an example of a predefined experiment library to train a deepfm model: { &quot;input&quot;: { &quot;train_data&quot;: [&quot;hdfs:///user/submarine/data/tr.libsvm&quot;], &quot;valid_data&quot;: [&quot;hdfs:///user/submarine/data/va.libsvm&quot;], &quot;test_data&quot;: [&quot;hdfs:///user/submarine/data/te.libsvm&quot;], &quot;type&quot;: &quot;libsvm&quot; }, &quot;output&quot;: { &quot;save_model_dir&quot;: &quot;hdfs:///user/submarine/deepfm&quot;, &quot;metric&quot;: &quot;auc&quot; }, &quot;training&quot;: { &quot;batch_size&quot; : 512, &quot;field_size&quot;: 39, &quot;num_epochs&quot;: 3, &quot;feature_size&quot;: 117581, ... } }  Predefined experiment libraries can be shared across users on the same platform. Users can also add new or modified predefined experiment libraries via UI/REST API. We will also model AutoML and auto hyper-parameter tuning as predefined experiment libraries. Pipeline​ Pipeline is a special kind of experiment: A pipeline is a DAG of experiments. It can also be treated as a special kind of experiment. Users can submit/terminate a pipeline. A pipeline can be created/submitted via UI/API. 
"},{"title":"Environment Profiles​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#environment-profiles","content":"An environment profile (or environment for short) defines a set of libraries and, when Docker is being used, a Docker image needed to run an experiment or a notebook. A Docker or VM image (such as AMI: Amazon Machine Images) defines the base layer of the environment. On top of that, users can define a set of libraries (such as Python/R) to install. Users can save different environment configs, which can also be shared across the platform. Environment profiles can be used to run a notebook (e.g. by choosing a different kernel from Jupyter), or an experiment. A predefined experiment library specifies which environment to use, so users don't have to choose one. Environments can be added/listed/deleted/selected through CLI/SDK. "},{"title":"Model​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#model","content":"Model management​ Model artifacts are generated by experiments or notebooks. A model consists of artifacts from one or multiple files. Users can choose to save, tag, and version a produced model. Once the model is saved, users can do online model serving or offline scoring of the model. Model serving​ After a model is saved, users can specify a serving script and a model, and create a web service to serve the model. We call the web service an &quot;endpoint&quot;. Users can manage (add/stop) model serving endpoints via CLI/API/UI. "},{"title":"Metrics for training job and model​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#metrics-for-training-job-and-model","content":"Submarine-SDK provides tracking/metrics APIs, which allow developers to add tracking/metrics and view them from the Submarine Workbench UI. 
"},{"title":"Deployment​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#deployment","content":"Submarine Services (See architecture overview below) should be easy to deploy on-prem / on-cloud. Since there are more and more public cloud offerings for compute/storage management on cloud, we need to support deploying Submarine compute-related workloads (such as notebook sessions, experiments, etc.) to cloud-managed clusters. This also means Submarine may need to take input parameters from customers and create/manage clusters if needed. It is also a common requirement to use a hybrid of on-prem/on-cloud clusters. "},{"title":"Security / Access Control / User Management / Quota Management​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#security--access-control--user-management--quota-management","content":"There are 4 kinds of objects that need access control: Assets that belong to the Submarine system, which include notebooks, experiments and results, models, predefined experiment libraries, and environment profiles. Data security. (Who owns what data, and what data can be accessed by each user). User credentials. (Such as LDAP). Other security, such as Git repo access, etc. Data security / user credentials / other security will be delegated to third-party libraries such as Apache Ranger, IAM roles, etc. Assets that belong to the Submarine system will be handled by Submarine itself. Here are the operations that a Submarine admin can perform for users / teams to control access to Submarine's assets. Operations for admins Admin uses &quot;User Management System&quot; to onboard new users, upload user credentials, assign resource quotas, etc. Admins can create new users and new teams, and update user/team mappings. 
Or remove users/teams.Admin can set resource quotas (if different from system default), permissions, upload/update necessary credentials (like Kerberos keytab) of a user.A DE/DS can also be an admin if the DE/DS has admin access. (Like a privileged user). This will be useful when a cluster is exclusively shared by a user or only shared by a small team.Resource Quota Management System helps admin to manage resources quotas of teams, organizations. Resources can be machine resources like CPU/Memory/Disk, etc. It can also include non-machine resources like $$-based budgets. "},{"title":"Dataset​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#dataset","content":"There's also need to tag dataset which will be used for training and shared across the platform by different users. Like mentioned above, access to the actual data will be handled by 3rd party system like Apache Ranger / Hive Metastore which is out of the Submarine's scope. 
"},{"title":"Architecture Overview​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#architecture-overview","content":""},{"title":"Architecture Diagram​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/next/designDocs/architecture-and-requirements#architecture-diagram","content":" +-----------------------------------------------------------------+ | Submarine UI / CLI / REST API / SDK | | Mini-Submarine | +-----------------------------------------------------------------+ +--------------------Submarine Server-----------------------------+ | +---------+ +---------+ +----------+ +----------+ +------------+| | |Data set | |Notebooks| |Experiment| |Models | |Servings || | +---------+ +---------+ +----------+ +----------+ +------------+| |-----------------------------------------------------------------| | | | +-----------------+ +-----------------+ +---------------------+ | | |Experiment | |Compute Resource | |Other Management | | | |Manager | | Manager | |Services | | | +-----------------+ +-----------------+ +---------------------+ | | Spark, template K8s/Docker | | TF, PyTorch, pipeline | | | + +-----------------+ + | |Submarine Meta | | | | Store | | | +-----------------+ | | | +-----------------------------------------------------------------+ (You can use http://stable.ascii-flow.appspot.com/#Draw to draw such diagrams)  Compute Resource Manager Helps to manage compute resources on-prem/on-cloud, this module can also handle cluster creation / management, etc. Experiment Manager Work with &quot;Compute Resource Manager&quot; to submit different kinds of workloads such as (distributed) Tensorflow / Pytorch, etc. Submarine SDK provides Java/Python/REST API to allow DS or other engineers to integrate into Submarine services. It also includes a mini-submarine component that launches Submarine components from a single Docker container (or a VM image). 
Details of the Submarine Server design can be found in submarine-server-design. "},{"title":"Environments Implementation","type":0,"sectionRef":"#","url":"docs/next/designDocs/environments-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Environments Implementation","url":"docs/next/designDocs/environments-implementation#overview","content":"An environment profile (or environment for short) defines a set of libraries and, when Docker is being used, a Docker image needed to run an experiment or a notebook. A Docker image and/or VM image (such as VirtualBox/VMWare images, Amazon Machine Images (AMI), or a custom Azure VM image) defines the base layer of the environment. Please note that a VM image is different from a VM instance type. On top of that, users can define a set of libraries (such as Python/R) to install; we call this the kernel. Example of Environment  +-------------------+ |+-----------------+| || Python=3.7 || || Tensorflow=2.0 || |+---Exp Dependency+| |+-----------------+| ||OS=Ubuntu16.04 || ||CUDA=10.2 || ||GPU_Driver=375.. || |+---Base Library--+| +-------------------+  As you can see, there are base libraries, such as the OS, CUDA version, GPU driver, etc. They can be provided by specifying a VM image / Docker image. On top of that, users can bring their own dependencies, such as different versions of Python, Tensorflow, Pandas, etc. How do users use environments? Users can save different environment configs, which can also be shared across the platform. Environment profiles can be used to run a notebook (e.g. by choosing a different kernel from Jupyter) or an experiment. A predefined experiment library specifies which environment to use, so users don't have to choose one themselves.  
+-------------------+ |+-----------------+| +------------+ || Python=3.7 || |User1 | || Tensorflow=2.0 || +------------+ |+---Kernel -------+| +------------+ |+-----------------+|&lt;----+ |User2 | ||OS=Ubuntu16.04 || + +------------+ ||CUDA=10.2 || | +------------+ ||GPU_Driver=375.. || | |User3 | |+---Base Library--+| | +------------+ +-----Default-Env---+ | | | +-------------------+ | |+-----------------+| | || Python=3.3 || | || Tensorflow=2.0 || | |+---kernel--------+| | |+-----------------+| | ||OS=Ubuntu16.04 || | ||CUDA=10.3 ||&lt;----+ ||GPU_Driver=375.. || |+---Base Library--+| +-----My-Customized-+  There're two environments in the above graph, &quot;Default-Env&quot; and &quot;My-Customized&quot;, which can have different combinations of libraries for different experiments/notebooks. Users can choose different environments for different experiments as they want. Environments can be added/listed/deleted/selected through CLI/SDK/UI. Implementation "},{"title":"Environment API definition​","type":1,"pageTitle":"Environments Implementation","url":"docs/next/designDocs/environments-implementation#environment-api-definition","content":"Let look at what object definition looks like to define an environment, API of environment looks like:  name: &quot;my_submarine_env&quot;, vm-image: &quot;...&quot;, docker-image: &quot;...&quot;, kernel: &lt;object of kernel&gt; description: &quot;this is the most common env used by team ABC&quot;  vm-image is optional if we don't need to launch new VM (like running a training job in a cloud-remote machine). docker-image is requiredkernel could be optional if kernel is already included by vm-image or docker-image.name of the environment should be unique in the system, so user can reference it when create a new experiment/notebook. 
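The rules above (a unique name, a required docker-image, and optional vm-image and kernel) can be sketched as a small in-memory registry. This is a minimal Python sketch under stated assumptions: EnvironmentRegistry and its methods are hypothetical names chosen for illustration and are not part of Submarine's API.

```python
# Hypothetical sketch: field names mirror the example object above, but
# EnvironmentRegistry and its methods are illustrative, not Submarine APIs.
class EnvironmentRegistry:
    def __init__(self):
        # Maps environment name to the stored environment object.
        self._envs = {}

    def register(self, env):
        name = env.get('name')
        if not name:
            raise ValueError('name is required')
        if name in self._envs:
            raise ValueError('environment name must be unique: ' + name)
        if not env.get('docker-image'):
            raise ValueError('docker-image is required')
        # vm-image and kernel are optional and default to None.
        stored = {
            'name': name,
            'docker-image': env['docker-image'],
            'vm-image': env.get('vm-image'),
            'kernel': env.get('kernel'),
            'description': env.get('description', ''),
        }
        self._envs[name] = stored
        return stored

    def get(self, name):
        # Experiments and notebooks reference an environment by its name.
        return self._envs[name]
```

An experiment or notebook would then look an environment up by name, which is why the name has to be unique.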
"},{"title":"VM-image and Docker-image​","type":1,"pageTitle":"Environments Implementation","url":"docs/next/designDocs/environments-implementation#vm-image-and-docker-image","content":"Docker-image and VM image should be prepared by system admin / SREs, it is hard for Data-Scientists to write an error-proof Dockerfile, and push/manage Docker images. This is one of the reason we hide Docker-image inside &quot;environment&quot;, we will encourage users to customize their kernels if needed, but don't have to touch Dockerfile and build/push/manage new Docker images. As a project, we will document what's the best practice and example of Dockerfiles. Dockerfile should include proper ENTRYPOINT definition which pointed to our default script, so no matter it is notebook, or an experiment, we will setup kernel (see below) and other environment variables properly. "},{"title":"Kernel Implementation​","type":1,"pageTitle":"Environments Implementation","url":"docs/next/designDocs/environments-implementation#kernel-implementation","content":"After investigating different alternatives (such as pipenv, venv, etc.), we decided to use Conda environment which nicely replaces Python virtual env, pip, and can also support other languages. More details can be found at: https://medium.com/@krishnaregmi/pipenv-vs-virtualenv-vs-conda-environment-3dde3f6869ed When once Conda, users can easily add, remove dependency of a Conda environment. User can also easily export environment to yaml file. 
The yaml file of Conda environment by using conda env export looks like: name: base channels: - defaults dependencies: - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0 - alabaster=0.7.12=py37_0 - anaconda=2020.02=py37_0 - anaconda-client=1.7.2=py37_0 - anaconda-navigator=1.9.12=py37_0 - anaconda-project=0.8.4=py_0 - applaunchservices=0.2.1=py_0  Including Conda kernel, the environment object may look like: name: &quot;my_submarine_env&quot;, vm-image: &quot;...&quot;, docker-image: &quot;...&quot;, kernel: name: team_default_python_3.7 channels: - defaults dependencies: - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0 - alabaster=0.7.12=py37_0 - anaconda=2020.02=py37_0 - anaconda-client=1.7.2=py37_0 - anaconda-navigator=1.9.12=py37_0  When launch a new experiment / notebook session using the my_submarine_env, submarine server will use defined Docker image, and Conda kernel to launch of container. "},{"title":"Storage of Environment​","type":1,"pageTitle":"Environments Implementation","url":"docs/next/designDocs/environments-implementation#storage-of-environment","content":"Environment of Submarine is just a simple text file, so it will be persisted in Submarine metastore, which is ideally a Database. Docker image is stored inside a regular Docker registry, which will be handled outside of the system. Conda dependencies are stored in Conda channel (where referenced packages are stored), which will be handled/setuped separately. (Popular conda channels are default and conda-forge) For more detailed discussion about storage-related implementations, please refer to storage-implementation. "},{"title":"How to implement to make user can easily use Submarine environments?​","type":1,"pageTitle":"Environments Implementation","url":"docs/next/designDocs/environments-implementation#how-to-implement-to-make-user-can-easily-use-submarine-environments","content":"We like simplicities, and we don't want to leak complexities of implementations to the users. 
To achieve this, we have to do some work to hide the complexity. There are two primary uses of environments: experiments and notebooks. For both of them, users should not have to explicitly call conda activate $env_name to activate environments. To achieve this, we can include the following parts in the Dockerfile FROM ubuntu:18.04 &lt;Include whatever base-libraries like CUDA, etc.&gt; &lt;Make sure conda (with our preferred version) is installed&gt; &lt;Make sure Jupyter (with our preferred version) is installed&gt; # This is just a sample of Dockerfile, users can do more customizations if needed ENTRYPOINT [&quot;/submarine-bootstrap.sh&quot;]  When Submarine Server launches an experiment or a notebook (this is an implementation detail of Submarine Server; users will not see it at all), it will invoke the following docker run command (or an equivalent, such as a K8s spec): docker run &lt;submarine_docker_image&gt; --kernel &lt;kernel_name&gt; -- .... python train.py --batch_size 5 (and other parameters)  Similarly, to launch a notebook: docker run &lt;submarine_docker_image&gt; --kernel &lt;kernel_name&gt; -- .... jupyter  The submarine-bootstrap.sh script is part of the Submarine repo and handles the --kernel argument, invoking conda activate $kernel_name before anything else (such as running the training job). "},{"title":"Implementation Notes","type":0,"sectionRef":"#","url":"docs/next/designDocs/implementation-notes","content":"Implementation Notes Before digging into implementation details, you should read architecture-and-requirements first to understand the overall requirements and architecture. Here are the sub-topics of Submarine implementations: Submarine Storage: How to store metadata, logs, metrics, etc. 
of Submarine.Submarine Environment: How environments created, managed, stored in Submarine.Submarine Experiment: How experiments managed, stored, and how the predefined experiment template works.Submarine Notebook: How experiments managed, stored, and how the predefined experiment template works.Submarine Server: How Submarine server is designed, architecture, implementation notes, etc. Working-in-progress designs, Below are designs which are working-in-progress, we will move them to the upper section once design &amp; review is finished: Submarine services deployment module: How to deploy submarine services to k8s or cloud.","keywords":""},{"title":"Experiment Implementation","type":0,"sectionRef":"#","url":"docs/next/designDocs/experiment-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#overview","content":"This document talks about implementation of experiment, flows and design considerations. Experiment consists of following components, also interact with other Submarine or 3rd-party components, showing below:  +---------------------------------------+ +----------+ | Experiment Tasks | |Run | | | |Configs | | +----------------------------------+ | +----------+ | | Experiment Runnable Code | | +-----------------+ +----------+ | | | | |Output Artifacts | |Input Data| | | (Like train-job.py) | | |(Models, etc.) 
| | | | +----------------------------------+ | +-----------------+ | | | +----------------------------------+ | +----------+ | | Experiment Deps (Like Python) | | +-------------+ | +----------------------------------+ | |Logs/Metrics | | +----------------------------------+ | | | | | OS, Base Libaries (Like CUDA) | | +-------------+ | +----------------------------------+ | +---------------------------------------+ ^ | (Launch Task with resources) + +---------------------------------+ |Resource Manager (K8s/Cloud)| +---------------------------------+  As showing in the above diagram, Submarine experiment consists of the following items: On the left side, there're input data and run configs.In the middle box, they're experiment tasks, it could be multiple tasks when we run distributed training, pipeline, etc. There're main runnable code, such as train.py for the training main entry point.The two boxes below: experiment dependencies and OS/Base libraries we called Submarine Environment Profile or Environment for short. Which defined what is the basic libraries to run the main experiment code.Experiment tasks are launched by Resource Manager, such as K8s/Cloud or just launched locally. There're resources constraints for each experiment tasks. (e.g. how much memory, cores, GPU, disk etc. can be used by tasks). On the right side, they're artifacts generated by experiments: Output artifacts: Which are main output of the experiment, it could be model(s), or output data when we do batch prediction.Logs/Metrics for further troubleshooting or understanding of experiment's quality. For the rest of the design doc, we will talk about how we handle environment, code, and manage output/logs, etc. "},{"title":"API of Experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#api-of-experiment","content":"This is not a full definition of experiment, for more details, please reference to experiment API. 
Here's just an example of experiment object which help developer to understand what included in an experiment. experiment: name: &quot;abc&quot;, type: &quot;script&quot;, environment: &quot;team-default-ml-env&quot; code: sync_mode: s3 url: &quot;s3://bucket/training-job.tar.gz&quot; parameter: &gt; python training.py --iteration 10 --input=s3://bucket/input output=s3://bucket/output resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=2&quot; timeout: &quot;30 mins&quot;  This defined a &quot;script&quot; experiment, which has a name &quot;abc&quot;, the name can be used to track the experiment. There's environment &quot;team-default-ml-env&quot; defined to make sure dependencies of the job can be downloaded properly before executing the job. code defined where the experiment code will be downloaded, we will support a couple of sync_mode like s3 (or abfs/hdfs), git, etc. Different types of experiments will have different specs, for example distributed Tensorflow spec may look like: experiment: name: &quot;abc-distributed-tf&quot;, type: &quot;distributed-tf&quot;, ps: environment: &quot;team-default-ml-cpu&quot; resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=0&quot; worker: environment: &quot;team-default-ml-gpu&quot; resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=2&quot; code: sync_mode: git url: &quot;https://foo.com/training-job.git&quot; parameter: &gt; python /code/training-job/training.py --iteration 10 --input=s3://bucket/input output=s3://bucket/output tensorboard: enabled timeout: &quot;30 mins&quot;  Since we have different Docker image, one is using GPU and one is not using GPU, we can specify different environment and resource constraint. 
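The resource_constraint strings in the specs above pack several resources into one comma-separated value. As a minimal Python sketch of how such a string could be split into individual entries, assuming the format is always key=value pairs separated by commas (parse_resource_constraint is a hypothetical helper, not a Submarine function, and values are kept as plain strings because the spec above does not define unit handling):

```python
def parse_resource_constraint(res):
    # Split a string such as 'mem=20gb, vcore=3, gpu=2' into a dict.
    # Hypothetical helper for illustration; values stay as strings.
    resources = {}
    for item in res.split(','):
        key, sep, value = item.strip().partition('=')
        if not sep or not key or not value:
            raise ValueError('malformed resource entry: ' + repr(item))
        resources[key] = value
    return resources


ps_resources = parse_resource_constraint('mem=20gb, vcore=3, gpu=0')
worker_resources = parse_resource_constraint('mem=20gb, vcore=3, gpu=2')
```

Parsing each role separately keeps the ps and worker constraints independent, matching the per-role environment and resource settings in the distributed Tensorflow spec.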
"},{"title":"Manage environments for experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#manage-environments-for-experiment","content":"Please refer to environment-implementation.md for more details "},{"title":"Manage storages for experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#manage-storages-for-experiment","content":"There're different types of storage, such as logs, metrics, dependencies (environments). For more details. Please refer to storage-implementations for more details. This also includes how to manage code for experiment code. "},{"title":"Manage Pre-defined experiment libraries​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#manage-pre-defined-experiment-libraries","content":""},{"title":"Flow: Submit an experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#flow-submit-an-experiment","content":""},{"title":"Submit via SDK Flows.​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#submit-via-sdk-flows","content":"To better understand experiment implementation, It will be good to understand what is the steps of experiment submission. Please note that below code is just pseudo code, not official APIs. "},{"title":"Specify what environment to use​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#specify-what-environment-to-use","content":"Before submit the environment, you have to choose what environment to choose. Environment defines dependencies, etc. of an experiment or a notebook. 
might looks like below: conda_environment = &quot;&quot;&quot; name: conda-env channels: - defaults dependencies: - asn1crypto=1.3.0=py37_0 - blas=1.0=mkl - ca-certificates=2020.1.1=0 - certifi=2020.4.5.1=py37_0 - cffi=1.14.0=py37hb5b8e2f_0 - chardet=3.0.4=py37_1003 prefix: /opt/anaconda3/envs/conda-env &quot;&quot;&quot; # This environment can be different from notebook's own environment environment = create_environment { DockerImage = &quot;ubuntu:16&quot;, CondaEnvironment = conda_environment }  To better understand how environment works, please refer to environment-implementation. "},{"title":"Create experiment, specify where's training code located, and parameters.​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#create-experiment-specify-wheres-training-code-located-and-parameters","content":"For ad-hoc experiment (code located at S3), assume training code is part of the training-job.tar.gz and main class is train.py. When the job is launched, whatever specified in the localize_artifacts will be downloaded. 
experiment = create_experiment { Environment = environment, ExperimentConfig = { type = &quot;adhoc&quot;, localize_artifacts = [ &quot;s3://bucket/training-job.tar.gz&quot; ], name = &quot;abc&quot;, parameter = &quot;python training.py --iteration 10 --input=&quot;s3://bucket/input output=&quot;s3://bucket/output&quot;, } } experiment.run() experiment.wait_for_finish(print_output=True)  Run notebook file in offline mode​ It is possible we want to run a notebook file in offline mode, to do that, here's code to use to run a notebook code experiment = create_experiment { Environment = environment, ExperimentConfig = { type = &quot;adhoc&quot;, localize_artifacts = [ &quot;s3://bucket/folder/notebook-123.ipynb&quot; ], name = &quot;abc&quot;, parameter = &quot;runipy training.ipynb --iteration 10 --input=&quot;s3://bucket/input output=&quot;s3://bucket/output&quot;, } } experiment.run() experiment.wait_for_finish(print_output=True)  Run pre-defined experiment library​ experiment = create_experiment { # Here you can use default environment of library Environment = environment, ExperimentConfig = { type = &quot;template&quot;, name = &quot;abc&quot;, # A unique name of template template = &quot;deepfm_ctr&quot;, # yaml file defined what is the parameters need to be specified. parameter = { Input: &quot;S3://.../input&quot;, Output: &quot;S3://.../output&quot; Training: { &quot;batch_size&quot;: 512, &quot;l2_reg&quot;: 0.01, ... } } } } experiment.run() experiment.wait_for_finish(print_output=True)  "},{"title":"Summarize: Experiment v.s. Notebook session​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#summarize-experiment-vs-notebook-session","content":"There's a common misunderstanding about what is the differences between running experiment v.s. running task from a notebook session. 
We will talk about differences and commonalities: Differences \tExperiment\tNotebook SessionRun mode\tOffline\tInteractive Output Artifacts (a.k.a model)\tPersisted in a shared storage (like S3/NFS)\tLocal in the notebook session container, could be ephemeral Run history (meta, logs, metrics)\tMeta/logs/metrics can be traced from experiment UI (or corresponding API)\tNo run history can be traced from Submarine UI/API. Can view the current running paragraph's log/metrics, etc. What to run?\tCode from Docker image or shared storage (like Tarball on S3, Github, etc.)\tLocal in the notebook's paragraph Commonalities \tExperiment &amp; Notebook SessionEnvironment\tThey can share the same Environment configuration "},{"title":"Experiment-related modules inside Submarine-server​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#experiment-related-modules-inside-submarine-server","content":"(Please refer to architecture of submarine server for more details) "},{"title":"Experiment Manager​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#experiment-manager","content":"The experiment manager receives experiment requests, persists the experiment metadata in a database (e.g. MySQL), and invokes subsequent modules to submit and monitor the experiment's execution. "},{"title":"Compute Cluster Manager​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#compute-cluster-manager","content":"After an experiment is accepted by the experiment manager, the compute cluster manager returns credentials to access the compute cluster the experiment is intended to run on (as mentioned in the previous sections, Submarine supports managing multiple compute clusters). It is also responsible for creating a new compute cluster if needed. 
For most on-prem use cases, only one cluster is involved; in such cases, ComputeClusterManager returns credentials to access the local cluster if needed. "},{"title":"Experiment Submitter​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#experiment-submitter","content":"Experiment Submitter handles the different kinds of experiments to run (e.g. ad-hoc script, distributed TF, MPI, pre-defined templates, Pipeline, AutoML, etc.). Such experiments can be managed by different resource management systems (e.g. K8s, container cloud, etc.). To support various kinds of experiments and resource managers, we choose to use plug-in modules to support different submitters (which requires adding jars to submarine-server’s classpath). Jars and dependencies of plug-ins could break the submarine-server, the plug-in manager, or both. To avoid this, we can instantiate submitter plug-ins using a classloader that is different from the system classloader. Submitter Plug-ins​ Each plug-in uses a separate module under the server-submitter module. As the default implementation, we provide a plug-in for K8s. The submitter-k8s plug-in is used to submit the job to a Kubernetes cluster and uses the operator as the runtime. The submitter-k8s plug-in implements the operations on CRD objects and provides the Java interface. In the beginning, we use the tf-operator for TensorFlow. If Submarine wants to support other resource management systems in the future, such as submarine-docker-cluster (Submarine uses the Raft algorithm to create a docker cluster on the docker runtime environment on multiple servers, providing the most lightweight resource scheduling system for small-scale users), we should create a new plug-in module named submitter-docker under the server-submitter module. 
"},{"title":"Experiment Monitor​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#experiment-monitor","content":"The monitor tracks the experiment life cycle and records the main events and key info in runtime. As the experiment run progresses, the metrics are needed for evaluation of the ongoing success or failure of the execution progress. Due to adapt the different cluster resource management system, so we need a generic metric info structure and each submitter plug-in should inherit and complete it by itself. "},{"title":"Invoke flows of experiment-related components​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#invoke-flows-of-experiment-related-components","content":" +-----------------+ +----------------+ +----------------+ +-----------------+ |Experiments | |Compute Cluster | |Experiment | | Experiment | |Mgr | |Mgr | |Submitter | | Monitor | +-----------------+ +----------------+ +----------------+ +-----------------+ + + + + User | | | | Submit |+-------------------------------------&gt;+ + Xperiment| Use submitter.validate(spec) | | | to validate spec and create | | | experiment object (state- | | | machine). | | | | | | The experiment manager will | | | persist meta-data to Database| | | | | | | | + + |+-----------------&gt; + | | | Submit Experiments| | | | To ComputeCluster| | | | Mgr, get existing|+----------------&gt;| | | cluster, or | Use Submitter | | | create a new one.| to submit |+---------------&gt; | | | Different kinds | Once job is | | | of experiments | submitted, use |+----+ | | to k8s, etc| monitor to get | | | | | status updates | | | | | | | Monitor | | | | | Xperiment | | | | | status | | | | | |&lt;--------------------------------------------------------+| | | | | | | | Update Status back to Experiment | | | | Manager | |&lt;----+ | | | | | | | | | | | | v v v v  TODO: add more details about template, environment, etc. 
"},{"title":"Common modules of experiment/notebook-session/model-serving​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#common-modules-of-experimentnotebook-sessionmodel-serving","content":"Experiment/notebook-session/model-serving share a lot of commonalities, all of them are: Some workloads running on K8s.Need persist meta data to DB.Need monitor task/service running status from resource management system. We need to make their implementation are loose-coupled, but at the same time, share some building blocks as much as possible (e.g. submit PodSpecs to K8s, monitor status, get logs, etc.) to reduce duplications. "},{"title":"Support Predefined-experiment-templates​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#support-predefined-experiment-templates","content":"Predefined Experiment Template is just a way to save data-scientists time to repeatedly entering parameters which is not error-proof and user experience is also bad. 
"},{"title":"Predefined-experiment-template API to run experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#predefined-experiment-template-api-to-run-experiment","content":"Predefined experiment template consists a list of parameters, each of the parameter has 4 properties: Key\tRequired\tDefault Value\tDescriptionName of the key\ttrue/false\tWhen required = false, a default value can be provided by the template\tDescription of the parameter For the example of deepfm CTR training experiment mentioned in the architecture-and-requirements.md { &quot;input&quot;: { &quot;train_data&quot;: [&quot;hdfs:///user/submarine/data/tr.libsvm&quot;], &quot;valid_data&quot;: [&quot;hdfs:///user/submarine/data/va.libsvm&quot;], &quot;test_data&quot;: [&quot;hdfs:///user/submarine/data/te.libsvm&quot;], &quot;type&quot;: &quot;libsvm&quot; }, &quot;output&quot;: { &quot;save_model_dir&quot;: &quot;hdfs:///user/submarine/deepfm&quot;, &quot;metric&quot;: &quot;auc&quot; }, &quot;training&quot;: { &quot;batch_size&quot; : 512, &quot;field_size&quot;: 39, &quot;num_epochs&quot;: 3, &quot;feature_size&quot;: 117581, ... } }  The template will be (in yaml format): # deepfm.ctr template name: deepfm.ctr author: description: &gt; This is a template to run CTR training using deepfm algorithm, by default it runs single node TF job, you can also overwrite training parameters to use distributed training. parameters: - name: input.train_data required: true description: &gt; train data is expected in SVM format, and can be stored in HDFS/S3 ... - name: training.batch_size required: false default: 32 description: This is batch size of training  The batch format can be used in UI/API. 
"},{"title":"Handle Predefined-experiment-template from server side​","type":1,"pageTitle":"Experiment Implementation","url":"docs/next/designDocs/experiment-implementation#handle-predefined-experiment-template-from-server-side","content":"Please note that, the conversion of predefined-experiment-template will be always handled by server. The invoke flow looks like:  +------------Submarine Server -----------------------+ +--------------+ | +-----------------+ | |Client |+-------&gt;|Experimment Mgr | | | | | | | | +--------------+ | +-----------------+ | | + | Submit | +-------v---------+ Get Experiment Template | Template | |Experiment |&lt;-----+From pre-registered | Parameters | |Template Registry| Templates | to Submarine | +-------+---------+ | Server | | | | +-------v---------+ +-----------------+ | | |Deepfm CTR Templ-| |Experiment- | | | |ate Handler +------&gt;|Tensorflow | | | +-----------------+ +--------+--------+ | | | | | | | | +--------v--------+ | | |Experiment | | | |Submitter | | | +--------+--------+ | | | | | | | | +--------v--------+ | | | | | | | ...... | | | +-----------------+ | | | +----------------------------------------------------+  Basically, from Client, it submitted template parameters to Submarine Server, inside submarine server, it finds the corresponding template handler based on the name. And the template handler converts input parameters to an actual experiment, such as a distributed TF experiment. After that, it goes the similar route to validate experiment spec, compute cluster manager, etc. to get the experiment submitted and monitored. 
Predefined-experiment-template is able to create any kind of experiment, it could be a pipeline:  +-----------------+ +------------------+ |Template XYZ | | XYZ Template | | |+---------------&gt; | Handler | +-----------------+ +------------------+ + | | | | v +--------------------+ +------------------+ | +-----------------+| | Predefined | | | Split Train/ ||&lt;----+| Pipeline | | | Test data || +------------------+ | +-------+---------+| | | | | +-------v---------+| | | Spark Job ETL || | | || | +-------+---------+| | | | | +-------v---------+| | | Train using || | | XGBoost || | +-------+---------+| | | | | +-------v---------+| | | Validate Train || | | Results || | +-----------------+| | | +--------------------+  Template can be also chained to reuse other template handlers  +-----------------+ +------------------+ |Template XYZ | | XYZ Template | | |+---------------&gt; | Handler | +-----------------+ +------------------+ + | v +------------------+ +------------------+ |Distributed | | ABC Template | |TF Experiment |&lt;----+| Handler | +------------------+ +------------------+  Template Handler is a callable class inside Submarine Server with a standard interface defined like. interface ExperimentTemplateHandler { ExperimentSpec createExperiment(TemplatedExperimentParameters param) }  We should avoid users to do coding when they want to add new template, we should have several standard template handler to deal with most of the template handling. Experiment templates can be registered/updated/deleted via Submarine Server's REST API, which need to be discussed separately in the doc. 
(TODO) "},{"title":"Notebook Implementation","type":0,"sectionRef":"#","url":"docs/next/designDocs/notebook-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#overview","content":""},{"title":"User's interaction​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#users-interaction","content":"Users can start N (N &gt;= 0) notebook sessions; a notebook session is a running notebook instance. A notebook session can be launched from the Submarine UI (P0) or the Submarine CLI (P2). When launching a notebook session, users can choose its T-shirt size (how much mem/cpu/gpu resources, or a resource profile such as small, medium, large, etc.). (P0) Users can choose an environment for the notebook. For more details, please refer to environment implementation. (P0) When starting a notebook, users can choose what code to initialize, similar to an experiment. (P1) Optionally, users can choose to attach a persistent volume to a notebook session. (P2) Users can get a list of the notebook sessions that belong to them and connect to a notebook session. Users can choose to terminate a running notebook session. "},{"title":"Admin's interaction​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#admins-interaction","content":"The number of concurrent notebook sessions each user can launch is determined by the user's resource quota limits and by the maximum number of concurrent notebook sessions allowed per user. 
(P2) "},{"title":"Relationship with other components​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#relationship-with-other-components","content":""},{"title":"Metadata store​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#metadata-store","content":"Running notebook sessions' metadata need persistented in Submarine's metadata store (Database). "},{"title":"Submarine Server​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#submarine-server","content":" +--------------+ +--------Submarine Server--------------------+ |Submarine UI | | +-------------------+ | | |+---&gt; Submarine | | | Notebook | | | Notebook REST API| | +--------------+ | | | | | +--------+----------+ +--------------+ | | | +-&gt;|Metastore | | | +--------v----------+ | |DB | | | | Submarine +--+ +--------------+ | | | Notebook Mgr | | | | | | | | | | | +--------+----------+ | | | | +----------|---------------------------------+ | +--------------+ +--------v---------+ | Notebook Session | | | | instance | | | +------------------+  Once user use Submarine UI to launch a notebook session, Submarine notebook manager inside Submarine Server will persistent notebook session's metadata, and launch a new notebook session instance. "},{"title":"Resource manager​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#resource-manager","content":"When using K8s as resource manager, Submarine notebook session will run as a new POD. "},{"title":"Storage​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#storage","content":"There're several different types of storage requirements for Submarine notebook. For code, environment, etc, storage, please refer to storage implementation, check &quot;Localization of experiment/notebook/model-serving code&quot;. 
When a volume (such as the user's home folder) needs to be attached to a Submarine notebook session, please see &quot;Attachable volume&quot; in storage implementation. "},{"title":"Environment​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#environment","content":"A Submarine notebook's environment should also be usable to run experiments, model serving, etc. Please check environment implementation. (For details specific to notebooks, please check &quot;How to implement to make user can easily use Submarine environments&quot;) Please note that the notebook's environment should include the right version of the notebook libraries, and admins should follow the guidance to build the correct Docker image and Conda libraries so that the notebook runs correctly. "},{"title":"Submarine SDK (For Experiment, etc.)​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#submarine-sdk-for-experiment-etc","content":"Users can run new experiments, access metrics information, or perform model operations using the Submarine SDK. The Submarine SDK is a Python library that talks to Submarine Server; it needs the Submarine Server endpoint as well as user credentials. To ensure a better experience, we recommend always installing the proper version of the Submarine SDK in the environment, so that users can use the Submarine SDK directly from the command line. (We as the Submarine community can provide a sample Dockerfile or Conda environment that has the correct base libraries installed for the Submarine SDK). The Submarine Server IP will be configured automatically by Submarine Server and added as an environment variable when the Submarine notebook session is launched. "},{"title":"Security​","type":1,"pageTitle":"Notebook Implementation","url":"docs/next/designDocs/notebook-implementation#security","content":"Please refer to Security Implementation. Once a user has access to a running notebook session, the user can also access the notebook's resources, submit new experiments, and access data. 
This is also very dangerous so we have to protect it. A simple solution is to use token-based authentication https://jupyter-notebook.readthedocs.io/en/stable/security.html. A more common way is to use solutions like KNOX to support SSO. We need expand this section to more details. (TODO). "},{"title":"Storage Implementation","type":0,"sectionRef":"#","url":"docs/next/designDocs/storage-implementation","content":"","keywords":""},{"title":"ML-related objects and their storages​","type":1,"pageTitle":"Storage Implementation","url":"docs/next/designDocs/storage-implementation#ml-related-objects-and-their-storages","content":"First let's look at what user will interact for most of the time: Notebook ExperimentModel Servings  +---------+ +------------+ |Logs |&lt;--+|Notebook | +----------+ +---------+ +------------+ +----------------+ |Trackings | &lt;-+|Experiment |&lt;--+&gt;|Model Artifacts | +----------+ +-----------------+ +------------+ +----------------+ +----------+&lt;---+|ML-related Metric|&lt;--+Servings | |tf.events | +-----------------+ +------------+ +----------+ ^ +-----------------+ + | Environments | +----------------------+ | | +-----------------+ | Submarine Metastore | | Dependencies | |Code | +----------------------+ | | +-----------------+ |Experiment Meta | | Docker Images | +----------------------+ +-----------------+ |Model Store Meta | +----------------------+ |Model Serving Meta | +----------------------+ |Notebook meta | +----------------------+ |Experiment Templates | +----------------------+ |Environments Meta | +----------------------+  First of all, all the notebook-sessions / experiments / model-serving instances) are more or less interact with following storage objects: Logs for these tasks for troubleshooting. ML-related metrics such as loss, epoch, etc. (in contrast of system metrics such as CPU/memory usage, etc.) 
There are different types of ML-related metrics. TensorFlow/PyTorch workloads can use tf.events and get visualizations on TensorBoard, while non-TF/PyTorch workloads can use tracking APIs (such as Submarine tracking, mlflow tracking, etc.) to output customized tracking results. Training jobs of an experiment typically generate model artifacts (files) that need to be persisted, and both notebooks and model serving need to load model artifacts from persistent storage. There are various kinds of meta information, such as experiment meta, model registry, model serving, notebook, experiment, environment, etc. We need to be able to read this meta information back. We also have code for experiments (like training/batch-prediction), notebooks (ipynb), and model serving. Notebooks/experiments/model-serving also depend on environments (dependencies such as pip packages, and Docker images). "},{"title":"Implementation considerations for ML-related objects​","type":1,"pageTitle":"Storage Implementation","url":"docs/next/designDocs/storage-implementation#implementation-considerations-for-ml-related-objects","content":"Object Type\tCharacteristics\tWhere to store Metrics: tf.events\tTime series data with k/v, appendable to file\tLocal/EBS, HDFS, Cloud Blob Storage Metrics: other tracking metrics\tTime series data with k/v, appendable to file\tLocal, HDFS, Cloud Blob Storage, Database Logs\tLarge volumes, #files are potentially huge.\tLocal (temporary), HDFS (need aggregation), Cloud Blob Storage Submarine Metastore\tCRUD operations for small meta data.\tDatabase Model Artifacts\tSize varies for model (from KBs to GBs). #files are potentially huge.\tHDFS, Cloud Blob Storage Code\tNeed version control. (Please find detailed discussions below for code storage and localization)\tTarball on HDFS/Cloud Blob Storage, or Git Environment (Dependencies, Docker Image) Public/private environment repo (like Conda channel), Docker registry. 
"},{"title":"Detailed discussions​","type":1,"pageTitle":"Storage Implementation","url":"docs/next/designDocs/storage-implementation#detailed-discussions","content":"Store code for experiment/notebook/model-serving​ There are the following ways to get experiment code: 1) Code is part of a Git repo: (Recommended) This is our recommended approach. Once code is part of Git, it is stored in version control, every change is tracked, and it is much easier for users to trace back which change triggered a new bug, etc. 2) Code is part of the Docker image: This is an anti-pattern and we do NOT recommend it. A Docker image can include ANYTHING, like dependencies, the code you will execute, or even data, but this doesn't mean you should do it. We recommend using the Docker image ONLY for libraries/dependencies. Making code part of the Docker image makes the code hard to edit (if you want to update a value in your Python file, you will have to recreate the Docker image, push it and rerun it). 3) Code is part of S3/HDFS/ABFS: Users may want to store their training code as a tarball on shared storage. Submarine needs to download the code from remote storage to the launched container before running it. Localization of experiment/notebook/model-serving code​ To keep the user experience the same across different environments, we will localize code to the same folder after the container is launched, preferably /code. For example, suppose a git repo needs to be synced for an experiment/notebook/model-serving: experiment: #Or notebook, model-serving name: &quot;abc&quot;, environment: &quot;team-default-ml-env&quot; ... (other fields) code: sync_mode: git url: &quot;https://foo.com/training-job.git&quot;  After localization, training-job/ will be placed under /code. When running in a K8s environment, we can use K8s's initContainer and emptyDir to do this for us. 
K8s POD spec (generated by Submarine server rather than by the user; users should NEVER edit the K8s spec, which is too unfriendly to data scientists): apiVersion: v1 kind: Pod metadata: name: experiment-abc spec: containers: - name: experiment-task image: training-job volumeMounts: - name: code-dir mountPath: /code initContainers: - name: git-localize image: git-sync command: &quot;git clone .. /code/&quot; volumeMounts: - name: code-dir mountPath: /code volumes: - name: code-dir emptyDir: {}  The above K8s spec creates a code-dir volume and mounts it at /code in the launched containers. The initContainer git-localize uses https://github.com/kubernetes/git-sync to do the sync. (If another storage such as S3 is used, a similar initContainer approach can be used to download the contents.) "},{"title":"System-related metrics/logs and their storages​","type":1,"pageTitle":"Storage Implementation","url":"docs/next/designDocs/storage-implementation#system-related-metricslogs-and-their-storages","content":"Other than ML-related objects, we have system-related objects, including: Daemon logs (like logs of Submarine server). Logs for other dependency components (like Kubernetes logs when running on K8s). System metrics (Physical resource usages by daemons, launched training containers, etc.).  All of this information should be handled by third-party systems such as Grafana, Prometheus, etc. System admins are responsible for setting up this infrastructure and its dashboards. Users of Submarine should NOT interact with system-related metrics/logs; that is the system admin's responsibility. "},{"title":"Attachable Volumes​","type":1,"pageTitle":"Storage Implementation","url":"docs/next/designDocs/storage-implementation#attachable-volumes","content":"Users may need an attachable volume for their experiment / notebook. This is especially useful for notebook storage, since the contents of the notebook can be saved automatically, and the volume can be used as the user's home folder. 
Downside of attachable volume is, it is not versioned, even notebook is mainly used for adhoc exploring tasks, an unversioned notebook file can lead to maintenance issues in the future. Since this is a common requirement, we can consider to support attachable volumes in Submarine in a long run, but with relatively lower priority. "},{"title":"In-scope / Out-of-scope​","type":1,"pageTitle":"Storage Implementation","url":"docs/next/designDocs/storage-implementation#in-scope--out-of-scope","content":"Describe what Submarine project should own and what Submarine project should NOT own. "},{"title":"Submarine Server Implementation","type":0,"sectionRef":"#","url":"docs/next/designDocs/submarine-server/architecture","content":"","keywords":""},{"title":"Architecture Overview​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#architecture-overview","content":" +---------------Submarine Server ---+ | | | +------------+ +------------+ | | |Web Svc/Prxy| |Backend Svc | | +--Submarine Asset + | +------------+ +------------+ | |Project/Notebook | | ^ ^ | |Model/Metrics | +---|---------|---------------------+ |Libraries/Dataset | | | +------------------+ | | | +--|-Compute Cluster 1---+ +--Image Registry--+ + | | | | User's Images | User / | + | | | Admin | User Notebook Instance | +------------------+ | Experiment Runs | +------------------------+ +-Data Storage-----+ | S3/HDFS, etc. | +----Compute Cluster 2---+ | | +------------------+ ...  Here's a diagram to illustrate the Submarine's deployment. Submarine Server consists of web service/proxy, and backend services. They're like &quot;control planes&quot; of Submarine, and users will interact with these services.Submarine server could be a microservice architecture and can be deployed to one of the compute clusters. (see below, this will be useful when we only have one cluster).There're multiple compute clusters that could be used by Submarine service. 
For user's running notebook instance, jobs, etc. they will be placed to one of the compute clusters by user's preference or defined policies.Submarine's asset includes project/notebook(content)/models/metrics/dataset-meta, etc. can be stored inside Submarine's own database.Datasets can be stored in various locations such as S3/HDFS.Users can push container (such as Docker) images to a preconfigured registry in Submarine, so Submarine service can know how to pull required container images.Image Registry/Data-Storage, etc. are outside of Submarine server's scope and should be managed by 3rd party applications. "},{"title":"Submarine Server and its APIs​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#submarine-server-and-its-apis","content":"Submarine server is designed to allow data scientists to access notebooks, submit/manage jobs, manage models, create model training workflows, access datasets, etc. Submarine Server exposed UI and REST API. Users can also use CLI / SDK to manage assets inside Submarine Server.  +----------+ | CLI |+---+ +----------+ v +----------------+ +--------------+ | Submarine | +----------+ | REST API | | | | SDK |+&gt;| |+&gt; Server | +----------+ +--------------+ | | ^ +----------------+ +----------+ | | UI |+---+ +----------+  REST API will be used by the other 3 approaches. (CLI/SDK/UI) The REST API Service handles HTTP requests and is responsible for authentication. It acts as the caller for the JobManager component. The REST component defines the generic job spec which describes the detailed info about job. For more details, refer to here. (Please note that we're converting REST endpoint description from Java-based REST API to swagger definition, once that is done, we should replace the link with swagger definition spec). 
"},{"title":"Proposal​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#proposal","content":" +-----------+ | | | workbench +---+ +----------------------------------+ | | | | +------+ +---------------------+ | +-----------+ | | | | | +-------+ | | +---------------------+ | | | | | | K8s | | | | +--------+ +----+ | +-----------+ | | | | | +-------+ | | | | +--&gt;+job1| | | | | | | | | submitter | | | | | +----+ | | CLI +------&gt;+ | REST | +---------------------+ +----&gt;+ |operator| +----+ | | | | | | | +---------------------+ | | | +--&gt;+job2| | +-----------+ | | | | | +-------+ +-------+ | | | +--------+ +----+ | | | | | | |PlugMgr| |monitor| | | | K8s Cluster | +-----------+ | | | | | +-------+ +-------+ | | +---------------------+ | | | | | | | JobManager | | | SDK +---+ | +------+ +---------------------+ | | | +----------------------------------+ +-----------+ client server  We propose to split the original core module in the old layout into two modules, CLI and server as shown in FIG. The submarine-client calls the REST APIs to submit and retrieve the job info. The submarine-server provides the REST service, job management, submitting the job to cluster, and running job in different clusters through the corresponding runtime. 
"},{"title":"Submarine Server Components​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#submarine-server-components","content":" +----------------------Submarine Server--------------------------------+ | +-----------------+ +------------------+ +--------------------+ | | | Experiment | |Notebook Session | |Environment Mgr | | | | Mgr | |Mgr | | | | | +-----------------+ +------------------+ +--------------------+ | | | | +-----------------+ +------------------+ +--------------------+ | | | Model Registry | |Model Serving Mgr | |Compute Cluster Mgr | | | | | | | | | | | +-----------------+ +------------------+ +--------------------+ | | | | +-----------------+ +------------------+ +--------------------+ | | | DataSet Mgr | |User/Team | |Metadata Mgr | | | | | |Permission Mgr | | | | | +-----------------+ +------------------+ +--------------------+ | +----------------------------------------------------------------------+  "},{"title":"Experiment Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#experiment-manager","content":"TODO "},{"title":"Notebook Sessions Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#notebook-sessions-manager","content":"TODO "},{"title":"Environment Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#environment-manager","content":"TODO "},{"title":"Model Registry​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#model-registry","content":"TODO "},{"title":"Model Serving Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#model-serving-manager","content":"TODO "},{"title":"Compute Cluster 
Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#compute-cluster-manager","content":"TODO "},{"title":"Dataset Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#dataset-manager","content":"TODO "},{"title":"User/team permissions manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#userteam-permissions-manager","content":"TODO "},{"title":"Metadata Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#metadata-manager","content":"TODO "},{"title":"Components/services outside of Submarine Server's scope​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/next/designDocs/submarine-server/architecture#componentsservices-outside-of-submarine-servers-scope","content":"TODO: Describe what are the out-of-scope components, which should be handled and managed outside of Submarine server. Candidates are: Identity management, data storage, metastore storage, etc. "},{"title":"Generic Experiment Spec","type":0,"sectionRef":"#","url":"docs/next/designDocs/submarine-server/experimentSpec","content":"","keywords":""},{"title":"Motivation​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/next/designDocs/submarine-server/experimentSpec#motivation","content":"As the machine learning platform, the submarine should support multiple machine learning frameworks, such as Tensorflow, Pytorch etc. But different framework has different distributed components for the training experiment. So that we designed a generic experiment spec to abstract the training experiment across different frameworks. 
In this way, the submarine-server can hide the complexity of underlying infrastructure differences and provide a cleaner interface to manage experiments. "},{"title":"Proposal​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/next/designDocs/submarine-server/experimentSpec#proposal","content":"Considering the TensorFlow and PyTorch frameworks, we propose one spec that consists of a library spec, a submitter spec, task specs, etc. For example: name: &quot;mnist&quot; librarySpec: name: &quot;TensorFlow&quot; version: &quot;2.1.0&quot; image: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; cmd: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot; envVars: ENV_1: &quot;ENV1&quot; submitterSpec: type: &quot;k8s&quot; namespace: &quot;submarine&quot; taskSpecs: Ps: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot; Worker: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot;  "},{"title":"Library Spec​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/next/designDocs/submarine-server/experimentSpec#library-spec","content":"The library spec describes the machine learning framework. The fields are as follows: field\ttype\toptional\tdescription name\tstring\tNO\tMachine Learning Framework name. Only &quot;tensorflow&quot; and &quot;pytorch&quot; are supported. The value is case-insensitive. version\tstring\tNO\tThe version of ML framework. Such as: 2.1.0 image\tstring\tNO\tThe common image used by every task that does not specify its own. Such as: apache/submarine cmd\tstring\tYES\tThe common entry cmd used by every task that does not specify its own. envVars\tkey/value\tYES\tThe common env vars used by every task that does not specify its own. 
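The field rules in the table above can be checked with a few lines of code. The sketch below is only an illustration in Python, assuming the library spec has already been parsed into a dict; validate_library_spec is a hypothetical helper, not part of Submarine.

```python
# Hypothetical helper: checks a parsed library spec against the table above.
SUPPORTED_FRAMEWORKS = ('tensorflow', 'pytorch')
REQUIRED_FIELDS = ('name', 'version', 'image')


def validate_library_spec(spec):
    '''Return a list of problems found in the spec (empty if it is valid).'''
    problems = []
    for field in REQUIRED_FIELDS:
        if not spec.get(field):
            problems.append('missing required field: ' + field)
    name = spec.get('name')
    # The framework name is compared case-insensitively.
    if name and name.lower() not in SUPPORTED_FRAMEWORKS:
        problems.append('unsupported framework: ' + name)
    return problems


library_spec = {
    'name': 'TensorFlow',
    'version': '2.1.0',
    'image': 'apache/submarine:tf-mnist-with-summaries-1.0',
}
problems = validate_library_spec(library_spec)
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a submitted spec at once.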
"},{"title":"Submitter Spec​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/next/designDocs/submarine-server/experimentSpec#submitter-spec","content":"It describes the info of submitter which the user specified, such as k8s. All the fields as below: field\ttype\toptional\tdescriptiontype\tstring\tNO\tThe submitter type, supports k8s now configPath\tstring\tYES\tThe config path of the specified resource manager. You can set it in submarine-site.xml if run submarine-server locally namespace\tstring\tNO\tIt's known as namespace in Kubernetes. kind\tstring\tYES\tIt's used for k8s submitter, supports TFJob and PyTorchJob apiVersion\tstring\tYES\tIt should pair with the kind, such as the TFJob's api version is kubeflow.org/v1 "},{"title":"Task Spec​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/next/designDocs/submarine-server/experimentSpec#task-spec","content":"It describes the task info, the tasks make up the experiment. So it must be specified when submit the experiment. All the tasks should putted into the key value collection. Such as: taskSpecs: Ps: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot; Worker: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot;  All the fields as below: field\ttype\toptional\tdescriptionname\tstring\tYES\tThe experiment name, if not specify using the library name image\tstring\tYES\tThe experiment docker image cmd\tstring\tYES\tThe entry command for running task envVars\tkey/value\tYES\tThe environment variables for the task resources\tstring\tNO\tThe limit resource for the task. 
Format: cpu=%s,memory=%s,nvidia.com/gpu=%s "},{"title":"Implements​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/next/designDocs/submarine-server/experimentSpec#implements","content":"For more info see SUBMARINE-321 "},{"title":"Security Implementation","type":0,"sectionRef":"#","url":"docs/next/designDocs/wip-designs/security-implementation","content":"","keywords":""},{"title":"Handle User's Credential​","type":1,"pageTitle":"Security Implementation","url":"docs/next/designDocs/wip-designs/security-implementation#handle-users-credential","content":"User credentials include Kerberos keytabs, Docker registry credentials, GitHub SSH keys, etc. User credentials must be stored securely, for example, via Keycloak or K8s Secrets. (More details TODO) "},{"title":"Authentication​","type":1,"pageTitle":"Security Implementation","url":"docs/next/designDocs/wip-designs/security-implementation#authentication","content":"We use pac4j as the secure authentication component of submarine-server. Based on pac4j, we plan to support popular authentication services such as OAuth2/OpenID Connect (OIDC), LDAP, SAML, CAS, etc. and use a token-based method to handle external request services and internal message communication. In the initial version we will first integrate OAuth2/OIDC, LDAP, and a simple login mode that does not rely on other authentication services. There are already some PRs in the community that try to integrate authentication services into Submarine (New SSO function based on OIDC and Create rest api to authenticate user from LDAP). We will try to build on and combine these PRs. 
"},{"title":"Supported authentication types​","type":1,"pageTitle":"Security Implementation","url":"docs/next/designDocs/wip-designs/security-implementation#supported-authentication-types","content":"None​ Once authentication is supported, we will also provide a way to turn it off and call the service directly, so that previous versions of Submarine that do not support authentication can still call the service. Authentication is provided by default in Submarine, but it can be turned off by manually setting submarine.auth.type to none. Simple​ Provides a simple way to authenticate. When a user logs in, the username and password entered are matched against the sys_user table within the system; if they match, a token is generated and returned to the frontend. Every request to a service must carry the token in the request header to confirm the user's identity. Authorization: Bearer &lt;token&gt;  OAuth2​ Supports OAuth2 as a user authentication service, which requires a redirect to a third-party authentication platform for single sign-on when logging in to Submarine. Submarine requires an OAuth2 token as an authentication credential, including the refresh token. If the logged-in user does not exist in Submarine, the user record will be created automatically. OIDC​ OIDC is similar to OAuth2, except that submarine.auth.oidc.discover.uri is required to support OpenID Connect Discovery, where an OpenID server publishes its metadata at a well-known URL, typically https://server.com/.well-known/openid-configuration  This URL returns a JSON listing of the OpenID/OAuth endpoints, supported scopes and claims, public keys used to sign the tokens, and other details. pac4j can use this information to construct a request to the OpenID server. The field names and values are defined in the OpenID Connect Discovery Specification. 
Here is an example of data returned: { &quot;issuer&quot;: &quot;https://example.com/&quot;, &quot;authorization_endpoint&quot;: &quot;https://example.com/authorize&quot;, &quot;token_endpoint&quot;: &quot;https://example.com/token&quot;, &quot;userinfo_endpoint&quot;: &quot;https://example.com/userinfo&quot;, &quot;jwks_uri&quot;: &quot;https://example.com/.well-known/jwks.json&quot;, &quot;scopes_supported&quot;: [ &quot;pets_read&quot;, &quot;pets_write&quot;, &quot;admin&quot; ], &quot;response_types_supported&quot;: [ &quot;code&quot;, &quot;id_token&quot;, &quot;token id_token&quot; ], &quot;token_endpoint_auth_methods_supported&quot;: [ &quot;client_secret_basic&quot; ], ... }  LDAP​ [TODO] SAML​ [TODO] CAS​ [TODO] "},{"title":"Configuration​","type":1,"pageTitle":"Security Implementation","url":"docs/next/designDocs/wip-designs/security-implementation#configuration","content":"Attribute\tDescription\tType\tDefault\tCommentsubmarine.auth.type\tSupported authentication types, currently available are: none, simple, oauth2/oidc, ldap, kerberos, saml, cas\tstring\tnone\tOnly one authentication method can be supported at any one time submarine.auth.token.maxAge\tExpiry time of the token (minute)\tint\t1 day submarine.auth.refreshToken.maxAge\tExpiry time of the refresh token (minute)\tint\t1 hour submarine.cookie.http.only\tHttpOnly Cookie\tboolean\tfalse submarine.cookie.secure\tSecure Cookie\tboolean\tfalse submarine.cookie.samesite\tSameSite Cookie, can be Lax, Strict, None(or empty)\tstring https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie/SameSite submarine.auth.oauth2.client.id\tOAuth2 client id\tstring submarine.auth.oauth2.client.secret\tOAuth2 client secret\tstring submarine.auth.oauth2.client.flows\tOAuth2 flows, can be: authorizationCode, implicit, password or clientCredentials\tstring submarine.auth.oauth2.scopes\tThe available scopes for the OAuth2 security scheme. 
A map between the scope name and a short description for it.\tstring submarine.auth.oauth2.token.uri\tOAuth2 access token uri\tstring submarine.auth.oauth2.refresh.uri\tOAuth2 refresh token uri\tstring submarine.auth.oauth2.authorization.uri\tOAuth2 authorization uri\tstring submarine.auth.oauth2.logout.uri\tOAuth2 logout uri\tstring submarine.auth.oidc.client.id\tOIDC client id\tstring submarine.auth.oidc.client.secret\tOIDC client Secret\tstring submarine.auth.oidc.discover.uri\tOIDC discovery uri\tstring submarine.auth.ladp.provider.uri\tLDAP provider uri\tstring submarine.auth.ladp.baseDn\tLDAP base DN\tstring base DN is the base LDAP distinguished name for your LDAP server. For example, ou=dev,dc=xyz,dc=com submarine.auth.ladp.domain\tLDAP AD domain\tstring AD domain is the domain name of the AD server. For example, corp.domain.com "},{"title":"Design and implementation​","type":1,"pageTitle":"Security Implementation","url":"docs/next/designDocs/wip-designs/security-implementation#design-and-implementation","content":"We use javax.servlet.Filter in the server to determine if authentication information exists for a user. The Filter is implemented for each authentication type and is configured according to the implementation of the type specified by pac4j. Also, a SecurityFactory class is provided that instantiates the specified Filter class into Jetty's filter based on submarine.auth.type. Except in the case of submarine.auth.type being none, and some APIs necessary for authentication (login requests, etc.), we will require the token to be included in the header. The token is generated and verified based on pac4j and processed inside the Filter class, incorrect token or no token will return a 401 HTTP code. When a token expires, it can be regenerated by calling the refresh token method. The default token expiry time is now set to 1 day (by modifying submarine.auth.token.maxAge) and the refresh token expiry time is 1 hour. 
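The decision the Filter makes for each request can be sketched as follows. This is only an illustration in Python, not the Java servlet Filter itself: check_request is a hypothetical function, and is_valid_token is a stand-in callable for the pac4j-based verification, whose real API is not shown here.

```python
# Hypothetical sketch of the per-request decision described above.
def check_request(auth_type, headers, is_valid_token, is_login_request=False):
    '''Return the HTTP status code the filter would produce for a request.'''
    # No token is required when auth is off or for login-style requests.
    if auth_type == 'none' or is_login_request:
        return 200
    value = headers.get('Authorization', '')
    parts = value.split(' ', 1)
    # Expect exactly: Authorization: Bearer TOKEN
    if len(parts) != 2 or parts[0] != 'Bearer' or not parts[1]:
        return 401
    # is_valid_token stands in for the real token verification.
    return 200 if is_valid_token(parts[1]) else 401


def demo_verifier(token):
    # Toy verifier for the example only.
    return token == 'good-token'


status_ok = check_request('simple', {'Authorization': 'Bearer good-token'}, demo_verifier)
status_missing = check_request('simple', {}, demo_verifier)
```

Passing the verifier in as a callable keeps the header parsing testable on its own, independent of whichever authentication type is configured.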
"},{"title":"Users​","type":1,"pageTitle":"Security Implementation","url":"docs/next/designDocs/wip-designs/security-implementation#users","content":"Describe the design of relevant user tables, user registration/modification/deletion processes, and the processing logic associated with authenticated login (including the mapping of attributes for automatically registered users when integrating with other authentication platforms, etc.). We use the sys_user table to store Submarine user information. When submarine.auth.type is simple, the user's login operation will match user_name and password (encrypted) in sys_user. Only when the user name and password match will the login succeed. When submarine.auth.type is ldap, the user's login operation will send a request to the LDAP server and verify that the username and password are correct. A new record will be added to the sys_user table if the logged-in user does not exist. When logging in using other third-party authentication (OAuth2/OpenID Connect (OIDC), SAML, CAS, etc.), the login page will automatically jump to the third-party service and redirect back to Submarine after a successful login. A new record will be added to the sys_user table if the logged-in user does not exist. Department​ [TODO] Role​ [TODO] "},{"title":"RBAC​","type":1,"pageTitle":"Security Implementation","url":"docs/next/designDocs/wip-designs/security-implementation#rbac","content":"[TODO] "},{"title":"Submarine Launcher","type":0,"sectionRef":"#","url":"docs/next/designDocs/wip-designs/submarine-launcher","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#introduction","content":"Submarine is built and run as a cloud-native application to take full advantage of the cloud computing model. These applications are characterized by rapid and frequent build, release, and deployment. 
Combined with the features of cloud computing, they are decoupled from the underlying hardware and operating system, can easily meet the requirements of scalability, availability, and portability, and are more cost-effective. In the enterprise data center, submarine can support resource scheduling systems such as k8s and docker; in the public cloud environment, submarine can support cloud services such as GCE/AWS/Azure. "},{"title":"Requirement​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#requirement","content":""},{"title":"Cloud-Native Service​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#cloud-native-service","content":"The submarine server is a long-running service in daemon mode. The submarine server is mainly used by algorithm engineers to provide online front-end functions such as algorithm development, algorithm debugging, data processing, and workflow scheduling. The submarine server is also used for back-end functions such as scheduling and execution of jobs, tracking of job status, and so on. Rolling upgrades help us keep the system stable. For example, we can upgrade or restart the workbench server without affecting the normal operation of submitted jobs. You can also make full use of system resources. For example, when the number of current developers or job tasks increases, the number of submarine server instances can be adjusted dynamically. In addition, submarine will provide each user with a completely independent workspace container. This workspace container comes with the development tools and library files commonly used by algorithm engineers, including their operating environment, already deployed. Algorithm engineers can work in our prepared workspaces without any extra work. Each user's workspace can also be run through a cloud service. 
"},{"title":"Service discovery​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#service-discovery","content":"With the cluster function of submarine, each service only needs to run in the container, and it will automatically register the service in the submarine cluster center. Submarine cluster management will automatically maintain the relationship between service and service, service and user. "},{"title":"Design​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#design","content":" "},{"title":"Launcher​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#launcher","content":"The submarine launcher module defines the complete interface. By using this interface, you can run the submarine server, and workspace in k8s / docker / Rancher / OpenShift / AWS / GCE / Azure. "},{"title":"Launcher On Docker​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#launcher-on-docker","content":"In order to allow some small and medium-sized users without k8s to use submarine, we support running the submarine system in docker mode. Users only need to provide several servers with docker runtime environment. The submarine system can automatically cluster these servers into clusters, manage all the hardware resources of the cluster, and run the service or workspace container in this cluster through scheduling algorithms. "},{"title":"Launcher On Kubernetes​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#launcher-on-kubernetes","content":"submarine operator "},{"title":"Launcher On Rancher​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#launcher-on-rancher","content":"This section is currently described based on the Rancher Desktop. 
Since we have replaced Traefik with Istio from 0.8.0, we need to turn off Traefik in the Kubernetes Settings. At the same time, we need to set the kubernetes version to 1.21+, the minimum CPUs to 4, and the minimum Memory to 8G. Rancher Desktop uses Local Path Provisioner as the provisioner for StorageClass by default, so we need to modify the relevant configuration of StorageClass when using Helm to install Submarine. storageClass: volumeBindingMode: WaitForFirstConsumer provisioner: rancher.io/local-path  For other installation steps, please refer to Launch submarine in the cluster. In addition, we can use kube-explorer to open Rancher Dashboard. "},{"title":"Launcher On OpenShift​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#launcher-on-openshift","content":"[TODO] "},{"title":"Launcher On AWS​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#launcher-on-aws","content":"[TODO] "},{"title":"Launcher On GCP​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#launcher-on-gcp","content":"[TODO] "},{"title":"Launcher On Azure​","type":1,"pageTitle":"Submarine Launcher","url":"docs/next/designDocs/wip-designs/submarine-launcher#launcher-on-azure","content":"[TODO] "},{"title":"Project Architecture","type":0,"sectionRef":"#","url":"docs/next/devDocs/","content":"","keywords":""},{"title":"1. Introduction​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#1-introduction","content":"This document describes the structure of each module of the Submarine project, along with how each module is developed and tested. "},{"title":"2. Submarine Project Structure​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#2-submarine-project-structure","content":""},{"title":"2.1. 
submarine-client​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#21-submarine-client","content":"Provide the CLI interface for submarine user. (Currently only support YARN service (deprecated)) "},{"title":"2.2. submarine-cloud-v2​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#22-submarine-cloud-v2","content":"The operator for Submarine application. For details, please see the README on github. "},{"title":"2.3. submarine-commons​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#23-submarine-commons","content":"Define utility function used in multiple packages, mainly related to hadoop. "},{"title":"2.4. submarine-dist​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#24-submarine-dist","content":"Store the pre-release files. "},{"title":"2.5. submarine-sdk​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#25-submarine-sdk","content":"Provide Python SDK for submarine user. "},{"title":"2.6. submarine-server​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#26-submarine-server","content":"Include core server, restful api, and k8s submitter. "},{"title":"2.7. submarine-test​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#27-submarine-test","content":"Provide end-to-end and k8s test for submarine. "},{"title":"2.8. submarine-workbench​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#28-submarine-workbench","content":"workbench-server: is a Jetty-based web server service. Workbench-server provides RESTful interface and Websocket interface. The RESTful interface provides workbench-web with management capabilities for databases such as project, department, user, and role.workbench-web: is a web front-end service based on Angular.js framework. With workbench-web users can manage Submarine project, department, user, role through browser. 
You can also use the notebook to develop machine learning algorithms and to manage model release and other lifecycle tasks. "},{"title":"2.9 dev-support​","type":1,"pageTitle":"Project Architecture","url":"docs/next/devDocs/#29-dev-support","content":"mini-submarine: by using the docker image provided by Submarine, you can experience all the functions of Submarine in a single docker environment. mini-submarine also provides developers with a development and testing environment, avoiding the hassle of installing and deploying the runtime environment.submarine-installer: submarine-installer is our submarine runtime environment installation tool for yarn-3.1+. By using submarine-installer, it is easy to install and deploy system services such as docker, nvidia-docker, nvidia driver, ETCD, Calico network etc. required by yarn-3.1+. "},{"title":"Dependencies for Submarine","type":0,"sectionRef":"#","url":"docs/next/devDocs/Dependencies","content":"","keywords":""},{"title":"Kubernetes​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#kubernetes","content":"Kubernetes Version\tSupport?1.18.x (or earlier)\tX 1.19.x - 1.21.x\t√ 1.22.x (or later)\tX "},{"title":"KinD​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#kind","content":"KinD Version\tSupport?0.5.x (or earlier)\tX 0.6.x - 0.17.x\t√ "},{"title":"Java​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#java","content":"JDK Version\tSupport?8\t√ 11\t√ 17\tX "},{"title":"Maven​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#maven","content":"3.3 or later ( &lt; 3.8.1 ) "},{"title":"Docker​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#docker","content":"Latest "},{"title":"Helm​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#helm","content":"Version 3 
"},{"title":"NodeJS​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#nodejs","content":"14 (or later) "},{"title":"Go​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#go","content":"Go Version\tSupport?1.15\tX 1.16\t√ 1.17\t√ 1.18 (or later)\tTo be verified "},{"title":"Python​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/next/devDocs/Dependencies#python","content":"Python Version\tSupport?3.6 (or earlier)\tX 3.7\t√ 3.8\t√ 3.9\t√ 3.10\t√ "},{"title":"How to Build Submarine","type":0,"sectionRef":"#","url":"docs/next/devDocs/BuildFromCode","content":"","keywords":""},{"title":"Prerequisites​","type":1,"pageTitle":"How to Build Submarine","url":"docs/next/devDocs/BuildFromCode#prerequisites","content":"JDK 1.8Maven 3.3 or later ( &lt; 3.8.1 )Docker "},{"title":"Quick Start​","type":1,"pageTitle":"How to Build Submarine","url":"docs/next/devDocs/BuildFromCode#quick-start","content":""},{"title":"Build Your Custom Submarine Docker Images​","type":1,"pageTitle":"How to Build Submarine","url":"docs/next/devDocs/BuildFromCode#build-your-custom-submarine-docker-images","content":"Submarine provides default Docker image in the release artifacts, sometimes you would like to do some modifications on the images. You can rebuild Docker image after you make changes. Note that you need to make sure the images built above can be accessed in k8s Usually this needs to rename and push to a proper Docker registry. 
mvn clean package -DskipTests  Build submarine server image: ./dev-support/docker-images/submarine/build.sh  Build submarine database image: ./dev-support/docker-images/database/build.sh  "},{"title":"Checking releases for licenses​","type":1,"pageTitle":"How to Build Submarine","url":"docs/next/devDocs/BuildFromCode#checking-releases-for-licenses","content":"mvn clean org.apache.rat:apache-rat-plugin:check  "},{"title":"Building source code / binary distribution with Maven Wrapper​","type":1,"pageTitle":"How to Build Submarine","url":"docs/next/devDocs/BuildFromCode#building-source-code--binary-distribution-with-maven-wrapper","content":"Maven Wrapper (Optional): Maven Wrapper can help you avoid dependencies problem about Maven version. # Setup Maven Wrapper (Maven 3.6.1) mvn -N io.takari:maven:0.7.7:wrapper -Dmaven=3.6.1 # Check Maven Wrapper ./mvnw -version # Replace 'mvn' with 'mvnw'. Example: ./mvnw clean package -DskipTests  "},{"title":"Development Guide","type":0,"sectionRef":"#","url":"docs/next/devDocs/Development","content":"","keywords":""},{"title":"Video​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#video","content":"From this Video, you will know how to deal with the configuration of Submarine and be able to contribute to it via Github. "},{"title":"Develop server​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#develop-server","content":""},{"title":"Prerequisites​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#prerequisites","content":"JDK 1.8Maven 3.3 or later ( &lt; 3.8.1 )Docker "},{"title":"Setting up checkstyle in IDE​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#setting-up-checkstyle-in-ide","content":"Checkstyle plugin may help to detect violations directly from the IDE. 
Install Checkstyle+IDEA plugin from Preference -&gt; PluginsOpen Preference -&gt; Tools -&gt; Checkstyle Set Checkstyle version: Checkstyle version: 8.0 Add (+) a new Configuration File Description: SubmarineUse a local checkstyle ${SUBMARINE_HOME}/dev-support/maven-config/checkstyle.xml Open the Checkstyle Tool Window, select the Submarine rule and execute the check "},{"title":"Testing​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#testing","content":"Unit Test For each class, there is a corresponding test class. For example, SubmarineServerTest is used for testing SubmarineServer. Whenever you add a function to a class, you must write a unit test for it. Integration Test: IntegrationTestK8s.md "},{"title":"Build from source​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#build-from-source","content":"Before building We assume the developer uses minikube as a local kubernetes cluster.Make sure you have installed the submarine helm-chart in the cluster. Package the Submarine server into a new jar file mvn install -DskipTests Build the new server docker image in minikube # switch to minikube docker daemon to build image directly in minikube eval $(minikube docker-env) # run docker build ./dev-support/docker-images/submarine/build.sh # exit minikube docker daemon eval $(minikube docker-env -u) Delete the server deployment and the operator will create a new one using the new image kubectl delete deployment submarine-server -n submarine-user-test  "},{"title":"Develop workbench​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#develop-workbench","content":"Deploy the Submarine Follow Getting Started/Quickstart, and make sure you can connect to http://localhost:32080 in the browser. 
Install the dependencies cd submarine-workbench/workbench-web npm install Run the workbench based on proxy server npm run start The request sent to http://localhost:4200 will be redirected to http://localhost:32080.Open http://localhost:4200 in browser to see the real-time change of workbench. Frontend E2E test: IntegrationTestE2E.md "},{"title":"Develop database​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#develop-database","content":"Build the docker image # switch to minikube docker daemon to build image directly in minikube eval $(minikube docker-env) # run docker build ./dev-support/docker-images/database/build.sh # exit minikube docker daemon eval $(minikube docker-env -u) Deploy new pods in the cluster helm upgrade --set submarine.database.dev=true submarine ./helm-charts/submarine  "},{"title":"Develop operator​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#develop-operator","content":"For details, please check out the README and Developer Guide on GitHub. "},{"title":"Develop Submarine Website​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#develop-submarine-website","content":"Submarine website is built using Docusaurus 2, a modern static website generator. We store all the website content in markdown format in the submarine/website/docs. When committing a new patch to the submarine repo, Docusaurus will help us generate the html and javascript files and push them to https://github.com/apache/submarine-site/tree/asf-site. To update the website, click “Edit this page” on the website.  "},{"title":"Add a new page​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#add-a-new-page","content":"If you want to add a new page to the website, make sure to add the file path to sidebars.js. 
"},{"title":"Installation​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#installation","content":"We use the yarn package manager to install all dependencies for the website yarn install  "},{"title":"Build​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#build","content":"Make sure you can successfully build the website before creating a pull request. yarn build  "},{"title":"Local Development​","type":1,"pageTitle":"Development Guide","url":"docs/next/devDocs/Development#local-development","content":"This command starts a local development server and open up a browser window. Most changes are reflected live without having to restart the server. yarn start  "},{"title":"How to Verify","type":0,"sectionRef":"#","url":"docs/next/devDocs/HowToVerify","content":"","keywords":""},{"title":"Verification of the release candidate​","type":1,"pageTitle":"How to Verify","url":"docs/next/devDocs/HowToVerify#verification-of-the-release-candidate","content":""},{"title":"1. Download the candidate version to be released to the local environment​","type":1,"pageTitle":"How to Verify","url":"docs/next/devDocs/HowToVerify#1-download-the-candidate-version-to-be-released-to-the-local-environment","content":"svn co https://dist.apache.org/repos/dist/dev/submarine/${release_version}-${rc_version}/  "},{"title":"2. Verify whether the uploaded version is compliant​","type":1,"pageTitle":"How to Verify","url":"docs/next/devDocs/HowToVerify#2-verify-whether-the-uploaded-version-is-compliant","content":"Begin the verification process, which includes but is not limited to the following content and forms. "},{"title":"2.1 Check if the release package is complete​","type":1,"pageTitle":"How to Verify","url":"docs/next/devDocs/HowToVerify#21-check-if-the-release-package-is-complete","content":"The package uploaded to dist must include the source code package, and the binary package is optional. 
Whether it includes the source code package.Whether it includes the signature of the source code package.Whether it includes the sha512 of the source code package.If the binary package is uploaded, also check the contents listed in (2)-(4). "},{"title":"2.2 Check gpg signature​","type":1,"pageTitle":"How to Verify","url":"docs/next/devDocs/HowToVerify#22-check-gpg-signature","content":"Import the public key curl https://dist.apache.org/repos/dist/dev/submarine/KEYS &gt; KEYS # Download KEYS gpg --import KEYS # Import KEYS to local  Trust the public key Trust the KEY used in this version.  gpg --edit-key xxxxxxxxxx # The KEY used in this version gpg (GnuPG) 2.2.21; Copyright (C) 2020 Free Software Foundation, Inc. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Secret key is available. sec rsa4096/5EF3A66D57EC647A created: 2020-05-19 expires: never usage: SC trust: ultimate validity: ultimate ssb rsa4096/17628566FEED6AF7 created: 2020-05-19 expires: never usage: E [ultimate] (1). XXX YYYZZZ &lt;yourAccount@apache.org&gt; gpg&gt; trust sec rsa4096/5EF3A66D57EC647A created: 2020-05-19 expires: never usage: SC trust: ultimate validity: ultimate ssb rsa4096/17628566FEED6AF7 created: 2020-05-19 expires: never usage: E [ultimate] (1). XXX YYYZZZ &lt;yourAccount@apache.org&gt; Please decide how far you trust this user to correctly verify other users' keys (by looking at passports, checking fingerprints from different sources, etc.) 1 = I don't know or won't say 2 = I do NOT trust 3 = I trust marginally 4 = I trust fully 5 = I trust ultimately m = back to the main menu Your decision? 5 #choose 5 Do you really want to set this key to ultimate trust? (y/N) y # choose y sec rsa4096/5EF3A66D57EC647A created: 2020-05-19 expires: never usage: SC trust: ultimate validity: ultimate ssb rsa4096/17628566FEED6AF7 created: 2020-05-19 expires: never usage: E [ultimate] (1). 
XXX YYYZZZ &lt;yourAccount@apache.org&gt; gpg&gt; sec rsa4096/5EF3A66D57EC647A created: 2020-05-19 expires: never usage: SC trust: ultimate validity: ultimate ssb rsa4096/17628566FEED6AF7 created: 2020-05-19 expires: never usage: E [ultimate] (1). XXX YYYZZZ &lt;yourAccount@apache.org&gt;  Use the following command to check the signature. for i in *.tar.gz; do echo $i; gpg --verify $i.asc $i ; done # Or gpg --verify apache-submarine-${release_version}-src.tar.gz.asc apache-submarine-${release_version}-src.tar.gz # If you upload a binary package, you also need to check whether the signature of the binary package is correct. gpg --verify apache-submarine-server-${release_version}-bin.tar.gz.asc apache-submarine-server-${release_version}-bin.tar.gz gpg --verify apache-submarine-client-${release_version}-bin.tar.gz.asc apache-submarine-client-${release_version}-bin.tar.gz  Check the result If something like the following appears, it means that the signature is correct. Look for the keyword: Good signature apache-submarine-${release_version}-src.tar.gz gpg: Signature made Sat May 30 11:45:01 2020 CST gpg: using RSA key 9B12C2228BDFF4F4CFE849445EF3A66D57EC647A gpg: Good signature from &quot;XXX YYYZZZ &lt;yourAccount@apache.org&gt;&quot; [ultimate]  "},{"title":"2.3 Check sha512 hash​","type":1,"pageTitle":"How to Verify","url":"docs/next/devDocs/HowToVerify#23-check-sha512-hash","content":"After calculating the sha512 hash locally, verify whether it is consistent with the one on dist. for i in *.tar.gz; do echo $i; gpg --print-md SHA512 $i; done # Or gpg --print-md SHA512 apache-submarine-${release_version}-src.tar.gz # If you upload a binary package, you also need to check the sha512 hash of the binary package. gpg --print-md SHA512 apache-submarine-server-${release_version}-bin.tar.gz gpg --print-md SHA512 apache-submarine-client-${release_version}-bin.tar.gz # Or for i in *.tar.gz.sha512; do echo $i; sha512sum -c $i; done  "},{"title":"2.4. 
Check the file content of the source package.​","type":1,"pageTitle":"How to Verify","url":"docs/next/devDocs/HowToVerify#24-check-the-file-content-of-the-source-package","content":"Unzip apache-submarine-${release_version}-src.tar.gz and check as follows: Whether the DISCLAIMER file exists and whether the content is correct.Whether the LICENSE and NOTICE files exist and whether the content is correct.Whether all files have the ASF License header.Whether the source code can be compiled normally.Whether the unit tests pass..... "},{"title":"2.5 Check the binary package (if the binary package is uploaded)​","type":1,"pageTitle":"How to Verify","url":"docs/next/devDocs/HowToVerify#25-check-the-binary-package-if-the-binary-package-is-uploaded","content":"Unzip apache-submarine-client-${release_version}-bin.tar.gz and apache-submarine-server-${release_version}-bin.tar.gz, then check as follows: Whether the DISCLAIMER file exists and whether the content is correct.Whether the LICENSE and NOTICE files exist and whether the content is correct.Whether the deployment is successful.Deploy a test environment to verify whether production and consumption can run normally.Verify what you think might go wrong. "},{"title":"How to Run Frontend Integration Test","type":0,"sectionRef":"#","url":"docs/next/devDocs/IntegrationTestE2E","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/next/devDocs/IntegrationTestE2E#introduction","content":"The test cases under the directory test-e2e are integration tests to ensure the correctness of the Submarine Workbench. These test cases can be run either locally or on GitHub Actions. "},{"title":"Run E2E test locally​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/next/devDocs/IntegrationTestE2E#run-e2e-test-locally","content":"Ensure you have set up Submarine locally. If not, you can refer to Quickstart. 
Forward port kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80 Modify run_frontend_e2e.sh You need to modify the port and the URL in this script to match where the workbench is running. Example: If your Submarine workbench is running on 127.0.0.1:4200, you should modify the WORKBENCH_PORT to 4200. # at submarine-test/test-e2e/run_frontend_e2e.sh ... # ======= Modifiable Variables ======= # # Note: URL must start with &quot;http&quot; # (Ref: https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/WebDriver.html#get(java.lang.String)) WORKBENCH_PORT=8080 #&lt;= modify this URL=&quot;http://127.0.0.1&quot; #&lt;=modify this # ==================================== # ... Run run_frontend_e2e.sh (Run a specific test case) This script will check whether the port can be accessed or not, and run the test case. # at submarine-test/test-e2e ./run_frontend_e2e.sh ${TESTCASE} # TESTCASE is the IT you want to run, e.g. loginIT, experimentIT... Run all test cases The following commands compile all files and run every file whose name ends with &quot;IT&quot; in the directory. # Make sure the Submarine workbench is running on 127.0.0.1:8080 cd submarine/submarine-test/test-e2e # Method 1: mvn verify # Method 2: mvn clean install -U  "},{"title":"Run E2E test in GitHub Actions​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/next/devDocs/IntegrationTestE2E#run-e2e-test-in-github-actions","content":"Each time a commit is pushed, GitHub Actions will be triggered automatically. "},{"title":"Add a new frontend E2E test case​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/next/devDocs/IntegrationTestE2E#add-a-new-frontend-e2e-test-case","content":"WARNING You MUST read the document carefully, and understand the difference between explicit wait, implicit wait, and fluent wait.Do not mix implicit and explicit waits. Doing so can cause unpredictable wait times. We define many useful functions in AbstractSubmarineIT.java. 
"},{"title":"How to Release","type":0,"sectionRef":"#","url":"docs/next/devDocs/HowToRelease","content":"","keywords":""},{"title":"0. Preface​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#0-preface","content":"Source Release is the focus of Apache’s attention and it is also a required content for release. Binary Release is optional, Submarine can choose whether to release the binary package to the Apache warehouse or to the Maven central warehouse. Please refer to the following link to find more details about release guidelines: How to Release Submarine Release Guidelines "},{"title":"1. Add GPG KEY​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#1-add-gpg-key","content":"Main references in this chapter:https://infra.apache.org/openpgp.html &gt; This chapter is only needed for the first release manager of the project. "},{"title":"1.1 Install gpg​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#11-install-gpg","content":"Detailed installation documents can refer to tutorial, The environment configuration of Mac OS is as follows: $ brew install gpg $ gpg --version #Check the version，should be 2.x  "},{"title":"1.2 generate gpg Key​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#12-generate-gpg-key","content":"Need to pay attention to the following points：​ When entering the name, it is better to be consistent with the Full name registered in ApacheThe mailbox used should be apache mailboxIt’s better to use pinyin or English for the name, otherwise there will be garbled characters Follow the hint，generate a key​ ➜ ~ gpg --full-gen-key gpg (GnuPG) 2.2.20; Copyright (C) 2020 Free Software Foundation, Inc. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. 
Please select what kind of key you want: (1) RSA and RSA (default) (2) DSA and Elgamal (3) DSA (sign only) (4) RSA (sign only) (14) Existing key from card Your selection? 1 # enter 1 here RSA keys may be between 1024 and 4096 bits long. What keysize do you want? (2048) 4096 # enter 4096 here Requested keysize is 4096 bits Please specify how long the key should be valid. 0 = key does not expire &lt;n&gt; = key expires in n days &lt;n&gt;w = key expires in n weeks &lt;n&gt;m = key expires in n months &lt;n&gt;y = key expires in n years Key is valid for? (0) 0 # enter 0 here Key does not expire at all Is this correct? (y/N) y # enter y here GnuPG needs to construct a user ID to identify your key. Real name: Guangxu Cheng # enter your name here Email address: gxcheng@apache.org # enter your mailbox here Comment: # enter some comment here (Optional) You selected this USER-ID: &quot;Guangxu Cheng &lt;gxcheng@apache.org&gt;&quot; Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O #enter O here We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy. We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy. # A dialog box will pop up, asking you to enter the key for this gpg. ┌──────────────────────────────────────────────────────┐ │ Please enter this passphrase │ │ │ │ Passphrase: _______________________________ │ │ │ │ &lt;OK&gt; &lt;Cancel&gt; │ └──────────────────────────────────────────────────────┘ # After entering the secret key, it will be created. And it will output the following information. 
gpg: key 2DD587E7B10F3B1F marked as ultimately trusted gpg: revocation certificate stored as '/Users/cheng/.gnupg/openpgp-revocs.d/41936314E25F402D5F7D73152DD587E7B10F3B1F.rev' public and secret key created and signed. pub rsa4096 2020-05-19 [SC] 41936314E25F402D5F7D73152DD587E7B10F3B1F uid Guangxu Cheng &lt;gxcheng@apache.org&gt; sub rsa4096 2020-05-19 [E]  "},{"title":"1.3 Upload the generated key to the public server​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#13-upload-the-generated-key-to-the-public-server","content":"➜ ~ gpg --list-keys ------------------------------- pub rsa4096 2020-05-18 [SC] 5931F8CFD04B37A325E4465D8C0D31C4149B3A87 uid [ultimate] Guangxu Cheng &lt;gxcheng@apache.org&gt; sub rsa4096 2020-05-18 [E] # Send the public key to a keyserver by key id $ gpg --keyserver pgpkeys.mit.edu --send-key &lt;key id&gt; # pgpkeys.mit.edu is an arbitrarily chosen keyserver. The keyserver list is at https://sks-keyservers.net/status/; the servers synchronize with each other automatically, so you can choose any one.  "},{"title":"1.4 Check whether the key is created successfully​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#14-check-whether-the-key-is-created-successfully","content":"Search for your email address on the keyserver to check whether the upload succeeded. It takes about a minute for the key to become searchable. When searching on http://keys.gnupg.net, tick the show full-key hashes option under the advanced settings. 
The query results are as follows: "},{"title":"1.5 Add your gpg public key to the KEYS file​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#15-add-your-gpg-public-key-to-the-keys-file","content":"SVN is required for this step. The SVN repository for the dev branch is https://dist.apache.org/repos/dist/dev/submarine The SVN repository for the release branch is https://dist.apache.org/repos/dist/release/submarine 1.5.1 Add the public key to KEYS in the dev branch to release the RC version​ ➜ ~ svn co https://dist.apache.org/repos/dist/dev/submarine /tmp/submarine-dist-dev # This step is relatively slow, because all versions are checked out. If the network connection drops, run svn cleanup to remove the lock and re-run the command; the transfer will resume. ➜ ~ cd /tmp/submarine-dist-dev ➜ submarine-dist-dev ~ (gpg --list-sigs YOUR_NAME@apache.org &amp;&amp; gpg --export --armor YOUR_NAME@apache.org) &gt;&gt; KEYS # Append the key you generated to the KEYS file; it is best to check that the file is correct after appending. ➜ submarine-dist-dev ~ svn add . # Not needed if the KEYS file already exists. ➜ submarine-dist-dev ~ svn ci -m &quot;add gpg key for YOUR_NAME&quot; # You will be asked for a username and password; use your Apache username and password.  1.5.2 Add the public key to KEYS in the release branch to release the official version​ ➜ ~ svn co https://dist.apache.org/repos/dist/release/submarine /tmp/submarine-dist-release ➜ ~ cd /tmp/submarine-dist-release ➜ submarine-dist-release ~ (gpg --list-sigs YOUR_NAME@apache.org &amp;&amp; gpg --export --armor YOUR_NAME@apache.org) &gt;&gt; KEYS # Append the key you generated to the KEYS file; it is best to check that the file is correct after appending. ➜ submarine-dist-release ~ svn add . # Not needed if the KEYS file already exists. 
➜ submarine-dist-release ~ svn ci -m &quot;add gpg key for YOUR_NAME&quot; # Next, you will be asked to enter a username and password, just use your apache username and password.  "},{"title":"1.6 Upload GPG public key to Github account​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#16-upload-gpg-public-key-to-github-account","content":"Go to https://github.com/settings/keys and add GPG KEYS.If you find &quot;unverified&quot; is written after the key after adding it, remember to bind the mailbox used in the GPG key to your github account (https://github.com/settings/emails). "},{"title":"2. Set maven settings​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#2-set-maven-settings","content":"Skip if it has already been set In the maven configuration file ~/.m2/settings.xml, add the following &lt;server&gt; item &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; &lt;settings xsi:schemaLocation=&quot;http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd&quot; xmlns=&quot;http://maven.apache.org/SETTINGS/1.1.0&quot; xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;&gt; &lt;servers&gt; &lt;!-- Apache Repo Settings --&gt; &lt;server&gt; &lt;id&gt;apache.snapshots.https&lt;/id&gt; &lt;username&gt;{user-id}&lt;/username&gt; &lt;password&gt;{user-pass}&lt;/password&gt; &lt;/server&gt; &lt;server&gt; &lt;id&gt;apache.releases.https&lt;/id&gt; &lt;username&gt;{user-id}&lt;/username&gt; &lt;password&gt;{user-pass}&lt;/password&gt; &lt;/server&gt; &lt;/servers&gt; &lt;profiles&gt; &lt;profile&gt; &lt;id&gt;apache-release&lt;/id&gt; &lt;properties&gt; &lt;gpg.keyname&gt;Your KEYID&lt;/gpg.keyname&gt;&lt;!-- Your GPG Keyname here --&gt; &lt;!-- Use an agent: Prevents being asked for the password during the build --&gt; &lt;gpg.useagent&gt;true&lt;/gpg.useagent&gt; &lt;gpg.passphrase&gt;Your password of the private key&lt;/gpg.passphrase&gt; &lt;/properties&gt; 
&lt;/profile&gt; &lt;/profiles&gt; &lt;/settings&gt;  "},{"title":"3. Compile and package​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#3-compile-and-package","content":""},{"title":"3.1 Prepare a branch​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#31-prepare-a-branch","content":"Create a new branch from the main branch to serve as the release branch, release-${release_version}. Update CHANGES.md. Check that the code is in good shape: it compiles, all unit tests pass, the RAT check passes, etc. # build check $ mvn clean package -Dmaven.javadoc.skip=true # RAT check $ mvn apache-rat:check Change the version number. "},{"title":"3.2 Create the tag​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#32-create-the-tag","content":"Before creating the tag, make sure that the code has been checked for errors: it compiles, all unit tests pass, the RAT check passes, etc. Create a signed tag $ git_tag=${release_version}-${rc_version} $ git tag -s $git_tag -m &quot;Tagging the ${release_version} first Release Candidate (Candidates start at zero)&quot; # If an error such as gpg: signing failed: secret key not available occurs, set the signing key first. $ git config user.signingkey ${KEY_ID}  "},{"title":"3.3 Package the source code​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#33-package-the-source-code","content":"After the tag is created, package the tagged source code into a tar.gz archive. 
mkdir /tmp/apache-submarine-${release_version}-${rc_version} git archive --format=tar.gz --output=&quot;/tmp/apache-submarine-${release_version}-${rc_version}/apache-submarine-${release_version}-src.tar.gz&quot; --prefix=&quot;apache-submarine-${release_version}/&quot; $git_tag  "},{"title":"3.4 Packaged binary package​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#34-packaged-binary-package","content":"Compile the source code packaged in the previous step cd /tmp/apache-submarine-${release_version}-${rc_version} # Enter the source package directory. tar xzvf apache-submarine-${release_version}-src.tar.gz # Unzip the source package. cd apache-submarine-${release_version} # Enter the source directory. mvn compile clean install package -DskipTests # Compile. cp ./submarine-distribution/target/apache-submarine-${release_version}-bin.tar.gz /tmp/apache-submarine-${release_version}-${rc_version}/ # Copy the binary package to the source package directory to facilitate signing the package in the next step.  "},{"title":"3.5 Sign the source package/binary package/sha512​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#35-sign-the-source-packagebinary-packagesha512","content":"for i in *.tar.gz; do echo $i; gpg --print-md SHA512 $i &gt; $i.sha512 ; done # Calculate SHA512 for i in *.tar.gz; do echo $i; gpg --armor --output $i.asc --detach-sig $i ; done # Calculate the signature  "},{"title":"3.6 Check whether the generated signature/sha512 is correct​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#36-check-whether-the-generated-signaturesha512-is-correct","content":"For example, verify that the signature is correct as follows: for i in *.tar.gz; do echo $i; gpg --verify $i.asc $i ; done  "},{"title":"4. 
Prepare for Apache release​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#4-prepare-for-apache-release","content":""},{"title":"4.1 Publish the jar package to the Apache Nexus repository​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#41-publish-the-jar-package-to-the-apache-nexus-repository","content":"cd /tmp/apache-submarine-${release_version}-${rc_version} # Enter the source package directory tar xzvf apache-submarine-${release_version}-src.tar.gz # Unpack the source package cd apache-submarine-${release_version} mvn -DskipTests deploy -Papache-release -Dmaven.javadoc.skip=true # Start the upload  "},{"title":"4.2 Upload the tag to git repository​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#42-upload-the-tag-to-git-repository","content":"git push origin ${release_version}-${rc_version}  "},{"title":"4.3 Upload the compiled file to dist​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#43-upload-the-compiled-file-to-dist","content":"This step requires SVN. The SVN repository for the dev branch is https://dist.apache.org/repos/dist/dev/submarine "},{"title":"4.3.1 Checkout Submarine to a local directory​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#431-checkout-submarine-to-a-local-directory","content":"# This step may be slow, because all versions are checked out. If the network connection drops, run svn cleanup to remove the lock and re-run the command; the transfer will resume. 
svn co https://dist.apache.org/repos/dist/dev/submarine /tmp/submarine-dist-dev  "},{"title":"4.3.2 Add the public key to the KEYS file and submit it to the SVN repository​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#432-add-the-public-key-to-the-keys-file-and-submit-it-to-the-svn-repository","content":"cd /tmp/submarine-dist-dev mkdir ${release_version}-${rc_version} # Create the version directory. # Copy the source package, the binary package, and their signature and checksum files here. cp /tmp/apache-submarine-${release_version}-${rc_version}/*tar.gz* ${release_version}-${rc_version}/ svn status # Check the SVN status. svn add ${release_version}-${rc_version} # Add the directory to SVN version control. svn status # Check the SVN status. svn commit -m &quot;prepare for ${release_version} ${rc_version}&quot; # Commit to the remote SVN server.  "},{"title":"4.4 Shut down the Apache Staging repository​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#44-shut-down-the-apache-staging-repository","content":"Please make sure all artifacts are fine. Log in to http://repository.apache.org with your Apache account. Click on Staging repositories on the left. Search for the Submarine keyword and select the repository you uploaded most recently. Click the Close button at the top; a series of checks is performed during this process. After the checks pass, a link appears on the Summary tab below. Save this link and include it in the upcoming voting email. The link should look like: https://repository.apache.org/content/repositories/orgapachesubmarine-xxxx WARN: Clicking Close may fail. If it does, check the reason for the failure and resolve it. "},{"title":"5. 
Enter voting​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#5-enter-voting","content":"To vote in the Submarine community, send an email to: dev@submarine.apache.org "},{"title":"Vote in the Submarine community​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#vote-in-the-submarine-community","content":"Voting template​ Title: [VOTE] Submarine-${release_version}-${rc_version} is ready for a vote! Content: Hi folks, Thanks for everyone's help on this release. I've created a release candidate (${rc_version}) for submarine ${release_version}. The highlighted features are as follows: 1. AAA 2. BBB 3. CCC The mini-submarine image is here: docker pull apache/submarine:mini-${release_version}-${rc_version} The RC tag in git is here: https://github.com/apache/submarine/releases/tag/release-${release_version}-${rc_version} The RC release artifacts are available at: http://home.apache.org/~pingsutw/submarine-${release_version}-${rc_version} The Maven staging repository is here: https://repository.apache.org/content/repositories/orgapachesubmarine-1030 My public key is here: https://dist.apache.org/repos/dist/release/submarine/KEYS *This vote will run for 7 days, ending on DDDD/EE/FF at 11:59 pm PST.* For testing, I have verified the following: 1. Build from source, Install Submarine on minikube 2. Workbench UI (Experiment / Notebook / Template / Environment) 3. Experiment / Notebook / Template / Environment REST API My +1 to start. Thanks! BR, XXX  Announce voting results template​ Title: [RESULT][VOTE] Release Apache Submarine ${release_version} ${rc_version} Content: Hello Apache Submarine PMC and Community, The vote closes now as 72 hours have passed. The vote PASSES with xx (+1 binding) votes from the PMC, xx (+1 non-binding) votes from the rest of the developer community, and no further 0 or -1 votes. The vote thread: {vote_mail_address} Thank you for your support. Your Submarine Release Manager  "},{"title":"6. 
Officially released​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#6-officially-released","content":""},{"title":"6.1 Merge the changes from the release-${release_version} branch to the master branch​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#61-merge-the-changes-from-the-release-release_version-branch-to-the-master-branch","content":""},{"title":"6.2 Release the version in the Apache Staging repository​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#62-release-the-version-in-the-apache-staging-repository","content":"Please make sure all artifacts are fine. Log in to http://repository.apache.org with your Apache account. Click on Staging repositories on the left. Search for the Submarine keyword and select your recently uploaded repository, the one specified in the voting email. Click the Release button at the top; a series of checks is carried out during this process. It usually takes 24 hours for the repository to synchronize to other data sources. "},{"title":"6.3 Update official website link​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#63-update-official-website-link","content":""},{"title":"6.4. Send an email to dev@submarine.apache.org​","type":1,"pageTitle":"How to Release","url":"docs/next/devDocs/HowToRelease#64-send-an-email-todevsubmarineapacheorg","content":"Please make sure that the repository in 6.2 has been successfully released. The email is generally sent 24 hours after 6.2. Announce release email template: Title: [ANNOUNCE] Apache Submarine ${release_version} release! Content: Hi folks, It's a great honor for me to announce that the Apache Submarine Community has released Apache Submarine ${release_version}! The highlighted features are: 1. AAA 2. BBB 3. CCC Tons of thanks to our contributors and community! Let's keep fighting! 
*Apache Submarine ${release_version} released*: https://submarine.apache.org/docs/next/releases/submarine-release-${release_version} BR, XXXX  "},{"title":"How to Run Integration K8s Test","type":0,"sectionRef":"#","url":"docs/next/devDocs/IntegrationTestK8s","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/next/devDocs/IntegrationTestK8s#introduction","content":"The test cases under the test-k8s directory are integration tests that ensure the correctness of the Submarine RESTful API. You can run these tests either locally or on GitHub Actions. Before running the tests, a minikube or KinD cluster must be created. Then compile and package the Submarine project in the submarine-dist directory to build a Docker image. In addition, the submarine-traefik service must be forwarded to local port 8080. "},{"title":"Run k8s test locally​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/next/devDocs/IntegrationTestK8s#run-k8s-test-locally","content":"Ensure you have set up a KinD or minikube cluster. If you haven't, follow this minikube tutorial. Build Submarine from source and upgrade the server pod by following this guide. Forward the port: kubectl port-forward --address 0.0.0.0 service/submarine-traefik 8080:80 Install the latest &quot;submarine-server-core&quot; package into the local repository, for use as a dependency of the test-k8s module: mvn install -DskipTests Execute the test command: mvn verify -DskipRat -pl :submarine-test-k8s -Phadoop-2.9 -B   "},{"title":"Run k8s test in GitHub Actions​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/next/devDocs/IntegrationTestK8s#run-k8s-test-in-github-actions","content":"Each time code is pushed, GitHub Actions is triggered automatically. 
"},{"title":"下载 Apache Submarine","type":0,"sectionRef":"#","url":"docs/next/download","content":"","keywords":""},{"title":"验证文件完整性​","type":1,"pageTitle":"下载 Apache Submarine","url":"docs/next/download#验证文件完整性","content":"您必须使用 PGP 或 MD5 签名来 验证 下载文件的完整性。 此签名应与 KEYS 文件匹配。 gpg --import KEYS gpg --verify submarine-dist-X.Y.Z-src.tar.gz.asc  "},{"title":"旧版本​","type":1,"pageTitle":"下载 Apache Submarine","url":"docs/next/download#旧版本","content":"Apache Submarine 0.6.0 于2021年10月21日发布 (发布公告) (git tag) 二进制部署包:submarine-dist-0.6.0-hadoop-2.9.tar.gz (518 MB, checksum, signature) 源代码:submarine-dist-0.6.0-src.tar.gz (8.3 MB, checksum, signature)) Docker 镜像: mini-submarine docker pull apache/submarine:mini-0.6.0submarine server docker pull apache/submarine:server-0.6.0submarine database docker pull apache/submarine:database-0.6.0submarine jupyter-notebook docker pull apache/submarine:jupyter-notebook-0.6.0submarine quickstart docker pull apache/submarine:quickstart-0.6.0submarine serve docker pull apache/submarine:serve-0.6.0submarine mlflow docker pull apache/submarine:mlflow-0.6.0submarine operator docker pull apache/submarine:operator-0.6.0 SDK: PySubmarine pip install apache-submarine==0.6.0 Apache Submarine 0.5.0 于2020年12月17日发布 (发布公告) (git tag) 二进制部署包:submarine-dist-0.5.0-hadoop-2.9.tar.gz (505 MB, checksum, signature)源代码:submarine-dist-0.5.0-src.tar.gz (5.0 MB, checksum, signature))Docker 镜像: mini-submarine docker pull apache/submarine:mini-0.5.0submarine server docker pull apache/submarine:server-0.5.0submarine database docker pull apache/submarine:database-0.5.0submarine jupyter-notebook docker pull apache/submarine:jupyter-notebook-0.5.0 SDK: PySubmarine pip install apache-submarine==0.5.0 Apache Submarine 0.4.0于2020年7月5日发布 (发布公告) (git tag) 二进制部署包:submarine-dist-0.4.0-hadoop-2.9.tar.gz (550 MB,checksum,signature)源代码:submarine-dist-0.4.0-src.tar.gz (6 MB,checksum,signature)Docker 镜像:mini-submarine (guide) Apache Submarine 0.3.0 于2020年2月1日发布 (发布公告) (git tag) submarine 
二进制部署包:submarine-dist-0.3.0-hadoop-2.9.tar.gz (550 MB,checksum,signature)源代码:submarine-dist-0.3.0-src.tar.gz (6 MB,checksum,signature)Docker 镜像:mini-submarine (guide) Apache Submarine 0.2.0 于2019年7月2日发布 submarine 二进制部署包:hadoop-submarine-0.2.0.tar.gz (111 MB,checksum,signature,Announcement) 源代码:hadoop-submarine-0.2.0-src.tar.gz (1.4 MB,checksum,signature) Apache Submarine 0.1.0 于2019年1月16日发布 submarine 二进制部署包:submarine-0.2.0-bin-all.tgz (97 MB,checksum,signature,Announcement) 源代码:submarine-hadoop-3.2.0-src.tar.gz (1.1 MB,checksum,signature) "},{"title":"Custom Configuation","type":0,"sectionRef":"#","url":"docs/next/gettingStarted/helm","content":"","keywords":""},{"title":"Helm Chart Volume Type​","type":1,"pageTitle":"Custom Configuation","url":"docs/next/gettingStarted/helm#helm-chart-volume-type","content":"Submarine can support various volume types, currently including hostPath (default) and NFS. It can be easily configured in the ./helm-charts/submarine/values.yaml, or you can override the default values in values.yaml by helm CLI. hostPath​ In hostPath, you can store data directly in your node.Usage: Configure setting in ./helm-charts/submarine/values.yaml.To enable hostPath storage, set .storage.type to host.To set the root path for your storage, set .storage.host.root to &lt;any-path&gt; Example: # ./helm-charts/submarine/values.yaml storage: type: host host: root: /tmp  NFS (Network File System)​ In NFS, it allows multiple clients to access a shared space.Prerequisite: A pre-existing NFS server. You have two options. 
Create an NFS server: kubectl create -f ./dev-support/nfs-server/nfs-server.yaml This creates an nfs-server pod in the Kubernetes cluster and exposes the NFS server at IP 10.96.0.2. Use your own NFS server. Install the NFS dependencies on your nodes: Ubuntu apt-get install -y nfs-common CentOS yum install nfs-utils Usage: Configure the settings in ./helm-charts/submarine/values.yaml. To enable NFS storage, set .storage.type to nfs. To set the IP of the NFS server, set .storage.nfs.ip to &lt;any-ip&gt;. Example: # ./helm-charts/submarine/values.yaml storage: type: nfs nfs: ip: 10.96.0.2  "},{"title":"Access to Submarine Server​","type":1,"pageTitle":"Custom Configuation","url":"docs/next/gettingStarted/helm#access-to-submarine-server","content":"By default, the Submarine server exposes port 8080 within the K8s cluster. After v0.5, Submarine uses Traefik as the reverse proxy by default. If you don't want to use Traefik, set the value below to false in ./helm-charts/submarine/values.yaml. # Use Traefik by default traefik: enabled: true  To access the server from outside the cluster, we use the Traefik ingress controller and NodePort. Please refer to ./helm-charts/submarine/charts/traefik/values.yaml and the Traefik docs for more details if you want to customize the default values for Traefik. Notice: If you use kind to run a local Kubernetes cluster, please refer to this doc and set the &quot;extraPortMappings&quot; configuration when creating the k8s cluster. kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 nodes: - role: control-plane extraPortMappings: - containerPort: 32080 hostPort: [the port you want to access]  # Use nodePort and Traefik ingress controller by default. # To access the submarine server, open the following URL in your browser. http://127.0.0.1:32080  If minikube is installed, use the following command to find the URL of the Submarine server. 
$ minikube service submarine-traefik --url  "},{"title":"Kubernetes Dashboard (optional)​","type":1,"pageTitle":"Custom Configuation","url":"docs/next/gettingStarted/helm#kubernetes-dashboard-optional","content":""},{"title":"Deploy​","type":1,"pageTitle":"Custom Configuation","url":"docs/next/gettingStarted/helm#deploy","content":"To deploy Dashboard, execute the following command: kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml  "},{"title":"Create RBAC​","type":1,"pageTitle":"Custom Configuation","url":"docs/next/gettingStarted/helm#create-rbac","content":"Run the following commands to grant the cluster access permission of dashboard: kubectl create serviceaccount dashboard-admin-sa kubectl create clusterrolebinding dashboard-admin-sa --clusterrole=cluster-admin --serviceaccount=default:dashboard-admin-sa  "},{"title":"Get access token (optional)​","type":1,"pageTitle":"Custom Configuation","url":"docs/next/gettingStarted/helm#get-access-token-optional","content":"If you want to use the token to login the dashboard, run the following commands to get key: kubectl get secrets # select the right dashboard-admin-sa-token to describe the secret kubectl describe secret dashboard-admin-sa-token-6nhkx  "},{"title":"Start dashboard service​","type":1,"pageTitle":"Custom Configuation","url":"docs/next/gettingStarted/helm#start-dashboard-service","content":"kubectl proxy  Now access Dashboard at: http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ Dashboard screenshot:  "},{"title":"Jupyter Notebook","type":0,"sectionRef":"#","url":"docs/next/gettingStarted/notebook","content":"","keywords":""},{"title":"Working with notebooks​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/next/gettingStarted/notebook#working-with-notebooks","content":"We recommend using Web UI to manage notebooks. 
"},{"title":"Notebooks Web UI​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/next/gettingStarted/notebook#notebooks-web-ui","content":"Notebooks can be started from the Web UI. You can click the “Notebook” tab in the left-hand panel to manage your notebooks.  To create a new notebook server, click “New Notebook”. You should see a form for entering details of your new notebook server. Notebook Name : Name of the notebook server. It should follow the rules below. Contain at most 63 characters.Contain only lowercase alphanumeric characters or '-'.Start with an alphabetic character.End with an alphanumeric character. Environment : It defines a set of libraries and docker image.CPU and MemoryGPU (optional)EnvVar (optional) : Injects environment variables into the notebook. If you want to use notebook-gpu-env, you should set up the gpu environment in your kubernetes. You can install NVIDIA/k8s-device-plugin. The list of prerequisites for running the NVIDIA device plugin is described below NVIDIA drivers ~= 384.81nvidia-docker version &gt; 2.0docker configured with nvidia as the default runtimeKubernetes version &gt;= 1.10 If you’re not sure which environment you need, please choose the environment “notebook-env” for the new notebook.  You should see your new notebook server. Click the name of your notebook server to connect to it.  
"},{"title":"Experiment with your notebook​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/next/gettingStarted/notebook#experiment-with-your-notebook","content":"The environment “notebook-env” includes Submarine Python SDK which can talk to Submarine Server to create experiments, as the example below: from __future__ import print_function import submarine from submarine.client.models.environment_spec import EnvironmentSpec from submarine.client.models.experiment_spec import ExperimentSpec from submarine.client.models.experiment_task_spec import ExperimentTaskSpec from submarine.client.models.experiment_meta import ExperimentMeta from submarine.client.models.code_spec import CodeSpec # Create Submarine Client submarine_client = submarine.ExperimentClient() # Define TensorFlow experiment spec environment = EnvironmentSpec(image='apache/submarine:tf-dist-mnist-test-1.0') experiment_meta = ExperimentMeta(name='mnist-dist', namespace='default', framework='Tensorflow', cmd='python /var/tf_dist_mnist/dist_mnist.py --train_steps=100', env_vars={'ENV1': 'ENV1'}) worker_spec = ExperimentTaskSpec(resources='cpu=1,memory=1024M', replicas=1) ps_spec = ExperimentTaskSpec(resources='cpu=1,memory=1024M', replicas=1) code_spec = CodeSpec(sync_mode='git', url='https://github.com/apache/submarine.git') experiment_spec = ExperimentSpec(meta=experiment_meta, environment=environment, code=code_spec, spec={'Ps' : ps_spec,'Worker': worker_spec}) # Create experiment experiment = submarine_client.create_experiment(experiment_spec=experiment_spec)  You can create a new notebook, paste the above code and run it. Or, you can find the notebook submarine_experiment_sdk.ipynb inside the launched notebook session. You can open it, try it out. After experiment submitted to Submarine server, you can find the experiment jobs on the UI. 
"},{"title":"Submarine Python SDK","type":0,"sectionRef":"#","url":"docs/next/gettingStarted/python-sdk","content":"","keywords":""},{"title":"Prepare Python Environment to run Submarine SDK​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/next/gettingStarted/python-sdk#prepare-python-environment-to-run-submarine-sdk","content":"Submarine SDK requires Python3.7+. It's better to use a new Python environment created by Anoconda or Python virtualenv to try this to avoid trouble to existing Python environment. A sample Python virtual env can be setup like this: wget https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz tar xf virtualenv-16.0.0.tar.gz # Make sure to install using Python 3 python3 virtualenv-16.0.0/virtualenv.py venv . venv/bin/activate  "},{"title":"Install Submarine SDK​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/next/gettingStarted/python-sdk#install-submarine-sdk","content":""},{"title":"Install SDK from pypi.org (recommended)​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/next/gettingStarted/python-sdk#install-sdk-from-pypiorg-recommended","content":"Starting from 0.4.0, Submarine provides Python SDK. Please change it to a proper version needed. More detail: https://pypi.org/project/apache-submarine/ # Install latest stable version pip install apache-submarine # Install specific version pip install apache-submarine==&lt;REPLACE_VERSION&gt;  "},{"title":"Install SDK from source code​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/next/gettingStarted/python-sdk#install-sdk-from-source-code","content":"Please first clone code from github or go to http://submarine.apache.org/download.html to download released source code. git clone https://github.com/apache/submarine.git # (optional) chackout specific branch or release git checkout &lt;correct release tag/branch&gt; cd submarine/submarine-sdk/pysubmarine pip install .  
"},{"title":"Manage Submarine Experiment​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/next/gettingStarted/python-sdk#manage-submarine-experiment","content":"Assuming you've installed submarine on K8s and forward the traefik service to localhost, now you can open a Python shell, Jupyter notebook or any tools with Submarine SDK installed. Follow SDK experiment example to run an experiment. "},{"title":"Training a DeepFM model​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/next/gettingStarted/python-sdk#training-a-deepfm-model","content":"The Submarine also supports users to train an easy-to-use CTR model with a few lines of code and a configuration file, so they don’t need to reimplement the model by themself. In addition, they can train the model on both local on distributed systems, such as Hadoop or Kubernetes. Follow SDK DeepFM example to try the model. "},{"title":"MLflow UI","type":0,"sectionRef":"#","url":"docs/next/userDocs/others/mlflow","content":"","keywords":""},{"title":"Usage​","type":1,"pageTitle":"MLflow UI","url":"docs/next/userDocs/others/mlflow#usage","content":"MLflow UI shows the tracking result of the experiments. When we use the log_param or log_metric in ModelClient API, we could view the result in MLflow UI. Below is the example of the usage of MLflow UI. "},{"title":"Example​","type":1,"pageTitle":"MLflow UI","url":"docs/next/userDocs/others/mlflow#example","content":"Run the following code in the cluster from submarine import ModelsClient import random import time if __name__ == &quot;__main__&quot;: modelClient = ModelsClient() with modelClient.start() as run: modelClient.log_param(&quot;learning_rate&quot;, random.random()) for i in range(100): time.sleep(1) modelClient.log_metric(&quot;mse&quot;, random.random() * 100, i) modelClient.log_metric(&quot;acc&quot;, random.random(), i)  In the MLflow UI page, you can see the log_param and the log_metric result. 
You can also compare the training between different workers.  "},{"title":"Tensorboard","type":0,"sectionRef":"#","url":"docs/next/userDocs/others/tensorboard","content":"","keywords":""},{"title":"Write to LogDirs by the environment variable​","type":1,"pageTitle":"Tensorboard","url":"docs/next/userDocs/others/tensorboard#write-to-logdirs-by-the-environment-variable","content":""},{"title":"Environment variable​","type":1,"pageTitle":"Tensorboard","url":"docs/next/userDocs/others/tensorboard#environment-variable","content":"SUBMARINE_TENSORBOARD_LOG_DIR: Exist in every experiment container. You just need to direct your logs to $(SUBMARINE_TENSORBOARD_LOG_DIR) (NOTICE: it is () not {}), and you can inspect the process on the tensorboard webpage. "},{"title":"Example​","type":1,"pageTitle":"Tensorboard","url":"docs/next/userDocs/others/tensorboard#example","content":"{ &quot;meta&quot;: { &quot;name&quot;: &quot;tensorflow-tensorboard-dist-mnist&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=$(SUBMARINE_TENSORBOARD_LOG_DIR) --learning_rate=0.01 --batch_size=20&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=512M&quot; } } }  "},{"title":"Connect to the tensorboard webpage​","type":1,"pageTitle":"Tensorboard","url":"docs/next/userDocs/others/tensorboard#connect-to-the-tensorboard-webpage","content":"Open the experiment page in the workbench, and Click the TensorBoard button.  Inspect the process on tensorboard page.  
"},{"title":"Quickstart","type":0,"sectionRef":"#","url":"docs/next/gettingStarted/quickstart","content":"","keywords":""},{"title":"Installation​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#installation","content":""},{"title":"Prepare a Kubernetes cluster​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#prepare-a-kubernetes-cluster","content":"Prerequisite Check dependency page for the compatible versionkubectlhelm (Helm v3 is minimum requirement.)minikube.istioctl Start minikube cluster and install Istio Start minikube # You can go to https://minikube.sigs.k8s.io/docs/start/ and follow the tutorial to install minikube. # Then you can start kubernetes with minikube: minikube start --vm-driver=docker --cpus 8 --memory 8192 --kubernetes-version v1.24.12 # Or if you want to support Pod Security Policy (https://minikube.sigs.k8s.io/docs/tutorials/using_psp), you can use the following command to start cluster minikube start --extra-config=apiserver.enable-admission-plugins=PodSecurityPolicy --addons=pod-security-policy --vm-driver=docker --cpus 8 --memory 8192 --kubernetes-version v1.24.12  Install Istio, there are two ways to install: Command-Istioctl-based, or Helm-based # You can go to the https://github.com/istio/istio/releases/ to download the istioctl for your k8s version # e.g. 
we can execute the following command to download the istio version adapted to k8s 1.24.12 # wget https://github.com/istio/istio/releases/download/1.17.1/istio-1.17.1-linux-amd64.tar.gz istioctl install -y # Alternatively, you can use istio's helm to install # This is the link: https://istio.io/latest/docs/setup/install/helm/ ## Add istio repo helm repo add istio https://istio-release.storage.googleapis.com/charts helm repo update ## Create istio-system namespace kubectl create namespace istio-system ## Install istio resources helm install istio-base istio/base -n istio-system helm install istiod istio/istiod -n istio-system helm install istio-ingressgateway istio/gateway -n istio-system  "},{"title":"Launch submarine in the cluster​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#launch-submarine-in-the-cluster","content":"Clone the project git clone https://github.com/apache/submarine.git cd submarine  Create necessary namespaces kubectl create namespace submarine kubectl create namespace submarine-user-test kubectl label namespace submarine istio-injection=enabled kubectl label namespace submarine-user-test istio-injection=enabled  Install the submarine operator and dependencies by helm chart # Update helm dependency. helm dependency update ./helm-charts/submarine # Install submarine operator in namespace submarine. helm install submarine ./helm-charts/submarine --set seldon-core-operator.istio.gateway=submarine/seldon-gateway -n submarine  Create a Submarine custom resource and the operator will create the submarine server, database, etc. for us. 
kubectl apply -f submarine-cloud-v3/config/samples/_v1_submarine.yaml -n submarine-user-test  "},{"title":"Ensure submarine is ready​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#ensure-submarine-is-ready","content":"$ kubectl get pods -n submarine NAME READY STATUS RESTARTS AGE notebook-controller-deployment-66d85984bf-x562z 1/1 Running 0 7h7m training-operator-6dcd5b9c64-nxwr2 1/1 Running 0 7h7m submarine-operator-9cb7bc84d-brddz 1/1 Running 0 7h7m $ kubectl get pods -n submarine-user-test NAME READY STATUS RESTARTS AGE submarine-database-0 1/1 Running 0 7h6m submarine-minio-686b8777ff-zg4d2 2/2 Running 0 7h6m submarine-mlflow-68c5559dcb-lkq4g 2/2 Running 0 7h6m submarine-server-7c6d7bcfd8-5p42w 2/2 Running 0 9m33s submarine-tensorboard-57c5b64778-t4lww 2/2 Running 0 7h6m  "},{"title":"Connect to workbench​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#connect-to-workbench","content":"Exposing service kubectl port-forward --address 0.0.0.0 -n istio-system service/istio-ingressgateway 32080:80  View workbench Go to http://0.0.0.0:32080 "},{"title":"Example: Submit a mnist distributed example​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#example-submit-a-mnist-distributed-example","content":"We put the code of this example here. train.py is our training script, and build.sh is the script to build a docker image. "},{"title":"1. Write a python script for distributed training​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#1-write-a-python-script-for-distributed-training","content":"Take a simple mnist tensorflow script as an example. We choose MultiWorkerMirroredStrategy as our distributed strategy. 
&quot;&quot;&quot; ./dev-support/examples/quickstart/train.py Reference: https://github.com/kubeflow/training-operator/blob/master/examples/tensorflow/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py &quot;&quot;&quot; import tensorflow as tf import tensorflow_datasets as tfds from packaging.version import Version from tensorflow.keras import layers, models import submarine def make_datasets_unbatched(): BUFFER_SIZE = 10000 # Scaling MNIST data from (0, 255] to (0., 1.] def scale(image, label): image = tf.cast(image, tf.float32) image /= 255 return image, label # If we use tensorflow_datasets &gt; 3.1.0, we need to disable GCS # https://github.com/tensorflow/datasets/issues/2761#issuecomment-1187413141 if Version(tfds.__version__) &gt; Version(&quot;3.1.0&quot;): tfds.core.utils.gcs_utils._is_gcs_disabled = True datasets, _ = tfds.load(name=&quot;mnist&quot;, with_info=True, as_supervised=True) return datasets[&quot;train&quot;].map(scale).cache().shuffle(BUFFER_SIZE) def build_and_compile_cnn_model(): model = models.Sequential() model.add(layers.Conv2D(32, (3, 3), activation=&quot;relu&quot;, input_shape=(28, 28, 1))) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation=&quot;relu&quot;)) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation=&quot;relu&quot;)) model.add(layers.Flatten()) model.add(layers.Dense(64, activation=&quot;relu&quot;)) model.add(layers.Dense(10, activation=&quot;softmax&quot;)) model.summary() model.compile(optimizer=&quot;adam&quot;, loss=&quot;sparse_categorical_crossentropy&quot;, metrics=[&quot;accuracy&quot;]) return model def main(): strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy( communication=tf.distribute.experimental.CollectiveCommunication.AUTO ) BATCH_SIZE_PER_REPLICA = 4 BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync with strategy.scope(): ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat() 
options = tf.data.Options() options.experimental_distribute.auto_shard_policy = ( tf.data.experimental.AutoShardPolicy.DATA ) ds_train = ds_train.with_options(options) # Model building/compiling need to be within `strategy.scope()`. multi_worker_model = build_and_compile_cnn_model() class MyCallback(tf.keras.callbacks.Callback): def on_epoch_end(self, epoch, logs=None): # monitor the loss and accuracy print(logs) submarine.log_metric(&quot;loss&quot;, logs[&quot;loss&quot;], epoch) submarine.log_metric(&quot;accuracy&quot;, logs[&quot;accuracy&quot;], epoch) multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70, callbacks=[MyCallback()]) # save model submarine.save_model(multi_worker_model, &quot;tensorflow&quot;) if __name__ == &quot;__main__&quot;: main()  "},{"title":"2. Prepare an environment compatible with the training​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#2-prepare-an-environment-compatible-with-the-training","content":"Build a docker image equipped with the requirement of the environment. eval $(minikube docker-env) ./dev-support/examples/quickstart/build.sh  "},{"title":"3. Submit the experiment​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#3-submit-the-experiment","content":"Open submarine workbench and click + New Experiment Choose Define your experiment Fill the form accordingly. Here we set 3 workers. Step 1Step 2Step 3The experiment is successfully submitted In the meantime, we have built this image in docker hub and you can run this experiment directly if you choose quickstart in From predefined experiment library. "},{"title":"4. Monitor the process​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#4-monitor-the-process","content":"In our code, we use submarine from submarine-sdk to record the metrics. 
To see the result, click corresponding experiment with name mnist-example in the workbench.To see the metrics of each worker, you can select a worker from the left top list.  "},{"title":"5. Serve the model​","type":1,"pageTitle":"Quickstart","url":"docs/next/gettingStarted/quickstart#5-serve-the-model","content":"Before serving, we need to register a new model.  And then, check the output model in experiment page.  Click the button and register the model.  Go to the model page and deploy our model for serving.  We can run the following commands to get the VirtualService and Endpoint that use istio for external port forward or ingress. ## get VirtualService with your model name kubectl describe VirtualService -n submarine-user-test -l model-name=tf-mnist Name: submarine-model-1-2508dd65692740b18ff5c6c6c162b863 Namespace: submarine-user-test Labels: model-id=2508dd65692740b18ff5c6c6c162b863 model-name=tf-mnist model-version=1 Annotations: &lt;none&gt; API Version: networking.istio.io/v1beta1 Kind: VirtualService Metadata: Creation Timestamp: 2022-09-18T05:26:38Z Generation: 1 Managed Fields: ... Spec: Gateways: submarine/seldon-gateway Hosts: * Http: Match: Uri: Prefix: /seldon/submarine-user-test/1/1/ Rewrite: Uri: / Route: Destination: Host: submarine-model-1-2508dd65692740b18ff5c6c6c162b863 Port: Number: 8000 Events: &lt;none&gt;  To confirm that the serving endpoint is available, try using the swagger address to confirm the availability of the interface. 
In our example, the address of the swagger is: http://localhost:32080/seldon/submarine-user-test/1/1/api/v1.0/doc/ More details can be found in the official seldon documentation: https://docs.seldon.io/projects/seldon-core/en/latest/workflow/serving.html#generated-documentation-swagger-ui After successfully serving the model, we can test the results of serving using the test python code serve_predictions.py  "},{"title":"Submarine-SDK","type":0,"sectionRef":"#","url":"docs/next/userDocs/submarine-sdk/","content":"","keywords":""},{"title":"Summary​","type":1,"pageTitle":"Submarine-SDK","url":"docs/next/userDocs/submarine-sdk/#summary","content":"Support Python, Scala, R language for algorithm development Support tracking/metrics APIs which allows developers add tracking/metrics and view tracking/metrics from Submarine Workbench UI. "},{"title":"Python SDK Development","type":0,"sectionRef":"#","url":"docs/next/userDocs/submarine-sdk/pysubmarine/development","content":"","keywords":""},{"title":"Prerequisites​","type":1,"pageTitle":"Python SDK Development","url":"docs/next/userDocs/submarine-sdk/pysubmarine/development#prerequisites","content":"This is required for developing &amp; testing changes, we recommend installing pysubmarine in its own conda environment by running the following conda create --name submarine-dev python=3.7 conda activate submarine-dev # Install auto-format and lints from current checkout pip install -r ./dev-support/style-check/python/lint-requirements.txt # Install mypy from current checkout pip install -r ./dev-support/style-check/python/mypy-requirements.txt # test-requirements.txt from current checkout pip install -r ./submarine-sdk/pysubmarine/github-actions/test-requirements.txt # Installs pysubmarine from current checkout pip install -e ./submarine-sdk/pysubmarine  "},{"title":"PySubmarine Docker​","type":1,"pageTitle":"Python SDK 
Development","url":"docs/next/userDocs/submarine-sdk/pysubmarine/development#pysubmarine-docker","content":"We also use Docker to provide build environments for CI, development, and generating the Python SDK from swagger. ./run-pysubmarine-ci.sh  The script does the following things: Start an interactive bash sessionMount submarine directory to /workspace and set it as homeSwitch user to be the same user that calls the run-pysubmarine-ci.sh "},{"title":"Coding Style​","type":1,"pageTitle":"Python SDK Development","url":"docs/next/userDocs/submarine-sdk/pysubmarine/development#coding-style","content":"Use isort to sort the Python imports and black to format Python codeBoth styles are configured in pyproject.tomlTo autoformat code ./dev-support/style-check/python/auto-format.sh  Use flake8 to lint the code; its configuration is in .flake8.Also, we use mypy for static type checking in submarine-sdk/pysubmarine/submarine.Verify that the linter passes before submitting a pull request by running: ./dev-support/style-check/python/lint.sh  If you encounter unexpected formatting, use the following method # fmt: off &quot;Unexpected format, formatted by yourself&quot; # fmt: on  "},{"title":"Unit Testing​","type":1,"pageTitle":"Python SDK Development","url":"docs/next/userDocs/submarine-sdk/pysubmarine/development#unit-testing","content":"We are using pytest to develop our unit test suite. After building the project (see below) you can run its unit tests like so: cd submarine-sdk/pysubmarine  Run unit test pytest --cov=submarine -vs -m &quot;not e2e&quot;  Run integration test pytest --cov=submarine -vs -m &quot;e2e&quot;  Before running this command locally, make sure the submarine server is running. 
"},{"title":"Generate python SDK from swagger​","type":1,"pageTitle":"Python SDK Development","url":"docs/next/userDocs/submarine-sdk/pysubmarine/development#generate-python-sdk-from-swagger","content":"We use open-api generatorto generate pysubmarine client API that used to communicate with submarine server. To generate different API Component, please change the code in Bootstrap.java. If just updating java code for NotebookRestApi , ExperimentRestApi or EnvironmentRestApi, please skip step 1. SwaggerConfiguration oasConfig = new SwaggerConfiguration() .openAPI(oas) .resourcePackages(Stream.of(&quot;org.apache.submarine.server.rest&quot;) .collect(Collectors.toSet())) .resourceClasses(Stream.of(&quot;org.apache.submarine.server.rest.NotebookRestApi&quot;, &quot;org.apache.submarine.server.rest.ExperimentRestApi&quot;, &quot;org.apache.submarine.server.rest.EnvironmentRestApi&quot;) .collect(Collectors.toSet())); After starting the server, http://localhost:8080/v1/openapi.json will includes API specs for NotebookRestApi, ExperimentRestApi and EnvironmentRestApi swagger_config.json defines the import path for python SDK Ex: For submarine.client { &quot;packageName&quot; : &quot;submarine.client&quot;, &quot;projectName&quot; : &quot;submarine.client&quot;, &quot;packageVersion&quot;: &quot;0.8.0-SNAPSHOT&quot; } Usage: import submarine.client... Execute ./dev-support/pysubmarine/gen-sdk.sh to generate latest version of SDK. Notice: Please install required package before running the script: lint-requirements.txt In submarine/submarine-sdk/pysubmarine/client/api_client.py line 74 Please change &quot;long&quot;: int if six.PY3 else long, # noqa: F821 to &quot;long&quot;: int,  "},{"title":"Model Management Model Development​","type":1,"pageTitle":"Python SDK Development","url":"docs/next/userDocs/submarine-sdk/pysubmarine/development#model-management-model-development","content":"For local development, we can access cluster's service easily thanks to telepresence. 
To elaborate, we can develop the SDK locally while still reaching the database and minio server through a proxy. Install telepresence following the instruction.Start proxy pod telepresence --new-deployment submarine-dev  You can develop as if in the cluster. "},{"title":"Upload package to PyPi​","type":1,"pageTitle":"Python SDK Development","url":"docs/next/userDocs/submarine-sdk/pysubmarine/development#upload-package-to-pypi","content":"For Apache Submarine committers and PMC members to do a new release. Change the version from 0.x.x-SNAPSHOT to 0.x.x in setup.pyInstall Python packages cd submarine-sdk/pysubmarine pip install -r github-actions/pypi-requirements.txt  Compiling Your Package It will create build, dist, and project.egg.info in your local directory python setup.py bdist_wheel  Upload python package to TestPyPI for testing python -m twine upload --repository testpypi dist/*  Upload python package to PyPi python -m twine upload --repository-url https://upload.pypi.org/legacy/ dist/*  "},{"title":"Experiment Client","type":0,"sectionRef":"#","url":"docs/next/userDocs/submarine-sdk/experiment-client","content":"","keywords":""},{"title":"class ExperimentClient()​","type":1,"pageTitle":"Experiment Client","url":"docs/next/userDocs/submarine-sdk/experiment-client#class-experimentclient","content":"Client of a submarine server that creates and manages experiments and logs. create_experiment(experiment_spec) -&gt; dict​ Create an experiment. Param\tType\tDescription\tDefault Valueexperiment_spec\tDict\tSubmarine experiment spec. More detailed information can be found at Experiment API\tx Returns The detailed info about the submarine experiment. 
Example from submarine import * client = ExperimentClient() client.create_experiment({ &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } } })   patch_experiment(id, experiment_spec) -&gt; dict​ Patch an experiment. Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx experiment_spec\tDict\tSubmarine experiment spec. More detailed information of Submarine experiment spec can be found at Experiment API.\tx Returns The detailed info about the submarine experiment. Example client.patch_experiment(&quot;experiment_1626160071451_0008&quot;, { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Worker&quot;: { &quot;replicas&quot;: 2, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } } })   get_experiment(id) -&gt; dict​ Get the experiment's detailed info by id. 
Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx Returns The detailed info about the submarine experiment. Example experiment = client.get_experiment(&quot;experiment_1626160071451_0008&quot;)   list_experiments(status) -&gt; list[dict]​ List all experiments for the user. Param\tType\tDescription\tDefault Valuestatus\tOptional[str]\tAccepted, Created, Running, Succeeded, Deleted.\tNone Returns List of submarine experiments. Example experiments = client.list_experiments()   delete_experiment(id) -&gt; dict​ Delete the submarine experiment. Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx Returns The detailed info about the deleted submarine experiment. Example client.delete_experiment(&quot;experiment_1626160071451_0008&quot;)   get_log(id, onlyMaster)​ Print training logs of all pods of the experiment. By default, print the logs of all pods. Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx onlyMaster\tOptional[bool]\tBy default include pod log of &quot;master&quot; which might be Tensorflow PS/Chief or PyTorch master.\tx Returns The info of pod logs Example client.get_log(&quot;experiment_1626160071451_0009&quot;)   list_log(status)​ List experiment logs. Param\tType\tDescription\tDefault Valuestatus\tString\tAccepted, Created, Running, Succeeded, Deleted.\tx Returns List of submarine experiment logs. Example logs = client.list_log(&quot;Succeeded&quot;)   wait_for_finish(id, polling_interval)​ Waits until the experiment is finished or failed. Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx polling_interval\tOptional[int]\tHow many seconds between two polls for the status of the experiment.\t10 Returns Submarine experiment logs. 
Example logs = client.wait_for_finish(&quot;experiment_1626160071451_0009&quot;, 5)   "},{"title":"Submarine CLI","type":0,"sectionRef":"#","url":"docs/next/userDocs/submarine-sdk/submarine-cli","content":"","keywords":""},{"title":"Config​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#config","content":"You can set your CLI settings by this command "},{"title":"Init​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#init","content":"submarine config init  Return Submarine CLI Config initialized  Restore CLI config to default (hostname=localhost,port=32080) "},{"title":"Show current config​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#show-current-config","content":"submarine config list  For example : return ╭──────────────────── SubmarineCliConfig ─────────────────────╮ │ { │ │ &quot;connection&quot;: { │ │ &quot;hostname&quot;: &quot;localhost&quot;, │ │ &quot;port&quot;: 32080 │ │ } │ │ } │ ╰─────────────────────────────────────────────────────────────╯  "},{"title":"Set config​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#set-config","content":"submarine config set &lt;parameter_path&gt; &lt;value&gt;  For example, Set connection port to 8080: submarine config set connection.port 8080  "},{"title":"Get config​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#get-config","content":"submarine config get &lt;parameter_path&gt;  For example, submarine config get connection.port  Return connection.port=8080  "},{"title":"Notebooks​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#notebooks","content":""},{"title":"List Notebooks​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#list-notebooks","content":"submarine list notebook  "},{"title":"Get 
Notebooks​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#get-notebooks","content":"submarine get notebook &lt;notebook id&gt;  You can get the notebook id by using the list command "},{"title":"Delete Notebooks​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#delete-notebooks","content":"submarine delete notebook &lt;notebook id&gt;  "},{"title":"Experiments​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#experiments","content":""},{"title":"List Experiments​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#list-experiments","content":"submarine list experiment  "},{"title":"Get Experiment​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#get-experiment","content":"submarine get experiment &lt;experiment id&gt;  You can get the experiment id by using the list command "},{"title":"Delete Experiment​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#delete-experiment","content":"submarine delete experiment &lt;experiment id&gt; [--wait/--no-wait]  --wait/--no-wait: blocking or non-blocking (default: no wait) "},{"title":"Environments​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#environments","content":""},{"title":"List Environments​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#list-environments","content":"submarine list environment  "},{"title":"Get Environments​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#get-environments","content":"submarine get environment &lt;environment name&gt;  "},{"title":"Delete Environments​","type":1,"pageTitle":"Submarine CLI","url":"docs/next/userDocs/submarine-sdk/submarine-cli#delete-environments","content":"submarine delete environment &lt;environment 
name&gt;  "},{"title":"Submarine Client","type":0,"sectionRef":"#","url":"docs/next/userDocs/submarine-sdk/submarine-client","content":"","keywords":""},{"title":"class SubmarineClient()​","type":1,"pageTitle":"Submarine Client","url":"docs/next/userDocs/submarine-sdk/submarine-client#class-submarineclient","content":"Client of submarine to log metric/param, save model and create/delete serve. log_metric(job_id, key, value, worker_index, timestamp, step) -&gt; None​ Log a single key-value metric with job id and worker index. The value must always be a number. Param\tType\tDescription\tDefault Valuejob_id\tString\tThe job name to which the metric should be logged.\tx key\tString\tMetric name.\tx value\tFloat\tMetric value.\tx worker_index\tString\tParameter worker_index.\tx timestamp\tDatetime\tTime when this metric was calculated. Defaults to the current system time.\tdatetime.now() step\tInteger\tA single integer step at which to log the specified Metrics, by default it's 0.\t0  log_param(job_id, key, value, worker_index) -&gt; None​ Log a single key-value parameter with job id and worker index. The key and value are both strings. Param\tType\tDescription\tDefault Valuejob_id\tString\tThe job name to which the parameter should be logged.\tx key\tString\tParameter name.\tx value\tString\tParameter value.\tx worker_index\tString\tParameter worker_index.\tx  save_model(model, model_type, registered_model_name, input_dim, output_dim) -&gt; None​ Save a model into the minio pod. 
Param\tType\tDescription\tDefault Valuemodel\tObject\tModel artifact.\tx model_type\tString\tThe type of model.\tx registered_model_name\tString\tIf it is not None, the model will be registered into the model registry with this name.\tNone input_dim\tList&lt;String&gt;\tThe input dimension of the model.\tNone output_dim\tList&lt;String&gt;\tThe output dimension of the model.\tNone  create_serve(self, model_name, model_version, async_req = True) -&gt; dict​ Create serve of a model through Seldon Core. Param\tType\tDescription\tDefault Valuemodel_name\tString\tName of a registered model.\tx model_version\tInteger\tVersion of a registered model.\tx async_req\tBoolean\tExecute request asynchronously.\tTrue  Returns A dictionary with the inference URL. delete_serve(self, model_name, model_version, async_req) -&gt; None​ Delete a serving model. Param\tType\tDescription\tDefault Valuemodel_name\tString\tName of a registered model.\tx model_version\tInteger\tVersion of a registered model.\tx async_req\tBoolean\tExecute request asynchronously.\tTrue "},{"title":"Tracking","type":0,"sectionRef":"#","url":"docs/next/userDocs/submarine-sdk/tracking","content":"","keywords":""},{"title":"Functional api​","type":1,"pageTitle":"Tracking","url":"docs/next/userDocs/submarine-sdk/tracking#functional-api","content":"submarine.get_tracking_uri() -&gt; str​ Get the tracking URI. If none has been specified, check the environment variables. If the URI is still none, return the default submarine jdbc url. Returns The tracking URI.  submarine.set_tracking_uri(uri) -&gt; None​ Set the tracking URI. You can also set the SUBMARINE_TRACKING_URI environment variable to have Submarine find a URI from there. The URI should be a database connection string. Param\tType\tDescription\tDefault Valueuri\tString\tSubmarine records data to a MySQL server. 
The database URL is expected in the format &lt;dialect&gt;+&lt;driver&gt;://&lt;username&gt;:&lt;password&gt;@&lt;host&gt;:&lt;port&gt;/&lt;database&gt;. By default it's mysql+pymysql://submarine:password@submarine-database:3306/submarine. More detail: SQLAlchemy docs\tx  submarine.log_param(key: str, value: str) -&gt; None​ Log a single key-value parameter. The key and value are both strings. Param\tType\tDescription\tDefault Valuekey\tString\tParameter name.\tx value\tString\tParameter value.\tx  submarine.log_metric(key, value, step=0) -&gt; None​ Log a single key-value metric. The value must always be a number. Param\tType\tDescription\tDefault Valuekey\tString\tMetric name.\tx value\tFloat\tMetric value.\tx step\tInteger\tA single integer step at which to log the specified Metrics.\t0  submarine.save_model(model_type, model, registered_model_name, input_dim, output_dim) -&gt; None​ Save a model into the minio pod. Param\tType\tDescription\tDefault Valuemodel_type\tString\tThe type of model. Only pytorch and tensorflow are supported.\tx model\tObject\tModel artifact.\tx registered_model_name\tString\tIf it is not None, the model will be registered into the model registry with this name.\tNone input_dim\tList&lt;Integer&gt;\tThe input dimension of the model.\tNone output_dim\tList&lt;Integer&gt;\tThe output dimension of the model.\tNone  "},{"title":"Environment REST API","type":0,"sectionRef":"#","url":"docs/api/environment","content":"","keywords":""},{"title":"Create Environment​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#create-environment","content":"POST /api/v1/environment "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#parameters","content":"Put EnvironmentSpec in request body. 
EnvironmentSpec​ Field Name\tType\tDescription\tRequiredname\tString\tEnvironment name.\to dockerImage\tString\tDocker image name.\to kernelSpec\tKernelSpec\tEnvironment spec.\to description\tString\tDescription of environment.\tx KernelSpec​ Field Name\tType\tDescription\tRequiredname\tString\tKernel name.\to channels\tList&lt;String&gt;\tNames of the channels.\to condaDependencies\tList&lt;String&gt;\tList of kernel conda dependencies.\to pipDependencies\tList&lt;String&gt;\tList of kernel pip dependencies.\to "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, &quot;anaconda-client=1.7.2=py37_0&quot;, &quot;anaconda-navigator=1.9.12=py37_0&quot;], &quot;pipDependencies&quot; : [&quot;apache-submarine==0.7.0&quot;, &quot;pyarrow==0.17.0&quot;] } } ' http://127.0.0.1:32080/api/v1/environment  Example Response { &quot;status&quot;: &quot;OK&quot;, &quot;code&quot;: 200, &quot;result&quot;: { &quot;environmentId&quot;: &quot;environment_1586156073228_0001&quot;, &quot;environmentSpec&quot;: { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;, &quot;anaconda=2020.02=py37_0&quot;, &quot;anaconda-client=1.7.2=py37_0&quot;, 
&quot;anaconda-navigator=1.9.12=py37_0&quot;], &quot;pipDependencies&quot; : [&quot;apache-submarine==0.7.0&quot;, &quot;pyarrow==0.17.0&quot;] } } } }  "},{"title":"List environment​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#list-environment","content":"GET /api/v1/environment "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/environment  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[ { &quot;environmentId&quot;:&quot;environment_1600862964725_0002&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;notebook-gpu-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-gpu-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null } }, { &quot;environmentId&quot;:&quot;environment_1647192232698_0003&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[ &quot;apache-submarine\\u003d\\u003d0.7.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null } }, { 
&quot;environmentId&quot;:&quot;environment_1600862964725_0001&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null } } ], &quot;attributes&quot;:{} }  "},{"title":"Get environment​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#get-environment","content":"GET /api/v1/environment/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tEnvironment name.\to "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1647192232698_0003&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[ &quot;apache-submarine\\u003d\\u003d0.7.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot; ] }, 
&quot;description&quot;:null, &quot;image&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Patch environment​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#patch-environment","content":"PATCH /api/v1/environment/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath and body\tEnvironment name.\to dockerImage\tString\tbody\tDocker image name.\to kernelSpec\tKernelSpec\tbody\tEnvironment spec.\to description\tString\tbody\tDescription of environment. This field is optional.\tx "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#example-3","content":"Example Request curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-submarine-env&quot;, &quot;dockerImage&quot; : &quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot; : { &quot;name&quot; : &quot;team_default_python_3.7_updated&quot;, &quot;channels&quot; : [&quot;defaults&quot;], &quot;condaDependencies&quot; : [&quot;_ipyw_jlab_nb_ext_conf=0.1.0=py37_0&quot;, &quot;alabaster=0.7.12=py37_0&quot;], &quot;pipDependencies&quot; : [] } } ' http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1647192232698_0004&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7_updated&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[] }, 
&quot;description&quot;:null, &quot;image&quot;:null } }, &quot;attributes&quot;:{} }  note dockerImage, &quot;name&quot; (of kernelSpec), &quot;channels&quot;, &quot;condaDependencies&quot;, &quot;pipDependencies&quot;, etc. can be updated using this API. Updating the &quot;name&quot; of environmentSpec is not supported. "},{"title":"Delete environment​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#delete-environment","content":"DELETE /api/v1/environment/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tEnvironment name.\to "},{"title":"Example​","type":1,"pageTitle":"Environment REST API","url":"docs/api/environment#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/environment/my-submarine-env  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;environmentId&quot;:&quot;environment_1647192232698_0003&quot;, &quot;environmentSpec&quot;:{ &quot;name&quot;:&quot;my-submarine-env&quot;, &quot;dockerImage&quot;:&quot;continuumio/anaconda3&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;team_default_python_3.7&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[ &quot;_ipyw_jlab_nb_ext_conf\\u003d0.1.0\\u003dpy37_0&quot;, &quot;alabaster\\u003d0.7.12\\u003dpy37_0&quot;, &quot;anaconda\\u003d2020.02\\u003dpy37_0&quot;, &quot;anaconda-client\\u003d1.7.2\\u003dpy37_0&quot;, &quot;anaconda-navigator\\u003d1.9.12\\u003dpy37_0&quot; ], &quot;pipDependencies&quot;:[ &quot;apache-submarine\\u003d\\u003d0.7.0&quot;, &quot;pyarrow\\u003d\\u003d0.17.0&quot; ] }, &quot;description&quot;:null, &quot;image&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Experiment REST 
API","type":0,"sectionRef":"#","url":"docs/api/experiment","content":"","keywords":""},{"title":"Create Experiment (Using Anonymous/Embedded Environment)​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#create-experiment-using-anonymousembedded-environment","content":"POST /api/v1/experiment "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#parameters","content":"Put the ExperimentSpec in the request body. ExperimentSpec​ Field Name\tType\tDescription\tRequiredmeta\tExperimentMeta\tMetadata of the experiment template.\to environment\tEnvironmentSpec\tEnvironment of the experiment template.\to spec\tMap&lt;String, ExperimentTaskSpec&gt;\tSpec of pods.\to code\tCodeSpec\tExperiment codespec.\tx ExperimentMeta​ Field Name\tType\tDescription\tRequiredname\tString\tExperiment name.\to namespace\tString\tExperiment namespace.\to framework\tString\tExperiment framework.\to cmd\tString\tCommand.\to envVars\tMap&lt;String, String&gt;\tEnvironment variables.\tx EnvironmentSpec​ There are two types of environment: Anonymous and Predefined. Anonymous environment: only specify dockerImage in environment spec. The container will be built on the Docker image. Embedded environment: specify name in environment spec. The container will be built on the existing environment (including dockerImage and kernelSpec). See more details in the environment API. ExperimentTaskSpec​ Field Name\tType\tDescription\tRequiredreplicas\tInteger\tNumber of replicas.\to resources\tString\tResources of the task.\to name\tString\tTask name.\to image\tString\tImage name.\to cmd\tString\tCommand.\tx envVars\tMap&lt;String, String&gt;\tEnvironment variables.\tx CodeSpec​ Currently, only pulling from GitHub is supported. 
HDFS, NFS, and S3 support is in development. Field Name\tType\tDescription\tRequiredsyncMode\tString (git|hdfs|nfs|s3)\tSync mode of the code spec.\to url\tString\tURL of the code spec.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647192232698-0001&quot;, &quot;uid&quot;:&quot;b0ae271b-a01a-43ad-9877-4b8ecbc45de4&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2022-03-14T16:03:10.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647192232698-0001&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ 
&quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"List experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#list-experiment","content":"GET /api/v1/experiment "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/experiment  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[ { &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;uid&quot;:&quot;cf465781-6310-46d2-92b4-d20161c77d08&quot;, &quot;status&quot;:&quot;Running&quot;, &quot;acceptedTime&quot;:&quot;2022-03-18T15:51:04.000+08:00&quot;, &quot;createdTime&quot;:&quot;2022-03-18T15:51:05.000+08:00&quot;, &quot;runningTime&quot;:&quot;2022-03-18T15:51:17.000+08:00&quot;, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, 
&quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ &quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } ], &quot;attributes&quot;:{} }  "},{"title":"Get experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#get-experiment","content":"GET /api/v1/experiment/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tExperiment id.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/experiment/experiment-1647574374688-0002  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ 
&quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;uid&quot;:&quot;cf465781-6310-46d2-92b4-d20161c77d08&quot;, &quot;status&quot;:&quot;Running&quot;, &quot;acceptedTime&quot;:&quot;2022-03-18T15:51:04.000+08:00&quot;, &quot;createdTime&quot;:&quot;2022-03-18T15:51:05.000+08:00&quot;, &quot;runningTime&quot;:&quot;2022-03-18T15:51:17.000+08:00&quot;, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ &quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Patch experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#patch-experiment","content":"PATCH /api/v1/experiment/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment 
REST API","url":"docs/api/experiment#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tExperiment id.\to meta\tExperimentMeta\tbody\tMeta data of the experiment template.\to environment\tEnvironmentSpec\tbody\tEnvironment of the experiment template.\to spec\tMap&lt;String, ExperimentTaskSpec&gt;\tbody\tSpec of pods.\to code\tCodeSpec\tbody\tTODO\tx "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#example-3","content":"Example Request curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 2, &quot;resources&quot;: &quot;cpu=1,memory=2048M&quot; } } } ' http://127.0.0.1:32080/api/v1/experiment/experiment-1647574374688-0002  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;uid&quot;:&quot;b0ae271b-a01a-43ad-9877-4b8ecbc45de4&quot;, &quot;status&quot;:&quot;Succeeded&quot;, &quot;acceptedTime&quot;:&quot;2022-04-04T16:39:25.000+08:00&quot;, &quot;createdTime&quot;:&quot;2022-04-04T16:39:26.000+08:00&quot;, &quot;runningTime&quot;:&quot;2022-04-04T16:39:35.000+08:00&quot;, &quot;finishedTime&quot;:&quot;2022-04-04T16:42:25.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ 
&quot;experimentId&quot;:&quot;experiment-1649061491590-0002&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ &quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:2, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"Delete experiment​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#delete-experiment","content":"DELETE /api/v1/experiment/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tExperiment id.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/experiment/experiment-1647574374688-0002  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, 
&quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;uid&quot;:&quot;b0ae271b-a01a-43ad-9877-4b8ecbc45de4&quot;, &quot;status&quot;:&quot;Deleted&quot;, &quot;acceptedTime&quot;:null, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;name&quot;:&quot;tf-mnist-json&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, &quot;envVars&quot;:{ &quot;ENV_1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:2, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d2048M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;2048M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"List experiment Log​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#list-experiment-log","content":"GET /api/v1/experiment/logs "},{"title":"Example​","type":1,"pageTitle":"Experiment REST 
API","url":"docs/api/experiment#example-5","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[ { &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;logContent&quot;:[ { &quot;podName&quot;:&quot;experiment-1647574374688-0002-ps-0&quot;, &quot;podLog&quot;:[] }, { &quot;podName&quot;:&quot;experiment-1647574374688-0002-worker-0&quot;, &quot;podLog&quot;:[ ] } ] } ], &quot;attributes&quot;:{} }  "},{"title":"Get experiment Log​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#get-experiment-log","content":"GET /api/v1/experiment/logs/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#parameters-4","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tExperiment id.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment REST API","url":"docs/api/experiment#example-6","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs/experiment-1647574374688-0002  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment-1647574374688-0002&quot;, &quot;logContent&quot;:[ { &quot;podName&quot;:&quot;experiment-1647574374688-0002-ps-0&quot;, &quot;podLog&quot;:[ &quot;WARNING:tensorflow:From /var/tf_mnist/mnist_with_summaries.py:39: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use alternatives such as official/mnist/dataset.py from tensorflow/models.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: 
maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please write your own downloading logic.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use urllib or similar directly.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use tf.data to implement this functionality.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use tf.data to implement this functionality.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: __init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use alternatives such as official/mnist/dataset.py from tensorflow/models.&quot;, &quot;2022-03-18 07:52:07.369276: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA&quot;, &quot;Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.&quot;, 
&quot;Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz&quot;, &quot;Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz&quot;, &quot;Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz&quot;, &quot;Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz&quot;, &quot;Accuracy at step 0: 0.0893&quot;, &quot;Accuracy at step 10: 0.6851&quot;, &quot;Accuracy at step 20: 0.8255&quot;, &quot;Accuracy at step 30: 0.8969&quot;, &quot;Accuracy at step 40: 0.9009&quot;, &quot;Accuracy at step 50: 0.9185&quot;, &quot;Accuracy at step 60: 0.923&quot;, &quot;Accuracy at step 70: 0.9181&quot;, &quot;Accuracy at step 80: 0.9344&quot;, &quot;Accuracy at step 90: 0.9265&quot;, &quot;Adding run metadata for 99&quot;, &quot;Accuracy at step 100: 0.9375&quot;, &quot;Accuracy at step 110: 0.9414&quot;, &quot;Accuracy at step 120: 0.9402&quot;, &quot;Accuracy at step 130: 0.9466&quot;, &quot;Accuracy at step 140: 0.9412&quot;, &quot;Accuracy at step 150: 0.9497&quot;, &quot;Accuracy at step 160: 0.9477&quot;, &quot;Accuracy at step 170: 0.9465&quot;, &quot;Accuracy at step 180: 0.9546&quot;, &quot;Accuracy at step 190: 0.9485&quot;, &quot;Adding run metadata for 199&quot;, &quot;Accuracy at step 200: 0.9534&quot;, &quot;Accuracy at step 210: 0.9581&quot;, &quot;Accuracy at step 220: 0.9418&quot;, &quot;Accuracy at step 230: 0.9551&quot;, &quot;Accuracy at step 240: 0.9472&quot;, &quot;Accuracy at step 250: 0.9555&quot;, &quot;Accuracy at step 260: 0.9569&quot;, &quot;Accuracy at step 270: 0.9596&quot;, &quot;Accuracy at step 280: 0.9588&quot;, &quot;Accuracy at step 290: 0.9618&quot;, &quot;Adding run metadata for 299&quot;, &quot;Accuracy at step 300: 0.9589&quot;, &quot;Accuracy at step 
310: 0.9603&quot;, &quot;Accuracy at step 320: 0.9632&quot;, &quot;Accuracy at step 330: 0.956&quot;, &quot;Accuracy at step 340: 0.9531&quot;, &quot;Accuracy at step 350: 0.9535&quot;, &quot;Accuracy at step 360: 0.9517&quot;, &quot;Accuracy at step 370: 0.9607&quot;, &quot;Accuracy at step 380: 0.9629&quot;, &quot;Accuracy at step 390: 0.9553&quot;, &quot;Adding run metadata for 399&quot;, &quot;Accuracy at step 400: 0.9623&quot;, &quot;Accuracy at step 410: 0.9627&quot;, &quot;Accuracy at step 420: 0.9614&quot;, &quot;Accuracy at step 430: 0.9604&quot;, &quot;Accuracy at step 440: 0.9663&quot;, &quot;Accuracy at step 450: 0.9665&quot;, &quot;Accuracy at step 460: 0.958&quot;, &quot;Accuracy at step 470: 0.9643&quot;, &quot;Accuracy at step 480: 0.9636&quot;, &quot;Accuracy at step 490: 0.9648&quot;, &quot;Adding run metadata for 499&quot;, &quot;Accuracy at step 500: 0.9638&quot;, &quot;Accuracy at step 510: 0.9629&quot;, &quot;Accuracy at step 520: 0.9661&quot;, &quot;Accuracy at step 530: 0.9633&quot;, &quot;Accuracy at step 540: 0.9669&quot;, &quot;Accuracy at step 550: 0.9659&quot;, &quot;Accuracy at step 560: 0.9652&quot;, &quot;Accuracy at step 570: 0.9675&quot;, &quot;Accuracy at step 580: 0.9602&quot;, &quot;Accuracy at step 590: 0.9641&quot;, &quot;Adding run metadata for 599&quot;, &quot;Accuracy at step 600: 0.9688&quot;, &quot;Accuracy at step 610: 0.9638&quot;, &quot;Accuracy at step 620: 0.9622&quot;, &quot;Accuracy at step 630: 0.9601&quot;, &quot;Accuracy at step 640: 0.9636&quot;, &quot;Accuracy at step 650: 0.9674&quot;, &quot;Accuracy at step 660: 0.9613&quot;, &quot;Accuracy at step 670: 0.9706&quot;, &quot;Accuracy at step 680: 0.9691&quot;, &quot;Accuracy at step 690: 0.9687&quot;, &quot;Adding run metadata for 699&quot;, &quot;Accuracy at step 700: 0.9671&quot;, &quot;Accuracy at step 710: 0.9659&quot;, &quot;Accuracy at step 720: 0.9693&quot;, &quot;Accuracy at step 730: 0.9698&quot;, &quot;Accuracy at step 740: 0.9681&quot;, 
&quot;Accuracy at step 750: 0.9678&quot;, &quot;Accuracy at step 760: 0.9595&quot;, &quot;Accuracy at step 770: 0.9697&quot;, &quot;Accuracy at step 780: 0.9671&quot;, &quot;Accuracy at step 790: 0.9658&quot;, &quot;Adding run metadata for 799&quot;, &quot;Accuracy at step 800: 0.9658&quot;, &quot;Accuracy at step 810: 0.9702&quot;, &quot;Accuracy at step 820: 0.9662&quot;, &quot;Accuracy at step 830: 0.9671&quot;, &quot;Accuracy at step 840: 0.9731&quot;, &quot;Accuracy at step 850: 0.9699&quot;, &quot;Accuracy at step 860: 0.9702&quot;, &quot;Accuracy at step 870: 0.9686&quot;, &quot;Accuracy at step 880: 0.9729&quot;, &quot;Accuracy at step 890: 0.968&quot;, &quot;Adding run metadata for 899&quot;, &quot;Accuracy at step 900: 0.9655&quot;, &quot;Accuracy at step 910: 0.9731&quot;, &quot;Accuracy at step 920: 0.9676&quot;, &quot;Accuracy at step 930: 0.9667&quot;, &quot;Accuracy at step 940: 0.9659&quot;, &quot;Accuracy at step 950: 0.9689&quot;, &quot;Accuracy at step 960: 0.9653&quot;, &quot;Accuracy at step 970: 0.9675&quot;, &quot;Accuracy at step 980: 0.974&quot;, &quot;Accuracy at step 990: 0.9723&quot;, &quot;Adding run metadata for 999&quot; ] }, { &quot;podName&quot;:&quot;experiment-1647574374688-0002-worker-0&quot;, &quot;podLog&quot;:[ &quot;WARNING:tensorflow:From /var/tf_mnist/mnist_with_summaries.py:39: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use alternatives such as official/mnist/dataset.py from tensorflow/models.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please write your own downloading logic.&quot;, 
&quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use urllib or similar directly.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use tf.data to implement this functionality.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use tf.data to implement this functionality.&quot;, &quot;WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: __init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.&quot;, &quot;Instructions for updating:&quot;, &quot;Please use alternatives such as official/mnist/dataset.py from tensorflow/models.&quot;, &quot;2022-03-18 07:52:07.369085: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA&quot;, &quot;Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz&quot;, &quot;Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.&quot;, &quot;Extracting 
/tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz&quot;, &quot;Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz&quot;, &quot;Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.&quot;, &quot;Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz&quot;, &quot;Accuracy at step 0: 0.1348&quot;, &quot;Accuracy at step 10: 0.7419&quot;, &quot;Accuracy at step 20: 0.8574&quot;, &quot;Accuracy at step 30: 0.8959&quot;, &quot;Accuracy at step 40: 0.9135&quot;, &quot;Accuracy at step 50: 0.9187&quot;, &quot;Accuracy at step 60: 0.9276&quot;, &quot;Accuracy at step 70: 0.9332&quot;, &quot;Accuracy at step 80: 0.9399&quot;, &quot;Accuracy at step 90: 0.9376&quot;, &quot;Adding run metadata for 99&quot;, &quot;Accuracy at step 100: 0.9378&quot;, &quot;Accuracy at step 110: 0.9463&quot;, &quot;Accuracy at step 120: 0.9479&quot;, &quot;Accuracy at step 130: 0.9468&quot;, &quot;Accuracy at step 140: 0.9467&quot;, &quot;Accuracy at step 150: 0.9475&quot;, &quot;Accuracy at step 160: 0.947&quot;, &quot;Accuracy at step 170: 0.948&quot;, &quot;Accuracy at step 180: 0.9472&quot;, &quot;Accuracy at step 190: 0.954&quot;, &quot;Adding run metadata for 199&quot;, &quot;Accuracy at step 200: 0.9492&quot;, &quot;Accuracy at step 210: 0.9571&quot;, &quot;Accuracy at step 220: 0.954&quot;, &quot;Accuracy at step 230: 0.9557&quot;, &quot;Accuracy at step 240: 0.9557&quot;, &quot;Accuracy at step 250: 0.9591&quot;, &quot;Accuracy at step 260: 0.955&quot;, &quot;Accuracy at step 270: 0.9595&quot;, &quot;Accuracy at step 280: 0.9596&quot;, &quot;Accuracy at step 290: 0.9604&quot;, &quot;Adding run metadata for 299&quot;, &quot;Accuracy at step 300: 0.9622&quot;, &quot;Accuracy at step 310: 0.9529&quot;, &quot;Accuracy at step 320: 0.9609&quot;, &quot;Accuracy at step 330: 0.9613&quot;, &quot;Accuracy at step 340: 0.9571&quot;, &quot;Accuracy at step 350: 
0.9599&quot;, &quot;Accuracy at step 360: 0.9553&quot;, &quot;Accuracy at step 370: 0.9546&quot;, &quot;Accuracy at step 380: 0.962&quot;, &quot;Accuracy at step 390: 0.96&quot;, &quot;Adding run metadata for 399&quot;, &quot;Accuracy at step 400: 0.9593&quot;, &quot;Accuracy at step 410: 0.9641&quot;, &quot;Accuracy at step 420: 0.9628&quot;, &quot;Accuracy at step 430: 0.9622&quot;, &quot;Accuracy at step 440: 0.9639&quot;, &quot;Accuracy at step 450: 0.9592&quot;, &quot;Accuracy at step 460: 0.9651&quot;, &quot;Accuracy at step 470: 0.9658&quot;, &quot;Accuracy at step 480: 0.9668&quot;, &quot;Accuracy at step 490: 0.9641&quot;, &quot;Adding run metadata for 499&quot;, &quot;Accuracy at step 500: 0.9641&quot;, &quot;Accuracy at step 510: 0.9561&quot;, &quot;Accuracy at step 520: 0.9628&quot;, &quot;Accuracy at step 530: 0.964&quot;, &quot;Accuracy at step 540: 0.9663&quot;, &quot;Accuracy at step 550: 0.9681&quot;, &quot;Accuracy at step 560: 0.968&quot;, &quot;Accuracy at step 570: 0.967&quot;, &quot;Accuracy at step 580: 0.9663&quot;, &quot;Accuracy at step 590: 0.9679&quot;, &quot;Adding run metadata for 599&quot;, &quot;Accuracy at step 600: 0.9666&quot;, &quot;Accuracy at step 610: 0.9648&quot;, &quot;Accuracy at step 620: 0.9682&quot;, &quot;Accuracy at step 630: 0.9691&quot;, &quot;Accuracy at step 640: 0.9683&quot;, &quot;Accuracy at step 650: 0.966&quot;, &quot;Accuracy at step 660: 0.9668&quot;, &quot;Accuracy at step 670: 0.9658&quot;, &quot;Accuracy at step 680: 0.9709&quot;, &quot;Accuracy at step 690: 0.9632&quot;, &quot;Adding run metadata for 699&quot;, &quot;Accuracy at step 700: 0.9697&quot;, &quot;Accuracy at step 710: 0.9632&quot;, &quot;Accuracy at step 720: 0.9641&quot;, &quot;Accuracy at step 730: 0.9659&quot;, &quot;Accuracy at step 740: 0.9654&quot;, &quot;Accuracy at step 750: 0.9694&quot;, &quot;Accuracy at step 760: 0.968&quot;, &quot;Accuracy at step 770: 0.9661&quot;, &quot;Accuracy at step 780: 0.969&quot;, &quot;Accuracy at step 
790: 0.9663&quot;, &quot;Adding run metadata for 799&quot;, &quot;Accuracy at step 800: 0.9687&quot;, &quot;Accuracy at step 810: 0.9651&quot;, &quot;Accuracy at step 820: 0.9705&quot;, &quot;Accuracy at step 830: 0.9645&quot;, &quot;Accuracy at step 840: 0.9652&quot;, &quot;Accuracy at step 850: 0.9719&quot;, &quot;Accuracy at step 860: 0.9654&quot;, &quot;Accuracy at step 870: 0.964&quot;, &quot;Accuracy at step 880: 0.9645&quot;, &quot;Accuracy at step 890: 0.9615&quot;, &quot;Adding run metadata for 899&quot;, &quot;Accuracy at step 900: 0.9661&quot;, &quot;Accuracy at step 910: 0.9649&quot;, &quot;Accuracy at step 920: 0.9569&quot;, &quot;Accuracy at step 930: 0.9654&quot;, &quot;Accuracy at step 940: 0.9674&quot;, &quot;Accuracy at step 950: 0.971&quot;, &quot;Accuracy at step 960: 0.9684&quot;, &quot;Accuracy at step 970: 0.9648&quot;, &quot;Accuracy at step 980: 0.9693&quot;, &quot;Accuracy at step 990: 0.9627&quot;, &quot;Adding run metadata for 999&quot; ] } ] }, &quot;attributes&quot;:{} }  "},{"title":"Model Version REST API","type":0,"sectionRef":"#","url":"docs/api/model-version","content":"","keywords":""},{"title":"Create a model version​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#create-a-model-version","content":"POST /api/v1/model-version?baseDir={baseDir} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#parameters","content":"Field Name\tType\tIn\tDescription\tRequiredbaseDir\tString\tquery\tExperiment directory path.\to name\tString\tbody\tRegistered model name.\to experimentId\tString\tbody\tExperiment ID.\to description\tString\tbody\tDescription of the model version.\tx tags\tList&lt;String&gt;\tbody\tTags for the model version.\tx "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#example","content":""},{"title":"Experiment Template REST 
API","type":0,"sectionRef":"#","url":"docs/api/experiment-template","content":"","keywords":""},{"title":"Create experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#create-experiment-template","content":"POST /api/v1/template "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#parameters","content":"Field Name\tType\tIn\tDescriptionname\tString\tbody\tExperiment template name. This is required. author\tString\tbody\tAuthor name. description\tString\tbody\tDescription of the experiment template. parameters\tList&lt;ExperimentTemplateParamSpec&gt;\tbody\tParameters of the experiment template. experimentSpec\tExperimentSpec\tbody\tSpec of the experiment template. ExperimentTemplateParamSpec​ Field Name\tType\tDescriptionname\tString\tParameter name. required\tBoolean\ttrue / false. Whether the parameter is required. description\tString\tDescription of the parameter. value\tString\tValue of the parameter. ExperimentSpec​ Field Name\tType\tDescriptionmeta\tExperimentMeta\tMetadata of the experiment template. environment\tEnvironmentSpec\tEnvironment of the experiment template. spec\tMap&lt;String, ExperimentTaskSpec&gt;\tSpec of pods. code\tCodeSpec\tCode spec of the experiment. ExperimentMeta​ Field Name\tType\tDescriptionname\tString\tExperiment name. namespace\tString\tExperiment namespace. framework\tString\tExperiment framework. cmd\tString\tCommand. envVars\tMap&lt;String, String&gt;\tEnvironment variables. EnvironmentSpec​ See the Environment API for more details. ExperimentTaskSpec​ Field Name\tType\tDescriptionreplicas\tInteger\tNumber of replicas. resources\tString\tResources of the task. name\tString\tTask name. image\tString\tImage name. cmd\tString\tCommand. envVars\tMap&lt;String, String&gt;\tEnvironment variables. CodeSpec​ Field Name\tType\tDescriptionsyncMode\tString\tSync mode of the code spec. url\tString\tURL of the code spec. 
"},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-tf-mnist-template&quot;, &quot;author&quot;: &quot;author&quot;, &quot;description&quot;: &quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;: &quot;learning_rate&quot;, &quot;value&quot;: 0.1, &quot;required&quot;: true, &quot;description&quot;: &quot;This is learning_rate of training.&quot; }, { &quot;name&quot;: &quot;batch_size&quot;, &quot;value&quot;: 150, &quot;required&quot;: true, &quot;description&quot;: &quot;This is batch_size of training.&quot; }, { &quot;name&quot;: &quot;experiment_name&quot;, &quot;value&quot;: &quot;tf-mnist1&quot;, &quot;required&quot;: true, &quot;description&quot;: &quot;the name of experiment.&quot; } ], &quot;experimentSpec&quot;: { &quot;meta&quot;: { &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate={{learning_rate}} --batch_size={{batch_size}}&quot;, &quot;name&quot;: &quot;{{experiment_name}}&quot;, &quot;envVars&quot;: { &quot;ENV1&quot;: &quot;ENV1&quot; }, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;namespace&quot;: &quot;default&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; } } } ' http://127.0.0.1:32080/api/v1/template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:1, &quot;serverTimestamp&quot;:1626160071451 }, 
&quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, 
&quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"List experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#list-experiment-template","content":"GET /api/v1/template "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:[{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:1, &quot;serverTimestamp&quot;:1626160071451 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author&quot;, &quot;description&quot;:&quot;This is a template to 
run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} 
--batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } }], &quot;attributes&quot;:{} }  "},{"title":"Get experiment template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#get-experiment-template","content":"GET /api/v1/template/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tExperiment template name.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:1, &quot;serverTimestamp&quot;:1650788898882 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, 
&quot;author&quot;:&quot;author&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;:[ { &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; } ], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;experimentId&quot;:null, &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python 
/var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{ &quot;ENV1&quot;:&quot;ENV1&quot; }, &quot;tags&quot;:[] }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{ &quot;memory&quot;:&quot;1024M&quot;, &quot;cpu&quot;:&quot;1&quot; } } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"Patch template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#patch-template","content":"PATCH /api/v1/template/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath and body\tExperiment template name.\to author\tString\tbody\tAuthor name.\to description\tString\tbody\tDescription of the experiment template.\tx parameters\tList&lt;ExperimentTemplateParamSpec&gt;\tbody\tParameters of the experiment template.\to experimentSpec\tExperimentSpec\tbody\tSpec of the experiment template.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#example-3","content":"Example Request curl -X PATCH -H 
&quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;my-tf-mnist-template&quot;, &quot;author&quot;: &quot;author-new&quot;, &quot;description&quot;: &quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;: &quot;learning_rate&quot;, &quot;value&quot;: 0.1, &quot;required&quot;: true, &quot;description&quot;: &quot;This is learning_rate of training.&quot; }, { &quot;name&quot;: &quot;batch_size&quot;, &quot;value&quot;: 150, &quot;required&quot;: true, &quot;description&quot;: &quot;This is batch_size of training.&quot; }, { &quot;name&quot;: &quot;experiment_name&quot;, &quot;value&quot;: &quot;tf-mnist1&quot;, &quot;required&quot;: true, &quot;description&quot;: &quot;the name of experiment.&quot; } ], &quot;experimentSpec&quot;: { &quot;meta&quot;: { &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate={{learning_rate}} --batch_size={{batch_size}}&quot;, &quot;name&quot;: &quot;{{experiment_name}}&quot;, &quot;envVars&quot;: { &quot;ENV1&quot;: &quot;ENV1&quot; }, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;namespace&quot;: &quot;default&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; } } } ' http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:2, &quot;serverTimestamp&quot;:1626160071451 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author-new&quot;, &quot;description&quot;:&quot;This is a 
template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} 
--batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  note &quot;description&quot;, &quot;parameters&quot;, &quot;experimentSpec&quot;, &quot;author&quot;, etc. can be updated using this API. Updating the &quot;name&quot; of an experiment template is not supported. 
"},{"title":"Delete template​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#delete-template","content":"DELETE /api/v1/template/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tExperiment template name.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/template/my-tf-mnist-template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentTemplateId&quot;:{ &quot;id&quot;:2, &quot;serverTimestamp&quot;:1626160071451 }, &quot;experimentTemplateSpec&quot;:{ &quot;name&quot;:&quot;my-tf-mnist-template&quot;, &quot;author&quot;:&quot;author-new&quot;, &quot;description&quot;:&quot;This is a template to run tf-mnist&quot;, &quot;parameters&quot;: [{ &quot;name&quot;:&quot;learning_rate&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is learning_rate of training.&quot;, &quot;value&quot;:&quot;0.1&quot; }, { &quot;name&quot;:&quot;batch_size&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;This is batch_size of training.&quot;, &quot;value&quot;:&quot;150&quot; }, { &quot;name&quot;:&quot;experiment_name&quot;, &quot;required&quot;:&quot;true&quot;, &quot;description&quot;:&quot;the name of experiment.&quot;, &quot;value&quot;:&quot;tf-mnist1&quot; }, { &quot;name&quot;:&quot;spec.Ps.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, 
&quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Ps.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }, { &quot;name&quot;:&quot;spec.Worker.replicas&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.cpu&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1&quot; }, { &quot;name&quot;:&quot;spec.Worker.resourceMap.memory&quot;, &quot;required&quot;:&quot;false&quot;, &quot;description&quot;:&quot;&quot;, &quot;value&quot;:&quot;1024M&quot; }], &quot;experimentSpec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;{{experiment_name}}&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d{{learning_rate}} --batch_size\\u003d{{batch_size}}&quot;, &quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, 
&quot;code&quot;:null } } }, &quot;attributes&quot;:{} }  "},{"title":"Use template to create an experiment​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#use-template-to-create-a-experiment","content":"POST /api/v1/experiment/{template_name} "},{"title":"Parameters​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#parameters-4","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath and body\tExperiment template name.\to params\tMap&lt;String, String&gt;\tbody\tParameters of the experiment including experiment_name.\to "},{"title":"Example​","type":1,"pageTitle":"Experiment Template REST API","url":"docs/api/experiment-template#example-5","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;tf-mnist&quot;, &quot;params&quot;: { &quot;learning_rate&quot;:&quot;0.01&quot;, &quot;batch_size&quot;:&quot;150&quot;, &quot;experiment_name&quot;:&quot;newexperiment1&quot; } } ' http://127.0.0.1:32080/api/v1/experiment/my-tf-mnist-template  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:null, &quot;result&quot;:{ &quot;experimentId&quot;:&quot;experiment_1626160071451_0001&quot;, &quot;name&quot;:&quot;newexperiment1&quot;, &quot;uid&quot;:&quot;b895985c-411c-4e89-90e0-c60a2a8a4235&quot;, &quot;status&quot;:&quot;Accepted&quot;, &quot;acceptedTime&quot;:&quot;2021-07-13T16:21:31.000+08:00&quot;, &quot;createdTime&quot;:null, &quot;runningTime&quot;:null, &quot;finishedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;newexperiment1&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;framework&quot;:&quot;TensorFlow&quot;, &quot;cmd&quot;:&quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir\\u003d/train/log --learning_rate\\u003d0.01 --batch_size\\u003d150&quot;, 
&quot;envVars&quot;:{&quot;ENV1&quot;:&quot;ENV1&quot;} }, &quot;environment&quot;:{ &quot;name&quot;:null, &quot;dockerImage&quot;:null, &quot;kernelSpec&quot;:null, &quot;description&quot;:null, &quot;image&quot;:&quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;:{ &quot;Ps&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} }, &quot;Worker&quot;:{ &quot;replicas&quot;:1, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1024M&quot;, &quot;name&quot;:null, &quot;image&quot;:null, &quot;cmd&quot;:null, &quot;envVars&quot;:null, &quot;resourceMap&quot;:{&quot;memory&quot;:&quot;1024M&quot;,&quot;cpu&quot;:&quot;1&quot;} } }, &quot;code&quot;:null } }, &quot;attributes&quot;:{} }  "},{"title":"List model versions under a registered model​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#list-model-versions-under-a-registered-model","content":"GET /api/v1/model-version/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tregistered model name.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/model-version/register  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;List all model version instances&quot;, &quot;result&quot; : [ { &quot;creationTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;currentStage&quot; : &quot;None&quot;, &quot;dataset&quot; : null, &quot;description&quot; : null, &quot;experimentId&quot; : &quot;experiment-1639276018590-0001&quot;, 
&quot;lastUpdatedTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;modelType&quot; : &quot;tensorflow&quot;, &quot;name&quot; : &quot;register&quot;, &quot;source&quot; : &quot;s3://submarine/experiment-1639276018590-0001/example/1&quot;, &quot;tags&quot; : [], &quot;userId&quot; : &quot;&quot;, &quot;version&quot; : 1 }, { &quot;creationTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;currentStage&quot; : &quot;None&quot;, &quot;dataset&quot; : null, &quot;description&quot; : null, &quot;experimentId&quot; : &quot;experiment-1639276018590-0001&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;modelType&quot; : &quot;tensorflow&quot;, &quot;name&quot; : &quot;register&quot;, &quot;source&quot; : &quot;s3://submarine/experiment-1639276018590-0001/example/2&quot;, &quot;tags&quot; : [], &quot;userId&quot; : &quot;&quot;, &quot;version&quot; : 2 } ], &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Get a model version​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#get-a-model-version","content":"GET /api/v1/model-version/{name}/{version} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tRegistered model name.\to version\tString\tpath\tRegistered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/model-version/register/1  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Get the model version instance&quot;, &quot;result&quot; : { &quot;creationTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;currentStage&quot; : &quot;None&quot;, &quot;dataset&quot; : null, &quot;description&quot; : null, &quot;experimentId&quot; : 
&quot;experiment-1639276018590-0001&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;modelType&quot; : &quot;tensorflow&quot;, &quot;name&quot; : &quot;register&quot;, &quot;source&quot; : &quot;s3://submarine/experiment-1639276018590-0001/example/1&quot;, &quot;tags&quot; : [], &quot;userId&quot; : &quot;&quot;, &quot;version&quot; : 1 }, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Patch a model version​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#patch-a-model-version","content":"PATCH /api/v1/model-version "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tbody\tRegistered model name.\to version\tString\tbody\tRegistered model version.\to description\tString\tbody\tNew description.\tx currentStage\tString\tbody\tStage of the model.\tx dataset\tString\tbody\tDataset used in the model.\tx "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#example-3","content":"Example Request curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;register&quot;, &quot;version&quot;: 1, &quot;description&quot;: &quot;new_description&quot;, &quot;currentStage&quot;: &quot;production&quot;, &quot;dataset&quot;: &quot;new_dataset&quot; }' http://127.0.0.1:32080/api/v1/model-version  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Update the model version instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Delete a model version​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#delete-a-model-version","content":"DELETE /api/v1/model-version/{name}/{version} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST 
API","url":"docs/api/model-version#parameters-4","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tRegistered model name.\to version\tString\tpath\tRegistered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/model-version/register/1  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Delete the model version instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Create a model version tag​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#create-a-model-version-tag","content":"POST /api/v1/model-version/tag?name={name}&amp;version={version}&amp;tag={tag} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#parameters-5","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tquery\tRegistered model name.\to version\tString\tquery\tRegistered model version.\to tag\tString\tquery\tTag of the registered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#example-5","content":"Example Request curl -X POST 'http://127.0.0.1:32080/api/v1/model-version/tag?name=register&amp;version=2&amp;tag=789'  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Create a model version tag instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Delete a model version tag​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#delete-a-model-version-tag","content":"DELETE /api/v1/model-version/tag?name={name}&amp;version={version}&amp;tag={tag} "},{"title":"Parameters​","type":1,"pageTitle":"Model Version REST 
API","url":"docs/api/model-version#parameters-6","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tquery\tRegistered model name.\to version\tString\tquery\tRegistered model version.\to tag\tString\tquery\tTag of the registered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Model Version REST API","url":"docs/api/model-version#example-6","content":"Example Request curl -X DELETE 'http://127.0.0.1:32080/api/v1/model-version/tag?name=register&amp;version=2&amp;tag=789'  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Delete a registered model tag instance&quot;, &quot;result&quot;:null, &quot;attributes&quot;:{} }  "},{"title":"Register Model REST API","type":0,"sectionRef":"#","url":"docs/api/register-model","content":"","keywords":""},{"title":"Create a registered model​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#create-a-registered-model","content":"POST /api/v1/registered-model "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#parameters","content":"Field Name\tType\tDescription\tRequiredname\tString\tRegistered model name.\to description\tString\tRegistered model description.\tx tags\tList&lt;String&gt;\tRegistered model tags.\tx "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;example_name&quot;, &quot;description&quot;: &quot;example_description&quot;, &quot;tags&quot;: [&quot;123&quot;, &quot;456&quot;] } ' http://127.0.0.1:32080/api/v1/registered-model  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a registered model instance&quot;, &quot;result&quot;:null, &quot;attributes&quot;:{} }  
"},{"title":"List registered models​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#list-registered-models","content":"GET /api/v1/registered-model "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/registered-model  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;List all registered model instances&quot;, &quot;result&quot; : [ { &quot;creationTime&quot; : &quot;2021-12-16 10:14:06&quot;, &quot;description&quot; : &quot;example_description&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-16 10:14:06&quot;, &quot;name&quot; : &quot;example_name&quot;, &quot;tags&quot; : [ &quot;123&quot;, &quot;456&quot; ] }, { &quot;creationTime&quot; : &quot;2021-12-16 10:16:25&quot;, &quot;description&quot; : &quot;example_description&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-16 10:16:25&quot;, &quot;name&quot; : &quot;example_name1&quot;, &quot;tags&quot; : [ &quot;123&quot;, &quot;456&quot; ] }, { &quot;creationTime&quot; : &quot;2021-12-12 02:27:05&quot;, &quot;description&quot; : null, &quot;lastUpdatedTime&quot; : &quot;2021-12-14 12:49:33&quot;, &quot;name&quot; : &quot;register&quot;, &quot;tags&quot; : [] } ], &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Get a registered model​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#get-a-registered-model","content":"GET /api/v1/registered-model/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tregistered model name.\to "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#example-2","content":"Example Request curl -X GET 
http://127.0.0.1:32080/api/v1/registered-model/example_name  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Get the registered model instance&quot;, &quot;result&quot; : { &quot;creationTime&quot; : &quot;2021-12-16 10:14:06&quot;, &quot;description&quot; : &quot;example_description&quot;, &quot;lastUpdatedTime&quot; : &quot;2021-12-16 10:14:06&quot;, &quot;name&quot; : &quot;example_name&quot;, &quot;tags&quot; : [ &quot;123&quot;, &quot;456&quot; ] }, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Patch a registered model​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#patch-a-registered-model","content":"PATCH /api/v1/registered-model/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tregistered model name.\to name\tString\tbody\tNew model name.\tx description\tString\tbody\tNew model description.\tx "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#example-3","content":"Example Request curl -X PATCH -H &quot;Content-Type: application/json&quot; -d ' { &quot;name&quot;: &quot;new_name&quot;, &quot;description&quot;: &quot;new_description&quot; }' http://127.0.0.1:32080/api/v1/registered-model/example_name  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Update the registered model instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Delete a registered model​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#delete-a-registered-model","content":"DELETE /api/v1/registered-model/{name} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST 
API","url":"docs/api/register-model#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tpath\tregistered model name.\to "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#example-4","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/registered-model/example_name  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Delete the registered model instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Create a registered model tag​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#create-a-registered-model-tag","content":"POST /api/v1/registered-model/tag?name={name}&amp;tag={tag} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#parameters-4","content":"Field Name\tType\tIn\tDescription\tRequiredname\tString\tquery\tregistered model name.\to tag\tString\tquery\tAdd a tag for the registered model.\to "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#example-5","content":"Example Request curl -X POST 'http://127.0.0.1:32080/api/v1/registered-model/tag?name=example_name&amp;tag=example_tag'  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a registered model tag instance&quot;, &quot;result&quot;:null, &quot;attributes&quot;:{} }  "},{"title":"Delete a registered model tag​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#delete-a-registered-model-tag","content":"DELETE /api/v1/registered-model/tag?name={name}&amp;tag={tag} "},{"title":"Parameters​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#parameters-5","content":"Field 
Name\tType\tIn\tDescription\tRequiredname\tString\tquery\tregistered model name.\to tag\tString\tquery\tDelete a tag in the registered model.\to "},{"title":"Example​","type":1,"pageTitle":"Register Model REST API","url":"docs/api/register-model#example-6","content":"Example Request curl -X DELETE 'http://127.0.0.1:32080/api/v1/registered-model/tag?name=example_name&amp;tag=example_tag'  Example Response { &quot;attributes&quot; : {}, &quot;code&quot; : 200, &quot;message&quot; : &quot;Delete a registered model tag instance&quot;, &quot;result&quot; : null, &quot;status&quot; : &quot;OK&quot;, &quot;success&quot; : true }  "},{"title":"Serve REST API","type":0,"sectionRef":"#","url":"docs/api/serve","content":"","keywords":""},{"title":"Create a model serve​","type":1,"pageTitle":"Serve REST API","url":"docs/api/serve#create-a-model-serve","content":"POST /api/v1/serve "},{"title":"Parameters​","type":1,"pageTitle":"Serve REST API","url":"docs/api/serve#parameters","content":"Field Name\tType\tDescription\tRequiredmodelName\tString\tRegistered model name.\to modelVersion\tString\tRegistered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Serve REST API","url":"docs/api/serve#example","content":"note Make sure there is a model named simple with version 1 in the database. 
Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;modelName&quot;: &quot;simple&quot;, &quot;modelVersion&quot;:1 } ' http://127.0.0.1:32080/api/v1/serve  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a serve instance&quot;, &quot;result&quot;:{&quot;url&quot;:null}, &quot;attributes&quot;:{} }  "},{"title":"Delete the TensorFlow model serve​","type":1,"pageTitle":"Serve REST API","url":"docs/api/serve#delete-the-tensorflow-model-serve","content":"DELETE /api/v1/serve "},{"title":"Parameters​","type":1,"pageTitle":"Serve REST API","url":"docs/api/serve#parameters-1","content":"Field Name\tType\tDescription\tRequiredmodelName\tString\tRegistered model name.\to modelVersion\tString\tRegistered model version.\to "},{"title":"Example​","type":1,"pageTitle":"Serve REST API","url":"docs/api/serve#example-1","content":"Example Request curl -X DELETE -H &quot;Content-Type: application/json&quot; -d ' { &quot;modelName&quot;: &quot;simple&quot;, &quot;modelVersion&quot;:1 } ' http://127.0.0.1:32080/api/v1/serve  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Delete the model serve instance&quot;, &quot;result&quot;:null, &quot;attributes&quot;:{} }  "},{"title":"Notebook REST API","type":0,"sectionRef":"#","url":"docs/api/notebook","content":"","keywords":""},{"title":"Create a notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#create-a-notebook-instance","content":"POST /api/v1/notebook "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#parameters","content":"NotebookSpec in request body. 
NotebookSpec​ Field Name\tType\tDescription\tRequiredmeta\tNotebookMeta\tMeta data of the notebook.\to environment\tEnvironmentSpec\tEnvironment of the notebook.\to spec\tNotebookPodSpec\tSpec of the notebook pods.\to NotebookMeta​ Field Name\tType\tDescription\tRequiredname\tString\tNotebook name.\to namespace\tString\tNotebook namespace.\to ownerId\tString\tUser id.\to EnvironmentSpec​ See more details in environment api. NotebookPodSpec​ Field Name\tType\tDescription\tRequiredenvVars\tMap&lt;String, String&gt;\tEnvironmental variables.\tx resources\tString\tResources of the pod.\to "},{"title":"Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#example","content":"Example Request curl -X POST -H &quot;Content-Type: application/json&quot; -d ' { &quot;meta&quot;: { &quot;name&quot;: &quot;test-nb&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;ownerId&quot;: &quot;e9ca23d68d884d4ebb19d07889727dae&quot; }, &quot;environment&quot;: { &quot;name&quot;: &quot;notebook-env&quot; }, &quot;spec&quot;: { &quot;envVars&quot;: { &quot;TEST_ENV&quot;: &quot;test&quot; }, &quot;resources&quot;: &quot;cpu=1,memory=1.0Gi&quot; } } ' http://127.0.0.1:32080/api/v1/notebook  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Create a notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1647574374688_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;4a839fef-b4c9-483a-b4e8-c17236588118&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;creating&quot;, &quot;reason&quot;:&quot;The notebook instance is creating&quot;, &quot;createdTime&quot;:&quot;2022-03-18T16:13:16.000+08:00&quot;, &quot;deletedTime&quot;:null, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, 
&quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;labels&quot;:{ &quot;notebook-owner-id&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;notebook-id&quot;:&quot;notebook_1647574374688_0001&quot; } }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"List notebook instances which belong to user​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#list-notebook-instances-which-belong-to-user","content":"GET /api/v1/notebook?id={user_id} "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#parameters-1","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tquery\tUser id.\to "},{"title":"Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#example-1","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/notebook?id={user_id}  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;List all notebook instances&quot;, &quot;result&quot;:[ { &quot;notebookId&quot;:&quot;notebook_1647574374688_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:null, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;running&quot;, &quot;reason&quot;:&quot;The notebook instance is running&quot;, &quot;createdTime&quot;:&quot;2022-03-18T16:13:16.000+08:00&quot;, 
&quot;deletedTime&quot;:&quot;2022-03-18T16:13:21.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;labels&quot;:{ &quot;notebook-owner-id&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;notebook-id&quot;:&quot;notebook_1647574374688_0001&quot; } }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } } ], &quot;attributes&quot;:{} }  "},{"title":"Get the notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#get-the-notebook-instance","content":"GET /api/v1/notebook/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#parameters-2","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tNotebook id.\to "},{"title":"Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#example-2","content":"Example Request curl -X GET http://127.0.0.1:32080/api/v1/notebook/{id}  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Get the notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1647574374688_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, &quot;uid&quot;:&quot;4a839fef-b4c9-483a-b4e8-c17236588118&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;running&quot;, 
&quot;reason&quot;:&quot;The notebook instance is running&quot;, &quot;createdTime&quot;:&quot;2022-03-18T16:13:16.000+08:00&quot;, &quot;deletedTime&quot;:&quot;2022-03-18T16:13:21.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;labels&quot;:{ &quot;notebook-owner-id&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;notebook-id&quot;:&quot;notebook_1647574374688_0001&quot; } }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"Delete the notebook instance​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#delete-the-notebook-instance","content":"DELETE /api/v1/notebook/{id} "},{"title":"Parameters​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#parameters-3","content":"Field Name\tType\tIn\tDescription\tRequiredid\tString\tpath\tNotebook id.\to "},{"title":"Example​","type":1,"pageTitle":"Notebook REST API","url":"docs/api/notebook#example-3","content":"Example Request curl -X DELETE http://127.0.0.1:32080/api/v1/notebook/{id}  Example Response { &quot;status&quot;:&quot;OK&quot;, &quot;code&quot;:200, &quot;success&quot;:true, &quot;message&quot;:&quot;Delete the notebook instance&quot;, &quot;result&quot;:{ &quot;notebookId&quot;:&quot;notebook_1647574374688_0001&quot;, &quot;name&quot;:&quot;test-nb&quot;, 
&quot;uid&quot;:&quot;4a839fef-b4c9-483a-b4e8-c17236588118&quot;, &quot;url&quot;:&quot;/notebook/default/test-nb/lab&quot;, &quot;status&quot;:&quot;terminating&quot;, &quot;reason&quot;:&quot;The notebook instance is terminating&quot;, &quot;createdTime&quot;:&quot;2022-03-18T16:13:16.000+08:00&quot;, &quot;deletedTime&quot;:&quot;2022-03-18T16:13:21.000+08:00&quot;, &quot;spec&quot;:{ &quot;meta&quot;:{ &quot;name&quot;:&quot;test-nb&quot;, &quot;namespace&quot;:&quot;default&quot;, &quot;ownerId&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;labels&quot;:{ &quot;notebook-owner-id&quot;:&quot;e9ca23d68d884d4ebb19d07889727dae&quot;, &quot;notebook-id&quot;:&quot;notebook_1647574374688_0001&quot; } }, &quot;environment&quot;:{ &quot;name&quot;:&quot;notebook-env&quot;, &quot;dockerImage&quot;:&quot;apache/submarine:jupyter-notebook-0.7.0&quot;, &quot;kernelSpec&quot;:{ &quot;name&quot;:&quot;submarine_jupyter_py3&quot;, &quot;channels&quot;:[ &quot;defaults&quot; ], &quot;condaDependencies&quot;:[], &quot;pipDependencies&quot;:[] }, &quot;description&quot;:null, &quot;image&quot;:null }, &quot;spec&quot;:{ &quot;envVars&quot;:{ &quot;TEST_ENV&quot;:&quot;test&quot; }, &quot;resources&quot;:&quot;cpu\\u003d1,memory\\u003d1.0Gi&quot; } } }, &quot;attributes&quot;:{} }  "},{"title":"Apache Submarine Community","type":0,"sectionRef":"#","url":"docs/community/","content":"","keywords":""},{"title":"Communicating​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/community/#communicating","content":"You can reach out to the community members via any one of the following ways: Slack Developer: https://join.slack.com/t/asf-submarine/shared_invite info After clicking the link above, you would join the ASF Submarine channel. 
Zoom: https://cloudera.zoom.us/j/97264903288 Sync Up: https://docs.google.com/document/d/16pUO3TP4SxSeLduG817GhVAjtiph9HYpRHo_JgduDvw/edit "},{"title":"Your First Contribution​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/community/#your-first-contribution","content":"You can start by finding an existing issue with the https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE?filter=allopenissues label. These issues are well suited for new contributors. If a pull request (PR) you submit to the Submarine GitHub projects is approved and merged, you become a Submarine Contributor. If you want to work on a new idea of relatively small scope: Submit an issue describing your proposed change to the repo in question. The repo owners will respond to your issue promptly. Submit a pull request to Submarine containing a tested change. Contributions are welcomed and greatly appreciated. See CONTRIBUTING for details on submitting patches and the contribution workflow. "},{"title":"How Do I Become a Committer?​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/community/#how-do-i-become-a-committer","content":"First of all, you need to get involved and be a Contributor. Based on your track record as a contributor, the PMC, which votes on committership per Apache policy, may invite you to be a committer (after we've called a vote). When that happens, if you accept, the following process kicks into place... Note that becoming a committer is not just about submitting some patches; it's also about helping out on the development and user mailing lists, and helping with documentation and issues. See How to become an Apache Submarine Committer and PMC for more details. "},{"title":"How to commit​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/community/#how-to-commit","content":"See How to commit for a helper doc for Submarine committers. 
"},{"title":"Communication​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/community/#communication","content":"Communication within the Submarine community abides by Apache’s Code of Conduct. "},{"title":"Mailing lists​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/community/#mailing-lists","content":"Get help using Apache Submarine or contribute to the project on our mailing lists: Users : subscribe, unsubscribe, archivesfor usage questions, help, and announcements.Dev : subscribe, unsubscribe, archivesfor people wanting to contribute to the project.Commits : subscribe, unsubscribe, archivesfor commit messages and patches. Take subscribe Dev as an example, you should send an email to dev-subscribe@submarine.apache.org. Usually, this happens when you just click the &quot;subscribe&quot; link. If this does not work, simply copy the address and paste it into the &quot;To:&quot; field of a new message. After that, you will get an email from dev-help@submarine.apache.org, follow the directives of the mail to reply, then you will subscribe dev@submarine.apache.org successfully. "},{"title":"License​","type":1,"pageTitle":"Apache Submarine Community","url":"docs/community/#license","content":"Submarine source code is under the Apache 2.0 license. See the LICENSE file for details. "},{"title":"Bylaws","type":0,"sectionRef":"#","url":"docs/community/Bylaws","content":"Bylaws This document defines the bylaws under which the Apache Submarine project operates. It defines the roles and responsibilities of the project, who may vote, how voting works, how conflicts are resolved, etc. Submarine is a project of the Apache Software Foundation. The foundation holds the trademark on the name “Submarine” and copyright on Apache code including the code in the Submarine codebase. The foundation FAQ explains the operation and background of the foundation. 
Submarine is typical of Apache projects in that it operates under a set of principles, known collectively as the “Apache Way”. If you are new to Apache development, please refer to the Incubator project for more information on how Apache projects operate. Roles and Responsibilities Apache projects define a set of roles with associated rights and responsibilities. These roles govern what tasks an individual may perform within the project. The roles are defined in the following sections Users The most important participants in the project are people who use our software. The majority of our developers start out as users and guide their development efforts from the user’s perspective. Users contribute to the Apache projects by providing feedback to developers in the form of bug reports and feature suggestions. As well, users participate in the Apache community by helping other users on mailing lists and user support forums. Contributors All of the volunteers who are contributing time, code, documentation, or resources to the Submarine Project. A contributor that makes sustained, welcome contributions to the project may be invited to become a Committer, though the exact timing of such invitations depends on many factors. Committers The project’s Committers are responsible for the project’s technical management. Committers have access to all subproject subversion repositories. Committers may cast binding votes on any technical discussion regarding any subproject. Committer access is by invitation only and must be approved by consensus approval of the active PMC members. A Committer is considered emeritus by their own declaration or by not contributing in any form to the project for over six months. An emeritus committer may request reinstatement of commit access from the PMC. Such reinstatement is subject to consensus approval of active PMC members. Significant, pervasive features are often developed in a speculative branch of the repository. 
The PMC may grant commit rights on the branch to its consistent contributors, while the initiative is active. Branch committers are responsible for shepherding their feature into an active release and do not cast binding votes or vetoes in the project. All Apache committers are required to have a signed Contributor License Agreement (CLA) on file with the Apache Software Foundation. There is a Committer FAQ which provides more details on the requirements for Committers A committer who makes a sustained contribution to the project may be invited to become a member of the PMC. The form of contribution is not limited to code. It can also include code review, helping out users on the mailing lists, documentation, testing, etc. Release Manager A Release Manager (RM) is a committer who volunteers to produce a Release Candidate according to HowToRelease. The RM shall publish a Release Plan on the common-dev@ list stating the branch from which they intend to make a Release Candidate, at least one week before they do so. The RM is responsible for building consensus around the content of the Release Candidate, in order to achieve a successful Product Release vote. Project Management Committee The Project Management Committee (PMC) for Apache Submarine was created by the Apache Board in October 2019 when Submarine moved out of Hadoop and became a top level project at Apache. The PMC is responsible to the board and the ASF for the management and oversight of the Apache Submarine codebase. The responsibilities of the PMC include Deciding what is distributed as products of the Apache Submarine project. 
In particular, all releases must be approved by the PMC. Maintaining the project’s shared resources, including the codebase repository, mailing lists, and websites. Speaking on behalf of the project. Resolving license disputes regarding products of the project. Nominating new PMC members and committers. Maintaining these bylaws and other guidelines of the project. Membership of the PMC is by invitation only and must be approved by a consensus approval of active PMC members. A PMC member is considered “emeritus” by their own declaration or by not contributing in any form to the project for over six months. An emeritus member may request reinstatement to the PMC. Such reinstatement is subject to consensus approval of the active PMC members. The chair of the PMC is appointed by the ASF board. The chair is an office holder of the Apache Software Foundation (Vice President, Apache Submarine) and has primary responsibility to the board for the management of the projects within the scope of the Submarine PMC. The chair reports to the board quarterly on developments within the Submarine project. The chair of the PMC is rotated annually. When the chair is rotated or if the current chair of the PMC resigns, the PMC votes to recommend a new chair using Single Transferable Vote (STV) voting. See https://wiki.apache.org/general/BoardVoting for specifics. The decision must be ratified by the Apache board. Decision Making Within the Submarine project, different types of decisions require different forms of approval. For example, the previous section describes several decisions which require “consensus approval”. This section defines how voting is performed, the types of approvals, and which types of decision require which type of approval. Voting Decisions regarding the project are made by votes on the primary project development mailing list (dev@submarine.apache.org). Where necessary, PMC voting may take place on the private Submarine PMC mailing list. 
Votes are clearly indicated by a subject line starting with [VOTE]. Votes may contain multiple items for approval and these should be clearly separated. Voting is carried out by replying to the vote mail. Voting may take four flavors: +1: “Yes,” “Agree,” or “the action should be performed.” In general, this vote also indicates a willingness on the behalf of the voter in “making it happen”. +0: This vote indicates a willingness for the action under consideration to go ahead. The voter, however, will not be able to help. -0: This vote indicates that the voter does not, in general, agree with the proposed action but is not concerned enough to prevent the action going ahead. -1: This is a negative vote. On issues where consensus is required, this vote counts as a veto. All vetoes must contain an explanation of why the veto is appropriate. Vetoes with no explanation are void. It may also be appropriate for a -1 vote to include an alternative course of action. All participants in the Submarine project are encouraged to show their agreement or disagreement with a particular action by voting. For technical decisions, only the votes of active committers are binding. Non-binding votes are still useful for those with binding votes to understand the perception of an action in the wider Submarine community. For PMC decisions, only the votes of PMC members are binding. Voting can also be applied to changes made to the Submarine codebase. These typically take the form of a veto (-1) in reply to the commit message sent when the commit is made. Approvals These are the types of approvals that can be sought. 
Different actions require different types of approvals. Consensus Approval - Consensus approval requires 3 binding +1 votes and no binding vetoes. Lazy Consensus - Lazy consensus requires no -1 votes (‘silence gives assent’). Lazy Majority - A lazy majority vote requires 3 binding +1 votes and more binding +1 votes than -1 votes. Lazy 2⁄3 Majority - A lazy 2⁄3 majority vote requires at least 3 votes and twice as many +1 votes as -1 votes. Vetoes A valid, binding veto cannot be overruled. If a veto is cast, it must be accompanied by a valid explanation of the reasons for the veto. The validity of a veto, if challenged, can be confirmed by anyone who has a binding vote. This does not necessarily signify agreement with the veto - merely that the veto is valid. If you disagree with a valid veto, you must lobby the person casting the veto to withdraw their veto. If a veto is not withdrawn, any action that has been vetoed must be reversed in a timely manner. Actions This section describes the various actions which are undertaken within the project, the corresponding approval required for that action and those who have binding votes over the action. Code Change A change made to a codebase of the project and committed by a committer. This includes source code, documentation, website content, etc. Consensus approval of active committers, but with a minimum of one +1. The code can be committed after the first +1, unless the code change represents a merge from a branch, in which case three +1s are required. Product Release When a release of one of the project’s products is ready, a vote is required to accept the release as an official release of the project. Lazy Majority of active PMC members Adoption of New Codebase When the codebase for an existing, released product is to be replaced with an alternative codebase. If such a vote fails to gain approval, the existing code base will continue. 
This also covers the creation of new sub-projects within the project Lazy 2⁄3 majority of PMC members New Branch Committer When a branch committer is proposed for the PMC Lazy consensus of active PMC members New Committer When a new committer is proposed for the project Consensus approval of active PMC members New PMC Member When a committer is proposed for the PMC Consensus approval of active PMC members Branch Committer Removal When removal of commit privileges is sought or when the branch is merged to the mainline Lazy 2⁄3 majority of active PMC members Committer Removal When removal of commit privileges is sought. Note: Such actions will also be referred to the ASF board by the PMC chair Lazy 2⁄3 majority of active PMC members (excluding the committer in question if a member of the PMC). PMC Member Removal When removal of a PMC member is sought. Note: Such actions will also be referred to the ASF board by the PMC chair. Lazy 2⁄3 majority of active PMC members (excluding the member in question) Modifying Bylaws Modifying this document. Lazy majority of active PMC members Voting Timeframes Votes are open for a period of 7 days to allow all active voters time to consider the vote. Votes relating to code changes are not subject to a strict timetable but should be made as timely as possible. Product Release - Vote Timeframe Release votes, alone, run for a period of 5 days. All other votes are subject to the above timeframe of 7 days.","keywords":""},{"title":"How To Contribute to Submarine","type":0,"sectionRef":"#","url":"docs/community/contributing","content":"","keywords":""},{"title":"Preface​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#preface","content":"Apache Submarine is an Apache 2.0 License Software. Contributing to Submarine means you agree to the Apache 2.0 License. Please read Code of Conduct carefully.The document How It Works can help you understand Apache Software Foundation further. 
"},{"title":"Build Submarine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#build-submarine","content":"Build From Code "},{"title":"Creating patches​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#creating-patches","content":"Submarine follows Fork &amp; Pull model. "},{"title":"Step1: Fork apache/submarine github repository (first time)​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step1-fork-apachesubmarine-github-repository-first-time","content":"Visit https://github.com/apache/submarineClick the Fork button to create a fork of the repository "},{"title":"Step2: Clone the Submarine to your local machine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step2-clone-the-submarine-to-your-local-machine","content":"# USERNAME – your Github user account name. git clone git@github.com:${USERNAME}/submarine.git # or: git clone https://github.com/${USERNAME}/submarine.git cd submarine # set upstream git remote add upstream git@github.com:apache/submarine.git # or: git remote add upstream https://github.com/apache/submarine.git # Don't push to the upstream master. git remote set-url --push upstream no_push # Check upstream/origin: # origin git@github.com:${USERNAME}/submarine.git (fetch) # origin git@github.com:${USERNAME}/submarine.git (push) # upstream git@github.com:apache/submarine.git (fetch) # upstream no_push (push) git remote -v  "},{"title":"Step3: Create a new Jira in Submarine project​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step3-create-a-new-jira-in-submarine-project","content":"New contributors need privilege to create JIRA issues. Please email kaihsun@apache.org with your Jira username. 
In addition, the email title should be &quot;[New Submarine Contributor]&quot;. Check Jira issue tracker for existing issues. Create a new Jira issue in Submarine project. When the issue is created, a Jira number (e.g. SUBMARINE-748) will be assigned to the issue automatically. "},{"title":"Step4: Create a local branch for your contribution​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step4-create-a-local-branch-for-your-contribution","content":"cd submarine # Make your local master up-to-date git checkout master git fetch upstream git rebase upstream/master # Create a new branch for issue SUBMARINE-${jira_number} git checkout -b SUBMARINE-${jira_number} # Example: git checkout -b SUBMARINE-748  "},{"title":"Step5: Develop & Create commits​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step5-develop--create-commits","content":"You can edit the code on the SUBMARINE-${jira_number} branch. (Coding Style: Code Convention) Create commits git add ${edited files} git commit -m &quot;SUBMARINE-${jira_number}. ${Commit Message}&quot; # Example: git commit -m &quot;SUBMARINE-748. Update Contributing guide&quot;  "},{"title":"Step6: Syncing your local branch with upstream/master​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step6-syncing-your-local-branch-with-upstreammaster","content":"# On SUBMARINE-${jira_number} branch git fetch upstream git rebase upstream/master  Please do not use git pull to synchronize your local branch: git pull performs a merge and creates merge commits, which make the commit history messy. 
"},{"title":"Step7: Push your local branch to your personal fork​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step7-push-your-local-branch-to-your-personal-fork","content":"git push origin SUBMARINE-${jira_number}  "},{"title":"Step8: Check GitHub Actions status of your personal commit​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step8-check-github-actions-status-of-your-personal-commit","content":"Visit https://github.com/${USERNAME}/submarine/actionsPlease make sure your new commits can pass all workflows before creating a pull request.  "},{"title":"Step9: Create a pull request on github UI​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step9-create-a-pull-request-on-github-ui","content":"Visit your fork at https://github.com/${USERNAME}/submarine.gitClick Compare &amp; Pull Request button to create pull request. Pull Request template​ Pull request templateFilling the template thoroughly can improve the speed of the review process. Example:   "},{"title":"Step10: Check GitHub Actions status of your pull request in apache/submarine​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step10-check-github-actions-status-of-your-pull-request-in-apachesubmarine","content":"Visit https://github.com/apache/submarine/actionsPlease make sure your pull request can pass all workflows.  "},{"title":"Step11: The Review Process​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step11-the-review-process","content":"Anyone can be a reviewer and comment on the pull requests.Reviewer can indicate that a patch looks suitable for merging with a comment such as: &quot;Looks good&quot;, &quot;LGTM&quot;, &quot;+1&quot;. (PS: LGTM = Looks Good To Me)At least one indication of suitability (e.g. &quot;LGTM&quot;) from a committer is required to be merged. 
A committer can then initiate lazy consensus (&quot;Merge if there is no more discussion&quot;) after which the code can be merged after a particular time (usually 24 hours) if there are no more reviews.Contributors can ping reviewers (including committers) by commenting 'Ready to review'. "},{"title":"Step12: Address review comments​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#step12-address-review-comments","content":"Push new commits to SUBMARINE-${jira_number} branch. The pull request will update automatically.After you address all review comments, committers will merge the pull request. "},{"title":"Code convention​","type":1,"pageTitle":"How To Contribute to Submarine","url":"docs/community/contributing#code-convention","content":"We are following Google Code style: Java styleShell style There are some plugins to format, lint your code in IDE (use dev-support/maven-config/checkstyle.xml as rules) Checkstyle plugin for Intellij (Setting Guide)Checkstyle plugin for Eclipse (Setting Guide) "},{"title":"How to become a Committer","type":0,"sectionRef":"#","url":"docs/community/HowToBecomeCommitter","content":"How to become a Committer Apache Submarine builds a community completely following Apache’s rules. Apache Committer is a term used in ASF (Apache Software Foundation) to indicate the person who submits a specific project. Apache Submarine Committer has permission to write the Submarine codebase and can merge PR. Anyone who has made enough contributions to the community and gained enough trust can become an Apache Submarine Committer. As long as anyone contributes to the Submarine project, you are the officially recognized Contributor of the Submarine project. 
There is no exact standard for growing from Contributor to Committer, and there is no expected timetable, but Committer candidates are generally long-term active contributors. Becoming a Committer does not require a huge architectural improvement or a particular number of lines of code. Contributing to the codebase, contributing to the documents, participating in mailing list discussions, helping to answer questions, etc., are all ways to increase your influence. List of potential contributions (in no particular order): Submit the bugs, features, and improvements you found as issues. Update the official documents so that the project documents stay current, and write best practices for Submarine and other documents that help users understand its features. Perform tests and report test results. Actively participate in voting when a version is released. Participate in discussions on the mailing list; these mails usually start with [DISCUSS]. Answer questions from users or developers on the mailing list. Review the work of others (both code and non-code) and share your own suggestions. Review the issues on JIRA and keep their status up to date, such as closing outdated issues, changing the issue’s error information, etc. Guide new contributors and help them become familiar with the community process. Give talks and write blogs about Submarine, and add these to the official Submarine website. Any contribution that is beneficial to the development of the Submarine community ...... For more, refer to: ASF official documents. Not everyone can complete all (or even any) items on this list. If you want to contribute in other ways, then just do it (and add them to the list). Pleasant manners and dedication are all you need to have a positive impact on the Submarine project. An invitation to become a Committer is the result of your long-term and stable interaction with the community, and reflects the trust and recognition of the Submarine community. 
Committer is obliged to review and merge PRs submitted by others, test and vote on candidate versions when the version is released, participate in the discussion of feature design plans, and other types of project contributions. When you are active enough and make a bigger contribution to the community, you can be promoted to a PMC member of the Submarine project.","keywords":""},{"title":"Guide for Apache Submarine Committers","type":0,"sectionRef":"#","url":"docs/community/HowToCommit","content":"","keywords":""},{"title":"New committers​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/community/HowToCommit#new-committers","content":"New committers are encouraged to first read Apache's generic committer documentation: Apache New Committer GuideApache Committer FAQ The first act of a new core committer is typically to add their name to the credits page. This requires changing the site source inhttps://github.com/apache/submarine-site/blob/master/community/member.md. Once done, update the Submarine website as describedhere(TLDR; don't forget to regenerate the site with hugo, and commit the generated results, too). "},{"title":"Review​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/community/HowToCommit#review","content":"Submarine committers should, as often as possible, attempt to review patches submitted by others. Ideally every submitted patch will get reviewed by a committer within a few days. If a committer reviews a patch they've not authored, and believe it to be of sufficient quality, then they can commit the patch, otherwise the patch should be cancelled with a clear explanation for why it was rejected. The list of submitted patches can be found in the GitHubPull Requests page. Committers should scan the list from top-to-bottom, looking for patches that they feel qualified to review and possibly commit. 
For non-trivial changes, it is best to get another committer to review &amp; approve your own patches before commit. "},{"title":"Reject​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/community/HowToCommit#reject","content":"Patches should be rejected which do not adhere to the guidelines inContribution Guidelines. Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. If a committer wishes to improve an unacceptable patch, then it should first be rejected, and a new patch should be attached by the committer for review. "},{"title":"Commit individual patches​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/community/HowToCommit#commit-individual-patches","content":"Submarine uses git for source code version control. The writable repo is at -https://gitbox.apache.org/repos/asf/submarine.git It is strongly recommended to use the cicd script to merge the PRs. See the instructions athttps://github.com/apache/submarine/tree/master/dev-support/cicd "},{"title":"Adding Contributors role​","type":1,"pageTitle":"Guide for Apache Submarine Committers","url":"docs/community/HowToCommit#adding-contributors-role","content":"There are three roles (Administrators, Committers, Contributors) in the project. Contributors who have Contributors role can become assignee of the issues in the project.Committers who have Committers role can set arbitrary roles in addition to Contributors role.Committers who have Administrators role can edit or delete all comments, or even delete issues in addition to Committers role. How to set roles Login to ASF JIRAGo to the project page (e.g. 
https://issues.apache.org/jira/browse/SUBMARINE )Hit &quot;Administration&quot; tabHit &quot;Roles&quot; tab in left sideAdd Administrators/Committers/Contributors role "},{"title":"How to vote a Committer or PMC","type":0,"sectionRef":"#","url":"docs/community/HowToVoteCommitterOrPMC","content":"","keywords":""},{"title":"The voting process of becoming a Submarine Committer or PMC​","type":1,"pageTitle":"How to vote a Committer or PMC","url":"docs/community/HowToVoteCommitterOrPMC#the-voting-process-of-becoming-a-submarine-committer-or-pmc","content":"After the PMC members of Submarine discover any valuable contributions from the community contributors and obtain the consent of the candidate, they initiate a discussion on the private mailing list of Submarine: [DISCUSS] YYYYY as a Submarine XXXXXX In the email, the source of the candidate’s contributions should be clearly stated, so that everyone can discuss and analyze. The discussion email will last at least 72 hours, and the project team members, including the mentors, will fully express their views on the proposed email. Regardless of whether there is a disagreement, after the discussion email, the vote initiator needs to initiate a Committer or PMC vote on the private mailing list of Submarine; [VOTE] YYYYY as a Submarine XXXXXX The voting mail should last for at least 72 hours, and there should be at least 3 +1 votes to pass the vote. If there are 0 votes or one -1 vote, the entire vote will fail. If voting -1, you need to clarify the question so that everyone can understand. After the voting email is over, the vote initiator should summarize it on the voting line, remind the end of voting, and send it to the voting summary email. [RESULTS][vote] YYYYY as a Submarine XXXXXX After the vote summary email is sent, if the vote passed, the vote initiator must send an invitation email to the candidate, and the invitation email needs the candidate to reply to accept or decline through the designated mailbox. 
[Invitation] Invitation to join Apache Submarine as a XXXXXX The email should be sent to the candidate, and the copy is sent to private@submarine.apache.org After the candidate accepts the invitation, if the candidate does not have an apache email account, the vote initiator needs to assist the candidate to create an apache account according to the guidelines. If the above content is completed, the vote initiator still needs to do the following two things: 6.1 Apply to the project leader to add project team members, and open the authority accounts for the jira and apache projects. 6.2 Send a notification email to the dev@submarine.apache.org mail group: [ANNOUNCE] New XXXXXX: YYYYY So far, the entire process is completed, then the candidate officially becomes the Committer or PMC of Submarine. "},{"title":"Resources","type":0,"sectionRef":"#","url":"docs/community/Resources","content":"Resources This document contains some resources that may help you understand more about Submarine. Conferences Apache submarine: a unified machine learning platform made simple at EuroMLSys '22 ABSTRACT As machine learning is applied more widely, it is necessary to have a machine-learning platform for both infrastructure administrators and users including expert data scientists and citizen data scientists [24] to improve their productivity. However, existing machine-learning platforms are ill-equipped to address the &quot;Machine Learning tech debts&quot; [36] such as glue code, reproducibility, and portability. Furthermore, existing platforms only take expert data scientists into consideration, and thus they are inflexible for infrastructure administrators and non-user-friendly for citizen data scientists. We propose Submarine, a unified machine-learning platform, and takes all infrastructure administrators, expert data scientists, and citizen data scientists into consideration. Submarine has been widely used in many technology companies, including Ke.com and LinkedIn. 
We present two use cases in Section 5.","keywords":""},{"title":"Architecture and Requirment","type":0,"sectionRef":"#","url":"docs/designDocs/architecture-and-requirements","content":"","keywords":""},{"title":"Terminology​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#terminology","content":"Term\tDescriptionUser\tA single data-scientist/data-engineer. User has resource quota, credentials Team\tUser belongs to one or more teams, teams have ACLs for artifacts sharing such as notebook content, model, etc. Admin\tAlso called SRE, who manages user's quotas, credentials, team, and other components. "},{"title":"Background​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#background","content":"Everybody talks about machine learning today, and lots of companies are trying to leverage machine learning to push the business to the next level. Nowadays, as more and more developers, infrastructure software companies coming to this field, machine learning becomes more and more achievable. In the last decade, the software industry has built many open source tools for machine learning to solve the pain points: It was not easy to build machine learning algorithms manually, such as logistic regression, GBDT, and many other algorithms:Answer to that: Industries have open sourced many algorithm libraries, tools, and even pre-trained models so that data scientists can directly reuse these building blocks to hook up to their data without knowing intricate details inside these algorithms and models. It was not easy to achieve &quot;WYSIWYG, what you see is what you get&quot; from IDEs: not easy to get output, visualization, troubleshooting experiences at the same place.Answer to that: Notebooks concept was added to this picture, notebook brought the experiences of interactive coding, sharing, visualization, debugging under the same user interface. 
There are popular open-source notebooks such as Apache Zeppelin and Jupyter. It was not easy to manage dependencies: an ML application that runs on one machine is hard to deploy on another because it has many library dependencies. Answer to that: Containerization became popular and is now a standard way to package dependencies, making it easier to &quot;build once, run anywhere&quot;. Fragmented tools and libraries were hard for ML engineers to learn, and experience gained in one company did not transfer naturally to another. Answer to that: A few dominant open-source frameworks reduced the overhead of learning too many different frameworks and concepts. A data scientist who learns a few libraries such as Tensorflow/PyTorch and a few high-level wrappers like Keras can build a machine learning application from other open-source building blocks. Similarly, models built by one library (such as libsvm) were hard to integrate into a machine learning pipeline because there was no standard format. Answer to that: The industry has built successful open-source standard machine learning frameworks such as Tensorflow/PyTorch/Keras, so their formats can be easily shared, and there are efforts to build an even more general model format such as ONNX. It was hard to build a data pipeline that moves and transforms data from a raw data source into whatever ML applications require. Answer to that: The open-source big data industry plays an important role in providing, simplifying, and unifying processes and building blocks for data flows, transformations, etc. The machine learning industry is on the right track to solve major roadblocks. So what are the pain points now for companies that have machine learning needs? How can we help here? To answer this question, let's look at the machine learning workflow first. 
"},{"title":"Machine Learning Workflows & Pain points​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#machine-learning-workflows--pain-points","content":"1) From different data sources such as edge, clickstream, logs, etc. =&gt; Land in data lakes. 2) From the data lake: =&gt; Data transformations: cleanup, removing invalid rows/columns, selecting columns, sampling, splitting train/test data sets, joining tables, etc. =&gt; Data prepared for training. 3) From prepared data: =&gt; Training, model hyper-parameter tuning, cross-validation, etc. =&gt; Models saved to storage. 4) From saved models: =&gt; Model assurance, deployment, A/B testing, etc. =&gt; Model deployed for online serving or offline scoring.  Typically, data scientists are responsible for items 2)-4), while item 1) is handled by a different team (called the Data Engineering team in many companies; some Data Engineering teams are also responsible for part of the data transformation). "},{"title":"Pain #1 Complex workflow/steps from raw data to model, different tools needed by different steps, hard to make changes to workflow, and not error-proof​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#pain-1-complex-workflowsteps-from-raw-data-to-model-different-tools-needed-by-different-steps-hard-to-make-changes-to-workflow-and-not-error-proof","content":"The workflow from raw data to usable models is complex. After talking to many data scientists, we have learned that a typical procedure to train a new model and push it to production can take from months to 1-2 years. The workflow also requires a wide skill set. For example, data transformation needs tools like Spark/Hive at large scale and tools like Pandas at small scale. Model training requires switching between XGBoost, Tensorflow, Keras, and PyTorch. Building a data pipeline requires Apache Airflow or Oozie. 
Yes, there are great, standardized open-source tools built for many of these purposes. But what if changes need to be made to a particular part of the data pipeline? What about adding a few columns to the training data for experiments? What about training models and pushing them to validation and A/B testing before rolling them out to production? All these steps require jumping between different tools and UIs, changes are very hard to make, and the procedures are not error-proof. "},{"title":"Pain #2 Dependencies of underlying resource management platform​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#pain-2-dependencies-of-underlying-resource-management-platform","content":"To run the jobs/services required by a machine learning platform, we need an underlying resource management platform. There are several choices of resource management platform, each with distinct advantages and disadvantages. For example, many machine learning platforms are built on top of K8s. It is relatively easy to get a K8s cluster from a cloud vendor, and easy to orchestrate the services/daemons that machine learning requires on K8s. However, K8s doesn't offer good support for jobs like Spark/Flink/Hive. So if your company has Spark/Flink/Hive running on YARN, there are gaps and a significant amount of work to move the required jobs from YARN to K8s. Maintaining a separate K8s cluster is also overhead for a Hadoop-based data infrastructure. Similarly, if your company's data pipelines are mostly built on top of cloud resources and SaaS offerings, asking you to install a separate YARN cluster to run a new machine learning platform doesn't make a lot of sense. 
"},{"title":"Pain #3 Data scientist are forced to interact with lower-level platform components​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#pain-3-data-scientist-are-forced-to-interact-with-lower-level-platform-components","content":"In addition to the above pain, we see that data scientists are forced to learn underlying platform knowledge to be able to build a real-world machine learning workflow. Most of the data scientists we talked with are experts in ML algorithms/libraries, feature engineering, etc. They are also most familiar with Python and R, and some of them understand Spark, Hive, etc. If they are asked to interact with lower-level components, such as fine-tuning a Spark job's performance, troubleshooting a job that failed to launch because of resource constraints, or writing a K8s/YARN job spec with properly mounted volumes and network settings, they will scratch their heads and typically cannot perform these operations efficiently. "},{"title":"Pain #4 Comply with data security/governance requirements​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#pain-4-comply-with-data-securitygovernance-requirements","content":"TODO: Add more details. "},{"title":"Pain #5 No good way to reduce routine ML code development​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#pain-5-no-good-way-to-reduce-routine-ml-code-development","content":"After the data is prepared, the data scientist needs to do several routine tasks to build the ML pipeline. To get a sense of the existing data set, one usually needs to split the data set and compute its statistics. These tasks share a lot of duplicated code, which reduces the efficiency of data scientists. An abstraction layer/framework that helps the developer speed up ML pipeline development could be valuable. 
Ideally, the developer only needs to fill in callback functions and can focus on their key logic. Submarine "},{"title":"Overview​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#overview","content":""},{"title":"A little bit history​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#a-little-bit-history","content":"Initially, Submarine was built to solve the problems of running deep learning jobs like Tensorflow/PyTorch on Apache Hadoop YARN, allowing admins to monitor launched deep learning jobs and manage generated models. It was part of YARN initially, and the code resided under hadoop-yarn-applications. Later, the community decided to convert it into a subproject within Hadoop (a sibling project of YARN, HDFS, etc.) because we wanted to support other resource management platforms like K8s. Finally, we reconsidered Submarine's charter, and the Hadoop community voted that it was time to move Submarine to a separate Apache TLP. "},{"title":"Why Submarine?​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#why-submarine","content":"ONE PLATFORM Submarine is the ONE PLATFORM that allows Data Scientists to create end-to-end machine learning workflows. ONE PLATFORM means it supports Data Scientists and data engineers in finishing their jobs on the same platform without frequently switching toolsets. From dataset exploration and data pipeline creation to model training, tuning, and pushing models to production, all these steps can be completed within the ONE PLATFORM. Resource Management Independent It is also designed to be resource management independent: no matter whether you have Apache Hadoop YARN, K8s, or just a container service, you will be able to run Submarine on top of it. 
"},{"title":"Requirements and non-requirements​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#requirements-and-non-requirements","content":""},{"title":"Notebook​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#notebook","content":"1) Users should be able to create, edit, and delete a notebook. (P0) 2) Notebooks can be persisted to storage and can be recovered if a failure happens. (P0) 3) Users can trace back to historical versions of a notebook. (P1) 4) Notebooks can be shared with different users. (P1) 5) Users can define a list of parameters for a notebook (similar to the parameters of the notebook's main function) to allow executing a notebook like a job. (P1) 6) Different users can collaborate on the same notebook at the same time. (P2) A running notebook instance is called a notebook session (or session for short). "},{"title":"Experiment​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#experiment","content":"An experiment in Submarine is an offline task. It could be a shell command, a Python command, a Spark job, a SQL query, or even a workflow. The primary purpose of experiments in Submarine's context is to run training tasks, offline scoring, etc. However, experiments can be generalized to do other tasks as well. Major requirements of experiments: 1) Experiments can be submitted from UI/CLI/SDK. 2) Experiments can be monitored/managed from UI/CLI/SDK. 3) Experiments should not be bound to one resource management platform (K8s). Type of experiments​  There are two types of experiments: Adhoc experiments: these include a Python/R/notebook, or even an adhoc Tensorflow/PyTorch task, etc. Predefined experiment library: these are specialized experiments, which include developed libraries such as CTR, BERT, etc. Users are only required to specify a few parameters such as input, output, hyper-parameters, etc. 
They do not need to worry about where the training script and dependencies are located. Adhoc experiment​ Requirements: Allow running adhoc scripts. Allow model engineers and data scientists to run Tensorflow/Pytorch programs on K8s/Container-cloud. Allow jobs to easily access data/models in HDFS/s3, etc. Support running distributed Tensorflow/Pytorch jobs with simple configs. Support running user-specified Docker images. Support specifying GPU and other resources. Predefined experiment library​ Here's an example of a predefined experiment library to train a deepfm model: { &quot;input&quot;: { &quot;train_data&quot;: [&quot;hdfs:///user/submarine/data/tr.libsvm&quot;], &quot;valid_data&quot;: [&quot;hdfs:///user/submarine/data/va.libsvm&quot;], &quot;test_data&quot;: [&quot;hdfs:///user/submarine/data/te.libsvm&quot;], &quot;type&quot;: &quot;libsvm&quot; }, &quot;output&quot;: { &quot;save_model_dir&quot;: &quot;hdfs:///user/submarine/deepfm&quot;, &quot;metric&quot;: &quot;auc&quot; }, &quot;training&quot;: { &quot;batch_size&quot; : 512, &quot;field_size&quot;: 39, &quot;num_epochs&quot;: 3, &quot;feature_size&quot;: 117581, ... } }  Predefined experiment libraries can be shared across users on the same platform; users can also add new or modified predefined experiment libraries via UI/REST API. We will also model AutoML and auto hyper-parameter tuning as predefined experiment libraries. Pipeline​ Pipeline is a special kind of experiment: A pipeline is a DAG of experiments. It can also be treated as a special kind of experiment. Users can submit/terminate a pipeline. Pipelines can be created/submitted via UI/API. "},{"title":"Environment Profiles​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#environment-profiles","content":"An environment profile (or environment for short) defines a set of libraries and, when Docker is being used, a Docker image in order to run an experiment or a notebook. 
A Docker or VM image (such as AMI: Amazon Machine Images) defines the base layer of the environment. On top of that, users can define a set of libraries (such as Python/R) to install. Users can save different environment configs, which can also be shared across the platform. Environment profiles can be used to run a notebook (e.g. by choosing a different kernel from Jupyter) or an experiment. A predefined experiment library specifies which environment to use, so users don't have to choose one. Environments can be added/listed/deleted/selected through CLI/SDK. "},{"title":"Model​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#model","content":"Model management​ Model artifacts are generated by experiments or notebooks. A model consists of artifacts from one or multiple files. Users can choose to save, tag, and version a produced model. Once the model is saved, users can do online model serving or offline scoring of the model. Model serving​ After a model is saved, users can specify a serving script and a model, and create a web service to serve the model. We call the web service an &quot;endpoint&quot;. Users can manage (add/stop) model serving endpoints via CLI/API/UI. "},{"title":"Metrics for training job and model​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#metrics-for-training-job-and-model","content":"Submarine-SDK provides tracking/metrics APIs, which allow developers to add tracking/metrics and view them from the Submarine Workbench UI. "},{"title":"Deployment​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#deployment","content":"Submarine Services (see the architecture overview below) should be easy to deploy on-prem / on-cloud. 
Since there are more and more public cloud offerings for compute/storage management, we need to support deploying Submarine compute-related workloads (such as notebook sessions, experiments, etc.) to cloud-managed clusters. This also means Submarine may need to take input parameters from customers and create/manage clusters if needed. It is also a common requirement to use a hybrid of on-prem/on-cloud clusters. "},{"title":"Security / Access Control / User Management / Quota Management​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#security--access-control--user-management--quota-management","content":"There are 4 kinds of objects that need access control: Assets that belong to the Submarine system, which include notebooks, experiments and results, models, predefined experiment libraries, and environment profiles. Data security. (Who owns what data, and what data can be accessed by each user). User credentials. (Such as LDAP). Other security, such as Git repo access, etc. Data security, user credentials, and other security will be delegated to 3rd-party libraries such as Apache Ranger, IAM roles, etc. Assets that belong to the Submarine system will be handled by Submarine itself. Here are the operations a Submarine admin can perform for users / teams to control access to Submarine's assets. Operations for admins Admin uses the &quot;User Management System&quot; to onboard new users, upload user credentials, assign resource quotas, etc. Admins can create new users and new teams, update user/team mappings, or remove users/teams. Admin can set resource quotas (if different from the system default) and permissions, and upload/update necessary credentials (like a Kerberos keytab) of a user. A DE/DS can also be an admin if the DE/DS has admin access. (Like a privileged user). 
This will be useful when a cluster is exclusively shared by a user or only shared by a small team.Resource Quota Management System helps admin to manage resources quotas of teams, organizations. Resources can be machine resources like CPU/Memory/Disk, etc. It can also include non-machine resources like $$-based budgets. "},{"title":"Dataset​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#dataset","content":"There's also need to tag dataset which will be used for training and shared across the platform by different users. Like mentioned above, access to the actual data will be handled by 3rd party system like Apache Ranger / Hive Metastore which is out of the Submarine's scope. "},{"title":"Architecture Overview​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#architecture-overview","content":""},{"title":"Architecture Diagram​","type":1,"pageTitle":"Architecture and Requirment","url":"docs/designDocs/architecture-and-requirements#architecture-diagram","content":" +-----------------------------------------------------------------+ | Submarine UI / CLI / REST API / SDK | | Mini-Submarine | +-----------------------------------------------------------------+ +--------------------Submarine Server-----------------------------+ | +---------+ +---------+ +----------+ +----------+ +------------+| | |Data set | |Notebooks| |Experiment| |Models | |Servings || | +---------+ +---------+ +----------+ +----------+ +------------+| |-----------------------------------------------------------------| | | | +-----------------+ +-----------------+ +---------------------+ | | |Experiment | |Compute Resource | |Other Management | | | |Manager | | Manager | |Services | | | +-----------------+ +-----------------+ +---------------------+ | | Spark, template K8s/Docker | | TF, PyTorch, pipeline | | | + +-----------------+ + | |Submarine Meta | | | | Store | | | 
+-----------------+ | | | +-----------------------------------------------------------------+ (You can use http://stable.ascii-flow.appspot.com/#Draw to draw such diagrams)  Compute Resource Manager Helps to manage compute resources on-prem/on-cloud, this module can also handle cluster creation / management, etc. Experiment Manager Work with &quot;Compute Resource Manager&quot; to submit different kinds of workloads such as (distributed) Tensorflow / Pytorch, etc. Submarine SDK provides Java/Python/REST API to allow DS or other engineers to integrate into Submarine services. It also includes a mini-submarine component that launches Submarine components from a single Docker container (or a VM image). Details of Submarine Server design can be found at submarine-server-design. "},{"title":"Environments Implementation","type":0,"sectionRef":"#","url":"docs/designDocs/environments-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Environments Implementation","url":"docs/designDocs/environments-implementation#overview","content":"Environment profiles (or environment for short) defines a set of libraries and when Docker is being used, a Docker image in order to run an experiment or a notebook. Docker and/or VM-image (such as, VirtualBox/VMWare images, Amazon Machine Images - AMI, Or custom image of Azure VM) defines the base layer of the environment. Please note that VM-image is different from VM instance type, On top of that, users can define a set of libraries (such as Python/R) to install, we call it kernel. Example of Environment  +-------------------+ |+-----------------+| || Python=3.7 || || Tensorflow=2.0 || |+---Exp Dependency+| |+-----------------+| ||OS=Ubuntu16.04 || ||CUDA=10.2 || ||GPU_Driver=375.. || |+---Base Library--+| +-------------------+  As you can see, There're base libraries, such as what OS, CUDA version, GPU driver, etc. They can be achieved by specifying a VM-image / Docker image. 
On top of that, user can bring their dependencies, such as different version of Python, Tensorflow, Pandas, etc. How users use environment? Users can save different environment configs which can be also shared across the platform. Environment profiles can be used to run a notebook (e.g. by choosing different kernel from Jupyter), or an experiment. Predefined experiment library includes what environment to use so users don't have to choose which environment to use.  +-------------------+ |+-----------------+| +------------+ || Python=3.7 || |User1 | || Tensorflow=2.0 || +------------+ |+---Kernel -------+| +------------+ |+-----------------+|&lt;----+ |User2 | ||OS=Ubuntu16.04 || + +------------+ ||CUDA=10.2 || | +------------+ ||GPU_Driver=375.. || | |User3 | |+---Base Library--+| | +------------+ +-----Default-Env---+ | | | +-------------------+ | |+-----------------+| | || Python=3.3 || | || Tensorflow=2.0 || | |+---kernel--------+| | |+-----------------+| | ||OS=Ubuntu16.04 || | ||CUDA=10.3 ||&lt;----+ ||GPU_Driver=375.. || |+---Base Library--+| +-----My-Customized-+  There're two environments in the above graph, &quot;Default-Env&quot; and &quot;My-Customized&quot;, which can have different combinations of libraries for different experiments/notebooks. Users can choose different environments for different experiments as they want. Environments can be added/listed/deleted/selected through CLI/SDK/UI. 
Implementation "},{"title":"Environment API definition​","type":1,"pageTitle":"Environments Implementation","url":"docs/designDocs/environments-implementation#environment-api-definition","content":"Let's look at the object definition used to define an environment. The environment API looks like:  name: &quot;my_submarine_env&quot;, vm-image: &quot;...&quot;, docker-image: &quot;...&quot;, kernel: &lt;object of kernel&gt; description: &quot;this is the most common env used by team ABC&quot;  vm-image is optional if we don't need to launch a new VM (like running a training job in a cloud-remote machine). docker-image is required. kernel could be optional if the kernel is already included in vm-image or docker-image. name of the environment should be unique in the system, so users can reference it when creating a new experiment/notebook. "},{"title":"VM-image and Docker-image​","type":1,"pageTitle":"Environments Implementation","url":"docs/designDocs/environments-implementation#vm-image-and-docker-image","content":"Docker images and VM images should be prepared by system admins / SREs; it is hard for data scientists to write an error-proof Dockerfile and to push/manage Docker images. This is one of the reasons we hide the Docker image inside &quot;environment&quot;: we encourage users to customize their kernels if needed, but they don't have to touch a Dockerfile or build/push/manage new Docker images. As a project, we will document best practices and example Dockerfiles. A Dockerfile should include a proper ENTRYPOINT definition that points to our default script, so whether it is a notebook or an experiment, we will set up the kernel (see below) and other environment variables properly. 
"},{"title":"Kernel Implementation​","type":1,"pageTitle":"Environments Implementation","url":"docs/designDocs/environments-implementation#kernel-implementation","content":"After investigating different alternatives (such as pipenv, venv, etc.), we decided to use Conda environments, which nicely replace Python virtual env and pip, and can also support other languages. More details can be found at: https://medium.com/@krishnaregmi/pipenv-vs-virtualenv-vs-conda-environment-3dde3f6869ed With Conda, users can easily add or remove dependencies of a Conda environment. Users can also easily export an environment to a yaml file. The yaml file of a Conda environment produced by conda env export looks like: name: base channels: - defaults dependencies: - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0 - alabaster=0.7.12=py37_0 - anaconda=2020.02=py37_0 - anaconda-client=1.7.2=py37_0 - anaconda-navigator=1.9.12=py37_0 - anaconda-project=0.8.4=py_0 - applaunchservices=0.2.1=py_0  Including the Conda kernel, the environment object may look like: name: &quot;my_submarine_env&quot;, vm-image: &quot;...&quot;, docker-image: &quot;...&quot;, kernel: name: team_default_python_3.7 channels: - defaults dependencies: - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0 - alabaster=0.7.12=py37_0 - anaconda=2020.02=py37_0 - anaconda-client=1.7.2=py37_0 - anaconda-navigator=1.9.12=py37_0  When launching a new experiment / notebook session using my_submarine_env, Submarine server will use the defined Docker image and Conda kernel to launch the container. "},{"title":"Storage of Environment​","type":1,"pageTitle":"Environments Implementation","url":"docs/designDocs/environments-implementation#storage-of-environment","content":"An environment in Submarine is just a simple text file, so it will be persisted in the Submarine metastore, which is ideally a database. The Docker image is stored in a regular Docker registry, which is handled outside of the system. 
Conda dependencies are stored in a Conda channel (where referenced packages are stored), which will be handled/set up separately. (Popular conda channels are defaults and conda-forge) For more detailed discussion about storage-related implementations, please refer to storage-implementation. "},{"title":"How to implement to make user can easily use Submarine environments?​","type":1,"pageTitle":"Environments Implementation","url":"docs/designDocs/environments-implementation#how-to-implement-to-make-user-can-easily-use-submarine-environments","content":"We like simplicity, and we don't want to leak implementation complexity to users. To make that happen, we have to do some work to hide the complexity. There are two primary uses of environments: experiments and notebooks. For both of them, users should not have to explicitly call conda activate $env_name to activate environments. To make that happen, we can include the following parts in the Dockerfile: FROM ubuntu:18.04 &lt;Include whatever base-libraries like CUDA, etc.&gt; &lt;Make sure conda (with our preferred version) is installed&gt; &lt;Make sure Jupyter (with our preferred version) is installed&gt; # This is just a sample of Dockerfile, users can do more customizations if needed ENTRYPOINT [&quot;/submarine-bootstrap.sh&quot;]  When Submarine Server (this is an implementation detail of Submarine Server; users will not see it at all) launches an experiment or notebook, it will invoke the following docker run command (or an equivalent, such as a K8s spec): docker run &lt;submarine_docker_image&gt; --kernel &lt;kernel_name&gt; -- .... python train.py --batch_size 5 (and other parameters)  Similarly, to launch a notebook: docker run &lt;submarine_docker_image&gt; --kernel &lt;kernel_name&gt; -- .... jupyter  The submarine-bootstrap.sh script is part of the Submarine repo and handles the --kernel argument, invoking conda activate $kernel_name before anything else (such as running the training job). 
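The argument handling described above can be sketched in Python. This is only an illustration and not the actual submarine-bootstrap.sh: the function name build_bootstrap_commands is hypothetical, and the sketch assumes that everything after the -- separator is the user command (the elided .... part of the docker run examples is not modelled).

```python
import shlex


def build_bootstrap_commands(argv):
    '''Turn bootstrap arguments into the shell commands to run, in order.

    Hypothetical helper. Assumed argument layout, taken from the
    docker run examples: --kernel KERNEL_NAME -- USER_COMMAND...
    '''
    if '--' not in argv:
        raise ValueError('expected -- before the user command')
    split_at = argv.index('--')
    options = argv[:split_at]
    user_command = argv[split_at + 1:]
    if not user_command:
        raise ValueError('no user command given after --')

    commands = []
    if '--kernel' in options:
        position = options.index('--kernel')
        if position == len(options) - 1:
            raise ValueError('--kernel needs a value')
        # Activate the Conda environment before anything else.
        commands.append('conda activate ' + shlex.quote(options[position + 1]))

    # Quote each part so the user command survives shell parsing.
    commands.append(' '.join(shlex.quote(part) for part in user_command))
    return commands
```

Returning the commands as a list keeps activation and the user command as separate steps, so a caller can decide how to execute them in a single shell session.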
"},{"title":"Implementation Notes","type":0,"sectionRef":"#","url":"docs/designDocs/implementation-notes","content":"Implementation Notes Before digging into details of implementations, you should read architecture-and-requirements first to understand overall requirements and architecture. Here're sub topics of Submarine implementations: Submarine Storage: How to store metadata, logs, metrics, etc. of Submarine.Submarine Environment: How environments created, managed, stored in Submarine.Submarine Experiment: How experiments managed, stored, and how the predefined experiment template works.Submarine Notebook: How experiments managed, stored, and how the predefined experiment template works.Submarine Server: How Submarine server is designed, architecture, implementation notes, etc. Working-in-progress designs, Below are designs which are working-in-progress, we will move them to the upper section once design &amp; review is finished: Submarine HA Design: How Submarine HA can be achieved, using RAFT, etc.Submarine services deployment module: How to deploy submarine services to k8s or cloud.","keywords":""},{"title":"Notebook Implementation","type":0,"sectionRef":"#","url":"docs/designDocs/notebook-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#overview","content":""},{"title":"User's interaction​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#users-interaction","content":"Users can start N (N &gt;= 0) number of Notebook sessions, a notebook session is a running notebook instance. Notebook session can be launched by Submarine UI (P0), and Submarine CLI (P2). When launch notebook session, users can choose T-shirt size of notebook session (how much mem/cpu/gpu resources, or resource profile such as small, medium, large, etc.). (P0)And user can choose an environment for notebook. 
For more details, please refer to environment implementation. (P0) When starting a notebook, users can choose what code to initialize, similar to an experiment. (P1) Optionally, users can choose to attach a persistent volume to a notebook session. (P2) Users can get a list of notebook sessions that belong to them and connect to a notebook session. Users can choose to terminate a running notebook session. "},{"title":"Admin's interaction​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#admins-interaction","content":"How many concurrent notebook sessions each user can launch is determined by the user's resource quota limits and by the maximum number of concurrent notebook sessions allowed per user. (P2) "},{"title":"Relationship with other components​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#relationship-with-other-components","content":""},{"title":"Metadata store​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#metadata-store","content":"Metadata of running notebook sessions needs to be persisted in Submarine's metadata store (database). 
"},{"title":"Submarine Server​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#submarine-server","content":" +--------------+ +--------Submarine Server--------------------+ |Submarine UI | | +-------------------+ | | |+---&gt; Submarine | | | Notebook | | | Notebook REST API| | +--------------+ | | | | | +--------+----------+ +--------------+ | | | +-&gt;|Metastore | | | +--------v----------+ | |DB | | | | Submarine +--+ +--------------+ | | | Notebook Mgr | | | | | | | | | | | +--------+----------+ | | | | +----------|---------------------------------+ | +--------------+ +--------v---------+ | Notebook Session | | | | instance | | | +------------------+  Once user use Submarine UI to launch a notebook session, Submarine notebook manager inside Submarine Server will persistent notebook session's metadata, and launch a new notebook session instance. "},{"title":"Resource manager​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#resource-manager","content":"When using K8s as resource manager, Submarine notebook session will run as a new POD. "},{"title":"Storage​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#storage","content":"There're several different types of storage requirements for Submarine notebook. For code, environment, etc, storage, please refer to storage implementation, check &quot;Localization of experiment/notebook/model-serving code&quot;. When there're needs to attach volume (such as user's home folder) to Submarine notebook session, please check storage implementation, check &quot;Attachable volume&quot;. "},{"title":"Environment​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#environment","content":"Submarine notebook's environment should be used to run experiment, model serving, etc. Please check environment implementation. 
(For notebook-specific details, please check &quot;How to implement to make user can easily use Submarine environments&quot;) Please note that a notebook's environment should include the right version of the notebook libraries, and admins should follow the guidance to build the correct Docker image and Conda libraries to run the notebook correctly. "},{"title":"Submarine SDK (For Experiment, etc.)​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#submarine-sdk-for-experiment-etc","content":"Users can run new experiments, access metrics information, or do model operations using Submarine SDK. Submarine SDK is a Python library that talks to Submarine Server; it needs Submarine Server's endpoint as well as user credentials. To ensure a better experience, we recommend always installing the proper version of Submarine SDK in the environment, so that users can use Submarine SDK directly from the command line. (We as the Submarine community can provide a sample Dockerfile or Conda environment that has the correct base libraries installed for Submarine SDK). The Submarine Server IP will be configured automatically by Submarine Server and added as an environment variable when the Submarine notebook session is launched. "},{"title":"Security​","type":1,"pageTitle":"Notebook Implementation","url":"docs/designDocs/notebook-implementation#security","content":"Please refer to Security Implementation Once a user has access to a running notebook session, the user can also access the notebook's resources, submit new experiments, and access data. This is very dangerous, so we have to protect it. A simple solution is to use token-based authentication https://jupyter-notebook.readthedocs.io/en/stable/security.html. A more common way is to use solutions like KNOX to support SSO. We need to expand this section with more details. (TODO). 
"},{"title":"Experiment Implementation","type":0,"sectionRef":"#","url":"docs/designDocs/experiment-implementation","content":"","keywords":""},{"title":"Overview​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#overview","content":"This document talks about implementation of experiment, flows and design considerations. Experiment consists of following components, also interact with other Submarine or 3rd-party components, showing below:  +---------------------------------------+ +----------+ | Experiment Tasks | |Run | | | |Configs | | +----------------------------------+ | +----------+ | | Experiment Runnable Code | | +-----------------+ +----------+ | | | | |Output Artifacts | |Input Data| | | (Like train-job.py) | | |(Models, etc.) | | | | +----------------------------------+ | +-----------------+ | | | +----------------------------------+ | +----------+ | | Experiment Deps (Like Python) | | +-------------+ | +----------------------------------+ | |Logs/Metrics | | +----------------------------------+ | | | | | OS, Base Libaries (Like CUDA) | | +-------------+ | +----------------------------------+ | +---------------------------------------+ ^ | (Launch Task with resources) + +---------------------------------+ |Resource Manager (K8s/Cloud)| +---------------------------------+  As showing in the above diagram, Submarine experiment consists of the following items: On the left side, there're input data and run configs.In the middle box, they're experiment tasks, it could be multiple tasks when we run distributed training, pipeline, etc. There're main runnable code, such as train.py for the training main entry point.The two boxes below: experiment dependencies and OS/Base libraries we called Submarine Environment Profile or Environment for short. Which defined what is the basic libraries to run the main experiment code.Experiment tasks are launched by Resource Manager, such as K8s/Cloud or just launched locally. 
There are resource constraints for each experiment task (e.g. how much memory, how many cores and GPUs, how much disk, etc. can be used by the task). On the right side are the artifacts generated by experiments: Output artifacts: the main output of the experiment, which could be model(s), or output data when we do batch prediction.Logs/Metrics for further troubleshooting or understanding of the experiment's quality. For the rest of the design doc, we will talk about how we handle environment, code, and manage output/logs, etc. "},{"title":"API of Experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#api-of-experiment","content":"This is not a full definition of an experiment; for more details, please refer to the experiment API. Here is just an example of an experiment object, which helps developers understand what is included in an experiment. experiment: name: &quot;abc&quot; type: &quot;script&quot; environment: &quot;team-default-ml-env&quot; code: sync_mode: s3 url: &quot;s3://bucket/training-job.tar.gz&quot; parameter: &gt; python training.py --iteration 10 --input=s3://bucket/input output=s3://bucket/output resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=2&quot; timeout: &quot;30 mins&quot;  This defines a &quot;script&quot; experiment named &quot;abc&quot;; the name can be used to track the experiment. The environment &quot;team-default-ml-env&quot; is specified to make sure the dependencies of the job can be downloaded properly before the job is executed. code defines where the experiment code will be downloaded from; we will support several sync_mode values such as s3 (or abfs/hdfs), git, etc. 
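As a rough illustration only, the experiment object above can be viewed as a plain Python dictionary. The sketch below is not a Submarine API: parse_resources is a hypothetical helper, and the resource_constraint value is simplified to the bare string from the example.

```python
# Hypothetical sketch, not an official Submarine API.
# Field names are copied from the example experiment object above.
experiment = {
    'name': 'abc',
    'type': 'script',
    'environment': 'team-default-ml-env',
    'code': {
        'sync_mode': 's3',
        'url': 's3://bucket/training-job.tar.gz',
    },
    # Simplified: only the quoted resource string from the example.
    'resource_constraint': 'mem=20gb, vcore=3, gpu=2',
    'timeout': '30 mins',
}


def parse_resources(res):
    '''Split a string such as mem=20gb, vcore=3, gpu=2 into a dict.'''
    pairs = (item.strip().split('=', 1) for item in res.split(','))
    return {key: value for key, value in pairs}


resources = parse_resources(experiment['resource_constraint'])
print(resources)
```

Keeping the resource string in one field and parsing it on the server side is only one possible design; a structured mapping would work equally well.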
Different types of experiments will have different specs. For example, a distributed TensorFlow spec may look like: experiment: name: &quot;abc-distributed-tf&quot; type: &quot;distributed-tf&quot; ps: environment: &quot;team-default-ml-cpu&quot; resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=0&quot; worker: environment: &quot;team-default-ml-gpu&quot; resource_constraint: res=&quot;mem=20gb, vcore=3, gpu=2&quot; code: sync_mode: git url: &quot;https://foo.com/training-job.git&quot; parameter: &gt; python /code/training-job/training.py --iteration 10 --input=s3://bucket/input output=s3://bucket/output tensorboard: enabled timeout: &quot;30 mins&quot;  Since the two roles use different Docker images, one with GPU and one without, we can specify a different environment and resource constraint for each. "},{"title":"Manage environments for experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#manage-environments-for-experiment","content":"Please refer to environment-implementation.md for more details. "},{"title":"Manage storages for experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#manage-storages-for-experiment","content":"There are different types of storage, such as logs, metrics, and dependencies (environments). Please refer to storage-implementations for more details. This also includes how to manage experiment code. 
"},{"title":"Manage Pre-defined experiment libraries​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#manage-pre-defined-experiment-libraries","content":""},{"title":"Flow: Submit an experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#flow-submit-an-experiment","content":""},{"title":"Submit via SDK Flows.​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#submit-via-sdk-flows","content":"To better understand experiment implementation, It will be good to understand what is the steps of experiment submission. Please note that below code is just pseudo code, not official APIs. "},{"title":"Specify what environment to use​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#specify-what-environment-to-use","content":"Before submit the environment, you have to choose what environment to choose. Environment defines dependencies, etc. of an experiment or a notebook. might looks like below: conda_environment = &quot;&quot;&quot; name: conda-env channels: - defaults dependencies: - asn1crypto=1.3.0=py37_0 - blas=1.0=mkl - ca-certificates=2020.1.1=0 - certifi=2020.4.5.1=py37_0 - cffi=1.14.0=py37hb5b8e2f_0 - chardet=3.0.4=py37_1003 prefix: /opt/anaconda3/envs/conda-env &quot;&quot;&quot; # This environment can be different from notebook's own environment environment = create_environment { DockerImage = &quot;ubuntu:16&quot;, CondaEnvironment = conda_environment }  To better understand how environment works, please refer to environment-implementation. 
"},{"title":"Create experiment, specify where's training code located, and parameters.​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#create-experiment-specify-wheres-training-code-located-and-parameters","content":"For ad-hoc experiment (code located at S3), assume training code is part of the training-job.tar.gz and main class is train.py. When the job is launched, whatever specified in the localize_artifacts will be downloaded. experiment = create_experiment { Environment = environment, ExperimentConfig = { type = &quot;adhoc&quot;, localize_artifacts = [ &quot;s3://bucket/training-job.tar.gz&quot; ], name = &quot;abc&quot;, parameter = &quot;python training.py --iteration 10 --input=&quot;s3://bucket/input output=&quot;s3://bucket/output&quot;, } } experiment.run() experiment.wait_for_finish(print_output=True)  Run notebook file in offline mode​ It is possible we want to run a notebook file in offline mode, to do that, here's code to use to run a notebook code experiment = create_experiment { Environment = environment, ExperimentConfig = { type = &quot;adhoc&quot;, localize_artifacts = [ &quot;s3://bucket/folder/notebook-123.ipynb&quot; ], name = &quot;abc&quot;, parameter = &quot;runipy training.ipynb --iteration 10 --input=&quot;s3://bucket/input output=&quot;s3://bucket/output&quot;, } } experiment.run() experiment.wait_for_finish(print_output=True)  Run pre-defined experiment library​ experiment = create_experiment { # Here you can use default environment of library Environment = environment, ExperimentConfig = { type = &quot;template&quot;, name = &quot;abc&quot;, # A unique name of template template = &quot;deepfm_ctr&quot;, # yaml file defined what is the parameters need to be specified. parameter = { Input: &quot;S3://.../input&quot;, Output: &quot;S3://.../output&quot; Training: { &quot;batch_size&quot;: 512, &quot;l2_reg&quot;: 0.01, ... 
} } } } experiment.run() experiment.wait_for_finish(print_output=True)  "},{"title":"Summarize: Experiment v.s. Notebook session​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#summarize-experiment-vs-notebook-session","content":"There's a common misunderstanding about the differences between running an experiment and running a task from a notebook session. We will talk about the differences and commonalities: Differences \tExperiment\tNotebook SessionRun mode\tOffline\tInteractive Output Artifacts (a.k.a model)\tPersisted in a shared storage (like S3/NFS)\tLocal in the notebook session container, could be ephemeral Run history (meta, logs, metrics)\tMeta/logs/metrics can be traced from experiment UI (or corresponding API)\tNo run history can be traced from Submarine UI/API. Can view the current running paragraph's log/metrics, etc. What to run?\tCode from Docker image or shared storage (like Tarball on S3, Github, etc.)\tLocal in the notebook's paragraph Commonalities \tExperiment &amp; Notebook SessionEnvironment\tThey can share the same Environment configuration "},{"title":"Experiment-related modules inside Submarine-server​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#experiment-related-modules-inside-submarine-server","content":"(Please refer to architecture of submarine server for more details) "},{"title":"Experiment Manager​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#experiment-manager","content":"The experiment manager receives experiment requests, persists the experiment metadata in a database (e.g. MySQL), and invokes subsequent modules to submit and monitor the experiment's execution. 
"},{"title":"Compute Cluster Manager​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#compute-cluster-manager","content":"After experiment accepted by experiment manager, based on which cluster the experiment intended to run (like mentioned in the previous sections, Submarine supports to manage multiple compute clusters), compute cluster manager will returns credentials to access the compute cluster. It will also be responsible to create a new compute cluster if needed. For most of the on-prem use cases, there's only one cluster involved, for such cases, ComputeClusterManager returns credentials to access local cluster if needed. "},{"title":"Experiment Submitter​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#experiment-submitter","content":"Experiment Submitter handles different kinds of experiments to run (e.g. ad-hoc script, distributed TF, MPI, pre-defined templates, Pipeline, AutoML, etc.). And such experiments can be managed by different resource management systems (e.g. K8s, container cloud, etc.) To meet the requirements to support variant kinds of experiments and resource managers, we choose to use plug-in modules to support different submitters (which requires jars to submarine-server’s classpath). To avoid jars and dependencies of plugins break the submarine-server, the plug-ins manager, or both. To solve this issue, we can instantiate submitter plug-ins using a classloader that is different from the system classloader. Submitter Plug-ins​ Each plug-in uses a separate module under the server-submitter module. As the default implements, we provide for K8s. The submitter-k8s plug-in is used to submit the job to Kubernetes cluster and use the operator as the runtime. The submitter-k8s plug-in implements the operation of CRD object and provides the java interface. In the beginning, we use the tf-operator for the TensorFlow. 
If Submarine wants to support other resource management systems in the future, such as submarine-docker-cluster (Submarine uses the Raft algorithm to create a Docker cluster on the Docker runtime environment across multiple servers, providing the most lightweight resource scheduling system for small-scale users), we should create a new plug-in module named submitter-docker under the server-submitter module. "},{"title":"Experiment Monitor​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#experiment-monitor","content":"The monitor tracks the experiment life cycle and records the main events and key information at runtime. As the experiment run progresses, metrics are needed to evaluate the ongoing success or failure of the execution. To adapt to different cluster resource management systems, we need a generic metric info structure, which each submitter plug-in should inherit and complete by itself. "},{"title":"Invoke flows of experiment-related components​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#invoke-flows-of-experiment-related-components","content":" +-----------------+ +----------------+ +----------------+ +-----------------+ |Experiments | |Compute Cluster | |Experiment | | Experiment | |Mgr | |Mgr | |Submitter | | Monitor | +-----------------+ +----------------+ +----------------+ +-----------------+ + + + + User | | | | Submit |+-------------------------------------&gt;+ + Xperiment| Use submitter.validate(spec) | | | to validate spec and create | | | experiment object (state- | | | machine). 
| | | | | | The experiment manager will | | | persist meta-data to Database| | | | | | | | + + |+-----------------&gt; + | | | Submit Experiments| | | | To ComputeCluster| | | | Mgr, get existing|+----------------&gt;| | | cluster, or | Use Submitter | | | create a new one.| to submit |+---------------&gt; | | | Different kinds | Once job is | | | of experiments | submitted, use |+----+ | | to k8s, etc| monitor to get | | | | | status updates | | | | | | | Monitor | | | | | Xperiment | | | | | status | | | | | |&lt;--------------------------------------------------------+| | | | | | | | Update Status back to Experiment | | | | Manager | |&lt;----+ | | | | | | | | | | | | v v v v  TODO: add more details about template, environment, etc. "},{"title":"Common modules of experiment/notebook-session/model-serving​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#common-modules-of-experimentnotebook-sessionmodel-serving","content":"Experiment/notebook-session/model-serving share a lot of commonalities. All of them: Are workloads running on K8s.Need to persist metadata to the DB.Need to monitor task/service running status from the resource management system. We need to keep their implementations loosely coupled, but at the same time share building blocks as much as possible (e.g. submitting PodSpecs to K8s, monitoring status, getting logs, etc.) to reduce duplication. "},{"title":"Support Predefined-experiment-templates​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#support-predefined-experiment-templates","content":"A predefined experiment template is a way to save data scientists from repeatedly entering parameters, which is error-prone and makes for a poor user experience. 
"},{"title":"Predefined-experiment-template API to run experiment​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#predefined-experiment-template-api-to-run-experiment","content":"Predefined experiment template consists a list of parameters, each of the parameter has 4 properties: Key\tRequired\tDefault Value\tDescriptionName of the key\ttrue/false\tWhen required = false, a default value can be provided by the template\tDescription of the parameter For the example of deepfm CTR training experiment mentioned in the architecture-and-requirements.md { &quot;input&quot;: { &quot;train_data&quot;: [&quot;hdfs:///user/submarine/data/tr.libsvm&quot;], &quot;valid_data&quot;: [&quot;hdfs:///user/submarine/data/va.libsvm&quot;], &quot;test_data&quot;: [&quot;hdfs:///user/submarine/data/te.libsvm&quot;], &quot;type&quot;: &quot;libsvm&quot; }, &quot;output&quot;: { &quot;save_model_dir&quot;: &quot;hdfs:///user/submarine/deepfm&quot;, &quot;metric&quot;: &quot;auc&quot; }, &quot;training&quot;: { &quot;batch_size&quot; : 512, &quot;field_size&quot;: 39, &quot;num_epochs&quot;: 3, &quot;feature_size&quot;: 117581, ... } }  The template will be (in yaml format): # deepfm.ctr template name: deepfm.ctr author: description: &gt; This is a template to run CTR training using deepfm algorithm, by default it runs single node TF job, you can also overwrite training parameters to use distributed training. parameters: - name: input.train_data required: true description: &gt; train data is expected in SVM format, and can be stored in HDFS/S3 ... - name: training.batch_size required: false default: 32 description: This is batch size of training  The batch format can be used in UI/API. 
"},{"title":"Handle Predefined-experiment-template from server side​","type":1,"pageTitle":"Experiment Implementation","url":"docs/designDocs/experiment-implementation#handle-predefined-experiment-template-from-server-side","content":"Please note that, the conversion of predefined-experiment-template will be always handled by server. The invoke flow looks like:  +------------Submarine Server -----------------------+ +--------------+ | +-----------------+ | |Client |+-------&gt;|Experimment Mgr | | | | | | | | +--------------+ | +-----------------+ | | + | Submit | +-------v---------+ Get Experiment Template | Template | |Experiment |&lt;-----+From pre-registered | Parameters | |Template Registry| Templates | to Submarine | +-------+---------+ | Server | | | | +-------v---------+ +-----------------+ | | |Deepfm CTR Templ-| |Experiment- | | | |ate Handler +------&gt;|Tensorflow | | | +-----------------+ +--------+--------+ | | | | | | | | +--------v--------+ | | |Experiment | | | |Submitter | | | +--------+--------+ | | | | | | | | +--------v--------+ | | | | | | | ...... | | | +-----------------+ | | | +----------------------------------------------------+  Basically, from Client, it submitted template parameters to Submarine Server, inside submarine server, it finds the corresponding template handler based on the name. And the template handler converts input parameters to an actual experiment, such as a distributed TF experiment. After that, it goes the similar route to validate experiment spec, compute cluster manager, etc. to get the experiment submitted and monitored. 
Predefined-experiment-template is able to create any kind of experiment. For example, it could be a pipeline:  +-----------------+ +------------------+ |Template XYZ | | XYZ Template | | |+---------------&gt; | Handler | +-----------------+ +------------------+ + | | | | v +--------------------+ +------------------+ | +-----------------+| | Predefined | | | Split Train/ ||&lt;----+| Pipeline | | | Test data || +------------------+ | +-------+---------+| | | | | +-------v---------+| | | Spark Job ETL || | | || | +-------+---------+| | | | | +-------v---------+| | | Train using || | | XGBoost || | +-------+---------+| | | | | +-------v---------+| | | Validate Train || | | Results || | +-----------------+| | | +--------------------+  Templates can also be chained to reuse other template handlers:  +-----------------+ +------------------+ |Template XYZ | | XYZ Template | | |+---------------&gt; | Handler | +-----------------+ +------------------+ + | v +------------------+ +------------------+ |Distributed | | ABC Template | |TF Experiment |&lt;----+| Handler | +------------------+ +------------------+  A template handler is a callable class inside Submarine Server with a standard interface defined as follows: interface ExperimentTemplateHandler { ExperimentSpec createExperiment(TemplatedExperimentParameters param); }  Users should not have to write code when they want to add a new template; we should provide several standard template handlers that cover most template handling. Experiment templates can be registered/updated/deleted via Submarine Server's REST API, which needs to be discussed separately in the doc. 
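The server-side conversion described in this section can be sketched roughly as follows. This is an illustrative Python sketch, not the Java implementation: TEMPLATES, HANDLERS, resolve_parameters, deepfm_ctr_handler and create_experiment are hypothetical names, and only the two deepfm.ctr parameters shown earlier are modelled.

```python
# Hypothetical sketch of server-side template handling; all names here
# are illustrative and are not part of Submarine Server.
TEMPLATES = {
    'deepfm.ctr': [
        {'name': 'input.train_data', 'required': True},
        {'name': 'training.batch_size', 'required': False, 'default': 32},
    ],
}


def resolve_parameters(template_name, user_params):
    '''Check required parameters and fill in the template defaults.'''
    resolved = {}
    for param in TEMPLATES[template_name]:
        key = param['name']
        if key in user_params:
            resolved[key] = user_params[key]
        elif param['required']:
            raise ValueError('missing required parameter: ' + key)
        else:
            resolved[key] = param.get('default')
    return resolved


def deepfm_ctr_handler(params):
    '''Convert resolved template parameters into an experiment spec.'''
    return {'name': 'deepfm-ctr', 'type': 'distributed-tf',
            'parameter': params}


# Registry that maps a template name to its handler.
HANDLERS = {'deepfm.ctr': deepfm_ctr_handler}


def create_experiment(template_name, user_params):
    '''Find the handler by template name and build the experiment spec.'''
    handler = HANDLERS[template_name]
    return handler(resolve_parameters(template_name, user_params))


spec = create_experiment(
    'deepfm.ctr',
    {'input.train_data': ['hdfs:///user/submarine/data/tr.libsvm']},
)
print(spec['parameter']['training.batch_size'])
```

A plain name-to-handler mapping keeps the lookup simple; chained templates could be modelled by letting one handler call create_experiment for another template.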
(TODO) "},{"title":"Storage Implementation","type":0,"sectionRef":"#","url":"docs/designDocs/storage-implementation","content":"","keywords":""},{"title":"ML-related objects and their storages​","type":1,"pageTitle":"Storage Implementation","url":"docs/designDocs/storage-implementation#ml-related-objects-and-their-storages","content":"First let's look at what user will interact for most of the time: Notebook ExperimentModel Servings  +---------+ +------------+ |Logs |&lt;--+|Notebook | +----------+ +---------+ +------------+ +----------------+ |Trackings | &lt;-+|Experiment |&lt;--+&gt;|Model Artifacts | +----------+ +-----------------+ +------------+ +----------------+ +----------+&lt;---+|ML-related Metric|&lt;--+Servings | |tf.events | +-----------------+ +------------+ +----------+ ^ +-----------------+ + | Environments | +----------------------+ | | +-----------------+ | Submarine Metastore | | Dependencies | |Code | +----------------------+ | | +-----------------+ |Experiment Meta | | Docker Images | +----------------------+ +-----------------+ |Model Store Meta | +----------------------+ |Model Serving Meta | +----------------------+ |Notebook meta | +----------------------+ |Experiment Templates | +----------------------+ |Environments Meta | +----------------------+  First of all, all the notebook-sessions / experiments / model-serving instances) are more or less interact with following storage objects: Logs for these tasks for troubleshooting. ML-related metrics such as loss, epoch, etc. (in contrast of system metrics such as CPU/memory usage, etc.) There're different types of ML-related metrics, for Tensorflow/pytorch, they can use tf.events and get visualizations on tensorboard. Or they can use tracking APIs (such as Submarine tracking, mlflow tracking, etc.) to output customized tracking results for non TF/Pytorch workloads. 
Training jobs of an experiment typically generate model artifacts (files) which need to be persisted, and both notebooks and model serving need to load model artifacts from persistent storage. There are various kinds of meta information, such as experiment meta, model registry, model serving, notebook, experiment, environment, etc. We need to be able to read this meta information back.We also have code for experiments (like training/batch-prediction), notebooks (ipynb), and model serving.Notebooks/experiments/model-serving also depend on environments (dependencies such as pip packages, and Docker images). "},{"title":"Implementation considerations for ML-related objects​","type":1,"pageTitle":"Storage Implementation","url":"docs/designDocs/storage-implementation#implementation-considerations-for-ml-related-objects","content":"Object Type\tCharacteristics\tWhere to storeMetrics: tf.events\tTime series data with k/v, appendable to file\tLocal/EBS, HDFS, Cloud Blob Storage Metrics: other tracking metrics\tTime series data with k/v, appendable to file\tLocal, HDFS, Cloud Blob Storage, Database Logs\tLarge volumes, #files are potentially huge.\tLocal (temporary), HDFS (need aggregation), Cloud Blob Storage Submarine Metastore\tCRUD operations for small meta data.\tDatabase Model Artifacts\tSize varies for model (from KBs to GBs). #files are potentially huge.\tHDFS, Cloud Blob Storage Code\tNeed version control. (Please find detailed discussions below for code storage and localization)\tTarball on HDFS/Cloud Blob Storage, or Git Environment (Dependencies, Docker Image) Public/private environment repo (like Conda channel), Docker registry. 
"},{"title":"Detailed discussions​","type":1,"pageTitle":"Storage Implementation","url":"docs/designDocs/storage-implementation#detailed-discussions","content":"Store code for experiment/notebook/model-serving​ There're following ways to get experiment code: 1) Code is part of Git repo: (Recommended) This is our recommended approach, once code is part of Git, it will be stored in version control, any change will be tracked, and much easier for users to trace back what change triggered a new bug, etc. 2) Code is part of Docker image: This is an anti-pattern and we will NOT recommend you to use it, Docker image can be used to include ANYTHING, like dependencies, the code you will execute, or even data. But this doesn't mean you should do it. We recommend to use Docker image ONLY for libraries/dependencies. Making code to be part of Docker image makes hard to edit code (if you want to update a value in your Python file, you will have to recreate the Docker image, push it and rerun it). 3) Code is part of S3/HDFS/ABFS: User may want to store their training code to a tarball on a shared storage. Submarine need to download code from remote storage to the launched container before running the code. Localization of experiment/notebook/model-serving code​ To make user experiences keeps same across different environment, we will localize code to a same folder after the container is launched, preferably /code For example, there's a git repo need to be synced up for an experiment/notebook/model-serving (example above): experiment: #Or notebook, model-serving name: &quot;abc&quot;, environment: &quot;team-default-ml-env&quot; ... (other fields) code: sync_mode: git url: &quot;https://foo.com/training-job.git&quot;  After localize, training-job/ will be placed under /code When we running on K8s environment, we can use K8s's initContainer and emptyDir to do these things for us. 
K8s Pod spec (generated by Submarine server rather than by the user; users should NEVER edit the K8s spec, since that is too unfriendly to data scientists): apiVersion: v1 kind: Pod metadata: name: experiment-abc spec: containers: - name: experiment-task image: training-job volumeMounts: - name: code-dir mountPath: /code initContainers: - name: git-localize image: git-sync command: &quot;git clone .. /code/&quot; volumeMounts: - name: code-dir mountPath: /code volumes: - name: code-dir emptyDir: {}  The above K8s spec creates a code-dir volume and mounts it at /code in the launched containers. The initContainer git-localize uses https://github.com/kubernetes/git-sync to do the sync. (If other storage such as S3 is used, we can use a similar initContainer approach to download the contents.) 
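To make the idea of a server-generated Pod spec more concrete, here is a minimal Python sketch that builds the spec above from an experiment's code section. It rests on several assumptions: build_pod_spec and its arguments are hypothetical, only the git sync mode is handled, and the clone command is a simplification rather than the real git-sync interface.

```python
# Hypothetical sketch: generate the Pod spec from the code section of an
# experiment. Not Submarine code; the clone command is simplified.
def build_pod_spec(name, image, code, code_dir='/code'):
    '''Return a Pod spec dict that localizes code with an initContainer.'''
    if code['sync_mode'] != 'git':
        raise ValueError('only git is sketched here: ' + code['sync_mode'])
    mount = {'name': 'code-dir', 'mountPath': code_dir}
    return {
        'apiVersion': 'v1',
        'kind': 'Pod',
        'metadata': {'name': 'experiment-' + name},
        'spec': {
            'containers': [
                {'name': 'experiment-task', 'image': image,
                 'volumeMounts': [dict(mount)]},
            ],
            'initContainers': [
                # Cloning from inside code_dir places training-job/ under it.
                {'name': 'git-localize', 'image': 'git-sync',
                 'workingDir': code_dir,
                 'command': ['git', 'clone', code['url']],
                 'volumeMounts': [dict(mount)]},
            ],
            'volumes': [{'name': 'code-dir', 'emptyDir': {}}],
        },
    }


pod = build_pod_spec(
    'abc', 'training-job',
    {'sync_mode': 'git', 'url': 'https://foo.com/training-job.git'},
)
print(pod['spec']['initContainers'][0]['command'])
```

Because the emptyDir volume is shared by the initContainer and the main container, whatever the initContainer downloads is visible to the experiment task when it starts.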
Downside of attachable volume is, it is not versioned, even notebook is mainly used for adhoc exploring tasks, an unversioned notebook file can lead to maintenance issues in the future. Since this is a common requirement, we can consider to support attachable volumes in Submarine in a long run, but with relatively lower priority. "},{"title":"In-scope / Out-of-scope​","type":1,"pageTitle":"Storage Implementation","url":"docs/designDocs/storage-implementation#in-scope--out-of-scope","content":"Describe what Submarine project should own and what Submarine project should NOT own. "},{"title":"Generic Experiment Spec","type":0,"sectionRef":"#","url":"docs/designDocs/submarine-server/experimentSpec","content":"","keywords":""},{"title":"Motivation​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/designDocs/submarine-server/experimentSpec#motivation","content":"As the machine learning platform, the submarine should support multiple machine learning frameworks, such as Tensorflow, Pytorch etc. But different framework has different distributed components for the training experiment. So that we designed a generic experiment spec to abstract the training experiment across different frameworks. In this way, the submarine-server can hide the complexity of underlying infrastructure differences and provide a cleaner interface to manager experiments "},{"title":"Proposal​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/designDocs/submarine-server/experimentSpec#proposal","content":"Considering the Tensorflow and Pytorch framework, we propose one spec which consists of library spec, submitter spec and task specs etc. 
For example: name: &quot;mnist&quot; librarySpec: name: &quot;TensorFlow&quot; version: &quot;2.1.0&quot; image: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; cmd: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot; envVars: ENV_1: &quot;ENV1&quot; submitterSpec: type: &quot;k8s&quot; namespace: &quot;submarine&quot; taskSpecs: Ps: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot; Worker: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot;  "},{"title":"Library Spec​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/designDocs/submarine-server/experimentSpec#library-spec","content":"The library spec describes the machine learning framework. The fields are as follows: field\ttype\toptional\tdescriptionname\tstring\tNO\tMachine learning framework name. Only &quot;tensorflow&quot; and &quot;pytorch&quot; are supported. The value is case-insensitive. version\tstring\tNO\tThe version of the ML framework. For example: 2.1.0 image\tstring\tNO\tThe shared image, used for each task that does not specify its own. For example: apache/submarine cmd\tstring\tYES\tThe shared entry command, used for each task that does not specify its own. envVars\tkey/value\tYES\tThe shared environment variables, used for each task that does not specify its own. "},{"title":"Submitter Spec​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/designDocs/submarine-server/experimentSpec#submitter-spec","content":"The submitter spec describes the submitter specified by the user, such as k8s. The fields are as follows: field\ttype\toptional\tdescriptiontype\tstring\tNO\tThe submitter type; only k8s is supported for now configPath\tstring\tYES\tThe config path of the specified resource manager. You can set it in submarine-site.xml if you run submarine-server locally namespace\tstring\tNO\tIt's known as namespace in Kubernetes. 
kind\tstring\tYES\tUsed by the k8s submitter; supports TFJob and PyTorchJob apiVersion\tstring\tYES\tIt should pair with the kind; for example, the TFJob's API version is kubeflow.org/v1 "},{"title":"Task Spec​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/designDocs/submarine-server/experimentSpec#task-spec","content":"The task spec describes the tasks that make up the experiment, so it must be specified when submitting the experiment. All the tasks should be put into a key/value collection. For example: taskSpecs: Ps: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot; Worker: name: tensorflow replicas: 2 resources: &quot;cpu=4,memory=2048M,nvidia.com/gpu=1&quot;  The fields are as follows: field\ttype\toptional\tdescriptionname\tstring\tYES\tThe experiment name; if not specified, the library name is used image\tstring\tYES\tThe experiment docker image cmd\tstring\tYES\tThe entry command for running the task envVars\tkey/value\tYES\tThe environment variables for the task resources\tstring\tNO\tThe resource limit for the task. Format: cpu=%s,memory=%s,nvidia.com/gpu=%s "},{"title":"Implements​","type":1,"pageTitle":"Generic Experiment Spec","url":"docs/designDocs/submarine-server/experimentSpec#implements","content":"For more info see SUBMARINE-321 "},{"title":"Security Implementation","type":0,"sectionRef":"#","url":"docs/designDocs/wip-designs/security-implementation","content":"","keywords":""},{"title":"Handle User's Credential​","type":1,"pageTitle":"Security Implementation","url":"docs/designDocs/wip-designs/security-implementation#handle-users-credential","content":"User credentials include Kerberos keytabs, Docker registry credentials, GitHub SSH keys, etc. User credentials must be stored securely, for example, via KeyCloak or K8s Secrets. 
(More details TODO) "},{"title":"Submarine Server Implementation","type":0,"sectionRef":"#","url":"docs/designDocs/submarine-server/architecture","content":"","keywords":""},{"title":"Architecture Overview​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#architecture-overview","content":" +---------------Submarine Server ---+ | | | +------------+ +------------+ | | |Web Svc/Prxy| |Backend Svc | | +--Submarine Asset + | +------------+ +------------+ | |Project/Notebook | | ^ ^ | |Model/Metrics | +---|---------|---------------------+ |Libraries/Dataset | | | +------------------+ | | | +--|-Compute Cluster 1---+ +--Image Registry--+ + | | | | User's Images | User / | + | | | Admin | User Notebook Instance | +------------------+ | Experiment Runs | +------------------------+ +-Data Storage-----+ | S3/HDFS, etc. | +----Compute Cluster 2---+ | | +------------------+ ...  Here's a diagram to illustrate the Submarine's deployment. Submarine Server consists of web service/proxy, and backend services. They're like &quot;control planes&quot; of Submarine, and users will interact with these services.Submarine server could be a microservice architecture and can be deployed to one of the compute clusters. (see below, this will be useful when we only have one cluster).There're multiple compute clusters that could be used by Submarine service. For user's running notebook instance, jobs, etc. they will be placed to one of the compute clusters by user's preference or defined policies.Submarine's asset includes project/notebook(content)/models/metrics/dataset-meta, etc. can be stored inside Submarine's own database.Datasets can be stored in various locations such as S3/HDFS.Users can push container (such as Docker) images to a preconfigured registry in Submarine, so Submarine service can know how to pull required container images.Image Registry/Data-Storage, etc. 
are outside of Submarine server's scope and should be managed by 3rd party applications. "},{"title":"Submarine Server and its APIs​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#submarine-server-and-its-apis","content":"Submarine server is designed to allow data scientists to access notebooks, submit/manage jobs, manage models, create model training workflows, access datasets, etc. Submarine Server exposes a UI and a REST API. Users can also use the CLI / SDK to manage assets inside Submarine Server.  +----------+ | CLI |+---+ +----------+ v +----------------+ +--------------+ | Submarine | +----------+ | REST API | | | | SDK |+&gt;| |+&gt; Server | +----------+ +--------------+ | | ^ +----------------+ +----------+ | | UI |+---+ +----------+  The REST API is used by the other three access methods (CLI, SDK, and UI). The REST API Service handles HTTP requests and is responsible for authentication. It acts as the caller for the JobManager component. The REST component defines the generic job spec, which describes the job in detail. For more details, refer to here. (Please note that we're converting the REST endpoint description from the Java-based REST API to a swagger definition. Once that is done, we should replace the link with the swagger definition spec). 
"},{"title":"Proposal​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#proposal","content":" +-----------+ | | | workbench +---+ +----------------------------------+ | | | | +------+ +---------------------+ | +-----------+ | | | | | +-------+ | | +---------------------+ | | | | | | K8s | | | | +--------+ +----+ | +-----------+ | | | | | +-------+ | | | | +--&gt;+job1| | | | | | | | | submitter | | | | | +----+ | | CLI +------&gt;+ | REST | +---------------------+ +----&gt;+ |operator| +----+ | | | | | | | +---------------------+ | | | +--&gt;+job2| | +-----------+ | | | | | +-------+ +-------+ | | | +--------+ +----+ | | | | | | |PlugMgr| |monitor| | | | K8s Cluster | +-----------+ | | | | | +-------+ +-------+ | | +---------------------+ | | | | | | | JobManager | | | SDK +---+ | +------+ +---------------------+ | | | +----------------------------------+ +-----------+ client server  We propose to split the original core module in the old layout into two modules, CLI and server as shown in FIG. The submarine-client calls the REST APIs to submit and retrieve the job info. The submarine-server provides the REST service, job management, submitting the job to cluster, and running job in different clusters through the corresponding runtime. 
"},{"title":"Submarine Server Components​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#submarine-server-components","content":" +----------------------Submarine Server--------------------------------+ | +-----------------+ +------------------+ +--------------------+ | | | Experiment | |Notebook Session | |Environment Mgr | | | | Mgr | |Mgr | | | | | +-----------------+ +------------------+ +--------------------+ | | | | +-----------------+ +------------------+ +--------------------+ | | | Model Registry | |Model Serving Mgr | |Compute Cluster Mgr | | | | | | | | | | | +-----------------+ +------------------+ +--------------------+ | | | | +-----------------+ +------------------+ +--------------------+ | | | DataSet Mgr | |User/Team | |Metadata Mgr | | | | | |Permission Mgr | | | | | +-----------------+ +------------------+ +--------------------+ | +----------------------------------------------------------------------+  "},{"title":"Experiment Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#experiment-manager","content":"TODO "},{"title":"Notebook Sessions Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#notebook-sessions-manager","content":"TODO "},{"title":"Environment Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#environment-manager","content":"TODO "},{"title":"Model Registry​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#model-registry","content":"TODO "},{"title":"Model Serving Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#model-serving-manager","content":"TODO "},{"title":"Compute Cluster Manager​","type":1,"pageTitle":"Submarine Server 
Implementation","url":"docs/designDocs/submarine-server/architecture#compute-cluster-manager","content":"TODO "},{"title":"Dataset Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#dataset-manager","content":"TODO "},{"title":"User/team permissions manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#userteam-permissions-manager","content":"TODO "},{"title":"Metadata Manager​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#metadata-manager","content":"TODO "},{"title":"Components/services outside of Submarine Server's scope​","type":1,"pageTitle":"Submarine Server Implementation","url":"docs/designDocs/submarine-server/architecture#componentsservices-outside-of-submarine-servers-scope","content":"TODO: Describe the out-of-scope components, which should be handled and managed outside of Submarine server. Candidates are: Identity management, data storage, metastore storage, etc. "},{"title":"Submarine Launcher","type":0,"sectionRef":"#","url":"docs/designDocs/wip-designs/submarine-launcher","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#introduction","content":"Submarine is built and run as a cloud-native application to take full advantage of the cloud computing model. Cloud-native applications are characterized by rapid and frequent build, release, and deployment. Combined with the features of cloud computing, they are decoupled from the underlying hardware and operating system, can easily meet the requirements of scalability, availability, and portability, and are more economical. 
In the enterprise data center, submarine can support the k8s and docker resource scheduling systems; in the public cloud environment, submarine can support the GCE, AWS, and Azure cloud services. "},{"title":"Requirement​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#requirement","content":""},{"title":"Cloud-Native Service​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#cloud-native-service","content":"The submarine server is a long-running service that runs in daemon mode. It is mainly used by algorithm engineers and provides online front-end functions such as algorithm development, algorithm debugging, data processing, and workflow scheduling. It also provides back-end functions such as scheduling and execution of jobs, tracking of job status, and so on. Rolling upgrades improve system stability. For example, we can upgrade or restart the workbench server without affecting the normal operation of submitted jobs. System resources can also be used fully. For example, when the number of current developers or job tasks increases, the number of submarine server instances can be adjusted dynamically. In addition, submarine will provide each user with a completely independent workspace container. This workspace container comes with the development tools and library files commonly used by algorithm engineers already deployed, including their operating environment. Algorithm engineers can work in our prepared workspaces without any extra work. Each user's workspace can also be run through a cloud service. 
"},{"title":"Service discovery​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#service-discovery","content":"With the cluster function of submarine, each service only needs to run in the container, and it will automatically register the service in the submarine cluster center. Submarine cluster management will automatically maintain the relationship between service and service, service and user. "},{"title":"Design​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#design","content":" "},{"title":"Launcher​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#launcher","content":"The submarine launcher module defines the complete interface. By using this interface, you can run the submarine server, and workspace in k8s / docker / AWS / GCE / Azure. "},{"title":"Launcher On Docker​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#launcher-on-docker","content":"In order to allow some small and medium-sized users without k8s to use submarine, we support running the submarine system in docker mode. Users only need to provide several servers with docker runtime environment. The submarine system can automatically cluster these servers into clusters, manage all the hardware resources of the cluster, and run the service or workspace container in this cluster through scheduling algorithms. 
"},{"title":"Launcher On Kubernetes​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#launcher-on-kubernetes","content":"submarine operator "},{"title":"Launcher On AWS​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#launcher-on-aws","content":"[TODO] "},{"title":"Launcher On GCP​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#launcher-on-gcp","content":"[TODO] "},{"title":"Launcher On Azure​","type":1,"pageTitle":"Submarine Launcher","url":"docs/designDocs/wip-designs/submarine-launcher#launcher-on-azure","content":"[TODO] "},{"title":"Cluster Server Design - High-Availability","type":0,"sectionRef":"#","url":"docs/designDocs/wip-designs/submarine-clusterServer","content":"","keywords":""},{"title":"Below is existing proposal:​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#below-is-existing-proposal","content":""},{"title":"Introduction​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#introduction","content":"The Submarine system contains a total of two daemon services, Submarine Server and Workbench Server. Submarine Server mainly provides job submission, job scheduling, job status monitoring, and model online service for Submarine. Workbench Server is mainly for algorithm users to provide algorithm development, Python/Spark interpreter operation, and other services through Notebook. The goal of the Submarine project is to provide high availability and high-reliability services for big data processing, algorithm development, job scheduling, model online services, model batch, and incremental updates. 
In addition to the high availability of big data and machine learning frameworks, the high availability of Submarine Server and Workbench Server themselves is a key consideration. "},{"title":"Requirement​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#requirement","content":""},{"title":"Cluster Metadata Center​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#cluster-metadata-center","content":"Multiple Submarine (or Workbench) Server processes create a Submarine Cluster through the RAFT algorithm library. The cluster internally maintains a metadata center. All servers can operate on the metadata. The RAFT algorithm ensures that when multiple processes modify the same data simultaneously, the modifications do not cause problems such as mutual overwriting and dirty data. This metadata center stores data as key-value pairs. It can store a variety of data, but note that the metadata center is only suitable for storing small amounts of data and cannot be used to replace data storage. "},{"title":"Service discovery​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#service-discovery","content":"By storing the information of a service or process in the metadata center, we can easily find the information of the service or process we need from anywhere. For example, the IP address and port of a Python interpreter process are stored in the metadata, and other services can easily find the process information through the process ID and connect to it, which provides service discovery capabilities. 
"},{"title":"Cluster event​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#cluster-event","content":"In the entire Submarine cluster, the servers can send cluster events to each other and to their child processes. Each service or process runs the corresponding handler according to the cluster events it receives. For example, the Workbench Server can send a shutdown event to a Python interpreter process that it manages, which makes it possible to control the operation of the services and individual subprocesses throughout the cluster. Cluster events support both broadcast and individual delivery. "},{"title":"Independence​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#independence","content":"We implement Submarine's clustering capabilities through the RAFT algorithm library, without relying on any external services (e.g. Zookeeper, Etcd, etc.) "},{"title":"Disadvantages​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#disadvantages","content":"Because the RAFT algorithm requires more than half of the servers to be available in order to work correctly, when the clustering capability of Submarine (Workbench) Server is turned on and more than half of the servers are unavailable, some programs may behave abnormally. The system detects this condition and either degrades service or refuses to provide service. 
"},{"title":"System design​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#system-design","content":""},{"title":"Universal design​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#universal-design","content":"Modular design: both Submarine Server and Workbench Server exist in the Submarine system, and both services need to provide clustering capabilities, so we abstract the cluster function into a separate module that both servers can reuse. "},{"title":"ClusterConfigure​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#clusterconfigure","content":"Add the submarine.server.addr and workbench.server.addr configuration items to submarine-site.xml, for example submarine.server.addr=ip1, ip2, ip3. Using this IP list, the RAFT algorithm module in a server process can form a cluster with the other server processes. "},{"title":"ClusterServer​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#clusterserver","content":"The ClusterServer module encapsulates the RAFT algorithm module, which can create a service cluster and read and write metadata based on the submarine.server.addr or workbench.server.addr configuration item. The cluster management service runs in each submarine server. It establishes a cluster by using the atomix RaftServer class of the Raft algorithm library, maintains the ClusterStateMachine, and manages the service state metadata of each submarine server through the PutCommand, GetQuery, and DeleteCommand operation commands. 
"},{"title":"ClusterClient​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#clusterclient","content":"The ClusterClient module encapsulates the RAFT algorithm client module. Based on the submarine.server.addr or workbench.server.addr configuration item, it can communicate with the cluster, read and write metadata, and write the IP and port information of the client process into the cluster's metadata center. The cluster management client runs in each submarine server and submarine Interpreter process. It manages the submarine server and submarine Interpreter process state (metadata information) in the ClusterStateMachine by using the atomix RaftClient class of the Raft library to connect to the atomix RaftServer. When the Submarine Server and Submarine Interpreter processes are started, they are added to the ClusterStateMachine, and they are removed from it when they are closed. 
"},{"title":"ClusterMetadata​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#clustermetadata","content":"Metadata is stored as key-value pairs. ServerMeta: key='host:port', value={SERVER_HOST=..., SERVER_PORT=..., ...} Name\tDescription SUBMARINE_SERVER_HOST\tSubmarine server IP SUBMARINE_SERVER_PORT\tSubmarine server port WORKBENCH_SERVER_HOST\tSubmarine workbench server IP WORKBENCH_SERVER_PORT\tSubmarine workbench server port InterpreterMeta: key=InterpreterGroupId, value={INTP_TSERVER_HOST=..., ...} Name\tDescription INTP_TSERVER_HOST\tSubmarine Interpreter Thrift IP INTP_TSERVER_PORT\tSubmarine Interpreter Thrift port INTP_START_TIME\tSubmarine Interpreter start time HEARTBEAT\tSubmarine Interpreter heartbeat time "},{"title":"Network fault tolerance​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#network-fault-tolerance","content":"In a distributed environment, there may be network anomalies, network delays, or service exceptions. After metadata is submitted to the cluster, the submission is checked for success. If it fails, the metadata is saved in a local message queue, and a separate commit thread retries the submission. "},{"title":"Cluster monitoring​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#cluster-monitoring","content":"The cluster needs to monitor whether the Submarine Server and Submarine Interpreter processes are working properly. The Submarine Server and Submarine Interpreter processes periodically send heartbeats to update their own timestamps in the cluster metadata. The Submarine Server with Leader identity periodically checks the timestamps of the Submarine Server and Submarine Interpreter processes to remove services and processes that have timed out. 
The cluster monitoring module runs in each Submarine Server and Submarine Interpreter process and periodically sends heartbeat data of the service or process to the cluster. When the cluster monitoring module runs in Submarine Server, it sends the heartbeat to the cluster's ClusterStateMachine. If the cluster does not receive heartbeat information for a long time, the service or process is considered abnormal and unavailable. Resource usage statistics strategy: to smooth out momentary peaks and troughs in server load, cluster monitoring reports the average resource usage over the most recent period, so that server resources are used as reasonably and effectively as possible. When the cluster monitoring module runs in the Submarine Server, it checks the heartbeat data of each Submarine Server and Submarine Interpreter process. If a heartbeat times out, the module considers the service or process unavailable and removes it from the cluster. "},{"title":"Atomix Raft algorithm library​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#atomix-raft-algorithm-library","content":"In order to reduce the deployment complexity of distributed mode, submarine server does not use Zookeeper to build a distributed cluster. Multiple submarine server groups are built into distributed clusters by using the Raft algorithm in submarine server. The Raft algorithm is provided by the atomix library, which has passed Jepsen consistency verification. "},{"title":"Synchronize workbench notes​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#synchronize-workbench-notes","content":"In cluster mode, a user can create, modify, and delete notes on any of the servers. Every such change must be sent to all the servers in the cluster so that they synchronize their Notebook updates. 
Failure to do so will result in the user not being able to continue while switching to another server. "},{"title":"Listen for note update events​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#listen-for-note-update-events","content":"Listen for the NEW_NOTE, DEL_NOTE, REMOVE_NOTE_TO_TRASH ... event of the notebook in the NotebookServer#onMessage() function. "},{"title":"Broadcast note update event​","type":1,"pageTitle":"Cluster Server Design - High-Availability","url":"docs/designDocs/wip-designs/submarine-clusterServer#broadcast-note-update-event","content":"The note is refreshed by notifying the event to all Submarine servers in the cluster via messaging Service. "},{"title":"Project Architecture","type":0,"sectionRef":"#","url":"docs/devDocs/","content":"","keywords":""},{"title":"1. Introduction​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#1-introduction","content":"This document mainly describes the structure of each module of the Submarine project, the development and test description of each module. "},{"title":"2. Submarine Project Structure​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#2-submarine-project-structure","content":""},{"title":"2.1. submarine-client​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#21-submarine-client","content":"Provide the CLI interface for submarine user. (Currently only support YARN service (deprecated)) "},{"title":"2.2. submarine-cloud-v2​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#22-submarine-cloud-v2","content":"The operator for Submarine application. For details, please see the README on github. "},{"title":"2.3. submarine-commons​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#23-submarine-commons","content":"Define utility function used in multiple packages, mainly related to hadoop. "},{"title":"2.4. 
submarine-dist​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#24-submarine-dist","content":"Store the pre-release files. "},{"title":"2.5. submarine-sdk​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#25-submarine-sdk","content":"Provide the Python SDK for submarine users. "},{"title":"2.6. submarine-server​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#26-submarine-server","content":"Include the core server, RESTful API, and k8s submitter. "},{"title":"2.7. submarine-test​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#27-submarine-test","content":"Provide end-to-end and k8s tests for submarine. "},{"title":"2.8. submarine-workbench​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#28-submarine-workbench","content":"workbench-server: a Jetty-based web server. It provides a RESTful interface and a WebSocket interface. The RESTful interface gives workbench-web management capabilities for database entities such as project, department, user, and role. workbench-web: a web front-end service based on the Angular.js framework. With workbench-web, users can manage Submarine projects, departments, users, and roles through a browser. You can also use the notebook to develop machine learning algorithms and handle model release and other lifecycle management. 
"},{"title":"2.9 dev-support​","type":1,"pageTitle":"Project Architecture","url":"docs/devDocs/#29-dev-support","content":"mini-submarine: by using the docker image provided by Submarine, you can experience all the functions of Submarine in a single docker environment. mini-submarine also provides developers with a development and testing environment, which avoids the hassle of installing and deploying the runtime environment. submarine-installer: our submarine runtime environment installation tool for yarn-3.1+. By using submarine-installer, it is easy to install and deploy the system services required by yarn-3.1+, such as docker, nvidia-docker, nvidia driver, ETCD, Calico network, etc. "},{"title":"How to Build Submarine","type":0,"sectionRef":"#","url":"docs/devDocs/BuildFromCode","content":"","keywords":""},{"title":"Prerequisites​","type":1,"pageTitle":"How to Build Submarine","url":"docs/devDocs/BuildFromCode#prerequisites","content":"JDK 1.8, Maven 3.3 or later ( &lt; 3.8.1 ), Docker "},{"title":"Quick Start​","type":1,"pageTitle":"How to Build Submarine","url":"docs/devDocs/BuildFromCode#quick-start","content":""},{"title":"Build Your Custom Submarine Docker Images​","type":1,"pageTitle":"How to Build Submarine","url":"docs/devDocs/BuildFromCode#build-your-custom-submarine-docker-images","content":"Submarine provides default Docker images in the release artifacts, but sometimes you may want to modify them. You can rebuild the Docker images after you make changes. Note that you need to make sure the images you build can be accessed in k8s. Usually this means renaming them and pushing them to a proper Docker registry. 
mvn clean package -DskipTests  Build submarine server image: ./dev-support/docker-images/submarine/build.sh  Build submarine database image: ./dev-support/docker-images/database/build.sh  "},{"title":"Checking releases for licenses​","type":1,"pageTitle":"How to Build Submarine","url":"docs/devDocs/BuildFromCode#checking-releases-for-licenses","content":"mvn clean org.apache.rat:apache-rat-plugin:check  "},{"title":"Building source code / binary distribution with Maven Wrapper​","type":1,"pageTitle":"How to Build Submarine","url":"docs/devDocs/BuildFromCode#building-source-code--binary-distribution-with-maven-wrapper","content":"Maven Wrapper (Optional): Maven Wrapper can help you avoid dependencies problem about Maven version. # Setup Maven Wrapper (Maven 3.6.1) mvn -N io.takari:maven:0.7.7:wrapper -Dmaven=3.6.1 # Check Maven Wrapper ./mvnw -version # Replace 'mvn' with 'mvnw'. Example: ./mvnw clean package -DskipTests  "},{"title":"Dependencies for Submarine","type":0,"sectionRef":"#","url":"docs/devDocs/Dependencies","content":"","keywords":""},{"title":"Kubernetes​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/devDocs/Dependencies#kubernetes","content":"Kubernetes Version\tSupport?1.14.x (or earlier)\tX 1.15.x - 1.21.x\t√ 1.22.x (or later)\tX "},{"title":"KinD​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/devDocs/Dependencies#kind","content":"KinD Version\tSupport?0.5.x (or earlier)\tX 0.6.x\t√ 0.7.x\t√ 0.8.x\t√ 0.9.x\t√ 0.10.x\t√ 0.11.x\t√ "},{"title":"Java​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/devDocs/Dependencies#java","content":"JDK Version\tSupport?8\t√ 11\tX 17\tX "},{"title":"Maven​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/devDocs/Dependencies#maven","content":"3.3 or later ( &lt; 3.8.1 ) "},{"title":"Docker​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/devDocs/Dependencies#docker","content":"Latest "},{"title":"Helm​","type":1,"pageTitle":"Dependencies 
for Submarine","url":"docs/devDocs/Dependencies#helm","content":"Version 3 "},{"title":"NodeJS​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/devDocs/Dependencies#nodejs","content":"14 (or later) "},{"title":"Go​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/devDocs/Dependencies#go","content":"Go Version\tSupport? 1.15\tX 1.16\t√ 1.17 (or later)\tTo be verified "},{"title":"Python​","type":1,"pageTitle":"Dependencies for Submarine","url":"docs/devDocs/Dependencies#python","content":"Python Version\tSupport? 3.5 (or earlier)\tX 3.6, 3.7\t√ 3.8 (or later)\tTo be verified "},{"title":"Development Guide","type":0,"sectionRef":"#","url":"docs/devDocs/Development","content":"","keywords":""},{"title":"Video​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#video","content":"From this video, you will learn how to configure Submarine and how to contribute to it via GitHub. "},{"title":"Develop server​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#develop-server","content":""},{"title":"Prerequisites​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#prerequisites","content":"JDK 1.8, Maven 3.3 or later ( &lt; 3.8.1 ), Docker "},{"title":"Setting up checkstyle in IDE​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#setting-up-checkstyle-in-ide","content":"The Checkstyle plugin can help detect violations directly from the IDE. 
Install Checkstyle+IDEA plugin from Preference -&gt; PluginsOpen Preference -&gt; Tools -&gt; Checkstyle Set Checkstyle version: Checkstyle version: 8.0 Add (+) a new Configuration File Description: SubmarineUse a local checkstyle ${SUBMARINE_HOME}/dev-support/maven-config/checkstyle.xml Open the Checkstyle Tool Window, select the Submarine rule and execute the check "},{"title":"Testing​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#testing","content":"Unit Test For each class, there is a corresponding test class. For example, SubmarineServerTest is used for testing SubmarineServer. Whenever you add a function to a class, you must write a unit test for it. Integration Test: IntegrationTestK8s.md "},{"title":"Build from source​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#build-from-source","content":"Before building We assume the developer uses minikube as the local Kubernetes cluster.Make sure you have installed the submarine helm-chart in the cluster. Package the Submarine server into a new jar file mvn install -DskipTests Build the new server docker image in minikube # switch to minikube docker daemon to build image directly in minikube eval $(minikube docker-env) # run docker build ./dev-support/docker-images/submarine/build.sh # exit minikube docker daemon eval $(minikube docker-env -u) Delete the server deployment and the operator will create a new one using the new image kubectl delete deployment submarine-server  "},{"title":"Develop workbench​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#develop-workbench","content":"Deploy the Submarine Follow Getting Started/Quickstart, and make sure you can connect to http://localhost:32080 in the browser. 
Install the dependencies cd submarine-workbench/workbench-web npm install Run the workbench based on proxy server npm run start The request sent to http://localhost:4200 will be redirected to http://localhost:32080.Open http://localhost:4200 in browser to see the real-time change of workbench. Frontend E2E test: IntegrationTestE2E.md "},{"title":"Develop database​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#develop-database","content":"Build the docker image # switch to minikube docker daemon to build image directly in minikube eval $(minikube docker-env) # run docker build ./dev-support/docker-images/database/build.sh # exit minikube docker daemon eval $(minikube docker-env -u) Deploy new pods in the cluster helm upgrade --set submarine.database.dev=true submarine ./helm-charts/submarine  "},{"title":"Develop operator​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#develop-operator","content":"For details, please check out the README and Developer Guide on GitHub. "},{"title":"Develop Submarine Website​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#develop-submarine-website","content":"Submarine website is built using Docusaurus 2, a modern static website generator. We store all the website content in markdown format in the submarine/website/docs. When committing a new patch to the submarine repo, Docusaurus will help us generate the html and javascript files and push them to https://github.com/apache/submarine-site/tree/asf-site. To update the website, click “Edit this page” on the website.  "},{"title":"Add a new page​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#add-a-new-page","content":"If you want to add a new page to the website, make sure to add the file path to sidebars.js. 
"},{"title":"Installation​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#installation","content":"We use the yarn package manager to install all dependencies for the website yarn install  "},{"title":"Build​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#build","content":"Make sure you can successfully build the website before creating a pull request. yarn build  "},{"title":"Local Development​","type":1,"pageTitle":"Development Guide","url":"docs/devDocs/Development#local-development","content":"This command starts a local development server and open up a browser window. Most changes are reflected live without having to restart the server. yarn start  "},{"title":"How to Verify","type":0,"sectionRef":"#","url":"docs/devDocs/HowToVerify","content":"","keywords":""},{"title":"Verification of the release candidate​","type":1,"pageTitle":"How to Verify","url":"docs/devDocs/HowToVerify#verification-of-the-release-candidate","content":""},{"title":"1. Download the candidate version to be released to the local environment​","type":1,"pageTitle":"How to Verify","url":"docs/devDocs/HowToVerify#1-download-the-candidate-version-to-be-released-to-the-local-environment","content":"svn co https://dist.apache.org/repos/dist/dev/submarine/${release_version}-${rc_version}/  "},{"title":"2. Verify whether the uploaded version is compliant​","type":1,"pageTitle":"How to Verify","url":"docs/devDocs/HowToVerify#2-verify-whether-the-uploaded-version-is-compliant","content":"Begin the verification process, which includes but is not limited to the following content and forms. "},{"title":"2.1 Check if the release package is complete​","type":1,"pageTitle":"How to Verify","url":"docs/devDocs/HowToVerify#21-check-if-the-release-package-is-complete","content":"The package uploaded to dist must include the source code package, and the binary package is optional. 
Whether it includes the source code package.Whether it includes the signature of the source code package.Whether it includes the sha512 of the source code package.If the binary package is uploaded, also check the contents listed in (2)-(4). "},{"title":"2.2 Check gpg signature​","type":1,"pageTitle":"How to Verify","url":"docs/devDocs/HowToVerify#22-check-gpg-signature","content":"Import the public key curl https://dist.apache.org/repos/dist/dev/submarine/KEYS &gt; KEYS # Download KEYS gpg --import KEYS # Import KEYS to local  Trust the public key Trust the KEY used in this version.  gpg --edit-key xxxxxxxxxx # The KEY used in this version gpg (GnuPG) 2.2.21; Copyright (C) 2020 Free Software Foundation, Inc. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Secret key is available. sec rsa4096/5EF3A66D57EC647A created: 2020-05-19 expires: never usage: SC trust: ultimate validity: ultimate ssb rsa4096/17628566FEED6AF7 created: 2020-05-19 expires: never usage: E [ultimate] (1). XXX YYYZZZ &lt;yourAccount@apache.org&gt; gpg&gt; trust sec rsa4096/5EF3A66D57EC647A created: 2020-05-19 expires: never usage: SC trust: ultimate validity: ultimate ssb rsa4096/17628566FEED6AF7 created: 2020-05-19 expires: never usage: E [ultimate] (1). XXX YYYZZZ &lt;yourAccount@apache.org&gt; Please decide how far you trust this user to correctly verify other users' keys (by looking at passports, checking fingerprints from different sources, etc.) 1 = I don't know or won't say 2 = I do NOT trust 3 = I trust marginally 4 = I trust fully 5 = I trust ultimately m = back to the main menu Your decision? 5 #choose 5 Do you really want to set this key to ultimate trust? (y/N) y # choose y sec rsa4096/5EF3A66D57EC647A created: 2020-05-19 expires: never usage: SC trust: ultimate validity: ultimate ssb rsa4096/17628566FEED6AF7 created: 2020-05-19 expires: never usage: E [ultimate] (1). 
XXX YYYZZZ &lt;yourAccount@apache.org&gt; gpg&gt; sec rsa4096/5EF3A66D57EC647A created: 2020-05-19 expires: never usage: SC trust: ultimate validity: ultimate ssb rsa4096/17628566FEED6AF7 created: 2020-05-19 expires: never usage: E [ultimate] (1). XXX YYYZZZ &lt;yourAccount@apache.org&gt;  Use the following command to check the signature. for i in *.tar.gz; do echo $i; gpg --verify $i.asc $i ; done # Or gpg --verify apache-submarine-${release_version}-src.tar.gz.asc apache-submarine-${release_version}-src.tar.gz # If a binary package was uploaded, you also need to check whether the signature of the binary package is correct. gpg --verify apache-submarine-server-${release_version}-bin.tar.gz.asc apache-submarine-server-${release_version}-bin.tar.gz gpg --verify apache-submarine-client-${release_version}-bin.tar.gz.asc apache-submarine-client-${release_version}-bin.tar.gz  Check the result If something like the following appears, the signature is correct. The keyword: Good signature apache-submarine-${release_version}-src.tar.gz gpg: Signature made Sat May 30 11:45:01 2020 CST gpg: using RSA key 9B12C2228BDFF4F4CFE849445EF3A66D57EC647A gpg: Good signature from &quot;XXX YYYZZZ &lt;yourAccount@apache.org&gt;&quot; [ultimate]  "},{"title":"2.3 Check sha512 hash​","type":1,"pageTitle":"How to Verify","url":"docs/devDocs/HowToVerify#23-check-sha512-hash","content":"After calculating the sha512 hash locally, verify whether it is consistent with the one on dist. for i in *.tar.gz; do echo $i; gpg --print-md SHA512 $i; done # Or gpg --print-md SHA512 apache-submarine-${release_version}-src.tar.gz # If a binary package was uploaded, you also need to check the sha512 hash of the binary package. gpg --print-md SHA512 apache-submarine-server-${release_version}-bin.tar.gz gpg --print-md SHA512 apache-submarine-client-${release_version}-bin.tar.gz # Or for i in *.tar.gz.sha512; do echo $i; sha512sum -c $i; done  "},{"title":"2.4. 
Check the file content of the source package.​","type":1,"pageTitle":"How to Verify","url":"docs/devDocs/HowToVerify#24-check-the-file-content-of-the-source-package","content":"Unzip apache-submarine-${release_version}-src.tar.gz and check as follows: Whether the DISCLAIMER file exists and whether the content is correct.Whether the LICENSE and NOTICE files exist and whether the content is correct.Whether all files have the ASF License header.Whether the source code compiles successfully.Whether the unit tests pass....." },{"title":"2.5 Check the binary package (if the binary package is uploaded)​","type":1,"pageTitle":"How to Verify","url":"docs/devDocs/HowToVerify#25-check-the-binary-package-if-the-binary-package-is-uploaded","content":"Unzip apache-submarine-client-${release_version}-bin.tar.gz and apache-submarine-server-${release_version}-bin.tar.gz, then check as follows: Whether the DISCLAIMER file exists and whether the content is correct.Whether the LICENSE and NOTICE files exist and whether the content is correct.Whether the deployment is successful.Deploy a test environment to verify whether production and consumption can run normally.Verify what you think might go wrong. "},{"title":"How to Run Frontend Integration Test","type":0,"sectionRef":"#","url":"docs/devDocs/IntegrationTestE2E","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/devDocs/IntegrationTestE2E#introduction","content":"The test cases under the directory test-e2e are integration tests to ensure the correctness of the Submarine Workbench. These test cases can be run either locally or on GitHub Actions. "},{"title":"Run E2E test locally​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/devDocs/IntegrationTestE2E#run-e2e-test-locally","content":"Ensure you have set up Submarine locally. If not, you can refer to Quickstart. 
Forward port kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80 Modify run_frontend_e2e.sh Modify the port and the URL in this script to match where the workbench is running. Example: If your Submarine workbench is running on 127.0.0.1:4200, you should set WORKBENCH_PORT to 4200. # at submarine-test/test-e2e/run_frontend_e2e.sh ... # ======= Modifiable Variables ======= # # Note: URL must start with &quot;http&quot; # (Ref: https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/WebDriver.html#get(java.lang.String)) WORKBENCH_PORT=8080 #&lt;= modify this URL=&quot;http://127.0.0.1&quot; #&lt;= modify this # ==================================== # ... Run run_frontend_e2e.sh (Run a specific test case) This script checks whether the port can be accessed, and then runs the test case. # at submarine-test/test-e2e ./run_frontend_e2e.sh ${TESTCASE} # TESTCASE is the IT you want to run, e.g. loginIT, experimentIT... Run all test cases The following commands compile all files and run all tests whose names end with &quot;IT&quot; in the directory. # Make sure the Submarine workbench is running on 127.0.0.1:8080 cd submarine/submarine-test/test-e2e # Method 1: mvn verify # Method 2: mvn clean install -U  "},{"title":"Run E2E test in GitHub Actions​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/devDocs/IntegrationTestE2E#run-e2e-test-in-github-actions","content":"Each time a commit is pushed, GitHub Actions will be triggered automatically. "},{"title":"Add a new frontend E2E test case​","type":1,"pageTitle":"How to Run Frontend Integration Test","url":"docs/devDocs/IntegrationTestE2E#add-a-new-frontend-e2e-test-case","content":"WARNING You MUST read the document carefully, and understand the difference between explicit wait, implicit wait, and fluent wait.Do not mix implicit and explicit waits. Doing so can cause unpredictable wait times. We define many useful functions in AbstractSubmarineIT.java. 
"},{"title":"How to Release","type":0,"sectionRef":"#","url":"docs/devDocs/HowToRelease","content":"","keywords":""},{"title":"0. Preface​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#0-preface","content":"The source release is what Apache cares about most, and it is required for every release. The binary release is optional; Submarine can choose whether to publish binary packages to the Apache repository or to the Maven Central repository. Please refer to the following link to find more details about release guidelines: How to Release Submarine Release Guidelines "},{"title":"1. Add GPG KEY​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#1-add-gpg-key","content":"Main reference for this chapter: https://infra.apache.org/openpgp.html &gt; This chapter is only needed for the first release manager of the project. "},{"title":"1.1 Install gpg​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#11-install-gpg","content":"For detailed installation instructions, refer to the tutorial. The setup on macOS is as follows: $ brew install gpg $ gpg --version # Check the version; it should be 2.x  "},{"title":"1.2 generate gpg Key​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#12-generate-gpg-key","content":"Pay attention to the following points:​ When entering the name, it is best to match the full name registered with Apache.The email address used should be your Apache email address.It is best to use pinyin or English for the name; otherwise there will be garbled characters. Follow the prompts to generate a key​ ➜ ~ gpg --full-gen-key gpg (GnuPG) 2.2.20; Copyright (C) 2020 Free Software Foundation, Inc. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. 
Please select what kind of key you want: (1) RSA and RSA (default) (2) DSA and Elgamal (3) DSA (sign only) (4) RSA (sign only) (14) Existing key from card Your selection? 1 # enter 1 here RSA keys may be between 1024 and 4096 bits long. What keysize do you want? (2048) 4096 # enter 4096 here Requested keysize is 4096 bits Please specify how long the key should be valid. 0 = key does not expire &lt;n&gt; = key expires in n days &lt;n&gt;w = key expires in n weeks &lt;n&gt;m = key expires in n months &lt;n&gt;y = key expires in n years Key is valid for? (0) 0 # enter 0 here Key does not expire at all Is this correct? (y/N) y # enter y here GnuPG needs to construct a user ID to identify your key. Real name: Guangxu Cheng # enter your name here Email address: gxcheng@apache.org # enter your mailbox here Comment: # enter some comment here (Optional) You selected this USER-ID: &quot;Guangxu Cheng &lt;gxcheng@apache.org&gt;&quot; Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O #enter O here We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy. We need to generate a lot of random bytes. It is a good idea to perform some other action (type on the keyboard, move the mouse, utilize the disks) during the prime generation; this gives the random number generator a better chance to gain enough entropy. # A dialog box will pop up, asking you to enter the key for this gpg. ┌──────────────────────────────────────────────────────┐ │ Please enter this passphrase │ │ │ │ Passphrase: _______________________________ │ │ │ │ &lt;OK&gt; &lt;Cancel&gt; │ └──────────────────────────────────────────────────────┘ # After entering the secret key, it will be created. And it will output the following information. 
gpg: key 2DD587E7B10F3B1F marked as ultimately trusted gpg: revocation certificate stored as '/Users/cheng/.gnupg/openpgp-revocs.d/41936314E25F402D5F7D73152DD587E7B10F3B1F.rev' public and secret key created and signed. pub rsa4096 2020-05-19 [SC] 41936314E25F402D5F7D73152DD587E7B10F3B1F uid Guangxu Cheng &lt;gxcheng@apache.org&gt; sub rsa4096 2020-05-19 [E]  "},{"title":"1.3 Upload the generated key to the public server​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#13-upload-the-generated-key-to-the-public-server","content":"➜ ~ gpg --list-keys ------------------------------- pub rsa4096 2020-05-18 [SC] 5931F8CFD04B37A325E4465D8C0D31C4149B3A87 uid [ultimate] Guangxu Cheng &lt;gxcheng@apache.org&gt; sub rsa4096 2020-05-18 [E] # Send public key to keyserver via key id $ gpg --keyserver pgpkeys.mit.edu --send-key &lt;key id&gt; # Among them, pgpkeys.mit.edu is a randomly selected keyserver, and the keyserver list is: https://sks-keyservers.net/status/, which is automatically synchronized with each other, you can choose any one.  "},{"title":"1.4 Check whether the key is created successfully​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#14-check-whether-the-key-is-created-successfully","content":"Through the following URL, use the email to check whether the upload is successful or not. It will take about a minute to find out. When searching, check the show full-key hashes under advance on http://keys.gnupg.net. 
The query results are as follows: "},{"title":"1.5 Add your gpg public key to the KEYS file​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#15-add-your-gpg-public-key-to-the-keys-file","content":"SVN is required for this step The SVN repository of the DEV branch is https://dist.apache.org/repos/dist/dev/submarine The SVN repository of the Release branch is https://dist.apache.org/repos/dist/release/submarine 1.5.1 Add the public key to KEYS in the dev branch to release the RC version​ ➜ ~ svn co https://dist.apache.org/repos/dist/dev/submarine /tmp/submarine-dist-dev # This step is relatively slow, and all versions will be copied. If the network is disconnected, use svn cleanup to delete the lock and re-execute it, and the transfer will be resumed. ➜ ~ cd /tmp/submarine-dist-dev ➜ submarine-dist-dev ~ (gpg --list-sigs YOUR_NAME@apache.org &amp;&amp; gpg --export --armor YOUR_NAME@apache.org) &gt;&gt; KEYS # Append the KEY you generated to the file KEYS; it is best to check that it is correct after appending. ➜ submarine-dist-dev ~ svn add . # Not needed if the KEYS file already exists. ➜ submarine-dist-dev ~ svn ci -m &quot;add gpg key for YOUR_NAME&quot; # Next, you will be asked to enter a username and password; use your Apache username and password.  1.5.2 Add the public key to KEYS in the release branch to release the official version​ ➜ ~ svn co https://dist.apache.org/repos/dist/release/submarine /tmp/submarine-dist-release ➜ ~ cd /tmp/submarine-dist-release ➜ submarine-dist-release ~ (gpg --list-sigs YOUR_NAME@apache.org &amp;&amp; gpg --export --armor YOUR_NAME@apache.org) &gt;&gt; KEYS # Append the KEY you generated to the file KEYS; it is best to check that it is correct after appending. ➜ submarine-dist-release ~ svn add . # Not needed if the KEYS file already exists. 
➜ submarine-dist-release ~ svn ci -m &quot;add gpg key for YOUR_NAME&quot; # Next, you will be asked to enter a username and password, just use your apache username and password.  "},{"title":"1.6 Upload GPG public key to Github account​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#16-upload-gpg-public-key-to-github-account","content":"Go to https://github.com/settings/keys and add GPG KEYS.If you find &quot;unverified&quot; is written after the key after adding it, remember to bind the mailbox used in the GPG key to your github account (https://github.com/settings/emails). "},{"title":"2. Set maven settings​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#2-set-maven-settings","content":"Skip if it has already been set In the maven configuration file ~/.m2/settings.xml, add the following &lt;server&gt; item &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; &lt;settings xsi:schemaLocation=&quot;http://maven.apache.org/SETTINGS/1.1.0 http://maven.apache.org/xsd/settings-1.1.0.xsd&quot; xmlns=&quot;http://maven.apache.org/SETTINGS/1.1.0&quot; xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;&gt; &lt;servers&gt; &lt;!-- Apache Repo Settings --&gt; &lt;server&gt; &lt;id&gt;apache.snapshots.https&lt;/id&gt; &lt;username&gt;{user-id}&lt;/username&gt; &lt;password&gt;{user-pass}&lt;/password&gt; &lt;/server&gt; &lt;server&gt; &lt;id&gt;apache.releases.https&lt;/id&gt; &lt;username&gt;{user-id}&lt;/username&gt; &lt;password&gt;{user-pass}&lt;/password&gt; &lt;/server&gt; &lt;/servers&gt; &lt;profiles&gt; &lt;profile&gt; &lt;id&gt;apache-release&lt;/id&gt; &lt;properties&gt; &lt;gpg.keyname&gt;Your KEYID&lt;/gpg.keyname&gt;&lt;!-- Your GPG Keyname here --&gt; &lt;!-- Use an agent: Prevents being asked for the password during the build --&gt; &lt;gpg.useagent&gt;true&lt;/gpg.useagent&gt; &lt;gpg.passphrase&gt;Your password of the private key&lt;/gpg.passphrase&gt; &lt;/properties&gt; 
&lt;/profile&gt; &lt;/profiles&gt; &lt;/settings&gt;  "},{"title":"3. Compile and package​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#3-compile-and-package","content":""},{"title":"3.1 Prepare a branch​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#31-prepare-a-branch","content":"Pull the new branch from the main branch as a release branch, release-${release_version} Update CHANGES.md Check that the code is healthy: it compiles, all unit tests pass, the RAT check succeeds, etc. # build check $ mvn clean package -Dmaven.javadoc.skip=true # RAT check $ mvn apache-rat:check Change the version number "},{"title":"3.2 Create the tag​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#32-create-the-tag","content":"Before creating the tag, make sure that the code has been checked for errors: it compiles, all unit tests pass, the RAT check succeeds, etc. Create a tag with signature $ git_tag=${release_version}-${rc_version} $ git tag -s $git_tag -m &quot;Tagging the ${release_version} first Release Candidate (Candidates start at zero)&quot; # If an error such as gpg: signing failed: secret key not available occurs, set the signing key first. $ git config user.signingkey ${KEY_ID}  "},{"title":"3.3 Package the source code​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#33-package-the-source-code","content":"After the tag is successfully created, the tag source code should be packaged into a tar package. 
mkdir /tmp/apache-submarine-${release_version}-${rc_version} git archive --format=tar.gz --output=&quot;/tmp/apache-submarine-${release_version}-${rc_version}/apache-submarine-${release_version}-src.tar.gz&quot; --prefix=&quot;apache-submarine-${release_version}/&quot; $git_tag  "},{"title":"3.4 Packaged binary package​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#34-packaged-binary-package","content":"Compile the source code packaged in the previous step cd /tmp/apache-submarine-${release_version}-${rc_version} # Enter the source package directory. tar xzvf apache-submarine-${release_version}-src.tar.gz # Unzip the source package. cd apache-submarine-${release_version} # Enter the source directory. mvn compile clean install package -DskipTests # Compile. cp ./submarine-distribution/target/apache-submarine-${release_version}-bin.tar.gz /tmp/apache-submarine-${release_version}-${rc_version}/ # Copy the binary package to the source package directory to facilitate signing the package in the next step.  "},{"title":"3.5 Sign the source package/binary package/sha512​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#35-sign-the-source-packagebinary-packagesha512","content":"for i in *.tar.gz; do echo $i; gpg --print-md SHA512 $i &gt; $i.sha512 ; done # Calculate SHA512 for i in *.tar.gz; do echo $i; gpg --armor --output $i.asc --detach-sig $i ; done # Calculate the signature  "},{"title":"3.6 Check whether the generated signature/sha512 is correct​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#36-check-whether-the-generated-signaturesha512-is-correct","content":"For example, verify that the signature is correct as follows: for i in *.tar.gz; do echo $i; gpg --verify $i.asc $i ; done  "},{"title":"4. 
Prepare for Apache release​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#4-prepare-for-apache-release","content":""},{"title":"4.1 Publish the jar package to the Apache Nexus repository​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#41-publish-the-jar-package-to-the-apache-nexus-repository","content":"cd /tmp/apache-submarine-${release_version}-${rc_version} # Enter the source package directory tar xzvf apache-submarine-${release_version}-src.tar.gz # Unzip the source package cd apache-submarine-${release_version} mvn -DskipTests deploy -Papache-release -Dmaven.javadoc.skip=true # Start upload  "},{"title":"4.2 Upload the tag to git repository​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#42-upload-the-tag-to-git-repository","content":"git push origin ${release_version}-${rc_version}  "},{"title":"4.3 Upload the compiled file to dist​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#43-upload-the-compiled-file-to-dist","content":"This step requires SVN. The SVN repository of the DEV branch is https://dist.apache.org/repos/dist/dev/submarine "},{"title":"4.3.1 Checkout Submarine to a local directory​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#431-checkout-submarine-to-a-local-directory","content":"# This step may be slow, and all versions will be checked out. If the network connection drops, use svn cleanup to delete the lock and re-execute it, and the download will be resumed. 
svn co https://dist.apache.org/repos/dist/dev/submarine /tmp/submarine-dist-dev  "},{"title":"4.3.2 Add the public key to the KEYS file and submit it to the SVN repository​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#432-add-the-public-key-to-the-keys-file-and-submit-it-to-the-svn-repository","content":"cd /tmp/submarine-dist-dev mkdir ${release_version}-${rc_version} # Create version directory # Copy the source code package and signed package here. cp /tmp/apache-submarine-${release_version}-${rc_version}/*tar.gz* ${release_version}-${rc_version}/ svn status # Check svn status. svn add ${release_version}-${rc_version} # Add to svn version. svn status # Check svn status. svn commit -m &quot;prepare for ${release_version} ${rc_version}&quot; # Submit to svn remote server.  "},{"title":"4.4 Shut down the Apache Staging repository​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#44-shut-down-the-apache-staging-repository","content":"Please make sure all artifacts are fine. Log in http://repository.apache.org , with Apache accountClick on Staging repositories on the left.Search for Submarine keywords and select the repository you uploaded recently.Click the Close button above, and a series of checks will be performed during this process.After the check is passed, a link will appear on the Summary tab below. Please save this link and put it in the next voting email. The link should look like: https://repository.apache.org/content/repositories/orgapachesubmarine-xxxx WARN: Please note that clicking Close may fail, please check the reason for the failure and deal with it. "},{"title":"5. 
Enter voting​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#5-enter-voting","content":"To vote in the Submarine community, send an email to: dev@submarine.apache.org "},{"title":"Vote in the Submarine community​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#vote-in-the-submarine-community","content":"Voting template​ Title: [VOTE] Submarine-${release_version}-${rc_version} is ready for a vote! Content: Hi folks, Thanks to everyone for your help on this release. I've created a release candidate (${rc_version}) for submarine ${release_version}. The highlighted features are as follows: 1. AAA 2. BBB 3. CCC The mini-submarine image is here: docker pull apache/submarine:mini-${release_version}-${rc_version} The RC tag in git is here: https://github.com/apache/submarine/releases/tag/release-${release_version}-${rc_version} The RC release artifacts are available at: http://home.apache.org/~pingsutw/submarine-${release_version}-${rc_version} The Maven staging repository is here: https://repository.apache.org/content/repositories/orgapachesubmarine-1030 My public key is here: https://dist.apache.org/repos/dist/release/submarine/KEYS *This vote will run for 7 days, ending on DDDD/EE/FF at 11:59 pm PST.* For testing, I have verified: 1. Build from source, Install Submarine on minikube 2. Workbench UI (Experiment / Notebook / Template / Environment) 3. Experiment / Notebook / Template / Environment REST API My +1 to start. Thanks! BR, XXX  Announce voting results template​ Title: [RESULT][VOTE] Release Apache Submarine ${release_version} ${rc_version} Content: Hello Apache Submarine PMC and Community, The vote closes now as 72hr have passed. The vote PASSES with xx (+1 binding) votes from the PMC, xx (+1 non-binding) votes from the rest of the developer community, and no further 0 or -1 votes. The vote thread: {vote_mail_address} Thank you for your support. Your Submarine Release Manager  "},{"title":"6. 
Officially released​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#6-officially-released","content":""},{"title":"6.1 Merge the changes from the release-${release_version} branch to the master branch​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#61-merge-the-changes-from-the-release-release_version-branch-to-the-master-branch","content":""},{"title":"6.2 Release the version in the Apache Staging repository​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#62-release-the-version-in-the-apache-staging-repository","content":"Please make sure all artifacts are fine. Log in to http://repository.apache.org with your Apache account.Click on Staging repositories on the left.Search for Submarine keywords and select your recently uploaded repository, the one specified in the voting email.Click the Release button above; a series of checks will be carried out during this process.It usually takes 24 hours for the repository to synchronize to other data sources "},{"title":"6.3 Update official website link​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#63-update-official-website-link","content":""},{"title":"6.4. Send an email todev@submarine.apache.org​","type":1,"pageTitle":"How to Release","url":"docs/devDocs/HowToRelease#64-send-an-email-todevsubmarineapacheorg","content":"Please make sure that the repository in 6.2 has been successfully released; generally the email is sent 24 hours after 6.2 Announce release email template: Title: [ANNOUNCE] Apache Submarine ${release_version} release! Content: Hi folks, It's a great honor for me to announce that the Apache Submarine Community has released Apache Submarine ${release_version}! The highlighted features are: 1. AAA 2. BBB 3. CCC Tons of thanks to our contributors and community! Let's keep fighting! 
*Apache Submarine ${release_version} released*: https://submarine.apache.org/docs/next/releases/submarine-release-${release_version} BR, XXXX  "},{"title":"How to Run Integration K8s Test","type":0,"sectionRef":"#","url":"docs/devDocs/IntegrationTestK8s","content":"","keywords":""},{"title":"Introduction​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/devDocs/IntegrationTestK8s#introduction","content":"The test cases under the directory test-k8s are integration tests that ensure the correctness of the Submarine RESTful API. You can run these tests either locally or on GitHub Actions. Before running the tests, a minikube (or KinD) cluster must be created. Then, compile and package the submarine project in the submarine-dist directory to build a docker image. In addition, port 8080 must be forwarded to submarine-traefik. "},{"title":"Run k8s test locally​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/devDocs/IntegrationTestK8s#run-k8s-test-locally","content":"Ensure you have set up the KinD cluster or minikube cluster. If you haven't, follow this minikube tutorial. Build submarine from source and upgrade the server pod through this guide. Forward port kubectl port-forward --address 0.0.0.0 service/submarine-traefik 8080:80 Install the latest package &quot;submarine-server-core&quot; into the local repository, for use as a dependency in the module test-k8s mvn install -DskipTests Execute the test command mvn verify -DskipRat -pl :submarine-test-k8s -Phadoop-2.9 -B   "},{"title":"Run k8s test in GitHub Actions​","type":1,"pageTitle":"How to Run Integration K8s Test","url":"docs/devDocs/IntegrationTestK8s#run-k8s-test-in-github-actions","content":"Each time code is pushed, GitHub Actions is triggered automatically. 
"},{"title":"Download Apache Submarine","type":0,"sectionRef":"#","url":"docs/download","content":"","keywords":""},{"title":"Verify file integrity​","type":1,"pageTitle":"Download Apache Submarine","url":"docs/download#验证文件完整性","content":"You must verify the integrity of the downloaded files using the PGP or MD5 signatures. The signature should match the KEYS file. gpg --import KEYS gpg --verify submarine-dist-X.Y.Z-src.tar.gz.asc  "},{"title":"Old releases​","type":1,"pageTitle":"Download Apache Submarine","url":"docs/download#旧版本","content":"Apache Submarine 0.6.0 released on Oct 21, 2021 (release notes) (git tag) Binary package: submarine-dist-0.6.0-hadoop-2.9.tar.gz (518 MB, checksum, signature) Source: submarine-dist-0.6.0-src.tar.gz (8.3 MB, checksum, signature) Docker images: mini-submarine docker pull apache/submarine:mini-0.6.0 submarine server docker pull apache/submarine:server-0.6.0 submarine database docker pull apache/submarine:database-0.6.0 submarine jupyter-notebook docker pull apache/submarine:jupyter-notebook-0.6.0 submarine quickstart docker pull apache/submarine:quickstart-0.6.0 submarine serve docker pull apache/submarine:serve-0.6.0 submarine mlflow docker pull apache/submarine:mlflow-0.6.0 submarine operator docker pull apache/submarine:operator-0.6.0 SDK: PySubmarine pip install apache-submarine==0.6.0 Apache Submarine 0.5.0 released on Dec 17, 2020 (release notes) (git tag) Binary package: submarine-dist-0.5.0-hadoop-2.9.tar.gz (505 MB, checksum, signature) Source: submarine-dist-0.5.0-src.tar.gz (5.0 MB, checksum, signature) Docker images: mini-submarine docker pull apache/submarine:mini-0.5.0 submarine server docker pull apache/submarine:server-0.5.0 submarine database docker pull apache/submarine:database-0.5.0 submarine jupyter-notebook docker pull apache/submarine:jupyter-notebook-0.5.0 SDK: PySubmarine pip install apache-submarine==0.5.0 Apache Submarine 0.4.0 released on Jul 5, 2020 (release notes) (git tag) Binary package: submarine-dist-0.4.0-hadoop-2.9.tar.gz (550 MB, checksum, signature) Source: submarine-dist-0.4.0-src.tar.gz (6 MB, checksum, signature) Docker images: mini-submarine (guide) Apache Submarine 0.3.0 released on Feb 1, 2020 (release notes) (git tag) submarine 
binary package: submarine-dist-0.3.0-hadoop-2.9.tar.gz (550 MB, checksum, signature) Source: submarine-dist-0.3.0-src.tar.gz (6 MB, checksum, signature) Docker images: mini-submarine (guide) Apache Submarine 0.2.0 released on Jul 2, 2019 submarine binary package: hadoop-submarine-0.2.0.tar.gz (111 MB, checksum, signature, Announcement) Source: hadoop-submarine-0.2.0-src.tar.gz (1.4 MB, checksum, signature) Apache Submarine 0.1.0 released on Jan 16, 2019 submarine binary package: submarine-0.2.0-bin-all.tgz (97 MB, checksum, signature, Announcement) Source: submarine-hadoop-3.2.0-src.tar.gz (1.1 MB, checksum, signature) "},{"title":"Custom Configuration","type":0,"sectionRef":"#","url":"docs/gettingStarted/helm","content":"","keywords":""},{"title":"Helm Chart Volume Type​","type":1,"pageTitle":"Custom Configuration","url":"docs/gettingStarted/helm#helm-chart-volume-type","content":"Submarine supports several volume types, currently hostPath (default) and NFS. The volume type can be configured in ./helm-charts/submarine/values.yaml, or you can override the default values in values.yaml through the helm CLI. hostPath​ With hostPath, data is stored directly on your node. Usage: Configure the settings in ./helm-charts/submarine/values.yaml. To enable hostPath storage, set .storage.type to host. To set the root path for your storage, set .storage.host.root to &lt;any-path&gt; Example: # ./helm-charts/submarine/values.yaml storage: type: host host: root: /tmp  NFS (Network File System)​ NFS allows multiple clients to access a shared space. Prerequisite: A pre-existing NFS server. You have two options. 
Create NFS server kubectl create -f ./dev-support/nfs-server/nfs-server.yaml This creates an nfs-server pod in the Kubernetes cluster and exposes the NFS server IP at 10.96.0.2. Use your own NFS server Install NFS dependencies on your nodes Ubuntu apt-get install -y nfs-common CentOS yum install nfs-utils Usage: Configure the settings in ./helm-charts/submarine/values.yaml. To enable NFS storage, set .storage.type to nfs. To set the IP of the NFS server, set .storage.nfs.ip to &lt;any-ip&gt; Example: # ./helm-charts/submarine/values.yaml storage: type: nfs nfs: ip: 10.96.0.2  "},{"title":"Access to Submarine Server​","type":1,"pageTitle":"Custom Configuration","url":"docs/gettingStarted/helm#access-to-submarine-server","content":"By default, the Submarine server exposes port 8080 within the K8s cluster. Since v0.5, Submarine uses Traefik as the reverse proxy by default. If you don't want to use Traefik, set the value below to false in ./helm-charts/submarine/values.yaml. # Use Traefik by default traefik: enabled: true  To access the server from outside of the cluster, we use the Traefik ingress controller and NodePort for external access. Please refer to ./helm-charts/submarine/charts/traefik/values.yaml and the Traefik docs for more details if you want to customize the default values for Traefik. Notice: If you use kind to run a local Kubernetes cluster, please refer to this doc and set the configuration &quot;extraPortMappings&quot; when creating the k8s cluster. kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 nodes: - role: control-plane extraPortMappings: - containerPort: 32080 hostPort: [the port you want to access]  # Use nodePort and Traefik ingress controller by default. # To access the submarine server, open the following URL in your browser. http://127.0.0.1:32080  If minikube is installed, use the following command to find the URL to the Submarine server. 
$ minikube service submarine-traefik --url  "},{"title":"Kubernetes Dashboard (optional)​","type":1,"pageTitle":"Custom Configuration","url":"docs/gettingStarted/helm#kubernetes-dashboard-optional","content":""},{"title":"Deploy​","type":1,"pageTitle":"Custom Configuration","url":"docs/gettingStarted/helm#deploy","content":"To deploy Dashboard, execute the following command: kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml  "},{"title":"Create RBAC​","type":1,"pageTitle":"Custom Configuration","url":"docs/gettingStarted/helm#create-rbac","content":"Run the following commands to grant the dashboard access to the cluster: kubectl create serviceaccount dashboard-admin-sa kubectl create clusterrolebinding dashboard-admin-sa --clusterrole=cluster-admin --serviceaccount=default:dashboard-admin-sa  "},{"title":"Get access token (optional)​","type":1,"pageTitle":"Custom Configuration","url":"docs/gettingStarted/helm#get-access-token-optional","content":"If you want to log in to the dashboard with a token, run the following commands to get it: kubectl get secrets # select the right dashboard-admin-sa-token to describe the secret kubectl describe secret dashboard-admin-sa-token-6nhkx  "},{"title":"Start dashboard service​","type":1,"pageTitle":"Custom Configuration","url":"docs/gettingStarted/helm#start-dashboard-service","content":"kubectl proxy  Now access Dashboard at: http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ Dashboard screenshot:  "},{"title":"Jupyter Notebook","type":0,"sectionRef":"#","url":"docs/gettingStarted/notebook","content":"","keywords":""},{"title":"Working with notebooks​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/gettingStarted/notebook#working-with-notebooks","content":"We recommend using Web UI to manage notebooks. 
"},{"title":"Notebooks Web UI​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/gettingStarted/notebook#notebooks-web-ui","content":"Notebooks can be started from the Web UI. You can click the “Notebook” tab in the left-hand panel to manage your notebooks.  To create a new notebook server, click “New Notebook”. You should see a form for entering details of your new notebook server. Notebook Name : Name of the notebook server. It should follow the rules below. Contain at most 63 characters.Contain only lowercase alphanumeric characters or '-'.Start with an alphabetic character.End with an alphanumeric character. Environment : It defines a set of libraries and docker image.CPU and MemoryGPU (optional)EnvVar (optional) : Injects environment variables into the notebook. If you want to use notebook-gpu-env, you should set up the gpu environment in your kubernetes. You can install NVIDIA/k8s-device-plugin. The list of prerequisites for running the NVIDIA device plugin is described below NVIDIA drivers ~= 384.81nvidia-docker version &gt; 2.0docker configured with nvidia as the default runtimeKubernetes version &gt;= 1.10 If you’re not sure which environment you need, please choose the environment “notebook-env” for the new notebook.  You should see your new notebook server. Click the name of your notebook server to connect to it.  
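The four Notebook Name rules listed above can be checked before filling in the form. The following is a minimal sketch only: is_valid_notebook_name, NAME_PATTERN and MAX_LENGTH are hypothetical names chosen for illustration and are not part of Submarine, and the check assumes that the starting alphabetic character must also be lowercase, since only lowercase alphanumerics are allowed.

```python
import re

# Hypothetical helper, not part of Submarine: mirrors the four naming rules above.
# Starts with a lowercase letter, contains only lowercase alphanumerics or '-',
# and ends with an alphanumeric character.
NAME_PATTERN = re.compile('[a-z]([-a-z0-9]*[a-z0-9])?')
MAX_LENGTH = 63


def is_valid_notebook_name(name):
    # Rule 1: at most 63 characters.
    if len(name) > MAX_LENGTH:
        return False
    # Rules 2 to 4: allowed characters, first character, last character.
    return NAME_PATTERN.fullmatch(name) is not None
```

Calling is_valid_notebook_name('my-notebook-1') passes all four rules, while a name such as 'my-notebook-' fails because it does not end with an alphanumeric character.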
"},{"title":"Experiment with your notebook​","type":1,"pageTitle":"Jupyter Notebook","url":"docs/gettingStarted/notebook#experiment-with-your-notebook","content":"The environment “notebook-env” includes Submarine Python SDK which can talk to Submarine Server to create experiments, as the example below: from __future__ import print_function import submarine from submarine.client.models.environment_spec import EnvironmentSpec from submarine.client.models.experiment_spec import ExperimentSpec from submarine.client.models.experiment_task_spec import ExperimentTaskSpec from submarine.client.models.experiment_meta import ExperimentMeta from submarine.client.models.code_spec import CodeSpec # Create Submarine Client submarine_client = submarine.ExperimentClient() # Define TensorFlow experiment spec environment = EnvironmentSpec(image='apache/submarine:tf-dist-mnist-test-1.0') experiment_meta = ExperimentMeta(name='mnist-dist', namespace='default', framework='Tensorflow', cmd='python /var/tf_dist_mnist/dist_mnist.py --train_steps=100', env_vars={'ENV1': 'ENV1'}) worker_spec = ExperimentTaskSpec(resources='cpu=1,memory=1024M', replicas=1) ps_spec = ExperimentTaskSpec(resources='cpu=1,memory=1024M', replicas=1) code_spec = CodeSpec(sync_mode='git', url='https://github.com/apache/submarine.git') experiment_spec = ExperimentSpec(meta=experiment_meta, environment=environment, code=code_spec, spec={'Ps' : ps_spec,'Worker': worker_spec}) # Create experiment experiment = submarine_client.create_experiment(experiment_spec=experiment_spec)  You can create a new notebook, paste the above code and run it. Or, you can find the notebook submarine_experiment_sdk.ipynb inside the launched notebook session. You can open it, try it out. After experiment submitted to Submarine server, you can find the experiment jobs on the UI. 
"},{"title":"Submarine Python SDK","type":0,"sectionRef":"#","url":"docs/gettingStarted/python-sdk","content":"","keywords":""},{"title":"Prepare Python Environment to run Submarine SDK​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/gettingStarted/python-sdk#prepare-python-environment-to-run-submarine-sdk","content":"Submarine SDK requires Python 3.7+. It is best to try it in a new Python environment created with Anaconda or virtualenv so that your existing Python environment is not affected. A sample Python virtual env can be set up like this: wget https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz tar xf virtualenv-16.0.0.tar.gz # Make sure to install using Python 3 python3 virtualenv-16.0.0/virtualenv.py venv . venv/bin/activate  "},{"title":"Install Submarine SDK​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/gettingStarted/python-sdk#install-submarine-sdk","content":""},{"title":"Install SDK from pypi.org (recommended)​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/gettingStarted/python-sdk#install-sdk-from-pypiorg-recommended","content":"Submarine has provided a Python SDK since 0.4.0. Replace the version below with the one you need. More detail: https://pypi.org/project/apache-submarine/ # Install latest stable version pip install apache-submarine # Install specific version pip install apache-submarine==&lt;REPLACE_VERSION&gt;  "},{"title":"Install SDK from source code​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/gettingStarted/python-sdk#install-sdk-from-source-code","content":"First clone the code from GitHub, or go to http://submarine.apache.org/download.html to download the released source code. git clone https://github.com/apache/submarine.git # (optional) check out a specific branch or release git checkout &lt;correct release tag/branch&gt; cd submarine/submarine-sdk/pysubmarine pip install .  
"},{"title":"Manage Submarine Experiment​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/gettingStarted/python-sdk#manage-submarine-experiment","content":"Assuming you've installed submarine on K8s and forwarded the traefik service to localhost, you can now open a Python shell, Jupyter notebook, or any other tool with the Submarine SDK installed. Follow the SDK experiment example to run an experiment. "},{"title":"Training a DeepFM model​","type":1,"pageTitle":"Submarine Python SDK","url":"docs/gettingStarted/python-sdk#training-a-deepfm-model","content":"Submarine also lets users train an easy-to-use CTR model with a few lines of code and a configuration file, so they don’t need to reimplement the model themselves. In addition, they can train the model both locally and on distributed systems, such as Hadoop or Kubernetes. Follow the SDK DeepFM example to try the model. "},{"title":"MLflow UI","type":0,"sectionRef":"#","url":"docs/userDocs/others/mlflow","content":"","keywords":""},{"title":"Usage​","type":1,"pageTitle":"MLflow UI","url":"docs/userDocs/others/mlflow#usage","content":"MLflow UI shows the tracking results of the experiments. When we call log_param or log_metric in the ModelsClient API, we can view the result in MLflow UI. Below is an example of using MLflow UI. "},{"title":"Example​","type":1,"pageTitle":"MLflow UI","url":"docs/userDocs/others/mlflow#example","content":"Run the following code in the cluster from submarine import ModelsClient import random import time if __name__ == &quot;__main__&quot;: modelClient = ModelsClient() with modelClient.start() as run: modelClient.log_param(&quot;learning_rate&quot;, random.random()) for i in range(100): time.sleep(1) modelClient.log_metric(&quot;mse&quot;, random.random() * 100, i) modelClient.log_metric(&quot;acc&quot;, random.random(), i)  In the MLflow UI page, you can see the log_param and the log_metric result. You can also compare the training between different workers.  
"},{"title":"Quickstart","type":0,"sectionRef":"#","url":"docs/gettingStarted/quickstart","content":"","keywords":""},{"title":"Installation​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#installation","content":""},{"title":"Prepare a Kubernetes cluster​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#prepare-a-kubernetes-cluster","content":"Prerequisite Check dependency page for the compatible versionkubectlhelm (Helm v3 is minimum requirement.)minikube. Start minikube cluster minikube start --vm-driver=docker --cpus 8 --memory 4096 --kubernetes-version v1.21.2  "},{"title":"Launch submarine in the cluster​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#launch-submarine-in-the-cluster","content":"Clone the project git clone https://github.com/apache/submarine.git  Install the submarine operator and dependencies by helm chart cd submarine helm install submarine ./helm-charts/submarine  Create a Submarine custom resource and the operator will create the submarine server, database, etc. for us. 
kubectl apply -f submarine-cloud-v2/artifacts/examples/example-submarine.yaml  "},{"title":"Ensure submarine is ready​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#ensure-submarine-is-ready","content":"Use kubectl to query the status of pods kubectl get pods  Make sure each pod is Running NAME READY STATUS RESTARTS AGE notebook-controller-deployment-5d4f5f874c-mnbc8 1/1 Running 0 61m pytorch-operator-844c866d54-xm8nl 1/1 Running 2 61m submarine-database-85bd68dbc5-qggtm 1/1 Running 0 11m submarine-minio-76465444f6-hdgdp 1/1 Running 0 11m submarine-mlflow-75f86d8f4d-rj2z7 1/1 Running 0 11m submarine-operator-5dd79cdf86-gpm2p 1/1 Running 0 61m submarine-server-68985b767-vjdvx 1/1 Running 0 11m submarine-tensorboard-5df8499fd4-vnklf 1/1 Running 0 11m submarine-traefik-7cbcfd4bd9-wbf8b 1/1 Running 0 61m tf-job-operator-6bb69fd44-zmlmr 1/1 Running 1 61m  "},{"title":"Connect to workbench​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#connect-to-workbench","content":"Exposing service # Method 1 -- use minikube ip minikube ip # you'll get the IP address of minikube, ex: 192.168.49.2 # Method 2 -- use port-forwarding kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80 View workbench If you use method 1, go to http://{minikube ip}:32080. For example, http://192.168.49.2:32080. If you use method 2, go to http://0.0.0.0:32080. "},{"title":"Example: Submit a mnist distributed example​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#example-submit-a-mnist-distributed-example","content":"We put the code of this example here. train.py is our training script, and build.sh is the script to build a docker image. "},{"title":"1. Write a python script for distributed training​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#1-write-a-python-script-for-distributed-training","content":"Take a simple mnist tensorflow script as an example. 
We choose MultiWorkerMirroredStrategy as our distributed strategy. &quot;&quot;&quot; ./dev-support/examples/quickstart/train.py Reference: https://github.com/kubeflow/tf-operator/blob/master/examples/v1/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py &quot;&quot;&quot; import tensorflow_datasets as tfds import tensorflow as tf from tensorflow.keras import layers, models import submarine def make_datasets_unbatched(): BUFFER_SIZE = 10000 # Scaling MNIST data from (0, 255] to (0., 1.] def scale(image, label): image = tf.cast(image, tf.float32) image /= 255 return image, label datasets, _ = tfds.load(name='mnist', with_info=True, as_supervised=True) return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE) def build_and_compile_cnn_model(): model = models.Sequential() model.add( layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation='relu')) model.add(layers.MaxPooling2D((2, 2))) model.add(layers.Conv2D(64, (3, 3), activation='relu')) model.add(layers.Flatten()) model.add(layers.Dense(64, activation='relu')) model.add(layers.Dense(10, activation='softmax')) model.summary() model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) return model def main(): strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy( communication=tf.distribute.experimental.CollectiveCommunication.AUTO) BATCH_SIZE_PER_REPLICA = 4 BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync with strategy.scope(): ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat() options = tf.data.Options() options.experimental_distribute.auto_shard_policy = \\ tf.data.experimental.AutoShardPolicy.DATA ds_train = ds_train.with_options(options) # Model building/compiling need to be within `strategy.scope()`. 
multi_worker_model = build_and_compile_cnn_model() class MyCallback(tf.keras.callbacks.Callback): def on_epoch_end(self, epoch, logs=None): # monitor the loss and accuracy print(logs) submarine.log_metrics({&quot;loss&quot;: logs[&quot;loss&quot;], &quot;accuracy&quot;: logs[&quot;accuracy&quot;]}, epoch) multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70, callbacks=[MyCallback()]) if __name__ == '__main__': main()  "},{"title":"2. Prepare an environment compatible with the training​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#2-prepare-an-environment-compatible-with-the-training","content":"Build a docker image equipped with the requirement of the environment. eval $(minikube docker-env) ./dev-support/examples/quickstart/build.sh  "},{"title":"3. Submit the experiment​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#3-submit-the-experiment","content":"Open submarine workbench and click + New Experiment Choose Define your experiment Fill the form accordingly. Here we set 3 workers. Step 1Step 2Step 3The experiment is successfully submitted "},{"title":"4. Monitor the process​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#4-monitor-the-process","content":"In our code, we use submarine from submarine-sdk to record the metrics. To see the result, click corresponding experiment with name mnist-example in the workbench.To see the metrics of each worker, you can select a worker from the left top list.  "},{"title":"5. 
Serve the model (In development)​","type":1,"pageTitle":"Quickstart","url":"docs/gettingStarted/quickstart#5-serve-the-model-in-development","content":""},{"title":"Tensorboard","type":0,"sectionRef":"#","url":"docs/userDocs/others/tensorboard","content":"","keywords":""},{"title":"Write to LogDirs by the environment variable​","type":1,"pageTitle":"Tensorboard","url":"docs/userDocs/others/tensorboard#write-to-logdirs-by-the-environment-variable","content":""},{"title":"Environment variable​","type":1,"pageTitle":"Tensorboard","url":"docs/userDocs/others/tensorboard#environment-variable","content":"SUBMARINE_TENSORBOARD_LOG_DIR: Exist in every experiment container. You just need to direct your logs to $(SUBMARINE_TENSORBOARD_LOG_DIR) (NOTICE: it is () not {}), and you can inspect the process on the tensorboard webpage. "},{"title":"Example​","type":1,"pageTitle":"Tensorboard","url":"docs/userDocs/others/tensorboard#example","content":"{ &quot;meta&quot;: { &quot;name&quot;: &quot;tensorflow-tensorboard-dist-mnist&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=$(SUBMARINE_TENSORBOARD_LOG_DIR) --learning_rate=0.01 --batch_size=20&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=512M&quot; } } }  "},{"title":"Connect to the tensorboard webpage​","type":1,"pageTitle":"Tensorboard","url":"docs/userDocs/others/tensorboard#connect-to-the-tensorboard-webpage","content":"Open the experiment page in the workbench, and Click the TensorBoard button.  Inspect the process on tensorboard page.  
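Inside a training script, the SUBMARINE_TENSORBOARD_LOG_DIR variable described earlier can also be read directly from the process environment instead of being passed on the command line. This is a minimal sketch, assuming the variable is present in the experiment container as stated above; resolve_log_dir and DEFAULT_LOG_DIR are hypothetical names, and the fallback path is an assumption for runs outside a Submarine experiment container.

```python
import os

# Assumption: fall back to a local directory when the variable is absent,
# for example when the script runs outside a Submarine experiment container.
DEFAULT_LOG_DIR = '/tmp/tensorboard-logs'


def resolve_log_dir(environ=os.environ):
    # Prefer the directory provided by Submarine; otherwise use the fallback.
    return environ.get('SUBMARINE_TENSORBOARD_LOG_DIR') or DEFAULT_LOG_DIR
```

The returned path can then be handed to whatever summary writer the training code already uses.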
"},{"title":"Submarine-SDK","type":0,"sectionRef":"#","url":"docs/userDocs/submarine-sdk/","content":"","keywords":""},{"title":"Summary​","type":1,"pageTitle":"Submarine-SDK","url":"docs/userDocs/submarine-sdk/#summary","content":"Supports Python, Scala, and R for algorithm development. Supports tracking/metrics APIs that allow developers to add tracking/metrics and view them from the Submarine Workbench UI. "},{"title":"Experiment Client","type":0,"sectionRef":"#","url":"docs/userDocs/submarine-sdk/experiment-client","content":"","keywords":""},{"title":"class ExperimentClient()​","type":1,"pageTitle":"Experiment Client","url":"docs/userDocs/submarine-sdk/experiment-client#class-experimentclient","content":"Client of a submarine server that creates and manages experiments and logs. create_experiment(experiment_spec) -&gt; dict​ Create an experiment. Param\tType\tDescription\tDefault Value experiment_spec\tDict\tSubmarine experiment spec. More detailed information can be found at Experiment API\tx Returns The detailed info about the submarine experiment. Example from submarine import * client = ExperimentClient() client.create_experiment({ &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Ps&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; }, &quot;Worker&quot;: { &quot;replicas&quot;: 1, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } } })   patch_experiment(id, experiment_spec) -&gt; dict​ Patch an experiment. 
Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx experiment_spec\tDict\tSubmarine experiment spec. More detailed information of Submarine experiment spec can be found at Experiment API.\tx Returns The detailed info about the submarine experiment. Example client.patch_experiment(&quot;experiment_1626160071451_0008&quot;, { &quot;meta&quot;: { &quot;name&quot;: &quot;tf-mnist-json&quot;, &quot;namespace&quot;: &quot;default&quot;, &quot;framework&quot;: &quot;TensorFlow&quot;, &quot;cmd&quot;: &quot;python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150&quot;, &quot;envVars&quot;: { &quot;ENV_1&quot;: &quot;ENV1&quot; } }, &quot;environment&quot;: { &quot;image&quot;: &quot;apache/submarine:tf-mnist-with-summaries-1.0&quot; }, &quot;spec&quot;: { &quot;Worker&quot;: { &quot;replicas&quot;: 2, &quot;resources&quot;: &quot;cpu=1,memory=1024M&quot; } } })   get_experiment(id) -&gt; dict​ Get the experiment's detailed info by id. Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx Returns The detailed info about the submarine experiment. Example experiment = client.get_experiment(&quot;experiment_1626160071451_0008&quot;)   list_experiments(status) -&gt; list[dict]​ List all experiment for the user. Param\tType\tDescription\tDefault Valuestatus\tOptional[str]\tAccepted, Created, Running, Succeeded, Deleted.\tNone Returns List of submarine experiments. Example experiments = client.list_experiments()   delete_experiment(id) -&gt; dict​ Delete the submarine experiment. Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx Returns The detailed info about the deleted submarine experiment. Example client.delete_experiment(&quot;experiment_1626160071451_0008&quot;)   get_log(id, onlyMaster)​ Print training logs of all pod of the experiment. By default print all the logs of Pod. 
Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx onlyMaster\tOptional[bool]\tBy default include pod log of &quot;master&quot; which might be Tensorflow PS/Chief or PyTorch master.\tx Return The info of pod logs Example client.get_log(&quot;experiment_1626160071451_0009&quot;)   list_log(status)​ List experiment log. Param\tType\tDescription\tDefault Valuestatus\tString\tAccepted, Created, Running, Succeeded, Deleted.\tx Returns List of submarine experiment logs. Example logs = client.list_log(&quot;Succeeded&quot;)   wait_for_finish(id, polling_interval)​ Waits until the experiment is finished or failed. Param\tType\tDescription\tDefault Valueid\tString\tSubmarine experiment id.\tx polling_interval\tOptional[int]\tHow many seconds between two polls for the status of the experiment.\t10 Returns Submarine experiment logs. Example logs = client.wait_for_finish(&quot;experiment_1626160071451_0009&quot;, 5)   "},{"title":"Python SDK Development","type":0,"sectionRef":"#","url":"docs/userDocs/submarine-sdk/pysubmarine/development","content":"","keywords":""},{"title":"Prerequisites​","type":1,"pageTitle":"Python SDK Development","url":"docs/userDocs/submarine-sdk/pysubmarine/development#prerequisites","content":"This is required for developing &amp; testing changes, we recommend installing pysubmarine in its own conda environment by running the following conda create --name submarine-dev python=3.6 conda activate submarine-dev # Install auto-format and lints from current checkout pip install -r ./dev-support/style-check/python/lint-requirements.txt # Install mypy from current checkout pip install -r ./dev-support/style-check/python/mypy-requirements.txt # test-requirements.txt from current checkout pip install -r ./submarine-sdk/pysubmarine/github-actions/test-requirements.txt # Installs pysubmarine from current checkout pip install -e ./submarine-sdk/pysubmarine  "},{"title":"PySubmarine Docker​","type":1,"pageTitle":"Python SDK 
Development","url":"docs/userDocs/submarine-sdk/pysubmarine/development#pysubmarine-docker","content":"We also use Docker to provide build environments for CI, development, and generating the Python SDK from Swagger. ./run-pysubmarine-ci.sh  The script does the following things: Start an interactive bash session. Mount the submarine directory to /workspace and set it as home. Switch user to be the same user that calls run-pysubmarine-ci.sh. "},{"title":"Coding Style​","type":1,"pageTitle":"Python SDK Development","url":"docs/userDocs/submarine-sdk/pysubmarine/development#coding-style","content":"Use isort to sort the Python imports and black to format Python code. Both are configured in pyproject.toml. To autoformat code ./dev-support/style-check/python/auto-format.sh  Use flake8 for linting; its configuration is in .flake8. We also use mypy for static type checking in submarine-sdk/pysubmarine/submarine. Verify that the linter passes before submitting a pull request by running: ./dev-support/style-check/python/lint.sh  If you encounter unexpected formatting, use the following method # fmt: off &quot;Unexpected format, formatted by yourself&quot; # fmt: on  "},{"title":"Unit Testing​","type":1,"pageTitle":"Python SDK Development","url":"docs/userDocs/submarine-sdk/pysubmarine/development#unit-testing","content":"We are using pytest to develop our unit test suite. After building the project (see below) you can run its unit tests like so: cd submarine-sdk/pysubmarine  Run unit test pytest --cov=submarine -vs -m &quot;not e2e&quot;  Run integration test pytest --cov=submarine -vs -m &quot;e2e&quot;  Before running this command locally, make sure the submarine server is running. "},{"title":"Generate python SDK from swagger​","type":1,"pageTitle":"Python SDK Development","url":"docs/userDocs/submarine-sdk/pysubmarine/development#generate-python-sdk-from-swagger","content":"We use open-api generator to generate the pysubmarine client API that is used to communicate with the submarine server. 
To generate a different API component, change the code in Bootstrap.java. If you are only updating Java code for NotebookRestApi, ExperimentRestApi or EnvironmentRestApi, skip step 1. SwaggerConfiguration oasConfig = new SwaggerConfiguration() .openAPI(oas) .resourcePackages(Stream.of(&quot;org.apache.submarine.server.rest&quot;) .collect(Collectors.toSet())) .resourceClasses(Stream.of(&quot;org.apache.submarine.server.rest.NotebookRestApi&quot;, &quot;org.apache.submarine.server.rest.ExperimentRestApi&quot;, &quot;org.apache.submarine.server.rest.EnvironmentRestApi&quot;) .collect(Collectors.toSet())); After starting the server, http://localhost:8080/v1/openapi.json will include API specs for NotebookRestApi, ExperimentRestApi and EnvironmentRestApi. swagger_config.json defines the import path for the Python SDK. Ex: For submarine.client { &quot;packageName&quot; : &quot;submarine.client&quot;, &quot;projectName&quot; : &quot;submarine.client&quot;, &quot;packageVersion&quot;: &quot;0.8.0-SNAPSHOT&quot; } Usage: import submarine.client... Execute ./dev-support/pysubmarine/gen-sdk.sh to generate the latest version of the SDK. Notice: Please install the required packages before running the script: lint-requirements.txt. In submarine/submarine-sdk/pysubmarine/client/api_client.py line 74, please change &quot;long&quot;: int if six.PY3 else long, # noqa: F821 to &quot;long&quot;: int,  "},{"title":"Model Management Model Development​","type":1,"pageTitle":"Python SDK Development","url":"docs/userDocs/submarine-sdk/pysubmarine/development#model-management-model-development","content":"For local development, we can easily access the cluster's services thanks to telepresence. To elaborate, we can develop the SDK locally while still reaching the database and minio server through a proxy. Install telepresence by following the instructions. Start the proxy pod: telepresence --new-deployment submarine-dev  You can then develop as if in the cluster. 
"},{"title":"Upload package to PyPi​","type":1,"pageTitle":"Python SDK Development","url":"docs/userDocs/submarine-sdk/pysubmarine/development#upload-package-to-pypi","content":"For Apache Submarine committer and PMCs to do a new release. Change the version from 0.x.x-SNAPSHOT to 0.x.x in setup.pyInstall Python packages cd submarine-sdk/pysubmarine pip install -r github-actions/pypi-requirements.txt  Compiling Your Package It will create build, dist, and project.egg.infoin your local directory python setup.py bdist_wheel  Upload python package to TestPyPI for testing python -m twine upload --repository testpypi dist/*  Upload python package to PyPi python -m twine upload --repository-url https://upload.pypi.org/legacy/ dist/*  "},{"title":"Submarine Client","type":0,"sectionRef":"#","url":"docs/userDocs/submarine-sdk/submarine-client","content":"","keywords":""},{"title":"class SubmarineClient()​","type":1,"pageTitle":"Submarine Client","url":"docs/userDocs/submarine-sdk/submarine-client#class-submarineclient","content":"Client of submarine to log metric/param, save model and create/delete serve. log_metric(job_id, key, value, worker_index, timestamp, step) -&gt; None​ Log a single key-value metric with job id and worker index. The value must always be a number. Param\tType\tDescription\tDefault Valuejob_id\tString\tThe job name to which the metric should be logged.\tx key\tString\tMetric name.\tx value\tFloat\tMetric worker_index.\tx worker_index\tString\tParameter worker_index.\tx timestamp\tDatetime\tTime when this metric was calculated. Defaults to the current system time.\tdatetime.now() step\tInteger\tA single integer step at which to log the specified Metrics, by default it's 0.\t0  log_param(job_id, key, value, worker_index) -&gt; None​ Log a single key-value parameter with job id and worker index. The key and value are both strings. 
Param\tType\tDescription\tDefault Value job_id\tString\tThe job name to which the parameter should be logged.\tx key\tString\tParameter name.\tx value\tString\tParameter value.\tx worker_index\tString\tParameter worker_index.\tx  save_model(model, model_type, registered_model_name, input_dim, output_dim) -&gt; None​ Save a model into the minio pod. Param\tType\tDescription\tDefault Value model\tObject\tModel artifact.\tx model_type\tString\tThe type of model.\tx registered_model_name\tString\tIf it is not None, the model will be registered into the model registry with this name.\tNone input_dim\tList&lt;String&gt;\tThe input dimension of the model.\tNone output_dim\tList&lt;String&gt;\tThe output dimension of the model.\tNone  create_serve(self, model_name, model_version, async_req = True) -&gt; dict​ Create a serve for a model through Seldon Core. Param\tType\tDescription\tDefault Value model_name\tString\tName of a registered model.\tx model_version\tInteger\tVersion of a registered model.\tx async_req\tBoolean\tExecute request asynchronously.\tTrue  Returns: a dictionary with the inference URL. delete_serve(self, model_name, model_version, async_req) -&gt; None​ Delete a serving model. 
Param\tType\tDescription\tDefault Value model_name\tString\tName of a registered model.\tx model_version\tInteger\tVersion of a registered model.\tx async_req\tBoolean\tExecute request asynchronously.\tTrue "},{"title":"Submarine CLI","type":0,"sectionRef":"#","url":"docs/userDocs/submarine-sdk/submarine-cli","content":"","keywords":""},{"title":"Config​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#config","content":"You can configure the CLI settings with the following commands. "},{"title":"Init​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#init","content":"submarine config init  Returns: Submarine CLI Config initialized  Restores the CLI config to its defaults (hostname=localhost, port=32080). "},{"title":"Show current config​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#show-current-config","content":"submarine config list  For example, this returns: ╭──────────────────── SubmarineCliConfig ─────────────────────╮ │ { │ │ &quot;connection&quot;: { │ │ &quot;hostname&quot;: &quot;localhost&quot;, │ │ &quot;port&quot;: 32080 │ │ } │ │ } │ ╰─────────────────────────────────────────────────────────────╯  "},{"title":"Set config​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#set-config","content":"submarine config set &lt;parameter_path&gt; &lt;value&gt;  For example, to set the connection port to 8080: submarine config set connection.port 8080  "},{"title":"Get config​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#get-config","content":"submarine config get &lt;parameter_path&gt;  For example: submarine config get connection.port  Returns: connection.port=8080  "},{"title":"Notebooks​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#notebooks","content":""},{"title":"List Notebooks​","type":1,"pageTitle":"Submarine 
CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#list-notebooks","content":"submarine list notebook  "},{"title":"Get Notebooks​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#get-notebooks","content":"submarine get notebook &lt;notebook id&gt;  You can get the notebook id by using the list command. "},{"title":"Delete Notebooks​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#delete-notebooks","content":"submarine delete notebook &lt;notebook id&gt;  "},{"title":"Experiments​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#experiments","content":""},{"title":"List Experiments​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#list-experiments","content":"submarine list experiment  "},{"title":"Get Experiment​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#get-experiment","content":"submarine get experiment &lt;experiment id&gt;  You can get the experiment id by using the list command. "},{"title":"Delete Experiment​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#delete-experiment","content":"submarine delete experiment &lt;experiment id&gt; [--wait/--no-wait]  --wait/--no-wait: blocking or non-blocking (default: no wait) "},{"title":"Environments​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#environments","content":""},{"title":"List Environments​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#list-environments","content":"submarine list environment  "},{"title":"Get Environments​","type":1,"pageTitle":"Submarine CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#get-environments","content":"submarine get environment &lt;environment name&gt;  "},{"title":"Delete Environments​","type":1,"pageTitle":"Submarine 
CLI","url":"docs/userDocs/submarine-sdk/submarine-cli#delete-environments","content":"submarine delete environment &lt;environment name&gt;  "},{"title":"Tracking","type":0,"sectionRef":"#","url":"docs/userDocs/submarine-sdk/tracking","content":"","keywords":""},{"title":"Functional api​","type":1,"pageTitle":"Tracking","url":"docs/userDocs/submarine-sdk/tracking#functional-api","content":"submarine.get_tracking_uri() -&gt; str​ Get the tracking URI. If none has been specified, check the environment variables. If the URI is still None, return the default Submarine JDBC URL. Returns: the tracking URI.  submarine.set_tracking_uri(uri) -&gt; None​ Set the tracking URI. You can also set the SUBMARINE_TRACKING_URI environment variable to have Submarine find a URI from there. The URI should be a database connection string. Param\tType\tDescription\tDefault Value uri\tString\tSubmarine records data to a MySQL server. The database URL is expected in the format &lt;dialect&gt;+&lt;driver&gt;://&lt;username&gt;:&lt;password&gt;@&lt;host&gt;:&lt;port&gt;/&lt;database&gt;. By default it's mysql+pymysql://submarine:password@submarine-database:3306/submarine. More details: SQLAlchemy docs\tx  submarine.log_param(key: str, value: str) -&gt; None​ Log a single key-value parameter. The key and value are both strings. Param\tType\tDescription\tDefault Value key\tString\tParameter name.\tx value\tString\tParameter value.\tx  submarine.log_metric(key, value, step=0) -&gt; None​ Log a single key-value metric. The value must always be a number. Param\tType\tDescription\tDefault Value key\tString\tMetric name.\tx value\tFloat\tMetric value.\tx step\tInteger\tA single integer step at which to log the specified metrics.\t0  submarine.save_model(model_type, model, registered_model_name, input_dim, output_dim) -&gt; None​ Save a model into the minio pod. Param\tType\tDescription\tDefault Value model_type\tString\tThe type of model. 
Only pytorch and tensorflow are supported.\tx model\tObject\tModel artifact.\tx registered_model_name\tString\tIf it is not None, the model will be registered into the model registry with this name.\tNone input_dim\tList&lt;Integer&gt;\tThe input dimension of the model.\tNone output_dim\tList&lt;Integer&gt;\tThe output dimension of the model.\tNone  "}]