 <!--
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->

 # Mini-submarine

 This is a Docker image built for Submarine development and quick-start testing.

 **Please note: do not use this image in a production environment. It is for testing purposes only.**

 ## Start mini-submarine

 ### Use the image we provide

 ```
 docker pull apache/submarine:mini-0.6.0
 ```

 ### Create image by yourself

 > You may need a VPN if your network access is restricted.

 1. Clone the source code of Submarine
 ```
 git clone https://github.com/apache/submarine.git
 ```

 2. Build Submarine
 ```
 cd ./submarine
 mvn clean package -DskipTests
 ```

 3. Build the mini-submarine image
 > You can download the following three archives in advance and put them into `submarine/dev-support/mini-submarine/`:
 > zookeeper-3.4.14.tar.gz, hadoop-2.9.2.tar.gz, spark-2.4.4-bin-hadoop2.7.tgz

 ```
 # From the repository root entered in step 2
 cd dev-support/mini-submarine/
 ./build_mini-submarine.sh
 ```
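If you pre-download the archives, a quick check like the sketch below can confirm they are in place before you start the build. `missing_archives` is a hypothetical helper written for this README, not part of `build_mini-submarine.sh`; the file names are the ones listed in the note above.

```shell
# Sketch: print the name of every optional pre-downloaded archive that is
# NOT present in the given directory. Prints nothing if all are there.
# missing_archives is a hypothetical helper, not part of the build script.
missing_archives() {
  dir="$1"
  for f in zookeeper-3.4.14.tar.gz hadoop-2.9.2.tar.gz spark-2.4.4-bin-hadoop2.7.tgz; do
    [ -f "$dir/$f" ] || echo "$f"
  done
}
```

Running `missing_archives .` from `dev-support/mini-submarine/` prints nothing when all three files are present.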
 #### Package an existing release candidate
 When preparing a release, the release manager may need to package the candidate artifacts in this Docker image and publish the image candidate for a vote.
 In this scenario, do the following:

 Put the Submarine candidate artifacts in a folder such as `~/releases/submarine-release`:
 ```
 $ ls $release_candidates_path
 submarine-dist-0.7.0-hadoop-2.9.tar.gz        submarine-dist-0.7.0-src.tar.gz.asc
 submarine-dist-0.7.0-hadoop-2.9.tar.gz.asc    submarine-dist-0.7.0-src.tar.gz.sha512
 submarine-dist-0.7.0-hadoop-2.9.tar.gz.sha512 submarine-dist-0.7.0-src.tar.gz
 ```
 ```
 export submarine_version=0.7.0
 export release_candidates_path=~/releases/submarine-release
 ./build_mini-submarine.sh
 #docker run -it -h submarine-dev --net=bridge --privileged -P local/mini-submarine:0.7.0 /bin/bash
 docker tag local/mini-submarine:0.7.0 apache/mini-submarine:0.7.0-RC0
 docker push apache/mini-submarine:0.7.0-RC0
 ```
 Inside the container, you can verify that the Submarine jar version is the expected 0.7.0 before pushing the image with an "RC" tag for a vote.
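
Before packaging, it is reasonable to check each candidate archive against the `.sha512` file that sits next to it in the release folder. The sketch below is one way to do that, assuming GNU `sha512sum` is available and that the `.sha512` file contains the digest as 128 hex characters (possibly split by spaces or line breaks). `verify_sha512` is a hypothetical helper, not part of the Submarine tooling.

```shell
# Sketch: succeed only if the SHA-512 of the artifact matches the digest
# stored in "<artifact>.sha512". Assumes GNU sha512sum and grep -o.
# verify_sha512 is a hypothetical helper, not part of Submarine.
verify_sha512() {
  artifact="$1"
  # Strip whitespace, take the first run of 128 hex characters, lowercase it.
  expected=$(tr -d ' \n\r' < "$artifact.sha512" \
    | grep -oiE '[0-9a-f]{128}' | head -n 1 | tr 'A-F' 'a-f')
  actual=$(sha512sum "$artifact" | cut -d ' ' -f 1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, `verify_sha512 "$release_candidates_path/submarine-dist-0.7.0-hadoop-2.9.tar.gz" && echo OK`. The `.asc` signatures can be checked separately with `gpg --verify`.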

 ### Run mini-submarine image

 ```
 docker run -it -h submarine-dev --name mini-submarine --net=bridge --privileged -P local/mini-submarine:0.7.0 /bin/bash

 # Inside the container, as the root user, bootstrap HDFS and YARN
 /tmp/hadoop-config/bootstrap.sh

 # Check that YARN is running as expected (the HDFS check follows further down)
 yarn node -list -showDetails
 ```

 If you pulled the image instead of building it, replace `local/mini-submarine:0.7.0` with `apache/submarine:mini-0.7.0`.

 ### You should see info like this:

 ```
 Total Nodes:1
          Node-Id      Node-State	Node-Http-Address	Number-of-Running-Containers
 submarine-dev:35949         RUNNING	submarine-dev:8042                            0
 Detailed Node Information :
   Configured Resources : <memory:8192, vCores:16, nvidia.com/gpu: 1>
   Allocated Resources : <memory:0, vCores:0>
   Resource Utilization by Node : PMem:4144 MB, VMem:4189 MB, VCores:0.25308025
   Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
   Node-Labels :
 ```

 ```
 hdfs dfs -ls /user
 ```

 > drwxr-xr-x   - yarn supergroup          0 2019-07-22 07:59 /user/yarn
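
For scripted checks, for example when waiting for `bootstrap.sh` to finish, the node list can be reduced to a single number. The sketch below counts the nodes whose state column reads `RUNNING`, based on the column layout shown above; `running_nodes` is a hypothetical helper, not a command shipped in the image.

```shell
# Sketch: read `yarn node -list` output on stdin and print how many nodes
# have RUNNING in the second (Node-State) column.
# running_nodes is a hypothetical helper.
running_nodes() {
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}
```

Inside the container, `yarn node -list -showDetails | running_nodes` should print `1` for output like the sample above.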

 ## Run workbench server

 1. Set up the MySQL (MariaDB) server
 > Because MySQL and MariaDB are licensed under the GPL, the image does not contain their binaries. You need to run the following script manually to install the database.

 ```
 /tmp/hadoop-config/setup-mysql.sh
 ```

 You can then run `mysql -uroot` to log in to the database.

 2. Start submarine server
 ```
 su yarn
 /opt/submarine-current/bin/submarine-daemon.sh start getMysqlJar
 ```

 3. Log in to the Submarine workbench

 Run the following command on your host machine to get the access URL of the Submarine workbench running in Docker:

 ```shell
 echo "http://localhost:$(docker inspect --format='{{(index (index .NetworkSettings.Ports "8080/tcp") 0).HostPort}}' mini-submarine)"
 ```
 Open the URL printed by the command (for example, http://localhost:32819) in a browser. The username and initial password of the workbench are both `admin`.

 ## Run a submarine job

 ### Switch to user yarn

 ```
 su yarn
 ```

 ### Navigate to submarine example directory
 ```
 cd /home/yarn/submarine/
 ```

 ### Run a mnist TF job with submarine + TonY runtime
 ```
 # run TF 1 distributed training job
 ./run_submarine_mnist_tony.sh

 # run TF 2 distributed training job
 ./run_submarine_mnist_tf2_tony.sh
 ```
 When `run_submarine_mnist_tony.sh` is executed, the MNIST data is downloaded by default from [google mnist](https://storage.googleapis.com/cvdf-datasets/mnist/). If that URL is not accessible, use the `-d` parameter to specify a different URL.
 For example, if you are in mainland China, you can use the following command:
 ```
 ./run_submarine_mnist_tony.sh -d http://yann.lecun.com/exdb/mnist/
 ```

 ### Run a mnist TF job via submarine server

 The Submarine server is meant to manage the job lifecycle. Clients submit
 job parameters or a YAML file to the Submarine server instead of submitting jobs
 directly themselves, and the server handles the rest of the work.

 Set `submarine.server.rpc.enabled` to `true` in the file
 `/opt/submarine-current/conf/submarine-site`:
 ```
   <property>
     <name>submarine.server.rpc.enabled</name>
     <value>true</value>
     <description>Run jobs using rpc server.</description>
   </property>
 ```
 Run the following command to submit a job via submarine server
 ```
 ./run_submarine_mnist_tony_rpc.sh
 ```
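
If the RPC submission does not behave as expected, it can help to confirm what the site file actually contains. The sketch below prints the `<value>` of a named property, assuming the one-tag-per-line layout used in the snippet above. It is a plain text match rather than a real XML parser, and `get_property` is a hypothetical helper.

```shell
# Sketch: print the <value> of a named property from a Hadoop-style XML file.
# Assumes <name> and <value> each sit on one line, as in the snippet above.
# get_property is a hypothetical helper, not part of Submarine.
get_property() {
  name="$1"
  file="$2"
  awk -v n="$name" '
    index($0, "<name>" n "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Skip the 7 characters of "<value>" and drop the 8 of "</value>".
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$file"
}
```

Calling `get_property submarine.server.rpc.enabled "$site_file"`, where `$site_file` is the path of the file you edited, should print `true` once the property is set.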

 ### Try your own submarine program

 Run a container with your source code mounted. Alternatively, use `docker cp` to copy the code into a container that is already running.

 1. `docker run -it -h submarine-dev --net=bridge --privileged -v pathToMyScript.py:/home/yarn/submarine/myScript.py local/hadoop-docker:submarine /bin/bash`

 2. Using `run_submarine_mnist_tony.sh` as a reference, adapt it to launch your script.

 3. Try to run it. Because this is a single-node environment, keep in mind that workers can conflict with each other. For instance, the `mnist_distributed.py` example includes a workaround for the conflict that arises when two workers use the same `data_dir` to download the data set.


 ## Update Submarine Version

 Follow the steps below to deploy your own modified and compiled Submarine package to the mini-submarine container.

 ### Build Submarine

 ```
 cd submarine-project-dir/
 mvn clean package -DskipTests
 ```

 ### Copy submarine jar to mini-submarine container

 ```
 docker cp submarine-all/target/submarine-all-<SUBMARINE_VERSION>-hadoop-<HADOOP_VERSION>.jar <container-id>:/tmp/
 ```

 ### Modify environment variables

 ```
 cd /home/yarn/submarine
 vi run_customized_submarine-all_mnist.sh

 # Need to modify environment variables based on hadoop and submarine version numbers
 SUBMARINE_VERSION=<submarine-version-number>
 HADOOP_VERSION=<hadoop-version-number> # default 2.9
 ```
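
The jar copied into the container and the two variables in the script have to describe the same versions. As a small sketch, the jar file name can be derived from those two values using the naming pattern from the `docker cp` step above; `jar_name` is a hypothetical helper, and the `2.9` fallback mirrors the default noted in the script comment.

```shell
# Sketch: build the submarine-all jar file name from the two version numbers.
# $1: Submarine version, $2: Hadoop version (falls back to 2.9 if omitted).
# jar_name is a hypothetical helper, not part of Submarine.
jar_name() {
  echo "submarine-all-$1-hadoop-${2:-2.9}.jar"
}
```

For example, `jar_name 0.7.0 2.9` prints `submarine-all-0.7.0-hadoop-2.9.jar`, which is the file to look for under `submarine-all/target/`.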

 ### Test submarine jar package in container

 ```
 cd /home/yarn/submarine
 ./run_customized_submarine-all_mnist.sh
 ```

 ## Debug Submarine

 When using mini-submarine, you can debug the Submarine client, applicationMaster, and executor for troubleshooting.

 ### <span id="debug">Debug submarine client</span>

 Run the following command to start mini-submarine.

 ```
 docker run -it -P -h submarine-dev --net=bridge --expose=8000 --privileged local/mini-submarine:<REPLACE_VERSION> /bin/bash
 ```

 Debug submarine client with the parameter "--debug"

 ```
 ./run_submarine_mnist_tony.sh --debug
 ```

 The debug port inside mini-submarine is 8000.
 You need to find how that port is mapped on the host that runs mini-submarine:

 ```
 docker port <SUBMARINE_CONTAINER_ID>
 ```

 For example, the output may look like this:

 ```
 8000/tcp -> 0.0.0.0:32804
 ```

 Port 32804 on the host can then be used for remote debugging.
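
When several ports are published, picking the right line by eye gets tedious. The sketch below extracts the host port for a given container port from `docker port` output, assuming the `<port>/tcp -> <address>:<host-port>` format shown above; `host_port_for` is a hypothetical helper.

```shell
# Sketch: read `docker port <container>` output on stdin and print the host
# port mapped to the given container TCP port (first match only).
# host_port_for is a hypothetical helper.
host_port_for() {
  awk -v p="$1/tcp" '$1 == p && $2 == "->" {
    sub(/.*:/, "", $3)   # keep only what follows the last colon
    print $3
    exit
  }'
}
```

On the host, `docker port <SUBMARINE_CONTAINER_ID> | host_port_for 8000` prints the port to enter in your IDE's remote-debug configuration or another JDWP-capable debugger.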

 ### Debug submarine job applicationMaster

 Run the following command to start mini-submarine.

 ```
 docker run -it -P -h submarine-dev --net=bridge --expose=8001 --privileged local/mini-submarine:<REPLACE_VERSION> /bin/bash
 ```

 Add the following configuration in the file /usr/local/hadoop/etc/hadoop/tony.xml.

 ```
 <property>
   <name>tony.task.am.jvm.opts</name>
   <value>-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8001</value>
 </property>
 ```

 You can use `run_submarine_mnist_tony.sh` to submit a job. Port 8001 is used for AM debugging in mini-submarine.
 The debug port mapping can be obtained in the same way as shown in [Debug submarine client](#debug).

 ### Debug submarine job executor

 Run the following command to start mini-submarine.

 ```
 docker run -it -P -h submarine-dev --net=bridge --expose=8002 --privileged local/mini-submarine:<REPLACE_VERSION> /bin/bash
 ```

 Add the following configuration in the file /usr/local/hadoop/etc/hadoop/tony.xml.

 ```
 <property>
   <name>tony.task.executor.jvm.opts</name>
   <value>-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8002</value>
 </property>
 ```

 Port 8002 is used for executor debugging in mini-submarine.
 To avoid port conflicts, use only one executor, which means the parameters of the
 Submarine job should look like this:

 ```
 --num_workers 1 \
 --num_ps 0 \
 ```

 You can get the debug port mapping in the same way as shown in [Debug submarine client](#debug).
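
The applicationMaster and executor settings differ only in the property name and the port number. As a convenience sketch, the function below prints the `<property>` block for either case so that it can be pasted into `tony.xml`; `tony_debug_property` is a hypothetical helper, and the JDWP options are copied from the snippets above. With `suspend=y`, the JVM waits for a debugger to attach before it starts running.

```shell
# Sketch: print a tony.xml <property> block that enables JDWP debugging.
# $1: property name (e.g. tony.task.executor.jvm.opts), $2: debug port.
# tony_debug_property is a hypothetical helper, not part of TonY or Submarine.
tony_debug_property() {
  cat <<EOF
<property>
  <name>$1</name>
  <value>-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=$2</value>
</property>
EOF
}
```

For example, `tony_debug_property tony.task.am.jvm.opts 8001` reproduces the applicationMaster block, and `tony_debug_property tony.task.executor.jvm.opts 8002` the executor block.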

 ## Run a distributedShell job with docker container

 You can also run a distributedShell job in mini-submarine.

 ```
 cd && ./yarn-ds-docker.sh
 ```

 ## Run a spark job

 Spark jobs are supported as well.

 ```
 cd && cd spark-script && ./run_spark.sh
 ```

 ## Question and answer

 1. Submarine package name error

    Because the package name changed from `apache.hadoop.yarn.submarine` to `apache.submarine` in Submarine 0.3.0 and higher, you need to set the runtime class in the `/usr/local/hadoop/etc/hadoop/submarine-site.xml` file.

    ```
    <configuration>
       <property>
         <name>submarine.runtime.class</name>
         <value>org.apache.submarine.server.submitter.yarn.YarnRuntimeFactory</value>
       </property>
    </configuration>
    ```