To run the examples here, you need to:
Create a JSON configuration file containing train,valid and test data, model parameters, metrics, save model path, resources. e.g. deepfm.json
Install submarine python bindings by setup.py:
python ./submarine/submarine-sdk/pysubmarine/setup.py install
python run_deepfm.py -conf=deepfm.json -task_type train
python run_deepfm.py -conf=deepfm.json -task_type evaluate
Upload data to a shared file system like hdfs, s3.
Create a JSON configuration file for distributed training. e.g. deepfm_distributed.json
Submit Job
SUBMARINE_VERSION=0.4.0 SUBMARINE_HADOOP_VERSION=2.9 java -cp $(${HADOOP_COMMON_HOME}/bin/hadoop classpath --glob):submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar:${HADOOP_CONF_PATH} \ org.apache.submarine.client.cli.Cli job run --name deepfm-job-001 \ --framework tensorflow \ --verbose \ --input_path "" \ --num_workers 2 \ --worker_resources memory=4G,vcores=4 \ --num_ps 1 \ --ps_resources memory=4G,vcores=4 \ --worker_launch_cmd "myvenv.zip/venv/bin/python run_deepfm.py -conf=deepfm_distributed.json" \ --ps_launch_cmd "myvenv.zip/venv/bin/python run_deepfm.py -conf=deepfm_distributed.json" \ --insecure \ --conf tony.containers.resources=myvenv.zip#archive,submarine-all-${SUBMARINE_VERSION}-hadoop-${SUBMARINE_HADOOP_VERSION}.jar,deepfm_distributed.json,run_deepfm.py \ --conf tony.chief.instances=1 \ --conf tony.chief.memory=4G \ --conf tony.chief.vcores=4 \ --conf tony.chief.command="myvenv.zip/venv/bin/python run_deepfm.py -conf=deepfm_distributed.json"