
What is predictor_dl_model?

predictor_dl_model is a suite of offline processes that forecast traffic inventory. The suite contains the following modules. More information is included in each module's directory.

datagen: This module generates the factdata table, which contains traffic data.
trainer: This module builds and trains a deep learning model based on the factdata table.
pipeline: This module processes the factdata table into training-ready data, which is used to train the neural network. Its steps are:
    a. Main-ts only modifies the structure of the raw data; it does not remove any data.
    b. Pre-cluster denoises (new) or removes individual uckeys and prepares them for clustering.
    c. Cluster creates clusters and denoises or removes clusters.
    d. Distribution records the relationship between virtual-uckey and uckey.
    e. Norm normalizes attributes.
    f. Tfrecords saves the data in TFRecord format.
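The pipeline steps can be pictured as a chain in which each step consumes the previous step's output. The sketch below assumes that strictly sequential structure. The step names (main_ts, pre_cluster, and so on) and the run_pipeline helper are hypothetical placeholders, not the module's actual entry points; it only illustrates the a-to-f ordering.

```python
from typing import Callable, Dict

# Hypothetical step names mirroring items a-f of the pipeline description.
PIPELINE_ORDER = ["main_ts", "pre_cluster", "cluster", "distribution", "norm", "tfrecords"]

# A step takes the current data and returns the transformed data.
Stage = Callable[[dict], dict]


def run_pipeline(stages: Dict[str, Stage], data: dict) -> dict:
    """Apply each step in PIPELINE_ORDER, passing each step's output to the next."""
    missing = [name for name in PIPELINE_ORDER if name not in stages]
    if missing:
        raise KeyError("missing pipeline steps: {}".format(missing))
    for name in PIPELINE_ORDER:
        data = stages[name](data)
    return data


if __name__ == "__main__":
    # Toy steps that only record that they ran, to show the execution order.
    def make_stage(name: str) -> Stage:
        def stage(data: dict) -> dict:
            return {**data, "trace": data.get("trace", []) + [name]}
        return stage

    result = run_pipeline({n: make_stage(n) for n in PIPELINE_ORDER}, {"table": "factdata"})
    print(result["trace"])
```

Requiring every step up front means a misconfigured run fails before any data is touched, rather than partway through the chain.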

Prerequisites

Cluster: Spark 2.3, HDFS 2.7, YARN 2.3, MapReduce 2.7, Hive 1.2
Driver: Python 3.6, Spark Client 2.3, HDFS Client, tensorflow-gpu 1.10

To install dependencies, run: pip install -r requirements.txt
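Before running anything, it can help to confirm that the driver machine matches the versions listed above. The following is a minimal sketch, not part of the project. It assumes that tensorflow-gpu is imported as `tensorflow`, that the Spark Client's Python bindings are importable as `pyspark`, and that only the major.minor part of each version needs to match.

```python
import re
import sys
from importlib import import_module

# Driver-side requirements from Prerequisites, compared on major.minor only.
# Assumptions: tensorflow-gpu is imported as `tensorflow`, and the Spark
# Client's Python bindings are importable as `pyspark`.
REQUIRED_PYTHON = (3, 6)
REQUIRED = {"pyspark": "2.3", "tensorflow": "1.10"}


def major_minor(version):
    """Return (major, minor) as integers, e.g. '2.3.1' -> (2, 3), or None."""
    match = re.match(r"(\d+)\.(\d+)", version)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_driver(required=REQUIRED):
    """Map 'python' and each module name to True when its major.minor matches."""
    report = {"python": sys.version_info[:2] == REQUIRED_PYTHON}
    for module_name, wanted in required.items():
        try:
            found = getattr(import_module(module_name), "__version__", None)
        except ImportError:
            found = None
        report[module_name] = found is not None and major_minor(found) == major_minor(wanted)
    return report


if __name__ == "__main__":
    for name, ok in check_driver().items():
        print("{:<12} {}".format(name, "ok" if ok else "missing or wrong version"))
```

The check uses exact major.minor matching rather than a minimum version because TensorFlow 1.x and 2.x are not API-compatible.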

Install and Run

Download the blue-martin/models project
Transfer the predictor_dl_model directory to ~/code/predictor_dl_model/ on a GPU machine that also has the Spark client installed.
cd ~/code/predictor_dl_model
Run pip install -r requirements.txt to install the required packages. These packages are installed on top of Python using pip.
python setup.py install (to install the predictor_dl_model package)
(optional) python setup.py bdist_egg (to create an .egg file to provide to spark-submit)
Follow the steps in ~/code/predictor_dl_model/datagen/README.md to generate data.
Go to the directory ~/code/predictor_dl_model/predictor_dl_model
Run run.sh, or run each script individually.
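When a script is run individually, the optional .egg from bdist_egg can be shipped to the Spark executors with spark-submit's --py-files option. The sketch below builds such a command. The script path "pipeline/main_ts.py", its "config.yml" argument, and the default master "yarn" are assumptions for illustration. The helper names find_egg and spark_submit_command are hypothetical.

```python
import glob
import os
import shlex


def find_egg(dist_dir="dist"):
    """Return the newest .egg that `python setup.py bdist_egg` left in dist_dir, or None."""
    eggs = glob.glob(os.path.join(dist_dir, "*.egg"))
    return max(eggs, key=os.path.getmtime) if eggs else None


def spark_submit_command(script, egg_path, master="yarn", script_args=()):
    """Build a spark-submit argv that ships the package egg with the job."""
    # --py-files puts the listed .py/.zip/.egg files on the job's Python path.
    return ["spark-submit", "--master", master, "--py-files", egg_path, script] + list(script_args)


if __name__ == "__main__":
    egg = find_egg()
    if egg is None:
        print("No .egg found in dist/; run `python setup.py bdist_egg` first.")
    else:
        # Hypothetical script path and argument; substitute the script you want to run.
        cmd = spark_submit_command("pipeline/main_ts.py", egg, script_args=["config.yml"])
        print(" ".join(shlex.quote(part) for part in cmd))
```

The sketch prints the command so it can be reviewed first. To execute it instead, pass the same list to subprocess.run(cmd, check=True).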

Documentation

Documentation is provided through comments in config.yml and in the README files.

Note

To list the tag sets, signatures, and input/output tensors of an exported model, run: saved_model_cli show --dir <model_dir>/ --all