What is predictor_dl_model?
predictor_dl_model is a suite of offline processes that forecast traffic inventory. The suite contains the following modules. Each module's directory contains more information.
- datagen: This module generates the factdata table, which contains traffic data.
- trainer: This module builds and trains a deep learning model based on the factdata table.
- pipeline: This module processes the factdata table into training-ready data, which is used to train the neural network. It consists of the following steps:
  a. Main-ts only modifies the structure of the raw data; it does not remove any data.
  b. Pre-cluster denoises (new)/removes individual uckeys and prepares them for clustering.
  c. Cluster creates clusters and denoises/removes clusters.
  d. Distribution records the relationship between virtual-uckey and uckey.
  e. Norm normalizes attributes.
  f. Tfrecords saves the data in tfrecords format.
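The order of the pipeline steps can be illustrated with a minimal sketch. The stage names follow the list above, but the stage functions here are hypothetical stand-ins that only record that they ran; they are not the project's actual scripts or entry points.

```python
from typing import Callable, Dict, List

# Stage names in the order listed above. The callables bound to them below
# are hypothetical placeholders, not the project's real entry points.
PIPELINE_STAGES: List[str] = [
    "main_ts",
    "pre_cluster",
    "cluster",
    "distribution",
    "norm",
    "tfrecords",
]


def run_pipeline(stages: Dict[str, Callable[[dict], dict]], data: dict) -> dict:
    """Apply each stage to the output of the previous one, in listed order."""
    for name in PIPELINE_STAGES:
        if name not in stages:
            raise KeyError(f"missing stage: {name}")
        data = stages[name](data)
    return data


def make_tracing_stage(name: str) -> Callable[[dict], dict]:
    """Build a stand-in stage that only appends its name to a history list."""
    def stage(data: dict) -> dict:
        return {**data, "history": data.get("history", []) + [name]}
    return stage


stages = {name: make_tracing_stage(name) for name in PIPELINE_STAGES}
result = run_pipeline(stages, {"table": "factdata"})
print(result["history"])
```

Each stage consumes the output of the previous one, which is why the steps are meant to run in the listed order.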
Prerequisites
Cluster: Spark 2.3/HDFS 2.7/YARN 2.3/MapReduce 2.7/Hive 1.2
Driver: Python 3.6, Spark Client 2.3, HDFS Client, tensorflow-gpu 1.10
To install dependencies run: pip install -r requirements.txt
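Before running anything, it may help to check the driver environment against the prerequisites above. The following is a minimal sketch, not part of the project: it assumes that tensorflow-gpu is importable as `tensorflow` and that the Spark client exposes a `pyspark` module, and it does not check the cluster-side versions.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Driver Python version listed in the prerequisites.
EXPECTED_PYTHON: Tuple[int, int] = (3, 6)
# Import names are assumptions: tensorflow-gpu installs as "tensorflow",
# and the Spark client is assumed to provide "pyspark".
REQUIRED_MODULES = ("tensorflow", "pyspark")


def python_matches(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the major.minor version equals the expected one."""
    return tuple(version) == EXPECTED_PYTHON


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [n for n in names if importlib.util.find_spec(n) is None]


print("Python version matches:", python_matches())
print("Missing modules:", missing_modules())
```

The check only locates the modules without importing them, so it is quick to run even on a machine where TensorFlow is installed.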
Install and Run
- Download the blue-martin/models project
- Transfer the predictor_dl_model directory to ~/code/predictor_dl_model/ on a GPU machine which also has Spark Client.
- cd predictor_dl_model
- pip install -r requirements.txt to install the required packages. These packages are installed on top of Python using pip.
- python setup.py install (to install the predictor_dl_model package)
- (optional) python setup.py bdist_egg (to create an .egg file to provide to spark-submit)
- Follow the steps in ~/code/predictor_dl_model/datagen/README.md to generate data
- Go to directory ~/code/predictor_dl_model/predictor_dl_model
- Run run.sh, or run each script individually
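For the optional .egg step, one way to pass the file to spark-submit is through its `--py-files` option. The sketch below is an illustration under assumptions: it assumes the egg was written to a `dist` directory (the setuptools default for bdist_egg), and the script name used in the example is hypothetical.

```python
import glob
import os
from typing import List, Optional


def find_egg(dist_dir: str = "dist") -> Optional[str]:
    """Return the most recently modified .egg file in dist_dir, or None."""
    eggs = glob.glob(os.path.join(dist_dir, "*.egg"))
    return max(eggs, key=os.path.getmtime) if eggs else None


def build_spark_submit(script: str, egg_path: str) -> List[str]:
    """Build a spark-submit command that ships the egg with the job."""
    return ["spark-submit", "--py-files", egg_path, script]


egg = find_egg()
if egg is None:
    print("No .egg found in dist/; build one with bdist_egg first.")
else:
    # "main_ts.py" is a hypothetical script name used for illustration only.
    print(" ".join(build_spark_submit("main_ts.py", egg)))
```

The command is only printed, not executed, so it can be reviewed before it is run on the cluster.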
Documentation
Documentation is provided through comments in config.yml and in the README files.
Note
To inspect the signatures of a saved model, run: saved_model_cli show --dir <model_dir>/ --all
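The saved_model_cli command above can also be driven from Python, for example inside a post-training check. This is a minimal sketch rather than part of the project; the helper names are hypothetical, and it returns None when saved_model_cli is not on the PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_show_command(model_dir: str) -> List[str]:
    """Build the saved_model_cli invocation for the given model directory."""
    return ["saved_model_cli", "show", "--dir", model_dir, "--all"]


def show_saved_model(model_dir: str) -> Optional[str]:
    """Run saved_model_cli and return its output, or None if it is missing."""
    if shutil.which("saved_model_cli") is None:
        return None
    completed = subprocess.run(
        build_show_command(model_dir),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (Python 3.6 compatible)
    )
    return completed.stdout
```

Calling `show_saved_model("<model_dir>")` with the real export directory returns the same text the command prints in a terminal.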