Library for simulating customer purchasing behavior at a fictional chain of pet stores for the purpose of generating synthetic transaction data.
The data generator is part of a Gradle multiproject build. Please see the README in the parent directory for build and test instructions.
The data generator can be used as a library (for incorporation into Hadoop or Spark applications) or through a command-line interface. The CLI requires several parameters. To print their descriptions, run the jar without arguments:
$ java -jar build/libs/bigpetstore-data-generator-1.1.0-SNAPSHOT.jar
Here is an example for generating 10 stores, 1000 customers, 100 purchasing models, and a year of transactions:
$ java -jar build/libs/bigpetstore-data-generator-1.1.0-SNAPSHOT.jar generatedData/ 10 1000 100 365.0
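The positional arguments in the example above are easy to mix up, so a small wrapper script that names them may help. This is only a sketch: the variable names are illustrative and not part of the CLI, the argument order is taken from the example command, and the last value is assumed to be a number of days because "a year" is passed as 365.0.

```shell
#!/bin/sh
# Illustrative wrapper; variable names are hypothetical, order follows the example command.
JAR=build/libs/bigpetstore-data-generator-1.1.0-SNAPSHOT.jar
OUTPUT_DIR=generatedData/     # directory passed as the first argument
N_STORES=10                   # number of stores
N_CUSTOMERS=1000              # number of customers
N_PURCHASING_MODELS=100       # number of purchasing models
SIMULATION_DAYS=365.0         # assumed to be days ("a year" is passed as 365.0)

CMD="java -jar $JAR $OUTPUT_DIR $N_STORES $N_CUSTOMERS $N_PURCHASING_MODELS $SIMULATION_DAYS"
echo "$CMD"

# Run the generator only if the jar has already been built.
if [ -f "$JAR" ]; then
    $CMD
fi
```

When the jar is missing, the script only prints the command it would have run, which makes it easy to check the arguments before building.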
Several Groovy example script drivers are included in the groovy_example_drivers directory. Groovy scripts can be used to easily call and interact with classes in the data generator jar without having to create separate Java projects or worry about compilation. I've found them to be very useful for interactive exploration and validating my implementations when unit tests alone aren't sufficient.
To use the Groovy scripts, you will need to have Groovy installed on your system. Build the data generator as described in the README in the parent directory. Then run the scripts from within the groovy_example_drivers directory as follows:
$ groovy -classpath ../build/libs/bigpetstore-data-generator-1.1.0-SNAPSHOT.jar MonteCarloExponentialSamplingExample.groovy
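To try every example driver in one pass, a loop along the following lines could be used. It is a sketch that assumes it is run from inside the groovy_example_drivers directory, that each script there can be run on its own, and that the jar sits at the path used in the command above.

```shell
#!/bin/sh
# Run from inside the groovy_example_drivers directory.
JAR=../build/libs/bigpetstore-data-generator-1.1.0-SNAPSHOT.jar

if ! command -v groovy >/dev/null 2>&1; then
    echo "groovy not found on PATH" >&2
elif [ ! -f "$JAR" ]; then
    echo "jar not found at $JAR; build the project first" >&2
else
    for script in *.groovy; do
        [ -f "$script" ] || continue   # skip if the glob matched nothing
        echo "== $script =="
        groovy -classpath "$JAR" "$script"
    done
fi
```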