This example provides an end-to-end pipeline for a common DataHack competition task: Urban Sounds Classification.
The dataset can be downloaded after logging in. Details of the dataset and the download link are given below:
The dataset contains 8732 wav files, each an audio sample (<= 4 s) of a street sound such as engine_idling, car_horn, children_playing, dog_bark, and so on. The task is to classify these audio samples into one of the following 10 labels:
siren, street_music, drilling, dog_bark, children_playing, gun_shot, engine_idling, air_conditioner, jackhammer, car_horn
To be able to run this example:
pip install -r requirements.txt
If you are in the directory where the requirements.txt file lies, this step installs the libraries required to run the example. The main dependency is Librosa; the version used to test the example is 0.6.2. For more details, refer to: https://librosa.github.io/librosa/install.html
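Since the example was tested against Librosa 0.6.2, it can be useful to verify the installed version meets that minimum. Below is a minimal sketch; the helper name `version_at_least` is hypothetical, not part of the example.

```python
def version_at_least(version_str, minimum=(0, 6, 2)):
    # Hypothetical helper: compare a dotted version string (e.g. librosa.__version__)
    # against a minimum version tuple; only the first three components are checked.
    parts = tuple(int(p) for p in version_str.split(".")[:3])
    return parts >= minimum

# Usage sketch (assumes librosa is installed):
# import librosa
# assert version_at_least(librosa.__version__), "librosa >= 0.6.2 required"
```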
Download the dataset(train.zip, test.zip) required for this example from the location: https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU
Extract both zip archives into the current directory. After unzipping, you will get two new folders, Train and Test, and two csv files, train.csv and test.csv.
Assuming you are in a directory “UrbanSounds”, after downloading and extracting train.zip, the folder structure should be:
UrbanSounds
  - Train
    - 0.wav, 1.wav ...
  - train.csv
  - train.py
  - predict.py ...
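In this layout, train.csv maps each wav file in Train to its label. A minimal sketch of building that mapping is shown below; the column names "ID" and "Class" are assumptions about the csv schema, and the inline sample string stands in for the real file.

```python
import csv
import io

# Sketch: read the label csv into a filename -> label mapping.
# Column names "ID" and "Class" are assumed; replace the StringIO
# sample with open("train.csv") for the real dataset.
sample = "ID,Class\n0,siren\n1,dog_bark\n"
labels = {row["ID"] + ".wav": row["Class"]
          for row in csv.DictReader(io.StringIO(sample))}
```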
Ensure Apache MXNet is installed on the machine. For installation instructions, see: https://mxnet.apache.org/install/
For information on the current design of the AudioFolderDataset implementation, refer to: https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio
For training:
python train.py
or
python train.py --train ./Train --csv train.csv --batch_size 32 --epochs 30
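The flags shown above suggest a command-line interface along these lines. This is only a sketch of how such a parser might look; the defaults mirror the command above but the actual defaults in train.py may differ.

```python
import argparse

def build_parser():
    # Sketch of a CLI matching the flags in the command above.
    # Defaults here are assumptions taken from the example invocation.
    parser = argparse.ArgumentParser(description="Urban Sounds training")
    parser.add_argument("--train", type=str, default="./Train",
                        help="folder containing the training wav files")
    parser.add_argument("--csv", type=str, default="train.csv",
                        help="csv mapping wav file names to labels")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="number of samples per training batch")
    parser.add_argument("--epochs", type=int, default=30,
                        help="number of passes over the training set")
    return parser

# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["--train", "./Train", "--csv", "train.csv",
     "--batch_size", "32", "--epochs", "30"])
```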
For prediction:
python predict.py
or
python predict.py --pred ./Test
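A prediction script for this task ultimately maps a predicted class index back to one of the 10 label names. The sketch below assumes the labels are indexed in alphabetical order (as a synset built from sorted label names would give); the actual ordering in predict.py may differ.

```python
# The 10 labels from the dataset, sorted alphabetically.
# The alphabetical index ordering is an assumption, not confirmed by the example.
LABELS = sorted([
    "siren", "street_music", "drilling", "dog_bark", "children_playing",
    "gun_shot", "engine_idling", "air_conditioner", "jackhammer", "car_horn",
])

def index_to_label(idx):
    # Map a model's argmax output index to its label name.
    return LABELS[idx]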