Examples with rusty-machine

This directory gathers fully fledged programs, each exercising a piece of rusty-machine's API.

Overview

Each example is a standalone program, run with cargo run --example <name>. Together they cover K-Means clustering, Support Vector Machines, neural networks, and Naïve Bayes classification.

The Examples

K Means

Generating Clusters

Generating Clusters randomly generates data around two cluster centroids. It then trains a K-Means model on this sample to learn the centroids back.

The example shows basic usage of the K-Means API, an unsupervised model. It also shows basic usage of rulinalg to generate the data.
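In essence, the K-Means part of the flow looks like the sketch below. This is a minimal sketch, assuming the 0.5-era API where train and predict return a LearningResult; the hard-coded points stand in for the example's randomly generated samples.

use rusty_machine::learning::UnSupModel;
use rusty_machine::learning::k_means::KMeansClassifier;
use rusty_machine::linalg::Matrix;

fn main() {
    // Stand-in for the generated samples: four 2-D points in two obvious clusters.
    let inputs = Matrix::new(4, 2, vec![-0.5, -0.5,
                                        -0.6, -0.4,
                                         0.4,  0.5,
                                         0.5,  0.6]);

    // Ask K-Means for two clusters.
    let mut model = KMeansClassifier::new(2);

    // Learn the centroids, then assign every sample to its closest centroid.
    model.train(&inputs).unwrap();
    let classes = model.predict(&inputs).unwrap();
    println!("{:?}", classes);
}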

Sample run:

cargo run --example k-means_generating_cluster
   Compiling rusty-machine v0.4.0 (file:///rusty-machine/rusty-machine)
     Running `target/debug/examples/k-means_generating_cluster`
K-Means clustering example:
Generating 2000 samples from each centroids:
⎡-0.5 -0.5⎤
⎣   0  0.5⎦
Training the model...
Model Centroids:
⎡-0.812 -0.888⎤
⎣-0.525  0.877⎦
Classifying the samples...
Samples closest to first centroid: 1878
Samples closest to second centroid: 2122

SVM

Sign Learner

Sign learner constructs and evaluates a model that learns to recognize the sign of an input number.

The sample shows basic usage of the SVM API. It also configures the SVM with a specific kernel (HyperTan). Evaluations run in a loop, logging each prediction and keeping counts for the performance report at the end. The salient rusty-machine calls are the SVM model's train and predict methods.

The accuracy evaluation is simplistic, and the task is so easy that the model reaches 100% accuracy.
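Stripped of the logging and bookkeeping, the model setup looks roughly like this. This is a minimal sketch, assuming the 0.5-era API; the kernel parameters and training numbers are illustrative, not necessarily the example's.

use rusty_machine::learning::SupModel;
use rusty_machine::learning::svm::SVM;
use rusty_machine::learning::toolkit::kernel::HyperTan;
use rusty_machine::linalg::{Matrix, Vector};

fn main() {
    // A few training numbers and their signs; targets are -1.0 or 1.0.
    let inputs = Matrix::new(4, 1, vec![-200., -50., 50., 200.]);
    let targets = Vector::new(vec![-1., -1., 1., 1.]);

    // An SVM with a hyperbolic-tangent kernel (illustrative parameters).
    let mut model = SVM::new(HyperTan::new(100., 0.), 0.3);
    model.train(&inputs, &targets).unwrap();

    // Predict the sign of an unseen number.
    let test = Matrix::new(1, 1, vec![42.]);
    let outputs = model.predict(&test).unwrap();
    println!("42 -> {}", outputs[0]);
}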

Sample run:

cargo run --example svm-sign_learner
   Compiling rusty-machine v0.3.0 (file:///rusty-machine/rusty-machine)
     Running `target/debug/examples/svm-sign_learner`
Sign learner sample:
Training...
Evaluation...
-1000 -> -1: true
-900 -> -1: true
-800 -> -1: true
-700 -> -1: true
-600 -> -1: true
-500 -> -1: true
-400 -> -1: true
-300 -> -1: true
-200 -> -1: true
-100 -> -1: true
0 -> -1: true
100 -> 1: true
200 -> 1: true
300 -> 1: true
400 -> 1: true
500 -> 1: true
600 -> 1: true
700 -> 1: true
800 -> 1: true
900 -> 1: true
Performance report:
Hits: 20, Misses: 0
Accuracy: 100

Neural Networks

AND Gate

AND gate makes an AND gate out of a perceptron.

The sample code generates random data to learn from. Each input is like an electric signal between 0 and 1, with some jitter that makes it not quite 0 or 1. By default, the code labels any input pair “above” (0.7, 0.7) as 1.0 (AND gate passing) and everything else as 0.0 (AND gate blocking). Note that this skews the training set toward the blocking scenario: an AND gate passes only 25% of the time on average, yet we'd like the network to learn the passing case as well.

The test data uses only the 4 “perfect” inputs for a gate: (0.0, 0.0), (1.0, 0.0), etc.

The code generates 10,000 training data points by default. Give it a try, then change SAMPLE, the number of training data points, and THRESHOLD, the cutoff for “deciding” that a gate passes.
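Under the hood, the perceptron is set up roughly as below. This is a minimal sketch using NeuralNet::default rather than whatever criterion and optimizer the example configures, and the four ideal gate inputs stand in for the generated training set.

use rusty_machine::learning::SupModel;
use rusty_machine::learning::nnet::NeuralNet;
use rusty_machine::linalg::Matrix;

fn main() {
    // Stand-in data: the four ideal gate inputs and their AND labels.
    let inputs = Matrix::new(4, 2, vec![0., 0.,
                                        0., 1.,
                                        1., 0.,
                                        1., 1.]);
    let targets = Matrix::new(4, 1, vec![0., 0., 0., 1.]);

    // A single-layer perceptron: 2 inputs feeding 1 output neuron.
    let layers = &[2, 1];
    let mut model = NeuralNet::default(layers);

    model.train(&inputs, &targets).unwrap();

    // Evaluate on one of the "perfect" inputs.
    let test = Matrix::new(1, 2, vec![1., 1.]);
    let out = model.predict(&test).unwrap();
    println!("(1, 1) -> {}", out[[0, 0]]);
}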

Sample run:

cargo run --example nnet-and_gate
   Compiling rusty-machine v0.3.0 (file:///rusty-machine/rusty-machine)
     Running `target/debug/examples/nnet-and_gate`
AND gate learner sample:
Generating 10000 training data and labels...
Training...
Evaluation...
Got  Expected
0.00  0
0.00  0
0.96  1
0.01  0
Hits: 4, Misses: 0
Accuracy: 100%

Naïve Bayes

Dog Classification

Suppose we have a population composed of red dogs and white dogs, whose friendliness, furriness, and speed can be measured. In this example we train a Naïve Bayes model to determine whether a dog is white or red.

White dogs are, as a group, friendlier, furrier, and slower than red dogs. Given the color of a dog, its friendliness, furriness, and speed are independent of one another (the conditional-independence assumption that the Naïve Bayes model requires).
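Concretely, that assumption lets the model score each color as P(color) × P(friendliness | color) × P(furriness | color) × P(speed | color), and predict whichever color scores higher.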

In the example code we generate our own data and then train our model on it, a common technique for validating a model. We generate the data by sampling each of the dogs' features from Gaussian random variables: six Gaussians in total, one per feature for each of the two colors. Since the features are Gaussian, we use a Gaussian Naïve Bayes model. Once the data is generated, we convert it into Matrix structures and train the model.
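The training step then looks roughly like this. This is a minimal sketch, assuming the 0.5-era API; the hand-written rows stand in for the example's Gaussian-sampled data, and the one-hot column order (White, then Red) is an assumption made for illustration.

use rusty_machine::learning::SupModel;
use rusty_machine::learning::naive_bayes::{NaiveBayes, Gaussian};
use rusty_machine::linalg::Matrix;

fn main() {
    // Stand-in data: rows are dogs, columns are (friendliness, furriness, speed).
    let inputs = Matrix::new(4, 3, vec![0.9, 0.8, 0.2,   // white-ish dog
                                        0.8, 0.9, 0.3,   // white-ish dog
                                        0.3, 0.2, 0.9,   // red-ish dog
                                        0.2, 0.3, 0.8]); // red-ish dog

    // One-hot class labels; column 0 = White, column 1 = Red (assumed order).
    let targets = Matrix::new(4, 2, vec![1., 0.,
                                         1., 0.,
                                         0., 1.,
                                         0., 1.]);

    // Gaussian Naïve Bayes fits a per-class Gaussian to each feature.
    let mut model = NaiveBayes::<Gaussian>::new();
    model.train(&inputs, &targets).unwrap();

    // Predict the color of a new dog.
    let test = Matrix::new(1, 3, vec![0.85, 0.9, 0.25]);
    let pred = model.predict(&test).unwrap();
    println!("{:?}", pred);
}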

Sample run:

cargo run --example naive_bayes_dogs
...
Predicted: Red; Actual: Red; Accurate? true
Predicted: Red; Actual: Red; Accurate? true
Predicted: White; Actual: Red; Accurate? false
Predicted: Red; Actual: White; Accurate? false
Predicted: Red; Actual: Red; Accurate? true
Predicted: White; Actual: White; Accurate? true
Predicted: White; Actual: White; Accurate? true
Predicted: White; Actual: White; Accurate? true
Predicted: White; Actual: White; Accurate? true
Predicted: Red; Actual: Red; Accurate? true
Accuracy: 822/1000 = 82.2%