Ten-pass cross-validation scripts.
tenpass/split-log-into-buckets: Split a mass-check logfile into n
equally sized buckets, taking messages evenly from all checked corpora and
preserving comments. It does this by cycling through the buckets in order
as each line is read. Output files are named 'out-N.log'.
usage: tenpass/split-log-into-buckets 10 < mass-check.log
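The round-robin split described above can be sketched in Python roughly as
follows. This is an illustration only, not the script itself: the function
names are hypothetical, and two details are assumptions (comment lines
starting with '#' are copied into every bucket, and output files are
numbered from 1).

```python
def split_into_buckets(lines, n):
    """Distribute log lines over n buckets in round-robin order."""
    buckets = [[] for _ in range(n)]
    i = 0
    for line in lines:
        if line.startswith("#"):
            # Assumption: comments are preserved by copying them
            # into every bucket.
            for bucket in buckets:
                bucket.append(line)
            continue
        # Cycle through the buckets so each receives every n-th message.
        buckets[i % n].append(line)
        i += 1
    return buckets


def write_buckets(buckets):
    """Write each bucket to 'out-N.log' (numbering from 1 is an assumption)."""
    for k, bucket in enumerate(buckets, start=1):
        with open(f"out-{k}.log", "w") as fh:
            fh.writelines(bucket)
```

Because lines are dealt out in input order, a log that lists one corpus
after another ends up with each corpus spread evenly across the buckets.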
10pass-run: the workhorse. Generate a corpus, then run this from the
'masses' directory and leave it overnight. Before running, change
NSBASE and SPBASE at the top of the script to point to the path and
basename of the split logfiles.
usage: tenpass/10pass-run
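In general terms, n-pass cross-validation holds out each bucket in turn for
testing and uses the remaining buckets for training. A minimal Python sketch
of that fold enumeration follows. The function name is hypothetical, and
this README does not confirm that 10pass-run arranges its passes in exactly
this way.

```python
def folds(n):
    """Yield (test_bucket, training_buckets) for each of the n passes."""
    for test in range(1, n + 1):
        # Every bucket except the held-out one is used for training.
        train = [k for k in range(1, n + 1) if k != test]
        yield test, train


for test, train in folds(10):
    print(f"pass {test}: test on bucket {test}, train on {train}")
```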
10pass-compute-tcr: compute TCR (total cost ratio), SpamRecall and
SpamPrecision from the results data produced by 10pass-run.
usage: tenpass/10pass-compute-tcr
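For reference, the three figures are commonly defined as below. This Python
sketch uses the standard textbook definitions. The function name, its
parameters and the default weight 'lam' (the cost of one false positive
relative to one false negative) are assumptions, not values taken from the
script.

```python
def compute_metrics(n_spam, fp, fn, lam=1.0):
    """Return (tcr, spam_recall, spam_precision).

    n_spam: number of spam messages tested
    fp:     nonspam messages wrongly flagged as spam
    fn:     spam messages that were missed
    lam:    assumed cost weight of one false positive
    """
    tp = n_spam - fn  # spam messages correctly caught
    recall = tp / n_spam if n_spam else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    cost = lam * fp + fn
    # TCR compares the cost of no filter (every spam gets through)
    # with the weighted cost of the filter's errors.
    tcr = n_spam / cost if cost else float("inf")
    return tcr, recall, precision


tcr, recall, precision = compute_metrics(n_spam=1000, fp=2, fn=50, lam=50)
print(f"TCR={tcr:.2f} SpamRecall={recall:.2%} SpamPrecision={precision:.2%}")
```

A TCR above 1 means the filter costs less than using no filter at all, and
higher values are better.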