tag | 8f716d26fe075111da0bf1c1d684f2463842d333 | |
---|---|---|
tagger | mvivero091 <mvivero091@gmail.com> | Tue Jun 23 13:01:59 2015 -0700 |
object | 4882e381c10a36bfffb2988a7bccc2e1053c5d01 |
Changed the text vectorization implementation to speed up n-gram processing of text data. Algorithms included: Multinomial Naive Bayes, Multinomial Logistic Regression.
commit | 4882e381c10a36bfffb2988a7bccc2e1053c5d01 | |
---|---|---|
author | mvivero091 <mvivero091@gmail.com> | Tue Jun 23 13:01:09 2015 -0700 |
committer | mvivero091 <mvivero091@gmail.com> | Tue Jun 23 13:01:09 2015 -0700 |
tree | 24236370724bbc074f9640446124d442c3a5b698 | |
parent | c0f2a10364af93a85a3a1bfbf54929a9d359d77c |
Updated to use the MLlib hashing and TF-IDF implementations.
See the following tutorial for a Quick Start guide and implementation details.
Modified PreparedData to use the MLlib hashing and TF-IDF implementations.
Fixed the dot-product implementation in the predict methods so that it works with the batch predict method used for evaluation.
Included three data sets: e-mail spam, 20 Newsgroups, and the Rotten Tomatoes sentiment analysis set. Added a Multinomial Logistic Regression algorithm for text classification.
Fixed an import script bug occurring with Python 2.
Changed the data import Python script to pull data directly from the 20 Newsgroups page.
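A minimal Python sketch of the pipeline these changes describe, written in plain Python rather than the project's own MLlib-based code. It computes hashed 1..n-gram term frequencies and IDF weights using the smoothed formula log((m + 1) / (df + 1)) documented for MLlib's IDF. It then runs multinomial Naive Bayes prediction as a sparse dot product, looped over a batch of vectors. The CRC32 hash, the function names (hashing_tf, fit_idf, train_nb, sparse_dot, batch_predict and the rest) and the additive smoothing parameter alpha are hypothetical, chosen only for illustration.

```python
import math
import zlib
from collections import Counter

# Assumption: reuse MLlib HashingTF's default vector size of 2**20.
NUM_FEATURES = 2 ** 20


def ngrams(tokens, n):
    """Return the space-joined n-grams of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hashing_tf(text, n=1, num_features=NUM_FEATURES):
    """Count 1..n-grams into a sparse {index: count} vector via the hashing trick.

    CRC32 is a stand-in hash (assumption), not the hash MLlib uses.
    """
    tokens = text.lower().split()
    counts = Counter()
    for k in range(1, n + 1):
        for gram in ngrams(tokens, k):
            counts[zlib.crc32(gram.encode("utf-8")) % num_features] += 1
    return dict(counts)


def fit_idf(tf_vectors):
    """Smoothed IDF, log((m + 1) / (df + 1)), over m training documents."""
    m = len(tf_vectors)
    df = Counter()
    for vec in tf_vectors:
        df.update(vec.keys())  # each index counted once per document
    weights = {j: math.log((m + 1) / (d + 1)) for j, d in df.items()}
    # Indices never seen in training have df = 0, i.e. weight log(m + 1).
    return weights, math.log(m + 1)


def tfidf(tf_vec, idf):
    """Scale each term frequency by its IDF weight."""
    weights, unseen = idf
    return {j: c * weights.get(j, unseen) for j, c in tf_vec.items()}


def train_nb(vectors, labels, alpha=1.0, num_features=NUM_FEATURES):
    """Hypothetical multinomial Naive Bayes trainer with additive smoothing."""
    model = {}
    for c in sorted(set(labels)):
        totals = Counter()
        for vec, y in zip(vectors, labels):
            if y == c:
                totals.update(vec)  # sum feature weights per class
        denom = sum(totals.values()) + alpha * num_features
        log_prior = math.log(labels.count(c) / len(labels))
        theta = {j: math.log((v + alpha) / denom) for j, v in totals.items()}
        model[c] = (log_prior, theta, math.log(alpha / denom))
    return model


def sparse_dot(x, theta, unseen):
    """Dot product of sparse x with one class's log-likelihood vector."""
    return sum(v * theta.get(j, unseen) for j, v in x.items())


def predict(x, model):
    """argmax over classes of log P(c) + x . log P(feature | c)."""
    return max(model, key=lambda c: model[c][0] + sparse_dot(x, model[c][1], model[c][2]))


def batch_predict(xs, model):
    """Apply the same per-vector scoring to every vector in a batch."""
    return [predict(x, model) for x in xs]
```

Because the score for each class is a prior plus one dot product, the batch path is just the single-vector path applied repeatedly. Multinomial logistic regression predicts the same way, taking the argmax over per-class scores w_c · x + b_c, so the same batch loop serves both algorithms.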