Apache PredictionIO Template Text Classifier Incubator


Text Classification Engine

Look at the following tutorial for a Quick Start guide and implementation details.

Release Information

Version 4.0

Restructured and redesigned the preparator and algorithm, reducing memory usage and run time. Temporarily moved the BIDMach, VW, and SPPMI algorithm changes to the bidmach branch.

Version 3.1

Fixed DataSource to read “content” and “e-mail” and to use the label “spam” for the tutorial data. Fixed the default algorithm setting in engine.json.

Version 2.2

Modified PreparedData to use the MLlib hashing and TF-IDF implementations.
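The hashing trick and TF-IDF weighting can be sketched in plain Python to show what PreparedData computes. This is not MLlib's code and not this template's implementation: it is a minimal illustration assuming already-tokenized input, a hypothetical 1024-bucket feature space, CRC32 as a deterministic stand-in for MLlib's own hash function, and the smoothed IDF formula idf = ln((m + 1) / (df + 1)) described in the MLlib documentation.

```python
import math
import zlib
from collections import Counter

# Hypothetical feature-space size (MLlib's HashingTF calls this numFeatures).
NUM_FEATURES = 1 << 10


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Hash each token to a bucket in [0, num_features) and count occurrences.

    Returns a sparse vector as a dict {bucket index: term frequency}.
    CRC32 is only a deterministic stand-in for MLlib's hash function.
    """
    counts = Counter()
    for tok in tokens:
        counts[zlib.crc32(tok.encode("utf-8")) % num_features] += 1
    return dict(counts)


def fit_idf(tf_vectors):
    """Return (m, df): the number of documents and each bucket's document frequency."""
    df = Counter()
    for vec in tf_vectors:
        df.update(vec.keys())  # each document counts at most once per bucket
    return len(tf_vectors), df


def tf_idf(tf_vec, m, df):
    """Scale each term frequency by idf(j) = ln((m + 1) / (df(j) + 1))."""
    return {j: tf * math.log((m + 1) / (df.get(j, 0) + 1)) for j, tf in tf_vec.items()}


# Usage: fit on a tiny made-up corpus, then vectorize a new document.
corpus = [["free", "money", "now"], ["meeting", "at", "noon"], ["free", "meeting"]]
m, df = fit_idf([hashing_tf(doc) for doc in corpus])
print(tf_idf(hashing_tf(["free", "free", "offer"]), m, df))
```

Hashing avoids storing a vocabulary, at the cost of occasional collisions when two terms land in the same bucket; a term that appears in every training document gets an IDF weight of zero.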

Version 2.1

Fixed the dot-product implementation in the predict methods so that it works with the batch predict method used for evaluation.

Version 2.0

Included three data sets: e-mail spam, 20 Newsgroups, and the Rotten Tomatoes sentiment analysis set. Added a multinomial logistic regression algorithm for text classification.
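Prediction with a multinomial logistic regression model comes down to one dot product per class followed by a softmax. The sketch below assumes one common formulation (one weight vector and intercept per class) and uses a hypothetical two-class, three-feature model; the weights, labels, and function names are illustrative only and are not taken from this template or from MLlib.

```python
import math


def sparse_dot(x, w):
    """Dot product of a sparse vector x ({index: value}) with a dense weight list w."""
    return sum(v * w[j] for j, v in x.items())


def predict_proba(x, weights, intercepts):
    """Softmax over the per-class linear scores w_k . x + b_k."""
    scores = [sparse_dot(x, w) + b for w, b in zip(weights, intercepts)]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, weights, intercepts, labels):
    """Return the label whose class probability is highest."""
    probs = predict_proba(x, weights, intercepts)
    return labels[max(range(len(probs)), key=probs.__getitem__)]


# Hypothetical model: two classes, three hashed features.
labels = ["not spam", "spam"]
weights = [[0.5, -1.0, 0.0], [-0.5, 2.0, 0.1]]
intercepts = [0.0, -0.2]
x = {1: 1.0, 2: 3.0}  # sparse features: index 1 -> 1.0, index 2 -> 3.0
print(predict(x, weights, intercepts, labels))  # prints "spam"
```

Because the softmax is monotonic, the argmax of the raw scores already gives the predicted label; the probabilities are only needed when a confidence value is reported.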

Version 1.2

Fixed an import script bug occurring with Python 2.

Version 1.1

Changed the data import Python script to download the data directly from the 20 Newsgroups page.