Mahout Change Log
Release 0.8 - unreleased
MAHOUT-1272: Parallel SGD matrix factorizer for SVDRecommender (Peng Cheng via ssc)
MAHOUT-1271: classify-20newsgroups.sh fails during the seqdirectory step (smarthi)
MAHOUT-1269: Cleanup deprecated Lucene 3.x API calls in lucene2seq utility unit tests (smarthi)
MAHOUT-833: Make conversion to sequence files map-reduce (Josh Patterson, smarthi)
MAHOUT-1268: Wrong output directory for CVB (Mark Wicks via ssc)
MAHOUT-1264: Performance optimizations in RecommenderJob (ssc)
MAHOUT-1262: Cleanup LDA code (ssc)
MAHOUT-1255: Fix for weights in Multinomial sometimes overflowing in BallKMeans (dfilimon)
MAHOUT-1254: Final round of cleanup for StreamingKMeans (dfilimon)
MAHOUT-1263: Serialise/Deserialise Lambda value for OnlineLogisticRegression (Mike Davy via smarthi)
MAHOUT-1258: Another shot at findbugs and checkstyle (ssc)
MAHOUT-1253: Add experiment tools for StreamingKMeans, part 1 (dfilimon)
MAHOUT-884: Matrix Concatenate Utility (Lance Norskog via smarthi)
MAHOUT-1250: Deprecate unused algorithms (ssc)
MAHOUT-1251: Optimize MinHashMapper (ssc)
MAHOUT-1211: Disabled swallowing of IOExceptions in Closeables.close for writers (dfilimon)
MAHOUT-1164: Make ARFF integration generate meta-data in JSON format (Marty Kube via ssc)
MAHOUT-1163: Make random forest classifier meta-data file human readable (Marty Kube via ssc)
MAHOUT-1243: Dictionary file format in Lucene-Mahout integration is not in SequenceFileFormat (ssc)
MAHOUT-974: org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob uses integers as userId and itemId (ssc)
MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values) (Elena Smirnova via smarthi)
MAHOUT-1237: Total cluster cost isn't computed properly (dfilimon)
MAHOUT-1196: LogisticModelParameters uses csv.getTargetCategories() even if csv is not used. (Vineet Krishnan via ssc)
MAHOUT-1224: Add the option of running a StreamingKMeans pass in the Reducer before BallKMeans (dfilimon)
MAHOUT-993: Some vector dumper flags are expecting arguments. (Andrew Look via robinanil)
MAHOUT-1228: Cleanup .gitignore (Stevo Slavic via ssc)
MAHOUT-1047: CVB hangs after completion (Angel Martinez Gonzalez via smarthi)
MAHOUT-1235: ParallelALSFactorizationJob does not use VectorSumCombiner (ssc)
MAHOUT-1230: SparseMatrix.clone() is not a deep copy (Maysam Yabandeh via tdunning)
MAHOUT-1232: VectorHelper.topEntries() throws an NPE when the number of non-zero elements in the vector is < maxEntries (smarthi)
MAHOUT-1229: Conf directory content from Mahout distribution archives cannot be unpacked (Stevo Slavic via smarthi)
MAHOUT-1213: SSVD job doesn't clean its temp dir, and fails when seeing it again (smarthi)
MAHOUT-1223: Fixed point skipped in StreamingKMeans when iterating through centroids from a reducer (dfilimon)
MAHOUT-1222: Fix total weight in FastProjectionSearch (dfilimon)
MAHOUT-1219: Remove LSHSearcher from StreamingKMeansTest. It causes it to sometimes fail (dfilimon)
MAHOUT-1221: SparseMatrix.viewRow is sometimes readonly. (Maysam Yabandeh via smarthi)
MAHOUT-1219: Remove LSHSearcher from SearchQualityTest. It causes it to fail, but the failure is not very meaningful (dfilimon)
MAHOUT-1217: Nearest neighbor searchers sometimes fail to remove points: fix in FastProjectionSearch's searchFirst (dfilimon)
MAHOUT-1216: Add locality sensitive hashing and a LocalitySensitiveHash searcher (dfilimon)
MAHOUT-1181: Adding StreamingKMeans MapReduce classes (dfilimon)
MAHOUT-1212: Incorrect classify-20newsgroups.sh file description (Julian Ortega via smarthi)
MAHOUT-1209: DRY out maven-compiler-plugin configuration (Stevo Slavic via smarthi)
MAHOUT-1207: Fix typos in description in parent pom (Stevo Slavic via smarthi)
MAHOUT-1199: Improve javadoc comments of mahout-integration (Angel Martinez Gonzalez via smarthi)
MAHOUT-1162: Adding BallKMeans and StreamingKMeans clustering algorithms (dfilimon)
MAHOUT-1205: ParallelALSFactorizationJob should leverage the distributed cache (ssc)
MAHOUT-1156: Adding nearest neighbor Searchers (dfilimon)
MAHOUT-1202: Speed up Vector operations (dfilimon)
MAHOUT-1155: Make MatrixSlice a Vector (and fix Centroid cloning; MAHOUT-1202) (dfilimon)
MAHOUT-1189: CosineDistanceMeasure doesn't return 0 for two 0 vectors (dfilimon)
MAHOUT-1180: Multinomial<T> throws ConcurrentModificationException when iterating and setting probabilities (dfilimon)
MAHOUT-1192: Speed up Vector Operations (robinanil)
MAHOUT-1191: Cleanup Vector Benchmarks make it less variable (robinanil)
MAHOUT-1190: SequentialAccessSparseVector function assignment is very slow and other iterator woes (robinanil)
MAHOUT-1188: Inconsistent reference to Lucene versions in code and POM (smarthi)
MAHOUT-1161: Unable to run CJKAnalyzer for conversion of a sequence file to sparse vector due to instantiation exception (ssc)
MAHOUT-1187: Update Commons Lang to Commons Lang3 (smarthi)
MAHOUT-1184: Another take at pmd, findbugs and checkstyle (ssc)
MAHOUT-1182: Remove useless append (Dave Brosius via tdunning)
MAHOUT-1176: Introduce a changelog file to raise contributors attribution (ssc)
MAHOUT-1108: Allows cluster-reuters.sh example to be executed on a cluster (elmer.garduno via gsingers)
MAHOUT-961: Fix issue in decision forest tree visualizer to properly show stems of tree (Ikumasa Mukai via gsingers)
MAHOUT-944: Create SequenceFiles out of Lucene document storage (no term vectors required) (Frank Scholten, gsingers)
MAHOUT-958: Fix issue with globs in RepresentativePointsDriver (Adam Baron, Vikram Dixit K, ehgjr via gsingers)
MAHOUT-1084: Fixed issue with too many clusters in synthetic control example (liutengfei, gsingers)
MAHOUT-1103: Fixed issue with splitting clusters on Hadoop (Matt Molek, gsingers)
MAHOUT-1126: Filter out bad META-INF files in job packaging (Pat Ferrel, gsingers)
MAHOUT-1211: Change deprecated Closeables.closeQuietly calls (smarthi, gsingers, srowen, dlyubimov)