Mahout Change Log
Release 0.12.0 - unreleased
MAHOUT-1775: FileNotFoundException caused by aborting the process of downloading Wikipedia dataset (Bowei Zhang via smarthi)
MAHOUT-1771: Cluster dumper omits indices and 0 elements for dense vector or sparse containing 0s (srowen)
MAHOUT-1613: classifier.df.tools.Describe does not handle -D parameters (haohui mai via smarthi)
MAHOUT-1642: Iterator class within SimilarItems class always misses the first element (Oleg Zotov via smarthi)
MAHOUT-1675: Remove MLP from codebase (ZJaffe via smarthi)
Release 0.11.0 - 2015-08-07
MAHOUT-1744: Deprecate lucene2seq (apalumbo)
MAHOUT-1761: Upgraded to Apache parent pom v17 (sslavic)
MAHOUT-1745: Purge deprecated ConcatVectorsJob from codebase (apalumbo)
MAHOUT-1757: small fix in spca formula (smarthi)
MAHOUT-1756: Missing +=: and *=: operators on vectors (smarthi)
NOJIRA: Clean up CLI help for spark-rowsimilarity and fix a test that intermittently failed (pferrel)
MAHOUT-1685: Move Mahout shell to Spark 1.3+ (dlyubimov, apalumbo)
MAHOUT-1653: Spark 1.3 (pferrel, apalumbo)
MAHOUT-1754: Distance and squared distance matrices routines (dlyubimov)
MAHOUT-1753: First and second moment routines (dlyubimov)
MAHOUT-1746: mxA ^ 2, mxA ^ 0.5 to mean the same thing as mxA * mxA and mxA ::= sqrt _ (dlyubimov)
MAHOUT-1736: Implement allreduceBlock() on H2O (avati)
MAHOUT-1752: Implement CbindScalar operator on H2O (avati)
MAHOUT-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf (dlyubimov)
MAHOUT-1713: Performance and parallelization improvements for AB', A'B, A'A spark physical operators (dlyubimov)
MAHOUT-1714: Add MAHOUT_OPTS environment when running Spark shell (dlyubimov)
MAHOUT-1715: Closeable API for broadcast tensors (dlyubimov)
MAHOUT-1716: Scala logging style (dlyubimov)
MAHOUT-1717: allreduceBlock() operator api and Spark implementation (dlyubimov)
MAHOUT-1718: Support for conversion of any type-keyed DRM into ordinally-keyed DRM (dlyubimov)
MAHOUT-1719: Unary elementwise function operator and function fusions (dlyubimov)
MAHOUT-1720: Support 1 cbind X, X cbind 1 etc. for both Matrix and DRM (dlyubimov)
MAHOUT-1721: rowSumsMap() summary for non-int-keyed DRMs (dlyubimov)
MAHOUT-1722: DRM row sampling api (dlyubimov)
MAHOUT-1723: Optional structural "flavor" abstraction for in-core matrices (dlyubimov)
MAHOUT-1724: Optimizations of matrix-matrix in-core multiplication based on structural flavors (dlyubimov)
MAHOUT-1725: elementwise power operator ^ (dlyubimov)
MAHOUT-1726: R-like vector concatenation operator (dlyubimov)
MAHOUT-1727: Elementwise analogues of scala.math functions for tensor types (dlyubimov)
MAHOUT-1728: In-core functional assignments (dlyubimov)
MAHOUT-1729: Straighten out behavior of Matrix.iterator() and iterateNonEmpty() (dlyubimov)
MAHOUT-1730: New mutable transposition view for in-core matrices (dlyubimov)
MAHOUT-1731: Deprecate SparseColumnMatrix (dlyubimov)
MAHOUT-1732: Native support for kryo serialization of tensor types (dlyubimov)
Release 0.10.1 - 2015-05-31
MAHOUT-1704: Pare down dependency jar for h2o (apalumbo)
MAHOUT-1697: Fixed paths to which math-scala and spark modules docs get packaged under in bin distribution archive (sslavic)
MAHOUT-1696: QRDecomposition.solve(...) can return incorrect Matrix types (apalumbo)
MAHOUT-1690: CLONE - Some vector dumper flags are expecting arguments. (smarthi)
MAHOUT-1693: FunctionalMatrixView materializes row vectors in scala shell (apalumbo)
MAHOUT-1680: Renamed mahout-distribution to apache-mahout-distribution (sslavic)
Release 0.10.0 - 2015-04-11
MAHOUT-1630: Incorrect SparseColumnMatrix.numSlices() causes IndexException in toString() (Oleg Nitz, smarthi)
MAHOUT-1665: Update hadoop commands in example scripts (akm)
MAHOUT-1676: Deprecate MLP, ConcatenateVectorsJob and ConcatenateVectorsReducer in the codebase (apalumbo)
MAHOUT-1622: MultithreadedBatchItemSimilarities outputs incorrect number of similarities (Jesse Daniels, Anand Avati via smarthi)
MAHOUT-1605: Make VisualizerTest locale independent (Frank Rosner, Anand Avati via smarthi)
MAHOUT-1635: Getting an exception when I provide classification labels manually for Naive Bayes (apalumbo)
MAHOUT-1662: Potential Path bug in SequenceFileVaultIterator breaks DisplaySpectralKMeans (Shannon Quinn)
MAHOUT-1656: Change SNAPSHOT version from 1.0 to 0.10.0 (smarthi)
MAHOUT-1593: cluster-reuters.sh does not work complaining java.lang.IllegalStateException (smarthi via akm)
MAHOUT-1661: All Lanczos modules marked as @Deprecated and slated for removal in future releases (Shannon Quinn)
MAHOUT-1638: H2O bindings fail at drmParallelizeWithRowLabels(...) (Anand Avati via apalumbo)
MAHOUT-1667: Hadoop 1 and 2 profile in POM (sslavic)
MAHOUT-1564: Naive Bayes Classifier for New Text Documents (apalumbo)
MAHOUT-1524: Script to auto-generate and view the Mahout website on a local machine (Saleem Ansari via apalumbo)
MAHOUT-1589: Deprecate mahout.cmd due to lack of support
MAHOUT-1655: Refactors mr-legacy into mahout-hdfs and mahout-mr, Spark now depends on much reduced mahout-hdfs
MAHOUT-1522: Handle logging levels via log4j.xml (akm)
MAHOUT-1602: Euclidean Distance Similarity Math (Leonardo Fernandez Sanchez, smarthi)
MAHOUT-1619: HighDFWordsPruner overwrites cache files (Burke Webster, smarthi)
MAHOUT-1516: classify-20newsgroups.sh failed: /tmp/mahout-work-jpan/20news-all does not exists in hdfs. (Jian Pan via apalumbo)
MAHOUT-1559: Add documentation for and clean up the wikipedia classifier example (apalumbo)
MAHOUT-1598: extend seq2sparse to handle multiple text blocks of same document (Wolfgang Buchnere via akm)
MAHOUT-1659: Remove deprecated Lanczos solver from spectral clustering in mr-legacy (Shannon Quinn)
MAHOUT-1612: NullPointerException happens during JSON output format for clusterdumper (smarthi, Manoj Awasthi)
MAHOUT-1652: Java 7 update (smarthi)
MAHOUT-1639: Streaming kmeans doesn't properly validate estimatedNumMapClusters -km (smarthi)
MAHOUT-1493: Port Naive Bayes to Scala DSL (apalumbo)
MAHOUT-1611: Preconditions.checkArgument in org.apache.mahout.utils.ConcatenateVectorsJob (Haishou Ma via smarthi)
MAHOUT-1615: SparkEngine drmFromHDFS returning the same Key for all Key,Vec Pairs for Text-Keyed SequenceFiles (Anand Avati, dlyubimov, apalumbo)
MAHOUT-1610: Update tests to pass in Java 8 (srowen)
MAHOUT-1608: Add option in WikipediaToSequenceFile to remove category labels from documents (apalumbo)
MAHOUT-1604: Spark version of rowsimilarity driver and associated additions to SimilarityAnalysis.scala (pferrel)
MAHOUT-1500: H2O Integration (Anand Avati via apalumbo)
MAHOUT-1606: Add rowSums, rowMeans and diagonal extraction operations to distributed matrices (dlyubimov)
MAHOUT-1603: Tweaks for Spark 1.0.x (dlyubimov & pferrel)
MAHOUT-1596: implement rbind() operator (Anand Avati and dlyubimov)
MAHOUT-1597: A + 1.0 (element-wise scala operation) gives wrong result if rdd is missing rows, Spark side (dlyubimov)
MAHOUT-1595: MatrixVectorView - implement a proper iterateNonZero() (Anand Avati via dlyubimov)
MAHOUT-1590: Mahout unit test failures due to guava version conflict on hadoop 2 (Venkat Ranganathan via sslavic)
MAHOUT-1529(e): Move dense/sparse matrix test in mapBlock into spark (Anand Avati via dlyubimov)
MAHOUT-1583: cbind() operator for Scala DRMs (dlyubimov)
MAHOUT-1563: Eliminated warnings about multiple scala versions (sslavic)
MAHOUT-1541, MAHOUT-1568, MAHOUT-1569: Created text-delimited file I/O traits and classes on spark, a MahoutDriver for a CLI and an ItemSimilarityDriver using the CLI
MAHOUT-1573: More explicit parallelism adjustments in math-scala DRM apis; elements of automatic parallelism management (dlyubimov)
MAHOUT-1580: Optimize getNumNonZeroElements() (ssc)
MAHOUT-1464: Cooccurrence Analysis on Spark (pat)
MAHOUT-1578: Optimizations in matrix serialization (ssc)
MAHOUT-1572: blockify() to detect (naively) the data sparsity in the loaded data (dlyubimov)
MAHOUT-1571: Functional Views are not serialized as dense/sparse correctly (dlyubimov)
MAHOUT-1566: (Experimental) Regular ALS factorizer with conversion tests, optimizer enhancements and bug fixes (dlyubimov)
MAHOUT-1537: Minor fixes to spark-shell (Anand Avati via dlyubimov)
MAHOUT-1529: Finalize abstraction of distributed logical plans from backend operations (dlyubimov)
MAHOUT-1489: Interactive Scala & Spark Bindings Shell & Script processor (dlyubimov)
MAHOUT-1346: Spark Bindings (DRM) (dlyubimov)
MAHOUT-1555: Exception thrown when a test example has the label not present in training examples (Karol Grzegorczyk via smarthi)
MAHOUT-1446: Create an intro for matrix factorization (Jian Wang via ssc)
MAHOUT-1480: Clean up website on 20 newsgroups (Andrew Palumbo via ssc)
MAHOUT-1561: cluster-syntheticcontrol.sh not running locally with MAHOUT_LOCAL=true (Andrew Palumbo via ssc)
MAHOUT-1558: Clean up classify-wiki.sh and add in a binary classification problem (Andrew Palumbo via ssc)
MAHOUT-1560: Last batch is not filled correctly in MultithreadedBatchItemSimilarities (Jarosław Bojar)
MAHOUT-1554: Provide more comprehensive classification statistics (Karol Grzegorczyk via ssc)
MAHOUT-1548: Fix broken links in quickstart webpage (Andrew Palumbo via ssc)
MAHOUT-1542: Tutorial for playing with Mahout's Spark shell (ssc)
MAHOUT-1533: Remove Frequent Pattern Mining (ssc)
MAHOUT-1532: Add solve() function to the Scala DSL (ssc)
MAHOUT-1530: Custom prompt and welcome message for the Spark Shell (ssc)
MAHOUT-1527: Fix wikipedia classifier example (Andrew Palumbo via ssc)
MAHOUT-1526: Ant file in examples (ssc)
MAHOUT-1523: Remove @author tags in sparkbindings (ssc)
MAHOUT-1521: lucene2seq - Error trying to load data from stored field (when non-indexed) (Terry Blankers via frankscholten)
MAHOUT-1520: Fix links in Mahout website documentation (Saleem Ansari via smarthi)
MAHOUT-1519: Remove StandardThetaTrainer (Andrew Palumbo via ssc)
MAHOUT-1517: Remove casts to int in ALSWRFactorizer (ssc)
MAHOUT-1513: Deprecate Canopy Clustering (ssc)
MAHOUT-1511: Renaming core to mrlegacy (frankscholten)
MAHOUT-1510: Goodbye MapReduce (ssc)
MAHOUT-1509: Invalid URL in link from "quick start/basics" page (Nick Martin, smarthi)
MAHOUT-1508: Performance problems with sparse matrices (ssc)
MAHOUT-1505: structure of clusterdump's JSON output (akm)
MAHOUT-1504: Enable/fix thetaSummer job in TrainNaiveBayesJob (Andrew Palumbo, smarthi)
MAHOUT-1503: TestNaiveBayesDriver fails in sequential mode (Andrew Palumbo, smarthi)
MAHOUT-1502: Update Naive Bayes Webpage to Current Implementation (Andrew Palumbo via ssc)
MAHOUT-1501: ClusterOutputPostProcessorDriver has private default constructor (ssc)
MAHOUT-1498: DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed using oozie (Sergey via ssc)
MAHOUT-1497: mahout resplit not producing split files (ssc)
MAHOUT-1496: Create a website describing the distributed ALS recommender (Jian Wang via ssc)
MAHOUT-1491: Spectral KMeans Clustering doesn't clean its /tmp dir and fails when seeing it again (smarthi)
MAHOUT-1488: DisplaySpectralKMeans fails: examples/output/clusteredPoints/part-m-00000 does not exist (Saleem Ansari via smarthi)
MAHOUT-1483: Organize links in web site navigation bar (akm)
MAHOUT-1482: Rework quickstart website (Jian Wang via ssc)
MAHOUT-1476: Cleanup website on Hidden Markov Models (akm)
MAHOUT-1475: Cleanup website on Naive Bayes (smarthi)
MAHOUT-1472: Cleanup website on fuzzy kmeans (smarthi)
MAHOUT-1471: Cleanup website for Canopy clustering (smarthi)
MAHOUT-1468: Creating a new page for StreamingKMeans documentation on mahout website (Maxim Arap and Pavan Kumar via akm)
MAHOUT-1467: ClusterClassifier readPolicy leaks file handles (Avi Shinnar, smarthi)
MAHOUT-1466: Cluster visualization fails to execute (ssc)
MAHOUT-1465: Clean up README (akm)
MAHOUT-1463: Modify OnlineSummarizers to use the TDigest dependency from Maven Central (tdunning, smarthi)
MAHOUT-1460: Remove reference to Dirichlet in ClusterIterator (frankscholten)
MAHOUT-1459: Move Hadoop related code out of CanopyClusterer (frankscholten)
MAHOUT-1458: Remove KMeansConfigKeys and FuzzyKMeansConfigKeys (frankscholten)
MAHOUT-1457: Move EigenSeedGenerator into spectral kmeans package (frankscholten)
MAHOUT-1455: Forkcount config causes JVM crashes during build (frankscholten)
MAHOUT-1451: Cleaning up the examples for clustering on the website (Gaurav Misra via ssc)
MAHOUT-1450: Cleaning up clustering documentation on mahout website (Pavan Kumar)
MAHOUT-1449: Update the Known Issues in Random Forests Page (Manoj Awasthi via ssc)
MAHOUT-1448: In Random Forest, the training does not support multiple input files. The input dataset must be one single file. (Manoj Awasthi via ssc)
MAHOUT-1447: ImplicitFeedbackAlternatingLeastSquaresSolver tests and features (Adam Ilardi via ssc)
MAHOUT-1445: Create an intro for item based recommender (Nick Martin via ssc)
MAHOUT-1440: Add option to set the RNG seed for initial cluster generation in Kmeans/fKmeans (Andrew Palumbo via ssc)
MAHOUT-1438: "quickstart" tutorial for building a simple recommender (Maciej Mazur and Steve Cook via ssc)
MAHOUT-1434: Dead links on the web site (Kevin Moulart, smarthi)
MAHOUT-1433: Make SVDRecommender look at all unknown items of a user per default (ssc)
MAHOUT-1429: Parallelize YtransposeY in ImplicitFeedbackAlternatingLeastSquaresSolver (Adam Ilardi via ssc)
MAHOUT-1428: Recommending already consumed items (Dodi Hakim via ssc)
MAHOUT-1425: SGD classifier example with bank marketing dataset. (frankscholten)
MAHOUT-1420: Add solr-recommender to examples (Pat Ferrel via akm)
MAHOUT-1419: Random decision forest is excessively slow on numeric features (srowen)
MAHOUT-1417: Random decision forest implementation fails in Hadoop 2 (srowen)
MAHOUT-1416: Make access of DecisionForest.read(dataInput) less restricted (Manoj Awasthi via smarthi)
MAHOUT-1415: Clone method on sparse matrices fails if there is an empty row which has not been set explicitly (till.rohrmann via ssc)
MAHOUT-1413: Rework Algorithms page (ssc)
MAHOUT-1388: Add command line support and logging for MLP (Yexi Jiang via ssc)
MAHOUT-1385: Caching Encoders don't cache (Johannes Schulte, Manoj Awasthi via ssc)
MAHOUT-1356: Ensure unit tests fail fast when writing outside mvn target directory (isabel, smarthi, dweiss, frankscholten, akm)
MAHOUT-1329: Mahout for hadoop 2 (gcapan, Sergey Svinarchuk)
MAHOUT-1310: Mahout support for Windows (Sergey Svinarchuk via ssc)
MAHOUT-1278: Upgraded to apache parent pom version 16 (sslavic)
Release 0.9 - 2014-02-01
MAHOUT-1387: Create page for release notes (ssc)
MAHOUT-1411: Random test failures from TDigestTest (smarthi)
MAHOUT-1410: clusteredPoints do not contain a vector id (smarthi, Andrew Musselman)
MAHOUT-1409: MatrixVectorView has index check error (tdunning)
MAHOUT-1402: Zero clusters using streaming k-means option in cluster-reuters.sh (smarthi)
MAHOUT-1401: Resurrect Frequent Pattern mining (smarthi)
MAHOUT-1400: Remove references to deprecated and removed algorithms from examples scripts (ssc)
MAHOUT-1399: Fixed multiple slf4j bindings when running Mahout examples issue (sslavic)
MAHOUT-1398: FileDataModel should provide a constructor with a delimiterPattern (Roy Guo via ssc)
MAHOUT-1396: Accidental use of commons-math won't work with next Hadoop 2 release (srowen)
MAHOUT-1394: Undeprecate Lanczos (ssc)
MAHOUT-1393: Remove duplicated code from getTopTerms and getTopFeatures in AbstractClusterWriter (Diego Carrion via smarthi)
MAHOUT-1392: Streaming KMeans should write centroid output to a 'part-r-xxxx' file when executed in sequential mode (smarthi)
MAHOUT-1390: SVD hangs for certain inputs (tdunning)
MAHOUT-1389: Complementary Naive Bayes Classifier not getting called when "-c" option is activated (Gouri Shankar Majumdar via smarthi)
MAHOUT-1384: Executing the MR version of Naive Bayes/CNB of classify-20newsgroups.sh fails in seqdirectory step (smarthi)
MAHOUT-1382: Upgrade Mahout third party jars for 0.9 Release (smarthi)
MAHOUT-1380: Streaming KMeans fails when executed in Sequential Mode (smarthi)
MAHOUT-1379: ClusterQualitySummarizer fails with the new T-Digest for clusters with 1 data point (smarthi)
MAHOUT-1378: Running Random Forest with Ignored features fails when loading feature descriptor from JSON file (Sam Wu via smarthi)
MAHOUT-1377: Exclude JUnit.jar from tarball (Sergey Svinarchuk via smarthi)
MAHOUT-1374: Ability to provide input file with userid, itemid pair (Aliaksei Litouka via ssc)
MAHOUT-1371: Arff loader can misinterpret nominals with integer, real or string (Mansur Iqbal via smarthi)
MAHOUT-1370: Vectordump doesn't write to output file in MapReduce Mode (smarthi)
MAHOUT-1368: Convert OnlineSummarizer to use the new TDigest (tdunning)
MAHOUT-1367: WikipediaXmlSplitter --> Exception in thread "main" java.lang.NullPointerException (smarthi)
MAHOUT-1364: Upgrade Mahout codebase to Lucene 4.6 (Frank Scholten)
MAHOUT-1363: Rebase packages in mahout-scala (dlyubimov)
MAHOUT-1362: Remove examples/bin/build-reuters.sh (smarthi)
MAHOUT-1361: Online algorithm for computing accurate Quantiles using 1-D clustering (tdunning)
MAHOUT-1358: StreamingKMeansThread throws IllegalArgumentException when REDUCE_STREAMING_KMEANS is set to true (smarthi)
MAHOUT-1355: InteractionValueEncoder produces wrong traceDictionary entries (Johannes Schulte via smarthi)
MAHOUT-1353: Visibility of preparePreferenceMatrix directory location (Pat Ferrel, ssc)
MAHOUT-1352: Option to change RecommenderJob output format (Pat Ferrel, ssc)
MAHOUT-1351: Adding DenseVector support to AbstractCluster (David DeBarr via smarthi)
MAHOUT-1349: Clusterdumper/loadTermDictionary crashes when highest index in (sparse) dictionary vector is larger than dictionary vector size (Andrew Musselman via smarthi)
MAHOUT-1347: Add Streaming K-Means clustering algorithm to examples/bin/cluster-reuters.sh (smarthi)
MAHOUT-1345: Enable randomised testing for all Mahout modules (Dawid Weiss, Isabel, sslavic, Frank Scholten, smarthi)
MAHOUT-1343: JSON output format support in cluster dumper (Telvis Calhoun via sslavic)
MAHOUT-1333: Fixed examples bin directory permissions in distribution archives (Mike Percy via sslavic)
MAHOUT-1319: seqdirectory -filter argument silently ignored when run as MR (smarthi)
MAHOUT-1317: Clarify some of the messages in Preconditions.checkArgument (Nikolai Grinko, smarthi)
MAHOUT-1314: StreamingKMeansReducer throws NullPointerException when REDUCE_STREAMING_KMEANS is set to true (smarthi)
MAHOUT-1313: Fixed unwanted integral division bug in RowSimilarityJob downsampling code where precision should have been retained (sslavic)
MAHOUT-1312: LocalitySensitiveHashSearch does not limit search results (sslavic)
MAHOUT-1308: Cannot extend CandidateItemsStrategy due to restricted visibility (David Geiger, smarthi)
MAHOUT-1301: toString() method of SequentialAccessSparseVector has excess comma at the end (Alexander Senov, smarthi)
MAHOUT-1297: New module for linear algebra scala DSL (dlyubimov)
MAHOUT-1296: Remove deprecated algorithms (ssc)
MAHOUT-1295: Excluded all Maven's target directories from distribution archives (sslavic)
MAHOUT-1294: Cleanup previously installed artifacts from CI server local repository (sslavic)
MAHOUT-1293: Source distribution tar.gz archive cannot be unpacked on Linux (sslavic)
MAHOUT-1292: lucene2seq should validate the 'id' field (Frank Scholten via smarthi)
MAHOUT-1291: MahoutDriver yields cosmetically suboptimal exception when bin/mahout runs without args, on some Hadoop versions (srowen)
MAHOUT-1290: Issue when running Mahout Recommender Demo (Helder Garay Martins via smarthi)
MAHOUT-1289: Move downsampling code into RowSimilarityJob (ssc)
MAHOUT-1287: classifier.sgd.CsvRecordFactory incorrectly parses CSV format (Alex Franchuk via smarthi)
MAHOUT-1285: Arff loader can misparse string data as double (smarthi)
MAHOUT-1284: DummyRecordWriter's bug with reused Writables (Maysam Yabandeh via smarthi)
MAHOUT-1275: Dropped bz2 distribution format for source and binaries (sslavic)
MAHOUT-1265: Multilayer Perceptron (Yexi Jiang via smarthi)
MAHOUT-1261: TasteHadoopUtils.idToIndex can return an int that has size Integer.MAX_VALUE (Carl Clark, smarthi)
MAHOUT-1242: No key redistribution function for associative maps (Tharindu Rusira via smarthi)
MAHOUT-1030: Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable (Andrew Musselman, Pat Ferrel, Jeff Eastman, Lars Norskog, smarthi)
Release 0.8 - 2013-07-25
MAHOUT-1272: Parallel SGD matrix factorizer for SVDrecommender (Peng Cheng via ssc)
MAHOUT-1271: classify-20newsgroups.sh fails during the seqdirectory step (smarthi)
MAHOUT-1269: Cleanup deprecated Lucene 3.x API calls in lucene2seq utility unit tests (smarthi)
MAHOUT-833: Make conversion to sequence files map-reduce (Josh Patterson, smarthi)
MAHOUT-1268: Wrong output directory for CVB (Mark Wicks via ssc)
MAHOUT-1264: Performance optimizations in RecommenderJob (ssc)
MAHOUT-1262: Cleanup LDA code (ssc)
MAHOUT-1255: Fix for weights in Multinomial sometimes overflowing in BallKMeans (dfilimon)
MAHOUT-1254: Final round of cleanup for StreamingKMeans (dfilimon)
MAHOUT-1263: Serialise/Deserialise Lambda value for OnlineLogisticRegression (Mike Davy via smarthi)
MAHOUT-1258: Another shot at findbugs and checkstyle (ssc)
MAHOUT-1253: Add experiment tools for StreamingKMeans, part 1 (dfilimon)
MAHOUT-884: Matrix Concatenate Utility (Lance Norskog via smarthi)
MAHOUT-1250: Deprecate unused algorithms (ssc)
MAHOUT-1251: Optimize MinHashMapper (ssc)
MAHOUT-1211: Disabled swallowing of IOExceptions in Closeables.close for writers (dfilimon)
MAHOUT-1164: Make ARFF integration generate meta-data in JSON format (Marty Kube via ssc)
MAHOUT-1163: Make random forest classifier meta-data file human readable (Marty Kube via ssc)
MAHOUT-1243: Dictionary file format in Lucene-Mahout integration is not in SequenceFileFormat (ssc)
MAHOUT-974: org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use integer as userId and itemId (ssc)
MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values) (Elena Smirnova via smarthi)
MAHOUT-1237: Total cluster cost isn't computed properly (dfilimon)
MAHOUT-1196: LogisticModelParameters uses csv.getTargetCategories() even if csv is not used. (Vineet Krishnan via ssc)
MAHOUT-1224: Add the option of running a StreamingKMeans pass in the Reducer before BallKMeans (dfilimon)
MAHOUT-993: Some vector dumper flags are expecting arguments. (Andrew Look via robinanil)
MAHOUT-1228: Cleanup .gitignore (Stevo Slavic via ssc)
MAHOUT-1047: CVB hangs after completion (Angel Martinez Gonzalez via smarthi)
MAHOUT-1235: ParallelALSFactorizationJob does not use VectorSumCombiner (ssc)
MAHOUT-1230: SparseMatrix.clone() is not a deep copy (Maysam Yabandeh via tdunning)
MAHOUT-1232: VectorHelper.topEntries() throws a NPE when number of NonZero elements in vector < maxEntries (smarthi)
MAHOUT-1229: Conf directory content from Mahout distribution archives cannot be unpacked (Stevo Slavic via smarthi)
MAHOUT-1213: SSVD job doesn't clean its temp dir, and fails when seeing it again (smarthi)
MAHOUT-1223: Fixed point skipped in StreamingKMeans when iterating through centroids from a reducer (dfilimon)
MAHOUT-1222: Fix total weight in FastProjectionSearch (dfilimon)
MAHOUT-1219: Remove LSHSearcher from StreamingKMeansTest. It causes it to sometimes fail (dfilimon)
MAHOUT-1221: SparseMatrix.viewRow is sometimes readonly. (Maysam Yabandeh via smarthi)
MAHOUT-1219: Remove LSHSearcher from SearchQualityTest. It causes it to fail, but the failure is not very meaningful (dfilimon)
MAHOUT-1217: Nearest neighbor searchers sometimes fail to remove points: fix in FastProjectionSearch's searchFirst (dfilimon)
MAHOUT-1216: Add locality sensitive hashing and a LocalitySensitiveHash searcher (dfilimon)
MAHOUT-1181: Adding StreamingKMeans MapReduce classes (dfilimon)
MAHOUT-1212: Incorrect classify-20newsgroups.sh file description (Julian Ortega via smarthi)
MAHOUT-1209: DRY out maven-compiler-plugin configuration (Stevo Slavic via smarthi)
MAHOUT-1207: Fix typos in description in parent pom (Stevo Slavic via smarthi)
MAHOUT-1199: Improve javadoc comments of mahout-integration (Angel Martinez Gonzalez via smarthi)
MAHOUT-1162: Adding BallKMeans and StreamingKMeans clustering algorithms (dfilimon)
MAHOUT-1205: ParallelALSFactorizationJob should leverage the distributed cache (ssc)
MAHOUT-1156: Adding nearest neighbor Searchers (dfilimon)
MAHOUT-1202: Speed up Vector operations (dfilimon)
MAHOUT-1155: Make MatrixSlice a Vector (and fix Centroid cloning; MAHOUT-1202) (dfilimon)
MAHOUT-1189: CosineDistanceMeasure doesn't return 0 for two 0 vectors (dfilimon)
MAHOUT-1180: Multinomial<T> throws ConcurrentModificationException when iterating and setting probabilities (dfilimon)
MAHOUT-1192: Speed up Vector Operations (robinanil)
MAHOUT-1191: Cleanup Vector Benchmarks make it less variable (robinanil)
MAHOUT-1190: SequentialAccessSparseVector function assignment is very slow and other iterator woes (robinanil)
MAHOUT-1188: Inconsistent reference to Lucene versions in code and POM (smarthi)
MAHOUT-1161: Unable to run CJKAnalyzer for conversion of a sequence file to sparse vector due to instantiation exception (ssc)
MAHOUT-1187: Update Commons Lang to Commons Lang3 (smarthi)
MAHOUT-1184: Another take at pmd, findbugs and checkstyle (ssc)
MAHOUT-1182: Remove useless append (Dave Brosius via tdunning)
MAHOUT-1176: Introduce a changelog file to raise contributors attribution (ssc)
MAHOUT-1108: Allows cluster-reuters.sh example to be executed on a cluster (elmer.garduno via gsingers)
MAHOUT-961: Fix issue in decision forest tree visualizer to properly show stems of tree (Ikumasa Mukai via gsingers)
MAHOUT-944: Create SequenceFiles out of Lucene document storage (no term vectors required) (Frank Scholten, gsingers)
MAHOUT-958: Fix issue with globs in RepresentativePointsDriver (Adam Baron, Vikram Dixit K, ehgjr via gsingers)
MAHOUT-1084: Fixed issue with too many clusters in synthetic control example (liutengfei, gsingers)
MAHOUT-1103: Fixed issue with splitting clusters on Hadoop (Matt Molek, gsingers)
MAHOUT-1126: Filter out bad META-INF files in job packaging (Pat Ferrel, gsingers)
MAHOUT-1211: Change deprecated Closeables.closeQuietly calls (smarthi, gsingers, srowen, dlyubimov)