layout: default
title: Mahout Wiki


Apache Mahout is a new Apache top-level project (TLP) that creates scalable machine learning algorithms under the Apache license.

General

Overview -- Mahout? What's that supposed to be?

Quickstart -- learn how to quickly set up Apache Mahout for your project.

FAQ -- frequently asked questions from the mailing lists.

Developer Resources -- overview of the Mahout development infrastructure.

How To Contribute -- get involved with the Mahout community.

How To Become A Committer -- become a member of the Mahout development community.

Hadoop -- several of our implementations depend on Hadoop.

Machine Learning Open Source Software -- other projects implementing Open Source Machine Learning libraries.

Mahout -- the name, its history, and its pronunciation

Community

Who we are -- who are the developers behind Apache Mahout?

Books, Tutorials, Talks, Articles, News, Background Reading, etc. on Mahout

Issue Tracker -- see what features people are working on, submit patches and file bugs.

Source Code (SVN) -- [Fisheye|http://fisheye6.atlassian.com/browse/mahout] -- download the Mahout source code from SVN.

Mailing lists and IRC -- links to our mailing lists, IRC channel, and archived design and algorithm discussions. Maybe your question has already been answered there.

Version Control -- where we track our code.

Powered By Mahout -- who is using Mahout in production?

Professional Support -- who is offering professional support for Mahout?

Mahout and Google Summer of Code -- All you need to know about Mahout and GSoC.

Glossary of commonly used terms and abbreviations

Installation/Setup

System Requirements -- what do you need to run Mahout?

Quickstart -- get started with Mahout, run the examples and get pointers to further resources.

Downloads -- a list of Mahout releases.

Download and installation -- build Mahout from the sources.

Mahout on Amazon's EC2 Service -- run Mahout on Amazon's EC2.

Mahout on Amazon's EMR -- run Mahout on Amazon's Elastic MapReduce.

Integrating Mahout into an Application -- integrate Mahout's capabilities in your application.

Examples

  1. ASF Email Examples -- examples of recommenders, clustering, and classification, all using a public-domain collection of 7 million emails.

Implementation Background

Requirements and Design

Matrix and Vector Needs -- requirements for Mahout vectors.

Collection (De-)Serialization

Collections and Algorithms

Learn more about mahout-collections, containers for efficient storage of primitive-type data and open hash tables.

Learn more about the Algorithms discussed and employed by Mahout.

Learn more about the Mahout recommender implementation.

Utilities

This section describes tools that might be useful for working with Mahout.

Converting Content -- Mahout has some utilities for converting content such as logs to formats more amenable for consumption by Mahout.

Creating Vectors -- Mahout's algorithms operate on vectors. Learn more on how to generate these from raw data.

Viewing Result -- how to visualize the result of your trained algorithms.

Data

Collections -- To try out and test Mahout's algorithms you need training data. We are always looking for new training data collections.

Benchmarks

Mahout Benchmarks

Committer's Resources

  • Testing -- Information on test plans and ideas for testing

Project Resources

Additional Resources

How To Edit This Wiki

This wiki is a collaborative site; anyone can contribute and share:

  • Create an account by clicking the “Login” link at the top of any page, and picking a username and password.
  • Edit any page by pressing Edit at the top of the page.

There are some conventions used on the Mahout wiki:

* {noformat}+*TODO:*+{noformat} (+*TODO:*+) is used to denote sections that definitely need to be cleaned up.
* {noformat}+Mahout_(version)+{noformat} (+Mahout_0.2+) is used to draw attention to the version of Mahout in which a feature was (or will be) added.