<!doctype html>
<html class="no-js" dir="ltr" lang="en-US">
<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=1100">
	<title>MADlib</title>

	<script src="https://use.typekit.net/qbv8hok.js"></script>
	<script>try{Typekit.load({ async: true });}catch(e){}</script>

	<link rel="shortcut icon" href="favicon.ico" />

	<link rel='stylesheet' href='style.css' type='text/css' media='all' />

	<script type='text/javascript' src='http://code.jquery.com/jquery-1.10.2.min.js'></script>
	<script type="text/javascript" src="html5lightbox.js"></script>
	<script type='text/javascript' src='master.js'></script>

</head>
<body class="page page-id-15 page-template page-template-default">
	<div class="header">
		<div class="container">
			<a href="index.html" class="logo">
				Home
			</a>
			<div class="nav">
				<div class="menu-primary-navigation-container"><ul id="menu-primary-navigation" class="menu"><li id="menu-item-27" class="menu-item menu-item-type-post_type menu-item-object-page page_item page-item-18 current_page_item menu-item-27"><a href="index.html">Home</a></li>
					<li id="menu-item-28" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-28"><a href="product.html">Product</a></li>
					<li id="menu-item-25" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-25"><a title="Documentation" href="documentation.html">Documentation</a></li>
					<li id="menu-item-24" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-24"><a href="community.html">Community</a></li>
					<li id="menu-item-26" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-26 nav-button last"><a href="download.html">Download</a></li>
					</ul>
				</div>
			</div>
		</div>
	</div>
	<div class="primary-content">
	<div class="container section-head">
		<div class="section-name">
			<h1 class="h2">Product</h1>
		</div>
			</div>
<div class="primary-content">
<div class="container section-head with-nav por">
<div class="row">
<div class="section-name span2">
<h2>Product</h2>
&nbsp;

</div>
<div class="sub-nav span10">
<div class="menu-secondary-navigation-container">
<ul id="menu-secondary-navigation" class="menu">
	<li id="menu-item-69" class="menu-item menu-item-type-post_type menu-item-object-page current-page-ancestor current-page-parent menu-item-69 current-menu-item"><a id="overview-link" style="cursor: pointer;">Overview</a></li>
	<li id="menu-item-68" class="menu-item menu-item-type-post_type menu-item-object-page page_item page-item-61 current_page_item menu-item-68"><a id="features-link" style="cursor: pointer;">Features</a></li>
</ul>
</div>
</div>
&nbsp;

</div>
&nbsp;

</div>
<div class="container overview-tab">
<div class="row">
<div class="span4 heading"><a href="https://www.youtube.com/watch?v=DGPZwpB92Aw"><img src="_media/video-product.png" alt="Watch the MADlib product video" /></a></div>
<div class="span8">
<p class="intro-text">As data sizes keep growing, many existing analytics solutions are not up to the task. The MADlib project addresses this need with a framework built to take advantage of modern computing capabilities and to provide robust solutions that scale with the needs of the business.</p>
<p class="intro-text">Our approach is to draw on commercial practice, academic research, and the open-source development community. Please watch the short video for more details on the product.</p>

<h3>Key philosophies driving the architecture of MADlib:</h3>
<ul>
	<li>Operate on the data locally, in the database. Do not move data between multiple runtime environments unnecessarily.</li>
	<li>Use best-of-breed database engines, but separate the machine learning logic from database-specific implementation details.</li>
	<li>Leverage MPP shared-nothing technology, such as the Greenplum Database, to provide parallelism and scalability.</li>
	<li>Maintain an open implementation with active ties to the Apache community and ongoing academic research.</li>
</ul>
</div>
&nbsp;

</div>
</div>
<div class="container features-posts">
<div class="post-73 post type-post status-publish format-standard hentry category-feature row">
<div class="span4">
<div class="entry-thumbnail"><img class="attachment-post-thumbnail wp-post-image" src="_media/featured/classification.png" alt="classification" width="96" height="117" /></div>
&nbsp;

</div>
<div class="span8">
<h3>Classification</h3>
<p class="exerpt">When the desired output is categorical, we use classification methods to build a model that predicts which category a new record falls into. The goal of classification is to label incoming records with the correct class.</p>
<p class="body"><strong>Example:</strong> Suppose we had demographic data and other features of individuals applying for loans, along with historical data showing which past loans had defaulted. We could then build a model that estimates the likelihood that a new applicant will default. In this case, the categories are “will default” and “won’t default”, which are two discrete classes of output.</p>
&nbsp;

</div>
&nbsp;

</div>
</div>
<div class="container features-posts">
<div class="post-82 post type-post status-publish format-standard hentry category-feature row">
<div class="span4">
<div class="entry-thumbnail"><img class="attachment-post-thumbnail wp-post-image" src="_media/featured/regression.png" alt="regression" width="96" height="115" /></div>
&nbsp;

</div>
<div class="span8">
<h3>Regression</h3>
<p class="exerpt">When the desired output is continuous in nature, we use regression methods to build a model that predicts the output value.</p>
<p class="body"><strong>Example:</strong> If we had data that described properties of real estate listings, then we could build a model to predict the sale value for homes based on the known characteristics of the houses. This is a regression problem because the output response is continuous in nature, rather than categorical.</p>
&nbsp;

</div>
&nbsp;

</div>
</div>
<div class="container features-posts">
<div class="post-82 post type-post status-publish format-standard hentry category-feature row">
<div class="span4">
<div class="entry-thumbnail"><img class="attachment-post-thumbnail wp-post-image" src="_media/featured/neural-net.png" alt="neural-net" width="96" height="115" /></div>
&nbsp;

</div>
<div class="span8">
<h3>Deep Learning</h3>
<p class="exerpt">Deep learning is a type of machine learning, inspired by the biology of the brain, that uses a class of algorithms called artificial neural networks. These networks are effective at solving a wide variety of problems, primarily in the area of supervised learning. GPU acceleration is widely used to speed up the training of deep neural nets.</p>
<p class="body"><strong>Example:</strong> If we want to match a video of an employee entering an office lobby with her picture on file, we could use a convolutional neural network to do this.  This would save her from having to get out her employee badge and swipe it into a machine.  It could also help reduce queues in the lobby during the morning rush.</p>
&nbsp;

</div>
&nbsp;

</div>
</div>
<div class="container features-posts">
<div class="post-170 post type-post status-publish format-standard hentry category-feature row">
<div class="span4">
<div class="entry-thumbnail"><img class="attachment-post-thumbnail wp-post-image" src="_media/featured/clustering.png" alt="clustering" width="96" height="115" /></div>
&nbsp;

</div>
<div class="span8">
<h3>Clustering</h3>
<p class="exerpt">Clustering identifies groups of data such that the items within one cluster are more similar to each other than they are to the items in any other cluster.</p>
<p class="body"><strong>Example:</strong> In customer segmentation analysis, the goal is to identify specific groups of customers that behave in a similar fashion, so that marketing campaigns can be designed to reach these groups. When the customer segments are known in advance, this is a supervised classification task. When we let the data itself identify the segments, it becomes a clustering task.</p>
&nbsp;

</div>
&nbsp;

</div>
</div>
<div class="container features-posts">
<div class="post-186 post type-post status-publish format-standard hentry category-feature row">
<div class="span4">
<div class="entry-thumbnail"><img class="attachment-post-thumbnail wp-post-image" src="_media/featured/topic-modelling.png" alt="topic-modelling" width="96" height="115" /></div>
&nbsp;

</div>
<div class="span8">
<h3>Topic Modeling</h3>
<p class="exerpt">Topic modeling is similar to clustering in that it attempts to identify groups of documents that are similar to each other. It is specific to the text domain, however, where it also tries to identify the main themes of those documents.</p>
<p class="body">&nbsp;</p>

</div>
&nbsp;

</div>
</div>
<div class="container features-posts">
<div class="post-189 post type-post status-publish format-standard hentry category-feature row">
<div class="span4">
<div class="entry-thumbnail"><img class="attachment-post-thumbnail wp-post-image" src="_media/featured/rule-mining.png" alt="rule-mining" width="96" height="118" /></div>
&nbsp;

</div>
<div class="span8">
<h3>Association Rule Mining</h3>
<p class="exerpt">Also called market basket analysis or frequent itemset mining, association rule mining identifies items that occur together more frequently than random chance would indicate, which suggests an underlying relationship between the items.</p>
<p class="body"><strong>Example:</strong> In an online store, association rule mining can be used to identify which products tend to be purchased together. The results can then be used as input to a product recommendation engine that suggests items of interest to the customer and provides upsell opportunities.</p>
&nbsp;

</div>
&nbsp;

</div>
</div>
<div class="container features-posts">
<div class="post-191 post type-post status-publish format-standard hentry category-feature row">
<div class="span4">
<div class="entry-thumbnail"><img class="attachment-post-thumbnail wp-post-image" src="_media/featured/descriptive-statistics.png" alt="descriptive-statistics" width="96" height="115" /></div>
&nbsp;

</div>
<div class="span8">
<h3>Descriptive Statistics</h3>
<p class="exerpt">Descriptive statistics don&#8217;t provide a model and thus are not considered a learning method. However, they help an analyst understand the underlying data and can provide valuable insights that influence the choice of model.</p>
<p class="body"><strong>Example:</strong> Calculating the distribution of data within each variable of a dataset can help an analyst understand which variables should be treated as categorical and which as continuous, as well as what kind of distribution the values follow.</p>
&nbsp;

</div>
&nbsp;

</div>
</div>
<div class="container features-posts">
<div class="post-193 post type-post status-publish format-standard hentry category-feature row">
<div class="span4">
<div class="entry-thumbnail"><img class="attachment-post-thumbnail wp-post-image" src="_media/featured/validation.png" alt="validation" width="96" height="115" /></div>
&nbsp;

</div>
<div class="span8">
<h3>Validation</h3>
<p class="exerpt">Using a model without understanding its accuracy can lead to a poor outcome. For that reason, it is important to understand the error of a model and to evaluate its accuracy on test data. In data analysis, the data is frequently split into training data and test data so that the model can be assessed in a statistically valid way and checked for over-fitting to the training data. N-fold cross-validation is also frequently used.</p>
<p class="body">&nbsp;</p>

</div>
&nbsp;

</div>
</div>
</div>			</div>
	<div class="footer">
	  <div class="container">
		 	<img src='http://apache.org/images/asf-logo.gif' alt="The Apache Software Foundation logo" width="310" height="80"/>
		  <br/>
		  <br/>
	    <p>
	      Copyright &copy; <script>	var d = new Date();document.write(d.getFullYear());</script> <a href='http://www.apache.org/'>The Apache Software Foundation</a>
	      <br>
	      Apache, Apache MADlib, the Apache feather and the MADlib logo are trademarks of The Apache Software Foundation.
	    </p>
	  </div>
	</div>
</body>
</html>
