<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8.1
| Rendered using Apache Maven Fluido Skin 1.6
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache Hivemall &#x2013; Apache Hivemall Overview</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.6.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.6.min.js"></script>
</head>
<body class="topBarEnabled">
<a href="https://github.com/apache/incubator-hivemall">
<img style="position: absolute; top: 0; right: 0; border: 0; z-index: 10000;"
src="https://s3.amazonaws.com/github/ribbons/forkme_right_red_aa0000.png"
alt="Fork me on GitHub" />
</a>
<div id="topbar" class="navbar navbar-fixed-top navbar-inverse">
<div class="navbar-inner">
<div class="container"><div class="nav-collapse">
<ul class="nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Project <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="index.html" title="Home">Home</a></li>
<li><a href="download.html" title="Download">Download</a></li>
<li><a href="licenses.html" title="Licenses">Licenses</a></li>
<li><a href="team.html" title="Team">Team</a></li>
<li><a href="poweredby.html" title="Powered By">Powered By</a></li>
<li><a href="http://incubator.apache.org/projects/hivemall.html" title="Incubation Status">Incubation Status</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="userguide/index.html" title="User Guide">User Guide</a></li>
<li><a href="overview.html" title="Overview">Overview</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/HIVEMALL" target="_blank" title="Wiki">Wiki</a></li>
<li><a href="faq.html" title="FAQ">FAQ</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Get Involved <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="mailing-lists.html" title="Mailing Lists">Mailing Lists</a></li>
<li><a href="https://issues.apache.org/jira/browse/HIVEMALL" target="_blank" title="Issues (Jira)">Issues (Jira)</a></li>
<li><a href="repository.html" title="Source (Git)">Source (Git)</a></li>
<li><a href="https://travis-ci.org/apache/incubator-hivemall" target="_blank" title="Travis CI">Travis CI</a></li>
<li><a href="contributing.html" title="Contributing">Contributing</a></li>
<li><a href="release-guide.html" title="Release Guide">Release Guide</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/foundation/how-it-works.html" target="_blank" title="How Apache Works">How Apache Works</a></li>
<li><a href="http://www.apache.org/foundation/" target="_blank" title="Foundation">Foundation</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html" target="_blank" title="Sponsoring Apache">Sponsoring Apache</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" target="_blank" title="Thanks">Thanks</a></li>
</ul>
</li>
</ul>
<ul class="nav pull-right"><li>
<a href="https://twitter.com/ApacheHivemall" class="twitter-follow-button" data-show-count="false" data-align="right" data-size="large" data-show-screen-name="true" data-lang="en">Follow ApacheHivemall</a>
<script type="text/javascript">!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
</li></ul>
</div>
</div>
</div>
</div>
<div class="container">
<div id="banner">
<div class="pull-left"><div id="bannerLeft"><h2>Apache Hivemall</h2>
</div>
</div>
<div class="pull-right"></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li id="publishDate">Last Published: 2019-10-31<span class="divider">|</span>
</li>
<li id="projectVersion">Version: 0.6.0-incubating-SNAPSHOT</li>
</ul>
</div>
<div id="bodyColumn" >
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
--><h1>Apache Hivemall Overview</h1>
<p>Apache Hivemall is a scalable machine learning library that runs on Apache Hive, Pig, and Spark. It is designed to scale with both the number of training instances and the number of training features.</p>
<div class="section">
<h2><a name="Supported_Algorithms"></a>Supported Algorithms</h2>
<p>Apache Hivemall provides machine learning and feature engineering functions as Hive UDFs, UDAFs, and UDTFs.</p></div>
<div class="section">
<h2><a name="Binary_Classification"></a>Binary Classification</h2>
<ul>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Perceptron">Perceptron</a></p></li>
<li>
<p>Passive Aggressive (PA, PA1, PA2)</p></li>
<li>
<p>Confidence Weighted (CW)</p></li>
<li>
<p>Adaptive Regularization of Weight Vectors (AROW)</p></li>
<li>
<p>Soft Confidence Weighted (SCW1, SCW2)</p></li>
<li>
<p>AdaGradRDA (w/ hinge loss)</p></li>
<li>
<p>Factorization Machine (w/ logistic loss)</p></li>
</ul>
<p><i>My recommendations are AROW, SCW1, AdaGradRDA, and Factorization Machine, though the best choice depends on the task.</i></p></div>
<div class="section">
<h2><a name="Multi-class_Classification"></a>Multi-class Classification</h2>
<ul>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Perceptron">Perceptron</a></p></li>
<li>
<p>Passive Aggressive (PA, PA1, PA2)</p></li>
<li>
<p>Confidence Weighted (CW)</p></li>
<li>
<p>Adaptive Regularization of Weight Vectors (AROW)</p></li>
<li>
<p>Soft Confidence Weighted (SCW1, SCW2)</p></li>
<li>
<p>Random Forest Classifier</p></li>
<li>
<p>Gradient Tree Boosting (<i>Experimental</i>)</p></li>
</ul>
<p><i>My recommendations are AROW and SCW, though the best choice depends on the task.</i></p></div>
<div class="section">
<h2><a name="Regression"></a>Regression</h2>
<ul>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Logistic_regression">Logistic Regression</a> using <a class="externalLink" href="http://en.wikipedia.org/wiki/Stochastic_gradient_descent">Stochastic Gradient Descent</a></p></li>
<li>
<p>AdaGrad, AdaDelta (w/ logistic loss)</p></li>
<li>
<p>Passive Aggressive Regression (PA1, PA2)</p></li>
<li>
<p>AROW regression</p></li>
<li>
<p>Random Forest Regressor</p></li>
<li>
<p>Factorization Machine (w/ squared loss)</p></li>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Polynomial_regression">Polynomial Regression</a></p></li>
</ul>
<p><i>My recommendations for regression are AROW regression, AdaDelta, and Factorization Machine, though the best choice depends on the task.</i></p></div>
<div class="section">
<h2><a name="Recommendation"></a>Recommendation</h2>
<ul>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/MinHash">MinHash</a> (<a class="externalLink" href="http://en.wikipedia.org/wiki/Locality-sensitive_hashing">LSH</a> with Jaccard index)</p></li>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Matrix_decomposition">Matrix Factorization</a> (SGD, AdaGrad)</p></li>
<li>
<p>Factorization Machine (squared loss for rating prediction)</p></li>
</ul></div>
<div class="section">
<h2><a name="k-Nearest_Neighbor"></a>k-Nearest Neighbor</h2>
<ul>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/MinHash">MinHash</a> (<a class="externalLink" href="http://en.wikipedia.org/wiki/Locality-sensitive_hashing">LSH</a> with Jaccard index)</p></li>
<li>
<p>b-Bit MinHash</p></li>
<li>
<p>Brute-force search using cosine similarity</p></li>
</ul></div>
<div class="section">
<h2><a name="Anomaly_Detection"></a>Anomaly Detection</h2>
<ul>
<li><a class="externalLink" href="http://en.wikipedia.org/wiki/Local_outlier_factor">Local Outlier Factor (LOF)</a></li>
</ul></div>
<div class="section">
<h2><a name="Natural_Language_Processing"></a>Natural Language Processing</h2>
<ul>
<li>English/Japanese Text Tokenizer</li>
</ul></div>
<div class="section">
<h2><a name="Feature_engineering"></a>Feature engineering</h2>
<ul>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Feature_hashing">Feature Hashing</a> (MurmurHash, SHA1)</p></li>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Feature_scaling">Feature scaling</a> (Min-Max Normalization, Z-Score)</p></li>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Polynomial_kernel">Polynomial Features</a></p></li>
<li>
<p>Amplifier of training instances that reduces the number of training iterations</p></li>
<li>
<p><a class="externalLink" href="http://en.wikipedia.org/wiki/Tf%E2%80%93idf">TF-IDF</a> vectorizer</p></li>
<li>
<p>Bias clause</p></li>
<li>
<p>Data generator for one-vs-the-rest classifiers</p></li>
</ul></div>
<div class="section">
<h2><a name="System_requirements"></a>System requirements</h2>
<ul>
<li>
<p>Hive 0.13 or later</p></li>
<li>
<p>Java 7 or later</p></li>
<li>
<p>Spark 2.1 or later for Apache Hivemall on Spark</p></li>
<li>
<p>Pig 0.15 or later for Apache Hivemall on Pig</p></li>
</ul>
<p>See the <a class="externalLink" href="http://hivemall-docs.readthedocs.io/en/latest/">documentation</a> for more details.</p></div>
</div>
</div>
<hr/>
<footer>
<div class="container">
<div class="row">
<p>
<small>
Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by <a href="http://incubator.apache.org/">the Apache Incubator</a>.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications,
and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code,
it does indicate that the project has yet to be fully endorsed by the ASF.
</small>
</p>
</div>
<p id="poweredBy" class="pull-right"> <a href="http://incubator.apache.org/projects/hivemall.html" title="Apache Incubator" class="builtBy"><img class="builtBy" alt="Apache Incubator" src="images/apache-incubator-logo.png" /></a>
</p>
</div>
</footer>
</body>
</html>