| <!DOCTYPE html> |
| <!-- |
| | Generated by Apache Maven Doxia Site Renderer 1.8.1 |
| | Rendered using Apache Maven Fluido Skin 1.6 |
| --> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> |
| <meta http-equiv="Content-Language" content="en" /> |
| <title>Apache Hivemall – Apache Hivemall Overview</title> |
| <link rel="stylesheet" href="./css/apache-maven-fluido-1.6.min.css" /> |
| <link rel="stylesheet" href="./css/site.css" /> |
| <link rel="stylesheet" href="./css/print.css" media="print" /> |
| <script type="text/javascript" src="./js/apache-maven-fluido-1.6.min.js"></script> |
| </head> |
| <body class="topBarEnabled"> |
| <a href="https://github.com/apache/incubator-hivemall"> |
| <img style="position: absolute; top: 0; right: 0; border: 0; z-index: 10000;" |
| src="https://s3.amazonaws.com/github/ribbons/forkme_right_red_aa0000.png" |
| alt="Fork me on GitHub"> |
| </a> |
| <div id="topbar" class="navbar navbar-fixed-top navbar-inverse"> |
| <div class="navbar-inner"> |
| <div class="container"><div class="nav-collapse"> |
| <ul class="nav"> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown">Project <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="index.html" title="Home">Home</a></li> |
| <li><a href="download.html" title="Download">Download</a></li> |
| <li><a href="licenses.html" title="Licenses">Licenses</a></li> |
| <li><a href="team.html" title="Team">Team</a></li> |
| <li><a href="poweredby.html" title="Powered By">Powered By</a></li> |
| <li><a href="http://incubator.apache.org/projects/hivemall.html" title="Incubation Status">Incubation Status</a></li> |
| </ul> |
| </li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="userguide/index.html" title="User Guide">User Guide</a></li> |
| <li><a href="overview.html" title="Overview">Overview</a></li> |
| <li><a href="https://cwiki.apache.org/confluence/display/HIVEMALL" target="_blank" title="Wiki">Wiki</a></li> |
| <li><a href="faq.html" title="FAQ">FAQ</a></li> |
| </ul> |
| </li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown">Get Involved <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="mailing-lists.html" title="Mailing Lists">Mailing Lists</a></li> |
| <li><a href="https://issues.apache.org/jira/browse/HIVEMALL" target="_blank" title="Issues (Jira)">Issues (Jira)</a></li> |
| <li><a href="repository.html" title="Source (Git)">Source (Git)</a></li> |
| <li><a href="https://travis-ci.org/apache/incubator-hivemall" target="_blank" title="Travis CI">Travis CI</a></li> |
| <li><a href="contributing.html" title="Contributing">Contributing</a></li> |
| <li><a href="release-guide.html" title="Release Guide">Release Guide</a></li> |
| </ul> |
| </li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="http://www.apache.org/foundation/how-it-works.html" target="_blank" title="How Apache Works">How Apache Works</a></li> |
| <li><a href="http://www.apache.org/foundation/" target="_blank" title="Foundation">Foundation</a></li> |
| <li><a href="http://www.apache.org/foundation/sponsorship.html" target="_blank" title="Sponsoring Apache">Sponsoring Apache</a></li> |
| <li><a href="http://www.apache.org/foundation/thanks.html" target="_blank" title="Thanks">Thanks</a></li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="container"> |
| <div id="banner"> |
| <div class="pull-left"><div id="bannerLeft"><h2>Apache Hivemall</h2> |
| </div> |
| </div> |
| <div class="pull-right"></div> |
| <div class="clear"><hr/></div> |
| </div> |
| |
| <div id="breadcrumbs"> |
| <ul class="breadcrumb"> |
| <li id="publishDate">Last Published: 2019-10-31<span class="divider">|</span> |
| </li> |
| <li id="projectVersion">Version: 0.6.0-incubating-SNAPSHOT</li> |
| </ul> |
| </div> |
| <div id="bodyColumn" > |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --><h1>Apache Hivemall Overview</h1> |
| <p>Apache Hivemall is a scalable machine learning library that runs on Apache Hive, Pig, and Spark. It is designed to scale with both the number of training instances and the number of training features.</p> |
| <div class="section"> |
| <h2><a name="Supported_Algorithms"></a>Supported Algorithms</h2> |
| <p>Apache Hivemall provides machine learning and feature engineering functions as Hive UDFs, UDAFs, and UDTFs.</p></div> |
| <div class="section"> |
| <h2><a name="Binary_Classification"></a>Binary Classification</h2> |
| |
| <ul> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Perceptron">Perceptron</a></p></li> |
| |
| <li> |
| <p>Passive Aggressive (PA, PA1, PA2)</p></li> |
| |
| <li> |
| <p>Confidence Weighted (CW)</p></li> |
| |
| <li> |
| <p>Adaptive Regularization of Weight Vectors (AROW)</p></li> |
| |
| <li> |
| <p>Soft Confidence Weighted (SCW1, SCW2)</p></li> |
| |
| <li> |
| <p>AdaGradRDA (w/ hinge loss)</p></li> |
| |
| <li> |
| <p>Factorization Machine (w/ logistic loss)</p></li> |
| </ul> |
| <p><i>My recommendation is AROW, SCW1, AdaGradRDA, and Factorization Machine, though the best choice depends on the data.</i></p></div> |
| <div class="section"> |
| <h2><a name="Multi-class_Classification"></a>Multi-class Classification</h2> |
| |
| <ul> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Perceptron">Perceptron</a></p></li> |
| |
| <li> |
| <p>Passive Aggressive (PA, PA1, PA2)</p></li> |
| |
| <li> |
| <p>Confidence Weighted (CW)</p></li> |
| |
| <li> |
| <p>Adaptive Regularization of Weight Vectors (AROW)</p></li> |
| |
| <li> |
| <p>Soft Confidence Weighted (SCW1, SCW2)</p></li> |
| |
| <li> |
| <p>Random Forest Classifier</p></li> |
| |
| <li> |
| <p>Gradient Tree Boosting (<i>Experimental</i>)</p></li> |
| </ul> |
| <p><i>My recommendation is AROW and SCW, though the best choice depends on the data.</i></p></div> |
| <div class="section"> |
| <h2><a name="Regression"></a>Regression</h2> |
| |
| <ul> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Logistic_regression">Logistic Regression</a> using <a class="externalLink" href="http://en.wikipedia.org/wiki/Stochastic_gradient_descent">Stochastic Gradient Descent</a></p></li> |
| |
| <li> |
| <p>AdaGrad, AdaDelta (w/ logistic loss)</p></li> |
| |
| <li> |
| <p>Passive Aggressive Regression (PA1, PA2)</p></li> |
| |
| <li> |
| <p>AROW regression</p></li> |
| |
| <li> |
| <p>Random Forest Regressor</p></li> |
| |
| <li> |
| <p>Factorization Machine (w/ squared loss)</p></li> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Polynomial_regression">Polynomial Regression</a></p></li> |
| </ul> |
| <p><i>My recommendation is AROW regression, AdaDelta, and Factorization Machine, though the best choice depends on the data.</i></p></div> |
| <div class="section"> |
| <h2><a name="Recommendation"></a>Recommendation</h2> |
| |
| <ul> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/MinHash">MinHash</a> (<a class="externalLink" href="http://en.wikipedia.org/wiki/Locality-sensitive_hashing">LSH</a> with Jaccard index)</p></li> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Matrix_decomposition">Matrix Factorization</a> (SGD, AdaGrad)</p></li> |
| |
| <li> |
| <p>Factorization Machine (squared loss for rating prediction)</p></li> |
| </ul></div> |
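| <p>The MinHash entry above can be illustrated with a minimal Python sketch. This is not Hivemall's implementation: the function names, the SHA-1 based seeded hash, and the default of 128 hash functions are all assumptions made for illustration. The idea is that the fraction of positions where two MinHash signatures agree approximates the Jaccard index of the underlying sets.</p> |

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def _seeded_hash(seed, item):
    # Hypothetical seeded hash built from SHA-1 (illustrative choice only).
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Keep the minimum hash value per seeded hash function.

    The input must be non-empty, otherwise min() raises ValueError.
    """
    items = set(items)
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

| <p>With more hash functions the estimate tends to get closer to the exact value, at the cost of longer signatures.</p> |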
| <div class="section"> |
| <h2><a name="k-Nearest_Neighbor"></a>k-Nearest Neighbor</h2> |
| |
| <ul> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/MinHash">MinHash</a> (<a class="externalLink" href="http://en.wikipedia.org/wiki/Locality-sensitive_hashing">LSH</a> with Jaccard index)</p></li> |
| |
| <li> |
| <p>b-Bit MinHash</p></li> |
| |
| <li> |
| <p>Brute-force search using cosine similarity</p></li> |
| </ul></div> |
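| <p>As a rough illustration of the brute-force option listed above, the following Python sketch scores every stored vector against a query by cosine similarity and keeps the top k. It assumes small, dense, in-memory vectors; the function names and the dictionary layout are hypothetical and do not reflect how Hivemall expresses this in Hive queries.</p> |

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, rows, k):
    """Score every row against the query and return the ids of the k most similar."""
    scored = [(cosine_similarity(query, vec), row_id) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row_id for _, row_id in scored[:k]]
```

| <p>Every query compares against all rows, so the cost grows linearly with the number of stored vectors. That is the trade-off against the approximate MinHash-based options.</p> |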
| <div class="section"> |
| <h2><a name="Anomaly_Detection"></a>Anomaly Detection</h2> |
| |
| <ul> |
| |
| <li><a class="externalLink" href="http://en.wikipedia.org/wiki/Local_outlier_factor">Local Outlier Factor (LOF)</a></li> |
| </ul></div> |
| <div class="section"> |
| <h2><a name="Natural_Language_Processing"></a>Natural Language Processing</h2> |
| |
| <ul> |
| |
| <li>English/Japanese Text Tokenizer</li> |
| </ul></div> |
| <div class="section"> |
| <h2><a name="Feature_engineering"></a>Feature Engineering</h2> |
| |
| <ul> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Feature_hashing">Feature Hashing</a> (MurmurHash, SHA1)</p></li> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Feature_scaling">Feature scaling</a> (Min-Max Normalization, Z-Score)</p></li> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Polynomial_kernel">Polynomial Features</a></p></li> |
| |
| <li> |
| <p>Feature instance amplifier that reduces the number of training iterations</p></li> |
| |
| <li> |
| <p><a class="externalLink" href="http://en.wikipedia.org/wiki/Tf%E2%80%93idf">TF-IDF</a> vectorizer</p></li> |
| |
| <li> |
| <p>Bias clause</p></li> |
| |
| <li> |
| <p>Data generator for one-vs-the-rest classifiers</p></li> |
| </ul></div> |
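| <p>The two feature scaling methods named above follow standard formulas: min-max normalization maps x to (x - min) / (max - min), and the z-score maps x to (x - mean) / stddev. The Python sketch below applies them to a single numeric column. The function names, the use of the population standard deviation, and the handling of constant columns are assumptions made for illustration, not Hivemall's own functions.</p> |

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column; convention chosen here
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return [0.0 for _ in values]  # constant column; convention chosen here
    return [(v - mean) / std for v in values]
```

| <p>In practice the minimum, maximum, mean, and standard deviation would be computed on the training data and reused when scaling test data.</p> |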
| <div class="section"> |
| <h2><a name="System_requirements"></a>System Requirements</h2> |
| |
| <ul> |
| |
| <li> |
| <p>Hive 0.13 or later</p></li> |
| |
| <li> |
| <p>Java 7 or later</p></li> |
| |
| <li> |
| <p>Spark 2.1 or later for Apache Hivemall on Spark</p></li> |
| |
| <li> |
| <p>Pig 0.15 or later for Apache Hivemall on Pig</p></li> |
| </ul> |
| <p>See the <a class="externalLink" href="http://hivemall-docs.readthedocs.io/en/latest/">documentation</a> for more details.</p></div> |
| </div> |
| </div> |
| <hr/> |
| <footer> |
| <div class="container"> |
| <div class="row"> |
| <p> |
| <small> |
| Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by <a href="http://incubator.apache.org/">the Apache Incubator</a>. |
| Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, |
| and decision making process have stabilized in a manner consistent with other successful ASF projects. |
| While incubation status is not necessarily a reflection of the completeness or stability of the code, |
| it does indicate that the project has yet to be fully endorsed by the ASF. |
| </small> |
| </p> |
| </div> |
| <p id="poweredBy" class="pull-right"> <a href="http://incubator.apache.org/projects/hivemall.html" title="Apache Incubator" class="builtBy"><img class="builtBy" alt="Apache Incubator" src="images/apache-incubator-logo.png" /></a> |
| </p> |
| </div> |
| </footer> |
| </body> |
| </html> |