<!doctype html>
<!--[if lt IE 7]><html lang="en-US" class="no-js lt-ie9 lt-ie8 lt-ie7"><![endif]-->
<!--[if (IE 7)&!(IEMobile)]><html lang="en-US" class="no-js lt-ie9 lt-ie8"><![endif]-->
<!--[if (IE 8)&!(IEMobile)]><html lang="en-US" class="no-js lt-ie9"><![endif]-->
<!--[if gt IE 8]><!-->
<html lang="en-US" class="no-js">
<!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Machine Learning - Apache Spot</title>
<meta name="HandheldFriendly" content="True">
<meta name="MobileOptimized" content="320">
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<link rel="apple-touch-icon" href="../../library/images/apple-touch-icon.png">
<link rel="icon" href="../../favicon.png">
<!--[if IE]>
<link rel="shortcut icon" href="http://spot.incubator.apache.org/favicon.ico">
<![endif]-->
<meta name="msapplication-TileColor" content="#f01d4f">
<meta name="msapplication-TileImage" content="../../library/images/win8-tile-icon.png">
<meta name="theme-color" content="#121212">
<link rel='dns-prefetch' href='//fonts.googleapis.com' />
<link rel='dns-prefetch' href='//s.w.org' />
<link rel="alternate" type="application/rss+xml" title="Apache Spot &raquo; Feed" href="../../feed/" />
<link rel='stylesheet' id='googleFonts-css' href='http://fonts.googleapis.com/css?family=Lato%3A400%2C700%2C400italic%2C700italic' type='text/css' media='all' />
<link rel='stylesheet' id='bones-stylesheet-css' href='../../library/css/style.css' type='text/css' media='all' />
<!--[if lt IE 9]>
<link rel='stylesheet' id='bones-ie-only-css' href='http://spot.incubator.apache.org/library/css/ie.css' type='text/css' media='all' />
<![endif]-->
<link rel='stylesheet' id='mm-css-css' href='../../library/css/meanmenu.css' type='text/css' media='all' />
<script type='text/javascript' src='../../library/js/libs/modernizr.custom.min.js'></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
<script type='text/javascript' src='../../library/js/jquery-migrate.min.js'></script>
<script type='text/javascript' src='../../library/js/jquery.meanmenu.js'></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-87470508-1', 'auto');
ga('send', 'pageview');
</script>
</head>
<body class="page">
<div id="container">
<header class="header">
<div id="inner-header" class="wrap cf">
<p id="logo" class="h1" itemscope itemtype="http://schema.org/Organization">
<a href="http://spot.incubator.apache.org/" rel="nofollow"><img src="../../library/images/logo.png" alt="Apache Spot" /></a>
</p>
<nav>
<ul id="menu-main-menu" class="nav top-nav cf">
<li id="menu-item-129" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-129">
<a href="../../get-started">Get Started</a>
<ul class="sub-menu">
<li><a href="../../get-started">Get Started</a></li>
<li><a href="../../get-started/supporting-apache">Supporting Apache</a></li>
<li><a href="../../get-started/environment">Environment</a></li>
<li><a href="../../get-started/architecture">Architecture</a></li>
<li><a href="../../get-started/demo">Demo</a></li>
</ul>
</li>
<li id="menu-item-5" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-5">
<a href="../../download">Download</a>
</li>
<li id="menu-item-130" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-130">
<a href="../../community">Community</a>
<ul class="sub-menu com-sm">
<li class="dropmenu-head">Get in Touch</li>
<li><a href="../../community" class="mail">Mailing Lists</a></li>
<li class="divider"></li>
<li><a href="../../community/committers">Project Committers</a></li>
<li><a href="../../community/contribute">How to Contribute</a></li>
<li class="divider"></li>
<li class="dropmenu-head">Developer Resources</li>
<li><a href="https://github.com/apache/incubator-spot" target="_blank" class="github">Github</a></li>
<li><a href="https://issues.apache.org/jira/browse/SPOT/" target="_blank" class="jira">JIRA Issue Tracker</a></li>
<li><a href="https://cwiki.apache.org/confluence/pages/viewpage.action?spaceKey=SPOT&title=Apache+Spot+%28Incubating%29+Home" target="_blank" class="">Confluence Site</a></li> <li class="divider"></li>
<li class="dropmenu-head">Social Media</li>
<li><a href="https://twitter.com/ApacheSpot" target="_blank" class="twitter-icon">Twitter</a></li>
</ul>
</li>
<li id="menu-item-106" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-106">
<a href="../../doc">Documentation</a>
</li>
<li class="menu-item menu-item-has-children active">
<a href="#">Project Components</a>
<ul class="sub-menu">
<li><a href="../../project-components/ingestion">Ingestion</a></li>
<li class="active"><a href="../../project-components/machine-learning">Machine Learning</a></li>
<li><a href="../../project-components/suspicious-connects-analysis">Suspicious Connects Analysis</a></li>
<li><a href="../../project-components/visualization">Visualization</a></li>
<li class="under-dev">Under Development</li>
<li><a href="../../project-components/open-data-models">Open Data Models</a></li>
</ul>
</li>
<li id="menu-item-13" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-13">
<a href="../../blog">Blog</a>
</li>
</ul>
</nav>
</div>
</header>
<div id="mobile-nav"></div>
<div id="content">
<div class="wrap cf"><!--if page has sidebar, add class "with-sidebar"-->
<div class="main">
<h1 class="page-title">Apache Spot Machine Learning</h1>
<p>The machine learning component of Apache Spot contains routines for performing suspicious connections analyses on netflow, DNS, or proxy logs gathered from a network. These analyses consume a collection of network events and produce a list of the events that the model considers least probable. These events are treated as the most suspicious. The analyses rely on the ingest component of Spot to collect and load netflow, DNS, and proxy records.</p>
<p>Apache Spot uses topic modeling to discover normal and abnormal behavior. It treats the collection of logs related to an IP address as a document and uses Latent Dirichlet Allocation (LDA) to discover hidden semantic structures in the resulting collection of documents.</p>
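<p>The grouping step can be sketched as follows. This is a minimal Python illustration, not Spot's own implementation. It assumes that each log entry has already been reduced to a hypothetical <code>(ip, word)</code> pair, and it builds one bag-of-words document per IP address. The function name <code>build_documents</code> and the sample IPs and word strings are invented for the example.</p>
<pre><code class="language-python">from collections import Counter, defaultdict


def build_documents(events):
    """Group (ip, word) pairs into one bag-of-words document per IP.

    Hypothetical input format: each event is an (ip, word) tuple whose
    word was already derived from a log entry.
    """
    docs = defaultdict(Counter)
    for ip, word in events:
        docs[ip][word] += 1  # count how often this IP produced this word
    return dict(docs)


# Illustrative events only; the word strings are invented.
events = [
    ("10.0.0.5", "53_0_0"),
    ("10.0.0.5", "53_0_0"),
    ("10.0.0.7", "443_2_1"),
]
docs = build_documents(events)
print(docs["10.0.0.5"]["53_0_0"])  # 2
</code></pre>
<p>LDA only uses the word counts within each document, so a <code>Counter</code> per IP address captures the input the model needs. The order of events within a document is discarded.</p>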
<p>LDA is a generative probabilistic model for collections of discrete data, such as text corpora. It is a three-level hierarchical Bayesian model in which each document is modeled as a mixture over an underlying set of topics, and each word in the document is generated from one of those topics [1]. We apply LDA to network traffic by converting network log entries into words through aggregation and discretization. In this mapping, documents correspond to IP addresses, words correspond to the log entries associated with each IP address, and topics correspond to profiles of common network activity.</p>
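<p>One way to turn log entries into words is to bin their numeric fields and join the bin labels into a single token. The Python sketch below relies on several assumptions. It uses hypothetical record fields (<code>dst_port</code>, <code>hour</code>, <code>bytes</code>). It splits the hour of day into four fixed buckets. It bins byte counts by empirical quantiles. The fields and cut points that Spot actually uses may differ.</p>
<pre><code class="language-python">import bisect


def quantile_cutoffs(values, n_bins):
    """Return n_bins - 1 cut points that split the sorted values into
    roughly equal-sized groups (a simple empirical-quantile scheme)."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]


def to_bin(value, cutoffs):
    """Index of the bin that value falls into, from 0 to len(cutoffs)."""
    return bisect.bisect_right(cutoffs, value)


def make_words(records, n_bins=4):
    """Turn each record into a discrete word 'port_hourBin_bytesBin'.

    Field names and bin choices are illustrative assumptions.
    """
    if not records:
        return []
    byte_cuts = quantile_cutoffs([r["bytes"] for r in records], n_bins)
    words = []
    for r in records:
        hour_bin = r["hour"] // 6  # four 6-hour buckets of the day
        byte_bin = to_bin(r["bytes"], byte_cuts)
        words.append(f"{r['dst_port']}_{hour_bin}_{byte_bin}")
    return words


records = [
    {"dst_port": 53, "hour": 2, "bytes": 80},
    {"dst_port": 53, "hour": 3, "bytes": 90},
    {"dst_port": 443, "hour": 14, "bytes": 5000},
    {"dst_port": 443, "hour": 15, "bytes": 7000},
]
print(make_words(records, n_bins=2))
</code></pre>
<p>Quantile-based cut points keep the vocabulary small and follow the value distribution of the data being analyzed. As a result, log entries with a similar port, time of day, and volume map to the same word.</p>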
<p>Apache Spot infers a probabilistic model for the network behavior of each IP address. The model assigns each network log entry an estimated probability, which serves as its score. The events with the lowest scores are flagged as “suspicious” for further analysis.</p>
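<p>Under LDA, the probability of a word <em>w</em> in a document <em>d</em> is commonly computed by summing over topics: P(<em>w</em> | <em>d</em>) = Σ<sub><em>k</em></sub> θ<sub><em>d,k</em></sub> φ<sub><em>k,w</em></sub>. Here θ<sub><em>d</em></sub> is the topic mixture of the IP address and φ<sub><em>k</em></sub> is the word distribution of topic <em>k</em>. The Python sketch below assumes that the fitted θ and φ are already available as plain dictionaries (<code>theta[ip]</code> and <code>phi[word]</code>, each a list indexed by topic), and it uses toy values. It illustrates the scoring idea rather than reproducing Spot's code.</p>
<pre><code class="language-python">def score_event(ip, word, theta, phi):
    """Estimated P(word | ip): sum over topics k of
    P(topic k | ip) * P(word | topic k)."""
    return sum(t * p for t, p in zip(theta[ip], phi[word]))


def most_suspicious(events, theta, phi, n):
    """Return the n (score, ip, word) tuples with the lowest scores."""
    scored = [(score_event(ip, w, theta, phi), ip, w) for ip, w in events]
    return sorted(scored)[:n]


# Toy two-topic model: theta[ip][k] = P(topic k | ip),
# phi[word][k] = P(word | topic k); each topic's word probabilities sum to 1.
theta = {"10.0.0.5": [0.9, 0.1], "10.0.0.7": [0.2, 0.8]}
phi = {
    "dns": [0.6, 0.09],
    "http": [0.39, 0.9],
    "ssh_night": [0.01, 0.01],
}
events = [
    ("10.0.0.5", "dns"),
    ("10.0.0.5", "ssh_night"),
    ("10.0.0.7", "http"),
    ("10.0.0.7", "dns"),
]
for score, ip, word in most_suspicious(events, theta, phi, n=2):
    print(f"{ip} {word} {score:.3f}")
</code></pre>
<p>The sketch ranks events with <code>sorted</code> and keeps the first <code>n</code> entries, so the least probable events come first, as described above. A fixed probability threshold would be an equally reasonable cut-off. The top-<code>n</code> rule is used here only for illustration.</p>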
<p class="citation">[1] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (Jan. 2003): 993-1022.</p>
</div>
</div>
</div>
<div id="more-info">
<div class="wrap cf">
<p>
<a href="https://github.com/apache/incubator-spot" class="y-btn" target="_blank">More Info</a>
</p>
<p style="margin-top:50px;"><img src="../../library/images/apache-incubator.png" alt="Apache Incubator" />
</p>
<p class="disclaimer">
Apache Spot is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
</p>
<p class="disclaimer">
The contents of this website are © 2016 Apache Software Foundation under the terms of the Apache License v2. Apache Spot and its logo are trademarks of the Apache Software Foundation.
</p>
</div>
</div>
<footer class="footer" role="contentinfo" itemscope itemtype="http://schema.org/WPFooter">
<div id="inner-footer" class="wrap cf">
<p class="source-org copyright" style="text-align:center;">
&copy; 2019 Apache Spot.
</p>
</div>
</footer>
</div>
<a href="#0" class="cd-top">Top</a>
<script type='text/javascript' src='../../library/js/scripts.js'></script>
</body>
</html>