 <!DOCTYPE html>
 <!--
  | Generated by Apache Maven Doxia Site Renderer 1.8 from src/site/markdown/use-cases/forensic_clustering/index.md at 2019-05-14
  | Rendered using Apache Maven Fluido Skin 1.7
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="Date-Revision-yyyymmdd" content="20190514" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Metron &#x2013; Problem Statement</title>
     <link rel="stylesheet" href="../../css/apache-maven-fluido-1.7.min.css" />
     <link rel="stylesheet" href="../../css/site.css" />
     <link rel="stylesheet" href="../../css/print.css" media="print" />
     <script type="text/javascript" src="../../js/apache-maven-fluido-1.7.min.js"></script>
 <script type="text/javascript">
               $( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );
             </script>
   </head>
   <body class="topBarDisabled">
     <div class="container-fluid">
       <div id="banner">
         <div class="pull-left"><a href="http://metron.apache.org/" id="bannerLeft"><img src="../../images/metron-logo.png"  alt="Apache Metron" width="148px" height="48px"/></a></div>
         <div class="pull-right"></div>
         <div class="clear"><hr/></div>
       </div>

       <div id="breadcrumbs">
         <ul class="breadcrumb">
       <li class=""><a href="http://www.apache.org" class="externalLink" title="Apache">Apache</a><span class="divider">/</span></li>
       <li class=""><a href="http://metron.apache.org/" class="externalLink" title="Metron">Metron</a><span class="divider">/</span></li>
       <li class=""><a href="../../index.html" title="Documentation">Documentation</a><span class="divider">/</span></li>
     <li class="active ">Problem Statement</li>
         <li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2019-05-14</li>
           <li id="projectVersion" class="pull-right">Version: 0.7.1</li>
         </ul>
       </div>
       <div class="row-fluid">
         <div id="leftColumn" class="span2">
           <div class="well sidebar-nav">
     <ul class="nav nav-list">
       <li class="nav-header">User Documentation</li>
     <li><a href="../../index.html" title="Metron"><span class="icon-chevron-down"></span>Metron</a>
     <ul class="nav nav-list">
     <li><a href="../../CONTRIBUTING.html" title="CONTRIBUTING"><span class="none"></span>CONTRIBUTING</a></li>
     <li><a href="../../Upgrading.html" title="Upgrading"><span class="none"></span>Upgrading</a></li>
     <li><a href="../../metron-analytics/index.html" title="Analytics"><span class="icon-chevron-right"></span>Analytics</a></li>
     <li><a href="../../metron-contrib/metron-docker/index.html" title="Docker"><span class="none"></span>Docker</a></li>
     <li><a href="../../metron-contrib/metron-performance/index.html" title="Performance"><span class="none"></span>Performance</a></li>
     <li><a href="../../metron-deployment/index.html" title="Deployment"><span class="icon-chevron-right"></span>Deployment</a></li>
     <li><a href="../../metron-interface/index.html" title="Interface"><span class="icon-chevron-right"></span>Interface</a></li>
     <li><a href="../../metron-platform/index.html" title="Platform"><span class="icon-chevron-right"></span>Platform</a></li>
     <li><a href="../../metron-sensors/index.html" title="Sensors"><span class="icon-chevron-right"></span>Sensors</a></li>
     <li><a href="../../metron-stellar/stellar-3rd-party-example/index.html" title="Stellar-3rd-party-example"><span class="none"></span>Stellar-3rd-party-example</a></li>
     <li><a href="../../metron-stellar/stellar-common/index.html" title="Stellar-common"><span class="icon-chevron-right"></span>Stellar-common</a></li>
     <li><a href="../../metron-stellar/stellar-zeppelin/index.html" title="Stellar-zeppelin"><span class="none"></span>Stellar-zeppelin</a></li>
     <li><a href="../../use-cases/index.html" title="Use-cases"><span class="icon-chevron-down"></span>Use-cases</a>
     <ul class="nav nav-list">
     <li class="active"><a href="#"><span class="none"></span>Forensic_clustering</a></li>
     <li><a href="../../use-cases/geographic_login_outliers/index.html" title="Geographic_login_outliers"><span class="none"></span>Geographic_login_outliers</a></li>
     <li><a href="../../use-cases/parser_chaining/index.html" title="Parser_chaining"><span class="none"></span>Parser_chaining</a></li>
     <li><a href="../../use-cases/typosquat_detection/index.html" title="Typosquat_detection"><span class="none"></span>Typosquat_detection</a></li>
     </ul>
 </li>
     </ul>
 </li>
 </ul>
           <hr />
           <div id="poweredBy">
             <div class="clear"></div>
             <div class="clear"></div>
             <div class="clear"></div>
             <div class="clear"></div>
 <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="../../images/logos/maven-feather.png" /></a>
             </div>
           </div>
         </div>
         <div id="bodyColumn"  class="span10" >
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
 <h1>Problem Statement</h1>
 <p><a name="Problem_Statement"></a></p>
 <p>Having a forensic hash, such as <a class="externalLink" href="https://github.com/trendmicro/tlsh">TLSH</a>, is a useful tool in cybersecurity. In short, the notion is that semantically similar documents should hash to values that are also similar.  Contrast this with standard cryptographic hashes, such as those in the SHA and MD families, where small deviations in the input data yield large deviations in the hashes.</p>
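 <p>To make the contrast concrete, here is a small Python sketch that measures how a cryptographic hash reacts to the two <tt>/bin/busybox</tt> inputs from the Cowrie logs excerpted later on this page. It uses only the standard <tt>hashlib</tt> module; the helper names are hypothetical and not part of Metron.</p>

```python
import hashlib


def bit_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def sha256_bit_distance(s1: str, s2: str) -> int:
    """Hamming distance between the SHA-256 digests of two strings."""
    d1 = hashlib.sha256(s1.encode("utf-8")).digest()
    d2 = hashlib.sha256(s2.encode("utf-8")).digest()
    return bit_distance(d1, d2)


# Two commands that differ only in the random five-letter suffix.
cmd_a = "/bin/busybox LSUCT"
cmd_b = "/bin/busybox XUSRH"
input_bits = bit_distance(cmd_a.encode("utf-8"), cmd_b.encode("utf-8"))
digest_bits = sha256_bit_distance(cmd_a, cmd_b)
print(f"inputs differ in {input_bits} of {len(cmd_a) * 8} bits")
print(f"SHA-256 digests differ in {digest_bits} of 256 bits")
```

 <p>The inputs differ in only a handful of bits, while the SHA-256 digests typically differ in roughly half of their 256 bits. A forensic hash such as TLSH is designed to avoid exactly this behavior.</p>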
 <p>The traditional use-case is to hash input documents or binaries and compare the results against a known blacklist of malicious hashes.  A sufficiently similar hash indicates a match, which makes it harder for malicious parties to evade detection by merely fuzzing their input data.</p>
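 <p>A minimal sketch of that blacklist lookup, assuming hashes are equal-length hex strings and using plain Hamming distance over their bits as a stand-in for TLSH's own distance score (the real TLSH library defines a different scoring function). The function names, example hashes, and threshold are hypothetical; the point is that every lookup is a scan over the blacklist, which is the metric-space search discussed next.</p>

```python
from typing import Iterable, Optional, Tuple


def hex_bit_distance(h1: str, h2: str) -> int:
    """Hamming distance between two equal-length hex strings.

    Stand-in for a real forensic-hash distance; TLSH uses its own scoring.
    """
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")


def closest_blacklisted(
    candidate: str, blacklist: Iterable[str], threshold: int
) -> Optional[Tuple[str, int]]:
    """Linear scan: the nearest blacklisted hash within threshold, or None."""
    scored = [(known, hex_bit_distance(candidate, known)) for known in blacklist]
    within = [pair for pair in scored if threshold >= pair[1]]
    return min(within, key=lambda pair: pair[1], default=None)


BLACKLIST = ["3fa2c19e", "9b01d4e7"]  # hypothetical known-bad hashes
print(closest_blacklisted("3fa2c19f", BLACKLIST, threshold=4))  # -> ('3fa2c19e', 1)
```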
 <p>While this is interesting, it still requires metric-space searches against a blacklist. I envisioned a slightly more interesting streaming use-case: on-the-fly clustering of data.  TLSH hashes of similar documents are not necessarily identical, but locality-sensitive hashes are designed to collide when their inputs are sufficiently similar. Namely, applying the Hamming distance <a class="externalLink" href="https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Bit_sampling_for_Hamming_distance">LSH</a> to the TLSH hash gives us a way to bin semantic hashes such that hashes that are close in Hamming distance are likely to land in the same bin.</p>
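 <p>The bit-sampling idea can be sketched as follows. This illustrates the general technique from the linked page, not Metron's implementation: it assumes the hash is a hex string, picks <tt>k</tt> bit positions once with a fixed seed, and uses the sampled bits themselves as the bin key. Two hashes land in the same bin exactly when they agree on every sampled position. All names here are hypothetical.</p>

```python
import random
from typing import List


def hex_to_bits(hex_hash: str) -> str:
    """Expand a hex digest into '0'/'1' characters, most significant bit first."""
    return "".join(format(int(c, 16), "04b") for c in hex_hash)


def make_bit_sampler(n_bits: int, k: int, seed: int = 42) -> List[int]:
    """Pick k distinct bit positions once; every record must reuse the same positions."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_bits), k))


def similarity_bin(hex_hash: str, positions: List[int]) -> str:
    """Bit-sampling LSH: the bin key is the concatenation of the sampled bits."""
    bits = hex_to_bits(hex_hash)
    return "".join(bits[p] for p in positions)


# Hypothetical 64-bit hashes that differ only in their last bit.
h1 = "0123456789abcdef"
h2 = "0123456789abcdee"
positions = make_bit_sampler(n_bits=64, k=8)
print(similarity_bin(h1, positions), similarity_bin(h2, positions))
```

 <p>Because the positions must be identical for every record, a streaming system would fix them once (here via the seed) rather than resampling per message.</p>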
 <p>Inspired by a good <a class="externalLink" href="https://github.com/fluenda/dataworks_summit_iot_botnet/blob/master/dws-fucs-lopresto.pdf">talk</a> by Andy LoPresto and Andre Fucs de Miranda from Apache NiFi, we will take logs from the Cowrie honeypot and compute TLSH hashes and similarity bins so that users can easily find activity in the logs that resembles known threats.</p>
 <p>Consider the following excerpts from the Cowrie logs the authors above have shared:</p>

 <div>
 <div>
 <pre class="source">{
   &quot;eventid&quot;: &quot;cowrie.command.success&quot;
 , &quot;timestamp&quot;: &quot;2017-09-18T11:45:25.028091Z&quot;
 , &quot;message&quot;: &quot;Command found: /bin/busybox LSUCT&quot;
 , &quot;system&quot;: &quot;CowrieTelnetTransport,787,121.237.129.163&quot;
 , &quot;isError&quot;: 0
 , &quot;src_ip&quot;: &quot;121.237.129.163&quot;
 , &quot;session&quot;: &quot;21caf72c6358&quot;
 , &quot;input&quot;: &quot;/bin/busybox LSUCT&quot;
 , &quot;sensor&quot;: &quot;a927e8b28666&quot;
 }
 </pre></div></div>

 <p>and</p>

 <div>
 <div>
 <pre class="source">{
   &quot;eventid&quot;: &quot;cowrie.command.success&quot;
 , &quot;timestamp&quot;: &quot;2017-09-17T04:06:39.673206Z&quot;
 , &quot;message&quot;: &quot;Command found: /bin/busybox XUSRH&quot;
 , &quot;system&quot;: &quot;CowrieTelnetTransport,93,94.51.110.74&quot;
 , &quot;isError&quot;: 0
 , &quot;src_ip&quot;: &quot;94.51.110.74&quot;
 , &quot;session&quot;: &quot;4c047bbc016c&quot;
 , &quot;input&quot;: &quot;/bin/busybox XUSRH&quot;
 , &quot;sensor&quot;: &quot;a927e8b28666&quot;
 }
 </pre></div></div>

 <p>Note the <tt>/bin/busybox</tt> call followed by a random string.  From an analysis of an IoT exploit <a class="externalLink" href="https://isc.sans.edu/diary/21543">here</a>:</p>

 <div>
 <div>
 <pre class="source">The use of the command &quot;busybox ECCHI&quot; appears to have two functions.
 First of all, cowrie, and more &quot;complete&quot; Linux distributions than
 commonly found on DVRs will respond with a help screen if a wrong module
 is used. So this way, &quot;ECCHI&quot; can be used to detect honeypots and
 irrelevant systems if the reply isn't simply &quot;ECCHI: applet not found&quot;.
 Secondly, the command is used as a marker to indicate that the prior
 command finished. Later, the attacker adds &quot;/bin/busybox ECCHI&quot; at the
 end of each line, following the actual command to be executed.
 </pre></div></div>

 <p>We have a few options at our disposal:</p>
 <ul>

 <li>If we were merely filtering and alerting on the execution of <tt>/bin/busybox</tt>, we would include false positives, since busybox also has legitimate uses.</li>
 <li>If we looked for <tt>/bin/busybox XUSRH</tt>, we&#x2019;d miss the many attempts that use a <i>different</i> value, since <tt>XUSRH</tt> can be swapped out for another random sequence to foil overly strict rules.</li>
 <li>If we looked for <tt>/bin/busybox *</tt>, we&#x2019;d capture this scenario well, but it would be nice not to be tied to detecting this particular <tt>/bin/busybox</tt> style of exploit.</li>
 </ul>
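 <p>The trade-off between these rule styles can be seen with a few lines of Python using the standard <tt>fnmatch</tt> module, applied to the two <tt>input</tt> values from the excerpts above. The rules are hypothetical illustrations, not Metron configuration; note that both of them have to name <tt>busybox</tt> explicitly.</p>

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List

# The two attack inputs from the Cowrie excerpts above.
ATTACK_INPUTS = ["/bin/busybox LSUCT", "/bin/busybox XUSRH"]

# Hypothetical hand-written rules in the spirit of the options listed above.
RULES: Dict[str, Callable[[str], bool]] = {
    # Exact match: tied to one random token, so it misses the other attempt.
    "exact": lambda cmd: cmd == "/bin/busybox XUSRH",
    # Wildcard on the argument: catches both, but still hard-codes busybox.
    "wildcard": lambda cmd: fnmatchcase(cmd, "/bin/busybox *"),
}


def matches(rule: str, inputs: List[str]) -> List[str]:
    """Inputs that the named rule flags."""
    return [cmd for cmd in inputs if RULES[rule](cmd)]


for name in RULES:
    print(name, matches(name, ATTACK_INPUTS))
```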
 <p>This is precisely what semantic hashing and binning give us: the ability to group by semantic similarity without being too specific about what we mean by &#x201c;semantic&#x201d; or &#x201c;similar&#x201d;.  We want to cast a wide net, but not pull back every fish in the sea.</p>
 <p>For this demonstration, we will</p>
 <ul>

 <li>ingest some 400 cowrie records</li>
 <li>tag records from an IP blacklist for known malicious actors</li>
 <li>use the alerts UI to investigate and find similar attacks.</li>
 </ul>
 <div class="section">
 <h2><a name="Preliminaries"></a>Preliminaries</h2>
 <p>We assume that the following environment variables are set:</p>
 <ul>

 <li><tt>METRON_HOME</tt> - the home directory for metron</li>
 <li><tt>ZOOKEEPER</tt> - The zookeeper quorum (comma separated with port specified: e.g. <tt>node1:2181</tt> for full-dev)</li>
 <li><tt>BROKERLIST</tt> - The Kafka broker list (comma separated with port specified: e.g. <tt>node1:6667</tt> for full-dev)</li>
 <li><tt>ES_HOST</tt> - The elasticsearch master (and port) e.g. <tt>node1:9200</tt> for full-dev.</li>
 </ul>
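 <p>If you want to check these variables before going further, a small Python helper along these lines can flag any that are unset, as well as host lists with an entry missing its port. It is a convenience sketch, not part of Metron; the function names are hypothetical, and it only looks for a <tt>:</tt> rather than validating hosts.</p>

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ["METRON_HOME", "ZOOKEEPER", "BROKERLIST", "ES_HOST"]


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def hosts_without_port(value: str) -> List[str]:
    """Entries of a comma-separated host list that lack an explicit ':port'."""
    entries = [part.strip() for part in value.split(",")]
    return [host for host in entries if host and ":" not in host]


print("missing:", missing_vars())
```

 <p>Once all four variables are exported, <tt>missing_vars()</tt> returns an empty list, and <tt>hosts_without_port</tt> applied to <tt>ZOOKEEPER</tt>, <tt>BROKERLIST</tt>, or <tt>ES_HOST</tt> should also return an empty list.</p>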
 <p>These instructions also assume that you are <i>not</i> using a kerberized cluster.  If you are, the parser start command needs a slight adjustment to include the security protocol.</p>
 <p>Before editing configurations, be sure to pull the configs from zookeeper locally via</p>

 <div>
 <div>
 <pre class="source">$METRON_HOME/bin/zk_load_configs.sh --mode PULL -z $ZOOKEEPER -o $METRON_HOME/config/zookeeper/ -f
 </pre></div></div>
 </div>
 <div class="section">
 <h2><a name="Setting_up_the_Data"></a>Setting up the Data</h2>
 <p>First we must set up the Cowrie log data on our cluster&#x2019;s access node.</p>
 <ul>

 <li>Download the data from the GitHub repository for the talk mentioned above <a class="externalLink" href="https://github.com/fluenda/dataworks_summit_iot_botnet/blob/master/180424243034750.tar.gz">here</a>, and make sure the tarball ends up in your home directory on the Metron node.</li>
 <li>Create a directory called <tt>cowrie</tt> in ~ and untar the tarball into that directory via:</li>
 </ul>

 <div>
 <div>
 <pre class="source">mkdir ~/cowrie
 cd ~/cowrie
 tar xzvf ~/180424243034750.tar.gz
 </pre></div></div>
 </div>
 <div class="section">
 <h2><a name="Configuring_the_Parser"></a>Configuring the Parser</h2>
 <p>The Cowrie data arrives as simple JSON blobs, so it&#x2019;s easy to parse.  We really only need to convert the <tt>timestamp</tt> field to epoch milliseconds, and we have valid data.</p>
 <ul>

 <li>Create <tt>$METRON_HOME/config/zookeeper/parsers/cowrie.json</tt> with the following content:</li>
 </ul>

 <div>
 <div>
 <pre class="source">{
   &quot;parserClassName&quot; : &quot;org.apache.metron.parsers.json.JSONMapParser&quot;,
   &quot;sensorTopic&quot; : &quot;cowrie&quot;,
   &quot;fieldTransformations&quot; : [
     {
       &quot;transformation&quot; : &quot;STELLAR&quot;,
       &quot;output&quot; : [ &quot;timestamp&quot; ],
       &quot;config&quot; : {
         &quot;timestamp&quot; : &quot;TO_EPOCH_TIMESTAMP( timestamp, 'yyyy-MM-dd\\'T\\'HH:mm:ss.SSS')&quot;
       }
     }
   ]
 }
 </pre></div></div>
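 <p>The Stellar transformation is meant to turn Cowrie's ISO-8601 timestamps into the epoch milliseconds Metron expects. The Python sketch below shows that intended conversion under two stated assumptions: timestamps are UTC (the trailing <tt>Z</tt>), and the six-digit fractional seconds are truncated to milliseconds. It is not the Stellar implementation and does not establish how <tt>TO_EPOCH_TIMESTAMP</tt> treats the extra fractional digits, so compare a few parsed records against it if sub-second precision matters.</p>

```python
import calendar
from datetime import datetime


def cowrie_ts_to_epoch_millis(ts: str) -> int:
    """Convert a Cowrie timestamp such as '2017-09-18T11:45:25.028091Z' to epoch millis.

    Assumes the timestamp is UTC (trailing 'Z') and truncates microseconds to milliseconds.
    """
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    # timegm interprets the naive time tuple as UTC, unlike time.mktime.
    seconds = calendar.timegm(parsed.timetuple())
    return seconds * 1000 + parsed.microsecond // 1000


print(cowrie_ts_to_epoch_millis("2017-09-18T11:45:25.028091Z"))  # -> 1505735125028
```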

 <p>Before we start, we will want to install ES template mappings so ES knows how to interpret our fields:</p>

 <div>
 <div>
 <pre class="source">curl -XPUT -H 'Content-Type: application/json' $ES_HOST'/_template/cowrie_index' -d '
 {
   &quot;template&quot;: &quot;cowrie_index*&quot;,
   &quot;mappings&quot;: {
     &quot;cowrie_doc&quot;: {
         &quot;dynamic_templates&quot;: [
         {
           &quot;geo_location_point&quot;: {
             &quot;match&quot;: &quot;enrichments:geo:*:location_point&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;geo_point&quot;
             }
           }
         },
         {
           &quot;geo_country&quot;: {
             &quot;match&quot;: &quot;enrichments:geo:*:country&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;keyword&quot;
             }
           }
         },
         {
           &quot;geo_city&quot;: {
             &quot;match&quot;: &quot;enrichments:geo:*:city&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;keyword&quot;
             }
           }
         },
         {
           &quot;geo_location_id&quot;: {
             &quot;match&quot;: &quot;enrichments:geo:*:locID&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;keyword&quot;
             }
           }
         },
         {
           &quot;geo_dma_code&quot;: {
             &quot;match&quot;: &quot;enrichments:geo:*:dmaCode&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;keyword&quot;
             }
           }
         },
         {
           &quot;geo_postal_code&quot;: {
             &quot;match&quot;: &quot;enrichments:geo:*:postalCode&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;keyword&quot;
             }
           }
         },
         {
           &quot;geo_latitude&quot;: {
             &quot;match&quot;: &quot;enrichments:geo:*:latitude&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;float&quot;
             }
           }
         },
         {
           &quot;geo_longitude&quot;: {
             &quot;match&quot;: &quot;enrichments:geo:*:longitude&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;float&quot;
             }
           }
         },
         {
           &quot;timestamps&quot;: {
             &quot;match&quot;: &quot;*:ts&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;,
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;date&quot;,
               &quot;format&quot;: &quot;epoch_millis&quot;
             }
           }
         },
         {
           &quot;threat_triage_score&quot;: {
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;float&quot;
             },
             &quot;match&quot;: &quot;threat:triage:*score&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;
           }
         },
         {
           &quot;threat_triage_reason&quot;: {
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;text&quot;,
               &quot;fielddata&quot;: &quot;true&quot;
             },
             &quot;match&quot;: &quot;threat:triage:rules:*:reason&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;
           }
         },
         {
           &quot;threat_triage_name&quot;: {
             &quot;mapping&quot;: {
               &quot;type&quot;: &quot;text&quot;,
               &quot;fielddata&quot;: &quot;true&quot;
             },
             &quot;match&quot;: &quot;threat:triage:rules:*:name&quot;,
             &quot;match_mapping_type&quot;: &quot;*&quot;
           }
         }
         ],
         &quot;properties&quot; : {
           &quot;blacklisted&quot; : {
             &quot;type&quot; : &quot;boolean&quot;
           },
           &quot;compCS&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;data&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;dst_ip&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;dst_port&quot; : {
             &quot;type&quot; : &quot;long&quot;
           },
           &quot;duration&quot; : {
             &quot;type&quot; : &quot;double&quot;
           },
           &quot;encCS&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;eventid&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;guid&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;input&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;isError&quot; : {
             &quot;type&quot; : &quot;long&quot;
           },
           &quot;is_alert&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;kexAlgs&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;keyAlgs&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;macCS&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;message&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;original_keyword&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;password&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;sensor&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;session&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;similarity_bin&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;size&quot; : {
             &quot;type&quot; : &quot;long&quot;
           },
           &quot;source:type&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;src_ip&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;src_port&quot; : {
             &quot;type&quot; : &quot;long&quot;
           },
           &quot;system&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;timestamp&quot;: {
             &quot;type&quot;: &quot;date&quot;,
             &quot;format&quot;: &quot;epoch_millis&quot;
           },
           &quot;tlsh&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;ttylog&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;username&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;version&quot; : {
             &quot;type&quot; : &quot;keyword&quot;
           },
           &quot;metron_alert&quot; : {
             &quot;type&quot; : &quot;nested&quot;
           }
         }
      }
   }
 }
 '
 </pre></div></div>

 <ul>

 <li>Create the <tt>cowrie</tt> kafka topic via:</li>
 </ul>

 <div>
 <div>
 <pre class="source">/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER --create --topic cowrie --partitions 1 --replication-factor 1
 </pre></div></div>
 </div>
 <div class="section">
 <h2><a name="Import_the_Blacklist"></a>Import the Blacklist</h2>
 <p>Here, to build out a scenario, we will assume that we have a blacklist of known malicious hosts.  For our purposes, we&#x2019;ll choose one particular host IP to be malicious.</p>
 <ul>

 <li>Create <tt>~/blacklist.csv</tt> to contain the following:</li>
 </ul>

 <div>
 <div>
 <pre class="source">94.51.110.74
 </pre></div></div>

 <ul>

 <li>Create <tt>~/blacklist_extractor.json</tt> to contain the following:</li>
 </ul>

 <div>
 <div>
 <pre class="source">{
   &quot;config&quot; : {
     &quot;columns&quot; : {
        &quot;ip&quot; : 0
     },
     &quot;indicator_column&quot; : &quot;ip&quot;,
     &quot;type&quot; : &quot;blacklist&quot;,
     &quot;separator&quot; : &quot;,&quot;
   },
   &quot;extractor&quot; : &quot;CSV&quot;
 }
 </pre></div></div>
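 <p>Roughly, this configuration tells the CSV extractor which column holds which value (<tt>columns</tt>), which of those values is the indicator to look up (<tt>indicator_column</tt>), and which enrichment type to file it under (<tt>type</tt>). A simplified Python model of that mapping follows; it is not Metron's extractor code, and the returned tuple shape is a hypothetical simplification of what gets written to HBase.</p>

```python
import csv
import io
from typing import Any, Dict, Iterable, Iterator, Tuple

# Python mirror of ~/blacklist_extractor.json.
EXTRACTOR_CONFIG: Dict[str, Any] = {
    "config": {
        "columns": {"ip": 0},
        "indicator_column": "ip",
        "type": "blacklist",
        "separator": ",",
    },
    "extractor": "CSV",
}


def extract(
    lines: Iterable[str], cfg: Dict[str, Any]
) -> Iterator[Tuple[str, str, Dict[str, str]]]:
    """Yield (enrichment type, indicator, named column values) for each CSV row."""
    conf = cfg["config"]
    for row in csv.reader(lines, delimiter=conf["separator"]):
        if not row:  # skip blank lines
            continue
        values = {name: row[idx] for name, idx in conf["columns"].items()}
        yield conf["type"], values[conf["indicator_column"]], values


print(list(extract(io.StringIO("94.51.110.74\n"), EXTRACTOR_CONFIG)))
# -> [('blacklist', '94.51.110.74', {'ip': '94.51.110.74'})]
```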

 <ul>

 <li>Import the data via <tt>$METRON_HOME/bin/flatfile_loader.sh -i ~/blacklist.csv -t threatintel -c t -e ~/blacklist_extractor.json</tt></li>
 </ul>
 <p>This will create a new enrichment type &#x201c;blacklist&#x201d; with a single entry &#x201c;94.51.110.74&#x201d;.</p></div>
 <div class="section">
 <h2><a name="Configure_Enrichments"></a>Configure Enrichments</h2>
 <p>We will want to do the following:</p>
 <ul>

 <li>Add enrichments to facilitate binning
 <ul>

 <li>Construct what we consider to be a sufficient representation of the thing we want to cluster.  For our purposes, this representation is centered on the input command and consists of:
 <ul>

 <li>The <tt>message</tt> field</li>
 <li>The <tt>input</tt> field</li>
 <li>The <tt>isError</tt> field</li>
 </ul>
 </li>
 <li>Compute the TLSH hash of this representation, called <tt>tlsh</tt></li>
 <li>Compute the locality-sensitive hash of the TLSH hash suitable for binning, called <tt>similarity_bin</tt></li>
 </ul>
 </li>
 <li>Set up the threat intelligence to use the blacklist
 <ul>

 <li>Set an alert if the message is from an IP address in the threat intelligence blacklist.</li>
 <li>Score blacklisted messages with <tt>10</tt>.  In production, this would be more complex.</li>
 </ul>
 </li>
 </ul>
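 <p>The representation described above alternates field names and values and joins them with <tt>|</tt>, substituting an empty string for any missing field; this mirrors the <tt>JOIN</tt> expression in the enrichment configuration that follows. A Python sketch of just that step (hypothetical helper name, not Metron code):</p>

```python
from typing import Any, Dict

# Fields that make up the characteristic representation, in order.
FIELDS = ["message", "input", "isError"]


def characteristic_rep(record: Dict[str, Any]) -> str:
    """Alternate field names and values, joined with '|'; missing fields become ''."""
    parts = []
    for name in FIELDS:
        value = record.get(name)
        parts.extend([name, "" if value is None else str(value)])
    return "|".join(parts)


first_excerpt = {
    "eventid": "cowrie.command.success",
    "message": "Command found: /bin/busybox LSUCT",
    "isError": 0,
    "input": "/bin/busybox LSUCT",
}
print(characteristic_rep(first_excerpt))
```

 <p>Applied to the first Cowrie excerpt, it yields <tt>message|Command found: /bin/busybox LSUCT|input|/bin/busybox LSUCT|isError|0</tt>, which is the string that would then be TLSH-hashed.</p>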
 <p>Now we can create the enrichments by creating <tt>$METRON_HOME/config/zookeeper/enrichments/cowrie.json</tt> with the following content:</p>

 <div>
 <div>
 <pre class="source">{
   &quot;enrichment&quot;: {
     &quot;fieldMap&quot;: {
       &quot;stellar&quot; : {
         &quot;config&quot; : [
           &quot;characteristic_rep := JOIN([ 'message', exists(message)?message:'', 'input', exists(input)?input:'', 'isError', exists(isError)?isError:''], '|')&quot;,
           &quot;forensic_hashes := HASH(characteristic_rep, 'tlsh', { 'hashes' : 16, 'bucketSize' : 128 })&quot;,
           &quot;similarity_bin := MAP_GET('tlsh_bin', forensic_hashes)&quot;,
           &quot;tlsh := MAP_GET('tlsh', forensic_hashes)&quot;,
           &quot;forensic_hashes := null&quot;,
           &quot;characteristic_rep := null&quot;
         ]
       }
    }
   ,&quot;fieldToTypeMap&quot;: { }
   },
   &quot;threatIntel&quot;: {
     &quot;fieldMap&quot;: {
       &quot;stellar&quot; : {
         &quot;config&quot; : [
           &quot;blacklisted := ENRICHMENT_EXISTS( 'blacklist', src_ip, 'threatintel', 't')&quot;,
           &quot;is_alert := is_alert || blacklisted&quot;
         ]
       }

     },
     &quot;fieldToTypeMap&quot;: { },
     &quot;triageConfig&quot; : {
       &quot;riskLevelRules&quot; : [
         {
           &quot;name&quot; : &quot;Blacklisted Host&quot;,
           &quot;comment&quot; : &quot;Determine if a host is blacklisted&quot;,
           &quot;rule&quot; : &quot;blacklisted != null &amp;&amp; blacklisted&quot;,
           &quot;score&quot; : 10,
           &quot;reason&quot; : &quot;FORMAT('IP %s is blacklisted', src_ip)&quot;
         }
       ],
       &quot;aggregator&quot; : &quot;MAX&quot;
     }
   }
 }
 </pre></div></div>
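 <p>The <tt>threatIntel</tt> section can be read as: look up <tt>src_ip</tt> in the <tt>blacklist</tt> enrichment, fold the result into <tt>is_alert</tt>, and score with the maximum over the triage rules that fire. A Python sketch of that logic, assuming an in-memory set in place of the HBase-backed enrichment lookup and a score of <tt>0</tt> when no rule fires (Metron's own behavior in that case is not covered here):</p>

```python
from typing import Any, Dict, Set

# Hypothetical in-memory stand-in for the HBase-backed 'blacklist' enrichment.
BLACKLIST: Set[str] = {"94.51.110.74"}

# Mirrors the single riskLevelRule ("Blacklisted Host") in the configuration above.
BLACKLIST_RULE_SCORE = 10.0


def apply_threat_intel(record: Dict[str, Any]) -> Dict[str, Any]:
    """Set 'blacklisted' from the source IP and fold it into 'is_alert'."""
    out = dict(record)
    out["blacklisted"] = out.get("src_ip") in BLACKLIST
    # Assumption: a missing is_alert behaves like false.
    out["is_alert"] = bool(out.get("is_alert")) or out["blacklisted"]
    return out


def triage_score(record: Dict[str, Any]) -> float:
    """MAX aggregation over the scores of the rules that fire."""
    fired = []
    if record.get("blacklisted"):  # rule: blacklisted is non-null and true
        fired.append(BLACKLIST_RULE_SCORE)
    # Assumption: report 0.0 when no rule fires.
    return max(fired, default=0.0)


enriched = apply_threat_intel({"src_ip": "94.51.110.74"})
print(enriched["is_alert"], triage_score(enriched))
```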

 <div class="section">
 <h3><a name="A_Note_About_Similarity_Hashes_and_TLSH"></a>A Note About Similarity Hashes and TLSH</h3>
 <p>Notice that we specified <tt>16</tt> hash functions (<tt>'hashes' : 16</tt>) when constructing the similarity bin.
 I arrived at that value by trial and error, which, frankly, is not always tenable.  A more sensible approach is likely to construct <i>multiple</i> similarity bins using, at minimum, <tt>8</tt>, <tt>16</tt>, and <tt>32</tt> hashes.</p>
 <ul>

 <li>The smaller the number of hashes, the looser the notion of similarity (more possibly dissimilar things get grouped together).</li>
 <li>The larger the number of hashes, the stricter the notion of similarity (some similar things may not get grouped together).</li>
 </ul></div></div>
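 <p>The trade-off can be quantified under a simple model. Assume the bin key is built from <tt>k</tt> distinct bit positions drawn uniformly from an <tt>n</tt>-bit hash (an assumption; Metron's exact sampling scheme is not described here). Two hashes at Hamming distance <tt>d</tt> share a bin only if none of the sampled positions falls on a differing bit, which happens with probability C(n-d, k) / C(n, k). That probability shrinks as <tt>k</tt> grows, matching the looser versus stricter behavior described in the list above.</p>

```python
from math import comb


def collision_probability(n_bits: int, d: int, k: int) -> float:
    """Chance that two n_bits-bit hashes at Hamming distance d share a bin
    when the bin key uses k distinct, uniformly sampled bit positions:
    C(n_bits - d, k) / C(n_bits, k)."""
    if k > n_bits or d > n_bits:
        raise ValueError("k and d cannot exceed n_bits")
    return comb(n_bits - d, k) / comb(n_bits, k)


# Illustrative numbers only: a 256-bit hash and a pair differing in 10 bits.
for k in (8, 16, 32):
    print(k, round(collision_probability(256, 10, k), 3))
```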
 <div class="section">
 <h2><a name="Create_the_Data_Loader"></a>Create the Data Loader</h2>
 <p>We want to load a snapshot of the Cowrie logs into Kafka, so create <tt>~/load_data.sh</tt> with the following content:</p>

 <div>
 <div>
 <pre class="source">#!/usr/bin/env bash
 # Push each Cowrie log file into the cowrie Kafka topic, pausing briefly between files.
 COWRIE_HOME=~/cowrie
 for i in cowrie.1626302-1636522.json cowrie.16879981-16892488.json cowrie.21312194-21331475.json \
          cowrie.698260-710913.json cowrie.762933-772239.json cowrie.929866-939552.json \
          cowrie.1246880-1248235.json cowrie.19285959-19295444.json cowrie.16542668-16581213.json \
          cowrie.5849832-5871517.json cowrie.6607473-6609163.json; do
   echo &quot;$i&quot;
   cat &quot;$COWRIE_HOME/$i&quot; | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list &quot;$BROKERLIST&quot; --topic cowrie
   sleep 2
 done
 </pre></div></div>

 <ul>

 <li>Set the <tt>+x</tt> bit on the executable via:</li>
 </ul>

 <div>
 <div>
 <pre class="source">chmod +x ~/load_data.sh
 </pre></div></div>
 </div>
 <div class="section">
 <h2><a name="Execute_Demonstration"></a>Execute Demonstration</h2>
 <p>From here, we&#x2019;ve set up our configuration and can push the configs:</p>
 <ul>

 <li>Push the configs to zookeeper via</li>
 </ul>

 <div>
 <div>
 <pre class="source">$METRON_HOME/bin/zk_load_configs.sh --mode PUSH -z $ZOOKEEPER -i $METRON_HOME/config/zookeeper/
 </pre></div></div>

 <ul>

 <li>Start the parser via:</li>
 </ul>

 <div>
 <div>
 <pre class="source">$METRON_HOME/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s cowrie
 </pre></div></div>

 <ul>

 <li>Push cowrie data into the <tt>cowrie</tt> topic via</li>
 </ul>

 <div>
 <div>
 <pre class="source">~/load_data.sh
 </pre></div></div>

 <p>Once this data is loaded, we can use the Alerts UI, starting from known malicious actors, to find others doing similar things.</p>
 <ul>

 <li>

 <p>First we can look at the alerts directly and find an instance of our <tt>/bin/busybox</tt> activity: <img src="../../images/find_alerts.png" alt="Alerts" /></p>
 </li>
 <li>

 <p>We can now pivot and look for instances of messages with the same <tt>similarity_bin</tt> that are <i>not</i> alerts: <img src="../../images/clustered.png" alt="Pivot" /></p>
 </li>
 </ul>
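 <p>Behind the screenshots, the pivot amounts to a group-by on <tt>similarity_bin</tt>: collect the bins that contain alerts, then list the non-alert records in those bins. A minimal in-memory Python sketch follows; it is not the Alerts UI's actual query, and the sample records use hypothetical documentation-range IP addresses rather than real data.</p>

```python
from typing import Any, Dict, List, Set


def pivot_on_bins(records: List[Dict[str, Any]]) -> Set[str]:
    """Source IPs of non-alert records that share a similarity_bin with some alert."""
    alert_bins = {r["similarity_bin"] for r in records if r.get("is_alert")}
    return {
        r["src_ip"]
        for r in records
        if not r.get("is_alert") and r.get("similarity_bin") in alert_bins
    }


# Hypothetical records using documentation-range addresses (RFC 5737).
SAMPLE_RECORDS = [
    {"src_ip": "192.0.2.10", "similarity_bin": "bin-a", "is_alert": True},
    {"src_ip": "198.51.100.7", "similarity_bin": "bin-a", "is_alert": False},
    {"src_ip": "203.0.113.5", "similarity_bin": "bin-b", "is_alert": False},
]
print(pivot_on_bins(SAMPLE_RECORDS))  # -> {'198.51.100.7'}
```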
 <p>As you can see, we have found a few more malicious actors:</p>
 <ul>

 <li>177.239.192.172</li>
 <li>180.110.69.182</li>
 <li>177.238.236.21</li>
 <li>94.78.80.45</li>
 </ul>
 <p>Now we can look at <i>other</i> things that they&#x2019;re doing to build and refine our definition of what an alert is, without resorting to hard-coded rules.  Note that nothing in our enrichments actually uses the string <tt>busybox</tt>, so this is a more general-purpose way of navigating similar activity.</p>
 <div class="section">
 <h3><a name="Version_Info"></a>Version Info</h3>
 <p>Verified against:</p>
 <ul>

 <li>METRON_VERSION=0.5.0</li>
 <li>ELASTIC_VERSION=5.6.2</li>
 </ul></div></div>
         </div>
       </div>
     </div>
     <hr/>
     <footer>
       <div class="container-fluid">
         <div class="row-fluid">
 © 2015-2016 The Apache Software Foundation. Apache Metron, Metron, Apache, the Apache feather logo,
             and the Apache Metron project logo are trademarks of The Apache Software Foundation.
         </div>
       </div>
     </footer>
   </body>
 </html>