<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8 from src/site/markdown/metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md at 2019-05-14
| Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20190514" />
<meta http-equiv="Content-Language" content="en" />
<title>Metron &#x2013; Parser Chaining</title>
<link rel="stylesheet" href="../../../css/apache-maven-fluido-1.7.min.css" />
<link rel="stylesheet" href="../../../css/site.css" />
<link rel="stylesheet" href="../../../css/print.css" media="print" />
<script type="text/javascript" src="../../../js/apache-maven-fluido-1.7.min.js"></script>
<script type="text/javascript">
$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );
</script>
</head>
<body class="topBarDisabled">
<div class="container-fluid">
<div id="banner">
<div class="pull-left"><a href="http://metron.apache.org/" id="bannerLeft"><img src="../../../images/metron-logo.png" alt="Apache Metron" width="148px" height="48px"/></a></div>
<div class="pull-right"></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2019-05-14</li>
<li id="projectVersion" class="pull-right">Version: 0.7.1</li>
</ul>
</div>
<div class="row-fluid">
<div id="leftColumn" class="span2">
<div class="well sidebar-nav">
<ul class="nav nav-list">
<li class="nav-header">User Documentation</li>
<li><a href="../../../index.html" title="Metron"><span class="icon-chevron-down"></span>Metron</a>
<ul class="nav nav-list">
<li><a href="../../../CONTRIBUTING.html" title="CONTRIBUTING"><span class="none"></span>CONTRIBUTING</a></li>
<li><a href="../../../Upgrading.html" title="Upgrading"><span class="none"></span>Upgrading</a></li>
<li><a href="../../../metron-analytics/index.html" title="Analytics"><span class="icon-chevron-right"></span>Analytics</a></li>
<li><a href="../../../metron-contrib/metron-docker/index.html" title="Docker"><span class="none"></span>Docker</a></li>
<li><a href="../../../metron-contrib/metron-performance/index.html" title="Performance"><span class="none"></span>Performance</a></li>
<li><a href="../../../metron-deployment/index.html" title="Deployment"><span class="icon-chevron-right"></span>Deployment</a></li>
<li><a href="../../../metron-interface/index.html" title="Interface"><span class="icon-chevron-right"></span>Interface</a></li>
<li><a href="../../../metron-platform/index.html" title="Platform"><span class="icon-chevron-down"></span>Platform</a>
<ul class="nav nav-list">
<li><a href="../../../metron-platform/Performance-tuning-guide.html" title="Performance-tuning-guide"><span class="none"></span>Performance-tuning-guide</a></li>
<li><a href="../../../metron-platform/metron-common/index.html" title="Common"><span class="none"></span>Common</a></li>
<li><a href="../../../metron-platform/metron-data-management/index.html" title="Data-management"><span class="none"></span>Data-management</a></li>
<li><a href="../../../metron-platform/metron-elasticsearch/index.html" title="Elasticsearch"><span class="none"></span>Elasticsearch</a></li>
<li><a href="../../../metron-platform/metron-enrichment/index.html" title="Enrichment"><span class="icon-chevron-right"></span>Enrichment</a></li>
<li><a href="../../../metron-platform/metron-hbase-server/index.html" title="Hbase-server"><span class="none"></span>Hbase-server</a></li>
<li><a href="../../../metron-platform/metron-indexing/index.html" title="Indexing"><span class="none"></span>Indexing</a></li>
<li><a href="../../../metron-platform/metron-job/index.html" title="Job"><span class="none"></span>Job</a></li>
<li><a href="../../../metron-platform/metron-management/index.html" title="Management"><span class="none"></span>Management</a></li>
<li><a href="../../../metron-platform/metron-parsing/index.html" title="Parsing"><span class="icon-chevron-down"></span>Parsing</a>
<ul class="nav nav-list">
<li><a href="../../../metron-platform/metron-parsing/metron-parsers/index.html" title="Parsers"><span class="icon-chevron-right"></span>Parsers</a></li>
<li><a href="../../../metron-platform/metron-parsing/metron-parsers-common/index.html" title="Parsers-common"><span class="icon-chevron-down"></span>Parsers-common</a>
<ul class="nav nav-list">
<li><a href="../../../metron-platform/metron-parsing/metron-parsers-common/3rdPartyParser.html" title="3rdPartyParser"><span class="none"></span>3rdPartyParser</a></li>
<li class="active"><a href="#"><span class="none"></span>ParserChaining</a></li>
<li><a href="../../../metron-platform/metron-parsing/metron-parsers-common/message-parser-implementation-notes.html" title="message-parser-implementation-notes"><span class="none"></span>message-parser-implementation-notes</a></li>
<li><a href="../../../metron-platform/metron-parsing/metron-parsers-common/parser-testing.html" title="parser-testing"><span class="none"></span>parser-testing</a></li>
<li><a href="../../../metron-platform/metron-parsing/metron-parsers-common/src/test/java/org/apache/metron/parsers/paloalto/index.html" title="Paloalto"><span class="none"></span>Paloalto</a></li>
</ul>
</li>
<li><a href="../../../metron-platform/metron-parsing/metron-parsing-storm/index.html" title="Parsing-storm"><span class="none"></span>Parsing-storm</a></li>
</ul>
</li>
<li><a href="../../../metron-platform/metron-pcap-backend/index.html" title="Pcap-backend"><span class="none"></span>Pcap-backend</a></li>
<li><a href="../../../metron-platform/metron-solr/index.html" title="Solr"><span class="none"></span>Solr</a></li>
<li><a href="../../../metron-platform/metron-writer/index.html" title="Writer"><span class="none"></span>Writer</a></li>
</ul>
</li>
<li><a href="../../../metron-sensors/index.html" title="Sensors"><span class="icon-chevron-right"></span>Sensors</a></li>
<li><a href="../../../metron-stellar/stellar-3rd-party-example/index.html" title="Stellar-3rd-party-example"><span class="none"></span>Stellar-3rd-party-example</a></li>
<li><a href="../../../metron-stellar/stellar-common/index.html" title="Stellar-common"><span class="icon-chevron-right"></span>Stellar-common</a></li>
<li><a href="../../../metron-stellar/stellar-zeppelin/index.html" title="Stellar-zeppelin"><span class="none"></span>Stellar-zeppelin</a></li>
<li><a href="../../../use-cases/index.html" title="Use-cases"><span class="icon-chevron-right"></span>Use-cases</a></li>
</ul>
</li>
</ul>
<hr />
<div id="poweredBy">
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="../../../images/logos/maven-feather.png" /></a>
</div>
</div>
</div>
<div id="bodyColumn" class="span10" >
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<h1>Parser Chaining</h1>
<p><a name="Parser_Chaining"></a></p>
<p>Aggregating many different types of sensors into a single data source (e.g. syslog) and ingesting that aggregate sensor into Metron is a common pattern. It is not obvious how to manage these aggregate sensors, because they require two-pass parsing: one pass for the envelope and one for the data it carries. This document walks through an example of supporting this kind of multi-pass ingest.</p>
<p>Multi-pass parsing involves the following requirements:</p>
<ul>
<li>The enveloping format (e.g. an aggregation format such as syslog or plain CSV) may contain metadata which should be ingested along with the data.</li>
<li>The enveloping sensor contains many different sensor types.</li>
</ul>
<p><a name="High_Level_Solution"></a></p>
<h1>High Level Solution</h1>
<p><img src="../../../images/message_routing_high_level.svg" alt="High Level Approach" /></p>
<p>At a high level, we continue to maintain the architectural invariant of a one-to-one relationship between logical sensors and Storm topologies. This relationship may eventually become more complex, but for now the approach is to construct a routing parser with two responsibilities:</p>
<ul>
<li>Parse the envelope (e.g. syslog data) and extract any metadata fields from the envelope to pass along</li>
<li>Route the unfolded data to the appropriate Kafka topic associated with the enveloped sensor data</li>
</ul>
<p>Because the routing parser emits a JSON blob, just like any other parser, the downstream parsers must be adjusted to extract the enveloped data from that JSON blob and treat it as the data to parse.</p>
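<p>The two responsibilities of a routing parser can be sketched in Python as follows. This is a minimal, hypothetical illustration, not Metron's implementation (Metron parsers are Java classes). It assumes a simplified envelope of the form <tt>SENSOR HOST: PAYLOAD</tt>. The helper <tt>route_envelope</tt> and the field names <tt>sensor</tt>, <tt>host</tt> and <tt>payload</tt> are made up for this example.</p>

```python
import json


def route_envelope(line):
    """Hypothetical routing parser: split a simplified envelope line of the
    form 'SENSOR HOST: PAYLOAD' into envelope metadata plus the raw payload.

    The 'sensor' field names the downstream sensor, so it can serve as the
    data-dependent routing field; 'payload' carries the enveloped data
    untouched so that the downstream parser can interpret it.
    """
    header, _, payload = line.partition(": ")
    sensor, host = header.split(" ", 1)
    return {
        "sensor": sensor,            # used to pick the downstream Kafka topic
        "host": host,                # envelope metadata passed along
        "payload": payload,          # enveloped data, parsed downstream
        "original_string": line,
    }


msg = route_envelope("yaf sensor01: foo,bar,grok")
print(json.dumps(msg, sort_keys=True))
```

<p>The emitted JSON blob is ordinary parser output. The downstream parser still has to pull the CSV text back out of <tt>payload</tt> before it can parse it.</p>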
<p><a name="Aggregated_Parsers_with_Parser_Chaining"></a></p>
<h1>Aggregated Parsers with Parser Chaining</h1>
<p>Chained parsers can be run as aggregated parsers. These parsers continue to use the sensor-specific Kafka topics and do not perform internal routing to the appropriate sensor.</p>
<p>Suppose there are three sensors (<tt>bro</tt>, <tt>snort</tt> and <tt>yaf</tt>). Instead of creating a topology per sensor, all three can be run in a single aggregated parser. It is also possible to aggregate a subset of these parsers (e.g. run <tt>bro</tt> as its own topology and aggregate the other two).</p>
<p>The command to start an aggregated parser then becomes:</p>
<div>
<div>
<pre class="source">$METRON_HOME/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s bro,snort,yaf
</pre></div></div>
<p>This runs a single Storm topology named <tt>bro__snort__yaf</tt>.</p>
<p>Aggregated parsers can also be specified in the Ambari Metron config under Services -&gt; Metron -&gt; Configs -&gt; &#x2018;Parsers&#x2019; tab -&gt; &#x2018;Metron Parsers&#x2019; field. A group is configured by enclosing the desired parsers in double quotes.</p>
<p>Some examples of specifying aggregated parsers are as follows:</p>
<ul>
<li><tt>&quot;bro,snort,yaf&quot;</tt> starts a single topology named <tt>bro__snort__yaf</tt></li>
<li><tt>&quot;ciscopixA,ciscopixB&quot;,yaf,&quot;squid,ciscopixC&quot;</tt> starts three topologies, namely <tt>ciscopixA__ciscopixB</tt>, <tt>yaf</tt> and <tt>squid__ciscopixC</tt></li>
</ul>
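<p>The grouping rule for the Ambari field can be illustrated with a short Python sketch. It is hypothetical and not taken from the Metron or Ambari code. It assumes that quoted entries form one aggregated group, that unquoted entries run alone, and that group members are joined with a double underscore, as the topology names in the examples suggest. Note that the <tt>-s</tt> flag of <tt>start_parser_topology.sh</tt> differs: its whole comma-separated value is a single group. The function name <tt>topology_names</tt> is made up.</p>

```python
import csv


def topology_names(parsers_field):
    """Hypothetical sketch: split an Ambari 'Metron Parsers' value into
    parser groups and derive a topology name for each group.
    """
    # csv.reader honors double quotes, so commas inside a quoted entry
    # do not split that entry.
    entries = next(csv.reader([parsers_field]))
    names = []
    for entry in entries:
        sensors = [s.strip() for s in entry.split(",") if s.strip()]
        # Members of an aggregated group are joined with '__'.
        names.append("__".join(sensors))
    return names


print(topology_names('"ciscopixA,ciscopixB",yaf,"squid,ciscopixC"'))
print(topology_names('"bro,snort,yaf"'))
```

<p>Using a standard CSV reader keeps the quote handling out of hand-written string logic.</p>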
<p><a name="Architecting_a_Parser_Chaining_Solution_in_Metron"></a></p>
<h1>Architecting a Parser Chaining Solution in Metron</h1>
<p>Currently, the approach to fulfilling this requirement involves a couple of knobs in the Metron parser infrastructure.</p>
<p>Consider, for instance, the case where many different TYPES of messages are wrapped inside syslog. As an architectural abstraction, we want the following properties:</p>
<ul>
<li>separate the concerns of parsing the individual types of messages from each other</li>
<li>separate the concerns of parsing the individual types of messages from parsing the envelope</li>
</ul>
<div class="section">
<h2><a name="Data_Dependent_Parser_Writing"></a>Data Dependent Parser Writing</h2>
<p>Parsers allow users to configure the topic that the Kafka producer writes to in two ways (via the parser config of an individual parser):</p>
<ul>
<li><tt>kafka.topic</tt> - Specify the topic in the config. This can be changed by updating the config, but it is data independent (i.e. not dependent on the data in a message).</li>
<li><tt>kafka.topicField</tt> - Specify the topic as the value of a particular field. If that field is unpopulated, the message is dropped. This is inherently data dependent.</li>
</ul>
<p>The <tt>kafka.topicField</tt> parameter allows for data-dependent topic selection, which inherently enables the routing capabilities necessary for handling enveloped data.</p></div>
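<p>The difference between the two settings can be sketched in Python. This is a hypothetical model of the behavior described above, not Metron's producer code. The function <tt>select_output_topic</tt> is made up. Giving <tt>kafka.topicField</tt> precedence when both keys are set is an assumption.</p>

```python
def select_output_topic(parser_config, message):
    """Hypothetical sketch of choosing the output topic from a parser config.

    'kafka.topicField' (data dependent) is checked first; this precedence
    is an assumption. Returns None when the message should be dropped
    because the configured topic field is not populated.
    """
    topic_field = parser_config.get("kafka.topicField")
    if topic_field:
        # Data dependent: the topic comes from the message itself.
        topic = message.get(topic_field)
        return topic if topic else None
    # Data independent: every message goes to the configured topic.
    return parser_config.get("kafka.topic")


routing_config = {"kafka.topicField": "sensor"}
print(select_output_topic(routing_config, {"sensor": "yaf", "payload": "foo,bar,grok"}))
print(select_output_topic(routing_config, {"payload": "foo,bar,grok"}))
```

<p>A routing parser that populates a field such as <tt>sensor</tt> with the downstream sensor name, combined with <tt>kafka.topicField</tt>, is enough to fan enveloped data out to per-sensor topics.</p>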
<div class="section">
<h2><a name="Flexibly_Interpreting_Data"></a>Flexibly Interpreting Data</h2>
<div class="section">
<h3><a name="Aside:_The_Role_of_Metadata_in_Metron"></a>Aside: The Role of Metadata in Metron</h3>
<p>Before we continue, let&#x2019;s briefly talk about metadata. Metron exposes the ability to pass along and interact with metadata in a way that is decoupled from the actual parser logic (i.e. the GrokParser should not have to consider how to interpret metadata).</p>
<p>There are two choices about manipulating metadata in Metron:</p>
<ul>
<li>Should you merge metadata into the downstream message?</li>
<li>If you do, should you use a key prefix to set it off from the message by default?</li>
</ul>
<p>This enables users to specify metadata independently of the data that is persisted downstream, and that metadata can inform the operations of enrichment and the profiler.</p></div>
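<p>The two metadata choices can be modelled in a few lines of Python. This is a hypothetical sketch, not Metron's implementation. The function <tt>apply_metadata</tt> is made up. Prefixed keys are assumed to take the form <tt>PREFIX.KEY</tt>. Letting parsed fields win on a key collision is also an assumption, since Metron's own collision rules are not modelled.</p>

```python
def apply_metadata(parsed, metadata, merge, prefix):
    """Hypothetical sketch of the two metadata choices: whether to merge
    metadata into the parsed message, and which key prefix to use.

    Keys become PREFIX.KEY (dot separator assumed); an empty prefix leaves
    metadata keys unchanged. Parsed fields are never overwritten here.
    """
    result = dict(parsed)
    if not merge:
        return result
    for key, value in metadata.items():
        out_key = prefix + "." + key if prefix else key
        result.setdefault(out_key, value)
    return result


parsed = {"f1": "foo"}
metadata = {"topic": "my_topic"}
print(apply_metadata(parsed, metadata, merge=True, prefix="metron.metadata"))
print(apply_metadata(parsed, metadata, merge=True, prefix=""))
print(apply_metadata(parsed, metadata, merge=False, prefix="metron.metadata"))
```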
<div class="section">
<h3><a name="Interpretation"></a>Interpretation</h3>
<p>Now that we have an approach which enables the routing of the data, the remaining question is how to decouple <i>parsing</i> data from <i>interpreting</i> data and metadata. By default, Metron operates like so:</p>
<ul>
<li>The Kafka record key (as a JSON Map) is considered metadata</li>
<li>The Kafka record value is considered data</li>
</ul>
<p>This default strategy also comes with defaults for handling metadata: metadata is not merged, and all metadata keys use a <tt>metron.metadata</tt> prefix.</p>
<p>To enable chained parsers with metadata, the following can be specified via a strategy in the parser config:</p>
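<p>The following is a minimal Python sketch of the <tt>DEFAULT</tt> behavior described above. The function <tt>default_strategy</tt> is hypothetical. Metadata keys are assumed to be joined to the prefix with a dot, and nothing beyond the record key and value is modelled. Metron implements this in Java, so treat the sketch as an illustration only.</p>

```python
import json


def default_strategy(record_key, record_value, prefix="metron.metadata"):
    """Hypothetical sketch of the DEFAULT raw message strategy.

    The Kafka record value is the data to parse. The record key, if present,
    is read as a JSON map and treated as metadata, with each key prefixed
    (dot separator assumed). Metadata is returned separately because, by
    default, it is not merged into the parsed message.
    """
    metadata = {}
    if record_key:
        for key, value in json.loads(record_key.decode("utf-8")).items():
            metadata[prefix + "." + key if prefix else key] = value
    return record_value, metadata


data, meta = default_strategy(b'{"source": "syslog01"}', b"foo,bar,grok")
print(data, meta)
```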
<ul>
<li>How to extract the data from the Kafka record</li>
<li>How to extract the metadata from the Kafka record</li>
<li>The default operations for merging</li>
<li>The prefix for the metadata key</li>
</ul>
<p>The available strategies, specified by the <tt>rawMessageStrategy</tt> configuration, are <tt>ENVELOPE</tt> and <tt>DEFAULT</tt>.</p>
<p>Specifically, to parse enveloped data (i.e. data in one field of a JSON blob whose other fields are metadata), specify the strategy and its configuration in the parser config. Set <tt>rawMessageStrategy</tt> to <tt>ENVELOPE</tt>, and use <tt>rawMessageStrategyConfig</tt> to indicate the field which contains the data.</p>
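<p>The <tt>ENVELOPE</tt> extraction step can be sketched in Python as follows. The function <tt>envelope_strategy</tt> is hypothetical. Its arguments mirror the <tt>messageField</tt> and <tt>metadataPrefix</tt> keys used in the example config. The dot separator for prefixed keys is an assumption.</p>

```python
import json


def envelope_strategy(record_value, message_field, metadata_prefix):
    """Hypothetical sketch of the ENVELOPE strategy: the Kafka record value
    is a JSON map, the field named by 'messageField' holds the data to parse,
    and every other field is treated as metadata.
    """
    envelope = json.loads(record_value)
    data = envelope.pop(message_field)  # raises KeyError if the field is missing
    metadata = {
        (metadata_prefix + "." + k if metadata_prefix else k): v
        for k, v in envelope.items()
    }
    return data, metadata


value = '{"meta_f1": "val1", "payload": "foo,bar,grok", "timestamp": 10000}'
print(envelope_strategy(value, "payload", ""))
```

<p>The extracted string is what the downstream parser (e.g. a CSV parser) actually sees. The remaining fields travel alongside it as metadata.</p>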
<p>Together with routing, we have the complete solution to chain parsers which can:</p>
<ul>
<li>parse the envelope</li>
<li>route the parsed data to specific parsers</li>
<li>have the specific parsers interpret the data via the <tt>rawMessageStrategy</tt>, whereby they pull the data out of the JSON Map that they receive</li>
</ul>
<p>Together this enables a directed acyclic graph of parsers to handle single or multi-layer parsing.</p></div>
<div class="section">
<h3><a name="Example"></a>Example</h3>
<p>For a complete example, look at the <a href="../../../use-cases/parser_chaining/index.html">parser chaining use-case</a>; for a simple example, the following should suffice.</p>
<p>If I want to configure a CSV parser to parse data which has 3 columns <tt>f1</tt>, <tt>f2</tt> and <tt>f3</tt> and is held in a field called <tt>payload</tt> inside of a JSON Map, I can do so like this:</p>
<div>
<div>
<pre class="source">{
&quot;parserClassName&quot; : &quot;org.apache.metron.parsers.csv.CSVParser&quot;
,&quot;sensorTopic&quot; : &quot;my_topic&quot;
,&quot;rawMessageStrategy&quot; : &quot;ENVELOPE&quot;
,&quot;rawMessageStrategyConfig&quot; : {
&quot;messageField&quot; : &quot;payload&quot;,
&quot;metadataPrefix&quot; : &quot;&quot;
}
, &quot;parserConfig&quot;: {
&quot;columns&quot; : { &quot;f1&quot;: 0,
&quot;f2&quot;: 1,
&quot;f3&quot;: 2
}
}
}
</pre></div></div>
<p>This would parse the following message:</p>
<div>
<div>
<pre class="source">{
&quot;meta_f1&quot; : &quot;val1&quot;,
&quot;payload&quot; : &quot;foo,bar,grok&quot;,
&quot;original_string&quot; : &quot;2019 Jul, 01: val1 foo,bar,grok&quot;,
&quot;timestamp&quot; : 10000
}
</pre></div></div>
<p>into</p>
<div>
<div>
<pre class="source">{
&quot;meta_f1&quot; : &quot;val1&quot;,
&quot;f1&quot; : &quot;foo&quot;,
&quot;f2&quot; : &quot;bar&quot;,
&quot;f3&quot; : &quot;grok&quot;,
&quot;original_string&quot; : &quot;2019 Jul, 01: val1 foo,bar,grok&quot;,
&quot;timestamp&quot; : 10002
}
</pre></div></div>
<p>Note a couple of things here:</p>
<ul>
<li>The metadata field <tt>meta_f1</tt> is not prefixed here because we configured the strategy with <tt>metadataPrefix</tt> as empty string.</li>
<li>The <tt>timestamp</tt> is not inherited from the metadata.</li>
<li>The <tt>original_string</tt> is inherited from the metadata.</li>
</ul></div></div>
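<p>The example transformation can be reproduced with a small Python sketch. It is hypothetical and only mirrors the documented input and output; it is not how Metron's CSVParser works internally. The function <tt>parse_enveloped_csv</tt> is made up. It uses a naive comma split with no CSV quoting support. It copies <tt>original_string</tt> from the envelope and does not inherit <tt>timestamp</tt>. The fresh timestamp the parser assigns (10002 in the example) is left to the caller.</p>

```python
import json


def parse_enveloped_csv(envelope_json, message_field, columns):
    """Hypothetical sketch of the documented example: pull the CSV text out
    of the envelope, split it into the configured columns, and carry the
    remaining envelope fields along with an empty metadata prefix.
    """
    envelope = json.loads(envelope_json)
    # Naive split: no support for quoted CSV values.
    values = envelope[message_field].split(",")
    parsed = {name: values[idx] for name, idx in columns.items()}
    for key, value in envelope.items():
        if key in (message_field, "timestamp"):
            continue  # the payload is consumed; the timestamp is not inherited
        parsed[key] = value  # e.g. meta_f1 and original_string
    return parsed


message = json.dumps({
    "meta_f1": "val1",
    "payload": "foo,bar,grok",
    "original_string": "2019 Jul, 01: val1 foo,bar,grok",
    "timestamp": 10000,
})
out = parse_enveloped_csv(message, "payload", {"f1": 0, "f2": 1, "f3": 2})
print(json.dumps(out, indent=1))
```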
</div>
</div>
</div>
<hr/>
<footer>
<div class="container-fluid">
<div class="row-fluid">
© 2015-2016 The Apache Software Foundation. Apache Metron, Metron, Apache, the Apache feather logo,
and the Apache Metron project logo are trademarks of The Apache Software Foundation.
</div>
</div>
</footer>
</body>
</html>