<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8 from src/site/markdown/metron-platform/metron-parsing/index.md at 2019-05-14
| Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20190514" />
<meta http-equiv="Content-Language" content="en" />
<title>Metron &#x2013; Parsers</title>
<link rel="stylesheet" href="../../css/apache-maven-fluido-1.7.min.css" />
<link rel="stylesheet" href="../../css/site.css" />
<link rel="stylesheet" href="../../css/print.css" media="print" />
<script type="text/javascript" src="../../js/apache-maven-fluido-1.7.min.js"></script>
<script type="text/javascript">
$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );
</script>
</head>
<body class="topBarDisabled">
<div class="container-fluid">
<div id="banner">
<div class="pull-left"><a href="http://metron.apache.org/" id="bannerLeft"><img src="../../images/metron-logo.png" alt="Apache Metron" width="148px" height="48px"/></a></div>
<div class="pull-right"></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li class=""><a href="http://www.apache.org" class="externalLink" title="Apache">Apache</a><span class="divider">/</span></li>
<li class=""><a href="http://metron.apache.org/" class="externalLink" title="Metron">Metron</a><span class="divider">/</span></li>
<li class=""><a href="../../index.html" title="Documentation">Documentation</a><span class="divider">/</span></li>
<li class="active ">Parsers</li>
<li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2019-05-14</li>
<li id="projectVersion" class="pull-right">Version: 0.7.1</li>
</ul>
</div>
<div class="row-fluid">
<div id="leftColumn" class="span2">
<div class="well sidebar-nav">
<ul class="nav nav-list">
<li class="nav-header">User Documentation</li>
<li><a href="../../index.html" title="Metron"><span class="icon-chevron-down"></span>Metron</a>
<ul class="nav nav-list">
<li><a href="../../CONTRIBUTING.html" title="CONTRIBUTING"><span class="none"></span>CONTRIBUTING</a></li>
<li><a href="../../Upgrading.html" title="Upgrading"><span class="none"></span>Upgrading</a></li>
<li><a href="../../metron-analytics/index.html" title="Analytics"><span class="icon-chevron-right"></span>Analytics</a></li>
<li><a href="../../metron-contrib/metron-docker/index.html" title="Docker"><span class="none"></span>Docker</a></li>
<li><a href="../../metron-contrib/metron-performance/index.html" title="Performance"><span class="none"></span>Performance</a></li>
<li><a href="../../metron-deployment/index.html" title="Deployment"><span class="icon-chevron-right"></span>Deployment</a></li>
<li><a href="../../metron-interface/index.html" title="Interface"><span class="icon-chevron-right"></span>Interface</a></li>
<li><a href="../../metron-platform/index.html" title="Platform"><span class="icon-chevron-down"></span>Platform</a>
<ul class="nav nav-list">
<li><a href="../../metron-platform/Performance-tuning-guide.html" title="Performance-tuning-guide"><span class="none"></span>Performance-tuning-guide</a></li>
<li><a href="../../metron-platform/metron-common/index.html" title="Common"><span class="none"></span>Common</a></li>
<li><a href="../../metron-platform/metron-data-management/index.html" title="Data-management"><span class="none"></span>Data-management</a></li>
<li><a href="../../metron-platform/metron-elasticsearch/index.html" title="Elasticsearch"><span class="none"></span>Elasticsearch</a></li>
<li><a href="../../metron-platform/metron-enrichment/index.html" title="Enrichment"><span class="icon-chevron-right"></span>Enrichment</a></li>
<li><a href="../../metron-platform/metron-hbase-server/index.html" title="Hbase-server"><span class="none"></span>Hbase-server</a></li>
<li><a href="../../metron-platform/metron-indexing/index.html" title="Indexing"><span class="none"></span>Indexing</a></li>
<li><a href="../../metron-platform/metron-job/index.html" title="Job"><span class="none"></span>Job</a></li>
<li><a href="../../metron-platform/metron-management/index.html" title="Management"><span class="none"></span>Management</a></li>
<li class="active"><a href="#"><span class="icon-chevron-down"></span>Parsing</a>
<ul class="nav nav-list">
<li><a href="../../metron-platform/metron-parsing/metron-parsers/index.html" title="Parsers"><span class="icon-chevron-right"></span>Parsers</a></li>
<li><a href="../../metron-platform/metron-parsing/metron-parsers-common/index.html" title="Parsers-common"><span class="icon-chevron-right"></span>Parsers-common</a></li>
<li><a href="../../metron-platform/metron-parsing/metron-parsing-storm/index.html" title="Parsing-storm"><span class="none"></span>Parsing-storm</a></li>
</ul>
</li>
<li><a href="../../metron-platform/metron-pcap-backend/index.html" title="Pcap-backend"><span class="none"></span>Pcap-backend</a></li>
<li><a href="../../metron-platform/metron-solr/index.html" title="Solr"><span class="none"></span>Solr</a></li>
<li><a href="../../metron-platform/metron-writer/index.html" title="Writer"><span class="none"></span>Writer</a></li>
</ul>
</li>
<li><a href="../../metron-sensors/index.html" title="Sensors"><span class="icon-chevron-right"></span>Sensors</a></li>
<li><a href="../../metron-stellar/stellar-3rd-party-example/index.html" title="Stellar-3rd-party-example"><span class="none"></span>Stellar-3rd-party-example</a></li>
<li><a href="../../metron-stellar/stellar-common/index.html" title="Stellar-common"><span class="icon-chevron-right"></span>Stellar-common</a></li>
<li><a href="../../metron-stellar/stellar-zeppelin/index.html" title="Stellar-zeppelin"><span class="none"></span>Stellar-zeppelin</a></li>
<li><a href="../../use-cases/index.html" title="Use-cases"><span class="icon-chevron-right"></span>Use-cases</a></li>
</ul>
</li>
</ul>
<hr />
<div id="poweredBy">
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="../../images/logos/maven-feather.png" /></a>
</div>
</div>
</div>
<div id="bodyColumn" class="span10" >
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<h1>Parsers</h1>
<p><a name="Parsers"></a></p>
<div class="section">
<h2><a name="Contents"></a>Contents</h2>
<ul>
<li><a href="#Introduction">Introduction</a></li>
<li><a href="#Parser_Error_Routing">Parser Error Routing</a></li>
<li><a href="#Filtering">Filtering</a></li>
<li><a href="#Parser_Architecture">Parser Architecture</a></li>
<li><a href="#Message_Format">Message Format</a></li>
<li><a href="#Global_Configuration">Global Configuration</a></li>
<li><a href="#Parser_Configuration">Parser Configuration</a></li>
<li><a href="#Parser_Adapters">Parser Adapters</a></li>
<li><a href="#Kafka_Queue">Kafka Queue</a></li>
<li><a href="#JSON_Path">JSON Path</a></li>
</ul></div>
<div class="section">
<h2><a name="Introduction"></a>Introduction</h2>
<p>Parsers are pluggable components which are used to transform raw data (textual or raw bytes) into JSON messages suitable for downstream enrichment and indexing.</p>
<p>There are two general types of parsers:</p>
<ul>
<li>A parser written in Java which conforms to the <tt>MessageParser</tt> interface. This kind of parser is optimized for speed and performance and is built for use with higher-velocity topologies. These parsers are not easily modifiable; making changes to them requires recompiling the entire topology.</li>
<li>A general purpose parser. This type of parser is primarily designed for lower-velocity topologies or for quickly standing up a parser for a new telemetry before a permanent Java parser can be written for it. As of the time of this writing, we have:
<ul>
<li>Grok parser: <tt>org.apache.metron.parsers.GrokParser</tt> with possible <tt>parserConfig</tt> entries of
<ul>
<li><tt>grokPath</tt> : The path in HDFS (or in the Jar) to the grok statement. By default attempts to load from HDFS, then falls back to the classpath, and finally throws an exception if unable to load a pattern.</li>
<li><tt>patternLabel</tt> : The pattern label to use from the grok statement</li>
<li><tt>multiLine</tt> : Whether the raw data passed in should be handled as a log containing multiple lines, with each line parsed separately. Valid values are &#x2018;true&#x2019; or &#x2018;false&#x2019;; the default if unset is &#x2018;false&#x2019;. When set to &#x2018;true&#x2019;, the parser will handle multiple lines, with successfully processed lines emitted normally and lines with errors sent to the error topic.</li>
<li><tt>timestampField</tt> : The field to use for the timestamp. If your data does not have a field named exactly &#x201c;timestamp&#x201d;, this property is required; otherwise the record will not pass validation. If the timestampField is also included in the list of timeFields, it will first be parsed using the provided dateFormat.</li>
<li><tt>timeFields</tt> : A list of fields to be treated as time.</li>
<li><tt>dateFormat</tt> : The date format to use to parse the time fields. Default is &#x201c;yyyy-MM-dd HH:mm:ss.S z&#x201d;.</li>
<li><tt>timezone</tt> : The timezone to use. <tt>UTC</tt> is default.</li>
<li>The Grok parser supports either one line to parse per incoming message, or incoming messages with multiple log lines, and will produce a JSON message per line</li>
</ul>
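<p>As an illustrative sketch (the topic name, pattern path, and pattern label below are hypothetical), a Grok parser configuration might look like:</p>
<div>
<div>
<pre class="source">{
  &quot;parserClassName&quot;: &quot;org.apache.metron.parsers.GrokParser&quot;,
  &quot;sensorTopic&quot;: &quot;squid&quot;,
  &quot;parserConfig&quot;: {
    &quot;grokPath&quot;: &quot;/patterns/squid&quot;,
    &quot;patternLabel&quot;: &quot;SQUID_DELIMITED&quot;,
    &quot;timestampField&quot;: &quot;timestamp&quot;
  }
}
</pre></div></div>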
</li>
<li>CSV Parser: <tt>org.apache.metron.parsers.csv.CSVParser</tt> with possible <tt>parserConfig</tt> entries of
<ul>
<li><tt>timestampFormat</tt> : The date format of the timestamp to use. If unspecified, the parser assumes the timestamp is ms since unix epoch.</li>
<li><tt>columns</tt> : A map of column names you wish to extract from the CSV to their offsets (e.g. <tt>{ 'name' : 1, 'profession' : 3}</tt> would be a column map for extracting the 2nd and 4th columns from a CSV)</li>
<li><tt>separator</tt> : The column separator, <tt>,</tt> by default.</li>
</ul>
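<p>As an illustrative sketch (the topic name and timestamp format are hypothetical), a CSV parser configuration using the column map above might look like:</p>
<div>
<div>
<pre class="source">{
  &quot;parserClassName&quot;: &quot;org.apache.metron.parsers.csv.CSVParser&quot;,
  &quot;sensorTopic&quot;: &quot;user_telemetry&quot;,
  &quot;parserConfig&quot;: {
    &quot;columns&quot;: { &quot;name&quot;: 1, &quot;profession&quot;: 3 },
    &quot;separator&quot;: &quot;,&quot;,
    &quot;timestampFormat&quot;: &quot;yyyy-MM-dd HH:mm:ss&quot;
  }
}
</pre></div></div>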
</li>
<li>JSON Map Parser: <tt>org.apache.metron.parsers.json.JSONMapParser</tt> with possible <tt>parserConfig</tt> entries of
<ul>
<li><tt>mapStrategy</tt> : A strategy to indicate how to handle multi-dimensional Maps. This is one of
<ul>
<li><tt>DROP</tt> : Drop fields which contain maps</li>
<li><tt>UNFOLD</tt> : Unfold inner maps. So <tt>{ &quot;foo&quot; : { &quot;bar&quot; : 1} }</tt> would turn into <tt>{&quot;foo.bar&quot; : 1}</tt></li>
<li><tt>ALLOW</tt> : Allow multidimensional maps</li>
<li><tt>ERROR</tt> : Throw an error when a multidimensional map is encountered</li>
</ul>
</li>
<li><tt>jsonpQuery</tt> : A <a href="#json_path">JSON Path</a> query string. If present, the result of the JSON Path query should be a list of messages. This is useful if you have a JSON document which contains a list or array of messages embedded in it, and you do not have another means of splitting the message.</li>
<li><tt>wrapInEntityArray</tt> : <tt>&quot;true&quot;</tt> or <tt>&quot;false&quot;</tt>. If <tt>jsonpQuery</tt> is present and this flag is present and set to <tt>&quot;true&quot;</tt>, the incoming message will be wrapped in a JSON entity and array. For example: <tt>{&quot;name&quot;:&quot;value&quot;},{&quot;name2&quot;:&quot;value2&quot;}</tt> will be wrapped as <tt>{&quot;message&quot; : [{&quot;name&quot;:&quot;value&quot;},{&quot;name2&quot;:&quot;value2&quot;}]}</tt>. This uses the default value for <tt>wrapEntityName</tt> if that property is not set.</li>
<li><tt>wrapEntityName</tt> : Sets the name to use when wrapping JSON using <tt>wrapInEntityArray</tt>. The <tt>jsonpQuery</tt> should reference this name.</li>
<li>A field called <tt>timestamp</tt> is expected to exist and, if it does not, then current time is inserted.</li>
</ul>
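<p>For example, a hypothetical configuration that unfolds inner maps and splits a wrapped list of messages (the topic name is illustrative; the <tt>jsonpQuery</tt> references the default wrap entity name <tt>message</tt>) might look like:</p>
<div>
<div>
<pre class="source">{
  &quot;parserClassName&quot;: &quot;org.apache.metron.parsers.json.JSONMapParser&quot;,
  &quot;sensorTopic&quot;: &quot;json_telemetry&quot;,
  &quot;parserConfig&quot;: {
    &quot;mapStrategy&quot;: &quot;UNFOLD&quot;,
    &quot;wrapInEntityArray&quot;: &quot;true&quot;,
    &quot;jsonpQuery&quot;: &quot;$.message&quot;
  }
}
</pre></div></div>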
</li>
<li>Regular Expressions Parser
<ul>
<li><tt>recordTypeRegex</tt> : A regular expression to uniquely identify a record type.</li>
<li><tt>messageHeaderRegex</tt> : A regular expression used to extract fields from a message part which is common across all the messages.</li>
<li><tt>convertCamelCaseToUnderScore</tt> : If this property is set to true, this parser will automatically convert all camel case property names to underscore-separated names. For example, the following conversions will happen automatically:
<div>
<div>
<pre class="source">ipSrcAddr -&gt; ip_src_addr
ipDstAddr -&gt; ip_dst_addr
ipSrcPort -&gt; ip_src_port
</pre></div></div>
<p>Note: this property may be necessary because Java does not support underscores in named group names. If your field naming convention requires underscores in property names, use this property.</p></li>
<li>
<p><tt>fields</tt> : A JSON list of maps containing a record-type-to-regular-expression mapping.</p>
</li>
</ul>
<p>A complete configuration example would look like:</p>
<div>
<div>
<pre class="source">&quot;convertCamelCaseToUnderScore&quot;: true,
&quot;recordTypeRegex&quot;: &quot;kernel|syslog&quot;,
&quot;messageHeaderRegex&quot;: &quot;(?&lt;syslogPriority&gt;(?&lt;=^&lt;)\\d{1,4}(?=&gt;)).*?(?&lt;timestamp&gt;(?&lt;=&gt;)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?&lt;syslogHost&gt;(?&lt;=\\s).*?(?=\\s))&quot;,
&quot;fields&quot;: [
{
&quot;recordType&quot;: &quot;kernel&quot;,
&quot;regex&quot;: &quot;.*(?&lt;eventInfo&gt;(?&lt;=\\]|\\w\\:).*?(?=$))&quot;
},
{
&quot;recordType&quot;: &quot;syslog&quot;,
&quot;regex&quot;: &quot;.*(?&lt;processid&gt;(?&lt;=PID\\s=\\s).*?(?=\\sLine)).*(?&lt;filePath&gt;(?&lt;=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w)) (?&lt;fileName&gt;.*?(?=\&quot;)).*(?&lt;eventInfo&gt;(?&lt;=\&quot;).*?(?=$))&quot;
}
]
</pre></div></div>
<p><b>Note</b>: messageHeaderRegex and regex (within fields) can also be specified as lists, e.g.:</p>
<div>
<div>
<pre class="source">&quot;messageHeaderRegex&quot;: [
&quot;regular expression 1&quot;,
&quot;regular expression 2&quot;
]
</pre></div></div>
<p>Where each entry (<b>regular expression 1</b>, <b>regular expression 2</b>) is a valid regular expression which may have named groups that are extracted into fields. The list is evaluated in order until a matching regular expression is found.</p>
<p><b>messageHeaderRegex</b> is run on all the messages; every message is expected to contain the fields that are extracted using the <b>messageHeaderRegex</b>. In effect, <b>messageHeaderRegex</b> is the highest common factor (HCF) across all messages.</p>
<p><b>recordTypeRegex</b> can be a more advanced regular expression containing named groups. For example:</p>
<div>
<div>
<pre class="source">&quot;recordTypeRegex&quot;: &quot;(?&lt;process&gt;(?&lt;=\s)\b(kernel|syslog)\b(?=\[|:))&quot;
</pre></div></div>
<p>Here all the named groups (<tt>process</tt> in the above example) will be extracted as fields.</p>
<p>Though having a named group in recordType is completely optional, one might still want to extract named groups there for the following reasons:</p>
<ol style="list-style-type: decimal">
<li>Since the <b>recordType</b> regular expression is already being matched, and we are already paying the price of that match, we can extract certain fields as a by-product of it.</li>
<li>Most likely the <b>recordType</b> field is common across all the messages. Hence extracting it in the recordType (or messageHeaderRegex) reduces the overall complexity of the regular expressions in the regex field.</li>
</ol>
<p><b>regex</b> within a field can also be a list of regular expressions. In that case the regular expressions in the list are attempted in order until a match is found; once a full match is found, the remaining regular expressions are ignored.</p>
<div>
<div>
<pre class="source">&quot;regex&quot;: [ &quot;record type specific regular expression 1&quot;,
&quot;record type specific regular expression 2&quot;]
</pre></div></div>
<p><b>timestamp</b></p>
<p>Since this parser is a general purpose parser, it populates the timestamp field with the current UTC timestamp. The actual timestamp value can be overridden later using Stellar. For example, in the case of syslog timestamps, one could use the following Stellar construct to override the timestamp value. Say you parsed the actual timestamp from the raw log:</p>
<p><tt>&lt;38&gt;Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod from 55.55.55.55 port 66666 ssh2</tt></p>
<p>syslogTimestamp=&#x201c;Jun 20 15:01:17&#x201d;</p>
<p>Then something like the following could be used to override the timestamp:</p>
<div>
<div>
<pre class="source">&quot;timestamp_str&quot;: &quot;FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)&quot;,
&quot;timestamp&quot;:&quot;TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss' )&quot;
</pre></div></div>
<p>Or, if you want to factor in the timezone:</p>
<div>
<div>
<pre class="source">&quot;timestamp&quot;:&quot;TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format, timezone_name )&quot;
</pre></div></div>
</li>
</ul>
</li>
</ul></div>
<div class="section">
<h2><a name="Parser_Message_Routing"></a>Parser Message Routing</h2>
<p>Messages are routed to the Kafka <tt>enrichments</tt> topic by default. The output topic can be changed with the <tt>output_topic</tt> option when <a href="metron-parsing-storm/index.html#Starting_the_Parser_Topology">Starting the Parser Topology</a> or with the <tt>outputTopic</tt> <a href="#Parser_Configuration">Parser Configuration</a> setting. The order of precedence from highest to lowest is as follows:</p>
<ol style="list-style-type: decimal">
<li>Parser start script option</li>
<li>Parser configuration setting</li>
<li>Default <tt>enrichments</tt> topic</li>
</ol>
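<p>For example, a parser configuration that overrides the default output topic (the topic and sensor names here are illustrative) might look like:</p>
<div>
<div>
<pre class="source">{
  &quot;parserClassName&quot;: &quot;org.apache.metron.parsers.GrokParser&quot;,
  &quot;sensorTopic&quot;: &quot;squid&quot;,
  &quot;outputTopic&quot;: &quot;custom_enrichments&quot;
}
</pre></div></div>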
<p>A message can also be routed to other locations besides Kafka with the <tt>writerClassName</tt> <a href="#Parser_Configuration">Parser Configuration</a> setting. Messages can be routed independently for each sensor type when configured with <a href="#Parser_Configuration">Parser Configuration</a> settings.</p></div>
<div class="section">
<h2><a name="Parser_Error_Routing"></a>Parser Error Routing</h2>
<p>Currently, we have a few mechanisms for either deferring processing of messages or marking messages as invalid.</p>
<div class="section">
<h3><a name="Invalidation_Errors"></a>Invalidation Errors</h3>
<p>There are two reasons a message will be marked as invalid:</p>
<ul>
<li>Fail <a href="../../metron-common/index.html#validation-framework">global validation</a></li>
<li>Fail the parser&#x2019;s validate function. Generally, that means not having a <tt>timestamp</tt> field or an <tt>original_string</tt> field.</li>
</ul>
<p>Those messages which are marked as invalid are sent to the error queue with an indication that they are invalid in the error message. The messages will contain &#x201c;error_type&#x201d;:&#x201c;parser_invalid&#x201d;. Note, you will not see additional exceptions in the logs for this type of failure, rather the error messages are written directly to the configured error topic. See <a href="../../metron-common/index.html#Topology_Errors">Topology Errors</a> for more.</p></div>
<div class="section">
<h3><a name="Parser_Errors"></a>Parser Errors</h3>
<p>Errors, which are defined as unexpected exceptions happening during the parse, are sent along to the error queue with a message indicating that there was an error in parse along with a stacktrace. This is to distinguish from the invalid messages.</p></div></div>
<div class="section">
<h2><a name="Filtering"></a>Filtering</h2>
<p>One can also filter a message by specifying a <tt>filterClassName</tt> in the parser config. Filtered messages are just dropped rather than passed through.</p></div>
<div class="section">
<h2><a name="Parser_Architecture"></a>Parser Architecture</h2>
<p><img src="../../images/parser_arch.png" alt="Architecture" /></p>
<p>Data flows into the parser via kafka and out to the <tt>enrichments</tt> topic in kafka. Errors are collected with the context of the error (e.g. stacktrace) and the original message causing the error, and are sent to an <tt>error</tt> queue. Invalid messages, as determined by the global validation functions, are also treated as errors and sent to an <tt>error</tt> queue.</p>
<div class="section">
<h2><a name="Message_Format"></a>Message Format</h2>
<p>All Metron messages must follow a specific format in order to be ingested. If a message does not conform to this format it will be dropped and put onto an error queue for further examination. The message must be JSON and must have a <tt>message</tt> tag like so:</p>
<div>
<div>
<pre class="source">{&quot;message&quot; : message content}
</pre></div></div>
<p>Where appropriate there is also a standardization around the 5-tuple JSON fields. This is done so the topology correlation engine further downstream can correlate messages from different topologies by these fields. We are currently working on expanding the message standardization beyond these fields, but this feature is not yet available. The standard field names are as follows:</p>
<ul>
<li>ip_src_addr: layer 3 source IP</li>
<li>ip_dst_addr: layer 3 dest IP</li>
<li>ip_src_port: layer 4 source port</li>
<li>ip_dst_port: layer 4 dest port</li>
<li>protocol: layer 4 protocol</li>
<li>timestamp (epoch)</li>
<li>original_string: A human friendly string representation of the message</li>
</ul>
<p>The timestamp and original_string fields are mandatory. The remaining standard fields are optional. If any of the optional fields are not applicable then the field should be left out of the JSON.</p>
<p>So putting it all together a typical Metron message with all 5-tuple fields present would look like the following:</p>
<div>
<div>
<pre class="source">{
&quot;message&quot;: {
&quot;ip_src_addr&quot;: xxxx,
&quot;ip_dst_addr&quot;: xxxx,
&quot;ip_src_port&quot;: xxxx,
&quot;ip_dst_port&quot;: xxxx,
&quot;protocol&quot;: xxxx,
&quot;original_string&quot;: xxx,
&quot;additional-field 1&quot;: xxx
}
}
</pre></div></div>
</div>
<div class="section">
<h2><a name="Global_Configuration"></a>Global Configuration</h2>
<p>There are a few properties which can be managed in the global configuration that have pertinence to parsers and parsing in general.</p>
<div class="section">
<h3><a name="parser.error.topic"></a><tt>parser.error.topic</tt></h3>
<p>The topic where messages which were unable to be parsed due to error are sent. Error messages will be indexed under a sensor type of <tt>error</tt> and the messages will have the following fields:</p>
<ul>
<li><tt>sensor.type</tt>: <tt>error</tt></li>
<li><tt>failed_sensor_type</tt> : The sensor type of the message which wasn&#x2019;t able to be parsed</li>
<li><tt>error_type</tt> : The error type, in this case <tt>parser</tt>.</li>
<li><tt>stack</tt> : The stack trace of the error</li>
<li><tt>hostname</tt> : The hostname of the node where the error happened</li>
<li><tt>raw_message</tt> : The raw message in string form</li>
<li><tt>raw_message_bytes</tt> : The raw message bytes</li>
<li><tt>error_hash</tt> : A hash of the error message</li>
</ul>
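<p>This property is set in the global configuration; the topic name below is illustrative only:</p>
<div>
<div>
<pre class="source">{
  &quot;parser.error.topic&quot;: &quot;parser_errors&quot;
}
</pre></div></div>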
<p>When aggregating multiple sensors, all sensors must be using the same error topic.</p></div></div>
<div class="section">
<h2><a name="Parser_Configuration"></a>Parser Configuration</h2>
<p>The configuration for the various parser topologies is defined by JSON documents stored in zookeeper.</p>
<p>The document is structured in the following way</p>
<ul>
<li><tt>parserClassName</tt> : The fully qualified classname for the parser to be used.</li>
<li><tt>filterClassName</tt> : The filter to use. This may be a fully qualified classname of a Class that implements the <tt>org.apache.metron.parsers.interfaces.MessageFilter&lt;JSONObject&gt;</tt> interface. Message Filters are intended to allow the user to ignore a set of messages via custom logic. The existing implementations are:
<ul>
<li><tt>STELLAR</tt> : Allows you to apply a Stellar statement which returns a boolean; every message for which the statement returns <tt>true</tt> is passed through. The Stellar statement to be applied is specified by the <tt>filter.query</tt> property in the <tt>parserConfig</tt>.
<p>Example of a Stellar filter which includes messages that contain the <tt>field1</tt> field:</p>
<div>
<div>
<pre class="source">{
&quot;filterClassName&quot; : &quot;STELLAR&quot;,
&quot;parserConfig&quot; : {
&quot;filter.query&quot; : &quot;exists(field1)&quot;
}
}
</pre></div></div>
</li>
</ul>
</li>
<li>
<p><tt>writerClassName</tt> : The class used to write messages after they have been parsed. Defaults to <tt>org.apache.metron.writer.kafka.KafkaWriter</tt>.</p>
</li>
<li><tt>sensorTopic</tt> : The kafka topic that the parser will read messages from. If the topic is prefixed and suffixed by <tt>/</tt> then it is assumed to be a regex and will match any topic matching the pattern (e.g. <tt>/bro.*/</tt> would match <tt>bro_cust0</tt>, <tt>bro_cust1</tt> and <tt>bro_cust2</tt>)</li>
<li><tt>readMetadata</tt> : Boolean indicating whether to read metadata or not (The default is raw message strategy dependent). See below for a discussion about metadata.</li>
<li><tt>mergeMetadata</tt> : Boolean indicating whether to merge metadata with the message or not (The default is raw message strategy dependent). See below for a discussion about metadata.</li>
<li><tt>rawMessageStrategy</tt> : The strategy to use when reading the raw data and metadata. See below for a discussion about message reading strategies.</li>
<li><tt>rawMessageStrategyConfig</tt> : The raw message strategy configuration map. See below for a discussion about message reading strategies.</li>
<li><tt>parserConfig</tt> : A JSON Map representing the parser implementation specific configuration. Also include batch sizing and timeout for writer configuration here.
<ul>
<li><tt>batchSize</tt> : Integer indicating the number of records to batch together before sending to the writer. (defaults to <tt>15</tt>)</li>
<li><tt>batchTimeout</tt> : The timeout after which a batch will be flushed even if batchSize has not been met. Optional. If unspecified, or set to <tt>0</tt>, it defaults to a system-determined duration which is a fraction of the Storm parameter <tt>topology.message.timeout.secs</tt>. Ignored if batchSize is <tt>1</tt>, since this disables batching.</li>
<li>The kafka writer can be configured within the parser config as well. (This is all configured a priori, but this is convenient for overriding the settings). See <a href="../../metron-writer/index.html#kafka-writer">here</a></li>
</ul>
</li>
<li><tt>fieldTransformations</tt> : An array of complex objects representing the transformations to be done on the message generated from the parser before writing out to the kafka topic.</li>
<li><tt>securityProtocol</tt> : The security protocol to use for reading from kafka (this is a string). This can be overridden on the command line and also specified in the spout config via the <tt>security.protocol</tt> key. If both are specified, then they are merged and the CLI will take precedence. If multiple sensors are used, any non &#x201c;PLAINTEXT&#x201d; value will be used.</li>
<li><tt>cacheConfig</tt> : Cache config for stellar field transformations. This configures a least frequently used cache. This is a map with the following keys. If not explicitly configured (the default), then no cache will be used.
<ul>
<li><tt>stellar.cache.maxSize</tt> - The maximum number of elements in the cache. Default is to not use a cache.</li>
<li><tt>stellar.cache.maxTimeRetain</tt> - The maximum amount of time an element is kept in the cache (in minutes). Default is to not use a cache.
<p>Example of a cache config to hold at most <tt>20000</tt> stellar expressions for at most <tt>20</tt> minutes:</p>
<div>
<div>
<pre class="source">{
&quot;cacheConfig&quot; : {
&quot;stellar.cache.maxSize&quot; : 20000,
&quot;stellar.cache.maxTimeRetain&quot; : 20
}
}
</pre></div></div>
</li>
</ul>
</li>
</ul>
<p>The <tt>fieldTransformations</tt> is a complex object which defines a transformation which can be done to a message. This transformation can</p>
<ul>
<li>Modify existing fields of a message</li>
<li>Add new fields given the values of existing fields of a message</li>
<li>Remove existing fields of a message</li>
</ul>
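<p>Putting the settings above together, a sketch of a complete parser configuration (the sensor topic and values here are illustrative, not a recommended setup) might look like:</p>
<div>
<div>
<pre class="source">{
  &quot;parserClassName&quot;: &quot;org.apache.metron.parsers.GrokParser&quot;,
  &quot;sensorTopic&quot;: &quot;squid&quot;,
  &quot;writerClassName&quot;: &quot;org.apache.metron.writer.kafka.KafkaWriter&quot;,
  &quot;readMetadata&quot;: false,
  &quot;mergeMetadata&quot;: false,
  &quot;parserConfig&quot;: {
    &quot;batchSize&quot;: 15,
    &quot;batchTimeout&quot;: 0
  },
  &quot;fieldTransformations&quot;: []
}
</pre></div></div>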
<p>For platform specific configs, see the README of the appropriate project. This would include settings such as parallelism of individual components and general configuration.</p>
<ul>
<li><a href="metron-parsing-storm/index.html#parser-configuration">Storm</a></li>
</ul>
<div class="section">
<h3><a name="Metadata"></a>Metadata</h3>
<p>Metadata is a useful thing to send to Metron and use during enrichment or threat intelligence.<br />
Consider the following scenarios:</p>
<ul>
<li>You have multiple telemetry sources of the same type that you want to
<ul>
<li>ensure downstream analysts can differentiate</li>
<li>ensure profiles consider them independently, as they have different seasonality or some other fundamental characteristic</li>
</ul>
</li>
</ul>
<p>As such, there are two types of metadata that we seek to support in Metron:</p>
<ul>
<li>Environmental metadata : Metadata about the system at large
<ul>
<li>Consider the possibility that you have multiple kafka topics being processed by one parser and you want to tag the messages with the kafka topic</li>
<li>At the moment, only the kafka topic is kept as the field name.</li>
</ul>
</li>
<li>Custom metadata: Custom metadata from an individual telemetry source that one might want to use within Metron.</li>
</ul>
<p>Metadata is controlled by the following parser configs:</p>
<ul>
<li><tt>rawMessageStrategy</tt> : This is a strategy which indicates how to read data and metadata. The strategies supported are:
<ul>
<li><tt>DEFAULT</tt> : Data is read directly from the kafka record value and metadata, if any, is read from the kafka record key. This strategy defaults to not reading metadata and not merging metadata. This is the default strategy.</li>
<li><tt>ENVELOPE</tt> : Data from kafka record value is presumed to be a JSON blob. One of these fields must contain the raw data to pass to the parser. All other fields should be considered metadata. The field containing the raw data is specified in the <tt>rawMessageStrategyConfig</tt>. Data held in the kafka key as well as the non-data fields in the JSON blob passed into the kafka value are considered metadata. Note that the exception to this is that any <tt>original_string</tt> field is inherited from the envelope data so that the original string contains the envelope data. If you do not prefer this behavior, remove this field from the envelope data.</li>
</ul>
</li>
<li><tt>rawMessageStrategyConfig</tt> : The configuration (a map) for the <tt>rawMessageStrategy</tt>. Available configurations are strategy dependent:
<ul>
<li><tt>DEFAULT</tt>
<ul>
<li><tt>metadataPrefix</tt> defines the key prefix for metadata (default is <tt>metron.metadata</tt>).</li>
</ul>
</li>
<li><tt>ENVELOPE</tt>
<ul>
<li><tt>metadataPrefix</tt> defines the key prefix for metadata (default is <tt>metron.metadata</tt>)</li>
<li><tt>messageField</tt> defines the field from the envelope to use as the data. All other fields are considered metadata.</li>
</ul>
</li>
</ul>
</li>
<li><tt>readMetadata</tt> : This is a boolean indicating whether metadata will be read and made available to Field transformations (i.e. Stellar field transformations). The default is dependent upon the <tt>rawMessageStrategy</tt>:
<ul>
<li><tt>DEFAULT</tt> : default to <tt>false</tt>.</li>
<li><tt>ENVELOPE</tt> : default to <tt>true</tt>.</li>
</ul>
</li>
<li><tt>mergeMetadata</tt> : This is a boolean indicating whether metadata fields will be merged with the message automatically. That is to say, if this property is set to <tt>true</tt> then every metadata field will become part of the message and, consequently, also available for use in field transformations. The default is dependent upon the <tt>rawMessageStrategy</tt>:
<ul>
<li><tt>DEFAULT</tt> : default to <tt>false</tt>.</li>
<li><tt>ENVELOPE</tt> : default to <tt>true</tt>.</li>
</ul>
</li>
</ul>
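<p>Putting these configs together, a sketch of a sensor parser config that reads enveloped data might look like the following (the sensor name and the <tt>data</tt> field name here are hypothetical):</p>
<div>
<div>
<pre class="source">{
  &quot;parserClassName&quot; : &quot;org.apache.metron.parsers.GrokParser&quot;,
  &quot;sensorTopic&quot; : &quot;example&quot;,
  &quot;rawMessageStrategy&quot; : &quot;ENVELOPE&quot;,
  &quot;rawMessageStrategyConfig&quot; : {
    &quot;messageField&quot; : &quot;data&quot;,
    &quot;metadataPrefix&quot; : &quot;metron.metadata&quot;
  },
  &quot;readMetadata&quot; : true,
  &quot;mergeMetadata&quot; : true
}
</pre></div></div>
<p>Here the parser is handed the contents of the <tt>data</tt> field, and all other fields of the envelope are treated as metadata.</p>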
<div class="section">
<h4><a name="Field_Naming"></a>Field Naming</h4>
<p>In order to avoid collisions with existing message fields, metadata fields are prefixed (the default prefix is <tt>metron.metadata.</tt>, but this is configurable via the <tt>rawMessageStrategyConfig</tt>). So, for instance, the kafka topic would be in the field <tt>metron.metadata.topic</tt>.</p></div>
<div class="section">
<h4><a name="Specifying_Custom_Metadata"></a>Specifying Custom Metadata</h4>
<p>Custom metadata is specified by sending a JSON Map in the kafka key. If no key is sent, no metadata will be parsed. For instance, sending a metadata field called <tt>customer_id</tt> could be done by sending</p>
<div>
<div>
<pre class="source">{
&quot;customer_id&quot; : &quot;my_customer_id&quot;
}
</pre></div></div>
<p>in the kafka key. This would be exposed to Stellar field transformations as the field <tt>metron.metadata.customer_id</tt> and, if <tt>mergeMetadata</tt> is <tt>true</tt>, it would also be available as a field of the message in its own right.</p></div>
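<p>For illustration (the sensor name and values here are hypothetical), with <tt>mergeMetadata</tt> set to <tt>true</tt> and the default prefix, the parsed message would then carry the prefixed metadata fields alongside the parsed fields, along the lines of:</p>
<div>
<div>
<pre class="source">{
  &quot;original_string&quot; : &quot;...&quot;,
  &quot;timestamp&quot; : 1548366516634,
  &quot;source.type&quot; : &quot;example&quot;,
  &quot;metron.metadata.topic&quot; : &quot;example&quot;,
  &quot;metron.metadata.customer_id&quot; : &quot;my_customer_id&quot;
}
</pre></div></div>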
<div class="section">
<h4><a name="Metadata_and_Error_Handling"></a>Metadata and Error Handling</h4>
<p>When a telemetry message fails to parse correctly, a separate error message is produced and sent to the error topic. This error message will contain detailed information to reflect the error that occurred.</p>
<p>If the telemetry message that failed contains metadata, this metadata is included in the error message. For example, here is an error message that contains two metadata fields: <tt>metron.metadata.topic</tt> and <tt>metron.metadata.customer</tt>.</p>
<div>
<div>
<pre class="source">{
&quot;exception&quot;: &quot;java.lang.IllegalStateException: Unable to parse Message: \&quot;this is an invalid synthetic message\&quot; }&quot;,
&quot;stack&quot;: &quot;java.lang.IllegalStateException: Unable to parse Message: \&quot;this is an invalid synthetic message\&quot; ...\n&quot;,
&quot;raw_message&quot;: &quot;\&quot;this is an invalid synthetic message\&quot; }&quot;,
&quot;error_hash&quot;: &quot;3d498968e8df7f28d05db3037d4ad2a3a0095c22c14d881be45fac3f184dbcc3&quot;,
&quot;message&quot;: &quot;Unable to parse Message: \&quot;this is an invalid synthetic message\&quot; }&quot;,
&quot;source.type&quot;: &quot;error&quot;,
&quot;failed_sensor_type&quot;: &quot;bro&quot;,
&quot;hostname&quot;: &quot;node1&quot;,
&quot;error_type&quot;: &quot;parser_error&quot;,
&quot;guid&quot;: &quot;563d8d2a-1493-4758-be2f-5613bfd2d615&quot;,
&quot;timestamp&quot;: 1548366516634,
&quot;metron.metadata.topic&quot;: &quot;bro&quot;,
&quot;metron.metadata.customer&quot;: &quot;acme-inc&quot;
}
</pre></div></div>
<p>By default, error messages are sent to the <tt>indexing</tt> topic. This will cause the errors to be indexed in whichever endpoints you have configured, namely Solr, Elasticsearch, and HDFS. You may need to update your configuration of these endpoints to accurately reflect the metadata fields contained in the error message. For example, you may need to update the schema definition of your Solr Collection for the metadata fields to be accurately indexed in the Error collection.</p></div></div>
<div class="section">
<h3><a name="fieldTransformation_configuration"></a><tt>fieldTransformation</tt> configuration</h3>
<p>The format of a <tt>fieldTransformation</tt> is as follows:</p>
<ul>
<li><tt>input</tt> : An array of fields or a single field representing the input. This is optional; if unspecified, then the whole message is passed as input.</li>
<li><tt>output</tt> : The outputs to produce from the transformation. If unspecified, it is assumed to be the same as the inputs.</li>
<li><tt>transformation</tt> : The fully qualified classname of the transformation to be used. This is either a class which implements <tt>FieldTransformation</tt> or a member of the <tt>FieldTransformations</tt> enum.</li>
<li><tt>config</tt> : A String to Object map of transformation specific configuration.</li>
</ul>
<p>The currently implemented fieldTransformations are:</p>
<ul>
<li>
<p><tt>REMOVE</tt> : This transformation removes the specified input fields. If you want a conditional removal, you can pass a Metron Query Language statement to define the conditions under which you want to remove the fields.</p>
<p>Consider the following simple configuration which will remove <tt>field1</tt> unconditionally:</p>
<div>
<div>
<pre class="source">{
...
&quot;fieldTransformations&quot; : [
{
&quot;input&quot; : &quot;field1&quot;
, &quot;transformation&quot; : &quot;REMOVE&quot;
}
]
}
</pre></div></div>
<p>Consider the following simple sensor parser configuration which will remove <tt>field1</tt> whenever <tt>field2</tt> exists and its value is equal to &#x2018;foo&#x2019;:</p>
<div>
<div>
<pre class="source">{
...
&quot;fieldTransformations&quot; : [
{
&quot;input&quot; : &quot;field1&quot;
, &quot;transformation&quot; : &quot;REMOVE&quot;
, &quot;config&quot; : {
&quot;condition&quot; : &quot;exists(field2) and field2 == 'foo'&quot;
}
}
]
}
</pre></div></div>
</li>
<li>
<p><tt>SELECT</tt>: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included.</p>
<p>For example:</p>
<div>
<div>
<pre class="source">{
...
&quot;fieldTransformations&quot; : [
{
&quot;output&quot; : [&quot;field1&quot;, &quot;field2&quot; ]
, &quot;transformation&quot; : &quot;SELECT&quot;
}
]
}
</pre></div></div>
<p>When applied to a message containing the keys <tt>field1</tt>, <tt>field2</tt>, and <tt>field3</tt>, this configuration will output only the first two. It is also worth noting that two standard fields, <tt>timestamp</tt> and <tt>original_string</tt>, will always be passed along whether they are listed in <tt>output</tt> or not, since they are considered core required fields.</p>
</li>
<li>
<p><tt>IP_PROTOCOL</tt> : This transformation maps IANA protocol numbers to consistent string representations.</p>
<p>Consider the following sensor parser config to map the <tt>protocol</tt> field to a textual representation of the protocol:</p>
<div>
<div>
<pre class="source">{
...
&quot;fieldTransformations&quot; : [
{
&quot;input&quot; : &quot;protocol&quot;
, &quot;transformation&quot; : &quot;IP_PROTOCOL&quot;
}
]
}
</pre></div></div>
<p>This transformation would transform <tt>{ &quot;protocol&quot; : 6, &quot;source.type&quot; : &quot;bro&quot;, ... }</tt> into <tt>{ &quot;protocol&quot; : &quot;TCP&quot;, &quot;source.type&quot; : &quot;bro&quot;, ...}</tt></p>
</li>
<li>
<p><tt>STELLAR</tt> : This transformation executes a set of transformations expressed as <a href="../../metron-common/index.html">Stellar Language</a> statements.</p>
</li>
<li>
<p><tt>RENAME</tt> : This transformation allows users to rename a set of fields. Specifically, the config is presumed to be the mapping: the keys of the config are the existing field names and the values are the associated new field names.</p>
<p>The following config will rename the fields <tt>old_field</tt> and <tt>different_old_field</tt> to <tt>new_field</tt> and <tt>different_new_field</tt> respectively:</p>
<div>
<div>
<pre class="source">{
...
&quot;fieldTransformations&quot; : [
{
&quot;transformation&quot; : &quot;RENAME&quot;
, &quot;config&quot; : {
&quot;old_field&quot; : &quot;new_field&quot;,
&quot;different_old_field&quot; : &quot;different_new_field&quot;
}
}
]
}
</pre></div></div>
</li>
<li>
<p><tt>REGEX_SELECT</tt> : This transformation lets users set an output field to one of a set of possibilities based on matching regexes. This transformation is useful when the number of conditions is large enough to make a Stellar Language match statement unwieldy.</p>
<p>The following config will set the field <tt>logical_source_type</tt> to one of the following, dependent upon the value of the <tt>pix_type</tt> field:</p>
<ul>
<li><tt>cisco-6-302</tt> if <tt>pix_type</tt> starts with either <tt>6-302</tt> or <tt>06-302</tt></li>
<li><tt>cisco-5-304</tt> if <tt>pix_type</tt> starts with <tt>5-304</tt></li>
</ul>
<div>
<div>
<pre class="source">{
...
&quot;fieldTransformations&quot; : [
{
&quot;transformation&quot; : &quot;REGEX_SELECT&quot;
,&quot;input&quot; : &quot;pix_type&quot;
,&quot;output&quot; : &quot;logical_source_type&quot;
,&quot;config&quot; : {
&quot;cisco-6-302&quot; : [ &quot;^6-302.*&quot;, &quot;^06-302.*&quot;],
&quot;cisco-5-304&quot; : &quot;^5-304.*&quot;
}
}
]
...
}
</pre></div></div>
</li>
</ul></div>
<div class="section">
<h3><a name="Assignment_to_null"></a>Assignment to <tt>null</tt></h3>
<p>If, in your field transformation, you assign a field to <tt>null</tt>, the field will be removed. You can use this capability to rename variables. It is preferred, however, that the <tt>RENAME</tt> field transformation be used in this situation, as it is less awkward.</p>
<p>Consider this example:</p>
<div>
<div>
<pre class="source"> &quot;fieldTransformations&quot; : [
{ &quot;transformation&quot; : &quot;STELLAR&quot;
,&quot;output&quot; : [ &quot;new_field&quot;, &quot;old_field&quot;]
,&quot;config&quot; : {
&quot;new_field&quot; : &quot;old_field&quot;
,&quot;old_field&quot; : &quot;null&quot;
}
}
]
</pre></div></div>
<p>This would set <tt>new_field</tt> to the value of <tt>old_field</tt> and remove <tt>old_field</tt>.</p></div>
<div class="section">
<h3><a name="Warning:_Transforming_the_same_field_twice"></a>Warning: Transforming the same field twice</h3>
<p>Currently, the Stellar expressions are expressed in the form of a map where the keys define the fields and the values define the Stellar expressions. The order of expression evaluation is specified by the <tt>output</tt> field. A consequence of the choice to store the assignments as a map is that the same field cannot appear in the map as a key twice.</p>
<p>For instance, the following will not function as expected:</p>
<div>
<div>
<pre class="source"> &quot;fieldTransformations&quot; : [
{ &quot;transformation&quot; : &quot;STELLAR&quot;
,&quot;output&quot; : [ &quot;new_field&quot;]
,&quot;config&quot; : {
&quot;new_field&quot; : &quot;TO_UPPER(field1)&quot;
,&quot;new_field&quot; : &quot;TO_LOWER(new_field)&quot;
}
}
]
</pre></div></div>
<p>In the above example, the last instance of <tt>new_field</tt> will win and <tt>TO_LOWER(new_field)</tt> will be evaluated while <tt>TO_UPPER(field1)</tt> will be skipped.</p></div>
<div class="section">
<h3><a name="Example"></a>Example</h3>
<p>Consider the following sensor parser config to add three new fields to a message:</p>
<ul>
<li><tt>utc_timestamp</tt> : The unix epoch timestamp computed from the <tt>timestamp</tt> field, a <tt>dc</tt> field (the data center the message comes from), and a <tt>dc2tz</tt> map mapping data centers to timezones</li>
<li><tt>url_host</tt> : The host associated with the url in the <tt>url</tt> field</li>
<li><tt>url_protocol</tt> : The protocol associated with the url in the <tt>url</tt> field</li>
</ul>
<div>
<div>
<pre class="source">{
...
&quot;fieldTransformations&quot; : [
{
&quot;transformation&quot; : &quot;STELLAR&quot;
,&quot;output&quot; : [ &quot;utc_timestamp&quot;, &quot;url_host&quot;, &quot;url_protocol&quot; ]
,&quot;config&quot; : {
&quot;utc_timestamp&quot; : &quot;TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC') )&quot;
,&quot;url_host&quot; : &quot;URL_TO_HOST(url)&quot;
,&quot;url_protocol&quot; : &quot;URL_TO_PROTOCOL(url)&quot;
}
}
]
,&quot;parserConfig&quot; : {
&quot;dc2tz&quot; : {
&quot;nyc&quot; : &quot;EST&quot;
,&quot;la&quot; : &quot;PST&quot;
,&quot;london&quot; : &quot;UTC&quot;
}
}
}
</pre></div></div>
<p>Note that the <tt>dc2tz</tt> map is in the parser config, so it is accessible in the functions.</p></div>
<div class="section">
<h3><a name="An_Example_Configuration_for_a_Sensor"></a>An Example Configuration for a Sensor</h3>
<p>Consider the following example configuration for the <tt>yaf</tt> sensor:</p>
<div>
<div>
<pre class="source">{
&quot;parserClassName&quot;:&quot;org.apache.metron.parsers.GrokParser&quot;,
&quot;sensorTopic&quot;:&quot;yaf&quot;,
&quot;fieldTransformations&quot; : [
{
&quot;input&quot; : &quot;protocol&quot;
,&quot;transformation&quot;: &quot;IP_PROTOCOL&quot;
}
],
&quot;parserConfig&quot;:
{
&quot;grokPath&quot;:&quot;/patterns/yaf&quot;,
&quot;patternLabel&quot;:&quot;YAF_DELIMITED&quot;,
&quot;timestampField&quot;:&quot;start_time&quot;,
&quot;timeFields&quot;: [&quot;start_time&quot;, &quot;end_time&quot;],
&quot;dateFormat&quot;:&quot;yyyy-MM-dd HH:mm:ss.S&quot;
}
}
</pre></div></div>
</div></div>
<div class="section">
<h2><a name="Parser_Adapters"></a>Parser Adapters</h2>
<p>Parser adapters are loaded dynamically in each Metron topology. They are defined in the Parser Config (defined above) JSON file in Zookeeper.</p>
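<p>A minimal sketch of such a config (values illustrative) is simply the adapter class and its topic:</p>
<div>
<div>
<pre class="source">{
  &quot;parserClassName&quot; : &quot;org.apache.metron.parsers.bro.BasicBroParser&quot;,
  &quot;sensorTopic&quot; : &quot;bro&quot;
}
</pre></div></div>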
<div class="section">
<h3><a name="Java_Parser_Adapters"></a>Java Parser Adapters</h3>
<p>Java parser adapters are intended for higher-velocity topologies and are not easily changed or extended. As the adoption of Metron continues, we plan to extend our library of Java adapters to process more log formats. The Java adapters currently included with Metron are:</p>
<ul>
<li>org.apache.metron.parsers.ise.BasicIseParser : Parse ISE messages</li>
<li>org.apache.metron.parsers.bro.BasicBroParser : Parse Bro messages</li>
<li>org.apache.metron.parsers.sourcefire.BasicSourcefireParser : Parse Sourcefire messages</li>
<li>org.apache.metron.parsers.lancope.BasicLancopeParser : Parse Lancope messages</li>
<li>org.apache.metron.parsers.syslog.Syslog5424Parser : Parse Syslog RFC 5424 messages</li>
<li>org.apache.metron.parsers.syslog.Syslog3164Parser : Parse Syslog RFC 3164 messages</li>
</ul></div>
<div class="section">
<h3><a name="Grok_Parser_Adapters"></a>Grok Parser Adapters</h3>
<p>Grok parser adapters are designed primarily for those who are not Java coders, allowing a parser adapter for a lower-velocity topology to be stood up quickly. Grok relies on regular expressions for message parsing, which is much slower than purpose-built Java parsers, but is more extensible. Grok parsers are defined via a config file, and the topology does not need to be recompiled in order to make changes to them. Examples of Grok parsers are:</p>
<ul>
<li>org.apache.metron.parsers.GrokParser and org.apache.metron.parsers.websphere.GrokWebSphereParser</li>
</ul>
<p>Parsers that derive from GrokParser typically allow the GrokParser to parse the messages, and then override the postParse methods to do further parsing. When this is the case, and the parser has not overridden <tt>parse(byte[])</tt> or <tt>parseResultOptional(byte[])</tt>, these parsers gain support for treating <tt>byte[]</tt> input as multiple lines, with each line parsed as a separate message (and returned as such). This is enabled by the <tt>&quot;multiline&quot;:&quot;true&quot;</tt> parser configuration option.</p>
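<p>A sketch of enabling this option in a Grok sensor&#x2019;s config (sensor name, pattern path, and label hypothetical):</p>
<div>
<div>
<pre class="source">{
  &quot;parserClassName&quot; : &quot;org.apache.metron.parsers.GrokParser&quot;,
  &quot;sensorTopic&quot; : &quot;example&quot;,
  &quot;parserConfig&quot; : {
    &quot;grokPath&quot; : &quot;/patterns/example&quot;,
    &quot;patternLabel&quot; : &quot;EXAMPLE&quot;,
    &quot;multiline&quot; : &quot;true&quot;
  }
}
</pre></div></div>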
<p>For more information on the Grok project please refer to the following link:</p>
<p><a class="externalLink" href="https://github.com/thekrakken/java-grok">https://github.com/thekrakken/java-grok</a></p>
<p><a name="Starting_the_Parser"></a></p>
<h1>Starting the Parser</h1>
<p>Starting a particular parser on a running Metron deployment is dependent on the platform being run on. Please see the appropriate platform-specific README.</p>
<ul>
<li><a href="metron-parsing-storm/index.html#starting-the-parser-topology">Storm</a></li>
</ul>
<p>For all platforms, you will need to provide</p>
<ul>
<li>Zookeeper Quorum</li>
<li>Kafka Broker URL</li>
<li>Sensor type</li>
<li>Output topic</li>
<li>Kafka Security Protocol (Optional)</li>
</ul>
<p><a name="Notes_on_Performance_Tuning"></a></p>
<h1>Notes on Performance Tuning</h1>
<p>A default Metron installation is not tuned for production deployment. There are a few knobs to tune to get the most out of your system.</p>
<p>When using aggregated parsers, it&#x2019;s highly recommended to aggregate parsers with similar velocity and parser complexity together.</p>
<p>Platform-specific notes can be found in the appropriate README:</p>
<ul>
<li><a href="metron-parsing-storm/index.html">Storm</a></li>
</ul>
<p><a name="Notes_on_Adding_a_New_Sensor"></a></p>
<h1>Notes on Adding a New Sensor</h1>
<p>In order to allow meta alerts to be queried alongside regular alerts in Elasticsearch 2.x, it is necessary to add an additional field to the templates and mapping for existing sensors.</p>
<p>Please see a description of the steps necessary to make this change in the metron-elasticsearch documentation: <a href="../../metron-elasticsearch/index.html#Using_Metron_with_Elasticsearch_2.x">Using Metron with Elasticsearch 2.x</a>.</p>
<p>If Solr is selected as the real-time store, it is also necessary to add additional fields. See the <a href="../../metron-indexing/index.html#Solr">Solr</a> section in metron-indexing for more details.</p></div></div>
<div class="section">
<h2><a name="Kafka_Queue"></a>Kafka Queue</h2>
<p>The kafka queue associated with your parser is a collection point for all of the data sent to your parser. As such, make sure that the number of partitions in the kafka topic is sufficient to handle the throughput that you expect from your parser topology.</p></div>
<div class="section">
<h2><a name="JSON_Path"></a>JSON Path</h2>
<blockquote>
<dl>
<dt>&#x201c;JSONPath expressions always refer to a JSON structure in the same way as XPath expression are used in combination with an XML document.&#x201d;</dt>
<dd>Stefan Goessner</dd>
</dl>
</blockquote>
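<p>As a minimal illustration (the document and field names are hypothetical), given the JSON document</p>
<div>
<div>
<pre class="source">{
  &quot;data&quot; : {
    &quot;ip_src_addr&quot; : &quot;10.0.0.1&quot;
  }
}
</pre></div></div>
<p>the JSONPath expression <tt>$.data.ip_src_addr</tt> selects the value <tt>&quot;10.0.0.1&quot;</tt>.</p>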
<ul>
<li><a class="externalLink" href="http://goessner.net/articles/JsonPath/">JSON Path concept</a></li>
<li><a class="externalLink" href="https://github.com/json-path/JsonPath">Read about JSON Path library Apache Metron uses</a></li>
<li><a class="externalLink" href="http://jsonpath.herokuapp.com">Try JSON Path expressions online</a></li>
</ul></div>
</div>
</div>
</div>
<hr/>
<footer>
<div class="container-fluid">
<div class="row-fluid">
© 2015-2016 The Apache Software Foundation. Apache Metron, Metron, Apache, the Apache feather logo,
and the Apache Metron project logo are trademarks of The Apache Software Foundation.
</div>
</div>
</footer>
</body>
</html>