Several components (threat intelligence triage and field transformations) need to perform simple computations and transformations using the data from messages as variables.
For those purposes there is Stellar, a simple, scaled-down DSL designed for this kind of computation and transformation.
The query language supports the following:
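* String literals, quoted with either ' or "
* Escaping within string literals for ', ", \t, \r, \n, and backslash
  * '\'foo\'' would represent 'foo'
  * "\"foo\"" would represent "foo"
  * 'foo \\ bar' would represent foo \ bar
* Boolean operations: and, not, or
  * Boolean expressions are short-circuited (e.g. true or FUNC() would never execute FUNC)
* Arithmetic operations: *, /, +, - on real numbers or integers
* Comparison operations: <, >, <=, >=
* Equality operations: ==, !=
* if/then/else expressions (e.g. if var1 < 10 then 'less than 10' else '10 or more')
* Checking whether a field exists (via exists)
* An in operator that works like the in in Python

The following keywords need to be single quote escaped in order to be used in Stellar expressions: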
| not | else | exists | if | then |
| and | or | in | == | != |
| <= | > | >= | + | - |
| < | ? | * | / | , |
Using one of these symbols inside a value, such as "foo" : "<ok>", requires escaping: "foo": "'<ok>'"
Inclusion checks (in and not in):
* in supports string contains. e.g. 'foo' in 'foobar' == true
* in supports collection contains. e.g. 'foo' in [ 'foo', 'bar' ] == true
* in supports map key contains. e.g. 'foo' in { 'foo' : 5} == true
* not in is the negation of the in expression. e.g. 'grok' not in 'foobar' == true

Comparisons (<, <=, >, >=) and equality checks (==, !=):
Below is how the == operator is expected to work:
* ... == expression.

The != operator is the negation of the above.
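Because the in operator is described as working like the in in Python, the inclusion cases listed earlier can be tried out directly in Python. The sketch below is only an analogy written in Python, not Stellar code, and the helper name stellar_like_in is hypothetical.

```python
def stellar_like_in(needle, haystack):
    """Mimic the documented inclusion cases with Python's own `in`.

    Hypothetical helper for illustration only; it is not part of Stellar.
    """
    # str -> substring check, list -> membership, dict -> key membership
    return needle in haystack


if __name__ == "__main__":
    print(stellar_like_in('foo', 'foobar'))        # string contains
    print(stellar_like_in('foo', ['foo', 'bar']))  # collection contains
    print(stellar_like_in('foo', {'foo': 5}))      # map key contains
    print(not stellar_like_in('grok', 'foobar'))   # `not in` is the negation
```

Each of the four calls prints True, matching the four documented examples.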
Stellar provides the capability to pass lambda expressions to functions which wish to support that layer of indirection. The syntax is:
* (named_variables) -> stellar_expression : Lambda expression with named variables
  * For instance, a lambda expression which calls TO_UPPER on a named argument x could be expressed as (x) -> TO_UPPER(x).
* var -> stellar_expression : Lambda expression with a single named variable, var
  * For instance, a lambda expression which calls TO_UPPER on a named argument x could be expressed as x -> TO_UPPER(x). Note, this is more succinct but equivalent to the example directly above.
* () -> stellar_expression : Lambda expression with no named variables.
  * For instance, a lambda expression which returns the constant false would be () -> false

where
* named_variables is a comma separated list of variables to use in the Stellar expression
* stellar_expression is an arbitrary Stellar expression

In the core language functions, we support basic functional programming primitives such as
* MAP - Applies a lambda expression over a list of input. For instance MAP([ 'foo', 'bar'], (x) -> TO_UPPER(x) ) returns [ 'FOO', 'BAR' ]
* FILTER - Filters a list by a predicate in the form of a lambda expression. For instance FILTER([ 'foo', 'bar'], (x ) -> x == 'foo' ) returns [ 'foo' ]
* REDUCE - Applies a function over a list of input. For instance REDUCE([ 1, 2, 3], (sum, x) -> sum + x, 0 ) returns 6

The core functions are:
APPEND_IF_MISSING, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOP, CHOMP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, ENRICHMENT_EXISTS, ENRICHMENT_GET, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GEO_GET, GET, GET_FIRST, GET_LAST, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, KAFKA_GET, KAFKA_PROPS, KAFKA_PUT, KAFKA_TAIL, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, PREPEND_IF_MISSING, PROFILE_GET, PROFILE_FIXED, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, STRING_ENTROPY, SPLIT, STARTS_WITH, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR

For example:
* FILTER( [ 'foo', 'bar' ] , (x) -> x == 'foo') would yield [ 'foo']
* MAP( [ 'foo', 'bar' ] , (x) -> TO_UPPER(x) ) would yield [ 'FOO', 'BAR' ]
* REDUCE( [ 1, 2, 3 ] , (x, y) -> x + y, 0) would sum the input list, yielding 6.

The following is an example query (i.e. a function which returns a boolean) which might be seen in threat triage:
IN_SUBNET( ip, '192.168.0.0/24') or ip in [ '10.0.0.1', '10.0.0.2' ] or exists(is_local)
This evaluates to true when at least one of the following is true:
* The ip field is in the 192.168.0.0/24 subnet
* The ip field is 10.0.0.1 or 10.0.0.2
* The is_local field exists

The following is an example transformation which might be seen in a field transformation:
TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC'))
For a message with a timestamp and dc field, we want to transform the timestamp to an epoch timestamp given a timezone, which we look up in a separate map called dc2tz.
This will convert the timestamp field to an epoch timestamp based on the
* Format yyyy-MM-dd HH:mm:ss
* Value in dc2tz associated with the value of the dc field, defaulting to UTC

A microbenchmarking utility is included to assist in executing microbenchmarks for Stellar functions. The utility can be executed via maven using the exec plugin, like so, from the metron-common directory:
mvn -DskipTests clean package && \
mvn exec:java -Dexec.mainClass="org.apache.metron.common.stellar.benchmark.StellarMicrobenchmark" -Dexec.args="..."
where exec.args can be one of the following:
-e,--expressions <FILE> Stellar expressions
-h,--help Generate Help screen
-n,--num_times <NUM> Number of times to run per expression (after
warmup). Default: 1000
-o,--output <FILE> File to write output.
-p,--percentiles <NUM> Percentiles to calculate per run. Default:
50.0,75.0,95.0,99.0
-v,--variables <FILE> File containing a JSON Map of variables to use
-w,--warmup <NUM> Number of times for warmup per expression.
Default: 100
For instance, to run with a set of Stellar expressions in file /tmp/expressions.txt:
# simple functions
TO_UPPER('casey')
TO_LOWER(name)
# math functions
1 + 2*(3 + int_num) / 10.0
1.5 + 2*(3 + double_num) / 10.0
# conditionals
if ('foo' in ['foo']) OR one == very_nearly_one then 'one' else 'two'
1 + 2*(3 + int_num) / 10.0
#Network funcs
DOMAIN_TO_TLD(domain)
DOMAIN_REMOVE_SUBDOMAINS(domain)
And variables in file /tmp/variables.json:
{
"name" : "casey",
"int_num" : 1,
"double_num" : 17.5,
"one" : 1,
"very_nearly_one" : 1.000001,
"domain" : "www.google.com"
}
The following command runs the benchmark with these two files and writes the results to ./output.json:
mvn -DskipTests clean package && \
mvn exec:java -Dexec.mainClass="org.apache.metron.common.stellar.benchmark.StellarMicrobenchmark" \
  -Dexec.args="-e /tmp/expressions.txt -v /tmp/variables.json -o ./output.json"
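To make the warmup, num_times, and percentiles options more concrete, here is a minimal Python sketch of a benchmark loop with the same three knobs. It is not the implementation of StellarMicrobenchmark; the function names, the nearest-rank percentile method, and the use of a Python lambda as a stand-in for a Stellar expression are all assumptions made for illustration.

```python
import math
import time


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_samples)))
    return sorted_samples[rank - 1]


def benchmark(func, warmup=100, num_times=1000,
              percentiles=(50.0, 75.0, 95.0, 99.0)):
    """Call `func` untimed `warmup` times, then time `num_times` calls.

    Returns a dict mapping each requested percentile to a duration in
    nanoseconds. Defaults mirror the documented option defaults.
    """
    for _ in range(warmup):
        func()
    samples = []
    for _ in range(num_times):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    samples.sort()
    return {pct: percentile(samples, pct) for pct in percentiles}


if __name__ == "__main__":
    # Stand-in for evaluating an expression such as TO_UPPER('casey').
    results = benchmark(lambda: 'casey'.upper())
    for pct, nanos in results.items():
        print(f"p{pct}: {nanos} ns")
```

The warmup calls are discarded so that one-time costs do not distort the reported percentiles, which is the usual reason a benchmark separates warmup runs from measured runs.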
The Stellar Shell is a REPL (Read Eval Print Loop) for the Stellar language that helps with troubleshooting, learning Stellar, or even interacting with a live Metron cluster.
The Stellar DSL (domain specific language) is used to act upon streaming data within Apache Storm. It is difficult to troubleshoot Stellar when it can only be executed within a Storm topology. This REPL is intended to help mitigate that problem by allowing a user to replicate data encountered in production, isolate initialization errors, or understand function resolution problems.
The shell supports customization via ~/.inputrc as it is backed by a proper readline implementation.
Shell-like operations are supported such as
Note: Stellar classpath configuration from the global config is honored here if the REPL knows about zookeeper.
To run the Stellar Shell from within a deployed Metron cluster, run the following command on the host where Metron is installed.
$ $METRON_HOME/bin/stellar
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, ...
[Stellar]>>> ?PROTOCOL_TO_NAME
PROTOCOL_TO_NAME
desc: Convert the IANA protocol number to the protocol name
args: IANA Number
ret: The protocol name associated with the IANA number.
[Stellar]>>> ip.protocol := 6
6
[Stellar]>>> PROTOCOL_TO_NAME(ip.protocol)
TCP
$ $METRON_HOME/bin/stellar -h
usage: stellar
-h,--help Print help
-irc,--inputrc <arg> File containing the inputrc if not the default
~/.inputrc
-v,--variables <arg> File containing a JSON Map of variables
-z,--zookeeper <arg> Zookeeper URL
-na,--no_ansi Make the input prompt not use ANSI colors.
-v, --variables (optional)
Optionally load a JSON map which contains variable assignments. This is intended to give you the ability to save off a message from Metron and work on it via the REPL.
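As a rough illustration of preparing such a file, the Python sketch below writes a message to disk as a JSON map. The field names and values are hypothetical placeholders; substitute a message captured from your own environment.

```python
import json
import os
import tempfile

# Hypothetical message fields, used only for illustration.
message = {
    "ip_src_addr": "10.0.0.1",
    "ip_dst_addr": "192.168.0.7",
    "protocol": 6,
}


def save_variables(variables, path):
    """Write `variables` to `path` as a JSON map (one top-level object)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(variables, handle, indent=2)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "variables.json")
    save_variables(message, path)
    print(f"wrote {path}")
```

The resulting file can then be passed to the shell with the -v option.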
-z, --zookeeper (optional)
Attempts to connect to Zookeeper and read the Metron global configuration. Stellar functions may require the global configuration to work properly. If found, the global configuration values are printed to the console. If specified, then the classpath may be augmented by the paths specified in the stellar config in the global config.
$ $METRON_HOME/bin/stellar -z node1:2181
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>>
Stellar has no concept of variable assignment. For testing and debugging purposes, it is important to be able to create variables that simulate data contained within incoming messages. The REPL therefore provides a means of variable assignment outside of the core Stellar language, via the := operator. For example, foo := 1 + 1 would assign the result of the Stellar expression 1 + 1 to the variable foo.
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> 2 + 2
4.0
The REPL has a set of magic commands that provide the REPL user with information about the Stellar execution environment. The following magic commands are supported.
%functions
This command lists all functions resolvable in the Stellar environment. Stellar searches the classpath for Stellar functions. This can make it difficult in some cases to understand which functions are resolvable.
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, GET, GET_FIRST, GET_LAST, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP_EXISTS, MAP_GET, MONTH, PROTOCOL_TO_NAME, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
[Stellar]>>>
%vars
Lists all variables in the Stellar environment.
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %vars
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> %vars
foo = 4.0
?<function>
Returns formatted documentation of the Stellar function. Provides the description of the function along with the expected arguments.
[Stellar]>>> ?BLOOM_ADD
BLOOM_ADD
desc: Adds an element to the bloom filter passed in
args: bloom - The bloom filter, value* - The values to add
ret: Bloom Filter
[Stellar]>>> ?IS_EMAIL
IS_EMAIL
desc: Tests if a string is a valid email address
args: address - The String to test
ret: True if the string is a valid email address and false otherwise.
[Stellar]>>>
To run the Stellar Shell directly from the Metron source code, run a command like the following. Ensure that Metron has already been built and installed with mvn clean install -DskipTests.
$ mvn exec:java \
   -Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
   -pl metron-platform/metron-enrichment
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
[Stellar]>>> %functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, ENRICHMENT_EXISTS, ENRICHMENT_GET, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GEO_GET, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
Changing the project passed to the -pl argument will define which dependencies are included and ultimately which Stellar functions are available within the shell environment.
This can be useful for troubleshooting function resolution problems. The previous example defines which functions are available during Enrichment. For example, to determine which functions are available within the Profiler run the following.
$ mvn exec:java \
   -Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
   -pl metron-analytics/metron-profiler
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
%functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
The format of the global configuration is a JSON String to Object map. It is intended for configuration that is not specific to any one sensor.
This configuration is stored in zookeeper, but looks something like
{
  "es.clustername": "metron",
  "es.ip": "node1",
  "es.port": "9300",
  "es.date.format": "yyyy.MM.dd.HH",
  "parser.error.topic": "indexing",
  "fieldValidations" : [
    {
      "input" : [ "ip_src_addr", "ip_dst_addr" ],
      "validation" : "IP",
      "config" : {
        "type" : "IPV4"
      }
    }
  ]
}
Stellar can be configured in a variety of ways from the global config. In particular, there are three main configuration parameters around configuring Stellar:
* stellar.function.paths
* stellar.function.resolver.includes
* stellar.function.resolver.excludes

stellar.function.paths
If specified, Stellar will use a custom classloader which wraps the context classloader and allows for the resolution of classes stored in jars that are not shipped with Metron. These jars can be stored in a variety of mediums.
This path is a comma separated list of locations, which may end in a regular expression, as in this example:
{
  ...
  "stellar.function.paths" : "hdfs://node1:8020/apps/metron/stellar/metron-management-0.4.0.jar, hdfs://node1:8020/apps/metron/3rdparty/.*.jar"
}
Please be aware that this classloader does not reload functions dynamically. The classpath specified here in the global config is read on topology start, so for now a change in classpath requires a topology restart to be picked up.
stellar.function.resolver.{includes,excludes}
If specified, this defines one or more regular expressions applied to the classes implementing the Stellar function that specify what should be included when searching for Stellar functions.
* stellar.function.resolver.includes defines the list of classes to include.
* stellar.function.resolver.excludes defines the list of classes to exclude.

{
  ...
  "stellar.function.resolver.includes" : "org.apache.metron.*,com.myorg.stellar.*"
}
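One way to picture how include and exclude patterns could narrow the set of resolvable classes is the Python sketch below. It is not Metron's resolver: the comma splitting, the use of full-string matching, and the rule that excludes take precedence over includes are all assumptions made for illustration, and the class names in the usage are hypothetical.

```python
import re


def split_patterns(value):
    """Split a comma separated setting into stripped, non-empty patterns."""
    return [part.strip() for part in value.split(",") if part.strip()]


def is_resolvable(class_name, includes="", excludes=""):
    """Assumed semantics: keep a class if it matches any include pattern
    (or no includes are given) and matches no exclude pattern."""
    include_patterns = split_patterns(includes)
    exclude_patterns = split_patterns(excludes)
    included = (not include_patterns or
                any(re.fullmatch(p, class_name) for p in include_patterns))
    excluded = any(re.fullmatch(p, class_name) for p in exclude_patterns)
    return included and not excluded


if __name__ == "__main__":
    includes = "org.apache.metron.*,com.myorg.stellar.*"
    # Hypothetical class names, used only to exercise the patterns.
    for name in ("org.apache.metron.SomeFunctions", "com.other.Functions"):
        print(name, is_resolvable(name, includes=includes))
```

Note that in a regular expression an unescaped dot matches any character, so a pattern such as org.apache.metron.* is broader than a literal package prefix.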
Inside the global configuration, there is a validation framework that checks that messages coming from all parsers are valid. This is done in the form of validation plugins where assertions about fields or whole messages can be made.
The format for this is a fieldValidations field inside of global config. This is associated with an array of field validation objects structured like so:
* input : An array of input fields or a single field. If this is omitted, then the whole message is passed to the validator.
* config : A String to Object map for validation configuration. This is optional if the validation function requires no configuration.
* validation : The validation function to be used. This is one of
  * STELLAR : Execute a Stellar Language statement. Expects the query string in the condition field of the config.
  * IP : Validates that the input fields are an IP address. By default, if no configuration is set, it assumes IPV4. You can specify the type by passing type in the config with either IPV6 or IPV4, or with a list [IPV4,IPV6], in which case the input(s) will be validated against both.
  * DOMAIN : Validates that the fields are all domains.
  * EMAIL : Validates that the fields are all email addresses.
  * URL : Validates that the fields are all URLs.
  * DATE : Validates that the fields are a date. Expects format in the config.
  * INTEGER : Validates that the fields are an integer. String representation of an integer is allowed.
  * REGEX_MATCH : Validates that the fields match a regex. Expects pattern in the config.
  * NOT_EMPTY : Validates that the fields exist and are not empty (after trimming).

Configurations should be stored on disk in the following structure starting at $BASE_DIR:
* sensors : The subdirectory containing sensor enrichment configuration JSON (e.g. snort.json, bro.json)

By default, this directory as deployed by the ansible infrastructure is at $METRON_HOME/config/zookeeper
While the configs are stored on disk, they must be loaded into Zookeeper to be used. A utility program, $METRON_HOME/bin/zk_load_configs.sh, assists with this.
This has the following options:
-f,--force Force operation
-h,--help Generate Help screen
-i,--input_dir <DIR> The input directory containing
the configuration files named
like "$source.json"
-m,--mode <MODE> The mode of operation: DUMP,
PULL, PUSH
-o,--output_dir <DIR> The output directory which will
store the JSON configuration
from Zookeeper
-z,--zk_quorum <host:port,[host:port]*> Zookeeper Quorum URL
(zk1:port,zk2:port,...)
Usage examples:
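* Dump the configs currently stored in Zookeeper: $METRON_HOME/bin/zk_load_configs.sh -z node1:2181 -m DUMP
* Push the configs from an input directory into Zookeeper: $METRON_HOME/bin/zk_load_configs.sh -z node1:2181 -m PUSH -i $METRON_HOME/config/zookeeper
* Pull the configs from Zookeeper into an output directory, forcing the operation: $METRON_HOME/bin/zk_load_configs.sh -z node1:2181 -m PULL -o $METRON_HOME/config/zookeeper -f

Errors generated in Metron topologies are transformed into JSON format and follow this structure: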
{
"exception": "java.lang.IllegalStateException: Unable to parse Message: ...",
"failed_sensor_type": "bro",
"stack": "java.lang.IllegalStateException: Unable to parse Message: ...",
"hostname": "node1",
"source:type": "error",
"raw_message": "{\"http\": {\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
"error_hash": "f7baf053f2d3c801a01d196f40f3468e87eea81788b2567423030100865c5061",
"error_type": "parser_error",
"message": "Unable to parse Message: {\"http\": {\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
"timestamp": 1488809630698
}
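Since every error record shares this structure, records can be grouped by their fields when investigating failures. The Python sketch below counts records per sensor and error type; the sample records are hypothetical and shortened, reusing only field names from the structure shown here.

```python
import json
from collections import Counter

# Hypothetical, shortened error records using the documented field names.
RAW_ERRORS = [
    '{"failed_sensor_type": "bro", "error_type": "parser_error", '
    '"hostname": "node1", "timestamp": 1488809630698}',
    '{"failed_sensor_type": "bro", "error_type": "parser_error", '
    '"hostname": "node1", "timestamp": 1488809631002}',
    '{"failed_sensor_type": "snort", "error_type": "parser_error", '
    '"hostname": "node1", "timestamp": 1488809632417}',
]


def count_errors(raw_records):
    """Count error records per (failed_sensor_type, error_type) pair."""
    counts = Counter()
    for raw in raw_records:
        record = json.loads(raw)
        key = (record.get("failed_sensor_type"), record.get("error_type"))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for (sensor, error_type), total in count_errors(RAW_ERRORS).items():
        print(f"{sensor} {error_type}: {total}")
```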
Each topology can be configured to send error messages to a specific Kafka topic. The parser topologies retrieve this setting from the parser.error.topic setting in the global config:
{
"es.clustername": "metron",
"es.ip": "node1",
"es.port": "9300",
"es.date.format": "yyyy.MM.dd.HH",
"parser.error.topic": "indexing"
}
Error topics for enrichment and threat intel errors are passed into the enrichment topology as flux properties named enrichment.error.topic and threat.intel.error.topic. These properties can be found in $METRON_HOME/config/enrichment.properties.
The error topic for indexing errors is passed into the indexing topology as a flux property named index.error.topic. This property can be found in either $METRON_HOME/config/elasticsearch.properties or $METRON_HOME/config/solr.properties depending on the search engine selected.
By default all error messages are sent to the indexing topic so that they are indexed and archived, just like other messages. The indexing config for error messages can be found at $METRON_HOME/config/zookeeper/indexing/error.json.