For those purposes, there exists a simple, scaled down DSL created to do simple computation and transformation.
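The query language supports the following:
String literals are quoted with either ' or ".
String literals support escaping for ', ", \t, \r, \n, and backslash:
'\'foo\'' would represent 'foo'
"\"foo\"" would represent "foo"
'foo \\ bar' would represent foo \ bar
Boolean operations: and, not, or. Boolean expressions are short-circuited (e.g. true or FUNC() would never execute FUNC).
Arithmetic operations: *, /, +, - on real numbers or integers
Comparison operations: <, >, <=, >=
Equality comparison operations: ==, !=
if/then/else expressions (e.g. if var1 < 10 then 'less than 10' else '10 or more')
Checking whether a field exists (via exists)
An in operator that works like the in operator in Python
The following keywords need to be single quote escaped in order to be used in Stellar expressions: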
not | else | exists | if | then |
and | or | in | == | != |
<= | > | >= | + | - |
< | ? | * | / | , |
Using angle brackets in a value, such as "foo" : "<ok>", requires escaping: "foo" : "'<ok>'"
Inclusion checks (in and not in):
in supports string contains, e.g. 'foo' in 'foobar' == true
in supports collection contains, e.g. 'foo' in [ 'foo', 'bar' ] == true
in supports map key contains, e.g. 'foo' in { 'foo' : 5 } == true
not in is the negation of the in expression, e.g. 'grok' not in 'foobar' == true
Comparisons (<, <=, >, >=)
Equality checks (==, !=):
Below is how the == operator is expected to work:
== expression.
The != operator is the negation of the above.
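Since the in operator is described as working like the in operator in Python, the documented inclusion checks can be mirrored directly in Python. The sketch below is only an analogy for illustration: it does not execute Stellar, and the helper name stellar_in is hypothetical.

```python
# Python analogy for Stellar's `in` / `not in` checks (illustrative only).

def stellar_in(needle, haystack):
    """Mirror the three documented cases: string, collection, and map key."""
    return needle in haystack


string_contains = stellar_in('foo', 'foobar')            # substring check
collection_contains = stellar_in('foo', ['foo', 'bar'])  # list membership
map_key_contains = stellar_in('foo', {'foo': 5})         # dict key membership
not_in = not stellar_in('grok', 'foobar')                # negation of `in`

print(string_contains, collection_contains, map_key_contains, not_in)
# prints: True True True True
```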
Stellar provides the capability to pass lambda expressions to functions which wish to support that layer of indirection. The syntax is:
(named_variables) -> stellar_expression : Lambda expression with named variables. For instance, a lambda expression that calls TO_UPPER on a named argument x could be expressed as (x) -> TO_UPPER(x).
var -> stellar_expression : Lambda expression with a single named variable, var. For instance, a lambda expression that calls TO_UPPER on a named argument x could be expressed as x -> TO_UPPER(x). Note, this is more succinct but equivalent to the example directly above.
() -> stellar_expression : Lambda expression with no named variables. For instance, a lambda expression that returns the constant false would be () -> false.
where
named_variables is a comma separated list of variables to use in the Stellar expression
stellar_expression is an arbitrary Stellar expression
In the core language functions, we support basic functional programming primitives such as
MAP - Applies a lambda expression over a list of input. For instance MAP([ 'foo', 'bar'], (x) -> TO_UPPER(x) ) returns [ 'FOO', 'BAR' ]
FILTER - Filters a list by a predicate in the form of a lambda expression. For instance FILTER([ 'foo', 'bar'], (x ) -> x == 'foo' ) returns [ 'foo' ]
REDUCE - Applies a function over a list of input. For instance REDUCE([ 1, 2, 3], (sum, x) -> sum + x, 0 ) returns 6
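For readers more familiar with general-purpose languages, the three primitives behave like the usual map, filter and fold operations. The following Python sketch reproduces the three examples above; the function names stellar_map, stellar_filter and stellar_reduce are hypothetical stand-ins, not part of Stellar or Metron.

```python
from functools import reduce


def stellar_map(items, fn):
    """Apply fn to every element, like MAP(list, lambda)."""
    return [fn(x) for x in items]


def stellar_filter(items, predicate):
    """Keep the elements for which predicate is true, like FILTER(list, lambda)."""
    return [x for x in items if predicate(x)]


def stellar_reduce(items, fn, initial):
    """Fold the list into one value, like REDUCE(list, lambda, initial)."""
    return reduce(fn, items, initial)


upper = stellar_map(['foo', 'bar'], lambda x: x.upper())
only_foo = stellar_filter(['foo', 'bar'], lambda x: x == 'foo')
total = stellar_reduce([1, 2, 3], lambda acc, x: acc + x, 0)

print(upper, only_foo, total)
# prints: ['FOO', 'BAR'] ['foo'] 6
```

As in the Stellar REDUCE example, the first lambda argument is the running accumulator and the second is the current element.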
APPEND_IF_MISSING
BLOOM_ADD
BLOOM_EXISTS
BLOOM_INIT
BLOOM_MERGE
CHOP
CHOMP
COUNT_MATCHES
DAY_OF_MONTH
DAY_OF_WEEK
DAY_OF_YEAR
DOMAIN_REMOVE_SUBDOMAINS
DOMAIN_REMOVE_TLD
DOMAIN_TO_TLD
ENDS_WITH
ENRICHMENT_EXISTS
ENRICHMENT_GET
FILL_LEFT
FILL_RIGHT
FILTER
FILTER( [ 'foo', 'bar' ] , (x) -> x == 'foo')
would yield [ 'foo']
FORMAT
GEO_GET
GET
GET_FIRST
GET_LAST
IN_SUBNET
IS_DATE
IS_DOMAIN
IS_EMAIL
IS_EMPTY
IS_INTEGER
IS_IP
IS_URL
JOIN
KAFKA_GET
KAFKA_PROPS
KAFKA_PUT
KAFKA_TAIL
LENGTH
LIST_ADD
MAAS_GET_ENDPOINT
MAAS_MODEL_APPLY
MAP
MAP( [ 'foo', 'bar' ] , (x) -> TO_UPPER(x) )
would yield [ 'FOO', 'BAR' ]
MAP_EXISTS
MAP_GET
MONTH
PREPEND_IF_MISSING
PROFILE_GET
PROFILE_FIXED
PROFILE_WINDOW
PROTOCOL_TO_NAME
REDUCE
REDUCE( [ 1, 2, 3 ] , (x, y) -> x + y, 0)
would sum the input list, yielding 6.
REGEXP_MATCH
STRING_ENTROPY
SPLIT
STARTS_WITH
SYSTEM_ENV_GET
SYSTEM_PROPERTY_GET
TO_DOUBLE
TO_EPOCH_TIMESTAMP
TO_FLOAT
TO_INTEGER
TO_LONG
TO_LOWER
TO_STRING
TO_UPPER
TRIM
URL_TO_HOST
URL_TO_PATH
URL_TO_PORT
URL_TO_PROTOCOL
WEEK_OF_MONTH
WEEK_OF_YEAR
YEAR
The following is an example query (i.e. a function which returns a boolean) which might be seen in threat triage:
IN_SUBNET( ip, '192.168.0.0/24') or ip in [ '10.0.0.1', '10.0.0.2' ] or exists(is_local)
This evaluates to true precisely when at least one of the following is true:
The ip field is in the 192.168.0.0/24 subnet
The ip field is 10.0.0.1 or 10.0.0.2
The is_local field exists
The following is an example transformation which might be seen in a field transformation:
TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC'))
For a message with a timestamp and dc field, we want to transform the timestamp to an epoch timestamp given a timezone which we will look up in a separate map, called dc2tz.
This will convert the timestamp field to an epoch timestamp based on the format yyyy-MM-dd HH:mm:ss and the timezone in dc2tz associated with the value of the field dc, defaulting to UTC.
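The two examples above can be modeled in plain Python to make their logic concrete. This is a rough sketch under stated assumptions, not how Metron evaluates Stellar: the function names and the contents of DC2TZ are hypothetical, time zones are represented as fixed UTC offsets so that only the standard library is needed, and returning milliseconds is an assumption about the result unit.

```python
import ipaddress
from datetime import datetime, timedelta, timezone


def is_triage_match(message):
    """Model of: IN_SUBNET(ip, '192.168.0.0/24') or ip in [...] or exists(is_local)."""
    ip = message.get('ip')
    in_subnet = (
        ip is not None
        and ipaddress.ip_address(ip) in ipaddress.ip_network('192.168.0.0/24')
    )
    return in_subnet or ip in ['10.0.0.1', '10.0.0.2'] or 'is_local' in message


# Hypothetical data-center-to-timezone map (fixed offsets, illustration only).
DC2TZ = {'nyc': timezone(timedelta(hours=-5)), 'lon': timezone.utc}


def to_epoch_millis(message, dc2tz=DC2TZ):
    """Model of: TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC'))."""
    # MAP_GET(dc, dc2tz, 'UTC'): look up the zone for dc, defaulting to UTC.
    tz = dc2tz.get(message.get('dc'), timezone.utc)
    # '%Y-%m-%d %H:%M:%S' is the strptime equivalent of yyyy-MM-dd HH:mm:ss.
    parsed = datetime.strptime(message['timestamp'], '%Y-%m-%d %H:%M:%S')
    return int(parsed.replace(tzinfo=tz).timestamp() * 1000)


print(is_triage_match({'ip': '192.168.0.17'}))
print(to_epoch_millis({'timestamp': '1970-01-01 00:00:10', 'dc': 'unknown'}))
```

The second call uses a dc value that is not in the map, so the default of UTC applies.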
A microbenchmarking utility is included to assist in executing microbenchmarks for Stellar functions. The utility can be executed via Maven using the exec plugin, like so, from the metron-common directory:
mvn -DskipTests clean package && \
mvn exec:java -Dexec.mainClass="org.apache.metron.common.stellar.benchmark.StellarMicrobenchmark" -Dexec.args="..."
where exec.args accepts the following options:
-e,--expressions <FILE>   Stellar expressions
-h,--help                 Generate Help screen
-n,--num_times <NUM>      Number of times to run per expression (after warmup). Default: 1000
-o,--output <FILE>        File to write output.
-p,--percentiles <NUM>    Percentiles to calculate per run. Default: 50.0,75.0,95.0,99.0
-v,--variables <FILE>     File containing a JSON Map of variables to use
-w,--warmup <NUM>         Number of times for warmup per expression. Default: 100
For instance, to run with a set of Stellar expressions in the file /tmp/expressions.txt:
# simple functions
TO_UPPER('casey')
TO_LOWER(name)
# math functions
1 + 2*(3 + int_num) / 10.0
1.5 + 2*(3 + double_num) / 10.0
# conditionals
if ('foo' in ['foo']) OR one == very_nearly_one then 'one' else 'two'
1 + 2*(3 + int_num) / 10.0
#Network funcs
DOMAIN_TO_TLD(domain)
DOMAIN_REMOVE_SUBDOMAINS(domain)
And variables in the file /tmp/variables.json:
{
  "name" : "casey",
  "int_num" : 1,
  "double_num" : 17.5,
  "one" : 1,
  "very_nearly_one" : 1.000001,
  "domain" : "www.google.com"
}
The following command runs the benchmark and writes the results to /tmp/output.txt:
mvn -DskipTests clean package && \
mvn exec:java -Dexec.mainClass="org.apache.metron.common.stellar.benchmark.StellarMicrobenchmark" \
  -Dexec.args="-e /tmp/expressions.txt -v /tmp/variables.json -o /tmp/output.txt"
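To make the warmup, num_times and percentiles options more concrete, here is a minimal Python sketch of the general shape of such a measurement loop. It is not the StellarMicrobenchmark implementation: the function names are hypothetical, the nearest-rank percentile method is an assumption, and a plain Python lambda stands in for a Stellar expression. Only the default values (100 warmup runs, 1000 timed runs, percentiles 50, 75, 95 and 99) are taken from the option list above.

```python
import math
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]


def benchmark(fn, warmup=100, num_times=1000,
              percentiles=(50.0, 75.0, 95.0, 99.0)):
    """Run fn untimed for warmup iterations, then time num_times iterations."""
    for _ in range(warmup):
        fn()
    timings_ns = []
    for _ in range(num_times):
        start = time.perf_counter_ns()
        fn()
        timings_ns.append(time.perf_counter_ns() - start)
    timings_ns.sort()
    return {p: percentile(timings_ns, p) for p in percentiles}


# Stand-in for the expression 1 + 2*(3 + int_num) / 10.0 with int_num = 1.
variables = {'int_num': 1}
result = benchmark(lambda: 1 + 2 * (3 + variables['int_num']) / 10.0)
for pct, nanos in result.items():
    print(f'p{pct}: {nanos} ns')
```

Discarding the warmup iterations keeps one-off start-up costs out of the reported percentiles, which is the usual reason for a separate warmup count.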
The Stellar Shell is a REPL (Read Eval Print Loop) for the Stellar language that helps with troubleshooting, learning Stellar, or even interacting with a live Metron cluster.
The Stellar DSL (domain specific language) is used to act upon streaming data within Apache Storm. It is difficult to troubleshoot Stellar when it can only be executed within a Storm topology. This REPL is intended to help mitigate that problem by allowing a user to replicate data encountered in production, isolate initialization errors, or understand function resolution problems.
The shell supports customization via ~/.inputrc as it is backed by a proper readline implementation.
Shell-like operations are supported such as
Note: Stellar classpath configuration from the global config is honored here if the REPL knows about zookeeper.
To run the Stellar Shell from within a deployed Metron cluster, run the following command on the host where Metron is installed.
$ $METRON_HOME/bin/stellar
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, ...
[Stellar]>>> ?PROTOCOL_TO_NAME
PROTOCOL_TO_NAME
 desc: Convert the IANA protocol number to the protocol name
 args: IANA Number
  ret: The protocol name associated with the IANA number.
[Stellar]>>> ip.protocol := 6
6
[Stellar]>>> PROTOCOL_TO_NAME(ip.protocol)
TCP
$ $METRON_HOME/bin/stellar -h
usage: stellar
 -h,--help              Print help
 -irc,--inputrc <arg>   File containing the inputrc if not the default ~/.inputrc
 -v,--variables <arg>   File containing a JSON Map of variables
 -z,--zookeeper <arg>   Zookeeper URL
 -na,--no_ansi          Make the input prompt not use ANSI colors.
-v, --variables
Optional
Optionally load a JSON map which contains variable assignments. This is intended to give you the ability to save off a message from Metron and work on it via the REPL.
-z, --zookeeper
Optional
Attempts to connect to Zookeeper and read the Metron global configuration. Stellar functions may require the global configuration to work properly. If found, the global configuration values are printed to the console. If specified, then the classpath may be augmented by the paths specified in the stellar config in the global config.
$ $METRON_HOME/bin/stellar -z node1:2181
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>>
Stellar has no concept of variable assignment. For testing and debugging purposes, it is important to be able to create variables that simulate data contained within incoming messages. The REPL provides a means for a user to perform variable assignment outside of the core Stellar language. This is done via the := operator; for example, foo := 1 + 1 would assign the result of the Stellar expression 1 + 1 to the variable foo.
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> 2 + 2
4.0
The REPL has a set of magic commands that provide the REPL user with information about the Stellar execution environment. The following magic commands are supported.
%functions
This command lists all functions resolvable in the Stellar environment. Stellar searches the classpath for Stellar functions. This can make it difficult in some cases to understand which functions are resolvable.
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, GET, GET_FIRST, GET_LAST, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP_EXISTS, MAP_GET, MONTH, PROTOCOL_TO_NAME, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
[Stellar]>>>
%vars
Lists all variables in the Stellar environment.
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %vars
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> %vars
foo = 4.0
?<function>
Returns formatted documentation of the Stellar function. Provides the description of the function along with the expected arguments.
[Stellar]>>> ?BLOOM_ADD
BLOOM_ADD
 desc: Adds an element to the bloom filter passed in
 args: bloom - The bloom filter, value* - The values to add
  ret: Bloom Filter
[Stellar]>>> ?IS_EMAIL
IS_EMAIL
 desc: Tests if a string is a valid email address
 args: address - The String to test
  ret: True if the string is a valid email address and false otherwise.
[Stellar]>>>
To run the Stellar Shell directly from the Metron source code, run a command like the following. Ensure that Metron has already been built and installed with mvn clean install -DskipTests.
$ mvn exec:java \
   -Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
   -pl metron-platform/metron-enrichment
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
[Stellar]>>> %functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, ENRICHMENT_EXISTS, ENRICHMENT_GET, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GEO_GET, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
Changing the project passed to the -pl argument will define which dependencies are included and ultimately which Stellar functions are available within the shell environment.
This can be useful for troubleshooting function resolution problems. The previous example defines which functions are available during Enrichment. For example, to determine which functions are available within the Profiler, run the following.
$ mvn exec:java \
   -Dexec.mainClass="org.apache.metron.common.stellar.shell.StellarShell" \
   -pl metron-analytics/metron-profiler
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
%functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
The format of the global configuration is a JSON String to Object map. It is intended for configuration that is not sensor specific.
This configuration is stored in Zookeeper and looks something like:
{
  "es.clustername": "metron",
  "es.ip": "node1",
  "es.port": "9300",
  "es.date.format": "yyyy.MM.dd.HH",
  "parser.error.topic": "indexing",
  "fieldValidations" : [
    {
      "input" : [ "ip_src_addr", "ip_dst_addr" ],
      "validation" : "IP",
      "config" : {
        "type" : "IPV4"
      }
    }
  ]
}
Stellar can be configured in a variety of ways from the global config. In particular, there are three main configuration parameters around configuring Stellar:
stellar.function.paths
stellar.function.resolver.includes
stellar.function.resolver.excludes
stellar.function.paths
If specified, Stellar will use a custom classloader which will wrap the context classloader and allow for the resolution of classes stored in jars not shipped with Metron and stored in a variety of mediums:
This path is a comma separated list of
{ ... "stellar.function.paths" : "hdfs://node1:8020/apps/metron/stellar/metron-management-0.4.0.jar, hdfs://node1:8020/apps/metron/3rdparty/.*.jar" }
Please be aware that this classloader does not reload functions dynamically; the classpath specified here in the global config is read on topology start. For a classpath change to be picked up, the topology must currently be restarted.
stellar.function.resolver.{includes,excludes}
If specified, this defines one or more regular expressions applied to the classes implementing the Stellar function that specify what should be included when searching for Stellar functions.
stellar.function.resolver.includes defines the list of classes to include.
stellar.function.resolver.excludes defines the list of classes to exclude.
{
  ...
  "stellar.function.resolver.includes" : "org.apache.metron.*,com.myorg.stellar.*"
}
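One way to picture how include and exclude patterns narrow the set of candidate classes is the Python sketch below. It is an illustration only, not Metron's resolver: the function name filter_classes and the class names are hypothetical, and both the comma-splitting and the use of full-string regex matching are assumptions about how the patterns are applied.

```python
import re


def filter_classes(class_names, includes=None, excludes=None):
    """Keep names matching any include pattern and no exclude pattern."""
    def compile_all(spec):
        # Assumption: patterns are given as one comma separated string.
        return [re.compile(p.strip()) for p in spec.split(',')] if spec else []

    include_patterns = compile_all(includes)
    exclude_patterns = compile_all(excludes)
    kept = []
    for name in class_names:
        # With no include patterns, every class is a candidate.
        included = (not include_patterns
                    or any(p.fullmatch(name) for p in include_patterns))
        excluded = any(p.fullmatch(name) for p in exclude_patterns)
        if included and not excluded:
            kept.append(name)
    return kept


# Hypothetical class names, for illustration only.
candidates = [
    'org.apache.metron.common.Foo',
    'com.myorg.stellar.Bar',
    'com.other.Baz',
]
print(filter_classes(candidates,
                     includes='org.apache.metron.*,com.myorg.stellar.*'))
# prints: ['org.apache.metron.common.Foo', 'com.myorg.stellar.Bar']
```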
The global configuration includes a validation framework that checks that messages coming from all parsers are valid. This is done in the form of validation plugins where assertions about fields or whole messages can be made.
The format for this is a fieldValidations field inside of the global config. This is associated with an array of field validation objects structured like so:
input : An array of input fields or a single field. If this is omitted, then the whole message is passed to the validator.
config : A String to Object map for validation configuration. This is optional if the validation function requires no configuration.
validation : The validation function to be used. This is one of
  STELLAR : Execute a Stellar Language statement. Expects the query string in the condition field of the config.
  IP : Validates that the input fields are an IP address. By default, if no configuration is set, it assumes IPV4, but you can specify the type by passing type in the config with either IPV6 or IPV4, or by passing in a list [IPV4,IPV6], in which case the input(s) will be validated against both.
  DOMAIN : Validates that the fields are all domains.
  EMAIL : Validates that the fields are all email addresses.
  URL : Validates that the fields are all URLs.
  DATE : Validates that the fields are a date. Expects format in the config.
  INTEGER : Validates that the fields are an integer. String representation of an integer is allowed.
  REGEX_MATCH : Validates that the fields match a regex. Expects pattern in the config.
  NOT_EMPTY : Validates that the fields exist and are not empty (after trimming).
Configurations should be stored on disk in the following structure starting at $BASE_DIR:
sensors : The subdirectory containing sensor enrichment configuration JSON (e.g. snort.json, bro.json)
By default, this directory as deployed by the Ansible infrastructure is at $METRON_HOME/config/zookeeper
While the configs are stored on disk, they must be loaded into Zookeeper to be used. A utility program, $METRON_HOME/bin/zk_load_configs.sh, assists with this.
This has the following options:
 -f,--force                                Force operation
 -h,--help                                 Generate Help screen
 -i,--input_dir <DIR>                      The input directory containing the configuration files named like "$source.json"
 -m,--mode <MODE>                          The mode of operation: DUMP, PULL, PUSH
 -o,--output_dir <DIR>                     The output directory which will store the JSON configuration from Zookeeper
 -z,--zk_quorum <host:port,[host:port]*>   Zookeeper Quorum URL (zk1:port,zk2:port,...)
Usage examples:
$METRON_HOME/bin/zk_load_configs.sh -z node1:2181 -m DUMP
$METRON_HOME/bin/zk_load_configs.sh -z node1:2181 -m PUSH -i $METRON_HOME/config/zookeeper
$METRON_HOME/bin/zk_load_configs.sh -z node1:2181 -m PULL -o $METRON_HOME/config/zookeeper -f
Errors generated in Metron topologies are transformed into JSON format and follow this structure:
{
  "exception": "java.lang.IllegalStateException: Unable to parse Message: ...",
  "failed_sensor_type": "bro",
  "stack": "java.lang.IllegalStateException: Unable to parse Message: ...",
  "hostname": "node1",
  "source:type": "error",
  "raw_message": "{\"http\": {\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
  "error_hash": "f7baf053f2d3c801a01d196f40f3468e87eea81788b2567423030100865c5061",
  "error_type": "parser_error",
  "message": "Unable to parse Message: {\"http\": {\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
  "timestamp": 1488809630698
}
Each topology can be configured to send error messages to a specific Kafka topic. The parser topologies retrieve this setting from the parser.error.topic setting in the global config:
{
  "es.clustername": "metron",
  "es.ip": "node1",
  "es.port": "9300",
  "es.date.format": "yyyy.MM.dd.HH",
  "parser.error.topic": "indexing"
}
Error topics for enrichment and threat intel errors are passed into the enrichment topology as flux properties named enrichment.error.topic and threat.intel.error.topic. These properties can be found in $METRON_HOME/config/enrichment.properties.
The error topic for indexing errors is passed into the indexing topology as a flux property named index.error.topic. This property can be found in either $METRON_HOME/config/elasticsearch.properties or $METRON_HOME/config/solr.properties, depending on the search engine selected.
By default, all error messages are sent to the indexing topic so that they are indexed and archived, just like other messages. The indexing config for error messages can be found at $METRON_HOME/config/zookeeper/indexing/error.json.