Metron Profiler Client

This project provides a client API for accessing the profiles generated by the Metron Profiler. This includes both a Java API and Stellar API for accessing the profile data. The primary use case is to extract profile data for use during model scoring.

Stellar Client API

The following are usage examples that show how the Stellar API can be used to read profiles generated by the Metron Profiler. This API would be used in conjunction with other Stellar functions like MAAS_MODEL_APPLY to perform model scoring on streaming data.

These examples assume that a profile called ‘snort-alerts’ has been defined to track the number of Snort alerts associated with an IP address over time. The profile definition might look similar to the following.

{
  "profiles": [
    {
      "profile": "snort-alerts",
      "foreach": "ip_src_addr",
      "onlyif":  "source.type == 'snort'",
      "update":  { "s": "STATS_ADD(s, 1)" },
      "result":  "STATS_MEAN(s)"
    }
  ]
}

During model scoring, the entity being scored (in this case a particular IP address) will be known. The following examples show how this profile data might be retrieved.

The Stellar client consists of the PROFILE_GET command, which takes the following arguments:

REQUIRED:
    profile - The name of the profile.
    entity - The name of the entity.
    periods - The list of profile periods to retrieve. These are ProfilePeriod objects.
OPTIONAL:
    groups_list - List (in square brackets) of groupBy values used to filter the profile. Must correspond to the
            'groupBy' list used in profile creation. Default is the empty list, meaning groupBy was not used when
            creating the profile.
    config_overrides - Map (in curly braces) of name:value pairs, each overriding the global config parameter
            of the same name. Default is the empty Map, meaning no overrides.

There is an older calling format where groups_list is specified as a sequence of group names, “varargs” style, instead of a List object. This format is still supported for backward compatibility, but it is deprecated, and it is disallowed if the optional config_overrides argument is used.

The periods argument is typically the output of another Stellar function that defines which time periods to include.

PROFILE_FIXED: The profiler periods associated with a fixed lookback starting from now. These are ProfilePeriod objects.

REQUIRED:
    durationAgo - How long ago should values be retrieved from?
    units - The units of 'durationAgo'.
OPTIONAL:
    config_overrides - Map (in curly braces) of name:value pairs, each overriding the global config parameter
            of the same name. Default is the empty Map, meaning no overrides.

For example, to retrieve all values of a profile for the last 5 hours:  PROFILE_GET('profile', 'entity', PROFILE_FIXED(5, 'HOURS'))
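As a rough illustration of what a fixed lookback covers, the sketch below enumerates the period indices that a 5-hour lookback would touch when periods are 15 minutes long. It is a simplified Python model, not the Profiler's implementation: the function name `fixed_lookback_periods`, and the rule that a period's index is the timestamp divided by the period duration, are assumptions made for illustration.

```python
from datetime import timedelta


def fixed_lookback_periods(now_ms, lookback, period=timedelta(minutes=15)):
    """Return the index of every period touched by [now - lookback, now].

    Assumption: a period's index is its timestamp (epoch millis) divided by
    the period duration, rounded down.
    """
    period_ms = int(period.total_seconds() * 1000)
    start_ms = now_ms - int(lookback.total_seconds() * 1000)
    first = start_ms // period_ms
    last = now_ms // period_ms
    return list(range(first, last + 1))


# A hypothetical "now" chosen to fall exactly on a 15-minute boundary.
now_ms = 1_700_000_100_000
periods = fixed_lookback_periods(now_ms, timedelta(hours=5))
print(len(periods))  # 21: twenty full periods plus the one that has just begun
```

Under this model the number of values returned depends on the period duration as much as on the lookback, which is why the Client needs to know the duration the Profiler used.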

Groups_list argument

The groups_list argument in the client must exactly correspond to the groupBy configuration in the profile definition. If groupBy was not used in the profile, groups_list must be empty in the client. If groupBy was used in the profile, then the client groups_list is not optional; it must be the same length as the groupBy list, and specify exactly one selected group value for each groupBy criterion, in the same order. For example:

If the groupBy criteria in the profile are:  [ "DAY_OF_WEEK()", "URL_TO_PORT()" ]
then an allowed groups_list value in PROFILE_GET would be:  [ "3", "8080" ]
which selects only records from Tuesdays with port number 8080.
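Because the Client does not validate groups_list against the profile definition, it can help to check the correspondence by hand before querying. The sketch below is one hypothetical way to do that in Python when the profile's groupBy list is known; `check_groups` is an illustrative helper, not part of the Metron API.

```python
def check_groups(group_by, groups):
    """Raise ValueError unless `groups` supplies one value per groupBy expression."""
    if len(groups) != len(group_by):
        raise ValueError(
            f"profile has {len(group_by)} groupBy expression(s), "
            f"but {len(groups)} group value(s) were supplied"
        )
    # Pair each expression with the value selected for it, preserving order.
    return list(zip(group_by, groups))


group_by = ["DAY_OF_WEEK()", "URL_TO_PORT()"]
print(check_groups(group_by, ["3", "8080"]))
```

A length check like this catches only the most obvious mismatch; values supplied in the wrong order would still pass it and would simply match no data.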

Configuration and the config_overrides argument

By default, the Profiler creates profiles with a period duration of 15 minutes. This means that data is accumulated, summarized and flushed every 15 minutes. The Client API must know this duration to retrieve the profile data correctly. If the Client expects 15-minute periods, it cannot read data generated by a Profiler that was configured for 1-hour periods, and it will return zero results.
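The sketch below gives one way to see why a duration mismatch produces an empty result rather than an error. It assumes, as a simplification, that each stored value is keyed by a period index equal to the timestamp divided by the period duration; the helper `period_ids` and the sample timestamp are hypothetical.

```python
HOUR_MS = 60 * 60 * 1000
MINUTE_MS = 60 * 1000


def period_ids(start_ms, end_ms, period_ms):
    """Indices of all periods overlapping [start_ms, end_ms].

    Assumption: index = timestamp // period duration.
    """
    return set(range(start_ms // period_ms, end_ms // period_ms + 1))


now_ms = 1_700_000_100_000
start_ms = now_ms - 4 * HOUR_MS

written = period_ids(start_ms, now_ms, HOUR_MS)            # Profiler: 1-hour periods
requested = period_ids(start_ms, now_ms, 15 * MINUTE_MS)   # Client: 15-minute periods

print(written & requested)  # set(): none of the requested periods were ever written
```

Both sides cover the same four hours, yet the indices they compute never coincide, so the lookup finds nothing.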

Similarly, all six Client configuration parameters listed in the table below must match the Profiler configuration settings that were in effect when the profile was created. The period duration and other configuration parameters for the Profiler topology are stored on the local filesystem at $METRON_HOME/config/profiler.properties. The Stellar Client API is configured correspondingly by setting the following properties in Metron's global configuration, which lives on the local filesystem at $METRON_HOME/config/zookeeper/global.json and is then uploaded to Zookeeper (at /metron/topology/global) using zk_load_configs.sh:

```
$ cd $METRON_HOME
$ bin/zk_load_configs.sh -m PUSH -i config/zookeeper/ -z node1:2181
```

Any of these six Client configuration parameters may be overridden at run time using the config_overrides Map argument in PROFILE_GET. The primary use case is when historical profiles have been created with a different Profiler configuration than is currently configured, and the analyst needing to access them does not want to change the global Client configuration so as not to disrupt the work of other analysts working with current profiles.

| Key | Description | Required | Default |
|---|---|---|---|
| profiler.client.period.duration | The duration of each profile period. This value should be defined along with profiler.client.period.duration.units. | Optional | 15 |
| profiler.client.period.duration.units | The units used to specify the profile period duration. This value should be defined along with profiler.client.period.duration. | Optional | MINUTES |
| profiler.client.hbase.table | The name of the HBase table used to store profile data. | Optional | profiler |
| profiler.client.hbase.column.family | The name of the HBase column family used to store profile data. | Optional | P |
| profiler.client.salt.divisor | The salt divisor used to store profile data. | Optional | 1000 |
| hbase.provider.impl | The name of the HBaseTableProvider implementation class. | Optional | |
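The layering of defaults, global configuration, and per-call overrides can be pictured with the Python sketch below. The defaults are taken from the table above; the function `effective_config`, the sample global settings, and the merge order are assumptions made for illustration and do not describe how the Client is implemented.

```python
# Defaults as listed in the table; hbase.provider.impl has no listed default.
DEFAULTS = {
    "profiler.client.period.duration": "15",
    "profiler.client.period.duration.units": "MINUTES",
    "profiler.client.hbase.table": "profiler",
    "profiler.client.hbase.column.family": "P",
    "profiler.client.salt.divisor": "1000",
}


def effective_config(global_config, overrides=None):
    """Layer the settings: defaults, then global config, then per-call overrides."""
    merged = dict(DEFAULTS)
    merged.update(global_config)
    merged.update(overrides or {})
    return merged


# Hypothetical cluster-wide settings and a per-query override.
global_config = {"profiler.client.period.duration": "1",
                 "profiler.client.period.duration.units": "HOURS"}
overrides = {"profiler.client.period.duration": "2",
             "profiler.client.period.duration.units": "MINUTES"}

config = effective_config(global_config, overrides)
print(config["profiler.client.period.duration"],
      config["profiler.client.period.duration.units"])  # 2 MINUTES
```

The override applies only to the call that supplies it; the global settings that other analysts rely on are left untouched.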

Errors

The most common result of incorrect PROFILE_GET arguments or Client configuration parameters is an empty result set, rather than an error. The Client cannot effectively validate the arguments, because the Profiler configuration parameters may be changed and the profile itself does not store them. The person doing the querying must carry forward the knowledge of the Profiler configuration parameters from the time of profile creation, and use corresponding PROFILE_GET arguments and Client configuration parameters when querying the data.

Examples

Retrieve all values of ‘snort-alerts’ from ‘10.0.0.1’ over the past 4 hours.

PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(4, 'HOURS'))

Retrieve all values of ‘snort-alerts’ from ‘10.0.0.1’ over the past 2 days.

PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(2, 'DAYS'))

If the profile had been defined to group the data by weekday versus weekend, then the following example would apply:

Retrieve all values of ‘snort-alerts’ from ‘10.0.0.1’ that occurred on ‘weekdays’ over the past month.

PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(1, 'MONTHS'), ['weekdays'])

The client may need to use a configuration different from the current Client configuration settings. For example, perhaps you are on a cluster shared with other analysts and need to access a profile that was constructed 2 months ago using a different period duration, while they are accessing more recent profiles constructed with the currently configured period duration. In this situation, you may use the config_overrides argument:

Retrieve all values of ‘snort-alerts’ from ‘10.0.0.1’ over the past 2 days, with no groupBy, and overriding the usual global client configuration parameters for window duration.

PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(2, 'DAYS'), [], {'profiler.client.period.duration' : '2', 'profiler.client.period.duration.units' : 'MINUTES'})

Retrieve all values of ‘snort-alerts’ from ‘10.0.0.1’ that occurred on ‘weekdays’ over the past month, overriding the usual global client configuration parameters for window duration.

PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(1, 'MONTHS'), ['weekdays'], {'profiler.client.period.duration' : '2', 'profiler.client.period.duration.units' : 'MINUTES'})

Getting Started

These instructions step through the process of using the Stellar Client API on a live cluster. These instructions assume that the ‘Getting Started’ instructions included with the Metron Profiler have been followed. This will create a Profile called ‘test’ whose data will be retrieved with the Stellar Client API.

To validate that everything is working, log in to the server hosting Metron. We will use the Stellar Shell to replicate the execution environment of Stellar running in a Storm topology, like Metron's Parser or Enrichment topology. Replace ‘node1:2181’ with the host and port of a Zookeeper server.

[root@node1 0.3.1]# bin/stellar -z node1:2181
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}

[Stellar]>>> ?PROFILE_GET
Functions loaded, you may refer to functions now...
PROFILE_GET
Description: Retrieves a series of values from a stored profile.

Arguments:
	profile - The name of the profile.
	entity - The name of the entity.
	durationAgo - How long ago should values be retrieved from?
	units - The units of 'durationAgo'.
	groups_list - Optional, must correspond to the 'groupBy' list used in profile creation - List (in square brackets) of 
            groupBy values used to filter the profile. Default is the empty list, meaning groupBy was not used when 
            creating the profile.
	config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter
            of the same name. Default is the empty Map, meaning no overrides.

Returns: The selected profile measurements.

[Stellar]>>> PROFILE_GET('test','192.168.138.158', 1, 'HOURS')
[12078.0, 8921.0, 12131.0]

The client API call above has retrieved the past hour of the ‘test’ profile for the entity ‘192.168.138.158’.