| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| # Metron Profiler Client |
| |
| This project provides a client API for accessing the profiles generated by the [Metron Profiler](../metron-profiler). This includes both a Java API and Stellar API for accessing the profile data. The primary use case is to extract profile data for use during model scoring. |
| |
| ## Stellar Client API |
| |
| ### `PROFILE_GET` |
| |
| The `PROFILE_GET` command allows you to select all of the profile measurements written. This command takes the following arguments: |
| ``` |
| REQUIRED: |
| profile - The name of the profile |
| entity - The name of the entity |
| periods - The list of profile periods to grab. These are ProfilePeriod objects. |
| OPTIONAL: |
| groups_list - Optional, must correspond to the 'groupBy' list used in profile creation - List (in square brackets) of |
| groupBy values used to filter the profile. Default is the empty list, meaning groupBy was not used when |
| creating the profile. |
| config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter |
| of the same name. Default is the empty Map, meaning no overrides. |
| ``` |
| There is an older calling format where `groups_list` is specified as a sequence of group names, "varargs" style, instead of a List object. This format is still supported for backward compatibility, but it is deprecated, and it is disallowed if the optional `config_overrides` argument is used. |
| |
| The `periods` field is (likely) the output of another Stellar function which defines the times to include. |
| |
| #### Groups_list argument |
| The `groups_list` argument in the client must exactly correspond to the [`groupBy`](../metron-profiler#groupby) configuration in the profile definition. If `groupBy` was not used in the profile, `groups_list` must be empty in the client. If `groupBy` was used in the profile, then the client `groups_list` is <b>not</b> optional; it must be the same length as the `groupBy` list, and specify exactly one selected group value for each `groupBy` criterion, in the same order. For example: |
| ``` |
| If in Profile, the groupBy criteria are: [ "DAY_OF_WEEK()", "URL_TO_PORT()" ] |
| Then in PROFILE_GET, an allowed groups value would be: [ "3", "8080" ] |
| which will select only records from Tuesdays with port number 8080. |
| ``` |
| |
| #### Configuration and the config_overrides argument |
| |
| By default, the Profiler creates profiles with a period duration of 15 minutes. This means that data is accumulated, summarized and flushed every 15 minutes. |
| The Client API must also have knowledge of this duration to correctly retrieve the profile data. If the Client is expecting 15 minute periods, it will not be |
| able to read data generated by a Profiler that was configured for 1 hour periods, and will return zero results. |
| |
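The mismatch can be illustrated with a small sketch. It assumes that a period identifier is the epoch timestamp in milliseconds divided by the period duration, rounded down, which is consistent with the sample measurement shown under "Developing Profiles" (`duration=900000, period=1669628, start=1502665200000`). The helper `period_id` is hypothetical and is not part of the Metron API.

```python
# Hypothetical helper: maps an epoch timestamp (ms) to a period identifier.
# Assumption: period id = floor(timestamp / period duration).
def period_id(timestamp_ms: int, duration_ms: int) -> int:
    return timestamp_ms // duration_ms

FIFTEEN_MINUTES_MS = 15 * 60 * 1000
ONE_HOUR_MS = 60 * 60 * 1000

ts = 1502665200000  # 'start' of the sample measurement in this document

written_by_profiler = period_id(ts, ONE_HOUR_MS)        # Profiler set to 1 hour
expected_by_client = period_id(ts, FIFTEEN_MINUTES_MS)  # Client set to 15 minutes

# The identifiers differ, so a lookup keyed on the client's value finds nothing.
print(written_by_profiler, expected_by_client)  # prints: 417407 1669628
```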
| Similarly, the Client configuration parameters listed in the table below must match the Profiler configuration parameter settings from the time the profile |
| was created. The period duration and other configuration parameters from the Profiler topology are stored on the local filesystem at `$METRON_HOME/config/profiler.properties`. |
| The Stellar Client API can be configured correspondingly by setting the following properties in Metron's global configuration, on the local filesystem at |
| `$METRON_HOME/config/zookeeper/global.json`, and then uploading it to Zookeeper (at `/metron/topology/global`) with `zk_load_configs.sh`: |
| |
| ``` |
| $ cd $METRON_HOME |
| $ bin/zk_load_configs.sh -m PUSH -i config/zookeeper/ -z node1:2181 |
| ``` |
| |
| Any of these Client configuration parameters may be overridden at run time using the `config_overrides` Map argument in `PROFILE_GET`. The primary use case is |
| when historical profiles were created with a Profiler configuration that differs from the current one, and the analyst who needs to access them does not |
| want to change the global Client configuration, which would disrupt the work of other analysts working with current profiles. |
| |
| | Key | Description | Required | Default | |
| | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | |
| | profiler.client.period.duration | The duration of each profile period. This value should be defined along with `profiler.client.period.duration.units`. | Optional | 15 | |
| | profiler.client.period.duration.units | The units used to specify the profile period duration. This value should be defined along with `profiler.client.period.duration`. | Optional | MINUTES | |
| | profiler.client.hbase.table | The name of the HBase table used to store profile data. | Optional | profiler | |
| | profiler.client.hbase.column.family | The name of the HBase column family used to store profile data. | Optional | P | |
| | profiler.client.salt.divisor | The salt divisor used to store profile data. | Optional | 1000 | |
| | profiler.default.value | The default value to be returned if a profile is not written for a given period for a profile and entity. | Optional | null | |
| | hbase.provider.impl | The name of the HBaseTableProvider implementation class. | Optional | | |
| |
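The precedence described above (table defaults, then the global configuration, then `config_overrides`) can be sketched as a dictionary merge. This is only an illustration: the helper `effective_config` is hypothetical, and the global and override values are made up for the example.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "profiler.client.period.duration": "15",
    "profiler.client.period.duration.units": "MINUTES",
    "profiler.client.hbase.table": "profiler",
    "profiler.client.hbase.column.family": "P",
    "profiler.client.salt.divisor": "1000",
}

def effective_config(global_config: dict, config_overrides: dict) -> dict:
    """Hypothetical helper: later sources win over earlier ones."""
    return {**DEFAULTS, **global_config, **config_overrides}

# Example values, made up for illustration.
global_config = {
    "profiler.client.period.duration": "1",
    "profiler.client.period.duration.units": "HOURS",
}
overrides = {
    "profiler.client.period.duration": "2",
    "profiler.client.period.duration.units": "MINUTES",
}

config = effective_config(global_config, overrides)
print(config["profiler.client.period.duration"],
      config["profiler.client.period.duration.units"])  # prints: 2 MINUTES
```

Keys that are not overridden, such as `profiler.client.hbase.table`, keep the global or default value.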
| ### Profile Selectors |
| |
| You will notice that the third argument for `PROFILE_GET` is a list of `ProfilePeriod` objects. This list is expected to |
| be produced by another Stellar function. There are a couple of options available. |
| |
| #### `PROFILE_FIXED` |
| |
| `PROFILE_FIXED` returns the profile periods associated with a fixed lookback starting from now. These are `ProfilePeriod` objects. |
| ``` |
| REQUIRED: |
| durationAgo - How long ago should values be retrieved from? |
| units - The units of 'durationAgo'. |
| OPTIONAL: |
| config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter |
| of the same name. Default is the empty Map, meaning no overrides. |
| |
| e.g. To retrieve all the profiles for the last 5 hours. PROFILE_GET('profile', 'entity', PROFILE_FIXED(5, 'HOURS')) |
| ``` |
| |
| Note that the `config_overrides` parameter operates exactly as the `config_overrides` argument in `PROFILE_GET`. |
| The only available parameters for override are: |
| * `profiler.client.period.duration` |
| * `profiler.client.period.duration.units` |
| |
| #### `PROFILE_WINDOW` |
| |
| `PROFILE_WINDOW` is intended to provide a finer level of control over selecting windows for profiles: |
| * Specify windows relative to the data timestamp (see the optional `now` parameter below) |
| * Specify non-contiguous windows to better handle seasonal data (e.g. the last hour for every day for the last month) |
| * Specify profile output excluding holidays |
| * Specify only profile output on a specific day of the week |
| |
| It does this through a domain specific language, mimicking natural language, that defines which windows are included and which are excluded. |
| |
| ``` |
| REQUIRED: |
| windowSelector - The statement specifying the window to select. |
| OPTIONAL: |
| now - Optional - The timestamp to use for now. |
| config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter |
| of the same name. Default is the empty Map, meaning no overrides. |
| |
| e.g. To retrieve all the measurements written for 'profile' and 'entity' for the last hour |
| on the same weekday excluding weekends and US holidays across the last 14 days: |
| PROFILE_GET('profile', 'entity', PROFILE_WINDOW('1 hour window every 24 hours starting from 14 days ago including the current day of the week excluding weekends, holidays:us')) |
| ``` |
| |
| Note that the `config_overrides` parameter operates exactly as the `config_overrides` argument in `PROFILE_GET`. |
| The only available parameters for override are: |
| * `profiler.client.period.duration` |
| * `profiler.client.period.duration.units` |
| |
| ##### The Profile Selector Language |
| |
| The domain specific language can be broken into a series of clauses, some optional: |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">Total Temporal Duration</span></a> - The total range of time in which windows may be specified |
| * <a href="#Temporal_Window_Width"><span style="color:red">Temporal Window Width</span></a> - How large each temporal window is |
| * <a href="#Skip_distance"><span style="color:green">Skip distance</span></a> (optional) - How far to skip between when one window starts and when the next begins |
| * <a href="#InclusionExclusion_specifiers"><span style="color:purple">Inclusion/Exclusion specifiers</span></a> (optional) - The set of specifiers to further filter the window |
| |
| One *must* specify either a total temporal duration or a temporal window width. |
| The remaining clauses are optional. |
| During the course of the following discussion, we will color code the clauses in the examples and link them |
| to the relevant section for more detail. |
| |
| From a high level, the language fits the following three forms, which are composed of the clauses above: |
| |
| * <a href="#Temporal_Window_Width"><span style="color:red">time_interval WINDOW?</span></a> <a href="#InclusionExclusion_specifiers"><span style="color:purple">(INCLUDING specifier_list)? (EXCLUDING specifier_list)?</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">time_interval WINDOW?</span></a> <a href="#Skip_distance"><span style="color:green">EVERY time_interval</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">FROM time_interval (TO time_interval)?</span></a> <a href="#InclusionExclusion_specifiers"><span style="color:purple">(INCLUDING specifier_list)? (EXCLUDING specifier_list)?</span></a> |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">FROM time_interval (TO time_interval)?</span></a> |
| |
| |
| ###### <span style="color:blue">Total Temporal Duration</span> |
| |
| Total temporal duration is specified by a phrase: `FROM time_interval AGO TO time_interval AGO` |
| This indicates the beginning and ending of a time interval. This is an inclusive duration. |
| * `FROM` - Can be the words "from" or "starting from" |
| * `time_interval` - A time amount followed by a unit (e.g. 1 hour). Fractional amounts are not supported. The unit may be "minute", "day", "hour" with any pluralization. |
| * `TO` - Can be the words "until" or "to" |
| * `AGO` - Optionally the word "ago" |
| |
| The `TO time_interval AGO` portion is optional. If unspecified then it is expected that the time interval ends now. |
| |
| Due to the vagaries of the English language, the from and the to portions, if both are specified, are interchangeable |
| with regard to which one specifies the start and which specifies the end. |
| |
| In other words "<a href="#Total_Temporal_Duration"><span style="color:blue">starting from 1 hour ago to 30 minutes ago</span></a>" and |
| "<a href="#Total_Temporal_Duration"><span style="color:blue">starting from 30 minutes ago to 1 hour ago</span></a>" specify the same |
| temporal duration. |
| |
| **Examples** |
| |
| * A duration starting 1 hour ago and ending now |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">from 1 hour ago</span></a> |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">from 1 hour</span></a> |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">starting from 1 hour ago</span></a> |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">starting from 1 hour</span></a> |
| * A duration starting 1 hour ago and ending 30 minutes ago: |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">from 1 hour ago until 30 minutes ago</span></a> |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">from 30 minutes ago until 1 hour ago</span></a> |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">starting from 1 hour ago to 30 minutes ago</span></a> |
| * <a href="#Total_Temporal_Duration"><span style="color:blue">starting from 1 hour to 30 minutes</span></a> |
| |
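The interchangeability of the from and to portions can be pictured as a normalization step. The sketch below is an illustration only: `normalize_range` is a hypothetical helper that works on offsets expressed in minutes ago, and it is not how the selector language is actually parsed.

```python
def normalize_range(from_minutes_ago: int, to_minutes_ago: int = 0) -> tuple:
    """Hypothetical helper: return (start, end) in minutes ago, oldest first.

    When the 'to' portion is omitted, the range ends now (0 minutes ago).
    """
    start = max(from_minutes_ago, to_minutes_ago)
    end = min(from_minutes_ago, to_minutes_ago)
    return (start, end)

print(normalize_range(60, 30))  # "from 1 hour ago to 30 minutes ago"
print(normalize_range(30, 60))  # "from 30 minutes ago to 1 hour ago"
print(normalize_range(60))      # "from 1 hour ago"
```

The first two calls return the same pair, `(60, 30)`, mirroring the two equivalent phrases above.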
| ###### <span style="color:red">Temporal Window Width</span> |
| |
| Temporal window width is the specification of a window. |
| A window may either repeat within the total temporal duration or fill the total temporal duration. This is an inclusive window. |
| A window is specified by the phrase: `time_interval WINDOW` |
| * `time_interval` - A time amount followed by a unit (e.g. 1 hour). Fractional amounts are not supported. The unit may be "minute", "day", "hour" with any pluralization. |
| * `WINDOW` - Optionally the word "window" |
| |
| **Examples** |
| |
| * A fixed window starting 2 hours ago and going until now |
| * <a href="#Temporal_Window_Width"><span style="color:red">2 hour</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">2 hours</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">2 hours window</span></a> |
| * A repeating 30 minute window starting 2 hours ago and repeating every hour until now. |
| This would result in 2 30-minute wide windows: 2 hours ago and 1 hour ago |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute window</span></a> <a href="#Skip_distance"><span style="color:green">every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">starting from 2 hours ago</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute windows</span></a> <a href="#Skip_distance"><span style="color:green">every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 2 hours ago</span></a> |
| * A repeating 30 minute window starting 2 hours ago and repeating every hour until 30 minutes ago. |
| This would result in 2 30-minute wide windows: 2 hours ago and 1 hour ago |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute window</span></a> <a href="#Skip_distance"><span style="color:green">every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">starting from 2 hours ago until 30 minutes ago</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minutes window</span></a> <a href="#Skip_distance"><span style="color:green">every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 2 hours ago to 30 minutes ago</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minutes window</span></a> <a href="#Skip_distance"><span style="color:green">for every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 30 minutes ago to 2 hours ago</span></a> |
| |
| ###### <span style="color:green">Skip distance</span> |
| |
| Skip distance is the amount of time between the start of one temporal window and the start of the next. |
| It is, in effect, the window period. |
| |
| It is specified by the phrase `EVERY time_interval` |
| * `time_interval` - A time amount followed by a unit (e.g. 1 hour). Fractional amounts are not supported. The unit may be "minute", "day", "hour" with any pluralization. |
| * `EVERY` - The word/phrase "every" or "for every" |
| |
| **Examples** |
| |
| * A repeating 30 minute window starting 2 hours ago and repeating every hour until now. |
| This would result in 2 30-minute wide windows: 2 hours ago and 1 hour ago |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute window</span></a> <a href="#Skip_distance"><span style="color:green">every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">starting from 2 hours ago </span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minutes window</span></a> <a href="#Skip_distance"><span style="color:green">every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 2 hours ago </span></a> |
| * A repeating 30 minute window starting 2 hours ago and repeating every hour until 30 minutes ago. |
| This would result in 2 30-minute wide windows: 2 hours ago and 1 hour ago |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute window</span></a> <a href="#Skip_distance"><span style="color:green">every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">starting from 2 hours ago until 30 minutes ago</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minutes window</span></a> <a href="#Skip_distance"><span style="color:green">every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 2 hours ago to 30 minutes ago</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minutes window</span></a> <a href="#Skip_distance"><span style="color:green">for every 1 hour</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 30 minutes ago to 2 hours ago</span></a> |
| |
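How window width, skip distance and total duration combine can be sketched with a short loop. The function `windows` is hypothetical, works on offsets in minutes ago, and assumes that a window is kept only when it fits entirely inside the total duration; the actual boundary handling in Metron may differ.

```python
def windows(width_min: int, skip_min: int, from_min_ago: int, to_min_ago: int = 0) -> list:
    """Hypothetical sketch: list windows as (start, end) offsets in minutes ago.

    Assumption: a window is kept only if it ends no later than the end of the range.
    """
    if skip_min <= 0:
        raise ValueError("skip distance must be positive")
    result = []
    start = from_min_ago
    while start - width_min >= to_min_ago:
        result.append((start, start - width_min))
        start -= skip_min  # the next window begins one skip distance later
    return result

# "30 minute window every 1 hour from 2 hours ago"
print(windows(30, 60, 120))      # prints: [(120, 90), (60, 30)]
# "30 minute window every 1 hour from 2 hours ago to 30 minutes ago"
print(windows(30, 60, 120, 30))  # prints: [(120, 90), (60, 30)]
```

Both phrases yield two 30-minute windows, one starting 2 hours ago and one starting 1 hour ago, as in the examples above.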
| ###### <span style="color:purple">Inclusion/Exclusion specifiers</span> |
| Inclusion and Exclusion specifiers operate as filters on the set of windows. |
| They operate on the window beginning timestamp. |
| |
| For inclusion specifiers, windows that are passed by _any_ of the set of inclusion specifiers are included. |
| Similarly, windows that are passed by _any_ of the set of exclusion specifiers are excluded. |
| Exclusion specifiers trump inclusion specifiers. |
| |
| Specifiers follow one of the following formats depending on if it is an inclusion or exclusion specifier: |
| * `INCLUSION specifier, specifier, ...` |
| * `INCLUSION` can be "include", "includes" or "including" |
| * `EXCLUSION specifier, specifier, ...` |
| * `EXCLUSION` can be "exclude", "excludes" or "excluding" |
| |
| |
| The specifiers are a set of fixed specifiers available as part of the language: |
| * Fixed day of week-based specifiers - includes or excludes if the window is on the specified day of the week |
| * "monday" or "mondays" |
| * "tuesday" or "tuesdays" |
| * "wednesday" or "wednesdays" |
| * "thursday" or "thursdays" |
| * "friday" or "fridays" |
| * "saturday" or "saturdays" |
| * "sunday" or "sundays" |
| * "weekday" or "weekdays" |
| * "weekend" or "weekends" |
| * Relative day of week-based specifiers - includes or excludes based on the day of week relative to now |
| * "current day of the week" |
| * "current day of week" |
| * "this day of the week" |
| * "this day of week" |
| * Specified date - includes or excludes based on the specified date |
| * "date" - Takes up to 2 arguments |
| * The day in `yyyy/MM/dd` format if no second argument is provided |
| * Optionally the format to specify the first argument in |
| * Example: `date:2017/12/25` would include or exclude December 25, 2017 |
| * Example: `date:20171225:yyyyMMdd` would include or exclude December 25, 2017 |
| * Holidays - includes or excludes based on if the window starts during a holiday |
| * "holiday" or "holidays" |
| * Arguments form the jollyday hierarchy of holidays. e.g. "us:nyc" would be holidays for New York City, USA |
| * If none is specified, it will choose based on locale. |
| * Countries supported are those supported in [jollyday](https://github.com/svendiedrichsen/jollyday/tree/master/src/main/resources/holidays) |
| * Example: `holiday:us:nyc` would be the holidays of New York City, USA |
| * Example: `holiday:hu` would be the holidays of Hungary |
| |
| **WARNING: Daylight Savings Time effects** |
| |
| While Universal Time (UTC) is nice and constant, many servers are set to local timezones that enable Daylight Savings Time (DST). |
| This means that twice a year, on DST transition weekends, "Sunday" is either 23 or 25 hours long. However, durations specified |
| as "7 days ago" are always interpreted as "7*24 hours ago". This can lead to some surprising effects when using days of the week |
| as inclusion or exclusion specifiers. |
| |
| For example, the profile window specified by the phrase "30 minute window every 24 hours from 7 days ago" |
| will always have 7 thirty-minute intervals, and these will normally occur on 5 weekdays and 2 weekend days. |
| However, if you invoke this window at 12:15am any day during the week following the start of DST, you will get |
| these intervals (supposing you start early on a Wednesday morning): |
| ``` |
| Tuesday 12:15am-12:45am (yesterday) |
| Monday 12:15am-12:45am |
| Saturday 11:15pm-11:45pm (skipped Sunday!) |
| Friday 11:15pm-11:45pm |
| Thursday 11:15pm-11:45pm |
| Wednesday 11:15pm-11:45pm |
| Tuesday 11:15pm-11:45pm |
| ``` |
| |
| Sunday got skipped over because it was only 23 hours long; that is, there were 24 hours between Saturday 11:15pm and Monday 12:15am. |
| So if you specified "excluding weekends", you would get 6 days' intervals instead of the expected 5. There are multiple variations |
| on this theme. |
| |
| Remember that the underlying time is kept in UTC, so the data is always correct. It is only when attempting to interpret UTC as |
| local time, date, and day, that these confusions may occur. They may be eliminated by setting your server timezone to UTC, or otherwise |
| disabling DST. |
| |
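The skipped Sunday can be reproduced with plain date arithmetic. The sketch below assumes US Eastern time in 2017, where DST began on Sunday, March 12 at 2:00am local time (07:00 UTC). To stay self-contained it uses two fixed offsets and a hypothetical `to_local` helper instead of a timezone database.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EST = timezone(timedelta(hours=-5))  # offset before the transition
EDT = timezone(timedelta(hours=-4))  # offset after the transition
# Assumption: US Eastern time, where DST began on 2017-03-12 at 07:00 UTC.
DST_START = datetime(2017, 3, 12, 7, 0, tzinfo=UTC)

def to_local(instant):
    """Convert a UTC instant to local wall-clock time using the fixed rule above."""
    return instant.astimezone(EDT if instant >= DST_START else EST)

# Invoked on Wednesday 2017-03-15 at 12:15am local time (EDT), which is 04:15 UTC.
now = datetime(2017, 3, 15, 4, 15, tzinfo=UTC)

# "30 minute window every 24 hours from 7 days ago": 7 starts, 24 hours apart.
starts = [now - timedelta(hours=24 * n) for n in range(1, 8)]
for s in starts:
    print(to_local(s).strftime("%A %I:%M%p"))

days = [to_local(s).strftime("%A") for s in starts]
weekday_count = sum(d not in ("Saturday", "Sunday") for d in days)
print(weekday_count)  # prints: 6
```

No window start falls on Sunday, so "excluding weekends" removes only the Saturday window and leaves 6 intervals.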
| **Examples** |
| |
| Assume these are executed at noon. |
| * A 1 hour window for the past 8 'current day of the week' |
| * <a href="#Temporal_Window_Width"><span style="color:red">1 hour window</span></a> <a href="#Skip_distance"><span style="color:green">every 24 hours</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 56 days ago</span></a> <a href="#InclusionExclusion_specifiers"><span style="color:purple">including this day of the week</span></a> |
| * A 1 hour window for the past 8 tuesdays |
| * <a href="#Temporal_Window_Width"><span style="color:red">1 hour window</span></a> <a href="#Skip_distance"><span style="color:green">every 24 hours</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 56 days ago</span></a> <a href="#InclusionExclusion_specifiers"><span style="color:purple">including tuesdays</span></a> |
| * A 30 minute window every tuesday at noon starting 14 days ago until now |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute window</span></a> <a href="#Skip_distance"><span style="color:green">every 24 hours</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 14 days ago</span></a> <a href="#InclusionExclusion_specifiers"><span style="color:purple">including tuesdays</span></a> |
| * A 30 minute window every day except holidays and weekends at noon starting 14 days ago until now |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute window</span></a> <a href="#Skip_distance"><span style="color:green">every 24 hours</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 14 days ago</span></a> <a href="#InclusionExclusion_specifiers"><span style="color:purple">excluding holidays:us, weekends</span></a> |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute window</span></a> <a href="#Skip_distance"><span style="color:green">every 24 hours</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 14 days ago</span></a> <a href="#InclusionExclusion_specifiers"><span style="color:purple">including weekdays excluding holidays:us, weekends</span></a> |
| * A 30 minute window at noon every day from 7 days ago including saturdays and excluding weekends. |
| Because exclusions trump inclusions, the following will never yield any windows |
| * <a href="#Temporal_Window_Width"><span style="color:red">30 minute window</span></a> <a href="#Skip_distance"><span style="color:green">every 24 hours</span></a> <a href="#Total_Temporal_Duration"><span style="color:blue">from 7 days ago</span></a> <a href="#InclusionExclusion_specifiers"><span style="color:purple">including saturdays excluding weekends</span></a> |
| |
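The filtering rules of this section can be sketched as two predicate lists applied to each window start. The names below are hypothetical, and the sketch assumes that when no inclusion specifier is given every window is a candidate. The example date is arbitrary.

```python
from datetime import datetime, timedelta

WEEKEND = {5, 6}  # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6

def is_saturday(ts):
    return ts.weekday() == 5

def is_weekend(ts):
    return ts.weekday() in WEEKEND

def keep(window_start, includes=(), excludes=()):
    """Keep a window if any inclusion passes (or none is given) and no exclusion passes."""
    included = not includes or any(p(window_start) for p in includes)
    excluded = any(p(window_start) for p in excludes)
    return included and not excluded  # exclusions trump inclusions

# Window starts at noon on each of the 7 days before Monday 2017-08-14.
now = datetime(2017, 8, 14, 12, 0)
starts = [now - timedelta(days=d) for d in range(7, 0, -1)]

# "including saturdays excluding weekends" never yields a window.
none_left = [s for s in starts if keep(s, includes=[is_saturday], excludes=[is_weekend])]
# "excluding weekends" keeps the five weekday windows.
weekdays_only = [s for s in starts if keep(s, excludes=[is_weekend])]

print(len(none_left), len(weekdays_only))  # prints: 0 5
```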
| ### Errors |
| The most common result of incorrect `PROFILE_GET` arguments or Client configuration parameters is an empty result set, rather than an error. |
| The Client cannot effectively validate the arguments, because the Profiler configuration parameters may be changed and the profile itself does not store them. |
| The person doing the querying must carry forward the knowledge of the Profiler configuration parameters from the time of profile creation, and use corresponding `PROFILE_GET` arguments and Client configuration parameters when querying the data. |
| |
| ### Examples |
| |
| The following are usage examples that show how the Stellar API can be used to read profiles generated by the [Metron Profiler](../metron-profiler). This API would be used in conjunction with other Stellar functions like [`MAAS_MODEL_APPLY`](../../metron-stellar/stellar-common#maas_model_apply) to perform model scoring on streaming data. |
| |
| These examples assume a profile has been defined called 'snort-alerts' that tracks the number of Snort alerts associated with an IP address over time. The profile definition might look similar to the following. |
| |
| ``` |
| { |
| "profiles": [ |
| { |
| "profile": "snort-alerts", |
| "foreach": "ip_src_addr", |
| "onlyif": "source.type == 'snort'", |
| "update": { "s": "STATS_ADD(s, 1)" }, |
| "result": "STATS_MEAN(s)" |
| } |
| ] |
| } |
| ``` |
| |
| During model scoring the entity being scored, in this case a particular IP address, will be known. The following examples show how this profile data might be retrieved. |
| |
| Retrieve all values of 'snort-alerts' from '10.0.0.1' over the past 4 hours. |
| ``` |
| PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(4, 'HOURS')) |
| ``` |
| |
| Retrieve all values of 'snort-alerts' from '10.0.0.1' over the past 2 days. |
| ``` |
| PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(2, 'DAYS')) |
| ``` |
| |
| If the profile had been defined to group the data by weekday versus weekend, then the following example would apply: |
| |
| Retrieve all values of 'snort-alerts' from '10.0.0.1' that occurred on 'weekdays' over the past 30 days. |
| ``` |
| PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(30, 'DAYS'), ['weekdays'] ) |
| ``` |
| |
| The client may need to use a configuration different from the current Client configuration settings. For example, perhaps you are on a cluster shared with other analysts, and need to access a profile that was constructed 2 months ago using a different period duration, while they are accessing more recent profiles constructed with the currently configured period duration. For this situation, you may use the `config_overrides` argument: |
| |
| Retrieve all values of 'snort-alerts' from '10.0.0.1' over the past 2 days, with no `groupBy`, and overriding the usual global client configuration parameters for window duration. |
| ``` |
| PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(2, 'DAYS', {'profiler.client.period.duration' : '2', 'profiler.client.period.duration.units' : 'MINUTES'}), []) |
| ``` |
| |
| Retrieve all values of 'snort-alerts' from '10.0.0.1' that occurred on 'weekdays' over the past 30 days, overriding the usual global client configuration parameters for window duration. |
| ``` |
| PROFILE_GET('snort-alerts', '10.0.0.1', PROFILE_FIXED(30, 'DAYS', {'profiler.client.period.duration' : '2', 'profiler.client.period.duration.units' : 'MINUTES'}), ['weekdays'] ) |
| ``` |
| |
| |
| ## Getting Started |
| |
| These instructions step through the process of using the Stellar Client API on a live cluster. These instructions assume that the 'Getting Started' instructions included with the [Metron Profiler](../metron-profiler) have been followed. This will create a Profile called 'test' whose data will be retrieved with the Stellar Client API. |
| |
| To validate that everything is working, log in to the server hosting Metron. We will use the Stellar Shell to replicate the execution environment of Stellar running in a Storm topology, like Metron's Parser or Enrichment topology. Replace 'node1:2181' with the host and port of a Zookeeper server. |
| |
| ``` |
| [root@node1 0.4.2]# bin/stellar -z node1:2181 |
| Stellar, Go! |
| Please note that functions are loading lazily in the background and will be unavailable until loaded fully. |
| {es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH} |
| |
| [Stellar]>>> ?PROFILE_GET |
| Functions loaded, you may refer to functions now... |
| PROFILE_GET |
| Description: Retrieves a series of values from a stored profile. |
| |
| Arguments: |
| profile - The name of the profile. |
| entity - The name of the entity. |
| durationAgo - How long ago should values be retrieved from? |
| units - The units of 'durationAgo'. |
| groups_list - Optional, must correspond to the 'groupBy' list used in profile creation - List (in square brackets) of |
| groupBy values used to filter the profile. Default is the empty list, meaning groupBy was not used when |
| creating the profile. |
| config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter |
| of the same name. Default is the empty Map, meaning no overrides. |
| |
| Returns: The selected profile measurements. |
| |
| [Stellar]>>> PROFILE_GET('test','192.168.138.158', PROFILE_FIXED(1, 'HOURS')) |
| [12078.0, 8921.0, 12131.0] |
| ``` |
| |
| The client API call above has retrieved the past hour of the 'test' profile for the entity '192.168.138.158'. |
| |
| ## Developing Profiles |
| |
| Troubleshooting issues when programming against a live stream of data can be difficult. The Stellar REPL is a powerful tool to help work out the kinds of enrichments and transformations that are needed. The Stellar REPL can also be used to help when developing profiles for the Profiler. |
| |
| Follow these steps in the Stellar REPL to see how it can be used to help create profiles. |
| |
| 1. Take a first pass at defining your profile. As an example, in the editor copy/paste the basic "Hello, World" profile below. |
| ``` |
| [Stellar]>>> conf := SHELL_EDIT() |
| [Stellar]>>> conf |
| { |
| "profiles": [ |
| { |
| "profile": "hello-world", |
| "onlyif": "exists(ip_src_addr)", |
| "foreach": "ip_src_addr", |
| "init": { "count": "0" }, |
| "update": { "count": "count + 1" }, |
| "result": "count" |
| } |
| ] |
| } |
| ``` |
| |
| 1. Initialize the Profiler. |
| ``` |
| [Stellar]>>> profiler := PROFILER_INIT(conf) |
| [Stellar]>>> profiler |
| Profiler{1 profile(s), 0 messages(s), 0 route(s)} |
| ``` |
| The profiler itself will show the number of profiles defined, the number of messages applied, and the number of routes taken. |
| |
| A route is defined when a message is applied to a specific profile. If a message is applied and not needed by any profile, then there are no routes. If a message is needed by one profile, then one route has been defined. If a message is needed by two profiles, then two routes have been defined. |
| |
| 1. Create a message to simulate the type of telemetry that you expect to be profiled. As an example, in the editor copy/paste the JSON below. |
| ``` |
| [Stellar]>>> message := SHELL_EDIT() |
| [Stellar]>>> message |
| { |
| "ip_src_addr": "10.0.0.1", |
| "protocol": "HTTPS", |
| "length": "10", |
| "bytes_in": "234" |
| } |
| ``` |
| |
| 1. Apply some telemetry messages to your profiles. The following applies the same message 3 times. |
| ``` |
| [Stellar]>>> PROFILER_APPLY(message, profiler) |
| Profiler{1 profile(s), 1 messages(s), 1 route(s)} |
| ``` |
| ``` |
| [Stellar]>>> PROFILER_APPLY(message, profiler) |
| Profiler{1 profile(s), 2 messages(s), 2 route(s)} |
| ``` |
| ``` |
| [Stellar]>>> PROFILER_APPLY(message, profiler) |
| Profiler{1 profile(s), 3 messages(s), 3 route(s)} |
| ``` |
| |
| It is also possible to apply multiple messages at once. This is useful when testing against a larger set of data. To do this, create a string that contains a JSON array of messages and pass that to the `PROFILER_APPLY` function. |
| |
| 1. Flush the Profiler to see what has been calculated. A flush is what occurs at the end of each 15 minute period in the Profiler. The result is a list of profile measurements. Each measurement is a map containing detailed information about the profile data that has been generated. |
| ``` |
| [Stellar]>>> values := PROFILER_FLUSH(profiler) |
| [Stellar]>>> values |
| [{period={duration=900000, period=1669628, start=1502665200000, end=1502666100000}, |
| profile=hello-world, groups=[], value=3, entity=10.0.0.1}] |
| ``` |
| |
| This profile simply counts the number of messages by IP source address. Notice that the value is '3' for the entity '10.0.0.1' as we applied 3 messages with an 'ip_src_addr' of '10.0.0.1'. There will always be one measurement for each [profile, entity] pair. |
| |
| 1. If you are unhappy with the data that has been generated, then 'wash, rinse and repeat' this process. Once you are happy with the profile that was created, follow the [Getting Started](../metron-profiler#getting-started) guide to use the profile against your live, streaming data in a Metron cluster. |
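
The apply-and-flush cycle walked through above can also be modelled outside the REPL. The following is a plain Python model of the "Hello, World" profile and not the Profiler itself: the functions `profiler_apply` and `profiler_flush` are hypothetical stand-ins, and the period identifier is assumed to be the timestamp divided by the 15 minute period duration, rounded down.

```python
from collections import defaultdict

PERIOD_MS = 15 * 60 * 1000  # default period duration of 15 minutes

def profiler_apply(state, message, timestamp_ms):
    """Model of 'hello-world': onlyif exists(ip_src_addr), foreach ip_src_addr, count + 1."""
    if "ip_src_addr" not in message:      # onlyif: message is not needed, no route
        return
    entity = message["ip_src_addr"]       # foreach
    period = timestamp_ms // PERIOD_MS    # assumption: how the period id is derived
    state[(entity, period)] += 1          # init is 0, update is count + 1

def profiler_flush(state):
    """Return one measurement per (entity, period) and clear the state."""
    values = [{"profile": "hello-world", "entity": e, "period": p, "value": v}
              for (e, p), v in state.items()]
    state.clear()
    return values

state = defaultdict(int)
message = {"ip_src_addr": "10.0.0.1", "protocol": "HTTPS", "length": "10", "bytes_in": "234"}
for _ in range(3):
    profiler_apply(state, message, 1502665200000)  # 'start' from the sample output

values = profiler_flush(state)
print(values)
```

As in the REPL session, three applied messages for '10.0.0.1' produce a single measurement with a value of 3.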