<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Solr in Metron
## Table of Contents
* [Introduction](#introduction)
* [Configuration](#configuration)
* [Installing](#installing)
* [Enabling Solr](#enabling-solr)
* [Schemas](#schemas)
* [Collections](#collections)
## Introduction
Metron ships with Solr 6.6.2 support. Solr Cloud can be used as the real-time portion of the datastore resulting from [metron-indexing](../metron-indexing/README.md).
## Configuration
### The Indexing Topology
Solr is a viable option for indexing data in Metron and, similar to the Elasticsearch Writer, can be configured
via the global config. The following settings are possible as part of the global config:
* `solr.zookeeper`
  * The Zookeeper quorum associated with the SolrCloud instance. This is a required field with no default.
* `solr.commitPerBatch`
  * A boolean that defines whether the writer commits every batch. The default is `true`.
  * _WARNING_: If you set this to `false`, commits will happen based on the SolrClient's internal mechanism, and
    worker failure *may* result in data being acknowledged in Storm but not written to Solr.
* `solr.commit.soft`
  * A boolean that defines whether the writer makes a soft commit or a durable commit. See [here](https://lucene.apache.org/solr/guide/6_6/near-real-time-searching.html#NearRealTimeSearching-AutoCommits). The default is `false`.
  * _WARNING_: If you set this to `true`, commits will happen based on the SolrClient's internal mechanism, and
    worker failure *may* result in data being acknowledged in Storm but not written to Solr.
* `solr.commit.waitSearcher`
  * A boolean that defines whether the writer blocks the commit until the data is available to search. See [here](https://lucene.apache.org/solr/guide/6_6/near-real-time-searching.html#NearRealTimeSearching-AutoCommits). The default is `true`.
  * _WARNING_: If you set this to `false`, commits will happen based on the SolrClient's internal mechanism, and
    worker failure *may* result in data being acknowledged in Storm but not written to Solr.
* `solr.commit.waitFlush`
  * A boolean that defines whether the writer blocks the commit until the data is flushed. See [here](https://lucene.apache.org/solr/guide/6_6/near-real-time-searching.html#NearRealTimeSearching-AutoCommits). The default is `true`.
  * _WARNING_: If you set this to `false`, commits will happen based on the SolrClient's internal mechanism, and
    worker failure *may* result in data being acknowledged in Storm but not written to Solr.
* `solr.collection`
  * The default Solr collection (if unspecified, the name is `metron`). By default, sensors write to the collection associated with the index name in the
    indexing config for that sensor. If that index name is the empty string, the default collection is used.
* `solr.http.config`
  * A map that allows users to configure the Solr client's HTTP client.
  * Possible fields are:
    * `socketTimeout` : Socket timeout in ms. If a read takes longer than this to complete, the socket is closed and
      a `java.net.SocketTimeoutException: Read timed out` is thrown.
    * `connTimeout` : Connection timeout in ms. If a connection cannot be established within this time, the socket is closed
      with a `java.net.SocketTimeoutException: Connection timed out`.
    * `maxConnectionsPerHost` : Maximum connections allowed per host
    * `maxConnections` : Maximum total connections allowed
    * `retry` : Retry HTTP requests on error
    * `allowCompression` : Allow compression (deflate, gzip) if the server supports it
    * `followRedirects` : Follow redirects
    * `httpBasicAuthUser` : Basic auth username
    * `httpBasicAuthPassword` : Basic auth password
    * `solr.ssl.checkPeerName` : Check peer name
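To make the settings above concrete, the following sketch writes an example fragment that could be merged into the global config. It is illustrative only: the Zookeeper address `node1:2181`, the timeout and connection values, and the output path are placeholders, and writing the boolean settings as JSON booleans (rather than strings) is an assumption to verify against your Metron version.

```shell
# Write an example global config fragment containing the Solr writer settings.
# Every value below is an illustrative placeholder, not a recommended default.
FRAGMENT="${TMPDIR:-/tmp}/solr-global-fragment.json"

cat > "$FRAGMENT" <<'EOF'
{
  "solr.zookeeper": "node1:2181",
  "solr.collection": "metron",
  "solr.commitPerBatch": true,
  "solr.commit.soft": false,
  "solr.commit.waitSearcher": true,
  "solr.commit.waitFlush": true,
  "solr.http.config": {
    "socketTimeout": 30000,
    "connTimeout": 10000,
    "maxConnections": 100,
    "retry": true
  }
}
EOF

echo "Wrote example Solr settings to $FRAGMENT"
```

The fragment only covers the Solr keys; an actual global config will typically contain other, unrelated entries that should be preserved when merging.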
## Installing
Solr is installed in the [full dev environment for CentOS](../../metron-deployment/development/centos6) by default but is not started initially. Navigate to `$METRON_HOME/bin`
and start Solr Cloud by running `start_solr.sh`.
Metron's Ambari MPack installs several scripts in `$METRON_HOME/bin` that can be used to manage Solr. A script is also provided for installing Solr Cloud outside of full dev.
The script performs the following tasks:
* Stops ES and Kibana
* Downloads Solr
* Installs Solr
* Starts Solr Cloud
_Note: for details on setting up Solr Cloud in production mode, see https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html_
Navigate to `$METRON_HOME/bin` and spin up Solr Cloud by running `install_solr.sh`. After running this script,
Elasticsearch and Kibana will have been stopped and you should now have an instance of Solr Cloud up and running at http://localhost:8983/solr/#/~cloud. This manner of starting Solr
will also spin up an embedded Zookeeper instance at port 9983. More information can be found [here](https://lucene.apache.org/solr/guide/6_6/getting-started-with-solrcloud.html).
Solr can also be installed using [HDP Search 3](https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_solr-search-installation/content/ch_hdp_search_30.html). HDP Search 3 sets the Zookeeper root to
`/solr`, so this root must be appended to each URL in the comma-separated list in Ambari UI -> Services -> Metron -> Configs -> Index Settings -> Solr Zookeeper Urls. For example, in full dev
this would be `node1:2181/solr`.
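For a quorum with several hosts, appending the root to every entry by hand is error-prone. The helper below is a small sketch of that transformation; the function name `append_zk_root` and the host names in the example are hypothetical, and the resulting string still has to be entered in Ambari manually.

```shell
# Append a Zookeeper root (for example /solr) to every host:port entry
# in a comma-separated quorum string and print the result.
append_zk_root() {
  quorum=$1   # e.g. node1:2181,node2:2181
  root=$2     # e.g. /solr
  result=""
  old_ifs=$IFS
  IFS=','
  for host in $quorum; do
    # Join entries with commas, adding the root to each one.
    result="${result:+$result,}${host}${root}"
  done
  IFS=$old_ifs
  printf '%s\n' "$result"
}

# Example with hypothetical hosts:
append_zk_root "node1:2181,node2:2181" "/solr"
```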
## Enabling Solr
Elasticsearch is the real-time store used by default in Metron. Solr can be enabled by following these steps:
1. Stop the Metron Indexing component in Ambari.
1. Update Ambari UI -> Services -> Metron -> Configs -> Index Settings -> Solr Zookeeper Urls to match the Solr installation described in the previous section.
1. Change Ambari UI -> Services -> Metron -> Configs -> Indexing -> Index Writer - Random Access -> Random Access Search Engine to `Solr`.
1. Change Ambari UI -> Services -> Metron -> Configs -> REST -> Source Type Field Name to `source.type`.
1. Change Ambari UI -> Services -> Metron -> Configs -> REST -> Threat Triage Score Field Name to `threat.triage.score`.
1. Start the Metron Indexing component in Ambari.
1. Restart Metron REST and the Alerts UI in Ambari.
This will automatically create collections for the schemas shipped with Metron:
* bro
* snort
* yaf
* error (used internally by Metron)
* metaalert (used internally by Metron)
Any other collections must be created manually before starting the Indexing component. Alerts should be present in the Alerts UI after enabling Solr.
## Schemas
As of now, we have mapped out the schemas in `src/main/config/schema`.
Ambari will eventually install these, but at the moment installation is manual.
Refer to the Solr documentation [here](https://lucene.apache.org/solr/guide/6_6) in general
and [here](https://lucene.apache.org/solr/guide/6_6/documents-fields-and-schema-design.html) if you'd like to know more about schemas in Solr.
In Metron's Solr DAO implementation, document updates involve reading a document, applying the update and replacing the original by reindexing the whole document.
Indexing LatLonType and PointType field types stores data in internal fields that should not be returned in search results. For these fields a dynamic field type matching the suffix needs to be added to store the data points.
Solr 6+ comes with a new LatLonPointSpatialField field type that should be used instead of LatLonType if possible. Otherwise, a LatLonType field should be defined as:
```
<dynamicField name="*.location_point" type="location" multiValued="false" docValues="false"/>
<dynamicField name="*_coordinate" type="pdouble" indexed="true" stored="false" docValues="false"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
```
A PointType field should be defined as:
```
<dynamicField name="*.point" type="point" multiValued="false" docValues="false"/>
<dynamicField name="*_point" type="pdouble" indexed="true" stored="false" docValues="false"/>
<fieldType name="point" class="solr.PointType" subFieldSuffix="_point"/>
```
If any copy fields are defined, `stored` and `docValues` should be set to `false`.
## Collections
Convenience scripts are provided with Metron to create and delete collections. Ambari uses these scripts to automatically create collections. To use them outside of Ambari, a few environment variables must be set first:
```
# Path to the zookeeper node used by Solr
export ZOOKEEPER=node1:2181/solr
# Set to true if Kerberos is enabled
export SECURITY_ENABLED=true
```
The scripts can then be called directly with the collection name as the first argument. For example, to create the bro collection:
```
$METRON_HOME/bin/create_collection.sh bro
```
To delete the bro collection:
```
$METRON_HOME/bin/delete_collection.sh bro
```
The `create_collection.sh` script depends on schemas installed in `$METRON_HOME/config/schema`. There are several schemas that come with Metron:
* bro
* snort
* yaf
* metaalert
* error
Additional schemas should be installed in that location if using the `create_collection.sh` script. Any collection can be deleted with the `delete_collection.sh` script.
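When several additional schemas have been installed, the collections can be created in one pass. The sketch below assumes that each schema lives in its own subdirectory named after the collection (for example `$METRON_HOME/config/schema/bro`); that layout and the function name `create_all_collections` are assumptions, and the loop is meant for collections that do not exist yet.

```shell
# Create one Solr collection per schema directory.
# Assumes each schema sits in a subdirectory named after its collection.
create_all_collections() {
  schema_dir=$1      # e.g. $METRON_HOME/config/schema
  create_script=$2   # e.g. $METRON_HOME/bin/create_collection.sh
  for dir in "$schema_dir"/*/; do
    [ -d "$dir" ] || continue   # skip when the glob matches nothing
    name=$(basename "$dir")
    echo "Creating collection: $name"
    "$create_script" "$name" || return 1   # stop at the first failure
  done
}

# Example (ZOOKEEPER and SECURITY_ENABLED must be exported first):
# create_all_collections "$METRON_HOME/config/schema" "$METRON_HOME/bin/create_collection.sh"
```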
These scripts use the [Solr Collection API](http://lucene.apache.org/solr/guide/6_6/collections-api.html).
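The same API can be queried directly to check what the scripts did. As a rough sketch, the helper below builds a Collections API URL for a given action; the function name `collections_api_url` is hypothetical and the base URL assumes the local Solr instance described under Installing. `LIST` and `CLUSTERSTATUS` are standard Collections API actions.

```shell
# Build a Solr Collections API URL for the given action.
collections_api_url() {
  base=$1     # e.g. http://localhost:8983/solr
  action=$2   # e.g. LIST or CLUSTERSTATUS
  printf '%s/admin/collections?action=%s&wt=json\n' "$base" "$action"
}

# List existing collections (requires a running Solr Cloud instance):
# curl -s "$(collections_api_url http://localhost:8983/solr LIST)"
```

On a Kerberos-enabled cluster the request would additionally need authentication, which this sketch does not cover.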