blob: 1b38bef33d50beeef8e65bc6a0253a4e350c1616 [file] [log] [blame]
= Streaming Expressions
:page-children: stream-source-reference, stream-decorator-reference, stream-evaluator-reference, math-expressions, graph-traversal, stream-api
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Streaming expressions exposes the capabilities of SolrCloud as composable functions.
These functions provide a system for searching, transforming, analyzing, and visualizing data stored in SolrCloud collections.
At a high level there a four main capabilities that will be explored in the documentation:
* *Searching*, sampling and aggregating results from Solr.
* *Transforming* result sets after they are retrieved from Solr.
* *Analyzing* and modeling result sets using probability and statistics and machine learning libraries.
* *Visualizing* result sets, aggregations and statistical models of the data.
== Stream Language Basics
Streaming expressions are comprised of streaming functions which work with a Solr collection.
They emit a stream of tuples (key/value Maps).
Some of the provided streaming functions are designed to work with entire result sets rather than the top N results like normal search.
This is supported by the <<exporting-result-sets.adoc#,/export handler>>.
Some streaming functions act as stream sources to originate the stream flow. Other streaming functions act as stream decorators to wrap other stream functions and perform operations on the stream of tuples. Many streams functions can be parallelized across a worker collection. This can be particularly powerful for relational algebra functions.
=== Streaming Requests and Responses
Solr has a `/stream` request handler that takes streaming expression requests and returns the tuples as a JSON stream. This request handler is implicitly defined, meaning there is nothing that has to be defined in `solrconfig.xml` - see <<implicit-requesthandlers.adoc#,Implicit RequestHandlers>>.
The `/stream` request handler takes one parameter, `expr`, which is used to specify the streaming expression. For example, this curl command encodes and POSTs a simple `search()` expression to the `/stream` handler:
[source,bash]
----
curl --data-urlencode 'expr=search(enron_emails,
q="from:1800flowers*",
fl="from, to",
sort="from asc")' http://localhost:8983/solr/enron_emails/stream
----
Details of the parameters for each function are included below.
For the above example the `/stream` handler responded with the following JSON response:
[source,json]
----
{"result-set":{"docs":[
{"from":"1800flowers.133139412@s2u2.com","to":"lcampbel@enron.com"},
{"from":"1800flowers.93690065@s2u2.com","to":"jtholt@ect.enron.com"},
{"from":"1800flowers.96749439@s2u2.com","to":"alewis@enron.com"},
{"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
{"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
{"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
{"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
{"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
{"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
{"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
{"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
{"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
{"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
{"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
{"EOF":true,"RESPONSE_TIME":33}]}
}
----
Note the last tuple in the above example stream is `{"EOF":true,"RESPONSE_TIME":33}`. The `EOF` indicates the end of the stream. To process the JSON response, you'll need to use a streaming JSON implementation because streaming expressions are designed to return the entire result set which may have millions of records. In your JSON client you'll need to iterate each doc (tuple) and check for the EOF tuple to determine the end of stream.
=== Configuration
Timeouts for Streaming Expressions can be configured with the `socketTimeout` and `connTimeout` startup parameters.
== Elements of the Lanaguage
=== Stream Sources
Stream sources originate streams. There are rich set of searching, sampling and aggregation stream sources to choose from.
A full reference to all available source expressions is available in <<stream-source-reference.adoc#,Stream Source Reference>>.
=== Stream Decorators
Stream decorators wrap stream sources and other stream decorators to transform a stream.
A full reference to all available decorator expressions is available in <<stream-decorator-reference.adoc#,Stream Decorator Reference>>.
=== Math Expressions
Math expressions are a vector and matrix math library that can be combined with streaming expressions to perform analysis and build mathematical models
of the result sets.
From a language standpoint math expressions are a sub-language of streaming expressions that don't return streams of tuples.
Instead they operate on and return numbers, vectors, matrices and mathematical models.
The documentation will show how to combine streaming expressions and math
expressions.
The math expressions user guide is available in <<>>
From a language standpoint math expressions are referred to as *stream evaluators*.
A full reference to all available evaluator expressions is available in <<stream-evaluator-reference.adoc#,Stream Evaluator Reference>>.
=== Visualization
Visualization of both streaming expressions and math expressions is done using Apache Zeppelin and the Zeppelin-Solr Interpreter.
Visualizing Streaming expressions and setting up of Apache Zeppelin is documented in <<math-start.adoc#zeppelin-solr-interpreter,Zeppelin-Solr Interpreter>>.
The <<math-expressions.adoc#,Math Expressions User Guide>> has in depth coverage of visualization techniques.