| = Transforming Data |
| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| |
| Streaming Expressions provides a powerful set of functions for transforming result sets. This section |
| of the user guide provides an overview of useful transformations applied to result sets. |
| |
| |
| <<Selecting and Adding Fields, Selecting>> - |
| <<Filtering Tuples, Filtering>> - |
| <<Paging, Paging>> - |
| <<Handling Nulls, Handling Nulls>> - |
| <<Regex Matching, Regex Matching>> - |
| <<Sorting, Sorting>> - |
| <<Rollups, Rollups>> |
| |
| |
| == Selecting and Adding Fields |
| |
| The `select` function wraps another streaming expression can perform the following operations on each tuple |
| in the stream: |
| |
| * *Select* a subset of fields |
| * *Map* fields to new names |
| * *Compute* new fields |
| |
| Below is an example showing the `select` function wrapping a `search` function |
| and mapping fields to new field names. The `recNum` function is a math expression |
| which simply returns the current record number of the tuple. The `select` expression can call |
| any math expression to compute new values. |
| |
| image::images/math-expressions/select1.png[] |
| |
| Below is an example using the `div` function to compute a new field |
| from two existing fields: |
| |
| image::images/math-expressions/select-math.png[] |
| |
| |
| == Filtering Tuples |
| |
| The `having` function can be used to filter tuples in the stream based on |
| boolean logic. |
| |
| In the example below the `having` function is filtering the output of the |
| `facet` function to only emit tuples that have `count(*)` greater then 20404. |
| |
| image::images/math-expressions/having.png[] |
| |
| |
| == Paging |
| |
| The *record number*, added with the `recNum` function, |
| can be filtered on to support paging. |
| |
| In the example below the `and` function with nested `lt` and `gt` functions are |
| used to select records within a specific record number range: |
| |
| image::images/math-expressions/search-page.png[] |
| |
| |
| == Handling Nulls |
| |
| The `notNull` and `isNull` functions can be used to either replace null values with different values, |
| or to filter out tuples with null values. |
| |
| The example below is using the `isNull` function inside of `select` function |
| to replace null values with -1. The `if` function takes 3 parameters. The first |
| is a boolean expression, in this case `isNull`. The `if` function returns |
| the second parameter if the boolean function returns true, and the third |
| parameter if it returns false. In this case `isNull` is always true because its |
| checking for a field in the tuples that is not included in the result set. |
| |
| image::images/math-expressions/select2.png[] |
| |
| `notNull` and `isNull` can also be used with the `having` function to filter out |
| tuples with null values. |
| |
| The example below emits all the documents because it is evaluating `isNull` for |
| a field that is not in the result set, which always returns true. |
| |
| image::images/math-expressions/having2.png[] |
| |
| The example below emits zero documents because it is evaluating `notNull` for |
| a field that is not in the result set, which always returns false. |
| |
| image::images/math-expressions/having3.png[] |
| |
| == Regex Matching and Filtering |
| |
| The `matches` function can be used inside of a `having` function |
| to test if a field in the record matches a specific |
| regular expression. This allows for sophisticated regex matching over search results. |
| |
| The example below uses the `matches` function to return all records where |
| the *complaint_type_s* field ends with *Commercial*. |
| |
| image::images/math-expressions/search-matches.png[] |
| |
| == Sorting |
| |
| The `sort` and `top` function can be used to resort a result set in memory. The `sort` function |
| sorts and returns the entire result set based on the sort criteria. The `top` function |
| can be used to return the top N values in a result set based on the sort criteria. |
| |
| image::images/math-expressions/search-resort.png[] |
| |
| == Rollups |
| |
| The `rollup` and `hashRollup` functions can be used to perform aggregations over result sets. This |
| is different then the `facet`, `facet2D` and `timeseries` aggregation functions which push the aggregations |
| into the search engine using the JSON facet API. |
| |
| The `rollup` function performs map-reduce style rollups, which requires the result stream be sorted by the |
| the grouping fields. This allows for aggregations over very high cardinality fields. The `hashRollup` function |
| performs rollups keeping all buckets in an in-memory hashmap. This requires enough memory to store all the |
| distinct group by fields in memory, but does not require that the underlying stream be sorted. |
| |
| The example below shows a visualization of the top 5 complaint types |
| from a random sample of the `nyc311` complaint database. The `top` |
| function is used to select the top 5 complaint types based on |
| the `count(*)` field output by the `hashRollup`. |
| |
| image::images/math-expressions/hashRollup.png[] |