| = Script Update Processor |
| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| The {solr-javadocs}/contrib/scripting/org/apache/solr/scripting/update/ScriptUpdateProcessorFactory.html[ScriptUpdateProcessorFactory] allows Java scripting engines to be used |
| during Solr document update processing, giving you considerable flexibility in |
| expressing custom processing logic that runs before documents are indexed. It has hooks for the |
| commit, delete, rollback, and other indexing actions, although add is the most common use. |
| It is implemented as an UpdateProcessor to be placed in an UpdateChain. |
| |
| TIP: This used to be known as the _StatelessScriptingUpdateProcessor_ and was renamed to clarify that the key aspect of this update processor is that it enables scripting. |
| |
| The script can be written in any scripting language supported by your JVM (such |
| as JavaScript) and is executed dynamically, so no pre-compilation is necessary. |
| |
| WARNING: Being able to run a script of your choice as part of the indexing pipeline is a very powerful tool, sometimes called a |
| _Get out of jail free_ card because it lets you solve some problems that cannot be solved any other way. However, running arbitrary scripts also introduces |
| potential security vulnerabilities. |
| |
| == Installing the ScriptingUpdateProcessor and Scripting Engines |
| |
| The scripting update processor lives in the contrib module `/contrib/scripting`, and you need to explicitly add it to your Solr setup. |
| |
| Java 8 through 14 ship with a JavaScript engine called Nashorn. Nashorn was deprecated in Java 11 and removed in Java 15, so on Java 15 and later you must add your own JavaScript engine. Other supported scripting engines, such as |
| JRuby, Jython, and Groovy, all require you to add JAR files. |
| |
| Learn more about adding the `dist/solr-scripting-*.jar` file, and any other needed JAR files (depending on your scripting engine) into Solr's <<libs.adoc#lib-directories,Lib Directories>>. |
| |
| == Configuration |
| |
| [source,xml] |
| ---- |
| <updateRequestProcessorChain name="script"> |
| <processor class="org.apache.solr.scripting.update.ScriptUpdateProcessorFactory"> |
| <str name="script">update-script.js</str> |
| </processor> |
| <!-- optional parameters passed to script |
| <lst name="params"> |
| <str name="config_param">example config parameter</str> |
| </lst> |
| --> |
| <processor class="solr.LogUpdateProcessorFactory" /> |
| <processor class="solr.RunUpdateProcessorFactory" /> |
| </updateRequestProcessorChain> |
| ---- |
| |
| NOTE: The processor supports the defaults/appends/invariants concept for its config. |
| However, it is also possible to skip this level and configure the parameters directly underneath the `<processor>` tag. |
| |
| The configuration parameters and their meanings are listed below: |
| |
| `script`:: |
| The script file name. The script file must be placed in the `conf/` directory. |
| There can be one or more "script" parameters specified; multiple scripts are executed in the order specified. |
| |
| `engine`:: |
| Optionally specifies the scripting engine to use. This is only needed if the extension |
| of the script file does not map to the scripting engine by default. For example, if your |
| script file is written in JavaScript but is named `update-script.foo`, |
| use "javascript" as the engine name. |
| |
| `params`:: |
| Optional parameters that are passed into the script execution context. This is |
| specified as a named list (`<lst>`) structure with nested typed parameters. If |
| specified, the script context will get a "params" object, otherwise there will be no "params" object available. |
| |
| |
| == Script execution context |
| |
| Every script has some variables provided to it. |
| |
| `logger`:: |
| Logger (org.slf4j.Logger) instance. This is useful for logging information from the script. |
| |
| `req`:: |
| {solr-javadocs}/core/org/apache/solr/request/SolrQueryRequest.html[SolrQueryRequest] instance. |
| |
| `rsp`:: |
| {solr-javadocs}/core/org/apache/solr/response/SolrQueryResponse.html[SolrQueryResponse] instance. |
| |
| `params`:: |
| The "params" object from the configuration, if one was specified. |
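
As a rough illustration of how these variables might be used together, here is a minimal JavaScript sketch. The `logger`, `req`, and `fakeDoc` definitions at the top are hypothetical stand-ins so that the snippet can run outside Solr; in a real script Solr supplies `logger` and `req`, and `cmd.solrDoc` is a `SolrInputDocument`. The field name `origin_s` and the request parameter `source` are made up for the example. Only `processAdd()` is shown; a deployed script would still need to define the other methods.

```javascript
// Stand-ins for objects Solr would inject (assumptions, for running outside Solr only).
var logged = [];
var logger = { info: function (msg) { logged.push(msg); } };
var req = {
  getParams: function () {
    return { get: function (name) { return name === "source" ? "crawler" : null; } };
  }
};
// "params" is deliberately not defined, mimicking a processor without <lst name="params">.

// Hypothetical minimal replacement for SolrInputDocument.
function fakeDoc(fields) {
  return {
    getFieldValue: function (name) { return fields.hasOwnProperty(name) ? fields[name] : null; },
    setField: function (name, value) { fields[name] = value; }
  };
}

function processAdd(cmd) {
  var doc = cmd.solrDoc;

  // "params" exists only when configured, so test for it before use.
  var label = (typeof params !== "undefined") ? params.get("config_param") : "default";

  // Request parameters are read through the SolrQueryRequest.
  var source = req.getParams().get("source");

  doc.setField("origin_s", label + ":" + source);
  logger.info("update-script#processAdd: origin_s=" + doc.getFieldValue("origin_s"));
}

// Exercise the function with a fake add command.
var doc = fakeDoc({ id: "1" });
processAdd({ solrDoc: doc });
```

The `typeof` check avoids a reference error when the processor is configured without a "params" list, which is the case described under the `params` entry above.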
| |
| == Examples |
| |
| The `processAdd()` and the other script methods can return false to skip further |
| processing of the document. All methods must be defined, though generally the |
| `processAdd()` method is where the action is. |
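
To make the return-value behavior concrete, the following sketch skips documents that have no usable `description` field. The choice of field, the log message, and the `logger` and `fakeDoc` stand-ins are assumptions for illustration only; inside Solr, `logger` is provided and `cmd.solrDoc` is a real `SolrInputDocument`.

```javascript
// Stand-in logger so the sketch runs outside Solr (Solr normally provides "logger").
var logger = { info: function (msg) {} };

// Hypothetical stand-in for SolrInputDocument, exposing only getFieldValue().
function fakeDoc(fields) {
  return {
    getFieldValue: function (name) { return fields.hasOwnProperty(name) ? fields[name] : null; }
  };
}

function processAdd(cmd) {
  var doc = cmd.solrDoc;
  var description = doc.getFieldValue("description");

  if (description == null || String(description).trim() === "") {
    logger.info("update-script#processAdd: skipping id=" + doc.getFieldValue("id"));
    return false; // skip further processing of this document
  }
  // No return value here, so the document is not skipped.
}

// The remaining methods must still be defined, even as no-ops.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}

// Exercise the function with two fake add commands.
var skipped = processAdd({ solrDoc: fakeDoc({ id: "1", description: "  " }) });
var kept = processAdd({ solrDoc: fakeDoc({ id: "2", description: "foo" }) });
```

Here `skipped` is `false` for the document with a blank description, while the second call returns nothing and the document proceeds as usual.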
| |
| The following URL works with the techproducts example setup and demonstrates how to specify |
| the "script" update chain: `http://localhost:8983/solr/techproducts/update?commit=true&stream.contentType=text/csv&fieldnames=id,description&stream.body=1,foo&update.chain=script` |
| This request logs the following: |
| |
| [source,text] |
| ---- |
| INFO: update-script#processAdd: id=1 |
| ---- |
| |
| You can see the message recorded in the Solr logging UI. |
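
If you need to issue that request from code, the query string can be assembled rather than typed by hand. This sketch uses the standard `URLSearchParams` class, which is available in browsers and Node.js (not inside a Solr update script), and simply reproduces the parameters of the URL above. The variable names are arbitrary.

```javascript
var base = "http://localhost:8983/solr/techproducts/update";

var query = new URLSearchParams({
  "commit": "true",
  "stream.contentType": "text/csv",
  "fieldnames": "id,description",
  "stream.body": "1,foo",
  "update.chain": "script" // selects the "script" update chain
});

// toString() percent-encodes reserved characters such as "/" and ",".
var url = base + "?" + query.toString();
console.log(url);
```

The `update.chain` parameter is what routes the request through the chain named "script" in the configuration.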
| |
| === JavaScript |
| |
| NOTE: There is a JavaScript example `update-script.js` as part of the `techproducts` configset. |
| Check `solrconfig.xml` and uncomment the update request processor definition to enable this feature. |
| |
| [source,javascript] |
| ---- |
| function processAdd(cmd) { |
| |
|   doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument |
|   id = doc.getFieldValue("id"); |
|   logger.info("update-script#processAdd: id=" + id); |
| |
|   // Set a field value: |
|   // doc.setField("foo_s", "whatever"); |
| |
|   // Get a configuration parameter: |
|   // config_param = params.get('config_param'); // "params" only exists if processor configured with <lst name="params"> |
| |
|   // Get a request parameter: |
|   // some_param = req.getParams().get("some_param") |
| |
|   // Add a field of field names that match a pattern: |
|   // - Potentially useful to determine the fields/attributes represented in a result set, via faceting on field_name_ss |
|   // field_names = doc.getFieldNames().toArray(); |
|   // for(i=0; i < field_names.length; i++) { |
|   //   field_name = field_names[i]; |
|   //   if (/attr_.*/.test(field_name)) { doc.addField("attribute_ss", field_names[i]); } |
|   // } |
| |
| } |
| |
| function processDelete(cmd) { |
|   // no-op |
| } |
| |
| function processMergeIndexes(cmd) { |
|   // no-op |
| } |
| |
| function processCommit(cmd) { |
|   // no-op |
| } |
| |
| function processRollback(cmd) { |
|   // no-op |
| } |
| |
| function finish() { |
|   // no-op |
| } |
| ---- |
| |
| === Ruby |
| Ruby support is implemented via the https://www.jruby.org/[JRuby] project. |
| To use JRuby as the scripting engine, add `jruby.jar` to Solr. |
| |
| Here's an example of a JRuby update processing script (note that all variables passed in require prefixing with `$`, such as `$logger`): |
| |
| [source,ruby] |
| ---- |
| def processAdd(cmd) |
|   doc = cmd.solrDoc # org.apache.solr.common.SolrInputDocument |
|   id = doc.getFieldValue('id') |
| |
|   $logger.info "update-script#processAdd: id=#{id}" |
| |
|   doc.setField('source_s', 'ruby') |
| |
|   $logger.info "update-script#processAdd: config_param=#{$params.get('config_param')}" |
| end |
| |
| def processDelete(cmd) |
|   # no-op |
| end |
| |
| def processMergeIndexes(cmd) |
|   # no-op |
| end |
| |
| def processCommit(cmd) |
|   # no-op |
| end |
| |
| def processRollback(cmd) |
|   # no-op |
| end |
| |
| def finish() |
|   # no-op |
| end |
| ---- |
| |
| ==== Known issues |
| |
| The following calls do not work as expected in JRuby (the cause is not known), although the equivalent calls work properly in JavaScript: |
| |
| [source,ruby] |
| ---- |
| # $logger.info "update-script#processAdd: request_param=#{$req.params.get('request_param')}" |
| # $rsp.add('script_processed',id) |
| ---- |
| |
| === Groovy |
| |
| Add JARs from a Groovy distribution's `lib/` directory to Solr. Probably not every JAR in the |
| Groovy distribution is required, but more than just the main `groovy.jar` |
| file is needed (at least when this was tested using Groovy 2.0.6). |
| |
| [source,groovy] |
| ---- |
| def processAdd(cmd) { |
|   doc = cmd.solrDoc // org.apache.solr.common.SolrInputDocument |
|   id = doc.getFieldValue('id') |
| |
|   logger.info "update-script#processAdd: id=" + id |
| |
|   doc.setField('source_s', 'groovy') |
| |
|   logger.info "update-script#processAdd: config_param=" + params.get('config_param') |
| |
|   logger.info "update-script#processAdd: request_param=" + req.params.get('request_param') |
|   rsp.add('script_processed',id) |
| } |
| |
| def processDelete(cmd) { |
|   // no-op |
| } |
| |
| def processMergeIndexes(cmd) { |
|   // no-op |
| } |
| |
| def processCommit(cmd) { |
|   // no-op |
| } |
| |
| def processRollback(cmd) { |
|   // no-op |
| } |
| |
| def finish() { |
|   // no-op |
| } |
| ---- |
| |
| === Python |
| Python support is implemented via the https://www.jython.org/[Jython] project. |
| Add the *standalone* `jython.jar` (the JAR that contains all the dependencies) into Solr. |
| |
| [source,python] |
| ---- |
| def processAdd(cmd): |
|     doc = cmd.solrDoc |
|     id = doc.getFieldValue("id") |
|     logger.info("update-script#processAdd: id=" + id) |
| |
| def processDelete(cmd): |
|     logger.info("update-script#processDelete") |
| |
| def processMergeIndexes(cmd): |
|     logger.info("update-script#processMergeIndexes") |
| |
| def processCommit(cmd): |
|     logger.info("update-script#processCommit") |
| |
| def processRollback(cmd): |
|     logger.info("update-script#processRollback") |
| |
| def finish(): |
|     logger.info("update-script#finish") |
| ---- |