 = Script Update Processor
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 The {solr-javadocs}/contrib/scripting/org/apache/solr/scripting/update/ScriptUpdateProcessorFactory.html[ScriptUpdateProcessorFactory] lets you use Java scripting engines
 during Solr document update processing, which gives you a great deal of flexibility in
 expressing custom processing logic that runs before a document is indexed. It has hooks for the
 commit, delete, rollback, and other indexing actions, although add is the most commonly used.
 It is implemented as an UpdateProcessor to be placed in an UpdateChain.

 TIP: This used to be known as the _StatelessScriptingUpdateProcessor_ and was renamed to make clear that the key aspect of this update processor is that it enables scripting.

 The script can be written in any scripting language supported by your JVM (such
 as JavaScript), and is executed dynamically, so no pre-compilation is necessary.

 WARNING: Being able to run a script of your choice as part of the indexing pipeline is a powerful tool, sometimes called the
 _Get out of jail free_ card because it can solve some problems that can't be solved any other way. However, it also introduces
 potential security vulnerabilities.

 == Installing the ScriptingUpdateProcessor and Scripting Engines

 The scripting update processor lives in the contrib module `/contrib/scripting`, and you need to explicitly add it to your Solr setup.

 Java 11 comes with a JavaScript engine called Nashorn. Nashorn was deprecated in Java 11 and removed in Java 15, so on Java 15 and later you need to add your own JavaScript engine. Other supported scripting engines, such as
 JRuby, Jython, and Groovy, all require you to add JAR files.

 See <<libs.adoc#lib-directories,Lib Directories>> to learn how to add the `dist/solr-scripting-*.jar` file, and any other JAR files your scripting engine needs, to Solr.

 == Configuration

 [source,xml]
 ----
 <updateRequestProcessorChain name="script">
   <processor class="org.apache.solr.scripting.update.ScriptUpdateProcessorFactory">
     <str name="script">update-script.js</str>
     <!-- optional parameters passed to script
     <lst name="params">
       <str name="config_param">example config parameter</str>
     </lst>
     -->
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>
 ----

 NOTE: The processor supports the defaults/appends/invariants concept for its config.
 However, it is also possible to skip this level and configure the parameters directly underneath the `<processor>` tag.

 The configuration parameters and their meanings are:

 `script`::
 The script file name. The script file must be placed in the `conf/` directory.
 One or more `script` parameters can be specified; multiple scripts are executed in the order specified.

 `engine`::
 Optionally specifies the scripting engine to use. This is only needed if the extension
 of the script file is not a standard mapping to the scripting engine. For example, if your
 script file was coded in JavaScript but the file name was called `update-script.foo`,
 use "javascript" as the engine name.

 `params`::
 Optional parameters that are passed into the script execution context. This is
 specified as a named list (`<lst>`) structure with nested typed parameters. If
 specified, the script context will get a "params" object, otherwise there will be no "params" object available.
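
 Because the `params` object only exists when the processor is configured with a `params` list, a script that should work with or without it needs a guard. The following is a minimal sketch, assuming a hypothetical helper named `getConfigParam` (it is not part of Solr) and the `config_param` name used in the configuration above:

 [source,javascript]
 ----
 // Hypothetical helper (not part of Solr): read a configuration parameter,
 // falling back to a default when it is not available.
 function getConfigParam(name, fallback) {
   // "params" is only bound when the processor is configured with <lst name="params">,
   // so test for it with typeof instead of referencing it directly.
   if (typeof params === "undefined" || params === null) {
     return fallback;
   }
   var value = params.get(name);
   // A missing entry comes back as null, so use the fallback in that case too.
   return (value === null || value === undefined) ? fallback : value;
 }
 ----

 Inside `processAdd()`, a call such as `getConfigParam("config_param", "none")` would then return the configured value, or `"none"` when no `params` list is configured.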


 == Script execution context

 Solr provides the following variables to every script:

 `logger`::
 Logger (org.slf4j.Logger) instance. This is useful for logging information from the script.

 `req`::
 {solr-javadocs}/core/org/apache/solr/request/SolrQueryRequest.html[SolrQueryRequest] instance.

 `rsp`::
 {solr-javadocs}/core/org/apache/solr/response/SolrQueryResponse.html[SolrQueryResponse] instance.

 `params`::
 The `params` object from the configuration, if one is specified.
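
 As an illustration of how these variables might be used together, here is a hedged sketch of a `processAdd()` function. The request parameter name `source` and the field name `source_s` are assumptions made up for this example; only `processAdd()` is shown, and the other script methods still need to be defined.

 [source,javascript]
 ----
 function processAdd(cmd) {
   var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
   var id = doc.getFieldValue("id");

   // req: read a request parameter ("source" is a hypothetical parameter name)
   var source = req.getParams().get("source");
   if (source !== null && source !== undefined) {
     doc.setField("source_s", source);  // "source_s" is a hypothetical field name
   }

   // logger: write a message to the Solr log
   logger.info("update-script#processAdd: id=" + id + " source=" + source);

   // rsp: add an entry to the response that is returned to the client
   rsp.add("script_processed", id);
 }
 ----

 With this sketch, an update request that carries `source=crawler` would store that value on each added document, and a request without the parameter would leave the document unchanged.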

 == Examples

 The `processAdd()` and the other script methods can return false to skip further
 processing of the document. All methods must be defined, though generally the
 `processAdd()` method is where the action is.
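
 Returning false can be used to filter documents out of the update. The sketch below assumes a made-up rule (drop any document without a `description` value) purely for illustration; only `processAdd()` is shown, and the other script methods still need to be defined.

 [source,javascript]
 ----
 function processAdd(cmd) {
   var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
   var id = doc.getFieldValue("id");

   // Hypothetical rule: skip documents that have no "description" value.
   var description = doc.getFieldValue("description");
   if (description === null || description === undefined ||
       String(description).trim() === "") {
     logger.info("update-script#processAdd: skipping id=" + id);
     return false;  // skip further processing of this document
   }
   // Returning nothing lets the document continue down the update chain.
 }
 ----

 Because the script processor sits before `RunUpdateProcessorFactory` in the chain shown earlier, a document for which the function returns false is not passed on to the processors that follow.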

 The following URL works with the techproducts example setup and shows how to specify
 the "script" update chain: `http://localhost:8983/solr/techproducts/update?commit=true&stream.contentType=text/csv&fieldnames=id,description&stream.body=1,foo&update.chain=script`
 It logs the following:

 [source,text]
 ----
 INFO: update-script#processAdd: id=1
 ----

 You can see the message recorded in the Solr logging UI.

 === JavaScript

 NOTE: There is a JavaScript example `update-script.js` as part of the `techproducts` configset.
 Check `solrconfig.xml` and uncomment the update request processor definition to enable this feature.

 [source,javascript]
 ----
 function processAdd(cmd) {

   var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
   var id = doc.getFieldValue("id");
   logger.info("update-script#processAdd: id=" + id);

   // Set a field value:
   //  doc.setField("foo_s", "whatever");

   // Get a configuration parameter:
   //  var config_param = params.get('config_param');  // "params" only exists if processor configured with <lst name="params">

   // Get a request parameter:
   //  var some_param = req.getParams().get("some_param");

   // Add a field of field names that match a pattern:
   //   - Potentially useful to determine the fields/attributes represented in a result set, via faceting on attribute_ss
   //  var field_names = doc.getFieldNames().toArray();
   //  for (var i = 0; i < field_names.length; i++) {
   //    var field_name = field_names[i];
   //    if (/attr_.*/.test(field_name)) { doc.addField("attribute_ss", field_name); }
   //  }

 }

 function processDelete(cmd) {
   // no-op
 }

 function processMergeIndexes(cmd) {
   // no-op
 }

 function processCommit(cmd) {
   // no-op
 }

 function processRollback(cmd) {
   // no-op
 }

 function finish() {
   // no-op
 }
 ----

 === Ruby
 Ruby support is implemented via the https://www.jruby.org/[JRuby] project.
 To use JRuby as the scripting engine, add `jruby.jar` to Solr.

 Here's an example of a JRuby update processing script (note that all variables passed in require prefixing with `$`, such as `$logger`):

 [source,ruby]
 ----
 def processAdd(cmd)
   doc = cmd.solrDoc  # org.apache.solr.common.SolrInputDocument
   id = doc.getFieldValue('id')

   $logger.info "update-script#processAdd: id=#{id}"

   doc.setField('source_s', 'ruby')

   $logger.info "update-script#processAdd: config_param=#{$params.get('config_param')}"
 end

 def processDelete(cmd)
   # no-op
 end

 def processMergeIndexes(cmd)
   # no-op
 end

 def processCommit(cmd)
   # no-op
 end

 def processRollback(cmd)
   # no-op
 end

 def finish()
   # no-op
 end
 ----

 ==== Known issues

 The following in JRuby does not work as expected for some reason, though it does work properly in JavaScript:

 [source,ruby]
 ----
 #  $logger.info "update-script#processAdd: request_param=#{$req.params.get('request_param')}"
 #  $rsp.add('script_processed',id)
 ----

 === Groovy

 Add JARs from a Groovy distribution's `lib/` directory to Solr. Not all of the JARs in the
 distribution are likely to be required, but more than just the main `groovy.jar`
 file is needed (at least when this was tested using Groovy 2.0.6).

 [source,groovy]
 ----
 def processAdd(cmd) {
   doc = cmd.solrDoc  // org.apache.solr.common.SolrInputDocument
   id = doc.getFieldValue('id')

   logger.info "update-script#processAdd: id=" + id

   doc.setField('source_s', 'groovy')

   logger.info "update-script#processAdd: config_param=" + params.get('config_param')

   logger.info "update-script#processAdd: request_param=" + req.params.get('request_param')
   rsp.add('script_processed',id)
 }

 def processDelete(cmd) {
  //  no-op
 }

 def processMergeIndexes(cmd) {
  // no-op
 }

 def processCommit(cmd) {
  //  no-op
 }

 def processRollback(cmd) {
  // no-op
 }

 def finish() {
  // no-op
 }
 ----

 === Python
 Python support is implemented via the https://www.jython.org/[Jython] project.
 Add the *standalone* `jython.jar` (the JAR that contains all the dependencies) into Solr.

 [source,python]
 ----
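 def processAdd(cmd):
     doc = cmd.solrDoc  # org.apache.solr.common.SolrInputDocument
     doc_id = doc.getFieldValue("id")
     # str() avoids a TypeError when the id value is not a string
     logger.info("update-script#processAdd: id=" + str(doc_id))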

 def processDelete(cmd):
     logger.info("update-script#processDelete")

 def processMergeIndexes(cmd):
     logger.info("update-script#processMergeIndexes")

 def processCommit(cmd):
     logger.info("update-script#processCommit")

 def processRollback(cmd):
     logger.info("update-script#processRollback")

 def finish():
     logger.info("update-script#finish")
 ----