indexer-rabbit plugin for Nutch

The indexer-rabbit plugin sends documents from one or more segments to a RabbitMQ server. Index writers are configured in the conf/index-writers.xml file, which is included in the official Nutch distribution. The configuration for this plugin has the following form:

<writer id="<writer_id>" class="org.apache.nutch.indexwriter.rabbit.RabbitIndexWriter">
  <mapping>
    ...
  </mapping>
  <parameters>
    ...
  </parameters>
</writer>

Each <writer> element has two mandatory attributes:

  • <writer_id> is a unique identifier for each configuration. It lets Nutch distinguish between configurations, even when they are for the same index writer, so the same index writer can be used in multiple instances with different configurations.

  • org.apache.nutch.indexwriter.rabbit.RabbitIndexWriter corresponds to the canonical name of the class that implements the IndexWriter extension point. This value should not be modified for the indexer-rabbit plugin.
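
As an illustration of the two attributes described above, the following sketch declares two instances of the same index writer with different configurations. The writer ids, host names and exchange names are hypothetical values chosen for this example; the mapping sections are left elided, and the remaining parameters are omitted for brevity.

```xml
<!-- Two instances of the same index writer; only the id and the parameters differ. -->
<!-- All ids, host names and exchange names below are illustrative assumptions. -->
<writer id="indexer_rabbit_1" class="org.apache.nutch.indexwriter.rabbit.RabbitIndexWriter">
  <mapping>
    ...
  </mapping>
  <parameters>
    <param name="server.uri" value="amqp://guest:guest@rabbit-a.example.org:5672/"/>
    <param name="exchange.name" value="nutch.exchange.a"/>
  </parameters>
</writer>
<writer id="indexer_rabbit_2" class="org.apache.nutch.indexwriter.rabbit.RabbitIndexWriter">
  <mapping>
    ...
  </mapping>
  <parameters>
    <param name="server.uri" value="amqp://guest:guest@rabbit-b.example.org:5672/"/>
    <param name="exchange.name" value="nutch.exchange.b"/>
  </parameters>
</writer>
```

Because each <writer> element carries its own id, Nutch can treat the two configurations as separate, even though both use the same class.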

Mapping

The mapping section is documented separately. Its structure is the same for all index writers.

Parameters

Each parameter has the form <param name="<name>" value="<value>"/> and the parameters for this index writer are:

| Parameter Name | Description | Default value |
| --- | --- | --- |
| server.uri | URI with connection parameters in the form amqp://<username>:<password>@<hostname>:<port>/<virtualHost>, where <username> is the username for the RabbitMQ server, <password> is the password for the RabbitMQ server, <hostname> is the host where the RabbitMQ server is running, <port> is the port on which the RabbitMQ server is listening, and <virtualHost> is the virtual host that contains the exchange and that the user has access to. | amqp://guest:guest@localhost:5672/ |
| binding | Whether the relationship between an exchange and a queue is created automatically. NOTE: Binding between exchanges is not supported. | false |
| binding.arguments | Arguments used in the binding. It must have the form key1=value1,key2=value2. This value is only used when the exchange type is headers and the value of the binding property is true. Otherwise it is ignored. | |
| exchange.name | Name of the exchange where the messages will be sent. | |
| exchange.options | Options used when the exchange is created. Only used when the value of the binding property is true. It must have the form type=<type>,durable=<durable>, where <type> is direct, topic, headers or fanout, and <durable> is true or false. | type=direct,durable=true |
| queue.name | Name of the queue used to create the binding. Only used when the value of the binding property is true. | nutch.queue |
| queue.options | Options used when the queue is created. Only used when the value of the binding property is true. It must have the form durable=<durable>,exclusive=<exclusive>,auto-delete=<auto-delete>,arguments=<arguments>, where <durable>, <exclusive> and <auto-delete> are each true or false, and <arguments> must have the form key1:value1;key2:value2. | durable=true,exclusive=false,auto-delete=false |
| routingkey | The routing key used to route messages in the exchange. It only makes sense when the exchange type is topic or direct. | Value of the queue.name property |
| commit.mode | Use single if a message contains only one document; in this case, a header with the action (write, update or delete) is added. Use multiple if a message contains all documents. | multiple |
| commit.size | Number of documents to send in each message if the value of the commit.mode property is multiple. In single mode this value represents the number of messages to be sent. | 250 |
| headers.static | Headers to add to each message. It must have the form key1=value1,key2=value2. | |
| headers.dynamic | Document fields to add as headers to each message. It must have the form field1,field2. Only used when the value of the commit.mode property is single. | |
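
To show how the parameters above fit together, here is a minimal sketch of a writer that creates the binding automatically on a topic exchange and sends one document per message. The writer id, the exchange name, the static header and the document fields listed in headers.dynamic are assumptions made for this example, not values taken from the Nutch distribution; the mapping section is left elided.

```xml
<!-- Illustrative configuration only; names marked "assumed" are not from the distribution. -->
<writer id="indexer_rabbit_1" class="org.apache.nutch.indexwriter.rabbit.RabbitIndexWriter">
  <mapping>
    ...
  </mapping>
  <parameters>
    <!-- Default connection URI from the parameter list -->
    <param name="server.uri" value="amqp://guest:guest@localhost:5672/"/>
    <!-- Create the exchange, the queue and the binding between them automatically -->
    <param name="binding" value="true"/>
    <!-- Assumed exchange name -->
    <param name="exchange.name" value="nutch.exchange"/>
    <param name="exchange.options" value="type=topic,durable=true"/>
    <param name="queue.name" value="nutch.queue"/>
    <param name="queue.options" value="durable=true,exclusive=false,auto-delete=false"/>
    <!-- Meaningful here because the exchange type is topic -->
    <param name="routingkey" value="nutch.queue"/>
    <!-- One document per message; an action header is added to each message -->
    <param name="commit.mode" value="single"/>
    <param name="commit.size" value="250"/>
    <!-- Assumed static header -->
    <param name="headers.static" value="source=nutch"/>
    <!-- Assumed document fields; only used because commit.mode is single -->
    <param name="headers.dynamic" value="host,title"/>
  </parameters>
</writer>
```

With binding set to false, the exchange.options, queue.name and queue.options values would not be used, as noted in their descriptions.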