| = Post Tool |
| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| Solr includes a simple command line tool for POSTing various types of content to a Solr server. |
| |
| The tool is `bin/post`. The bin/post tool is a Unix shell script; for Windows (non-Cygwin) usage, see the section <<Post Tool Windows Support>> below. |
| |
| NOTE: This tool is meant for use by new users exploring Solr's capabilities, and is not intended as a robust solution to be used for indexing documents into production systems. |
| |
| To run it, open a window and enter: |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted example/films/films.json |
| ---- |
| |
| This will contact the server at `localhost:8983`. Specifying the `collection/core name` is *mandatory*. The `-help` (or simply `-h`) option will output information on its usage (i.e., `bin/post -help)`. |
| |
| == Using the bin/post Tool |
| |
| Specifying either the `collection/core name` or the full update `url` is *mandatory* when using `bin/post`. |
| |
| The basic usage of `bin/post` is: |
| |
| [source,plain] |
| ---- |
| $ bin/post -h |
| Usage: post -c <collection> [OPTIONS] <files|directories|urls|-d ["...",...]> |
| or post -help |
| |
| collection name defaults to DEFAULT_SOLR_COLLECTION if not specified |
| |
| OPTIONS |
| ======= |
| Solr options: |
| -url <base Solr update URL> (overrides collection, host, and port) |
| -host <host> (default: localhost) |
| -p or -port <port> (default: 8983) |
| -commit yes|no (default: yes) |
| -u or -user <user:pass> (sets BasicAuth credentials) |
| |
| Web crawl options: |
| -recursive <depth> (default: 1) |
| -delay <seconds> (default: 10) |
| |
| |
| Directory crawl options: |
| -delay <seconds> (default: 0) |
| |
| stdin/args options: |
| -type <content/type> (default: application/xml) |
| |
| |
| Other options: |
| -filetypes <type>[,<type>,...] (default: xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log) |
| -params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request) |
| -out yes|no (default: no; yes outputs Solr response to console) |
| ... |
| ---- |
| |
| == Examples Using bin/post |
| |
| There are several ways to use `bin/post`. This section presents several examples. |
| |
| === Indexing XML |
| |
| Add all documents with file extension `.xml` to collection or core named `gettingstarted`. |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted *.xml |
| ---- |
| |
| Add all documents with file extension `.xml` to the `gettingstarted` collection/core on Solr running on port `8984`. |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted -p 8984 *.xml |
| ---- |
| |
| Send XML arguments to delete a document from `gettingstarted`. |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted -d '<delete><id>42</id></delete>' |
| ---- |
| |
| === Indexing CSV |
| |
| Index all CSV files into `gettingstarted`: |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted *.csv |
| ---- |
| |
| Index a tab-separated file into `gettingstarted`: |
| |
| [source,bash] |
| ---- |
| bin/post -c signals -params "separator=%09" -type text/csv data.tsv |
| ---- |
| |
| The content type (`-type`) parameter is required to treat the file as the proper type, otherwise it will be ignored and a WARNING logged as it does not know what type of content a .tsv file is. The <<uploading-data-with-index-handlers.adoc#csv-formatted-index-updates,CSV handler>> supports the `separator` parameter, and is passed through using the `-params` setting. |
| |
| === Indexing JSON |
| |
| Index all JSON files into `gettingstarted`. |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted *.json |
| ---- |
| |
| === Indexing Rich Documents (PDF, Word, HTML, etc.) |
| |
| Index a PDF file into `gettingstarted`. |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted a.pdf |
| ---- |
| |
| Automatically detect content types in a folder, and recursively scan it for documents for indexing into `gettingstarted`. |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted afolder/ |
| ---- |
| |
| Automatically detect content types in a folder, but limit it to PPT and HTML files and index into `gettingstarted`. |
| |
| [source,bash] |
| ---- |
| bin/post -c gettingstarted -filetypes ppt,html afolder/ |
| ---- |
| |
| === Indexing to a Password Protected Solr (Basic Auth) |
| |
| Index a PDF as the user "solr" with password "SolrRocks": |
| |
| [source,bash] |
| ---- |
| bin/post -u solr:SolrRocks -c gettingstarted a.pdf |
| ---- |
| |
| == Post Tool Windows Support |
| |
| `bin/post` is a Unix shell script and as such cannot be used directly on Windows. |
| However it delegates its work to a cross-platform capable Java program called "SimplePostTool" or `post.jar`, that can be used in Windows environments. |
| |
| The argument syntax differs significantly from `bin/post`, so your first step should be to print the SimplePostTool help text. |
| |
| [source,plain] |
| ---- |
| $ java -jar example\exampledocs\post.jar -h |
| ---- |
| |
| This command prints information about all the arguments and System properties available to SimplePostTool users. |
| There are also examples showing how to post files, crawl a website or file system folder, and send update commands (deletes, etc.) directly to Solr. |
| |
| Most usage involves passing both Java System properties and program arguments on the command line. Consider the example below: |
| |
| [source,plain] |
| ---- |
| $ java -jar -Dc=gettingstarted -Dauto example\exampledocs\post.jar example\exampledocs\* |
| ---- |
| |
| This indexes the contents of the `exampledocs` directory into a collection called `gettingstarted`. |
| The `-Dauto` System property governs whether or not Solr sends the document type to Solr during extraction. |