= SolrJ
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
{solr-javadocs}/solrj/[SolrJ] is an API that makes it easy for applications written in Java (or any language based on the JVM) to talk to Solr.
SolrJ hides a lot of the details of connecting to Solr and allows your application to interact with Solr with simple high-level methods.
SolrJ supports most Solr APIs, and is highly configurable.
== Building and Running SolrJ Applications
The SolrJ API ships with Solr, so you do not have to download or install anything else.
But you will need to configure your build to include SolrJ and its dependencies.
=== Common Build Systems
Most mainstream build systems greatly simplify dependency management, making it easy to add SolrJ to your project.
For projects built with Ant (using http://ant.apache.org/ivy/[Ivy]), place the following in your `ivy.xml`:
[source,xml,subs="verbatim,attributes"]
----
<dependency org="org.apache.solr" name="solr-solrj" rev="{solr-full-version}"/>
----
For projects built with Maven, place the following in your `pom.xml`:
[source,xml,subs="verbatim,attributes"]
----
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>{solr-full-version}</version>
</dependency>
----
For projects built with Gradle, place the following in your `build.gradle`:
[source,groovy,subs="attributes"]
----
implementation group: 'org.apache.solr', name: 'solr-solrj', version: '{solr-full-version}'
----
If you want to use `CloudSolrClient` _and_ have it talk directly to ZooKeeper, you will need to add a dependency on the `solr-solrj-zookeeper` artifact.
If you want to build and use xref:query-guide:streaming-expressions.adoc[Streaming Expressions] classes in your Java code, you will need to add a dependency on the `solr-solrj-streaming` artifact.
=== Adding SolrJ to the Classpath Manually
If you are not using one of the above build systems, it's still easy to add SolrJ to your build.
At build time, all that is required is the SolrJ jar itself: `solr-solrj-{solr-full-version}.jar`.
To compile code manually that uses SolrJ, use a `javac` command similar to:
[source,bash,subs="attributes"]
----
javac -cp .:$SOLR_TIP/server/solr-webapp/webapp/WEB-INF/lib/solr-solrj-{solr-full-version}.jar ...
----
At runtime, you need a few of SolrJ's dependencies, in addition to SolrJ itself.
In the Solr distribution these dependencies are not separated from Solr's dependencies, so you must include all or manually choose the exact set that is needed.
Please refer to the https://search.maven.org/artifact/org.apache.solr/solr-solrj/{solr-full-version}/jar[maven release] for the exact dependencies needed for your version.
Run your project with a classpath like:
[source,bash,subs="attributes"]
----
java -cp .:$SOLR_TIP/server/lib/ext:$SOLR_TIP/server/solr-webapp/webapp/WEB-INF/lib/* ...
----
If you are worried about the SolrJ libraries expanding the size of your client application, you can use a code obfuscator like http://proguard.sourceforge.net/[ProGuard] to remove APIs that you are not using.
== SolrJ Overview
For all its flexibility, SolrJ is built around a few simple interfaces.
All requests to Solr are sent by a {solr-javadocs}/solrj/org/apache/solr/client/solrj/SolrClient.html[`SolrClient`]. `SolrClient` instances are the main workhorses at the core of SolrJ.
They handle the work of connecting to and communicating with Solr, and are where most of the user configuration happens.
Requests are sent in the form of {solr-javadocs}/solrj/org/apache/solr/client/solrj/SolrRequest.html[`SolrRequests`], and are returned as {solr-javadocs}/solrj/org/apache/solr/client/solrj/SolrResponse.html[`SolrResponses`].
=== Types of SolrClients
`SolrClient` has a few concrete implementations, each geared towards a different usage-pattern or resiliency model:
- {solr-javadocs}/solrj/org/apache/solr/client/solrj/impl/HttpSolrClient.html[`HttpSolrClient`] - geared towards query-centric workloads, though also a good general-purpose client.
Communicates directly with a single Solr node.
- {solr-javadocs}/solrj/org/apache/solr/client/solrj/impl/Http2SolrClient.html[`Http2SolrClient`] - async, non-blocking and general-purpose client that leverages HTTP/2 using the Jetty Http library.
- {solr-javadocs}/solrj/org/apache/solr/client/solrj/impl/HttpJdkSolrClient.html[`HttpJdkSolrClient`] - General-purpose client using the JDK's built-in Http Client. Supports both Http/2 and Http/1.1. Supports async. Targeted for those users wishing to minimize application dependencies.
- {solr-javadocs}/solrj/org/apache/solr/client/solrj/impl/LBHttpSolrClient.html[`LBHttpSolrClient`] - balances request load across a list of Solr nodes.
Adjusts the list of "in-service" nodes based on node health.
- {solr-javadocs}/solrj/org/apache/solr/client/solrj/impl/LBHttp2SolrClient.html[`LBHttp2SolrClient`] - just like `LBHttpSolrClient` but using `Http2SolrClient` instead, with the Jetty Http library.
- {solr-javadocs}/solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html[`CloudSolrClient`] - geared towards communicating with SolrCloud deployments.
Uses already-recorded ZooKeeper state to discover and route requests to healthy Solr nodes.
- {solr-javadocs}/solrj/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.html[`ConcurrentUpdateSolrClient`] - geared towards indexing-centric workloads.
Buffers documents internally before sending larger batches to Solr.
- {solr-javadocs}/solrj/org/apache/solr/client/solrj/impl/ConcurrentUpdateHttp2SolrClient.html[`ConcurrentUpdateHttp2SolrClient`] - just like `ConcurrentUpdateSolrClient` but using `Http2SolrClient` instead, with the Jetty Http library.
=== Common Configuration Options
Most SolrJ configuration happens at the `SolrClient` level.
The most common/important of these are discussed below.
For comprehensive information on how to tweak your `SolrClient`, see the Javadocs for the involved client, and its corresponding builder object.
==== Base URLs
Many `SolrClient` implementations require users to specify one or more Solr URLs, which the client then uses to send HTTP requests to Solr.
Unless otherwise specified, SolrJ expects these URLs to point to the root Solr path (i.e. "/solr").
A few notable exceptions to this are described below:
- *Http2SolrClient* - Users of `Http2SolrClient` may choose to skip providing a root URL to their client, in favor of specifying the URL on a request-by-request basis using `SolrRequest.setBasePath`.
`Http2SolrClient` will throw an `IllegalArgumentException` if neither the client nor the request specifies a URL.
- *LBHttpSolrClient* and *LBHttp2SolrClient* - Solr's "load balancing" clients are frequently used to round-robin requests across a set of replicas or cores.
URLs are still expected to point to the Solr root (i.e. "/solr"), but to support this use-case the URLs are often supplemented by an additional parameter to specify the targeted core.
Alternatively, some "load balancing" methods make use of an `Endpoint` abstraction to provide this URL and core information in a more structured way.
- *CloudSolrClient* - Like many clients, CloudSolrClient accepts a series of URLs pointing to the Solr root path (i.e. "/solr").
+
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-cloudsolrclient-baseurl]
----
+
However, unlike other clients, these URLs aren't used to send user-provided requests, but instead serve to fetch information about the layout and health of the Solr cluster.
If Solr URLs are not known, users may instead specify a list of ZooKeeper hosts (with ports) and ZooKeeper root path, for the `CloudSolrClient` to fetch this information from ZooKeeper directly.
+
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-cloudsolrclient-zookeeperroot]
----
+
If no ZooKeeper root path is used, then `java.util.Optional.empty()` should be provided.
+
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-cloudsolrclient-zookeepernoroot]
----
+
`CloudSolrClient` users who wish to provide ZooKeeper connection information must depend on the `solr-solrj-zookeeper` artifact to have all of the necessary classes available.
The ZooKeeper based connection is the most reliable and performant means for CloudSolrClient to work. On the other hand, it means exposing ZooKeeper more broadly than to Solr nodes, which is a security risk. It also adds more JAR dependencies.
==== Default Collections
Most `SolrClient` methods allow users to specify the core or collection they wish to query, etc. as a `String` parameter.
However, continually specifying this parameter can become tedious, especially for users who always work with the same collection.
Users can avoid this pattern by specifying a "default" collection when creating their client, using the `withDefaultCollection(String)` method available on the relevant `SolrClient` Builder object.
If specified on a Builder, the created `SolrClient` will use this default for making requests whenever a collection or core is needed (and no overriding value is specified).
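The fallback rule described above can be pictured with a small plain-Java sketch. The class `DefaultCollectionSketch` and its methods are hypothetical names invented for illustration and are not part of SolrJ. The sketch only imitates the "explicit value wins, otherwise use the default" behavior, and how it reacts when neither value is present is an assumption.

```java
import java.util.Objects;

// Hypothetical helper, NOT part of SolrJ: imitates the rule that a
// per-request collection overrides the default configured on the client.
final class DefaultCollectionSketch {
    private final String rootUrl;            // e.g. "http://localhost:8983/solr"
    private final String defaultCollection;  // null when no default was configured

    DefaultCollectionSketch(String rootUrl, String defaultCollection) {
        this.rootUrl = Objects.requireNonNull(rootUrl, "rootUrl");
        this.defaultCollection = defaultCollection;
    }

    // Returns the collection a request would target.
    String resolve(String explicitCollection) {
        if (explicitCollection != null) {
            return explicitCollection;       // the overriding value wins
        }
        if (defaultCollection == null) {
            // Assumption of this sketch: fail loudly when nothing is specified.
            throw new IllegalStateException("No collection given and no default configured");
        }
        return defaultCollection;
    }

    // Appends the resolved collection to the Solr root path.
    String collectionUrl(String explicitCollection) {
        return rootUrl + "/" + resolve(explicitCollection);
    }
}
```

With a default of `techproducts`, `resolve(null)` yields `techproducts`, while `resolve("other")` yields `other`.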
==== Timeouts
All `SolrClient` implementations allow users to specify the connection and read timeouts for communicating with Solr.
These are provided at client creation time, as in the example below:
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-solrclient-timeouts]
----
When these values are not explicitly provided, SolrJ falls back to using the defaults for the OS/environment it is running on.
`ConcurrentUpdateSolrClient` and its counterpart `ConcurrentUpdateHttp2SolrClient` also implement a stall prevention
timeout that allows requests to non-responsive nodes to fail quicker than waiting for a socket timeout.
The default value of this timeout is set to 15000 ms and can be adjusted by a system property `solr.cloud.client.stallTime`.
This value should be smaller than `solr.jetty.http.idleTimeout` (which is 120000 ms by default) and greater than the
processing time of the largest update request.
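The sizing guidance for the stall prevention timeout can be expressed as a simple check. The following is a minimal sketch in plain Java. `StallTimeSketch` is a hypothetical class, not part of SolrJ, and the way it reads the property is an assumption about how an application might inspect the same setting.

```java
// Hypothetical check, NOT part of SolrJ: encodes the guidance that the stall
// time should sit between the longest update and the Jetty idle timeout.
final class StallTimeSketch {
    static final long DEFAULT_STALL_TIME_MS = 15_000L;    // default stated in the text
    static final long DEFAULT_IDLE_TIMEOUT_MS = 120_000L; // solr.jetty.http.idleTimeout default stated in the text

    // Reads the system property named in the text, falling back to the documented default.
    static long configuredStallTimeMs() {
        return Long.getLong("solr.cloud.client.stallTime", DEFAULT_STALL_TIME_MS);
    }

    // True when the stall time is below the idle timeout and above the
    // processing time of the largest expected update request.
    static boolean isReasonable(long stallTimeMs, long idleTimeoutMs, long longestUpdateMs) {
        return stallTimeMs < idleTimeoutMs && stallTimeMs > longestUpdateMs;
    }
}
```

Because this is a JVM system property, it is typically set on the client application's command line, for example with `-Dsolr.cloud.client.stallTime=30000`.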
=== Cloud Request Routing
The SolrJ `CloudSolrClient` implementations (`CloudSolrClient` and `CloudHttp2SolrClient`) respect the xref:solrcloud-distributed-requests.adoc#shards-preference-parameter[shards.preference parameter].
Therefore, requests sent to single-sharded collections using either of the above clients are routed the same way that distributed requests are routed to individual shards.
If no `shards.preference` parameter is provided, the clients will default to sorting replicas randomly.
For update requests, while the replicas are sorted in the order defined by the request, leader replicas will always be sorted first.
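The ordering rules above can be modeled in a few lines of plain Java. This is an illustrative sketch only: the `ReplicaOrderingSketch` class and its `Replica` type are hypothetical, and SolrJ's actual sorting logic lives inside the client and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Illustrative model only, NOT SolrJ code.
final class ReplicaOrderingSketch {
    static final class Replica {
        final String name;
        final boolean leader;

        Replica(String name, boolean leader) {
            this.name = name;
            this.leader = leader;
        }
    }

    // No shards.preference given: replicas are placed in random order.
    static List<Replica> orderForQuery(List<Replica> replicas, Random random) {
        List<Replica> ordered = new ArrayList<>(replicas);
        Collections.shuffle(ordered, random);
        return ordered;
    }

    // Updates: keep the incoming (preference-defined) order, but move leaders first.
    static List<Replica> orderForUpdate(List<Replica> preferenceOrdered) {
        List<Replica> ordered = new ArrayList<>(preferenceOrdered);
        // List.sort is stable, so replicas with the same leader flag keep
        // their relative order; "false" sorts before "true", hence the negation.
        ordered.sort(Comparator.comparing((Replica r) -> !r.leader));
        return ordered;
    }
}
```

Given the input order `r1`, `r2` (leader), `r3`, the update ordering in this sketch becomes `r2`, `r1`, `r3`.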
== Querying in SolrJ
`SolrClient` has a number of `query()` methods for fetching results from Solr.
Each of these methods takes in a `SolrParams`, an object encapsulating arbitrary query-parameters.
And each method outputs a `QueryResponse`, a wrapper which can be used to access the result documents and other related metadata.
The following snippet uses a SolrClient to query Solr's "techproducts" example collection, and iterate over the results.
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-query-with-raw-solrparams]
----
`SolrParams` has a `SolrQuery` subclass, which provides some convenience methods that greatly simplify query creation.
The following snippet shows how the query from the previous example can be built using some of the convenience methods in `SolrQuery`:
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-query-with-solrquery]
----
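To make "arbitrary query-parameters" more concrete, the sketch below shows roughly what a set of key/value parameters looks like once URL-encoded into a query string. It uses only the JDK. `QueryParamsSketch` is a hypothetical class, the parameter values are made up for illustration, and SolrJ performs its own encoding internally, so treat this as an approximation rather than SolrJ's implementation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-JDK illustration, NOT SolrJ code.
final class QueryParamsSketch {
    // Example parameters: match everything, return two fields, sort by id, 10 rows.
    static Map<String, String> exampleParams() {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "*:*");
        params.put("fl", "id,name");
        params.put("sort", "id asc");
        params.put("rows", "10");
        return params;
    }

    // Joins URL-encoded key=value pairs with '&'.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
            .collect(Collectors.joining("&"));
    }

    private static String encode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }
}
```

For the example parameters, `toQueryString` produces `q=*%3A*&fl=id%2Cname&sort=id+asc&rows=10`.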
== Indexing in SolrJ
Indexing is also simple using SolrJ.
Users build the documents they want to index as instances of `SolrInputDocument`, and provide them as arguments to one of the `add()` methods on `SolrClient`.
The following example shows how to use SolrJ to add a document to Solr's "techproducts" example collection:
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-index-with-raw-solrinputdoc]
----
CAUTION: The indexing examples above are intended to show syntax.
For brevity, they break several Solr indexing best-practices.
Under normal circumstances, documents should be indexed in larger batches, instead of one at a time.
It is also suggested that Solr administrators commit documents using Solr's autocommit settings, and not using explicit `commit()` invocations.
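As a rough illustration of the batching advice, the helper below splits a list of documents into fixed-size batches using only the JDK. `BatchingSketch` is a hypothetical name and the batch size is for the application to choose. Each resulting batch could then be passed to one of the `add()` methods that accepts a collection of documents.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK illustration, NOT SolrJ code.
final class BatchingSketch {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Partitioning five items with a batch size of two yields three batches containing two, two, and one item.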
== Java Object Binding
While the `UpdateResponse` and `QueryResponse` interfaces that SolrJ provides are useful, it is often more convenient to work with domain-specific objects that can more easily be understood by your application.
Thankfully, SolrJ supports this by implicitly converting documents to and from any class that has been specially marked with {solr-javadocs}/solrj/org/apache/solr/client/solrj/beans/Field.html[`Field`] annotations.
Each instance variable in a Java object can be mapped to a corresponding Solr field, using the `Field` annotation.
The Solr field shares the name of the annotated variable by default; however, this can be overridden by providing the annotation with an explicit field name.
The example snippet below shows an annotated `TechProduct` class that can be used to represent results from Solr's "techproducts" example collection.
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-techproduct-value-type]
----
Application code with access to the annotated `TechProduct` class above can index `TechProduct` objects directly without any conversion, as in the example snippet below:
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-index-bean-value-type]
----
Similarly, search results can be converted directly into bean objects using the `getBeans()` method on `QueryResponse`:
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-query-bean-value-type]
----
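The naming rule from this section (use the variable name unless the annotation supplies an explicit field name) can be imitated with plain reflection. The sketch below is a conceptual model only. The `@SolrFieldSketch` annotation, the `Product` class, and `toDocument` are hypothetical stand-ins and do not reproduce SolrJ's `Field` annotation or its binding code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only, NOT SolrJ code.
final class FieldBindingSketch {
    // Hypothetical stand-in for SolrJ's annotation; "" means "use the variable name".
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SolrFieldSketch {
        String value() default "";
    }

    // Small bean in the spirit of the TechProduct example.
    static final class Product {
        @SolrFieldSketch String id;                  // mapped to field "id"
        @SolrFieldSketch("cat") String category;     // mapped to field "cat"
        String notIndexed = "ignored";               // no annotation, so it is skipped

        Product(String id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    // Copies every annotated instance variable into a field-name -> value map.
    static Map<String, Object> toDocument(Object bean) {
        Map<String, Object> doc = new TreeMap<>();
        for (Field f : bean.getClass().getDeclaredFields()) {
            SolrFieldSketch ann = f.getAnnotation(SolrFieldSketch.class);
            if (ann == null) {
                continue;
            }
            f.setAccessible(true);
            String name = ann.value().isEmpty() ? f.getName() : ann.value();
            try {
                doc.put(name, f.get(bean));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + f.getName(), e);
            }
        }
        return doc;
    }
}
```

Calling `toDocument` on a `Product` produces a map with the keys `id` and `cat`; the unannotated variable is left out.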
== Other APIs
SolrJ allows more than just querying and indexing.
It supports all of Solr's APIs.
Accessing Solr's other APIs is as easy as finding the appropriate request object, providing any necessary parameters, and passing it to the `request()` method of your `SolrClient`.
`request()` will return a `NamedList`: a generic object which mirrors the hierarchical structure of the JSON or XML returned by the request.
The example below shows how SolrJ users can call the CLUSTERSTATUS API of SolrCloud deployments, and manipulate the returned `NamedList`:
[source,java,indent=0]
----
include::example$UsingSolrJRefGuideExamplesTest.java[tag=solrj-other-apis]
----
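Working with a hierarchical response usually means walking it one level at a time and casting as you go. The sketch below shows that pattern with nested `java.util.Map` objects as a stand-in. It deliberately does not use `NamedList` itself, and `NestedResponseSketch` and the sample node names are hypothetical illustration data shaped like a CLUSTERSTATUS reply.

```java
import java.util.List;
import java.util.Map;

// Plain-JDK stand-in for a hierarchical response, NOT SolrJ's NamedList.
final class NestedResponseSketch {
    // Hypothetical data shaped like {"cluster": {"live_nodes": [...]}}.
    static Map<String, Object> sampleResponse() {
        return Map.of(
            "cluster", Map.of(
                "live_nodes", List.of("localhost:8983_solr", "localhost:7574_solr")));
    }

    // Walks the structure level by level, casting each nested value.
    @SuppressWarnings("unchecked")
    static List<String> liveNodes(Map<String, Object> response) {
        Map<String, Object> cluster = (Map<String, Object>) response.get("cluster");
        return (List<String>) cluster.get("live_nodes");
    }
}
```

The unchecked casts are the price of a generic container: the structure mirrors the response, so the caller has to know which type to expect at each level.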