blob: 19a1941b318e6f1065beb9dcd54559c817875e06 [file] [log] [blame]
= Introduction to Scaling and Distribution
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Both Lucene and Solr were designed to scale to support large implementations with minimal custom coding.
This section covers:
* <<distributed-search-with-index-sharding.adoc#,distributing>> an index across multiple servers
* <<index-replication.adoc#,replicating>> an index on multiple servers
* <<merging-indexes.adoc#,merging indexes>>
If you need full scale distribution of indexes and queries, as well as replication, load balancing and failover, you may want to use SolrCloud. Full details on configuring and using SolrCloud is available in the section <<solrcloud.adoc#,SolrCloud>>.
== What Problem Does Distribution Solve?
If searches are taking too long or the index is approaching the physical limitations of its machine, you should consider distributing the index across two or more Solr servers.
To distribute an index, you divide the index into partitions called shards, each of which runs on a separate machine. Solr then partitions searches into sub-searches, which run on the individual shards, reporting results collectively.
The architectural details underlying index sharding are invisible to end users, who simply experience faster performance on queries against very large indexes.
== What Problem Does Replication Solve?
Replicating an index is useful when:
* You have a large search volume which one machine cannot handle, so you need to distribute searches across multiple read-only copies of the index.
* There is a high volume/high rate of indexing which consumes machine resources and reduces search performance on the indexing machine, so you need to separate indexing and searching.
* You want to make a backup of the index (see <<making-and-restoring-backups.adoc#,Making and Restoring Backups>>).