blob: 19359084cc49a5af1b23266aa7a46ef14c9215bf [file] [log] [blame]
= How SolrCloud Works
:page-children: shards-and-indexing-data-in-solrcloud, distributed-requests, aliases
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
The following sections cover provide general information about how various SolrCloud features work. To understand these features, it's important to first understand a few key concepts that relate to SolrCloud.
* <<shards-and-indexing-data-in-solrcloud.adoc#,Shards and Indexing Data in SolrCloud>>
* <<distributed-requests.adoc#,Distributed Requests>>
* <<aliases.adoc#,Standard and Routed Aliases>>
If you are already familiar with SolrCloud concepts and basic functionality, you can skip to the section covering <<solrcloud-configuration-and-parameters.adoc#,SolrCloud Configuration and Parameters>>.
== Key SolrCloud Concepts
A SolrCloud cluster consists of some "logical" concepts layered on top of some "physical" concepts.
=== Logical Concepts
* A Cluster can host multiple Collections of Solr Documents.
* A collection can be partitioned into multiple Shards, which contain a subset of the Documents in the Collection.
* The number of Shards that a Collection has determines:
** The theoretical limit to the number of Documents that Collection can reasonably contain.
** The amount of parallelization that is possible for an individual search request.
=== Physical Concepts
* A Cluster is made up of one or more Solr Nodes, which are running instances of the Solr server process.
* Each Node can host multiple Cores.
* Each Core in a Cluster is a physical Replica for a logical Shard.
* Every Replica uses the same configuration specified for the Collection that it is a part of.
* The number of Replicas that each Shard has determines:
** The level of redundancy built into the Collection and how fault tolerant the Cluster can be in the event that some Nodes become unavailable.
** The theoretical limit in the number concurrent search requests that can be processed under heavy load.
WARNING: Make sure the DNS resolution in your cluster is stable, i.e.,
for each live host belonging to a Cluster the host name always corresponds to the
same specific IP and physical node. For example, in clusters deployed on AWS this would
require setting `preserve_hostname: true` in `/etc/cloud/cloud.cfg`. Changing DNS resolution
of live nodes may lead to unexpected errors. See SOLR-13159 for more details.