solr/solr-ref-guide/src/putting-the-pieces-together.adoc - lucene-solr - Git at Google

 = Putting the Pieces Together
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 At the highest level, `schema.xml` is structured as follows.

 This example is not real XML, but it gives you an idea of the structure of the file.

 [source,xml]
 ----
 <schema>
   <types>
   <fields>
   <uniqueKey>
   <copyField>
 </schema>
 ----

 Obviously, most of the excitement is in `types` and `fields`, where the field types and the actual field definitions live.

 These are supplemented by `copyFields`.

 The `uniqueKey` must always be defined.

 .Types and fields are optional tags
 [NOTE]
 ====
 Note that the `types` and `fields` sections are optional, meaning you are free to mix `field`, `dynamicField`, `copyField` and `fieldType` definitions on the top level. This allows for a more logical grouping of related tags in your schema.
 ====

 == Choosing Appropriate Numeric Types

 For general numeric needs, consider using one of the `IntPointField`, `LongPointField`, `FloatPointField`, or `DoublePointField` classes, depending on the specific values you expect. These "Dimensional Point" based numeric classes use specially encoded data structures to support efficient range queries regardless of the size of the ranges used. Enable <<docvalues.adoc#,DocValues>> on these fields as needed for sorting and/or faceting.

 Some Solr features may not yet work with "Dimensional Points", in which case you may want to consider the equivalent `TrieIntField`, `TrieLongField`, `TrieFloatField`, and `TrieDoubleField` classes. These field types are deprecated and are likely to be removed in a future major Solr release, but they can still be used if necessary. Configure a `precisionStep="0"` if you wish to minimize index size, but if you expect users to make frequent range queries on numeric types, use the default `precisionStep` (by not specifying it) or specify it as `precisionStep="8"` (which is the default). This offers faster speed for range queries at the expense of increasing index size.

 == Working With Text

 Handling text properly will make your users happy by providing them with the best possible results for text searches.

 One technique is using a text field as a catch-all for keyword searching. Most users are not sophisticated about their searches and the most common search is likely to be a simple keyword search. You can use `copyField` to take a variety of fields and funnel them all into a single text field for keyword searches.

 In the `schema.xml` file for the "```techproducts```" example included with Solr, `copyField` declarations are used to dump the contents of `cat`, `name`, `manu`, `features`, and `includes` into a single field, `text`. In addition, it could be a good idea to copy `ID` into `text` in case users wanted to search for a particular product by passing its product number to a keyword search.

 Another technique is using `copyField` to use the same field in different ways. Suppose you have a field that is a list of authors, like this:

 `Schildt, Herbert; Wolpert, Lewis; Davies, P.`

 For searching by author, you could tokenize the field, convert to lower case, and strip out punctuation:

 `schildt / herbert / wolpert / lewis / davies / p`

 For sorting, just use an untokenized field, converted to lower case, with punctuation stripped:

 `schildt herbert wolpert lewis davies p`

 Finally, for faceting, use the primary author only via a `StrField`:

 `Schildt, Herbert`
	= Putting the Pieces Together
	// Licensed to the Apache Software Foundation (ASF) under one
	// or more contributor license agreements. See the NOTICE file
	// distributed with this work for additional information
	// regarding copyright ownership. The ASF licenses this file
	// to you under the Apache License, Version 2.0 (the
	// "License"); you may not use this file except in compliance
	// with the License. You may obtain a copy of the License at
	//
	// http://www.apache.org/licenses/LICENSE-2.0
	//
	// Unless required by applicable law or agreed to in writing,
	// software distributed under the License is distributed on an
	// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	// KIND, either express or implied. See the License for the
	// specific language governing permissions and limitations
	// under the License.

	At the highest level, `schema.xml` is structured as follows.

	This example is not real XML, but it gives you an idea of the structure of the file.

	[source,xml]
	----
	<schema>
	<types>
	<fields>
	<uniqueKey>
	<copyField>
	</schema>
	----

	Obviously, most of the excitement is in `types` and `fields`, where the field types and the actual field definitions live.

	These are supplemented by `copyFields`.

	The `uniqueKey` must always be defined.

	.Types and fields are optional tags
	[NOTE]
	====
	Note that the `types` and `fields` sections are optional, meaning you are free to mix `field`, `dynamicField`, `copyField` and `fieldType` definitions on the top level. This allows for a more logical grouping of related tags in your schema.
	====

	== Choosing Appropriate Numeric Types

	For general numeric needs, consider using one of the `IntPointField`, `LongPointField`, `FloatPointField`, or `DoublePointField` classes, depending on the specific values you expect. These "Dimensional Point" based numeric classes use specially encoded data structures to support efficient range queries regardless of the size of the ranges used. Enable <<docvalues.adoc#,DocValues>> on these fields as needed for sorting and/or faceting.

	Some Solr features may not yet work with "Dimensional Points", in which case you may want to consider the equivalent `TrieIntField`, `TrieLongField`, `TrieFloatField`, and `TrieDoubleField` classes. These field types are deprecated and are likely to be removed in a future major Solr release, but they can still be used if necessary. Configure a `precisionStep="0"` if you wish to minimize index size, but if you expect users to make frequent range queries on numeric types, use the default `precisionStep` (by not specifying it) or specify it as `precisionStep="8"` (which is the default). This offers faster speed for range queries at the expense of increasing index size.

	== Working With Text

	Handling text properly will make your users happy by providing them with the best possible results for text searches.

	One technique is using a text field as a catch-all for keyword searching. Most users are not sophisticated about their searches and the most common search is likely to be a simple keyword search. You can use `copyField` to take a variety of fields and funnel them all into a single text field for keyword searches.

	In the `schema.xml` file for the "```techproducts```" example included with Solr, `copyField` declarations are used to dump the contents of `cat`, `name`, `manu`, `features`, and `includes` into a single field, `text`. In addition, it could be a good idea to copy `ID` into `text` in case users wanted to search for a particular product by passing its product number to a keyword search.

	Another technique is using `copyField` to use the same field in different ways. Suppose you have a field that is a list of authors, like this:

	`Schildt, Herbert; Wolpert, Lewis; Davies, P.`

	For searching by author, you could tokenize the field, convert to lower case, and strip out punctuation:

	`schildt / herbert / wolpert / lewis / davies / p`

	For sorting, just use an untokenized field, converted to lower case, with punctuation stripped:

	`schildt herbert wolpert lewis davies p`

	Finally, for faceting, use the primary author only via a `StrField`:

	`Schildt, Herbert`