blob: 8a81efc113c1dfabe7f8c5e113b213fcf8f1cce6 [file] [log] [blame]
= Major Changes in Solr 7
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Solr 7 is a major new release of Solr which introduces new features and a number of other changes that may impact your existing installation.
== Upgrade Planning
There are major changes in Solr 7 to consider before starting to migrate your configurations and indexes. This page is designed to highlight the biggest changes - new features you may want to be aware of, but also changes in default behavior and deprecated features that have been removed.
There are many hundreds of changes in Solr 7, however, so a thorough review of the <<solr-upgrade-notes.adoc#,Solr Upgrade Notes>> as well as the {solr-javadocs}/changes/Changes.html[CHANGES.txt] file in your Solr instance will help you plan your migration to Solr 7. This section attempts to highlight some of the major changes you should be aware of.
You should also consider all changes that have been made to Solr in any version you have not upgraded to already. For example, if you are currently using Solr 6.2, you should review changes made in all subsequent 6.x releases in addition to changes for 7.0.
<<reindexing.adoc#upgrades,Reindexing>> your data is considered the best practice and you should try to do so if possible. However, if reindexing is not feasible, keep in mind you can only upgrade one major version at a time. Thus, Solr 6.x indexes will be compatible with Solr 7 but Solr 5.x indexes will not be.
If you do not reindex now, keep in mind that you will need to either reindex your data or upgrade your indexes before you will be able to move to Solr 8 when it is released in the future. See the section <<indexupgrader-tool.adoc#,IndexUpgrader Tool>> for more details on how to upgrade your indexes.
See also the section <<upgrading-a-solr-cluster.adoc#,Upgrading a Solr Cluster>> for details on how to upgrade a SolrCloud cluster.
== New Features & Enhancements
=== Replication Modes
Until Solr 7, the SolrCloud model for replicas has been to allow any replica to become a leader when a leader is lost. This is highly effective for most users, providing reliable failover in case of issues in the cluster. However, it comes at a cost in large clusters because all replicas must be in sync at all times.
To provide additional flexibility, two new types of replicas have been added, named TLOG & PULL. These new types provide options to have replicas which only sync with the leader by copying index segments from the leader. The TLOG type has an additional benefit of maintaining a transaction log (the "tlog" of its name), which would allow it to recover and become a leader if necessary; the PULL type does not maintain a transaction log, so cannot become a leader.
As part of this change, the traditional type of replica is now named NRT. If you do not explicitly define a number of TLOG or PULL replicas, Solr defaults to creating NRT replicas. If this model is working for you, you will not have to change anything.
See the section <<shards-and-indexing-data-in-solrcloud.adoc#types-of-replicas,Types of Replicas>> for more details on the new replica modes, and how define the replica type in your cluster.
=== Autoscaling
Solr autoscaling is a new suite of features in Solr to make managing a SolrCloud cluster easier and more automated.
At its core, Solr autoscaling provides users with a rule syntax to define preferences and policies for how to distribute nodes and shards in a cluster, with the goal of maintaining a balance in the cluster. As of Solr 7, Solr will take any policy or preference rules into account when determining where to place new shards and replicas created or moved with various Collections API commands.
See the section <<solrcloud-autoscaling.adoc#,SolrCloud Autoscaling>> for details on the options available in 7.0. Expect more features to be released in subsequent 7.x releases in this area.
=== Other Features & Enhancements
// TODO 7.1 - update link to docs when complete
* The Analytics Component has been refactored.
** The documentation for this component is in progress; until it is available, please refer to https://issues.apache.org/jira/browse/SOLR-11144[SOLR-11144] for more details.
* There were several other new features released in earlier 6.x releases, which you may have missed:
** <<learning-to-rank.adoc#,Learning to Rank>>
** <<highlighting.adoc#the-unified-highlighter,Unified Highlighter>>
** <<metrics-reporting.adoc#,Metrics API>>. See also information about related deprecations in the section <<JMX Support and MBeans>> below.
** <<other-parsers.adoc#payload-query-parsers,Payload queries>>
** <<stream-evaluator-reference.adoc#,Streaming Evaluators>>
** <<v2-api.adoc#,/v2 API>>
** <<graph-traversal.adoc#,Graph streaming expressions>>
== Configuration and Default Changes
=== New Default Configset
Several changes have been made to configsets that ship with Solr; not only their content but how Solr behaves in regard to them:
* The `data_driven_configset` and `basic_configset` have been removed, and replaced by the `_default` configset. The `sample_techproducts_configset` also remains, and is designed for use with the example documents shipped with Solr in the `example/exampledocs` directory.
* When creating a new collection, if you do not specify a configset, the `_default` will be used.
** If you use SolrCloud, the `_default` configset will be automatically uploaded to ZooKeeper.
** If you use standalone mode, the instanceDir will be created automatically, using the `_default` configset as its basis.
=== Schemaless Improvements
To improve the functionality of Schemaless Mode, Solr now behaves differently when it detects that data in an incoming field should have a text-based field type.
* Incoming fields will be indexed as `text_general` by default (you can change this). The name of the field will be the same as the field name defined in the document.
* A copy field rule will be inserted into your schema to copy the new `text_general` field to a new field with the name `<name>_str`. This field's type will be a `strings` field (to allow for multiple values). The first 256 characters of the text field will be inserted to the new `strings` field.
This behavior can be customized if you wish to remove the copy field rule, or to change the number of characters inserted to the string field, or the field type used. See the section <<schemaless-mode.adoc#,Schemaless Mode>> for details.
TIP: Because copy field rules can slow indexing and increase index size, it's recommended you only use copy fields when you need to. If you do not need to sort or facet on a field, you should remove the automatically-generated copy field rule.
Automatic field creation can be disabled with the `update.autoCreateFields` property. To do this, you can use the Config API with a command such as:
[.dynamic-tabs]
--
[example.tab-pane#v1setprop]
====
[.tab-label]*V1 API*
[source,bash]
----
curl http://host:8983/solr/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
----
====
[example.tab-pane#v2setprop]
====
[.tab-label]*V2 API*
[source,bash]
----
curl http://host:8983/api/collections/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
----
====
--
=== Changes to Default Behaviors
* JSON is now the default response format. If you rely on XML responses, you must now define `wt=xml` in your request. In addition, line indentation is enabled by default (`indent=on`).
* The `sow` parameter (short for "Split on Whitespace") now defaults to `false`, which allows support for multi-word synonyms out of the box. This parameter is used with the eDismax and standard/"lucene" query parsers. If this parameter is not explicitly specified as `true`, query text will not be split on whitespace before analysis.
* The `legacyCloud` parameter now defaults to `false`. If an entry for a replica does not exist in `state.json`, that replica will not get registered.
+
This may affect users who bring up replicas and they are automatically registered as a part of a shard. It is possible to fall back to the old behavior by setting the property `legacyCloud=true`, in the cluster properties using the following command:
+
`./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd clusterprop -name legacyCloud -val true`
* The eDismax query parser parameter `lowercaseOperators` now defaults to `false` if the `luceneMatchVersion` in `solrconfig.xml` is 7.0.0 or above. Behavior for `luceneMatchVersion` lower than 7.0.0 is unchanged (so, `true`). This means that clients must sent boolean operators (such as AND, OR and NOT) in upper case in order to be recognized, or you must explicitly set this parameter to `true`.
* The `handleSelect` parameter in `solrconfig.xml` now defaults to `false` if the `luceneMatchVersion` is 7.0.0 or above. This causes Solr to ignore the `qt` parameter if it is present in a request. If you have request handlers without a leading '/', you can set `handleSelect="true"` or consider migrating your configuration.
+
The `qt` parameter is still used as a SolrJ special parameter that specifies the request handler (tail URL path) to use.
* The `lucenePlusSort` query parser (aka the "Old Lucene Query Parser") has been deprecated and is no longer implicitly defined. If you wish to continue using this parser until Solr 8 (when it will be removed), you must register it in your `solrconfig.xml`, as in: `<queryParser name="lucenePlusSort" class="solr.OldLuceneQParserPlugin"/>`.
* The name of `TemplateUpdateRequestProcessorFactory` is changed to `template` from `Template` and the name of `AtomicUpdateProcessorFactory` is changed to `atomic` from `Atomic`
** Also, `TemplateUpdateRequestProcessorFactory` now uses `{}` instead of `${}` for `template`.
== Deprecations and Removed Features
=== Point Fields Are Default Numeric Types
Solr has implemented \*PointField types across the board, to replace Trie* based numeric fields. All Trie* fields are now considered deprecated, and will be removed in Solr 8.
If you are using Trie* fields in your schema, you should consider moving to PointFields as soon as feasible. Changing to the new PointField types will require you to reindex your data.
=== Spatial Fields
The following spatial-related fields have been deprecated:
* `LatLonType`
* `GeoHashField`
* `SpatialVectorFieldType`
* `SpatialTermQueryPrefixTreeFieldType`
Choose one of these field types instead:
* `LatLonPointSpatialField`
* `SpatialRecursivePrefixTreeField`
* `RptWithGeometrySpatialField`
See the section <<spatial-search.adoc#,Spatial Search>> for more information.
=== JMX Support and MBeans
* The `<jmx>` element in `solrconfig.xml` has been removed in favor of `<metrics><reporter>` elements defined in `solr.xml`.
+
Limited back-compatibility is offered by automatically adding a default instance of `SolrJmxReporter` if it's missing AND when a local MBean server is found. A local MBean server can be activated either via `ENABLE_REMOTE_JMX_OPTS` in `solr.in.sh` or via system properties, e.g., `-Dcom.sun.management.jmxremote`. This default instance exports all Solr metrics from all registries as hierarchical MBeans.
+
This behavior can be also disabled by specifying a `SolrJmxReporter` configuration with a boolean init argument `enabled` set to `false`. For a more fine-grained control users should explicitly specify at least one `SolrJmxReporter` configuration.
+
See also the section <<metrics-reporting.adoc#the-metrics-reporters-element,The <metrics><reporters> Element>>, which describes how to set up Metrics Reporters in `solr.xml`. Note that back-compatibility support may be removed in Solr 8.
* MBean names and attributes now follow the hierarchical names used in metrics. This is reflected also in `/admin/mbeans` and `/admin/plugins` output, and can be observed in the UI Plugins tab, because now all these APIs get their data from the metrics API. The old (mostly flat) JMX view has been removed.
=== SolrJ
The following changes were made in SolrJ.
* `HttpClientInterceptorPlugin` is now `HttpClientBuilderPlugin` and must work with a `SolrHttpClientBuilder` rather than an `HttpClientConfigurer`.
* `HttpClientUtil` now allows configuring `HttpClient` instances via `SolrHttpClientBuilder` rather than an `HttpClientConfigurer`. Use of env variable `SOLR_AUTHENTICATION_CLIENT_CONFIGURER` no longer works, please use `SOLR_AUTHENTICATION_CLIENT_BUILDER`
* `SolrClient` implementations now use their own internal configuration for socket timeouts, connect timeouts, and allowing redirects rather than what is set as the default when building the `HttpClient` instance. Use the appropriate setters on the `SolrClient` instance.
* `HttpSolrClient#setAllowCompression` has been removed and compression must be enabled as a constructor parameter.
* `HttpSolrClient#setDefaultMaxConnectionsPerHost` and `HttpSolrClient#setMaxTotalConnections` have been removed. These now default very high and can only be changed via parameter when creating an HttpClient instance.
=== Other Deprecations and Removals
* The `defaultOperator` parameter in the schema is no longer supported. Use the `q.op` parameter instead. This option had been deprecated for several releases. See the section <<the-standard-query-parser.adoc#standard-query-parser-parameters,Standard Query Parser Parameters>> for more information.
* The `defaultSearchField` parameter in the schema is no longer supported. Use the `df` parameter instead. This option had been deprecated for several releases. See the section <<the-standard-query-parser.adoc#standard-query-parser-parameters,Standard Query Parser Parameters>> for more information.
* The `mergePolicy`, `mergeFactor` and `maxMergeDocs` parameters have been removed and are no longer supported. You should define a `mergePolicyFactory` instead. See the section <<indexconfig-in-solrconfig.adoc#mergepolicyfactory,the mergePolicyFactory>> for more information.
* The PostingsSolrHighlighter has been deprecated. It's recommended that you move to using the UnifiedHighlighter instead. See the section <<highlighting.adoc#the-unified-highlighter,Unified Highlighter>> for more information about this highlighter.
* Index-time boosts have been removed from Lucene, and are no longer available from Solr. If any boosts are provided, they will be ignored by the indexing chain. As a replacement, index-time scoring factors should be indexed in a separate field and combined with the query score using a function query. See the section <<function-queries.adoc#,Function Queries>> for more information.
* The `StandardRequestHandler` is deprecated. Use `SearchHandler` instead.
* To improve parameter consistency in the Collections API, the parameter names `fromNode` for the MOVEREPLICA command and `source`, `target` for the REPLACENODE command have been deprecated and replaced with `sourceNode` and `targetNode` instead. The old names will continue to work for back-compatibility but they will be removed in Solr 8.
* The unused `valType` option has been removed from ExternalFileField, if you have this in your schema you can safely remove it.
== Major Changes in Earlier 6.x Versions
The following summary of changes in earlier 6.x releases highlights significant changes released between Solr 6.0 and 6.6 that were listed in earlier versions of this Guide. Mentions of deprecations are likely superseded by removal in Solr 7, as noted in the above sections.
Note again that this is not a complete list of all changes that may impact your installation, so a thorough review of CHANGES.txt is highly recommended if upgrading from any version earlier than 6.6.
* The Solr contribs map-reduce, morphlines-core and morphlines-cell have been removed.
* JSON Facet API now uses hyper-log-log for numBuckets cardinality calculation and calculates cardinality before filtering buckets by any `mincount` greater than 1.
* If you use historical dates, specifically on or before the year 1582, you should reindex for better date handling.
* If you use the JSON Facet API (json.facet) with `method=stream`, you must now set `sort='index asc'` to get the streaming behavior; otherwise it won't stream. Reminder: `method` is a hint that doesn't change defaults of other parameters.
* If you use the JSON Facet API (json.facet) to facet on a numeric field and if you use `mincount=0` or if you set the prefix, you will now get an error as these options are incompatible with numeric faceting.
* Solr's logging verbosity at the INFO level has been greatly reduced, and you may need to update the log configs to use the DEBUG level to see all the logging messages you used to see at INFO level before.
* We are no longer backing up `solr.log` and `solr_gc.log` files in date-stamped copies forever. If you relied on the `solr_log_<date>` or `solr_gc_log_<date>` being in the logs folder that will no longer be the case. See the section <<configuring-logging.adoc#,Configuring Logging>> for details on how log rotation works as of Solr 6.3.
* The create/deleteCollection methods on `MiniSolrCloudCluster` have been deprecated. Clients should instead use the `CollectionAdminRequest` API. In addition, `MiniSolrCloudCluster#uploadConfigDir(File, String)` has been deprecated in favour of `#uploadConfigSet(Path, String)`.
* The `bin/solr.in.sh` (`bin/solr.in.cmd` on Windows) is now completely commented by default. Previously, this wasn't so, which had the effect of masking existing environment variables.
* The `\_version_` field is no longer indexed and is now defined with `indexed=false` by default, because the field has DocValues enabled.
* The `/export` handler has been changed so it no longer returns zero (0) for numeric fields that are not in the original document. One consequence of this change is that you must be aware that some tuples will not have values if there were none in the original document.
* Metrics-related classes in `org.apache.solr.util.stats` have been removed in favor of the http://metrics.dropwizard.io/3.1.0/[Dropwizard metrics library]. Any custom plugins using these classes should be changed to use the equivalent classes from the metrics library. As part of this, the following changes were made to the output of Overseer Status API:
** The "totalTime" metric has been removed because it is no longer supported.
** The metrics "75thPctlRequestTime", "95thPctlRequestTime", "99thPctlRequestTime" and "999thPctlRequestTime" in Overseer Status API have been renamed to "75thPcRequestTime", "95thPcRequestTime" and so on for consistency with stats output in other parts of Solr.
** The metrics "avgRequestsPerMinute", "5minRateRequestsPerMinute" and "15minRateRequestsPerMinute" have been replaced by corresponding per-second rates viz. "avgRequestsPerSecond", "5minRateRequestsPerSecond" and "15minRateRequestsPerSecond" for consistency with stats output in other parts of Solr.
* A new highlighter named UnifiedHighlighter has been added. You are encouraged to try out the UnifiedHighlighter by setting `hl.method=unified` and report feedback. It's more efficient/faster than the other highlighters, especially compared to the original Highlighter. See `HighlightParams.java` for a listing of highlight parameters annotated with which highlighters use them. `hl.useFastVectorHighlighter` is now considered deprecated in lieu of `hl.method=fastVector`.
* The <<query-settings-in-solrconfig.adoc#,`maxWarmingSearchers` parameter>> now defaults to 1, and more importantly commits will now block if this limit is exceeded instead of throwing an exception (a good thing). Consequently there is no longer a risk in overlapping commits. Nonetheless users should continue to avoid excessive committing. Users are advised to remove any pre-existing `maxWarmingSearchers` entries from their `solrconfig.xml` files.
* The <<other-parsers.adoc#complex-phrase-query-parser,Complex Phrase query parser>> now supports leading wildcards. Beware of its possible heaviness, users are encouraged to use ReversedWildcardFilter in index time analysis.
* The JMX metric "avgTimePerRequest" (and the corresponding metric in the metrics API for each handler) used to be a simple non-decaying average based on total cumulative time and the number of requests. The Codahale Metrics implementation applies exponential decay to this value, which heavily biases the average towards the last 5 minutes.
* Parallel SQL now uses Apache Calcite as its SQL framework. As part of this change the default aggregation mode has been changed to `facet` rather than `map_reduce`. There have also been changes to the SQL aggregate response and some SQL syntax changes. Consult the <<parallel-sql-interface.adoc#,Parallel SQL Interface>> documentation for full details.