---
active_crumb: Data Model
layout: documentation
id: data_model
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column">
<section id="overview">
<h2 class="section-title">Model Overview</h2>
<p>
The data model is a central concept in NLPCraft that defines the interface to your data sources,
such as a database or a SaaS application.
NLPCraft employs a <em>model-as-a-code</em> approach where the entire data model is an implementation of the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface, which
can be developed in any JVM programming language such as Java, Scala, Kotlin, or Groovy.
</p>
<p>
A data model defines:
</p>
<ul>
<li>A set of model <a href="#elements">elements</a> (a.k.a. named entities) to be detected in the user input.</li>
<li>Zero or more intent callbacks.</li>
<li>Common model configuration and various lifecycle callbacks.</li>
</ul>
<p>
Note that the model-as-a-code approach natively works with standard software lifecycle
tools and frameworks such as build tools, CI/SCM tools, IDEs, etc.
You don't need additional web-based tools to manage parts of your
data models: your entire model and all of its components are part of your project source code.
</p>
</section>
<section id="dataflow">
<h2 class="section-title">Model Dataflow</h2>
<figure>
<img alt="data model dataflow" class="img-fluid" src="/images/homepage-fig1.1.png">
<figcaption><b>Fig 1.</b> NLPCraft Architecture</figcaption>
</figure>
<p>
A user request starts with the user application (such as a chatbot or an NLI-based system) making a
REST call using the <a href="/using-rest.html">NLPCraft REST API</a>. Among other things, that REST call carries
the input text and the data model ID, and it arrives first at the REST server.
</p>
<p>
Upon receiving the user request, the REST server performs NLP pre-processing, converting the input
text into a sequence of tokens and enriching them with additional information.
</p>
<p>
Once pre-processing is finished, the encrypted sequence of tokens is sent further down to the data probe where the requested data model
is deployed.
</p>
<p>
Upon receiving that sequence of tokens, the data probe further
enriches it based on the user data model and matches it against the declared intents. When a matching
intent is found, its callback method is called, and the result travels back from the data probe to the
REST server and, eventually, to the user that made the REST call.
</p>
<div class="bq info">
<p>
<b>Security <span class="amp">&</span> Isolation</b>
</p>
<p>
Note that in this architecture the user-defined data model is fully isolated from the REST server that accepts
user calls. Users never access data probes, and hence user data models, directly. Typically, the REST server
should be deployed in a DMZ, and only ingress connectivity is necessary between the REST server and the data probes.
</p>
</div>
</section>
<section id="lifecycle">
<h2 class="section-title">Model Lifecycle</h2>
<p>
A data model is an implementation of the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface.
The <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface provides
defaults for most of its methods. These are the only methods that must be implemented by its subclass:
</p>
<ul>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId--">getId()</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName--">getName()</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion--">getVersion()</a></li>
</ul>
<p>
You can either implement the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a>
interface directly or use one of the adapters (recommended in most cases):
</p>
<ul>
<li>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelAdapter.html">NCModelAdapter</a> - when
the entire model definition is in the subclass source code.
</li>
<li>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> - when
using an external JSON/YAML declaration for the model definition.
</li>
</ul>
<p>
Note that you can also use 3rd party IoC frameworks like <a target=_ href="https://spring.io">Spring</a> to construct your data models. See
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFactory.html">NCModelFactory</a> for more information.
</p>
<div class="bq success">
<p>
<b>Using Adapters</b>
</p>
<p>
In most use cases, it is recommended to use one of the adapter classes when defining your
own data model.
</p>
</div>
<h3 class="section-title">Deployment</h3>
<p>
Data models are <a href="/server-and-probe.html">deployed</a> to and hosted by data probes: lightweight
containers whose job is to host data models and securely transfer requests between the REST server and the data
models. When a data probe starts, it reads its <a href="/server-and-probe.html">configuration</a>
to determine which models to deploy.
</p>
<p>
Note that data probes don't support hot redeployment: to redeploy a data model, you need to restart
the data probe. Note also that a data probe can be started in embedded mode, i.e. from within
an existing JVM process such as the user application.
</p>
<h3 class="section-title">Callbacks</h3>
<p>
There are two callbacks on the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface
(inherited from the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html">NCLifecycle</a> interface) that you can optionally override to change the default lifecycle behavior:
</p>
<ul>
<li>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onInit--">onInit()</a> - called
right after the model has been loaded and deployed.
</li>
<li>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onDiscard--">onDiscard()</a> - called to
discard the data model when, and only when, the data probe is shutting down in an orderly fashion.
</li>
</ul>
<p>
Note that there are also several callbacks that you can override to affect model behavior, e.g.
to perform logging, debugging, statistics or usage collection, explicit update or initialization of the
conversation context, security audit, or validation:
</p>
<ul>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onParsedVariant-org.apache.nlpcraft.model.NCVariant-">onParsedVariant(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext-org.apache.nlpcraft.model.NCContext-">onContext(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent-org.apache.nlpcraft.model.NCIntentMatch-">onMatchedIntent(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onResult-org.apache.nlpcraft.model.NCIntentMatch-org.apache.nlpcraft.model.NCResult-">onResult(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onError-org.apache.nlpcraft.model.NCContext-java.lang.Throwable-">onError(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onRejection-org.apache.nlpcraft.model.NCIntentMatch-org.apache.nlpcraft.model.NCRejection-">onRejection(...)</a>
</li>
</ul>
<div class="bq info">
<b>Conversation Reset</b>
<p>
Callbacks
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext-org.apache.nlpcraft.model.NCContext-">onContext(...)</a> and
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent-org.apache.nlpcraft.model.NCIntentMatch-">onMatchedIntent(...)</a>
are especially handy for performing a soft reset of the conversation context. Read their Javadoc
to understand the protocol of these callbacks.
</p>
</div>
<div class="bq info">
<b>Lifecycle Components</b>
<p>
Note that both the server and the probe provide their own support for lifecycle components. When registered in
the probe or server configuration, lifecycle components are called
during various stages of the probe or server startup and shutdown procedures. These callbacks can be used
to control the lifecycle of external libraries and systems that the data probe or the server relies on, e.g.
<a href="metrics-and-tracing.html">OpenCensus exporters</a>,
the security environment, devops hooks, etc.
</p>
<p>
See server and probe <a href="">configuration</a> as well as <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCProbeLifecycle.html">NCProbeLifecycle</a>
interface for more details.
</p>
</div>
</section>
<section id="config">
<h2 class="section-title">Model Configuration</h2>
<p>
Apart from the mandatory model <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId--">ID</a>,
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName--">name</a>, and
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion--">version</a>,
there are a number of static model configuration properties that you can set. All of these properties have sensible
defaults that you can override, when required, either in subclasses or via an external JSON/YAML declaration:
</p>
<ul>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getAdditionalStopWords--">getAdditionalStopWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens--">getEnabledBuiltInTokens</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getExamples--">getExamples</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getExcludedStopWords--">getExcludedStopWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getJiggleFactor--">getJiggleFactor</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxFreeWords--">getMaxFreeWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxSuspiciousWords--">getMaxSuspiciousWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTokens--">getMaxTokens</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTotalSynonyms--">getMaxTotalSynonyms</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxUnknownWords--">getMaxUnknownWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxWords--">getMaxWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMetadata--">getMetadata</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinNonStopwords--">getMinNonStopwords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinTokens--">getMinTokens</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinWords--">getMinWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getSuspiciousWords--">getSuspiciousWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isDupSynonymsAllowed--">isDupSynonymsAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed--">isNonEnglishAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoNounsAllowed--">isNoNounsAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNotLatinCharsetAllowed--">isNotLatinCharsetAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoUserTokensAllowed--">isNoUserTokensAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isPermutateSynonyms--">isPermutateSynonyms</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed--">isSwearWordsAllowed</a></li>
</ul>
<h3 class="section-title">External JSON/YAML Declaration</h3>
<p>
You can move all of the static model configuration into an external JSON or YAML file. To load that
configuration, use the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a>
adapter when creating your data model. Here are the JSON and YAML templates; you can find more details in the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> Javadoc and in the
<a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/src/main/scala/org/apache/nlpcraft/examples">examples</a>.
</p>
<nav>
<div class="nav nav-tabs" role="tablist">
<a class="nav-item nav-link active" data-toggle="tab" href="#model-json" role="tab" aria-controls="nav-home" aria-selected="true">JSON</a>
<a class="nav-item nav-link" data-toggle="tab" href="#model-yaml" role="tab" aria-controls="nav-home" aria-selected="true">YAML</a>
</div>
</nav>
<div class="tab-content">
<div class="tab-pane fade show active" id="model-json" role="tabpanel">
<pre class="brush: js">
{
"id": "user.defined.id",
"name": "User Defined Name",
"version": "1.0",
"description": "Short model description.",
"enabledBuiltInTokens": ["google:person", "google:location"]
"examples": [],
"macros": [],
"metadata": {},
"elements": [
{
"id": "x:id",
"description": "",
"groups": [],
"parentId": "",
"synonyms": [],
"metadata": {},
"values": []
}
],
...
"intents": []
}
</pre>
</div>
<div class="tab-pane fade show" id="model-yaml" role="tabpanel">
<pre class="brush: js">
id: "user.defined.id"
name: "User Defined Name"
version: "1.0"
description: "Short model description."
examples:
macros:
enabledBuiltInTokens:
elements:
- id: "x:id"
description: ""
synonyms:
groups:
values:
parentId:
metadata:
...
intents:
</pre>
</div>
</div>
<div class="bq success">
<p>
Note that JSON/YAML-based configuration is the <b>canonical way</b> of
creating data models in NLPCraft, as it cleanly separates the static configuration from the model's
programmable logic.
</p>
</div>
</section>
<section id="elements">
<h2 class="section-title">Model Elements</h2>
<p>
A data model element defines a semantic entity that will be detected in the user input.
A model element is typically one or more individual words that have a consistent semantic meaning and denote a
real-world object, such as a person, location, number, date and time, organization, product, etc. Such an
object can be abstract or have a physical existence.
</p>
<p>
A model element is an implementation of the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a>
interface. <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> provides
its elements via the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getElements--">getElements()</a> method.
Typically, you create model elements by either:
</p>
<ul>
<li>
Implementing the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> interface directly, or
</li>
<li>
Using JSON or YAML static model configuration (the preferred way in most cases).
</li>
</ul>
<p>
Note that when you use an external JSON or YAML static model configuration, you can still modify it after it has been loaded
by the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a>
adapter. This is particularly convenient when synonyms or values are loaded separately from, or in
addition to, the model elements themselves, e.g. from a database or another file.
</p>
<div class="bq info">
<p>
<b>Model Element <span class="amp">&</span> Named Entity <span class="amp">&</span> Token</b>
</p>
<p>
The terms 'model element', 'named entity', and 'token' are used somewhat interchangeably throughout this documentation:
</p>
<dl>
<dt>Model Element</dt>
<dd>
Denotes a named entity <em>declared</em> in an NLPCraft model.
</dd>
<dt>Token</dt>
<dd>
Denotes a named entity that was <em>detected</em> by NLPCraft in the user input.
</dd>
<dt>Named Entity</dt>
<dd>
Denotes a classic term, i.e. one or more individual words that have a
consistent semantic meaning and typically define a real-world object.
</dd>
</dl>
</div>
<p>
Although model elements and named entities describe a similar concept, NLPCraft model
elements are a much more powerful instrument. Unlike named entity support in other projects,
NLPCraft model elements have a number of unique capabilities:
</p>
<ul>
<li>
New model elements can be added declaratively via token DSL, regex and macro expansion.
</li>
<li>
New model elements can also be added programmatically for ultimate flexibility.
</li>
<li>
Model elements can have many-to-many group memberships.
</li>
<li>
Model elements can form a hierarchical structure.
</li>
<li>
Model elements are composable, i.e. a model element can use other model elements in its definition.
</li>
<li>
Model elements can be declared with user defined metadata.
</li>
<li>
Model elements provide normalized values and can define their own "proper nouns".
</li>
<li>
Model elements can compose named entities from many <a href="integrations.html#nlp">3rd party libraries</a>.
</li>
<li>
All properties of model elements (id, groups, parent & ancestors, values, and metadata) can be used in token and intent DSLs.
</li>
</ul>
<h3 class="section-title">User vs. Built-In Elements</h3>
<p>
In addition to the model elements that are defined by the user in the data model (i.e. <em>user model elements</em>),
NLPCraft provides <a href="#builtin">its own named entities</a> as well as integration with a number of <a href="integrations.html#nlp">3rd party projects</a>. You can think of these built-in elements as if they were implicitly defined in your model: you
can use them in exactly the same way as if you had defined them yourself.
You can find more information on how to configure external token providers
in the <a href="/integrations.html#nlp">Integrations</a> section.
</p>
<p>
Note that you can't directly change the group membership, parent-child relationship, or metadata of the
built-in elements. You can, however, "wrap" a built-in entity into your own element using the <code>^^id == 'external.id'^^</code>
token DSL expression, and define all the necessary additional configuration properties on that element (more on that below).
</p>
<span id="synonyms" class="section-sub-title">Synonyms</span>
<p>
NLPCraft uses fully deterministic named entity recognition and is not based on statistical approaches that
would require pre-existing marked-up data sets and extensive training. For each model element you can either provide a
set of synonyms to match on or specify a piece of code that would be responsible for detecting that named
entity (discussed below). A synonym can consist of one or more individual words. Note that an element's ID is its
implicit synonym, so at least one synonym always exists even if no additional synonyms are defined. Note
also that synonym matching is performed on the <em>normalized</em> and <em>stemmed</em> forms of both
the synonym and the user input.
</p>
<p>
Here's an example of a simple model element definition in JSON:
</p>
<pre class="brush: js, highlight: [6,7,8,9,10,11,12]">
...
"elements": [
{
"id": "transport.vehicle",
"description": "Transportation vehicle",
"synonyms": [
"car",
"truck",
"light duty truck"
"heavy duty truck"
"sedan",
"coupe"
]
}
]
...
</pre>
<p>
During synonym matching NLPCraft uses the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getJiggleFactor--">jiggle factor</a> to rearrange (or "jiggle")
the individual words in the user input in an attempt to match a given synonym. The jiggle factor is a measure of
how much sparsity is allowed when user input words are reordered in an attempt to match multi-word
synonyms. Zero means no reordering is allowed. One means that a word can move only one
position left or right, and so on. Empirically, a value of 2 has proved to be a good default in
most cases. Note that larger values mean that synonym words can appear almost anywhere in the user
input, which makes synonym matching less meaningful.
</p>
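<p>
The jiggle factor can be illustrated with a small Java sketch. This is one possible reading of the description
above, not NLPCraft's implementation: it assumes that a multi-word synonym matches a contiguous window of the same
number of input words when that window is a reordering of the synonym in which no word sits more than
<code>jiggle</code> positions away from its place in the synonym. Gaps between synonym words are not modeled, and
the class and method names (<code>Jiggle</code>, <code>matches</code>, <code>find</code>) are hypothetical.
</p>
<pre class="brush: java">
import java.util.Arrays;

// Hypothetical sketch of jiggle-factor matching; not NLPCraft's implementation.
class Jiggle {
    /**
     * True if 'window' is a reordering of 'synonym' in which no word sits more
     * than 'jiggle' positions away from its place in the synonym.
     */
    static boolean matches(String[] synonym, String[] window, int jiggle) {
        if (synonym.length != window.length) return false;
        boolean[] used = new boolean[window.length];
        for (int i = 0; i != synonym.length; i++) {
            int from = Math.max(0, i - jiggle);
            int to = Math.min(window.length - 1, i + jiggle);
            int found = -1;
            // Take the leftmost unused equal word in range. Because the allowed
            // ranges slide right as i grows, this greedy choice never blocks a later word.
            for (int p = from; to >= p; p++) {
                if (!used[p] && window[p].equals(synonym[i])) {
                    found = p;
                    break;
                }
            }
            if (found == -1) return false;
            used[found] = true;
        }
        return true;
    }

    /** Start index of the first same-length input window that matches, or -1. */
    static int find(String[] input, String[] synonym, int jiggle) {
        for (int start = 0; input.length - synonym.length >= start; start++) {
            String[] window = Arrays.copyOfRange(input, start, start + synonym.length);
            if (matches(synonym, window, jiggle)) return start;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] synonym = "heavy duty truck".split(" ");
        String[] input = "show me truck duty heavy".split(" ");
        for (int jiggle : new int[] {0, 1, 2}) {
            System.out.println("jiggle " + jiggle + ": " + find(input, synonym, jiggle));
        }
    }
}
</pre>
<p>
Running <code>main</code> should print <code>-1</code> for jiggle factors 0 and 1 and <code>2</code> for jiggle factor 2,
since <code>heavy</code> has to move two positions to line up with the synonym <code>heavy duty truck</code>.
</p>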
<p>
While adding multi-word synonyms looks somewhat
trivial, in real models the naive approach can lead to thousands or even tens of thousands of
possible synonyms due to word, grammar, and linguistic permutations, which quickly becomes untenable if
performed manually.
</p>
<p>
NLPCraft provides an effective tool for compact synonym representation. Instead of listing all possible
multi-word synonyms one by one, you can use a combination of the following expressions:
</p>
<ul>
<li><a href="#macros">Macros</a></li>
<li><a href="#regex">Regular expressions</a></li>
<li><a href="#option-groups">Option Groups</a></li>
<li><a href="#dsl">Token DSL</a></li>
</ul>
<p>
Each whitespace-separated string in a synonym can be either a regular word (as in the transportation example above,
where it is matched using its normalized and stemmed form) or one of the above expressions.
</p>
<p>
Note that this universal synonym definition is used by the following
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> methods:
</p>
<ul>
<li><code>getSynonyms()</code> - gets synonyms to match on.</li>
<li><code>getValues()</code> - gets values to match on (see <a href="#values">below</a>).</li>
</ul>
<span id="macros" class="section-sub-title">Macros</span>
<p>
Listing all possible multi-word synonyms for a given element can be a time-consuming task. Macros,
together with option groups, significantly simplify this process.
Macros let you give a name to an often-used set of words or option groups and reuse it without
repeating those words or option groups again and again. A model provides a list of macros via the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMacros--">getMacros()</a> method on the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a> interface. Each macro
has a name in the form <code>&lt;X&gt;</code>, where <code>X</code>
is any string, and a string value. Note that macros can be nested (but not recursive), i.e. a macro value can include
references to other macros. When a macro name <code>&lt;X&gt;</code> is encountered in a synonym, it gets recursively
replaced with its value.
</p>
<p>
Here's a snippet of macro definitions in JSON:
</p>
<pre class="brush: js">
"macros": [
{
"name": "&lt;A&gt;",
"macro": "aaa"
},
{
"name": "&lt;B&gt;",
"macro": "&lt;A&gt; bbb"
},
{
"name": "&lt;C&gt;",
"macro": "&lt;A&gt; bbb {z|w}"
}
]
</pre>
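<p>
The substitution rules for nested macros can be sketched in Java as follows. This is a hypothetical illustration,
not NLPCraft's macro processor: macros are passed as <code>{name, value}</code> pairs whose names include the
angle brackets, each macro value is expanded before it is substituted, and a nesting chain longer than the number
of macros is reported as recursion (such a chain must repeat some macro).
</p>
<pre class="brush: java">
// Hypothetical sketch of nested (non-recursive) macro substitution.
class Macros {
    /** Each row of 'macros' is {name, value}; names include their angle brackets. */
    static String expand(String text, String[][] macros) {
        return expand(text, macros, 0);
    }

    private static String expand(String text, String[][] macros, int depth) {
        // A nesting chain longer than the number of macros must repeat a macro,
        // which means the macro definitions are recursive.
        if (depth > macros.length) {
            throw new IllegalArgumentException("Recursive macro detected in: " + text);
        }
        String result = text;
        for (String[] macro : macros) {
            if (result.contains(macro[0])) {
                // Fully expand the macro's own value first, so nested references are resolved.
                result = result.replace(macro[0], expand(macro[1], macros, depth + 1));
            }
        }
        return result;
    }
}
</pre>
<p>
With the three macros defined above, <code>Macros.expand("&lt;C&gt; c", macros)</code> would return
<code>aaa bbb {z|w} c</code>; the remaining <code>{z|w}</code> is an option group, expanded in a separate step.
</p>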
<span id="option-groups" class="section-sub-title">Option Groups</span>
<p>
Option groups are similar to wildcard patterns that operate on a per-word basis. One line with
option groups expands into one or more individual synonyms. Option groups are the key mechanism for shortened
synonym notation. The following examples demonstrate how to use option groups.
</p>
<p>
Consider the following macros (note that macros <code>&lt;B&gt;</code> and <code>&lt;C&gt;</code>
are nested):
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>&lt;A&gt;</code></td>
<td><code>aaa</code></td>
</tr>
<tr>
<td><code>&lt;B&gt;</code></td>
<td><code>&lt;A&gt; bbb</code></td>
</tr>
<tr>
<td><code>&lt;C&gt;</code></td>
<td><code>&lt;A&gt; bbb {z|w}</code></td>
</tr>
</tbody>
</table>
<p>
Then the following option group expansions will occur in these examples:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Synonym</th>
<th>Synonym Expansions</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>&lt;A&gt; {b|*} c</code></td>
<td>
<code>"aaa b c"</code><br>
<code>"aaa c"</code>
</td>
</tr>
<tr>
<td><code>&lt;B&gt; {b|*} c</code></td>
<td>
<code>"aaa bbb b c"</code><br>
<code>"aaa bbb c"</code>
</td>
</tr>
<tr>
<td><code>{b|\{\*\}}</code></td>
<td>
<code>"b"</code><br>
<code>"b {*}"</code>
</td>
</tr>
<tr>
<td><code>a {b|*}. c</code></td>
<td>
<code>"a b. c"</code><br>
<code>"a . c"</code>
</td>
</tr>
<tr>
<td><code>a .{b, |*}. c</code></td>
<td>
<code>"a .b, . c"</code><br>
<code>"a .. c"</code>
</td>
</tr>
<tr>
<td><code>{% raw %}a {{b|c}|*}.{% endraw %}</code></td>
<td>
<code>"a ."</code><br>
<code>"a b."</code><br>
<code>"a c."</code>
</td>
</tr>
<tr>
<td><code>a {% raw %}{{{&lt;C&gt;}}|{*}}{% endraw %} c</code></td>
<td>
<code>"a aaa bbb z c"</code><br>
<code>"a aaa bbb w c"</code><br>
<code>"a c"</code>
</td>
</tr>
<tr>
<td><code>{% raw %}{{{a}}} {b||*|{{*}}||*}{% endraw %}</code></td>
<td>
<code>"a b"</code><br>
<code>"a"</code>
</td>
</tr>
</tbody>
</table>
<p>
Specifically:
</p>
<ul>
<li><code>{A|B}</code> denotes either <code>A</code> or <code>B</code>.</li>
<li><code>{A|B|*}</code> denotes either <code>A</code> or <code>B</code> or nothing.</li>
<li>Redundant curly brackets are ignored when it is safe to do so.</li>
<li>Macros cannot be recursive but can be nested.</li>
<li>Option groups can be nested.</li>
<li>
<code>'\'</code> (backslash) can be used to escape the <code>'{'</code>, <code>'}'</code>, <code>'|'</code> and
<code>'*'</code> special symbols used by the option groups.
</li>
<li>Excess whitespace is trimmed when expanding option groups.</li>
</ul>
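<p>
The expansion rules listed above can be sketched as a small recursive expander in Java. This is a simplified,
hypothetical illustration rather than NLPCraft's parser: it assumes macros have already been substituted and
does not support backslash escapes. Each top-level <code>{...}</code> group is split on top-level <code>|</code>,
a lone <code>*</code> stands for an empty alternative, nested groups are expanded recursively, and the pieces are
combined as a cartesian product before whitespace is normalized and duplicates are removed.
</p>
<pre class="brush: java">
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical, simplified option-group expander (not NLPCraft's parser).
// Assumes macros are already substituted; backslash escapes are not supported.
class OptionGroups {
    /** Expands a synonym into distinct, whitespace-normalized variants. */
    static String[] expandSynonym(String synonym) {
        return Arrays.stream(expand(synonym))
            .map(s -> s.trim().replaceAll("\\s+", " "))
            .distinct()
            .toArray(String[]::new);
    }

    /** Expands every top-level {...} group, keeping the literal text around it. */
    private static String[] expand(String s) {
        String[] acc = {""};
        StringBuilder text = new StringBuilder();  // literal text outside any group
        StringBuilder group = new StringBuilder(); // body of the current top-level group
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '{') {
                if (depth == 0) {
                    acc = product(acc, new String[] {text.toString()});
                    text.setLength(0);
                } else {
                    group.append(c);
                }
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    acc = product(acc, alternatives(group.toString()));
                    group.setLength(0);
                } else if (depth > 0) {
                    group.append(c);
                } else {
                    throw new IllegalArgumentException("Unbalanced '}' in: " + s);
                }
            } else if (depth == 0) {
                text.append(c);
            } else {
                group.append(c);
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced '{' in: " + s);
        }
        return product(acc, new String[] {text.toString()});
    }

    /** Splits a group body on top-level '|' and expands each alternative. */
    private static String[] alternatives(String body) {
        String[] result = {};
        StringBuilder alt = new StringBuilder();
        int depth = 0;
        for (char c : body.toCharArray()) {
            if (c == '|' && depth == 0) {
                result = concat(result, expandAlternative(alt.toString()));
                alt.setLength(0);
                continue;
            }
            if (c == '{') depth++;
            if (c == '}') depth--;
            alt.append(c);
        }
        return concat(result, expandAlternative(alt.toString()));
    }

    /** A lone '*' stands for "nothing"; anything else may contain nested groups. */
    private static String[] expandAlternative(String alt) {
        return alt.trim().equals("*") ? new String[] {""} : expand(alt);
    }

    /** Cartesian product: every left prefix followed by every right suffix. */
    private static String[] product(String[] left, String[] right) {
        return Arrays.stream(left)
            .flatMap(l -> Arrays.stream(right).map(r -> l + r))
            .toArray(String[]::new);
    }

    private static String[] concat(String[] a, String[] b) {
        return Stream.concat(Arrays.stream(a), Arrays.stream(b)).toArray(String[]::new);
    }

    public static void main(String[] args) {
        // prints [a b. c, a . c]
        System.out.println(Arrays.toString(expandSynonym("a {b|*}. c")));
    }
}
</pre>
<p>
As a sanity check, applying <code>expandSynonym</code> to the truck synonym from the next example, with
<code>&lt;TRUCK_TYPE&gt;</code> substituted by hand, yields 18 distinct synonyms: 9 choices for the truck type
(8 types or nothing) times 2 choices for <code>pickup</code>.
</p>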
<p>
We can rewrite our transportation model element more efficiently using macros and option groups.
Even though the length of the definition hasn't changed much, it now auto-generates 18 distinct truck synonyms
that we would otherwise have to write out manually:
</p>
<pre class="brush: js, highlight: [4,5,14]">
...
"macros": [
{
"name": "&lt;TRUCK_TYPE&gt;",
"macro": "{ {light|super|heavy|medium} duty|half ton|1/2 ton|3/4 ton|one ton}"
}
],
"elements": [
{
"id": "transport.vehicle",
"description": "Transportation vehicle",
"synonyms": [
"car",
"{&lt;TRUCK_TYPE&gt;|*} {pickup|*} truck",
"sedan",
"coupe"
]
}
]
...
</pre>
<span id="regex" class="section-sub-title">Regular Expressions</span>
<p>
Any individual synonym word that starts and ends with <code>//</code> (two forward slashes) is
considered to be a Java regular expression as defined by <code>java.util.regex.Pattern</code>. Note that a
regular expression can only span a single word, i.e. only individual words from the user input will be
matched against a given regular expression, and no whitespace is allowed within the regular expression. Note
also that the option group special symbols <code>{</code>, <code>}</code>,
<code>|</code> and <code>*</code> have to be escaped in the regular expression using <code>\</code>
(backslash).
</p>
<p>
For example, the following synonym:
</p>
<pre class="brush: js">
"synonyms": [
"{foo|//[bar].+//}}"
]
</pre>
<p>
will match word <code>foo</code> or any other strings that start with <code>bar</code> as long as
this string doesn't contain whitespaces.
</p>
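<p>
A minimal Java sketch of how a single synonym word might be checked against a single input word. It assumes,
without confirmation from this documentation, that the whole word must match the pattern, as with
<code>Matcher.matches()</code>; plain words are compared case-insensitively as a stand-in for NLPCraft's
normalized and stemmed comparison. <code>RegexSynonyms</code> and its methods are hypothetical.
</p>
<pre class="brush: java">
import java.util.regex.Pattern;

// Hypothetical sketch: matching one synonym word against one user-input word.
class RegexSynonyms {
    /** A synonym word of the form //regex// is treated as a single-word regex. */
    static boolean isRegex(String word) {
        return word.length() > 4 && word.startsWith("//") && word.endsWith("//");
    }

    static boolean matchesWord(String synonymWord, String inputWord) {
        if (isRegex(synonymWord)) {
            String regex = synonymWord.substring(2, synonymWord.length() - 2);
            // matches() requires the entire word to match, not just a prefix.
            return Pattern.compile(regex).matcher(inputWord).matches();
        }
        // Plain words: case-insensitive equality stands in for normalized/stemmed matching.
        return synonymWord.equalsIgnoreCase(inputWord);
    }

    public static void main(String[] args) {
        // "{foo|//bar.+//}" expands to the two synonym words "foo" and "//bar.+//".
        for (String w : new String[] {"foo", "bar", "barn", "rebar"}) {
            boolean hit = matchesWord("foo", w) || matchesWord("//bar.+//", w);
            System.out.println(w + " -> " + hit);
        }
    }
}
</pre>
<p>
Its <code>main</code> method should print <code>true</code> for <code>foo</code> and <code>barn</code> and
<code>false</code> for <code>bar</code> and <code>rebar</code>. A real matcher would compile each pattern once and reuse it
rather than compiling on every comparison as this sketch does.
</p>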
<div class="bq info">
<b>Regular Expressions Performance</b>
<p>
Note that uncontrolled use of regular expressions can significantly affect the performance of the
underlying NLPCraft implementation. Use them with caution and test the performance
of your model to ensure it meets your expectations.
</p>
</div>
<span id="values" class="section-sub-title">Element Values</span>
<p>
A model element can have an optional set of special synonyms called <em>values</em>, or proper nouns, for this element.
Unlike basic synonyms, each value is a pair of a name and a set of standard synonyms by which that value,
and ultimately its element, can be recognized in the user input. Note that the value name itself acts as an
implicit synonym even when no additional synonyms are added for that value.
</p>
<p>
When a model element is recognized, it is made available to the model's matching logic as an instance of
the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> interface.
This interface has a method
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getValue--">getValue()</a> that
returns the name of the value, if any, by which
that model element was recognized. That value name can then be used in intent matching.
</p>
<p>
To understand the importance of values, consider the following changes to our transportation
example model:
</p>
<pre class="brush: js, highlight: [19,20,21,22,23,24,25,26,27,28,29,30]">
...
"macros": [
{
"name": "&lt;TRUCK_TYPE&gt;",
"macro": "{light duty|heavy duty|half ton|1/2 ton|3/4 ton|one ton|super duty}"
}
],
"elements": [
{
"id": "transport.vehicle",
"description": "Transportation vehicle",
"synonyms": [
"car",
"{&lt;TRUCK_TYPE&gt;|*} {pickup|*} truck"
"sedan",
"coupe"
],
"values": [
{
"value": "mercedes",
"synonyms": ["mercedes-ben{z|s}", "mb", "ben{z|s}"]
},
{
"value": "bmw",
"synonyms": ["{bimmer|bimer|beemer}", "bayerische motoren werke"]
},
{
"value": "chevrolet",
"synonyms": ["chevy"]
}
]
}
]
...
</pre>
<p>
With that setup, the <code>transport.vehicle</code> element will be recognized by any of the following input strings:
</p>
<ul>
<li><code>car</code></li>
<li><code>benz</code> (with value <code>mercedes</code>)</li>
<li><code>3/4 ton pickup truck</code></li>
<li><code>light duty truck</code></li>
<li><code>chevy</code> (with value <code>chevrolet</code>)</li>
<li><code>bimmer</code> (with value <code>bmw</code>)</li>
<li><code>transport.vehicle</code></li>
</ul>
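<p>
The lookup behind <code>getValue()</code> can be sketched in Java as shown here. This is a hypothetical
illustration, not the NLPCraft API: each value is a row whose first entry is the value name (itself an implicit
synonym), option groups such as <code>ben{z|s}</code> are assumed to be expanded already, and matching is a plain
case-insensitive comparison instead of NLPCraft's normalized matching.
</p>
<pre class="brush: java">
// Hypothetical sketch of element value lookup; not the NLPCraft API.
class ElementValues {
    /**
     * Each row is {valueName, synonym, synonym, ...} with option groups already expanded.
     * Returns the value name whose row contains the phrase, or null if none does.
     */
    static String findValue(String[][] values, String phrase) {
        String p = phrase.trim().toLowerCase();
        for (String[] row : values) {
            // row[0], the value name itself, is checked too: it is an implicit synonym.
            for (String synonym : row) {
                if (synonym.toLowerCase().equals(p)) return row[0];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[][] values = {
            {"mercedes", "mercedes-benz", "mercedes-bens", "mb", "benz", "bens"},
            {"bmw", "bimmer", "bimer", "beemer", "bayerische motoren werke"},
            {"chevrolet", "chevy"}
        };
        System.out.println(findValue(values, "benz"));  // mercedes
        System.out.println(findValue(values, "Chevy")); // chevrolet
        System.out.println(findValue(values, "car"));   // null: plain synonym, no value
    }
}
</pre>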
<p>
Note that element values can be used in token and intent DSLs.
</p>
<span id="groups" class="section-sub-title">Element Groups</span>
<p>
Each model element belongs to one or more groups. A model element provides its groups via the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html#getGroups--">getGroups()</a> method.
By default, if no group is specified, the element ID acts as its group ID.
</p>
<p>
Group membership is a quick and easy way to organize similar model elements together and use this
categorization in token and intent DSLs.
</p>
<p>
Note that proper grouping of the elements is also necessary for the correct operation of
Short-Term Memory (STM) in the conversational context
when using intent-based matching. See
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a>
for more details.
</p>
<p>
Consider an <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> that
represents a previously found model element stored in the conversation. Such a token
will be overridden in the conversation by a more <b>recent token</b>
from the <b>same group</b>, a critical rule for maintaining the proper conversational context.
</p>
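<p>
The override rule can be pictured with the following minimal Java sketch. It is not NLPCraft's conversation
manager: <code>StmToken</code> and <code>ShortTermMemory</code> are hypothetical classes, and the sketch assumes that
"same group" means sharing at least one group ID.
</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical token stand-in: an element ID plus its group IDs.
final class StmToken {
    final String id;
    final Set<String> groups;

    StmToken(String id, Set<String> groups) {
        this.id = id;
        this.groups = groups;
    }
}

// Illustration of "a more recent token from the same group wins"; not NLPCraft code.
final class ShortTermMemory {
    private final List<StmToken> tokens = new ArrayList<>();

    void add(StmToken recent) {
        // Drop every stored token that shares at least one group with the new token.
        tokens.removeIf(old -> !Collections.disjoint(old.groups, recent.groups));
        tokens.add(recent);
    }

    List<StmToken> tokens() {
        return Collections.unmodifiableList(tokens);
    }
}
```

<p>
With this rule, mentioning a country after a city replaces the stored city token when both share a group such as
<code>location</code>, while an unrelated date token stays in the conversation.
</p>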
<p>
Note that a token's groups can be used in token and intent DSLs.
</p>
<span id="parent" class="section-sub-title">Element Parent</span>
<p>
Each model element can form an optional hierarchical relationship with other elements by specifying its
parent element ID via the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelElement.html#getParentId--">getParentId()</a> method. The main idea is that sometimes it is not only the individual model elements that matter
but also their place in the hierarchy, which token and intent DSLs can match on.
</p>
<p>
For example, we could have designed our transportation example model in a different way by using
multiple model elements linked with this hierarchy:
</p>
<pre>
+-- vehicle
| +--truck
| | |-- light.duty.truck
| | |-- heavy.duty.truck
| | +-- medium.duty.truck
| +--car
| | |-- coupe
| | |-- sedan
| | |-- hatchback
| | +-- wagon
</pre>
<p>
Then in our intent DSL, for example, we could look for any token with root parent ID <code>vehicle</code>
or immediate parent ID <code>truck</code> or <code>car</code> without a need to match on all current and
future individual sub-IDs:
</p>
<pre class="brush: plain">
"intent=vehicle.intent term={ancestors @@ 'vehicle'}"
"intent=truck.intent term={parent == 'truck'}"
"intent=car.intent term={parent == 'car'}"
</pre>
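<p>
The <code>parent</code> and <code>ancestors</code> values used in these terms can be derived from a simple
child-to-parent map. The sketch below is a hypothetical illustration in plain Java (the <code>ElementHierarchy</code>
class is not part of NLPCraft) applied to a subset of the hierarchy above.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: derives 'parent' and 'ancestors' from a child-to-parent ID map.
final class ElementHierarchy {
    private final Map<String, String> parentOf;

    ElementHierarchy(Map<String, String> parentOf) {
        this.parentOf = parentOf;
    }

    // Immediate parent ID, or null for a root element.
    String parent(String id) {
        return parentOf.get(id);
    }

    // All parent IDs from the immediate parent up to the root; empty for a root element.
    List<String> ancestors(String id) {
        List<String> out = new ArrayList<>();
        for (String p = parentOf.get(id); p != null; p = parentOf.get(p)) {
            if (out.contains(p))
                throw new IllegalStateException("Cycle in hierarchy at: " + p);
            out.add(p);
        }
        return out;
    }
}
```

<p>
Here <code>ancestors("coupe")</code> yields <code>car</code> and then <code>vehicle</code>, so a term like
<code>ancestors @@ 'vehicle'</code> matches every current and future sub-element without listing them.
</p>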
</section>
<section id="dsl" >
<h2 class="section-title">Token DSL</h2>
<p>
Any individual synonym word that starts and ends with <code>^^</code> is a token DSL expression. A token
DSL expression inside of <code>^^ ... ^^</code> markers allows you to define a predicate on an already parsed and detected token. It is very important to
note that unlike all other synonyms the token DSL predicate operates on an already detected <em>token</em>, not on an
individual unparsed <em>word</em>.
</p>
<p>
Token DSL allows you to <em>compose</em> named entities, i.e. use one named entity when defining another one. For example,
we could define a model element for a race car using our previous transportation example (note how the synonym on
<b>line 18</b>
references the element defined on <b>line 4</b>):
</p>
<pre class="brush: js, highlight: [4, 18]">
...
"elements": [
{
"id": "transport.vehicle",
"description": "Transportation vehicle",
"synonyms": [
"car",
"truck",
"{light|heavy|super|medium} duty {pickup|*} truck",
"sedan",
"coupe"
]
},
{
"id": "race.vehicle",
"description": "Race vehicle",
"synonyms": [
"{race|speed|track} ^^id == 'transport.vehicle'^^"
]
}
]
...
</pre>
<div class="bq warn">
<p>
<b>Greedy NERs <span class="amp">&</span> Synonyms Conflicts</b>
</p>
<p>
Note that in the above example you need to ensure that the words <code>race</code>,
<code>speed</code> or <code>track</code> are not part of the <code>transport.vehicle</code>
token. This is particularly important for 3rd party NERs, where the specific rules about which
words can or cannot be part of the token are unclear or undefined. In such cases the only remedy is
to test extensively with the 3rd party NERs and verify synonym recognition in the data probe logs.
</p>
</div>
<p>
Another common use case is to wrap 3rd party named entities to add group membership, metadata or a hierarchical
relationship to the externally detected named entity. For example, you can wrap the <code>google:location</code>
token and add it to the <code>my_group</code> group:
</p>
<pre class="brush: js, highlight: [6,8]">
...
"elements": [
{
"id": "google.loc.wrap",
"description": "Wrapper for google location",
"groups": ["my_group"],
"synonyms": [
"^^id == 'google:location'^^"
]
}
]
...
</pre>
<span id="dsl-syntax" class="section-sub-title">Token DSL Syntax</span>
<p>
Token DSL is a simple expression language for defining a single predicate over a token - a detected model
element. Remember that, unlike token DSL, all other types of synonyms work with simple words (vs. tokens).
Here's a full <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/src/main/scala/org/apache/nlpcraft/probe/mgrs/model/antlr4/NCSynonymDsl.g4">ANTLR4 grammar</a> for token DSL.
Note that this is exactly the same syntax as
used by <a href="intent-matching.html#syntax">intent DSL</a> for token predicates in intents - except for
aliases which we will explain below.
</p>
<p>
Here's an example of token DSL defining a synonym for the population of any city in France:
</p>
<pre class="brush: js">
"synonyms": [
"population {of|for} ^^[city](id == 'nlpcraft:city' && lowercase(~city:country) == 'france')^^"
]
</pre>
<p>
A few notes on token DSL syntax:
</p>
<ul>
<li>
This synonym defines a composed named entity, i.e. a named entity that consists of other named entities.
In our example, we use the <code>nlpcraft:city</code> token along with other regular synonym words.
</li>
<li>
Token DSL expression always results in one and only one token when matched, however, the synonym can have multiple
token DSL expressions.
</li>
<li>
A token DSL expression can have an optional alias (<code>[city]</code>) that can be used in other token DSL
expressions when referencing the token matched by that expression.
</li>
<li>
You can get all participant nested tokens, if required, using <code>NCToken#getPartTokens()</code> method call chain.
You can also reference participant tokens in the token DSL expression itself by using dot-notation (see below)
with either token IDs or aliases.
</li>
<li>
All string values should be placed in single quotes, as in <code>'some string'</code>.
For numeric literals you can use underscores to help readability, e.g. <code>~list:size >= <b>1_000_000</b></code>
</li>
<li>
You can use <code>null</code>, <code>true</code> and <code>false</code> literals as values.
</li>
<li>
Individual token expressions can be combined with <code>&&</code>, <code>||</code> and <code>!</code>
logical combinators and <code>(</code> <code>)</code> brackets that obey standard precedence rules.
</li>
</ul>
<p>
Each individual token DSL expression can take one of the following forms:
</p>
<pre class="brush: js">
{qual}param op value
func({qual}param) op value
</pre>
<p>
The <code>{qual}param</code> is the left-side parameter and it can have an optional qualifier (<code>qual</code>).
The qualifier allows you to reference participant tokens either by their ID or by their DSL expression's alias using
dot-notation. For example:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Qualifier</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code><b>partId.</b>groups @@ 'my_grp'</code>
</td>
<td>
There must be a participant token (i.e. constituent token) with either token ID or alias
of <code>partId</code>. That participant token should belong to group <code>my_grp</code>.
</td>
</tr>
<tr>
<td>
<code><b>alias1.alias2.</b>~meta['key'] >= 10</code>
</td>
<td>
There must be two nested participant tokens with either token ID or alias
of <code>alias1</code> and <code>alias2</code>. The second (inner-most <code>alias2</code>) participant token
should have a metadata property <code>meta</code> of type map with key <code>key</code> whose value
should be greater than or equal to 10.
</td>
</tr>
</tbody>
</table>
<div class="bq warn">
<p>
<b>NOTE:</b> If a qualifier is present it <b>must</b> be valid and found, i.e. the participant tokens this qualifier
references must be present. If a qualifier is present but the referenced participant tokens cannot be
found, the processing will abort with an exception rather than simply rejecting the given synonym. In other
words, if specified, qualifiers are not optional.
</p>
</div>
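<p>
The qualifier lookup and its fail-fast behavior can be sketched as follows. This is a hypothetical plain-Java
illustration, not NLPCraft code: <code>PartToken</code> and <code>Qualifiers</code> are made-up names, and the sketch
assumes that individual qualifier segments contain no dots themselves.
</p>

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical token stand-in: ID, aliases and nested participant tokens.
final class PartToken {
    final String id;
    final Set<String> aliases;
    final List<PartToken> parts;

    PartToken(String id, Set<String> aliases, List<PartToken> parts) {
        this.id = id;
        this.aliases = aliases;
        this.parts = parts;
    }
}

final class Qualifiers {
    // Resolves a qualifier such as "alias1.alias2" by descending through participant
    // tokens, matching each segment by token ID or alias. A missing segment is an
    // error rather than a silent non-match, mirroring the note above.
    static PartToken resolve(PartToken tok, String qual) {
        PartToken cur = tok;
        for (String seg : qual.split("\\.")) {
            PartToken next = null;
            for (PartToken p : cur.parts) {
                if (p.id.equals(seg) || p.aliases.contains(seg)) {
                    next = p;
                    break;
                }
            }
            if (next == null)
                throw new NoSuchElementException("Qualifier segment not found: " + seg);
            cur = next;
        }
        return cur;
    }
}
```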
<p>
The <code>param</code> itself can be one of the following literals:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>id</code></td>
<td>
<p>
Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getId--">ID</a> as
a <code>java.lang.String</code> object.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>id</b> == 'nlpcraft:city'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>groups</code></td>
<td>
<p>
Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getGroups--">groups</a>
as <code>java.util.Collection</code> of group IDs.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>groups</b> @@ 'my_group'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>aliases</code></td>
<td>
<p>
Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getAliases--">aliases</a>
as <code>java.util.Collection</code> of token aliases.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>aliases</b> @@ 'my_alias'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>startidx</code></td>
<td>
<p>
Token start character <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getStartCharIndex--">index</a> in the original text.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>startidx</b> > 5^^</code>
</p>
</td>
</tr>
<tr>
<td><code>endidx</code></td>
<td>
<p>
Token end character <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getEndCharIndex--">index</a> in the original text.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>endidx</b> < 15^^</code>
</p>
</td>
</tr>
<tr>
<td><code>parent</code></td>
<td>
<p>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getParentId--">ID</a> of
the parent token as a <code>java.lang.String</code> object.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>parent</b> == 'root'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>ancestors</code></td>
<td>
<p>
<code>java.util.List</code> of all token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getParentId--">parent IDs</a>
from the current one up to the root. The list is empty if the current token has no parent ID.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>ancestors</b> @@ 'tok:id'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>value</code></td>
<td>
<p>
Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getValue--">value</a>
as a <code>java.lang.String</code> object.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>value</b> == 'brand_name'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>~propName</code></td>
<td>
<p>
Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getMetadata--">metadata</a>
property for given <code>propName</code>.
</p>
<p>
<b>Example:</b><br/>
<code>^^~<b>city:country</b> == 'france'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>~propName[key]</code></td>
<td>
<p>
Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getMetadata--">metadata</a>
property for given <code>propName</code>
of type <code>java.util.List</code> or <code>java.util.Map</code>.
Returns the indexed or keyed value. Note that <code>key</code> should be an integer
for <code>java.util.List</code> and a string for <code>java.util.Map</code>.
Nested indexing is not allowed.
</p>
<p>
<b>Example:</b><br/>
<code>^^~<b>my:list[0]</b> >= 1_000_000^^</code><br>
<code>^^~<b>my:map['key']</b> >= 1_000_000^^</code>
</p>
</td>
</tr>
</tbody>
</table>
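<p>
As an illustration of the <code>~propName</code> and <code>~propName[key]</code> forms, the hypothetical helper
below looks up a value in a token's metadata map. It is a sketch only (<code>MetaAccess</code> is not an NLPCraft
API) and enforces the integer-index and string-key rule described above; nested indexing is simply not expressible.
</p>

```java
import java.util.List;
import java.util.Map;

final class MetaAccess {
    // Hypothetical lookup for '~prop' (key == null) and '~prop[key]' over token metadata.
    // A java.util.List must be indexed by an Integer, a java.util.Map by a String.
    static Object get(Map<String, Object> meta, String prop, Object key) {
        Object v = meta.get(prop);
        if (key == null)
            return v;
        if (v instanceof List && key instanceof Integer)
            return ((List<?>) v).get((Integer) key);
        if (v instanceof Map && key instanceof String)
            return ((Map<?, ?>) v).get(key);
        throw new IllegalArgumentException("Cannot index '" + prop + "' with: " + key);
    }
}
```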
<p>
The optional <code>func</code> function can alter the value of the left-side parameter. Only one function call is allowed, i.e.
function calls cannot be nested. The primary use case for functions is dealing with 3rd party metadata where you
don't have direct control over the values supplied by 3rd party named entity providers. The following functions are
supported:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Function Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>keys</code></td>
<td>
<p>
Calling <code>java.util.Map#keySet()</code> function on given parameter to get a collection of
map keys. Applicable to <code>java.util.Map</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>keys</b>(~my:map) @@ 'my_key'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>values</code></td>
<td>
<p>
Calling <code>java.util.Map#values()</code> function on given parameter to get a collection
of map values. Applicable to <code>java.util.Map</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>values</b>(~my:map) @@ (200_000, 100_000)^^</code>
</p>
</td>
</tr>
<tr>
<td><code>trim</code></td>
<td>
<p>
Calling <code>java.lang.String#trim()</code> function on given parameter.
Applicable to <code>java.lang.String</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>trim</b>(~nlp:origtext) == '//^[Pp]aris$//'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>isalpha</code></td>
<td>
<p>
Checks that given string parameter contains only Unicode letters.
Applicable to <code>java.lang.String</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>isalpha</b>(~nlp:origtext) == true^^</code>
</p>
</td>
</tr>
<tr>
<td><code>isalphanum</code></td>
<td>
<p>
Checks that given string parameter contains only Unicode letters or digits.
Applicable to <code>java.lang.String</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>isalphanum</b>(~nlp:origtext) == true^^</code>
</p>
</td>
</tr>
<tr>
<td><code>isnumeric</code></td>
<td>
<p>
Checks that given string parameter contains only Unicode digits.
Applicable to <code>java.lang.String</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>isnumeric</b>(~zipcode) == true^^</code>
</p>
</td>
</tr>
<tr>
<td><code>iswhitespace</code></td>
<td>
<p>
Checks that given string parameter contains only whitespace characters.
Applicable to <code>java.lang.String</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>iswhitespace</b>(~my_txt) == false^^</code>
</p>
</td>
</tr>
<tr>
<td><code>uppercase</code></td>
<td>
<p>
Calling <code>java.lang.String#toUpperCase()</code> function on given parameter.
Applicable to <code>java.lang.String</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>uppercase</b>(~nlp:origtext) == 'PARIS'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>lowercase</code></td>
<td>
<p>
Calling <code>java.lang.String#toLowerCase()</code> function on given parameter.
Applicable to <code>java.lang.String</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>lowercase</b>(~nlp:origtext) == 'paris'^^</code>
</p>
</td>
</tr>
<tr>
<td><code>ceil</code></td>
<td>
<p>
Calling <code>java.lang.Math#ceil()</code> function on given parameter.
Applicable to <code>java.lang.Double</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>ceil</b>(~custom:double) > 1.0^^</code>
</p>
</td>
</tr>
<tr>
<td><code>floor</code></td>
<td>
<p>
Calling <code>java.lang.Math#floor()</code> function on given parameter.
Applicable to <code>java.lang.Double</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>floor</b>(~custom:double) > 1.0^^</code>
</p>
</td>
</tr>
<tr>
<td><code>rint</code></td>
<td>
<p>
Calling <code>java.lang.Math#rint()</code> function on given parameter.
Applicable to <code>java.lang.Double</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>rint</b>(~custom:double) > 1.0^^</code>
</p>
</td>
</tr>
<tr>
<td><code>round</code></td>
<td>
<p>
Calling <code>java.lang.Math#round()</code> function on given parameter.
Applicable to <code>java.lang.Double</code> and <code>java.lang.Float</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>round</b>(~custom:double) > 1.0^^</code>
</p>
</td>
</tr>
<tr>
<td><code>size</code>, <code>count</code> or <code>length</code></td>
<td>
<p>
Getting size of the <code>java.util.Collection</code> or <code>java.util.Map</code>, or number
of characters for <code>java.lang.String</code> parameter.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>size</b>(~custom:coll) > 0^^</code>
</p>
</td>
</tr>
<tr>
<td><code>signum</code></td>
<td>
<p>
Calling <code>java.lang.Math#signum()</code> function on given parameter.
Applicable to <code>java.lang.Double</code> and <code>java.lang.Float</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>signum</b>(~custom:double) == -1^^</code>
</p>
</td>
</tr>
<tr>
<td><code>abs</code></td>
<td>
<p>
Calling <code>java.lang.Math#abs()</code> function on given parameter.
Applicable to <code>java.lang.Double</code>, <code>java.lang.Float</code>,
<code>java.lang.Long</code> and <code>java.lang.Integer</code> parameters only.
</p>
<p>
<b>Example:</b><br/>
<code>^^<b>abs</b>(~custom:int) > 10_000^^</code>
</p>
</td>
</tr>
</tbody>
</table>
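<p>
The following minimal sketch shows how a few of these functions could be applied to a left-side value in plain
Java. <code>DslFunctions</code> is a hypothetical class covering only a subset of the table, not NLPCraft's
evaluator; here a <code>ClassCastException</code> simply stands in for an unsupported parameter type.
</p>

```java
import java.util.Collection;
import java.util.Map;

final class DslFunctions {
    // Hypothetical evaluator for a subset of the documented functions.
    static Object apply(String func, Object v) {
        switch (func) {
            case "keys":      return ((Map<?, ?>) v).keySet();
            case "values":    return ((Map<?, ?>) v).values();
            case "trim":      return ((String) v).trim();
            case "lowercase": return ((String) v).toLowerCase();
            case "uppercase": return ((String) v).toUpperCase();
            case "ceil":      return Math.ceil((Double) v);
            case "floor":     return Math.floor((Double) v);
            case "round":
                // Math.round(float) returns int, Math.round(double) returns long.
                if (v instanceof Float) return Math.round((Float) v);
                return Math.round((Double) v);
            case "abs":
                if (v instanceof Integer) return Math.abs((Integer) v);
                if (v instanceof Long) return Math.abs((Long) v);
                if (v instanceof Float) return Math.abs((Float) v);
                return Math.abs((Double) v);
            case "size": case "count": case "length":
                if (v instanceof Collection) return ((Collection<?>) v).size();
                if (v instanceof Map) return ((Map<?, ?>) v).size();
                return ((String) v).length();
            default:
                throw new IllegalArgumentException("Unknown function: " + func);
        }
    }
}
```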
<p>
The <code>op</code> (operation) can be one of the following:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Operation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>==</code><br/>
<code>!=</code>
</td>
<td>
<p>
Both operators perform an equality check and work differently depending on the types of the left
and right parameters:
</p>
<ul>
<li>
<p>
If both left and right parameters are of type <code>java.util.Collection</code> then
it checks that both collections contain (do not contain)
exactly the same elements with exactly the same cardinalities.
</p>
<b>Example:</b>
<dl>
<dt><code>^^~col <b>==</b> (1, 2, 3)^^</code></dt>
<dd>'col' metadata collection should contain only three elements: 1, 2, and 3.</dd>
<dt><code>^^groups <b>!=</b> ('null', 'void')^^</code></dt>
<dd>Token's groups must not be exactly the two groups 'null' and 'void'.</dd>
<dt><code>^^keys(~map) <b>==</b> ('key1', 'key2')^^</code></dt>
<dd>'map' metadata map should contain only two keys: 'key1' and 'key2'.</dd>
</dl>
</li>
<li>
<p>
If only the right parameter is of type <code>java.util.Collection</code> and the left
parameter is a single value then it checks that given single value is (is not) present
in the right side collection.
</p>
<b>Example:</b>
<dl>
<dt><code>^^id <b>==</b> ('id1', 'id2')^^</code></dt>
<dd>'id' should be either 'id1' or 'id2'.</dd>
<dt><code>^^~index <b>!=</b> (-1, 0)^^</code></dt>
<dd>
'index' metadata should be neither -1 nor 0.
</dd>
</dl>
</li>
<li>
<p>
If both left and right parameters are of type <code>java.lang.Number</code>
then method <code>java.lang.Double.compare()</code> is used to compare two numbers.
</p>
<b>Example:</b>
<dl>
<dt><code>^^~score <b>==</b> 100_000^^</code></dt>
<dd>
'score' metadata (of any numeric type) should be equal to 100,000 when compared using
double values.
</dd>
</dl>
</li>
<li>
<p>
If both left and right parameters are of type <code>java.lang.String</code>
and either one is a regular expression written using <code>//</code> prefix and suffix
syntax then that regular expression is used to perform equality check.
</p>
<b>Example:</b>
<dl>
<dt><code>^^~txt <b>==</b> '//^[tT]ext$//'^^</code></dt>
<dd>'txt' metadata matches given regex.</dd>
<dt><code>^^~my_regex <b>!=</b> 'test'^^</code></dt>
<dd>
'my_regex' metadata regex string does not match 'test' value. Note that 'my_regex' metadata string
should use <code>//</code>...<code>//</code> syntax for regular expression.
</dd>
</dl>
</li>
<li>
<p>
In all other cases the standard Java <code>java.lang.Object.equals()</code> equality check
is used.
</p>
<b>Example:</b>
<dl>
<dt><code>^^value <b>==</b> null^^</code></dt>
<dd>Token does not have a value.</dd>
<dt><code>^^parent <b>!=</b> null^^</code></dt>
<dd>Token's parent ID is not null.</dd>
<dt><code>^^~flag <b>==</b> true^^</code></dt>
<dd>'flag' metadata is true.</dd>
</dl>
</li>
</ul>
</td>
</tr>
<tr>
<td>
<code>@@</code><br/>
<code>!@</code>
</td>
<td>
<p>
Both operators perform a containment check and work differently depending on the types of the left
and right parameters:
</p>
<ul>
<li>
<p>
If left parameter is of type <code>java.util.Collection</code> and the right side
parameter is a single value then it checks that given collection contains (does not
contain) given single value.
</p>
<b>Example:</b>
<dl>
<dt><code>^^~col <b>@@</b> 100_000^^</code></dt>
<dd>'col' metadata collection should contain 100,000 value.</dd>
<dt><code>^^groups <b>!@</b> 'null'^^</code></dt>
<dd>Token should not belong to 'null' group.</dd>
</dl>
</li>
<li>
<p>
If both left and right parameters are of type <code>java.util.Collection</code> then
it checks that a left side collection contains (does not contain) <b>all elements</b>
from the right side collection.
</p>
<b>Example:</b>
<dl>
<dt><code>^^~col <b>@@</b> (1, 2, 3)^^</code></dt>
<dd>'col' metadata collection should contain all three elements: 1, 2, and 3.</dd>
<dt><code>^^groups <b>!@</b> ('null', 'void')^^</code></dt>
<dd>
Token should not belong to both 'null' and 'void' groups at the same time.
Note that it can belong to other groups.
</dd>
</dl>
</li>
<li>
<p>
If both left and right parameters are of type <code>java.lang.String</code> then
it checks that a left side string contains (does not contain) the right side string as
its sub-string.
</p>
<b>Example:</b>
<dl>
<dt><code>^^id <b>@@</b> 'sub'^^</code></dt>
<dd>Token ID should contain 'sub' sub-string.</dd>
<dt><code>^^~name <b>!@</b> 'nlp'^^</code></dt>
<dd>
Metadata 'name' should not contain 'nlp' substring.
</dd>
</dl>
</li>
</ul>
</td>
</tr>
<tr>
<td>
<code>&gt;</code><br/>
<code>&gt;=</code><br/>
<code>&lt;=</code><br/>
<code>&lt;</code>
</td>
<td>
<p>
Standard relational operators that are applicable to <code>java.lang.Number</code> left and
right side values only.
</p>
<b>Example:</b>
<dl>
<dt><code>^^startidx <b>&gt;=</b> 10^^</code></dt>
<dd>Token start index should be greater than or equal to 10.</dd>
<dt><code>^^~score <b>&lt;</b> 100_000^^</code></dt>
<dd>
Metadata 'score' should be less than 100,000.
</dd>
</dl>
</td>
</tr>
</tbody>
</table>
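<p>
To summarize the operator rules above, here is a compact, hypothetical Java sketch of how <code>==</code>,
<code>@@</code> and the relational operators could be evaluated (<code>!=</code> and <code>!@</code> are simply their
negations). It is not NLPCraft's implementation; in particular, using <code>Matcher#find()</code> for
<code>//...//</code> regular expressions is an assumption, chosen so that explicit <code>^</code> and <code>$</code>
anchors keep their usual meaning.
</p>

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.regex.Pattern;

final class DslOps {
    // Multiset view of a collection: element -> cardinality.
    private static Map<Object, Integer> counts(Collection<?> c) {
        Map<Object, Integer> m = new HashMap<>();
        for (Object o : c) m.merge(o, 1, Integer::sum);
        return m;
    }

    // Strings wrapped in // ... // are treated as regular expressions.
    private static boolean isRegex(String s) {
        return s.length() >= 4 && s.startsWith("//") && s.endsWith("//");
    }

    private static boolean regexMatch(String regex, String s) {
        return Pattern.compile(regex.substring(2, regex.length() - 2)).matcher(s).find();
    }

    // '==' following the rules in the table; '!=' is its negation.
    static boolean eq(Object l, Object r) {
        if (l instanceof Collection && r instanceof Collection)
            return counts((Collection<?>) l).equals(counts((Collection<?>) r));
        if (r instanceof Collection)
            return ((Collection<?>) r).contains(l);
        if (l instanceof Number && r instanceof Number)
            return Double.compare(((Number) l).doubleValue(), ((Number) r).doubleValue()) == 0;
        if (l instanceof String && r instanceof String) {
            String ls = (String) l, rs = (String) r;
            if (isRegex(rs)) return regexMatch(rs, ls);
            if (isRegex(ls)) return regexMatch(ls, rs);
        }
        return Objects.equals(l, r);
    }

    // '@@' containment; '!@' is its negation.
    static boolean contains(Object l, Object r) {
        if (l instanceof Collection && r instanceof Collection)
            return ((Collection<?>) l).containsAll((Collection<?>) r);
        if (l instanceof Collection)
            return ((Collection<?>) l).contains(r);
        if (l instanceof String && r instanceof String)
            return ((String) l).contains((String) r);
        throw new IllegalArgumentException("Unsupported '@@' operands");
    }

    // '>', '>=', '<=', '<' on java.lang.Number operands only.
    static boolean rel(Object l, String op, Object r) {
        int c = Double.compare(((Number) l).doubleValue(), ((Number) r).doubleValue());
        switch (op) {
            case ">":  return c > 0;
            case ">=": return c >= 0;
            case "<=": return c <= 0;
            case "<":  return c < 0;
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }
}
```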
<span id="combinators" class="section-sub-title">Logical Combinators</span>
<p>
Individual token expressions can be combined with <code>&&</code>, <code>||</code> and <code>!</code>
logical combinators and <code>( )</code> brackets that obey standard precedence rules, as well as short-circuit
evaluation of the logical <code>&&</code> and <code>||</code> combinators. For example:
</p>
<p>
<code>^^[alias](~my:list[0] >= 1_000_000 <b>&&</b> alias1.groups @@ 'clients')^^</code><br>
<code>^^<b>(</b>id == 'myid' && ~score > 10<b>)</b> <b>||</b> <b>(</b>alias1.groups @@ 'clients' && ~score <= 10<b>)</b>^^</code><br>
</p>
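<p>
Java's own <code>java.util.function.Predicate</code> composition has the same short-circuit semantics, which makes
it a convenient way to picture how the second expression above is evaluated. The <code>Tok</code> class below is a
hypothetical stand-in for a token, and an unqualified <code>groups</code> check replaces <code>alias1.groups</code>
for brevity.
</p>

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class Combinators {
    // Hypothetical token stand-in: ID, groups and metadata.
    static final class Tok {
        final String id;
        final Set<String> groups;
        final Map<String, Object> meta;

        Tok(String id, Set<String> groups, Map<String, Object> meta) {
            this.id = id;
            this.groups = groups;
            this.meta = meta;
        }
    }

    static Predicate<Tok> idIs(String id) { return t -> t.id.equals(id); }

    static Predicate<Tok> inGroup(String g) { return t -> t.groups.contains(g); }

    // Reads '~score'; would throw if the metadata is missing, so it must only run when needed.
    static Predicate<Tok> scoreAbove(double min) {
        return t -> ((Number) t.meta.get("score")).doubleValue() > min;
    }

    // (id == 'myid' && ~score > 10) || (groups @@ 'clients' && ~score <= 10)
    // Predicate.and/or skip the right-hand side once the result is known.
    static final Predicate<Tok> EXPR =
        idIs("myid").and(scoreAbove(10))
            .or(inGroup("clients").and(scoreAbove(10).negate()));
}
```

<p>
Because of short-circuiting, a token that neither has ID <code>myid</code> nor belongs to <code>clients</code> is
rejected without its <code>score</code> metadata ever being read.
</p>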
<span id="custom" class="section-sub-title">Custom Parsers</span>
<p>
In cases when declarative synonyms (macros, option groups, regular expressions and token DSL) are not expressive enough
you can create your model element recognizer programmatically:
</p>
<ul>
<li>
Model provides its custom parsers via <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getParsers--">getParsers()</a> method.
</li>
<li>
A custom parser is defined by the following types:
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomElement.html">NCCustomElement</a>,
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomParser.html">NCCustomParser</a> and
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomWord.html">NCCustomWord</a>.
</li>
</ul>
</section>
<section id="logic">
<h2 class="section-title">Model Logic</h2>
<p>
When a user sends a request via REST API it is received by the REST server. Upon receipt,
the REST server does the basic NLP processing and enriching. Once finished, the REST server
sends the enriched request down to a specific data probe selected based on the requested data model.
</p>
<p>
The model logic is defined in <a href="intent-matching.html">intents</a>, specifically in the intent callbacks that get called when
their intent is chosen as a winning match against the user request.
Below we will quickly discuss the key APIs that are essential for developing intent callbacks.
Note that this does not replace a more detailed <a target=_ href="/apis/latest/index.html">Javadoc</a>
documentation that you are encouraged to read through as well:
</p>
<ul>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></li>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></li>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></li>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></li>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></li>
<li>Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></li>
</ul>
<h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></h3>
<p>
This interface provides a read-only view of the data model. The model view defines a declarative, or configurable, part of the model.
All properties in this interface can be defined or overridden in JSON/YAML external
presentation when used with <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> adapter.
</p>
<h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></h3>
<p>
This interface defines a context of a particular intent match. It can be passed into the callback of the matched intent
and provides the following:
</p>
<ul>
<li>ID of the matched intent.</li>
<li>Specific parsing variant that was matched against this intent.</li>
<li>Access to the original query context (<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a>).</li>
<li>Various access APIs for intent tokens.</li>
</ul>
<h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></h3>
<p>
This interface provides all available data about the parsed user input and all its
supplemental information. It's accessible from the <code>NCIntentMatch</code> interface and
provides a large amount of information to the intent callback logic:
</p>
<ul>
<li>
Server request ID. A server request is defined as the processing of one user input sentence.
</li>
<li>
Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a>
for controlling STM of conversation manager and dialog flow.
</li>
<li>
Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a>
instance that the intent callback method belongs to giving access to entire static model configuration.
</li>
<li>
Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> that
provides detailed information about the user input.
</li>
<li>
List of parsing variants provided
by <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants--">getVariants()</a>
method. When the user sentence gets parsed into individual tokens (i.e. detected model elements) there is generally
more than one way to do it. This ambiguity is perfectly fine because only the data model has all the
necessary information to select one parsing variant that fits that model the best. Without the data model
there isn't enough context to determine which variant is the best fitting.
Method <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants--">getVariants()</a>
returns list of all parsing variants for a given user input.
</li>
</ul>
<h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></h3>
<p>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> interface
is one of several important entities in Data Model API that you as a model developer will be working with. You
should review its <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">Javadoc</a> but
here is an outline of the information it provides:
</p>
<ul>
<li>
Information about the user that issued the request.
</li>
<li>
User agent and remote address, if available, of the user's application that made the initial REST call.
</li>
<li>
Original request text, timestamp of its receipt, and server request ID.
</li>
</ul>
<h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></h3>
<p>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> object is another
key abstraction in Data Model API. A token is a detected model element and is a part of a fully parsed user input.
A sequence of tokens represents the parsed user input. A single token corresponds to one or more words, sequential
or not, in the user sentence.
</p>
<p>
Most of the token's information is stored in map-based metadata accessible via
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getMetadata--">getMetadata()</a> method.
Depending on the token ID, each token will have a different set of <a href="#meta">metadata properties</a>. Some common NLP properties
are always present for tokens of all types.
</p>
<h3 class="section-title">Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></h3>
<p>
This class defines data model result returned from model's intent callbacks. Result consists of the
text body and the type. The type is similar in notion to MIME types. Intent callbacks must use this class
to provide their results.
</p>
</section>
<section id="builtin">
<h2 class="section-title">Built-In Tokens</h2>
<p>
NLPCraft provides a number of built-in model elements (i.e. tokens) including the
<a href="integrations.html">integration</a> with several popular 3rd party NER frameworks. The table
below provides information about these built-in tokens. The section about <a href="#meta">token metadata</a> provides
further information about the metadata that each type of token carries.
</p>
<p>
Built-in tokens have to be explicitly enabled on both the REST server and in the model. See
<code>nlpcraft.server.tokenProviders</code> configuration property and
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens--">NCModelView#getEnabledBuiltInTokens()</a>
method for more details. By default, only NLPCraft tokens are enabled (token IDs
starting with <code>nlpcraft</code>).
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Token ID</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>nlpcraft:nlp</code></td>
<td>
<p>
This token denotes a word (always a single word) that is not a part of any other token. It's
also called a free word, i.e. a word that is not linked to any other detected model element.
</p>
<p>
<b>NOTE:</b> the metadata from this token defines a common set of NLP properties and
is present in every other token as well.
</p>
</td>
<td>
<ul>
<li>Jamie goes <code>home</code> (assuming that a word 'home' does not belong to any model element).</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:date</code></td>
<td>
This token denotes a date range. It recognizes dates from 1900 up to 2023. Note that it does not
currently recognize the time component.
</td>
<td>
<ul>
<li>Meeting <code>next tuesday</code>.</li>
<li>Report for entire <code>2018 year</code>.</li>
<li>Data <code>from 1/1/2017 to 12/31/2018</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:num</code></td>
<td>
This token denotes a single numeric value or numeric condition.
</td>
<td>
<ul>
<li>Price <code>&gt; 100</code>.</li>
<li>Price is <code>less than $100</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:continent</code></td>
<td>
This token denotes a geographical continent.
</td>
<td>
<ul>
<li>Population of <code>Africa</code>.</li>
<li>Surface area of <code>America</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:subcontinent</code></td>
<td>
This token denotes a geographical subcontinent.
</td>
<td>
<ul>
<li>Population of <code>Alaskan peninsula</code>.</li>
<li>Surface area of <code>South America</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:region</code></td>
<td>
This token denotes a geographical region/state.
</td>
<td>
<ul>
<li>Population of <code>California</code>.</li>
<li>Surface area of <code>South Dakota</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:country</code></td>
<td>
This token denotes a country.
</td>
<td>
<ul>
<li>Population of <code>France</code>.</li>
<li>Surface area of <code>USA</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:city</code></td>
<td>
This token denotes a city.
</td>
<td>
<ul>
<li>Population of <code>Paris</code>.</li>
<li>Surface area of <code>Washington DC</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:metro</code></td>
<td>
This token denotes a metro area.
</td>
<td>
<ul>
<li>Population of <code>Cedar Rapids-Waterloo-Iowa City &amp; Dubuque, IA</code> metro area.</li>
<li>Surface area of <code>Norfolk-Portsmouth-Newport News, VA</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:sort</code></td>
<td>
This token denotes a sorting or ordering.
</td>
<td>
<ul>
<li>Report <code>sorted from top to bottom</code>.</li>
<li>Analysis <code>sorted in descending order</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:limit</code></td>
<td>
This token denotes a numerical limit.
</td>
<td>
<ul>
<li>Show <code>top 5</code> brands.</li>
<li>Show <code>several</code> brands.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:coordinate</code></td>
<td>
This token denotes a pair of latitude and longitude coordinates.
</td>
<td>
<ul>
<li>Route the path to <code>55.7558, 37.6173</code> location.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:relation</code></td>
<td>
This token denotes a relation function:
<code>compare</code> or
<code>correlate</code>. Note that this token always needs two other tokens that it references.
</td>
<td>
<ul>
<li>
What is the <code><b>correlation between</b></code> <code>price</code> <code><b>and</b></code> <code>location</code>
(assuming that 'price' and 'location' are also detected tokens).
</li>
</ul>
</td>
</tr>
<tr>
<td><code>google:xxx</code></td>
<td>
<p>
These tokens denote named entities detected by <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>,
where <code>xxx</code> is the lower-case name of the named entity type, e.g.
<code>google:person</code>, <code>google:location</code>, etc.
</p>
<p>
See the <a href="integrations.html#google">integration</a> section for more details on how
to configure the Google named entity provider.
</p>
</td>
<td>
<ul>
<li>
Articles by <code>Ken Thompson</code>.
</li>
<li>
Best restaurants in <code>Paris</code>.
</li>
</ul>
</td>
</tr>
<tr>
<td><code>opennlp:xxx</code></td>
<td>
<p>
These tokens denote named entities detected by <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>,
where <code>xxx</code> is the lower-case name of the named entity type, e.g.
<code>opennlp:person</code>, <code>opennlp:money</code>, etc.
</p>
<p>
See the <a href="integrations.html#opennlp">integration</a> section for more details on how
to configure the Apache OpenNLP named entity provider.
</p>
</td>
<td>
<ul>
<li>
Articles by <code>Ken Thompson</code>.
</li>
<li>
Best restaurants under <code>100$</code>.
</li>
</ul>
</td>
</tr>
<tr>
<td><code>spacy:xxx</code></td>
<td>
<p>
These tokens denote named entities detected by <a target=_ href="https://spacy.io/">spaCy</a>,
where <code>xxx</code> is the lower-case name of the named entity type, e.g.
<code>spacy:person</code>, <code>spacy:location</code>, etc.
</p>
<p>
See the <a href="integrations.html#spacy">integration</a> section for more details on how
to configure the spaCy named entity provider.
</p>
</td>
<td>
<ul>
<li>
Articles by <code>Ken Thompson</code>.
</li>
<li>
Best restaurants in <code>Paris</code>.
</li>
</ul>
</td>
</tr>
<tr>
<td><code>stanford:xxx</code></td>
<td>
<p>
These tokens denote named entities detected by <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>,
where <code>xxx</code> is the lower-case name of the named entity type, e.g.
<code>stanford:person</code>, <code>stanford:location</code>, etc.
</p>
<p>
See the <a href="integrations.html#stanford">integration</a> section for more details on how
to configure the Stanford CoreNLP named entity provider.
</p>
</td>
<td>
<ul>
<li>
Articles by <code>Ken Thompson</code>.
</li>
<li>
Best restaurants in <code>Paris</code>.
</li>
</ul>
</td>
</tr>
</tbody>
</table>
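<p>
All built-in token IDs above follow a <code>provider:name</code> pattern. As a minimal illustration in plain
Java (no NLPCraft dependency), the hypothetical <code>TokenIds</code> helper below splits such an ID into its
provider prefix and entity name, and checks whether it comes from one of the external named entity providers
listed in the table:
</p>

```java
// Hypothetical helper, not part of the NLPCraft API: splits a built-in token ID
// such as "google:person" or "nlpcraft:num" into its provider and entity name.
final class TokenIds {
    private TokenIds() {}

    /** Returns the provider prefix, e.g. "google" for "google:person". */
    static String provider(String tokenId) {
        int idx = tokenId.indexOf(':');
        if (idx <= 0)
            throw new IllegalArgumentException("Not a built-in token ID: " + tokenId);
        return tokenId.substring(0, idx);
    }

    /** Returns the entity name, e.g. "person" for "google:person". */
    static String name(String tokenId) {
        int idx = tokenId.indexOf(':');
        if (idx <= 0 || idx == tokenId.length() - 1)
            throw new IllegalArgumentException("Not a built-in token ID: " + tokenId);
        return tokenId.substring(idx + 1);
    }

    /** True if the ID belongs to one of the external NER providers (google, opennlp, spacy, stanford). */
    static boolean isExternal(String tokenId) {
        switch (provider(tokenId)) {
            case "google":
            case "opennlp":
            case "spacy":
            case "stanford":
                return true;
            default:
                return false;
        }
    }
}
```

<p>
Model element IDs are chosen by the model author and are not guaranteed to follow this pattern, so a helper
like this should only be applied to the built-in token IDs listed above.
</p>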
</section>
<section id="meta">
<h2 class="section-title">Token Metadata</h2>
<p>
Each token has a different set of metadata. The sections below describe the metadata for each built-in token
supported by NLPCraft:
</p>
<ul>
<li><a href="#nlpcraft:nlp">Token ID <code>nlpcraft:nlp</code></a></li>
<li><a href="#nlpcraft:date">Token ID <code>nlpcraft:date</code></a></li>
<li><a href="#nlpcraft:num">Token ID <code>nlpcraft:num</code></a></li>
<li><a href="#nlpcraft:city">Token ID <code>nlpcraft:city</code></a></li>
<li><a href="#nlpcraft:continent">Token ID <code>nlpcraft:continent</code></a></li>
<li><a href="#nlpcraft:subcontinent">Token ID <code>nlpcraft:subcontinent</code></a></li>
<li><a href="#nlpcraft:region">Token ID <code>nlpcraft:region</code></a></li>
<li><a href="#nlpcraft:country">Token ID <code>nlpcraft:country</code></a></li>
<li><a href="#nlpcraft:metro">Token ID <code>nlpcraft:metro</code></a></li>
<li><a href="#nlpcraft:coordinate">Token ID <code>nlpcraft:coordinate</code></a></li>
<li><a href="#nlpcraft:sort">Token ID <code>nlpcraft:sort</code></a></li>
<li><a href="#nlpcraft:limit">Token ID <code>nlpcraft:limit</code></a></li>
<li><a href="#nlpcraft:relation">Token ID <code>nlpcraft:relation</code></a></li>
<li><a href="#stanford:xxx">Token ID <code>stanford:xxx</code></a></li>
<li><a href="#spacy:xxx">Token ID <code>spacy:xxx</code></a></li>
<li><a href="#google:xxx">Token ID <code>google:xxx</code></a></li>
<li><a href="#opennlp:xxx">Token ID <code>opennlp:xxx</code></a></li>
</ul>
<div class="bq info">
<p>
<b>Metadata Name Conflicts</b>
</p>
<p>
Note that model element metadata gets merged into the same map container as common NLP token metadata
(see <code>nlpcraft:nlp:xxx</code> properties below).
In other words, they share the same namespace. It is important to remember this and to choose unique names
for user-defined metadata properties. One approach, used by NLPCraft internally, is to prefix
metadata names with a unique prefix based on the token ID.
</p>
</div>
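<p>
The prefixing convention described in the note above can be sketched as follows. This is a minimal,
hypothetical helper, not part of the NLPCraft API: it models the shared metadata container as a plain
<code>java.util.Map&lt;String, Object&gt;</code> and namespaces every user-defined property with the element ID,
so an element ID such as <code>x:price</code> (illustrative only) would yield keys like <code>x:price:currency</code>:
</p>

```java
import java.util.Map;

// Hypothetical sketch: namespaces user-defined metadata keys by element ID so they
// cannot collide with built-in keys such as "nlpcraft:nlp:pos".
final class ElementMeta {
    private ElementMeta() {}

    /** Builds a namespaced key, e.g. "x:price:currency" for element "x:price". */
    static String key(String elementId, String property) {
        return elementId + ":" + property;
    }

    /** Copies user properties into the shared map, rejecting any name conflict. */
    static void putAll(Map<String, Object> shared, String elementId, Map<String, Object> props) {
        // First pass: detect conflicts so the shared map is left untouched on failure.
        for (String name : props.keySet()) {
            String k = key(elementId, name);
            if (shared.containsKey(k))
                throw new IllegalStateException("Metadata name conflict: " + k);
        }
        // Second pass: copy the namespaced properties.
        for (Map.Entry<String, Object> e : props.entrySet())
            shared.put(key(elementId, e.getKey()), e.getValue());
    }
}
```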
<span id="nlpcraft:nlp" class="section-sub-title">Token ID <code>nlpcraft:nlp</code></span>
<p>
This token's metadata provides the common set of basic NLP properties that are part of every token.
<b>All tokens</b>, without exception, have these metadata properties, and all of them are <b>mandatory</b>.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:nlp:unid</b></code></td>
<td><code>java.lang.String</code></td>
<td>Internal globally unique system ID of the token.</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:bracketed</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>Whether or not this token is surrounded by any of <code>'['</code>, <code>']'</code>, <code>'{'</code>, <code>'}'</code>, <code>'('</code>, <code>')'</code> brackets.</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:freeword</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>Whether or not this token represents a free word. A free word is a token that was not detected as part of any user-defined or system token.</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:direct</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>Whether or not this token was matched on a direct (not permuted) synonym.</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:english</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this token represents an English word. Note that this only checks that the token's text
consists of characters of the English alphabet, i.e. the text doesn't necessarily have to be a
known valid English word. See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed--" target="javadoc">NCModelView.isNonEnglishAllowed()</a> method
for the corresponding model configuration.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:lemma</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Lemma of this token, i.e. a canonical form of this word. Note that stemming and
lemmatization both reduce inflectional forms, and sometimes derivationally related forms,
of a word to a common base form. Lemmatization refers to the use of a vocabulary and
morphological analysis of words, normally aiming to remove inflectional endings only and to
return the base or dictionary form of a word, which is known as the lemma.
Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a>
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:stem</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Stem of this token. Note that stemming and lemmatization both reduce inflectional forms,
and sometimes derivationally related forms, of a word to a common base form. Unlike lemmatization,
stemming is a basic heuristic process that chops off the ends of words in the hope of
achieving this goal correctly most of the time, and often includes the removal of derivational
affixes.
Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a>
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:pos</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Penn Treebank POS tag for this token. Note that in addition to the standard Penn Treebank POS
tags, NLPCraft introduces the synthetic '-&#45;&#45;' tag to indicate the POS tag of multiword tokens.
Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a>
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:posdesc</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Description of Penn Treebank POS tag.
Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a>
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:swear</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token is a swear word. NLPCraft has a built-in list of common English swear words.
See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed--" target="javadoc">NCModelView.isSwearWordsAllowed()</a> for the corresponding model configuration.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:origtext</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Original user input text for this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:normtext</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Normalized user input text for this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:sparsity</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Numeric value of how sparse the token is. A sparsity of zero means that all individual words in
the token follow each other.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:minindex</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Index of the first word in this token. Note that the token may not be contiguous.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:maxindex</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Index of the last word in this token. Note that the token may not be contiguous.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:wordindexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>
List of original word indexes in this token. Note that a token can have words that are not
contiguous in the original sentence. Always has at least one element in it.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:wordlength</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Number of individual words in this token. Equal to the size of <code>wordindexes</code> list.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:contiguous</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token has zero sparsity, i.e. consists of contiguous words.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:start</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Start character index of this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:end</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
End character index of this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:index</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Index of this token in the sentence.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:charlength</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Character length of this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:quoted</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token is surrounded by single or double quotes.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:stopword</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token is a stopword. Stopwords are extremely common words that
add little value to understanding user input and are excluded from processing entirely.
For example, words like a, the, can, of, about, over, etc. are typical stopwords in English.
NLPCraft has a built-in set of stopwords.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:dict</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token is found in Princeton WordNet database.
</td>
</tr>
</tbody>
</table>
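<p>
As an illustration of how these properties can be consumed, the sketch below assumes each token's metadata is
available as a <code>java.util.Map&lt;String, Object&gt;</code> keyed by the property names above; the
<code>NlpMeta</code> class and its methods are hypothetical. It collects the lemmas of all non-stopword tokens
and shows how <code>minindex</code>, <code>maxindex</code> and <code>wordindexes</code> relate for a token
whose words may not be contiguous:
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: each token's metadata is modeled as a Map<String, Object>
// keyed by the nlpcraft:nlp:xxx property names.
final class NlpMeta {
    private NlpMeta() {}

    /** Lemmas of all tokens that are not stopwords, in the order the tokens are given. */
    static List<String> contentLemmas(List<Map<String, Object>> tokens) {
        List<String> lemmas = new ArrayList<>();
        for (Map<String, Object> t : tokens)
            if (!Boolean.TRUE.equals(t.get("nlpcraft:nlp:stopword")))
                lemmas.add((String) t.get("nlpcraft:nlp:lemma"));
        return lemmas;
    }

    /** True if the token's word indexes fill the whole [minindex, maxindex] span without gaps. */
    static boolean hasNoGaps(Map<String, Object> token) {
        @SuppressWarnings("unchecked")
        List<Integer> idxs = (List<Integer>) token.get("nlpcraft:nlp:wordindexes");
        int min = (Integer) token.get("nlpcraft:nlp:minindex");
        int max = (Integer) token.get("nlpcraft:nlp:maxindex");
        return max - min + 1 == idxs.size();
    }
}
```

<p>
This index-based check is expected to agree with <code>nlpcraft:nlp:contiguous</code>, which reports the same
property directly; it is shown here only to illustrate how the index properties fit together.
</p>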
<br/>
<span id="nlpcraft:date" class="section-sub-title">Token ID <code>nlpcraft:date</code></span>
<p>
This token denotes a date range, including single-day ranges.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b>.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:date:from</b></code></td>
<td><code>java.lang.Long</code></td>
<td>
Start timestamp of the datetime range.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:date:to</b></code></td>
<td><code>java.lang.Long</code></td>
<td>
End timestamp of the datetime range.
</td>
</tr>
</tbody>
</table>
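<p>
The table above does not state the unit of these timestamps. The sketch below <b>assumes</b> they are
milliseconds since the Unix epoch (verify this against the Javadoc of your NLPCraft version) and converts the
range into calendar dates in a caller-supplied time zone. The <code>DateMeta</code> class is hypothetical and
operates on the token metadata modeled as a plain <code>java.util.Map&lt;String, Object&gt;</code>:
</p>

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Map;

// Sketch only: ASSUMES nlpcraft:date:from/to hold epoch milliseconds.
final class DateMeta {
    private DateMeta() {}

    /** First day of the range in the given time zone. */
    static LocalDate from(Map<String, Object> meta, ZoneId zone) {
        return toDate((Long) meta.get("nlpcraft:date:from"), zone);
    }

    /** Last day of the range in the given time zone. */
    static LocalDate to(Map<String, Object> meta, ZoneId zone) {
        return toDate((Long) meta.get("nlpcraft:date:to"), zone);
    }

    private static LocalDate toDate(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone).toLocalDate();
    }

    /** True if the given day falls within the token's range, both ends included. */
    static boolean covers(Map<String, Object> meta, LocalDate day, ZoneId zone) {
        return !day.isBefore(from(meta, zone)) && !day.isAfter(to(meta, zone));
    }
}
```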
<br/>
<span id="nlpcraft:num" class="section-sub-title">Token ID <code>nlpcraft:num</code></span>
<p>
This token denotes a single numerical value or a numeric condition.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:num:from</b></code></td>
<td><code>java.lang.Double</code></td>
<td>
Start of the numeric range that satisfies the condition (see <code>fromincl</code> for whether it is inclusive).
Note that if <code>from</code> and <code>to</code> are the same, this token represents a single value
(whole or fractional), in which case <code>isequalcondition</code> will be <code>true</code>.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:to</b></code></td>
<td><code>java.lang.Double</code></td>
<td>
End of the numeric range that satisfies the condition (see <code>toincl</code> for whether it is inclusive).
Note that if <code>from</code> and <code>to</code> are the same, this token represents a single value
(whole or fractional), in which case <code>isequalcondition</code> will be <code>true</code>.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:fromincl</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not the start of the numeric range is inclusive.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:toincl</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not the end of the numeric range is inclusive.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:isequalcondition</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this is an equality condition. Note that single numeric values also default to an equality
condition and this property will be <code>true</code>. Indeed, <code>A is equal to 2</code> and
<code>A is 2</code> have the same meaning.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:isnotequalcondition</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this is a not-equality condition.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:isfromnegativeinfinity</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this range is from negative infinity.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:israngecondition</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this is a range condition.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:istopositiveinfinity</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this range is to positive infinity.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:isfractional</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this token's value (a single numeric value or a range) is a fractional number, as opposed to a whole number.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:unit</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>
Optional numeric value unit name (see below).
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:unittype</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>
Optional numeric value unit type (see below).
</td>
</tr>
</tbody>
</table>
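<p>
The properties above are enough to test whether a given value satisfies the parsed condition. The following is
a minimal sketch over the token metadata modeled as a plain <code>java.util.Map&lt;String, Object&gt;</code>;
the <code>NumCondition</code> class is hypothetical. It <b>assumes</b> that for not-equality conditions the
compared value is held in <code>nlpcraft:num:from</code>, which the table above only confirms for equality
conditions:
</p>

```java
import java.util.Map;

// Sketch: checks whether a value satisfies the condition described by an
// nlpcraft:num token, using only the properties documented above.
final class NumCondition {
    private NumCondition() {}

    static boolean matches(Map<String, Object> meta, double value) {
        double from = (Double) meta.get("nlpcraft:num:from");
        double to = (Double) meta.get("nlpcraft:num:to");
        boolean fromIncl = (Boolean) meta.get("nlpcraft:num:fromincl");
        boolean toIncl = (Boolean) meta.get("nlpcraft:num:toincl");
        boolean fromNegInf = (Boolean) meta.get("nlpcraft:num:isfromnegativeinfinity");
        boolean toPosInf = (Boolean) meta.get("nlpcraft:num:istopositiveinfinity");

        // Equality: from and to hold the same single value.
        if ((Boolean) meta.get("nlpcraft:num:isequalcondition"))
            return value == from;
        // ASSUMPTION: a not-equality condition also stores its value in "from".
        if ((Boolean) meta.get("nlpcraft:num:isnotequalcondition"))
            return value != from;

        // Range: each bound is either open-ended or checked with its inclusivity flag.
        boolean aboveFrom = fromNegInf || (fromIncl ? value >= from : value > from);
        boolean belowTo = toPosInf || (toIncl ? value <= to : value < to);
        return aboveFrom && belowTo;
    }
}
```

<p>
Exact <code>==</code> comparison of <code>double</code> values is used here for brevity; for fractional values
parsed from text, a tolerance-based comparison may be more appropriate.
</p>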
<p>
The following table lists the possible values of the <code><b>nlpcraft:num:unit</b></code> and <code><b>nlpcraft:num:unittype</b></code>
properties:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>num:unittype</th>
<th>num:unit <sub>possible values</sub></th>
</tr>
</thead>
<tbody>
<tr><td><code>mass</code></td><td><code>feet per second</code><br/><code>grams</code><br/><code>kilogram</code><br/><code>grain</code><br/><code>dram</code><br/><code>ounce</code><br/><code>pound</code><br/><code>hundredweight</code><br/><code>ton</code><br/><code>tonne</code><br/><code>slug</code></td>
<tr><td><code>torque</code></td><td><code>newton meter</code></td>
<tr><td><code>area</code></td><td><code>square meter</code><br/><code>acre</code><br/><code>are</code><br/><code>hectare</code><br/><code>square inches</code><br/><code>square feet</code><br/><code>square yards</code><br/><code>square miles</code></td>
<tr><td><code>paper quantity</code></td><td><code>paper bale</code></td>
<tr><td><code>force</code></td><td><code>kilopond</code><br/><code>pond</code></td>
<tr><td><code>pressure</code></td><td><code>pounds per square inch</code></td>
<tr><td><code>solid angle</code></td><td><code>steradian</code></td>
<tr><td><code>pressure</code><br/><code>stress</code></td><td><code>pascal</code></td>
<tr><td><code>luminous</code></td><td><code>flux</code><br/><code>lumen</code></td>
<tr><td><code>amount of substance</code></td><td><code>mole</code></td>
<tr><td><code>luminance</code></td><td><code>candela per square metre</code></td>
<tr><td><code>angle</code></td><td><code>radian</code><br/><code>degree</code></td>
<tr><td><code>magnetic flux density</code><br/><code>magnetic field</code></td><td><code>tesla</code></td>
<tr><td><code>power</code><br/><code>radiant flux</code></td><td><code>watt</code></td>
<tr><td><code>datetime</code></td><td><code>second</code><br/><code>minute</code><br/><code>hour</code><br/><code>day</code><br/><code>week</code><br/><code>month</code><br/><code>year</code></td>
<tr><td><code>electrical inductance</code></td><td><code>henry</code></td>
<tr><td><code>electric charge</code></td><td><code>coulomb</code></td>
<tr><td><code>temperature</code></td><td><code>kelvin</code><br/><code>centigrade</code><br/><code>fahrenheit</code></td>
<tr><td><code>voltage</code><br/><code>electrical</code></td><td><code>volt</code></td>
<tr><td><code>momentum</code></td><td><code>kilogram meters per second</code></td>
<tr><td><code>amount of heat</code></td><td><code>calorie</code></td>
<tr><td><code>electrical capacitance</code></td><td><code>farad</code></td>
<tr><td><code>radioactive decay</code></td><td><code>becquerel</code></td>
<tr><td><code>electrical conductance</code></td><td><code>siemens</code></td>
<tr><td><code>luminous intensity</code></td><td><code>candela</code></td>
<tr><td><code>work</code><br/><code>energy</code></td><td><code>joule</code></td>
<tr><td><code>quantities</code></td><td><code>dozen</code></td>
<tr><td><code>density</code></td><td><code>density</code></td>
<tr><td><code>sound</code></td><td><code>decibel</code></td>
<tr><td><code>electrical resistance</code><br/><code>impedance</code></td><td><code>ohm</code></td>
<tr><td><code>force</code><br/><code>weight</code></td><td><code>newton</code></td>
<tr><td><code>light quantity</code></td><td><code>lumen seconds</code></td>
<tr><td><code>length</code></td><td><code>meter</code><br/><code>millimeter</code><br/><code>centimeter</code><br/><code>decimeter</code><br/><code>kilometer</code><br/><code>astronomical unit</code><br/><code>light year</code><br/><code>parsec</code><br/><code>inch</code><br/><code>foot</code><br/><code>yard</code><br/><code>mile</code><br/><code>nautical mile</code></td>
<tr><td><code>refractive index</code></td><td><code>diopter</code></td>
<tr><td><code>frequency</code></td><td><code>hertz</code><br/><code>angular frequency</code></td>
<tr><td><code>power</code></td><td><code>kilowatt</code><br/><code>horsepower</code><br/><code>bar</code></td>
<tr><td><code>magnetic flux</code></td><td><code>weber</code></td>
<tr><td><code>current</code></td><td><code>ampere</code></td>
<tr><td><code>acceleration of gravity</code></td><td><code>gravity imperial</code><br/><code>gravity metric</code></td>
<tr><td><code>volume</code></td><td><code>cubic meter</code><br/><code>liter</code><br/><code>milliliter</code><br/><code>centiliter</code><br/><code>deciliter</code><br/><code>hectoliter</code><br/><code>cubic inch</code><br/><code>cubic foot</code><br/><code>cubic yard</code><br/><code>acre-foot</code><br/><code>teaspoon</code><br/><code>tablespoon</code><br/><code>fluid ounce</code><br/><code>cup</code><br/><code>gill</code><br/><code>pint</code><br/><code>quart</code><br/><code>gallon</code></td>
<tr><td><code>speed</code></td><td><code>miles per hour</code><br/><code>meters per second</code></td>
<tr><td><code>illuminance</code></td><td><code>lux</code></td>
</tbody>
</table>
<br/>
<span id="nlpcraft:city" class="section-sub-title">Token ID <code>nlpcraft:city</code></span>
<p>
This token denotes a city.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:city:city</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Name of the city.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:city:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Continent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:city:subcontinent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Subcontinent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:city:countrymeta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Supplemental metadata for the city's country (see below).
</td>
</tr>
<tr>
<td><code><b>nlpcraft:city:citymeta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Supplemental metadata for the city (see below).
</td>
</tr>
</tbody>
</table>
<p>
The following table lists the possible keys of the <code><b>nlpcraft:city:countrymeta</b></code> map. The data is
obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Key</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>iso</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>iso3</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO 3166 country code.</td>
</tr>
<tr>
<td><code><b>isocode</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>capital</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country capital city name.</td>
</tr>
<tr>
<td><code><b>area</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Double</code></td>
<td>Optional country surface area.</td>
</tr>
<tr>
<td><code><b>population</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Long</code></td>
<td>Optional country population.</td>
</tr>
<tr>
<td><code><b>continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Optional country continent.</td>
</tr>
<tr>
<td><code><b>currencycode</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency code.</td>
</tr>
<tr>
<td><code><b>currencyname</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency name.</td>
</tr>
<tr>
<td><code><b>phone</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country phone code.</td>
</tr>
<tr>
<td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code format.</td>
</tr>
<tr>
<td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code regular expression.</td>
</tr>
<tr>
<td><code><b>languages</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional list of the country's languages.</td>
</tr>
<tr>
<td><code><b>neighbours</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional list of the country's neighbours.</td>
</tr>
</tbody>
</table>
<p>
The following table lists the possible keys of the <code><b>nlpcraft:city:citymeta</b></code> map. The data is
obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Key</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>latitude</b></code></td>
<td><code>java.lang.Double</code></td>
<td>City latitude.</td>
</tr>
<tr>
<td><code><b>longitude</b></code></td>
<td><code>java.lang.Double</code></td>
<td>City longitude.</td>
</tr>
<tr>
<td><code><b>population</b></code></td>
<td><code>java.lang.Long</code></td>
<td>City population.</td>
</tr>
<tr>
<td><code><b>elevation</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Integer</code></td>
<td>Optional city elevation in meters.</td>
</tr>
<tr>
<td><code><b>timezone</b></code></td>
<td><code>java.lang.String</code></td>
<td>City timezone.</td>
</tr>
</tbody>
</table>
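<p>
Because <code>nlpcraft:city:citymeta</code> is itself a map, its values need a second lookup, and keys marked
<sub>opt.</sub> may be missing. A minimal sketch, again assuming the token metadata is a plain
<code>java.util.Map&lt;String, Object&gt;</code> and using a hypothetical <code>CityMeta</code> class:
</p>

```java
import java.util.Map;
import java.util.Optional;

// Sketch: reads the nested nlpcraft:city:citymeta map described above.
// Optional keys (marked "opt.") may be absent, so they are returned as Optional.
final class CityMeta {
    private CityMeta() {}

    @SuppressWarnings("unchecked")
    private static Map<String, Object> cityMeta(Map<String, Object> tokenMeta) {
        return (Map<String, Object>) tokenMeta.get("nlpcraft:city:citymeta");
    }

    /** Returns {latitude, longitude}. */
    static double[] latLon(Map<String, Object> tokenMeta) {
        Map<String, Object> m = cityMeta(tokenMeta);
        return new double[] { (Double) m.get("latitude"), (Double) m.get("longitude") };
    }

    /** Elevation in meters, or empty if the optional key is missing. */
    static Optional<Integer> elevationMeters(Map<String, Object> tokenMeta) {
        return Optional.ofNullable((Integer) cityMeta(tokenMeta).get("elevation"));
    }

    /** Short human-readable summary built from mandatory keys only. */
    static String describe(Map<String, Object> tokenMeta) {
        Map<String, Object> m = cityMeta(tokenMeta);
        String name = (String) tokenMeta.get("nlpcraft:city:city");
        return name + " (" + m.get("timezone") + ", population " + m.get("population") + ")";
    }
}
```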
<br/>
<span id="nlpcraft:continent" class="section-sub-title">Token ID <code>nlpcraft:continent</code></span>
<p>
This token denotes a continent.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:continent:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Name of the continent.</td>
</tr>
</tbody>
</table>
<br/>
<span id="nlpcraft:subcontinent" class="section-sub-title">Token ID <code>nlpcraft:subcontinent</code></span>
<p>
This token denotes a subcontinent.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:subcontinent:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Name of the continent.</td>
</tr>
<tr>
<td><code><b>nlpcraft:subcontinent:subcontinent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Name of the subcontinent.</td>
</tr>
</tbody>
</table>
<br/>
<span id="nlpcraft:metro" class="section-sub-title">Token ID <code>nlpcraft:metro</code></span>
<p>
This token denotes a metro area.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:metro:metro</b></code></td>
<td><code>java.lang.String</code></td>
<td>Name of the metro area.</td>
</tr>
</tbody>
</table>
<br/>
<span id="nlpcraft:region" class="section-sub-title">Token ID <code>nlpcraft:region</code></span>
<p>
This token denotes a geographical region.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:region:region</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Name of the region.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:region:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Continent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:region:subcontinent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Subcontinent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:region:countrymeta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Supplemental metadata for the region's country (see below).
</td>
</tr>
</tbody>
</table>
<p>
The following table lists the possible keys of the <code><b>nlpcraft:region:countrymeta</b></code> map. The data is
obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Key</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>iso</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>iso3</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO 3166 country code.</td>
</tr>
<tr>
<td><code><b>isocode</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>capital</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country capital city name.</td>
</tr>
<tr>
<td><code><b>area</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Double</code></td>
<td>Optional country surface area.</td>
</tr>
<tr>
<td><code><b>population</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Long</code></td>
<td>Optional country population.</td>
</tr>
<tr>
<td><code><b>continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Optional country continent.</td>
</tr>
<tr>
<td><code><b>currencycode</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency code.</td>
</tr>
<tr>
<td><code><b>currencyname</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency name.</td>
</tr>
<tr>
<td><code><b>phone</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country phone code.</td>
</tr>
<tr>
<td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code format.</td>
</tr>
<tr>
<td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code regular expression.</td>
</tr>
<tr>
<td><code><b>languages</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional list of the country's languages.</td>
</tr>
<tr>
<td><code><b>neighbours</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional list of the country's neighbours.</td>
</tr>
</tbody>
</table>
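<p>
Because <code><b>nlpcraft:region:countrymeta</b></code> is a nested map that mixes mandatory and optional keys,
callers should not assume every key is present. The following minimal sketch assumes the token's metadata is
available as a <code>Map&lt;String, Object&gt;</code>; the class and method names are hypothetical and not part of the NLPCraft API.
</p>

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for reading the nested country map of a nlpcraft:region token.
// ASSUMPTION: token metadata is available as a Map<String, Object>.
class RegionCountryMeta {
    static final String COUNTRY_META = "nlpcraft:region:countrymeta";

    // Returns the nested country map, or an empty map if the property is absent.
    @SuppressWarnings("unchecked")
    static Map<String, Object> countryMeta(Map<String, Object> tokMeta) {
        Object m = tokMeta.get(COUNTRY_META);
        return m instanceof Map ? (Map<String, Object>) m : Map.<String, Object>of();
    }

    // Mandatory keys, such as "iso", can be read directly.
    static String iso(Map<String, Object> tokMeta) {
        return (String) countryMeta(tokMeta).get("iso");
    }

    // Optional keys, such as "capital", may be missing, so they are wrapped in Optional.
    static Optional<String> capital(Map<String, Object> tokMeta) {
        return Optional.ofNullable((String) countryMeta(tokMeta).get("capital"));
    }
}
```

<p>
Wrapping only the optional keys in <code>Optional</code> keeps the distinction between mandatory and optional
keys visible at the call site.
</p>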
<br/>
<span id="nlpcraft:country" class="section-sub-title">Token ID <code>nlpcraft:country</code></span>
<p>
This token denotes a country.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:country:country</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Name of the country.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:country:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Continent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:country:subcontinent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Subcontinent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:country:countrymeta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Supplemental metadata for the country (see below).
</td>
</tr>
</tbody>
</table>
<p>
The following table lists the possible keys of the <code><b>nlpcraft:country:countrymeta</b></code> map. The data is
obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Key</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>iso</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>iso3</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO 3166 country code.</td>
</tr>
<tr>
<td><code><b>isocode</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>capital</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country capital city name.</td>
</tr>
<tr>
<td><code><b>area</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Double</code></td>
<td>Optional country surface area.</td>
</tr>
<tr>
<td><code><b>population</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Long</code></td>
<td>Optional country population.</td>
</tr>
<tr>
<td><code><b>continent</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country continent.</td>
</tr>
<tr>
<td><code><b>currencycode</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency code.</td>
</tr>
<tr>
<td><code><b>currencyname</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency name.</td>
</tr>
<tr>
<td><code><b>phone</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country phone code.</td>
</tr>
<tr>
<td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code format.</td>
</tr>
<tr>
<td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code regular expression.</td>
</tr>
<tr>
<td><code><b>languages</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional list of the country's languages.</td>
</tr>
<tr>
<td><code><b>neighbours</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional list of the country's neighbours.</td>
</tr>
</tbody>
</table>
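<p>
The <code>languages</code> and <code>neighbours</code> keys hold a list encoded in a single
<code>java.lang.String</code>. The sketch below splits such a value into a Java list. It <b>assumes</b> the
entries are comma-separated (as in GeoNames-style country data), which you should verify against real tokens;
<code>CountryLists</code> and <code>listValue</code> are hypothetical names.
</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: turns an optional list-like string of nlpcraft:country:countrymeta into a List.
// ASSUMPTION: entries are comma-separated; token metadata is a Map<String, Object>.
class CountryLists {
    @SuppressWarnings("unchecked")
    static List<String> listValue(Map<String, Object> tokMeta, String key) {
        Object cm = tokMeta.get("nlpcraft:country:countrymeta");
        if (!(cm instanceof Map)) return List.of();

        Object v = ((Map<String, Object>) cm).get(key);
        // Optional keys may be missing or empty: return an empty list in that case.
        if (!(v instanceof String) || ((String) v).trim().isEmpty()) return List.of();

        return Arrays.stream(((String) v).split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    }
}
```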
<br/>
<span id="nlpcraft:coordinate" class="section-sub-title">Token ID <code>nlpcraft:coordinate</code></span>
<p>
This token denotes a latitude and longitude coordinate.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:coordinate:latitude</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Coordinate latitude.</td>
</tr>
<tr>
<td><code><b>nlpcraft:coordinate:longitude</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Coordinate longitude.</td>
</tr>
</tbody>
</table>
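<p>
A model that consumes coordinates usually wants both values together and within valid geographic ranges.
The following is a minimal sketch, assuming token metadata is available as a <code>Map&lt;String, Object&gt;</code>.
The property keys are passed in by the caller; <code>CoordinateReader</code> and <code>LatLon</code> are hypothetical names.
</p>

```java
import java.util.Map;
import java.util.Optional;

// Sketch: reads a validated latitude/longitude pair from token metadata.
// The property keys are parameters, so nothing here hard-codes key names.
class CoordinateReader {
    // Hypothetical value holder for a coordinate.
    static final class LatLon {
        final double lat;
        final double lon;

        LatLon(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    static Optional<LatLon> read(Map<String, Object> meta, String latKey, String lonKey) {
        Object lat = meta.get(latKey);
        Object lon = meta.get(lonKey);
        // Both values are documented as java.lang.Double; anything else is treated as missing.
        if (!(lat instanceof Double) || !(lon instanceof Double)) return Optional.empty();

        double la = (Double) lat;
        double lo = (Double) lon;
        // Reject values outside the valid geographic ranges.
        if (la < -90 || la > 90 || lo < -180 || lo > 180) return Optional.empty();

        return Optional.of(new LatLon(la, lo));
    }
}
```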
<br/>
<span id="nlpcraft:sort" class="section-sub-title">Token ID <code>nlpcraft:sort</code></span>
<p>
This token denotes a sorting or ordering function.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:sort:subjindexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>One or more indexes of the target tokens (i.e. the tokens being sorted).</td>
</tr>
<tr>
<td><code><b>nlpcraft:sort:byindexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>Zero or more (i.e. optional) indexes of the reference tokens (i.e. the tokens being sorted by).</td>
</tr>
<tr>
<td><code><b>nlpcraft:sort:asc</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
<code>true</code> if sorting is in ascending order, <code>false</code> if it is in descending order.
</td>
</tr>
</tbody>
</table>
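<p>
The sort metadata can be turned into something directly usable: the index lists resolve to the subject and
"sort by" tokens, and the <code><b>nlpcraft:sort:asc</b></code> flag selects the comparator direction. This sketch
<b>assumes</b> that the indexes are positions in the sentence's token list (modelled here as
<code>List&lt;String&gt;</code>) and that metadata is a <code>Map&lt;String, Object&gt;</code>; <code>SortMeta</code> is a hypothetical name.
</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch: interprets nlpcraft:sort metadata.
// ASSUMPTION: index values are positions in the sentence's token list.
class SortMeta {
    // Resolves an index-list property (e.g. nlpcraft:sort:subjindexes) into tokens.
    // A missing property (byindexes is optional) yields an empty list.
    @SuppressWarnings("unchecked")
    static List<String> resolve(List<String> sentence, Map<String, Object> meta, String key) {
        List<String> out = new ArrayList<>();
        Object v = meta.get(key);
        if (v instanceof List)
            for (Integer i : (List<Integer>) v)
                out.add(sentence.get(i));
        return out;
    }

    // Applies the nlpcraft:sort:asc flag to a natural-order comparator.
    static <T extends Comparable<T>> Comparator<T> comparator(Map<String, Object> meta) {
        Comparator<T> cmp = Comparator.naturalOrder();
        return Boolean.TRUE.equals(meta.get("nlpcraft:sort:asc")) ? cmp : cmp.reversed();
    }
}
```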
<br/>
<span id="nlpcraft:limit" class="section-sub-title">Token ID <code>nlpcraft:limit</code></span>
<p>
This token denotes a numeric limit value (like in "top 10" or "bottom five").
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:limit:indexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>Index (always only one) of the reference token (i.e. the token being limited).</td>
</tr>
<tr>
<td><code><b>nlpcraft:limit:asc</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
<code>true</code> if the limit order is ascending, <code>false</code> if it is descending.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:limit:limit</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Numeric value of the limit.
</td>
</tr>
</tbody>
</table>
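<p>
One way to apply a limit token to a collection of values is to sort by the requested order and keep the first
<code><b>nlpcraft:limit:limit</b></code> elements. This is a sketch only: it <b>assumes</b> that
<code>asc == true</code> keeps the N smallest values and <code>asc == false</code> keeps the N largest, and that
metadata is a <code>Map&lt;String, Object&gt;</code>; <code>LimitMeta</code> is a hypothetical name.
</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch: applies nlpcraft:limit:limit and nlpcraft:limit:asc to a list of values.
// ASSUMPTION: asc == true keeps the N smallest values, asc == false the N largest.
class LimitMeta {
    static <T extends Comparable<T>> List<T> apply(List<T> values, Map<String, Object> meta) {
        int n = Math.max(0, (Integer) meta.get("nlpcraft:limit:limit"));
        boolean asc = Boolean.TRUE.equals(meta.get("nlpcraft:limit:asc"));

        Comparator<T> cmp = asc ? Comparator.<T>naturalOrder() : Comparator.<T>reverseOrder();
        List<T> sorted = new ArrayList<>(values);
        sorted.sort(cmp);

        // Copy the prefix so the result does not share state with the sorted list.
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```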
<br/>
<span id="nlpcraft:relation" class="section-sub-title">Token ID <code>nlpcraft:relation</code></span>
<p>
This token denotes a relation (a comparison or a correlation) with another token.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:relation:indexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>Index (always only one) of the reference token (i.e. the token being related to).</td>
</tr>
<tr>
<td><code><b>nlpcraft:relation:type</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Type of the relation. One of the following values:
<ul>
<li><code>compare</code></li>
<li><code>correlate</code></li>
</ul>
</td>
</tr>
</tbody>
</table>
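<p>
Since <code><b>nlpcraft:relation:type</b></code> takes one of a fixed set of values and
<code><b>nlpcraft:relation:indexes</b></code> always holds exactly one index, both can be checked strictly. The
sketch below assumes metadata is a <code>Map&lt;String, Object&gt;</code>; <code>RelationMeta</code> and its
<code>Kind</code> enum are hypothetical names.
</p>

```java
import java.util.List;
import java.util.Map;

// Sketch: strict interpretation of nlpcraft:relation metadata.
class RelationMeta {
    enum Kind { COMPARE, CORRELATE }

    // Maps the documented string values onto an enum; anything else is rejected.
    static Kind kind(Map<String, Object> meta) {
        Object t = meta.get("nlpcraft:relation:type");
        if ("compare".equals(t)) return Kind.COMPARE;
        if ("correlate".equals(t)) return Kind.CORRELATE;
        throw new IllegalArgumentException("Unexpected relation type: " + t);
    }

    // The index list is documented to contain exactly one element.
    @SuppressWarnings("unchecked")
    static int referenceIndex(Map<String, Object> meta) {
        List<Integer> idxs = (List<Integer>) meta.get("nlpcraft:relation:indexes");
        if (idxs == null || idxs.size() != 1)
            throw new IllegalStateException("Expected exactly one index: " + idxs);
        return idxs.get(0);
    }
}
```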
<br/>
<span id="google:xxx" class="section-sub-title">Token ID <code>google:xxx</code></span>
<p>
These tokens denote named entities detected by
<a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, where <code>xxx</code> is the lower case
name of the entity type, e.g. <code>google:person</code>, <code>google:location</code>, etc.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>google:salience</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Salience score of the entity in the [0, 1.0] range as reported by Google Natural Language, i.e. the importance or centrality of the entity to the entire text.</td>
</tr>
<tr>
<td><code><b>google:meta</b></code></td>
<td><code>java.util.Map&lt;String, String&gt;</code></td>
<td>
Map-based container for Google Natural Language specific properties.
</td>
</tr>
<tr>
<td><code><b>google:mentionsbeginoffsets</b></code></td>
<td><code>java.util.List&lt;String&gt;</code></td>
<td>
List of the mention begin offsets in the original normalized text.
</td>
</tr>
<tr>
<td><code><b>google:mentionscontents</b></code></td>
<td><code>java.util.List&lt;String&gt;</code></td>
<td>
List of the mentions.
</td>
</tr>
<tr>
<td><code><b>google:mentionstypes</b></code></td>
<td><code>java.util.List&lt;String&gt;</code></td>
<td>
List of the mention types.
</td>
</tr>
</tbody>
</table>
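<p>
The mention properties are separate lists. The sketch below pairs each mention's text with its type,
<b>assuming</b> that <code><b>google:mentionscontents</b></code> and <code><b>google:mentionstypes</b></code> are
parallel lists (same order, same length) and that metadata is a <code>Map&lt;String, Object&gt;</code>;
<code>GoogleMentions</code> is a hypothetical name.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: pairs google:mentionscontents with google:mentionstypes by position.
// ASSUMPTION: the two lists are parallel.
class GoogleMentions {
    @SuppressWarnings("unchecked")
    static List<String> describe(Map<String, Object> meta) {
        List<String> contents = (List<String>) meta.getOrDefault("google:mentionscontents", List.of());
        List<String> types = (List<String>) meta.getOrDefault("google:mentionstypes", List.of());

        List<String> out = new ArrayList<>();
        // Stop at the shorter list in case the lists are not the same length.
        for (int i = 0; i < Math.min(contents.size(), types.size()); i++)
            out.add(contents.get(i) + " (" + types.get(i) + ")");
        return out;
    }
}
```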
<br/>
<span id="stanford:xxx" class="section-sub-title">Token ID <code>stanford:xxx</code></span>
<p>
These tokens denote named entities detected by
<a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, where <code>xxx</code> is the lower case
name of the entity type, e.g. <code>stanford:person</code>, <code>stanford:location</code>, etc.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>stanford:confidence</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Probability, as estimated by Stanford CoreNLP, that this token is correct.</td>
</tr>
<tr>
<td><code><b>stanford:nne</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Normalized Named Entity (NNE) text.
</td>
</tr>
</tbody>
</table>
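<p>
A common use of <code><b>stanford:confidence</b></code> is to discard low-confidence entities before intent
matching. In this minimal sketch each token's metadata is modelled as a <code>Map&lt;String, Object&gt;</code>,
the threshold is supplied by the caller (no particular value is recommended), and <code>StanfordFilter</code> is a hypothetical name.
</p>

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: keeps only tokens whose stanford:confidence is at least the given threshold.
class StanfordFilter {
    static List<Map<String, Object>> confident(List<Map<String, Object>> toks, double min) {
        return toks.stream()
            // Tokens without a Double confidence value are dropped.
            .filter(m -> m.get("stanford:confidence") instanceof Double
                && (Double) m.get("stanford:confidence") >= min)
            .collect(Collectors.toList());
    }
}
```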
<br/>
<span id="spacy:xxx" class="section-sub-title">Token ID <code>spacy:xxx</code></span>
<p>
These tokens denote named entities detected by
<a target=_ href="https://spacy.io/">spaCy</a>, where <code>xxx</code> is the lower case
name of the entity type, e.g. <code>spacy:person</code>, <code>spacy:location</code>, etc.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>spacy:vector</b></code></td>
<td><code>java.lang.Double</code></td>
<td>spaCy span vector. </td>
</tr>
<tr>
<td><code><b>spacy:sentiment</b></code></td>
<td><code>java.lang.Double</code></td>
<td>
A scalar value indicating the positivity or negativity of the token.
</td>
</tr>
</tbody>
</table>
<br/>
<span id="opennlp:xxx" class="section-sub-title">Token ID <code>opennlp:xxx</code></span>
<p>
These tokens denote named entities detected by
<a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, where <code>xxx</code> is the lower case
name of the entity type, e.g. <code>opennlp:person</code>, <code>opennlp:money</code>, etc.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>opennlp:probability</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Probability, as estimated by OpenNLP, that this token is correct.</td>
</tr>
</tbody>
</table>
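<p>
When a model has to choose among several candidate <code>opennlp:xxx</code> tokens, one option is to keep the one
with the highest <code><b>opennlp:probability</b></code>. The sketch below is illustrative only: it assumes each
token's metadata is a <code>Map&lt;String, Object&gt;</code>, and <code>OpenNlpBest</code> is a hypothetical name.
</p>

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch: returns the candidate with the highest opennlp:probability, if any.
class OpenNlpBest {
    static Optional<Map<String, Object>> best(List<Map<String, Object>> toks) {
        return toks.stream()
            // Ignore candidates without a Double probability value.
            .filter(m -> m.get("opennlp:probability") instanceof Double)
            .max(Comparator.comparingDouble(m -> (Double) m.get("opennlp:probability")));
    }
}
```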
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Model Overview</a></li>
<li><a href="#dataflow">Model Dataflow</a></li>
<li><a href="#lifecycle">Model Lifecycle</a></li>
<li><a href="#config">Model Configuration</a></li>
<li><a href="#elements">Model Elements</a></li>
<li><a href="#dsl">Token DSL</a></li>
<li><a href="#logic">Model Logic</a></li>
<li><a href="#builtin">Built-In Tokens</a></li>
<li><a href="#meta">Token Metadata</a></li>
{% include quick-links.html %}
</ul>
</div>