---
active_crumb: Data Model
layout: documentation
id: data_model
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column">
<section id="overview">
<h2 class="section-title">Model Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
A data model is a central concept in NLPCraft. It defines the natural language interface to your data sources,
such as a database or a SaaS application.
NLPCraft employs a <em>model-as-a-code</em> approach where the entire data model is an implementation of the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface, which
can be developed in any JVM programming language such as Java, Scala, Kotlin, or Groovy.
</p>
<p>
A data model defines:
</p>
<ul>
<li>Set of model <a href="#elements">elements</a> (a.k.a. named entities) to be detected in the user input.</li>
<li>Zero or more intents and their callbacks.</li>
<li>Common model configuration and various life-cycle callbacks.</li>
</ul>
<p>
Note that the model-as-a-code approach works natively with any software lifecycle
tools and frameworks, such as build tools, CI/SCM tools, and IDEs.
You don't have to use additional web-based tools to manage some aspects of your
data models - your entire model and all of its components are part of your project source code.
</p>
<p>
Here are two quick examples of fully functional data model implementations (from the <a href="/examples/light_switch.html">Light Switch</a> and
<a href="/examples/alarm_clock.html">Alarm Clock</a> examples). You will find specific details about these
implementations in the following sections:
</p>
<nav>
<div class="nav nav-tabs" role="tablist">
<a class="nav-item nav-link active" data-toggle="tab" href="#lightswitch" role="tab"><b>LightSwitch <code><sub>ex</sub></code></b></a>
<a class="nav-item nav-link" data-toggle="tab" href="#alarm" role="tab"><b>Alarm <code><sub>ex</sub></code></b></a>
</div>
</nav>
<div class="tab-content">
<div class="tab-pane fade show active" id="lightswitch" role="tabpanel">
<nav>
<div class="nav nav-tabs" role="tablist">
<a class="nav-item nav-link active" data-toggle="tab" href="#lightswitch_scala_model" role="tab"><code>LightSwitchModel.scala</code></a>
<a class="nav-item nav-link" data-toggle="tab" href="#lightswitch_yaml_model" role="tab"><code>lightswitch_model.yaml</code></a>
</div>
</nav>
<div class="tab-content">
<div class="tab-pane fade show active" id="lightswitch_scala_model" role="tabpanel">
<pre class="brush: scala">
package org.apache.nlpcraft.examples.lightswitch
import org.apache.nlpcraft.model.{NCIntentTerm, _}
class LightSwitchModel extends NCModelFileAdapter("lightswitch_model.yaml") {
@NCIntentRef("ls")
@NCIntentSample(Array(
"Turn the lights off in the entire house.",
"Switch on the illumination in the master bedroom closet.",
"Get the lights on.",
"Lights up in the kitchen.",
"Please, put the light out in the upstairs bedroom.",
"Set the lights on in the entire house.",
"Turn the lights off in the guest bedroom.",
"Could you please switch off all the lights?",
"Dial off illumination on the 2nd floor.",
"Please, no lights!",
"Kill off all the lights now!",
"No lights in the bedroom, please.",
"Light up the garage, please!"
))
def onMatch(
@NCIntentTerm("act") actTok: NCToken,
@NCIntentTerm("loc") locToks: List[NCToken]
): NCResult = {
val status = if (actTok.getId == "ls:on") "on" else "off"
val locations =
if (locToks.isEmpty)
"entire house"
else
locToks.map(_.meta[String]("nlpcraft:nlp:origtext")).mkString(", ")
// Add HomeKit, Arduino or other integration here.
// By default - return a descriptive action string.
NCResult.text(s"Lights are [$status] in [${locations.toLowerCase}].")
}
}
</pre>
</div>
<div class="tab-pane fade show" id="lightswitch_yaml_model" role="tabpanel">
<pre class="brush: js">
id: "nlpcraft.lightswitch.ex"
name: "Light Switch Example Model"
version: "1.0"
description: "NLI-powered light switch example model."
macros:
- name: "&lt;ACTION&gt;"
macro: "{turn|switch|dial|let|set|get|put}"
- name: "&lt;KILL&gt;"
macro: "{shut|kill|stop|eliminate}"
- name: "&lt;ENTIRE_OPT&gt;"
macro: "{entire|full|whole|total|_}"
- name: "&lt;FLOOR_OPT&gt;"
macro: "{upstairs|downstairs|{1st|first|2nd|second|3rd|third|4th|fourth|5th|fifth|top|ground} floor|_}"
- name: "&lt;TYPE&gt;"
macro: "{room|closet|attic|loft|{store|storage} {room|_}}"
- name: "&lt;LIGHT&gt;"
macro: "{all|_} {it|them|light|illumination|lamp|lamplight}"
enabledBuiltInTokens: [] # This example doesn't use any built-in tokens.
#
# Allows for multi-word synonyms in this entire model
# to be sparse and permutate them for better detection.
# These two properties generally enable a free-form
# natural language comprehension.
#
permutateSynonyms: true
sparse: true
elements:
- id: "ls:loc"
description: "Location of lights."
synonyms:
- "&lt;ENTIRE_OPT&gt; &lt;FLOOR_OPT&gt; {kitchen|library|closet|garage|office|playroom|{dining|laundry|play} &lt;TYPE&gt;}"
- "&lt;ENTIRE_OPT&gt; &lt;FLOOR_OPT&gt; {master|kid|children|child|guest|_} {bedroom|bathroom|washroom|storage} {&lt;TYPE&gt;|_}"
- "&lt;ENTIRE_OPT&gt; {house|home|building|{1st|first} floor|{2nd|second} floor}"
- id: "ls:on"
groups:
- "act"
description: "Light switch ON action."
synonyms:
- "&lt;ACTION&gt; {on|up|_} &lt;LIGHT&gt; {on|up|_}"
- "&lt;LIGHT&gt; {on|up}"
- id: "ls:off"
groups:
- "act"
description: "Light switch OFF action."
synonyms:
- "&lt;ACTION&gt; &lt;LIGHT&gt; {off|out|down}"
- "{&lt;ACTION&gt;|&lt;KILL&gt;} {off|out|down} &lt;LIGHT&gt;"
- "&lt;KILL&gt; &lt;LIGHT&gt;"
- "&lt;LIGHT&gt; &lt;KILL&gt;"
- "{out|no|off|down} &lt;LIGHT&gt;"
- "&lt;LIGHT&gt; {out|off|down}"
intents:
- "intent=ls term(act)={has(tok_groups, 'act')} term(loc)={# == 'ls:loc'}*"
</pre>
</div>
</div>
</div>
<div class="tab-pane fade show" id="alarm" role="tabpanel">
<nav>
<div class="nav nav-tabs" role="tablist">
<a class="nav-item nav-link active" data-toggle="tab" href="#alarm_java_model" role="tab"><code>AlarmModel.java</code></a>
<a class="nav-item nav-link" data-toggle="tab" href="#alarm_intents_idl" role="tab"><code>intents.idl</code></a>
<a class="nav-item nav-link" data-toggle="tab" href="#alarm_json_model" role="tab"><code>alarm_model.json</code></a>
</div>
</nav>
<div class="tab-content">
<div class="tab-pane fade show active" id="alarm_java_model" role="tabpanel">
<pre class="brush: java">
package org.apache.nlpcraft.examples.alarm;
import org.apache.nlpcraft.model.*;
import java.time.*;
import java.time.format.DateTimeFormatter;
import java.util.*;
import static java.time.temporal.ChronoUnit.MILLIS;
public class AlarmModel extends NCModelFileAdapter {
private static final DateTimeFormatter FMT =
DateTimeFormatter.ofPattern("HH'h' mm'm' ss's'").withZone(ZoneId.systemDefault());
private final Timer timer = new Timer();
public AlarmModel() {
// Loading the model from the file.
super("alarm_model.json");
}
@NCIntentRef("alarm") // Intent is defined in JSON model file (alarm_model.json and intents.idl).
@NCIntentSampleRef("alarm_samples.txt") // Samples supplied in an external file.
NCResult onMatch(
NCIntentMatch ctx,
@NCIntentTerm("nums") List&lt;NCToken&gt; numToks
) {
long ms = calculateTime(numToks);
assert ms >= 0;
timer.schedule(
new TimerTask() {
@Override
public void run() {
System.out.println(
"BEEP BEEP BEEP for: " + ctx.getContext().getRequest().getNormalizedText()
);
}
},
ms
);
return NCResult.text("Timer set for: " + FMT.format(LocalDateTime.now().plus(ms, MILLIS)));
}
@Override
public void onDiscard() {
// Clean up when model gets discarded (e.g. during testing).
timer.cancel();
}
public static long calculateTime(List&lt;NCToken&gt; numToks) {
LocalDateTime now = LocalDateTime.now();
LocalDateTime dt = now;
for (NCToken num : numToks) {
String unit = num.meta("nlpcraft:num:unit");
// Skip possible fractional to simplify.
long v = ((Double)num.meta("nlpcraft:num:from")).longValue();
if (v <= 0)
throw new NCRejection("Value must be positive: " + unit);
switch (unit) {
case "second": { dt = dt.plusSeconds(v); break; }
case "minute": { dt = dt.plusMinutes(v); break; }
case "hour": { dt = dt.plusHours(v); break; }
case "day": { dt = dt.plusDays(v); break; }
case "week": { dt = dt.plusWeeks(v); break; }
case "month": { dt = dt.plusMonths(v); break; }
case "year": { dt = dt.plusYears(v); break; }
default:
// It shouldn't be an assertion, because 'datetime' unit can be extended outside.
throw new NCRejection("Unsupported time unit: " + unit);
}
}
return now.until(dt, MILLIS);
}
}
</pre>
</div>
<div class="tab-pane fade show" id="alarm_intents_idl" role="tabpanel">
<pre class="brush: idl">
// Fragments (mostly for demo purposes here).
fragment=buzz term~{# == 'x:alarm'}
fragment=when
term(nums)~{
// Demonstrating term variables.
@type = meta_tok('nlpcraft:num:unittype')
@iseq = meta_tok('nlpcraft:num:isequalcondition') // Excludes conditional statements.
# == 'nlpcraft:num' && @type == 'datetime' && @iseq == true
}[1,7]
// Intents (using fragments).
intent=alarm
fragment(buzz)
fragment(when)
</pre>
</div>
<div class="tab-pane fade show" id="alarm_json_model" role="tabpanel">
<pre class="brush: js">
{
"id": "nlpcraft.alarm.ex",
"name": "Alarm Example Model",
"version": "1.0",
"description": "Alarm example model.",
"enabledBuiltInTokens": [
"nlpcraft:num"
],
"elements": [
{
"id": "x:alarm",
"description": "Alarm token indicator.",
"synonyms": [
"{ping|buzz|wake|call|hit} {me|up|me up|_}",
"{set|_} {my|_} {wake|wake up|_} {alarm|timer|clock|buzzer|call} {clock|_} {up|_}"
]
}
],
"intents": [
"import('intents.idl')" // Import intents from external file.
]
}
</pre>
</div>
</div>
</div>
</div>
<p>
The following sections provide details on the model's static configuration and its dynamic programmable
logic implementation.
</p>
</section>
<section id="dataflow">
<h2 class="section-title">Model Dataflow <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<figure>
<img alt="data model dataflow" class="img-fluid" src="/images/homepage-fig1.1.png">
<figcaption><b>Fig 1.</b> NLPCraft Architecture</figcaption>
</figure>
<p>
Let's review the general dataflow of a user request in NLPCraft (from right to left).
A user request starts with the user application (like a chatbot or NLI-based system) making a
REST call using the <a href="/using-rest.html">NLPCraft REST API</a>. That REST call carries, among
other things, the input text and the data model ID, and it arrives first at the REST server.
</p>
<p>
Upon receiving the user request, the REST server performs NLP pre-processing, converting the input
text into a sequence of tokens and enriching them with additional information.
Once finished, the sequence of tokens is sent further down to the data probe where the requested data model
is deployed.
</p>
<p>
Upon receiving that sequence of tokens, the data probe further
enriches it based on the user data model and <a href="/intent-matching.html">matches</a> it against declared intents. When a matching
intent is found, its callback method is called and its result travels back from the data probe to the
REST server and eventually to the user that made the REST call.
</p>
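<p>
As a rough illustration of the first step, the client side of such a call could be sketched as follows.
This is a minimal sketch, not a definitive client: the <code>http://localhost:8081/api/v1</code> base URL,
the <code>/ask/sync</code> path and the <code>acsTok</code>, <code>txt</code> and <code>mdlId</code> field names
are assumptions that should be checked against the <a href="/using-rest.html">REST API</a> documentation, and the
<code>AskRequestSketch</code> class is hypothetical. The request is only built here, not sent:
</p>
<pre class="brush: java">
import java.net.URI;
import java.net.http.HttpRequest;

public class AskRequestSketch {
    // Builds (but does not send) a POST request carrying the input text and the model ID.
    // The endpoint path and JSON field names are assumptions, see the text above.
    static HttpRequest buildAskRequest(String baseUrl, String acsTok, String txt, String mdlId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ask/sync"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody(acsTok, txt, mdlId)))
            .build();
    }

    // Naive JSON assembly: assumes the values contain no characters that need JSON escaping.
    static String jsonBody(String acsTok, String txt, String mdlId) {
        return "{\"acsTok\": \"" + acsTok + "\", \"txt\": \"" + txt + "\", \"mdlId\": \"" + mdlId + "\"}";
    }

    public static void main(String[] args) {
        HttpRequest req = buildAskRequest(
            "http://localhost:8081/api/v1",             // Assumed REST server address.
            "ACCESS_TOKEN",                             // Placeholder access token.
            "Turn the lights off in the entire house.", // Input text.
            "nlpcraft.lightswitch.ex"                   // Model ID from the Light Switch example.
        );

        System.out.println(req.method() + " " + req.uri());
    }
}
</pre>
<p>
The model ID in the request is what lets the REST server route the pre-processed tokens to the data probe
hosting that model.
</p>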
<div class="bq info">
<p>
<b>Security <span class="amp">&</span> Isolation</b>
</p>
<p>
Note that in this architecture the user-defined data model is fully isolated from the REST server accepting
user calls. Users never access data probes, and hence data models, directly. Typically, the REST server
should be deployed in a DMZ and only <em>ingress connectivity is required</em> from the REST server to data probes.
</p>
</div>
</section>
<section id="lifecycle">
<h2 class="section-title">Model Lifecycle <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
A data model is an implementation of the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface.
The <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface has
defaults for most of its methods. These are the only methods that must be implemented by its sub-class:
</p>
<ul>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId()">getId()</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName()">getName()</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion()">getVersion()</a></li>
</ul>
<p>
You can either implement <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a>
interface directly or use one of the adapters (recommended in most cases):
</p>
<ul>
<li>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelAdapter.html">NCModelAdapter</a> - when
the entire model definition is in the sub-class source code.
</li>
<li>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> - when
using an external JSON/YAML declaration for the model definition.
</li>
</ul>
<p>
Note that you can also use 3rd party IoC frameworks like <a target=_ href="https://spring.io">Spring</a> to construct your data models. See
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFactory.html">NCModelFactory</a> for more information.
</p>
<div class="bq success">
<div class="bq-idea-container">
<div><div>💡</div></div>
<div>
<p>
<b>Using Adapters</b>
</p>
<p>
It is recommended to use one of the adapter classes when defining your
own data model in most use cases.
</p>
</div>
</div>
</div>
<h2 id="deployment" class="section-sub-title">Deployment <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Data models get <a href="/server-and-probe.html">deployed</a> to and hosted by data probes. A data probe is a lightweight
container whose job is to host data models and securely transfer requests between the REST server and the data
models. When a data probe starts, it reads its <a href="/server-and-probe.html">configuration</a>
to see which models to deploy.
</p>
<p>
Note that data probes don't support hot-redeployment. To redeploy a data model you need to restart
the data probe. Note also that a data probe can be started in <a href="/tools/embedded_probe.html">embedded mode</a>, i.e. it can be started
from within an existing JVM process such as the user application.
</p>
<h2 id="callbacks" class="section-sub-title">Callbacks <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
There are two lifecycle callbacks on the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface
(by way of extending the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html">NCLifecycle</a> interface) that you can override to affect the default lifecycle behavior:
</p>
<ul>
<li>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onInit()">onInit()</a> - called
right after the model has been loaded and deployed.
</li>
<li>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onDiscard()">onDiscard()</a> - called to
discard the data model when, and only when, the data probe is shutting down in an orderly fashion.
</li>
</ul>
<p>
There are also several callbacks that you can override to affect model behavior during
<a href="/intent-matching.html#model_callbacks">intent matching</a>,
for example to perform logging, debugging, statistics or usage collection, explicit update or initialization of
the conversation context, security audit or validation:
</p>
<ul>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onParsedVariant(org.apache.nlpcraft.model.NCVariant)">onParsedVariant(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext(org.apache.nlpcraft.model.NCContext)">onContext(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent(org.apache.nlpcraft.model.NCIntentMatch)">onMatchedIntent(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onResult(org.apache.nlpcraft.model.NCIntentMatch,org.apache.nlpcraft.model.NCResult)">onResult(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onError(org.apache.nlpcraft.model.NCContext,java.lang.Throwable)">onError(...)</a>
</li>
<li>
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onRejection(org.apache.nlpcraft.model.NCIntentMatch,org.apache.nlpcraft.model.NCRejection)">onRejection(...)</a>
</li>
</ul>
<div class="bq info">
<b>Conversation Reset</b>
<p>
Callbacks
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext(org.apache.nlpcraft.model.NCContext)">onContext(...)</a> and
<a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent(org.apache.nlpcraft.model.NCIntentMatch)">onMatchedIntent(...)</a>
are especially handy for performing a soft reset of the conversation context. Read their Javadoc documentation
to understand the protocol of these callbacks.
</p>
</div>
<div class="bq info">
<b>Lifecycle Components</b>
<p>
Note that both the server and the probe provide their own support for lifecycle components. When registered in
the probe or server configuration, the lifecycle components will be called
during various stages of the probe or server startup or shutdown procedures. These callbacks can be used
to control the lifecycle of external libraries and systems that the data probe or the server relies on, e.g.
<a href="metrics-and-tracing.html">OpenCensus exporters</a>, security environment, devops hooks, etc.
</p>
<p>
See server and probe <a href="/server-and-probe.html">configuration</a>.
</p>
</div>
</section>
<section id="config">
<h2 class="section-title">Model Configuration <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Apart from the mandatory model <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId()">ID</a>,
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName()">name</a> and
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion()">version</a>,
there are a number of static model configuration properties that you can set. All of these properties have sensible
defaults that you can override, when required, either in sub-classes or via an external JSON/YAML declaration:
</p>
<ul>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getAdditionalStopWords()">getAdditionalStopWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">getEnabledBuiltInTokens</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getExcludedStopWords()">getExcludedStopWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxFreeWords()">getMaxFreeWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxSuspiciousWords()">getMaxSuspiciousWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTokens()">getMaxTokens</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTotalSynonyms()">getMaxTotalSynonyms</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxUnknownWords()">getMaxUnknownWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxWords()">getMaxWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMetadata()">getMetadata</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinNonStopwords()">getMinNonStopwords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinTokens()">getMinTokens</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinWords()">getMinWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getSuspiciousWords()">getSuspiciousWords</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isDupSynonymsAllowed()">isDupSynonymsAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed()">isNonEnglishAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoNounsAllowed()">isNoNounsAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNotLatinCharsetAllowed()">isNotLatinCharsetAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoUserTokensAllowed()">isNoUserTokensAllowed</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isPermutateSynonyms()">isPermutateSynonyms</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSparse()">isSparse</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed()">isSwearWordsAllowed</a></li>
</ul>
<h2 class="section-sub-title">External JSON/YAML Declaration <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
You can move all the static model configuration out into an external JSON or YAML file. To load that
configuration you need to use the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a>
adapter when creating your data model. Here are JSON and YAML sample templates. You can find more details in the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> Javadoc and in the
<a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">examples</a>.
</p>
<nav>
<div class="nav nav-tabs" role="tablist">
<a class="nav-item nav-link active" data-toggle="tab" href="#model-json" role="tab">JSON</a>
<a class="nav-item nav-link" data-toggle="tab" href="#model-yaml" role="tab">YAML</a>
</div>
</nav>
<div class="tab-content">
<div class="tab-pane fade show active" id="model-json" role="tabpanel">
<pre class="brush: js">
{
"id": "user.defined.id",
"name": "User Defined Name",
"version": "1.0",
"description": "Short model description.",
"enabledBuiltInTokens": ["google:person", "google:location"],
"macros": [],
"metadata": {},
"elements": [
{
"id": "x:id",
"description": "",
"groups": [],
"parentId": "",
"synonyms": [],
"metadata": {},
"values": []
}
],
...
"intents": []
}
</pre>
</div>
<div class="tab-pane fade show" id="model-yaml" role="tabpanel">
<pre class="brush: js">
id: "user.defined.id"
name: "User Defined Name"
version: "1.0"
description: "Short model description."
macros:
enabledBuiltInTokens:
elements:
- id: "x:id"
description: ""
synonyms:
groups:
values:
parentId:
metadata:
...
intents:
</pre>
</div>
</div>
<div class="bq success">
<div class="bq-idea-container">
<div><div>💡</div></div>
<div>
Note that using JSON/YAML-based configuration is the <b>canonical way</b> of
creating data models in NLPCraft, as it cleanly separates static configuration from the model's
programmable logic.
</div>
</div>
</div>
</section>
<section id="ne">
<h2 class="section-title">Named Entities <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
A named entity, also known as a model element or a token, is one of the main components defined by the NLPCraft data model.
A named entity is one or more individual words that have a consistent semantic meaning and typically denote a
real-world object, such as a person, location, number, date and time, organization, or product. Such an
object can be abstract or have a physical existence.
</p>
<p>
For example, in the following sentence:
</p>
<figure>
<img alt="named entities" class="img-fluid" src="/images/named-entities.png">
<figcaption><b>Fig 2.</b> Named Entities</figcaption>
</figure>
<p>
the following named entities can be detected:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Words</th>
<th>Type</th>
<th>Normalized Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><b>Top 20</b></td>
<td><code>nlpcraft:limit</code></td>
<td>top 20</td>
</tr>
<tr>
<td><b>best pages</b></td>
<td><code>user:element</code></td>
<td>best pages</td>
</tr>
<tr>
<td><b>California USA</b></td>
<td><code>nlpcraft:geo</code></td>
<td>USA, California</td>
</tr>
<tr>
<td><b>last 3 months</b></td>
<td><code>nlpcraft:date</code></td>
<td>1/1/2021 - 4/1/2021</td>
</tr>
</tbody>
</table>
<p>
In most cases a named entity will have an associated <em>normalized value</em>. This is especially important for named entities that have many
notational forms such as time and date, currency, geographical locations, etc. For example, <code>New York</code>,
<code>New York City</code> and <code>NYC</code> all refer to the same "New York City, NY USA" location, which is the standard normalized form.
</p>
<p>
The process of detecting named entities is called Named Entity Recognition (NER). A named entity can be detected in many ways: through a list of synonyms, by name, with rules, or with
statistical techniques like neural networks trained on a large corpus of predefined data. NLPCraft natively supports synonym-based
named entity definitions as well as the ability to compose new named entities through the powerful <a href="/intent-matching.html">Intent Definition Language</a> (IDL),
combining other named entities including named entities from
<a href="/integrations.html">external projects</a> such as OpenNLP, spaCy or Stanford CoreNLP.
</p>
<p>
Named entities allow you to abstract from basic linguistic forms like nouns and verbs and to deal with higher level semantic
abstractions like geographical location or time when you are trying to understand the meaning of a sentence.
One of the main goals of named entities is to act as input for <a href="/intent-matching.html">intent matching</a>.
</p>
<div class="bq info">
<p>
<b>😀 User Input → Named Entities → Parsing Variants → Intent Matcher → Winning Intent 🚀</b>
</p>
<p>
User input is parsed into a list of named entities. That list is then further transformed into one or more
parsing variants where each variant represents a particular order and combination of detected named entities.
Finally, the list of variants acts as the input to intent matching, where each variant is matched against every intent
in the process of detecting the best matching intent for the original user input.
</p>
</div>
</section>
<section id="elements">
<h2 class="section-title">Model Elements <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
A data model element defines a named entity that will be detected in the user input.
A model element is an implementation of the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a>
interface. <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> provides
its elements via the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getElements()">getElements()</a> method.
Typically, you create model elements by either:
</p>
<ul>
<li>
Implementing <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> interface directly, or
</li>
<li>
Using JSON or YAML static model configuration (the preferred way in most cases).
</li>
</ul>
<p>
Note that when you use an external static model configuration in JSON or YAML you can still modify it after it has been loaded
using the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a>
adapter. This is particularly convenient when synonyms or values are loaded separately from, or in
addition to, the model elements themselves, e.g. from a database or another file.
</p>
<div class="bq info">
<p>
<b>Model Element <span class="amp">&</span> Named Entity <span class="amp">&</span> Token</b>
</p>
<p>
Terms 'model element', 'named entity' and 'token' are used throughout this documentation relatively interchangeably:
</p>
<dl>
<dt>Model Element</dt>
<dd>
Denotes a named entity <em>declared</em> in NLPCraft model.
</dd>
<dt>Token</dt>
<dd>
Denotes a model element that was <em>detected</em> by NLPCraft in the user input.
</dd>
<dt>Named Entity</dt>
<dd>
Denotes a classic term, i.e. one or more individual words that have a
consistent semantic meaning and typically define a real-world object.
</dd>
</dl>
</div>
<p>
Although model element and named entity describe a similar concept, NLPCraft model
elements provide a much more powerful instrument. Unlike named entity support in other projects,
NLPCraft model elements have a number of unique capabilities:
</p>
<ul>
<li>
New model elements can be added declaratively via a subset of NLPCraft <a href="/intent-matching.html">IDL</a>, regex and macro expansion.
</li>
<li>
New model elements can be also added programmatically for ultimate flexibility.
</li>
<li>
Model elements can have many-to-many group memberships.
</li>
<li>
Model elements can form a hierarchical structure.
</li>
<li>
Model elements are composable, i.e. a model element can use other model elements in its definition.
</li>
<li>
Model elements can be declared with user defined metadata.
</li>
<li>
Model elements provide normalized values and can define their own "proper nouns".
</li>
<li>
Model elements can compose named entities from many <a href="integrations.html#nlp">3rd party libraries</a>.
</li>
<li>
All properties of model elements (id, groups, parent & ancestors, values, and metadata) can be used in NLPCraft <a href="/intent-matching.html">IDL</a>.
</li>
</ul>
<h2 class="section-title">User vs. Built-In Elements <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
In addition to the model elements that are defined by the user in the data model (i.e. <em>user model elements</em>),
NLPCraft provides its own <a href="#builtin">built-in named entities</a> as well as integration with a number of <a href="integrations.html#nlp">3rd party projects</a>. You can think of these built-in elements as if they were implicitly defined in your model - you
can use them in exactly the same way as if you had defined them yourself.
You can find more information on how to configure external token providers
in the <a href="/integrations.html#nlp">Integrations</a> section.
</p>
<p>
Note that you can't directly change the group membership, parent-child relationship or metadata of the
built-in elements. You can, however, "wrap" a built-in entity in your own element by using the <code>^^{tok_id() == 'external.id'}^^</code>
<a href="/intent-matching.html">IDL</a> expression as its synonym. On that wrapping element you can then define all the necessary additional
configuration properties (more on that below).
</p>
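<p>
A wrapping element of this kind might look like the following fragment. This is only a sketch: the
<code>my:num</code> ID, the <code>my_group</code> group and the metadata entry are hypothetical names chosen
for illustration, and it assumes the built-in <code>nlpcraft:num</code> token is enabled in the model via
<code>enabledBuiltInTokens</code>, as it is in the Alarm example:
</p>
<pre class="brush: js">
"elements": [
    {
        "id": "my:num",
        "description": "User element wrapping the built-in numeric token.",
        "groups": ["my_group"],
        "metadata": {
            "source": "built-in"
        },
        "synonyms": [
            "^^{tok_id() == 'nlpcraft:num'}^^"
        ]
    }
]
</pre>
<p>
The group and metadata now belong to the user element <code>my:num</code>, so they can be changed freely
without touching the built-in element itself.
</p>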
<span id="synonyms" class="section-sub-title">Synonyms <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
NLPCraft uses fully deterministic named entity recognition and is not based on statistical approaches that
would require pre-existing marked up data sets and extensive training. For each model element you can either provide a
set of synonyms to match on or specify a piece of code that would be responsible for detecting that named
entity (discussed below). A synonym can consist of one or more individual words. Note that an element's ID is its
implicit synonym, so even if no additional synonyms are defined, at least one synonym always exists. Note
also that synonym matching is performed on <em>normalized</em> and <em>stemmatized</em> forms of both
a synonym and user input.
</p>
<p>
Here's an example of a simple model element definition in JSON:
</p>
<pre class="brush: js, highlight: [6,7,8,9,10,11,12]">
...
"elements": [
{
"id": "transport.vehicle",
"description": "Transportation vehicle",
"synonyms": [
"car",
"truck",
"light duty truck",
"heavy duty truck",
"sedan",
"coupe"
]
}
]
...
</pre>
<p>
Although adding multi-word synonyms looks trivial, in real models the naive approach can lead to
thousands or even tens of thousands of possible synonyms due to word, grammar, and linguistic
permutations, which quickly becomes untenable if done manually.
</p>
<p>
NLPCraft provides an effective tool for a compact synonyms representation. Instead of listing all possible
multi-word synonyms one by one, you can use a combination of the following techniques:
</p>
<ul>
<li><a href="#macros">Macros</a></li>
<li><a href="#regex">Regular expressions</a></li>
<li><a href="#option-groups">Option Groups</a></li>
<li><a href="#dsl">IDL expressions</a></li>
<li><a href="#programmable_ners">Programmable NERs</a></li>
</ul>
<p>
Each whitespace-separated string in the synonym can be either a regular word (as in the transportation example above,
where it is matched using its normalized and stemmatized form) or one of the above expressions.
</p>
<p>
Note that this synonyms definition is also used in the following
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> methods:
</p>
<ul>
<li><code>getSynonyms()</code> - gets synonyms to match on.</li>
<li><code>getValues()</code> - gets values to match on (see <a href="#values">below</a>).</li>
</ul>
<span id="values" class="section-sub-title">Element Values <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
A model element can have an optional set of special synonyms called <em>values</em> or "proper nouns" for this element.
Unlike basic synonyms, each value is a pair of a name and a set of standard synonyms by which that value,
and ultimately its element, can be recognized in the user input. Note that the value name itself acts as an
implicit synonym even when no additional synonyms are added for that value.
</p>
<p>
When a model element is recognized it is made available to the model's matching logic as an instance of
the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> interface.
This interface has a method
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getValue()">getValue()</a> which
returns the name of the value, if any, by which
that model element was recognized. That value name can be further used in intent matching.
</p>
<p>
To understand the importance of the values consider the following changes to our transportation
example model:
</p>
<pre class="brush: js, highlight: [19,20,21,22,23,24,25,26,27,28,29,30]">
...
"macros": [
{
"name": "&lt;TRUCK_TYPE&gt;",
"macro": "{light duty|heavy duty|half ton|1/2 ton|3/4 ton|one ton|super duty}"
}
],
"elements": [
{
"id": "transport.vehicle",
"description": "Transportation vehicle",
"synonyms": [
"car",
"{&lt;TRUCK_TYPE&gt;|_} {pickup|_} truck"
"sedan",
"coupe"
],
"values": [
{
"value": "mercedes",
"synonyms": ["mercedes-ben{z|s}", "mb", "ben{z|s}"]
},
{
"value": "bmw",
"synonyms": ["{bimmer|bimer|beemer}", "bayerische motoren werke"]
},
{
"value": "chevrolet",
"synonyms": ["chevy"]
}
]
}
]
...
</pre>
<p>
With that setup, the <code>transport.vehicle</code> element will be recognized by any of the following input strings:
</p>
<ul>
<li><code>car</code></li>
<li><code>benz</code> (with value <code>mercedes</code>)</li>
<li><code>3/4 ton pickup truck</code></li>
<li><code>light duty truck</code></li>
<li><code>chevy</code> (with value <code>chevrolet</code>)</li>
<li><code>bimmer</code> (with value <code>bmw</code>)</li>
<li><code>transport.vehicle</code></li>
</ul>
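<p>
The relationship between synonyms and value names can be pictured as a simple lookup table. The sketch below is
illustrative only: <code>ValueLookup</code> and its methods are hypothetical names, the synonyms are assumed to be
already expanded, and NLPCraft's actual matching (normalization, stemming, option groups) is considerably richer.
</p>

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Illustrative only: a plain lookup table, not NLPCraft's matching engine.
final class ValueLookup {
    // Maps each already-expanded synonym to the value name it stands for.
    private final Map<String, String> synonymToValue = new HashMap<>();

    // Registers a value; the value name itself acts as an implicit synonym.
    void addValue(String valueName, String... synonyms) {
        synonymToValue.put(valueName, valueName);
        for (String syn : synonyms)
            synonymToValue.put(syn, valueName);
    }

    // Returns the value name for the given input, if any value matches.
    Optional<String> valueOf(String input) {
        String key = input.trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(synonymToValue.get(key));
    }

    public static void main(String[] args) {
        ValueLookup lookup = new ValueLookup();
        lookup.addValue("mercedes", "mercedes-benz", "mb", "benz");
        lookup.addValue("chevrolet", "chevy");
        System.out.println(lookup.valueOf("chevy"));
        System.out.println(lookup.valueOf("car"));
    }
}
```

<p>
In this sketch an input such as <code>car</code> yields no value at all, which mirrors the case where an element is
recognized by a basic synonym rather than by one of its values.
</p>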
<span id="groups" class="section-sub-title">Element Groups <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Each model element always belongs to one or more groups. A model element provides its groups via the
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html#getGroups()">getGroups()</a> method.
By default, if an element's group is not specified, the element ID acts as its default group ID.
Group membership is a quick and easy way to organise similar model elements together and use this
categorization in <a href="/intent-matching.html">IDL</a> intents.
</p>
<p>
Note that the proper grouping of the elements is also necessary for the correct operation of
Short-Term-Memory (STM) in the conversational context. Consider a
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> that
represents a previously found model element that is stored in the conversation. Such a token
will be overridden in the conversation by a more <b>recent token</b>
from the <b>same group</b>, a critical rule for maintaining the proper conversational context.
See
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a>
for more details.
</p>
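<p>
The override rule can be sketched as a map keyed by group ID. This is a simplified illustration under stated
assumptions: <code>StmSketch</code>, <code>remember()</code> and <code>recall()</code> are hypothetical names, each
token is assumed to belong to exactly one group, and the real conversation management in NLPCraft involves more than
this (see the <code>NCConversation</code> Javadoc).
</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models the "most recent token of a group wins" rule.
// StmSketch and its methods are hypothetical, not part of the NLPCraft API.
final class StmSketch {
    // Group ID -> ID of the most recent token remembered for that group.
    private final Map<String, String> latestByGroup = new LinkedHashMap<>();

    // Stores a token; an older token of the same group is overridden.
    void remember(String tokenId, String groupId) {
        latestByGroup.put(groupId, tokenId);
    }

    // Returns the remembered token ID for the group, or null if none.
    String recall(String groupId) {
        return latestByGroup.get(groupId);
    }

    public static void main(String[] args) {
        StmSketch stm = new StmSketch();
        stm.remember("transport.vehicle", "vehicle");
        stm.remember("nlpcraft:city", "geo");
        stm.remember("race.vehicle", "vehicle"); // Same group: overrides.
        System.out.println(stm.recall("vehicle") + ", " + stm.recall("geo"));
    }
}
```

<p>
Tokens from unrelated groups do not disturb each other, which is why careless grouping (for example, placing
unrelated elements in one group) would cause context to be lost unexpectedly.
</p>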
<span id="parent" class="section-sub-title">Element Parent <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Each model element can form an optional hierarchical relationship with another element by specifying its
parent element ID via the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html#getParentId()">getParentId()</a> method.
The main idea here is that sometimes model elements can act not only individually but
their place in the hierarchy can be important too.
</p>
<p>
For example, we could have designed our transportation example model in a different way by using
multiple model elements linked with this hierarchy:
</p>
<pre>
+-- vehicle
|   +-- truck
|   |   |-- light.duty.truck
|   |   |-- heavy.duty.truck
|   |   +-- medium.duty.truck
|   +-- car
|   |   |-- coupe
|   |   |-- sedan
|   |   |-- hatchback
|   |   +-- wagon
</pre>
<p>
Then in our intent, for example, we could look for any token with root parent ID <code>vehicle</code>
or immediate parent ID <code>truck</code> or <code>car</code> without a need to match on all current and
future individual sub-IDs. For example:
</p>
<pre class="brush: idl">
intent=vehicle.intent term~{has(tok_ancestors, 'vehicle')}
intent=truck.intent term~{tok_parent == 'truck'}
intent=car.intent term~{tok_parent == 'car'}
</pre>
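<p>
To make the ancestor check concrete, here is a minimal sketch of how a list of ancestors can be derived from
parent links alone. <code>HierarchySketch</code> is a hypothetical helper written for this page, not NLPCraft code;
it only shows why declaring a parent ID per element is enough to answer "is <code>vehicle</code> among the ancestors?".
</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: derives ancestors from per-element parent IDs.
final class HierarchySketch {
    // Element ID -> parent element ID (null for a root element).
    private final Map<String, String> parentOf = new HashMap<>();

    void addElement(String id, String parentId) {
        parentOf.put(id, parentId);
    }

    // Walks parent links upward: immediate parent first, root last.
    List<String> ancestors(String id) {
        List<String> out = new ArrayList<>();
        String cur = parentOf.get(id);
        // The 'contains' check guards against accidental cycles.
        while (cur != null && !out.contains(cur)) {
            out.add(cur);
            cur = parentOf.get(cur);
        }
        return out;
    }

    public static void main(String[] args) {
        HierarchySketch h = new HierarchySketch();
        h.addElement("vehicle", null);
        h.addElement("truck", "vehicle");
        h.addElement("light.duty.truck", "truck");
        System.out.println(h.ancestors("light.duty.truck").contains("vehicle"));
    }
}
```

<p>
Adding a new leaf element later (say, another kind of truck) requires only its parent ID; intents that test the
ancestors or the parent keep working unchanged.
</p>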
</section>
<section id="syns-tools">
<span id="macros" class="section-sub-title">Macros <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Listing all possible multi-word synonyms for a given element can be a time-consuming task. Macros
together with option groups allow for significant simplification of this task.
Macros allow you to give a name to an often used set of words or option groups and reuse it without
repeating those words or option groups again and again. A model provides a list of macros via
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMacros()">getMacros()</a> method.
Each macro has a name in the form of <code>&lt;X&gt;</code>, where <code>X</code>
is any string, and a string value. Note that macros can be nested (but not recursive), i.e. a macro value can include
references to other macros. When the macro name <code>&lt;X&gt;</code> is encountered in a synonym, it gets recursively
replaced with its value.
</p>
<p>
Here's a snippet of macro definitions in JSON:
</p>
<pre class="brush: js">
"macros": [
{
"name": "&lt;A&gt;",
"macro": "aaa"
},
{
"name": "&lt;B&gt;",
"macro": "&lt;A&gt; bbb"
},
{
"name": "&lt;C&gt;",
"macro": "&lt;A&gt; bbb {z|w}"
}
]
</pre>
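<p>
Nested macro substitution can be sketched as repeated textual replacement. The code below is a simplified
illustration, not NLPCraft's macro compiler: <code>MacroSketch</code> is a hypothetical class, and the pass limit used
to reject recursive macros is an assumption of this sketch.
</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: expands nested (non-recursive) macros by substitution.
final class MacroSketch {
    // Macro name, including angle brackets, -> macro value.
    private final Map<String, String> macros = new LinkedHashMap<>();

    void define(String name, String value) {
        macros.put(name, value);
    }

    // Substitutes macro names until the text stops changing.
    String expand(String synonym) {
        String cur = synonym;
        // Nesting depth cannot exceed the number of macros unless a macro
        // is recursive, so size() substitution passes plus one check suffice.
        for (int pass = 0; pass <= macros.size(); pass++) {
            String next = cur;
            for (Map.Entry<String, String> e : macros.entrySet())
                next = next.replace(e.getKey(), e.getValue());
            if (next.equals(cur))
                return cur;
            cur = next;
        }
        throw new IllegalArgumentException("Recursive macro in: " + synonym);
    }

    public static void main(String[] args) {
        MacroSketch m = new MacroSketch();
        m.define("<A>", "aaa");
        m.define("<B>", "<A> bbb");
        System.out.println(m.expand("<B> c"));
    }
}
```

<p>
Note that expansion only replaces macro names; any option groups contained in a macro value are left in place for a
later expansion step.
</p>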
<span id="option-groups" class="section-sub-title">Option Groups <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Option groups are similar to wildcard patterns that operate on a single-word basis. One line of
an option group expands into one or more individual synonyms. Option groups are the key mechanism for shortened
synonym notation. The following examples demonstrate how to use option groups.
</p>
<p>
Consider the macros defined below (note that macros <code>&lt;B&gt;</code> and <code>&lt;C&gt;</code>
are nested):
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>&lt;A&gt;</code></td>
<td><code>aaa</code></td>
</tr>
<tr>
<td><code>&lt;B&gt;</code></td>
<td><code>&lt;A&gt; bbb</code></td>
</tr>
<tr>
<td><code>&lt;C&gt;</code></td>
<td><code>&lt;A&gt; bbb {z|w}</code></td>
</tr>
</tbody>
</table>
<p>
Then the following option group expansions will occur in these examples:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Synonym</th>
<th>Synonym Expansions</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>&lt;A&gt; {b|_} c</code></td>
<td>
<code>"aaa b c"</code><br>
<code>"aaa c"</code>
</td>
</tr>
<tr>
<td><code>&lt;A&gt; {b|a}[1,2] c</code></td>
<td>
<code>"aaa b c"</code><br>
<code>"aaa b b c"</code><br>
<code>"aaa a c"</code><br>
<code>"aaa a a c"</code><br>
<code>"aaa c"</code>
</td>
</tr>
<tr>
<td>
<code>&lt;B&gt; {b|_} c</code><br>
or<br>
<code>&lt;B&gt; {b}[0,1] c</code>
</td>
<td>
<code>"aaa bbb b c"</code><br>
<code>"aaa bbb c"</code>
</td>
</tr>
<tr>
<td><code>{b|\{\_\}}</code></td>
<td>
<code>"b"</code><br>
<code>"b {_}"</code>
</td>
</tr>
<tr>
<td><code>a {b|_}. c</code></td>
<td>
<code>"a b. c"</code><br>
<code>"a . c"</code>
</td>
</tr>
<tr>
<td><code>a .{b, |_}. c</code></td>
<td>
<code>"a .b, . c"</code><br>
<code>"a .. c"</code>
</td>
</tr>
<tr>
<td><code>
{% raw %}a {{b|c}|_}.{% endraw %}</code></td>
<td>
<code>"a ."</code><br>
<code>"a b."</code><br>
<code>"a c."</code>
</td>
</tr>
<tr>
<td><code>a {% raw %}{{{&lt;C&gt;}}|{_}}{% endraw %} c</code></td>
<td>
<code>"a aaa bbb z c"</code><br>
<code>"a aaa bbb w c"</code><br>
<code>"a c"</code>
</td>
</tr>
<tr>
<td><code>{% raw %}{{{a}}} {b||_|{{_}}||_}{% endraw %}</code></td>
<td>
<code>"a b"</code><br>
<code>"a"</code>
</td>
</tr>
</tbody>
</table>
<p>
Specifically:
</p>
<ul>
<li><code>{A|B}</code> denotes either <code>A</code> or <code>B</code>.</li>
<li>
<code>{A|B|_}</code> denotes either <code>A</code> or <code>B</code> or nothing.
<ul>
<li>Symbol <code>_</code> can appear anywhere in the list of options, i.e. <code>{A|B|_}</code> is equal to <code>{A|_|B}</code>.</li>
</ul>
</li>
<li>
<code>{C}[x,y]</code> denotes an option group with quantifier, i.e. group <code>C</code> appearing from <code>x</code> to <code>y</code> times inclusive.
<ul>
<li>For example, <code>{C}[1,3]</code> is the same as <code>{C|C C|C C C}</code> notation.</li>
<li>Note that <code>{C|_}</code> is equal to <code>{C}[0,1]</code></li>
</ul>
</li>
<li>Excessive curly brackets are ignored, when safe to do so.</li>
<li>Macros cannot be recursive but can be nested.</li>
<li>Option groups can be nested.</li>
<li>
<code>'\'</code> (backslash) can be used to escape <code>'{'</code>, <code>'}'</code>, <code>'|'</code> and
<code>'_'</code> special symbols used by the option groups.
</li>
<li>Extra whitespace is trimmed when expanding option groups.</li>
</ul>
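<p>
The core of these rules (alternatives, the empty option <code>_</code>, nesting and whitespace trimming) can be
sketched with a small recursive expander. This is an illustration only and not NLPCraft's implementation:
<code>OptionGroupSketch</code> is a hypothetical class, and it deliberately leaves out quantifiers such as
<code>[x,y]</code>, backslash escapes, and macros.
</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: expands alternatives, '_' and nested groups.
// Quantifiers, escapes and macros are intentionally not handled.
final class OptionGroupSketch {
    private final String src;
    private int pos = 0;

    private OptionGroupSketch(String src) {
        this.src = src;
    }

    // Returns all distinct expansions with extra whitespace trimmed.
    static List<String> expand(String synonym) {
        OptionGroupSketch p = new OptionGroupSketch(synonym);
        List<String> raw = p.sequence();
        if (p.pos < synonym.length())
            throw new IllegalArgumentException("Unexpected '|' or '}' in: " + synonym);
        Set<String> out = new LinkedHashSet<>();
        for (String s : raw)
            out.add(s.trim().replaceAll("\\s+", " "));
        return new ArrayList<>(out);
    }

    // Parses items until '|', '}' or end of input; combines them pairwise.
    private List<String> sequence() {
        List<String> acc = new ArrayList<>();
        acc.add("");
        while (pos < src.length() && src.charAt(pos) != '|' && src.charAt(pos) != '}') {
            List<String> part = src.charAt(pos) == '{' ? group() : literal();
            List<String> next = new ArrayList<>();
            for (String a : acc)
                for (String b : part)
                    next.add(a + b);
            acc = next;
        }
        return acc;
    }

    // Parses one group; a lone '_' alternative expands to nothing.
    private List<String> group() {
        pos++; // Skip the opening brace.
        List<String> alts = new ArrayList<>();
        while (true) {
            for (String s : sequence())
                alts.add(s.trim().equals("_") ? "" : s);
            if (pos >= src.length())
                throw new IllegalArgumentException("Unbalanced brace in: " + src);
            if (src.charAt(pos++) == '}')
                return alts;
            // Otherwise the character was '|': parse the next alternative.
        }
    }

    // Reads plain text up to the next special character.
    private List<String> literal() {
        int start = pos;
        while (pos < src.length() && "{|}".indexOf(src.charAt(pos)) < 0)
            pos++;
        return Collections.singletonList(src.substring(start, pos));
    }

    public static void main(String[] args) {
        for (String s : expand("{light|heavy} duty {pickup|_} truck"))
            System.out.println(s);
    }
}
```

<p>
Each group multiplies the number of generated synonyms by its number of options, which is why a single compact line
can stand in for dozens of manually written synonyms.
</p>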
<p>
We can rewrite our transportation model element in a more efficient way using macros and option groups.
Even though the actual length of the definition hasn't changed much, it now auto-generates many dozens of synonyms
that we would otherwise have to write out manually:
</p>
<pre class="brush: js, highlight: [4,5,14]">
...
"macros": [
{
"name": "&lt;TRUCK_TYPE&gt;",
"macro": "{ {light|super|heavy|medium} duty|half ton|1/2 ton|3/4 ton|one ton}"
}
],
"elements": [
{
"id": "transport.vehicle",
"description": "Transportation vehicle",
"synonyms": [
"car",
"{&lt;TRUCK_TYPE&gt;|_} {pickup|_} truck"
"sedan",
"coupe"
]
}
]
...
</pre>
<span id="regex" class="section-sub-title">Regular Expressions <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Any individual synonym word that starts and ends with <code>//</code> (two forward slashes) is
considered to be a Java regular expression as defined in <code>java.util.regex.Pattern</code>. Note that
a regular expression can only span a single word, i.e. only individual words from the user input will be
matched against the given regular expression, and no whitespace is allowed within the regular expression. Note
also that the option group special symbols <code>{</code>, <code>}</code>,
<code>|</code> and <code>_</code> have to be escaped in the regular expression using <code>\</code>
(backslash).
</p>
<p>
For example, the following synonym:
</p>
<pre class="brush: js">
"synonyms": [
"{foo|//bar.+//}"
]
</pre>
<p>
will match the word <code>foo</code> or any other string that starts with <code>bar</code>, as long as
that string doesn't contain whitespace.
</p>
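<p>
The single-word nature of regular expression synonyms can be illustrated with plain
<code>java.util.regex.Pattern</code>. The sketch below is not NLPCraft code: <code>RegexSynonymSketch</code> is a
hypothetical helper, and requiring the whole word to match (via <code>Matcher.matches()</code>) is an assumption of
this example.
</p>

```java
import java.util.regex.Pattern;

// Illustrative only: matches one input word against one synonym word.
final class RegexSynonymSketch {
    // True if the synonym word is written as //...// (a regex synonym).
    static boolean isRegex(String synonymWord) {
        return synonymWord.length() > 4
            && synonymWord.startsWith("//")
            && synonymWord.endsWith("//");
    }

    // Regex synonyms must match the whole word; other words compare as-is.
    static boolean matches(String synonymWord, String inputWord) {
        if (isRegex(synonymWord)) {
            String body = synonymWord.substring(2, synonymWord.length() - 2);
            return Pattern.compile(body).matcher(inputWord).matches();
        }
        return synonymWord.equals(inputWord);
    }

    public static void main(String[] args) {
        System.out.println(matches("//bar.+//", "barbell"));
        System.out.println(matches("foo", "foo"));
    }
}
```

<p>
In a real model the pattern would typically be compiled once and reused, since recompiling it for every input word
adds avoidable overhead.
</p>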
<div class="bq info">
<b>Regular Expressions Performance</b>
<p>
It's important to note that regular expressions can significantly affect the performance of
NLPCraft processing if used without restraint. Use them with caution and test the performance
of your model to ensure it meets your requirements.
</p>
</div>
<h2 id="dsl" class="section-sub-title">IDL Expressions <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Any individual synonym word that starts and ends with <code>^^</code> is an
<a href="/intent-matching.html#idl">IDL expression.</a> An IDL
expression inside the <code>^^ ... ^^</code> markers allows you to define a predicate on an already parsed and detected token.
It is very important to note that, unlike all other synonyms, the IDL expression operates on an
already detected <em>token</em>, not on an individual unparsed <em>word</em>.
</p>
<p>
IDL expressions allow you to <em>compose</em> named entities, i.e. use one named entity when defining another one. For example,
we could define a model element for the race car using our previous transportation example (note how synonym on
<b>line 18</b>
references the element defined on <b>line 4</b>):
</p>
<pre class="brush: js, highlight: [4, 18]">
...
"elements": [
{
"id": "transport.vehicle",
"description": "Transportation vehicle",
"synonyms": [
"car",
"truck",
"{light|heavy|super|medium} duty {pickup|_} truck",
"sedan",
"coupe"
]
},
{
"id": "race.vehicle",
"description": "Race vehicle",
"synonyms": [
"{race|speed|track} ^^{# == 'transport.vehicle'}^^"
]
}
]
...
</pre>
<div class="bq warn">
<p>
<b>Greedy NERs <span class="amp">&</span> Synonyms Conflicts</b>
</p>
<p>
Note that in the above example you need to ensure that words <code>race</code>,
<code>speed</code> or <code>track</code> are not part of the <code>transport.vehicle</code>
token. It is particularly important for 3rd party NERs where specific rules about what
words can or cannot be part of the token are unclear or undefined. In such cases the only remedy is
to extensively test with 3rd party NERs and verify the synonyms recognition in data probe logs.
</p>
</div>
<p>
Another use case is to wrap 3rd party named entities to add group membership, metadata or hierarchical
relationship to the externally defined named entity. For example, you can wrap <code>google:location</code>
token and add group membership for <code>my_group</code> group:
</p>
<pre class="brush: js, highlight: [6,8]">
...
"elements": [
{
"id": "google.loc.wrap",
"description": "Wrapper for google location",
"groups": ["my_group"],
"synonyms": [
"^^{# == 'google:location'}^^"
]
}
]
...
</pre>
<b>IDL Expression Syntax</b>
<p>
IDL expressions are a subset of the overall <a href="/intent-matching.html#idl">IDL syntax</a>. You can
review the formal
<a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft/src/main/scala/org/apache/nlpcraft/model/intent/compiler/antlr4/NCIdl.g4">ANTLR4 grammar</a>,
but basically
an IDL expression for a synonym is a term expression with an optional alias at the beginning.
Here's an example of an IDL expression defining a synonym for the population of any city in France:
</p>
<pre class="brush: js">
"synonyms": [
"population {of|for} ^^[city]{# == 'nlpcraft:city' && lowercase(meta_tok('city:country')) == 'france'}^^"
]
</pre>
<b>NOTES:</b>
<ul>
<li>Optional alias <code>city</code> can be used to access a constituent part token (with ID <code>nlpcraft:city</code>).</li>
<li>
The expression between <code>{</code> and <code>}</code> brackets is a standard IDL term expression.
</li>
</ul>
<h2 id="programmable_ners" class="section-sub-title">Programmable NERs <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
By default, the data model detects its elements by their synonyms, regular expressions, or IDL expressions. However, in some cases
these methods are either not expressive enough or cannot be used, for example, when detecting model elements based
on neural networks or when integrating with non-standard 3rd-party NER components. In such cases, a user-defined parser
can be defined for the model, allowing the user to implement arbitrary NER logic that detects the model elements
in the user input programmatically. Note that a custom parser can detect any number of model elements.
</p>
<p>
A model provides its custom parsers via the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getParsers()">getParsers()</a> method.
</p>
</section>
<section id="logic">
<h2 class="section-title">Model Logic <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
When a user sends a request via the REST API, it is received by the REST server. Upon receipt,
the REST server performs basic NLP processing and enrichment. Once finished, the REST server
sends the enriched request down to a specific data probe selected based on the requested data model.
</p>
<p>
The model logic is defined in <a href="intent-matching.html">intents</a>, specifically in the intent callbacks that get called when
their intent is chosen as a winning match against the user request.
Below we will quickly discuss the key APIs that are essential for developing intent callbacks.
Note that this does not replace the more detailed <a target=_ href="/apis/latest/index.html">Javadoc</a>
documentation that you are encouraged to read through as well:
</p>
<ul>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></li>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></li>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></li>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></li>
<li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></li>
<li>Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></li>
</ul>
<h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
This interface provides a read-only view of the data model. The model view defines the declarative, or configurable, part of the model.
All properties in this interface can be defined or overridden in the external JSON/YAML
representation when used with the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> adapter.
</p>
<h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
This interface defines a context of a particular intent match. It can be passed into the callback of the matched intent
and provides the following:
</p>
<ul>
<li>ID of the matched intent.</li>
<li>Specific parsing variant that was matched against this intent.</li>
<li>Access to the original query context (<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a>).</li>
<li>Various access APIs for intent tokens.</li>
</ul>
<h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
This interface provides all available data about the parsed user input and all its
supplemental information. It's accessible from the <code>NCIntentMatch</code> interface and
provides a large amount of information to the intent callback logic:
</p>
<ul>
<li>
Server request ID. Server request is defined as a processing of one user input sentence.
</li>
<li>
Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a>
for controlling STM of conversation manager and dialog flow.
</li>
<li>
Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a>
instance that the intent callback method belongs to giving access to entire static model configuration.
</li>
<li>
Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> that
provides detailed information about the user input.
</li>
<li>
List of parsing variants provided
by <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants()">getVariants()</a>
method. When the user sentence gets parsed into individual tokens (i.e. detected model elements) there is generally
more than one way to do it. This ambiguity is perfectly fine because only the data model has all the
necessary information to select one parsing variant that fits that model the best. Without the data model
there isn't enough context to determine which variant is the best fitting.
Method <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants()">getVariants()</a>
returns list of all parsing variants for a given user input.
</li>
</ul>
<h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> interface
is one of the several important entities in Data Model API that you as a model developer will be working with. You
should review its <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">Javadoc</a> but
here is an outline of the information it provides:
</p>
<ul>
<li>
Information about the user that issued the request.
</li>
<li>
User agent and remote address, if available, of the user's application that made the initial REST call.
</li>
<li>
Original request text, timestamp of its receipt, and server request ID.
</li>
</ul>
<h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> object is another
key abstraction in Data Model API. A token is a detected model element and is a part of a fully parsed user input.
A sequence of tokens represents the parsed user input. A single token corresponds to one or more words, sequential
or not, in the user sentence.
</p>
<p>
Most of the token's information is stored in map-based metadata accessible via
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html#getMetadata()">getMetadata()</a> method.
Depending on the token ID, each token will have a different set of <a href="#meta">metadata properties</a>. Some common NLP properties
are always present for tokens of all types.
</p>
<h2 class="section-sub-title">Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
This class defines the result returned from the model's intent callbacks. A result consists of the
text body and the type. Result types are similar in notion to MIME types and have specific meaning only for REST applications
that interpret them accordingly. For example, a REST client interfacing between NLPCraft and Amazon Alexa or Apple HomeKit could
accept only the text result type and ignore everything else.
</p>
<h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html">NCMetadata</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html">NCMetadata</a>
provides support for mutable runtime-only metadata. This interface can be used to attach user-defined runtime data
to a variety of different objects in the NLPCraft API. This interface is implemented by the following types:
</p>
<ul>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCompany.html">NCCompany</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCUser.html">NCUser</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomElement.html">NCCustomElement</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCDialogFlowItem.html">NCDialogFlowItem</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></li>
<li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCVariant.html">NCVariant</a></li>
</ul>
</section>
<section id="builtin">
<h2 class="section-title">Built-In Tokens <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
NLPCraft provides a number of built-in model elements (i.e. tokens) including the
<a href="integrations.html">integration</a> with several popular 3rd party NER frameworks. The table
below provides information about these built-in tokens. The section on <a href="#meta">token metadata</a> provides
further information about <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html#getMetadata()">metadata</a> that each type of token carries.
</p>
<p>
Built-in tokens have to be explicitly enabled on both the REST server and in the model. See
<code>nlpcraft.server.tokenProviders</code> configuration property and
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView#getEnabledBuiltInTokens()</a>
method for more details. By default, only NLPCraft tokens are enabled (token ID
starting with <code>nlpcraft</code>).
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Token ID</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>nlpcraft:nlp</code></td>
<td>
<p>
This token denotes a word (always a single word) that is not a part of any other token. It's
also called a free word, i.e. a word that is not linked to any other detected model element.
</p>
<p>
<b>NOTE:</b> the metadata from this token defines a common set of NLP properties and
is present in every other token as well.
</p>
</td>
<td>
<ul>
<li>Jamie goes <code>home</code> (assuming that a word 'home' does not belong to any model element).</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:date</code></td>
<td>
This token denotes a date range. It recognizes dates from 1900 up to 2023. Note that it does not
currently recognize the time component.
</td>
<td>
<ul>
<li>Meeting <code>next tuesday</code>.</li>
<li>Report for entire <code>2018 year</code>.</li>
<li>Data <code>from 1/1/2017 to 12/31/2018</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:num</code></td>
<td>
This token denotes a single numeric value or numeric condition.
</td>
<td>
<ul>
<li>Price <code>&gt; 100</code>.</li>
<li>Price is <code>less than $100</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:continent</code></td>
<td>
This token denotes a geographical continent.
</td>
<td>
<ul>
<li>Population of <code>Africa</code>.</li>
<li>Surface area of <code>America</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:subcontinent</code></td>
<td>
This token denotes a geographical subcontinent.
</td>
<td>
<ul>
<li>Population of <code>Alaskan peninsula</code>.</li>
<li>Surface area of <code>South America</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:region</code></td>
<td>
This token denotes a geographical region/state.
</td>
<td>
<ul>
<li>Population of <code>California</code>.</li>
<li>Surface area of <code>South Dakota</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:country</code></td>
<td>
This token denotes a country.
</td>
<td>
<ul>
<li>Population of <code>France</code>.</li>
<li>Surface area of <code>USA</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:city</code></td>
<td>
This token denotes a city.
</td>
<td>
<ul>
<li>Population of <code>Paris</code>.</li>
<li>Surface area of <code>Washington DC</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:metro</code></td>
<td>
This token denotes a metro area.
</td>
<td>
<ul>
<li>Population of <code>Cedar Rapids-Waterloo-Iowa City & Dubuque, IA</code> metro area.</li>
<li>Surface area of <code>Norfolk-Portsmouth-Newport News, VA</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:sort</code></td>
<td>
This token denotes a sorting or ordering.
</td>
<td>
<ul>
<li>Report <code>sorted from top to bottom</code>.</li>
<li>Analysis <code>sorted in descending order</code>.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:limit</code></td>
<td>
This token denotes a numerical limit.
</td>
<td>
<ul>
<li>Show <code>top 5</code> brands.</li>
<li>Show <code>several</code> brands.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:coordinate</code></td>
<td>
This token denotes latitude and longitude coordinates.
</td>
<td>
<ul>
<li>Route the path to <code>55.7558, 37.6173</code> location.</li>
</ul>
</td>
</tr>
<tr>
<td><code>nlpcraft:relation</code></td>
<td>
This token denotes a relation function:
<code>compare</code> or
<code>correlate</code>. Note that this token always needs two other tokens that it references.
</td>
<td>
<ul>
<li>
What is the <code><b>correlation between</b></code> <code>price</code> <code><b>and</b></code> <code>location</code>
(assuming that 'price' and 'location' are also detected tokens).
</li>
</ul>
</td>
</tr>
<tr>
<td><code>google:xxx</code></td>
<td>
<p>
In these tokens <code>xxx</code> is the lowercase name of a named entity
in <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, e.g.
<code>google:person</code>, <code>google:location</code>, etc.
</p>
<p>
See <a href="integrations.html#google">integration</a> section for more details on how
to configure Google named entity provider.
</p>
</td>
<td>
<ul>
<li>
Articles by <code>Ken Thompson</code>.
</li>
<li>
Best restaurants in <code>Paris</code>.
</li>
</ul>
</td>
</tr>
<tr>
<td><code>opennlp:xxx</code></td>
<td>
<p>
In these tokens <code>xxx</code> is the lowercase name of a named entity
in <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, e.g.
<code>opennlp:person</code>, <code>opennlp:money</code>, etc.
</p>
<p>
See <a href="integrations.html#opennlp">integration</a> section for more details on how
to configure Apache OpenNLP named entity provider.
</p>
</td>
<td>
<ul>
<li>
Articles by <code>Ken Thompson</code>.
</li>
<li>
Best restaurants under <code>100$</code>.
</li>
</ul>
</td>
</tr>
<tr>
<td><code>spacy:xxx</code></td>
<td>
<p>
In these tokens <code>xxx</code> is the lowercase name of a named entity
in <a target=_ href="https://spacy.io/">spaCy</a>, e.g.
<code>spacy:person</code>, <code>spacy:location</code>, etc.
</p>
<p>
See <a href="integrations.html#spacy">integration</a> section for more details on how
to configure spaCy named entity provider.
</p>
</td>
<td>
<ul>
<li>
Articles by <code>Ken Thompson</code>.
</li>
<li>
Best restaurants in <code>Paris</code>.
</li>
</ul>
</td>
</tr>
<tr>
<td><code>stanford:xxx</code></td>
<td>
<p>
In these tokens <code>xxx</code> is the lowercase name of a named entity
in <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, e.g.
<code>stanford:person</code>, <code>stanford:location</code>, etc.
</p>
<p>
See <a href="integrations.html#stanford">integration</a> section for more details on how
to configure Stanford CoreNLP named entity provider.
</p>
</td>
<td>
<ul>
<li>
Articles by <code>Ken Thompson</code>.
</li>
<li>
Best restaurants in <code>Paris</code>.
</li>
</ul>
</td>
</tr>
</tbody>
</table>
</section>
<section id="meta">
<h2 class="section-title">Token Metadata <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Each token has a different set of metadata. The sections below describe the metadata for each built-in token
supported by NLPCraft:
</p>
<ul>
<li><a href="#nlpcraft:nlp">Token ID <code>nlpcraft:nlp</code></a></li>
<li><a href="#nlpcraft:date">Token ID <code>nlpcraft:date</code></a></li>
<li><a href="#nlpcraft:num">Token ID <code>nlpcraft:num</code></a></li>
<li><a href="#nlpcraft:city">Token ID <code>nlpcraft:city</code></a></li>
<li><a href="#nlpcraft:continent">Token ID <code>nlpcraft:continent</code></a></li>
<li><a href="#nlpcraft:subcontinent">Token ID <code>nlpcraft:subcontinent</code></a></li>
<li><a href="#nlpcraft:region">Token ID <code>nlpcraft:region</code></a></li>
<li><a href="#nlpcraft:country">Token ID <code>nlpcraft:country</code></a></li>
<li><a href="#nlpcraft:metro">Token ID <code>nlpcraft:metro</code></a></li>
<li><a href="#nlpcraft:coordinate">Token ID <code>nlpcraft:coordinate</code></a></li>
<li><a href="#nlpcraft:sort">Token ID <code>nlpcraft:sort</code></a></li>
<li><a href="#nlpcraft:limit">Token ID <code>nlpcraft:limit</code></a></li>
<li><a href="#nlpcraft:relation">Token ID <code>nlpcraft:relation</code></a></li>
<li><a href="#stanford:xxx">Token ID <code>stanford:xxx</code></a></li>
<li><a href="#spacy:xxx">Token ID <code>spacy:xxx</code></a></li>
<li><a href="#google:xxx">Token ID <code>google:xxx</code></a></li>
<li><a href="#opennlp:xxx">Token ID <code>opennlp:xxx</code></a></li>
</ul>
<div class="bq info">
<p>
<b>Metadata Name Conflicts</b>
</p>
<p>
Note that model element metadata gets merged into the same map container as common NLP token metadata
(see <code>nlpcraft:nlp:xxx</code> properties below).
In other words, they share the same namespace. It is important to remember that and choose unique names
for user-defined metadata properties. One possible way that is used by NLPCraft internally is to prefix
metadata name with some unique prefix based on the token ID.
</p>
</div>
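<p>
The prefixing convention described above can be sketched with plain Java maps. This is only an illustration:
the class <code>MetaPrefixExample</code> and its helpers <code>key</code> and <code>merge</code> are
hypothetical names, not part of the NLPCraft API, and failing on a name clash is an assumption made for the
example rather than documented NLPCraft behavior.
</p>

```java
import java.util.HashMap;
import java.util.Map;

public class MetaPrefixExample {
    // Hypothetical helper: builds a namespaced metadata name from a token ID and a property name.
    static String key(String tokenId, String name) {
        return tokenId + ":" + name;
    }

    // Merges user-defined metadata into a copy of the common token metadata.
    // Throwing on a clash is an assumption for this sketch, used to make conflicts visible.
    static Map<String, Object> merge(Map<String, Object> common, Map<String, Object> user) {
        Map<String, Object> merged = new HashMap<>(common);

        for (Map.Entry<String, Object> e : user.entrySet()) {
            if (merged.containsKey(e.getKey()))
                throw new IllegalArgumentException("Metadata name conflict: " + e.getKey());

            merged.put(e.getKey(), e.getValue());
        }

        return merged;
    }
}
```

<p>
With a prefixed name such as <code>key("my:elem", "index")</code> the user-defined property cannot collide
with <code>nlpcraft:nlp:index</code>, while reusing the built-in name directly would be rejected by this sketch.
</p>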
<span id="nlpcraft:nlp" class="section-sub-title">Token ID <code>nlpcraft:nlp</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token's metadata provides the common, basic NLP properties that are part of any token.
<b>All tokens</b> without exception have these metadata properties, and all of them are <b>mandatory</b>.
Note also that interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a>
provides direct access to most of these properties.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:nlp:unid</b></code></td>
<td><code>java.lang.String</code></td>
<td>Internal globally unique system ID of the token.</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:bracketed</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>Whether or not this token is surrounded by any of <code>'['</code>, <code>']'</code>, <code>'{'</code>, <code>'}'</code>, <code>'('</code>, <code>')'</code> brackets.</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:freeword</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>Whether or not this token represents a free word. A free word is a token that was detected neither as a part of a user-defined token nor as a part of a system token.</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:direct</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>Whether or not this token was matched on a direct (not permutated) synonym.</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:english</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this token represents an English word. Note that this only checks that the token's text
consists of characters of the English alphabet, i.e. the text doesn't necessarily have to be a
known valid English word. See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed()" target="javadoc">NCModelView.isNonEnglishAllowed()</a> method
for the corresponding model configuration.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:lemma</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Lemma of this token, i.e. a canonical form of this word. Note that stemming and
lemmatization reduce inflectional forms and sometimes derivationally related forms
of a word to a common base form. Lemmatization refers to the use of a vocabulary and
morphological analysis of words, normally aiming to remove inflectional endings only and to
return the base or dictionary form of a word, which is known as the lemma.
Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a>
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:stem</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Stem of this token. Note that stemming and lemmatization reduce inflectional forms
and sometimes derivationally related forms of a word to a common base form. Unlike lemmatization,
stemming is a basic heuristic process that chops off the ends of words in the hope of
achieving this goal correctly most of the time, and often includes the removal of derivational
affixes.
Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a>
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:pos</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Penn Treebank POS tag for this token. Note that in addition to the standard Penn Treebank POS
tags NLPCraft introduces the synthetic '-&#45;&#45;' tag to indicate a POS tag for multiword tokens.
Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a>
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:posdesc</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Description of Penn Treebank POS tag.
Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a>
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:swear</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token is a swear word. NLPCraft has a built-in list of common English swear words.
See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed()" target="javadoc">NCModelView.isSwearWordsAllowed()</a> for the corresponding model configuration.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:origtext</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Original user input text for this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:normtext</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Normalized user input text for this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:sparsity</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Numeric value of how sparse the token is. Sparsity zero means that all individual words in
the token follow each other.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:minindex</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Index of the first word in this token. Note that token may not be contiguous.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:maxindex</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Index of the last word in this token. Note that token may not be contiguous.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:wordindexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>
List of original word indexes in this token. Note that a token can have words that are not
contiguous in the original sentence. Always has at least one element in it.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:wordlength</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Number of individual words in this token. Equal to the size of <code>wordindexes</code> list.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:contiguous</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token has zero sparsity, i.e. consists of contiguous words.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:start</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Start character index of this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:end</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
End character index of this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:index</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Index of this token in the sentence.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:charlength</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Character length of this token.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:quoted</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token is surrounded by single or double quotes.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:stopword</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token is a stopword. Stopwords are extremely common words that
add little value in helping to understand user input and are excluded from the processing entirely.
For example, words like a, the, can, of, about, over, etc. are typical stopwords in English.
NLPCraft has a built-in set of stopwords.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:nlp:dict</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not this token is found in the Princeton WordNet database.
</td>
</tr>
</tbody>
</table>
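<p>
The word-position properties above are related to each other, and the relation can be sketched from
<code>wordindexes</code> alone. The class and method names below are hypothetical. In particular,
<code>gapCount</code> is only an assumed measure of non-contiguity (the number of skipped word positions);
it is zero exactly when the words follow each other, but it is not claimed to be the formula NLPCraft
uses for <code>nlpcraft:nlp:sparsity</code>.
</p>

```java
import java.util.Collections;
import java.util.List;

public class WordIndexExample {
    // Counterpart of 'wordlength': number of individual words in the token.
    static int wordLength(List<Integer> wordIndexes) {
        return wordIndexes.size();
    }

    // Counterpart of 'minindex': index of the first word in the token.
    static int minIndex(List<Integer> wordIndexes) {
        return Collections.min(wordIndexes);
    }

    // Counterpart of 'maxindex': index of the last word in the token.
    static int maxIndex(List<Integer> wordIndexes) {
        return Collections.max(wordIndexes);
    }

    // Assumed measure: number of word positions skipped between the first and last word.
    // Expects distinct indexes and at least one element, as the table above states.
    static int gapCount(List<Integer> wordIndexes) {
        return maxIndex(wordIndexes) - minIndex(wordIndexes) + 1 - wordLength(wordIndexes);
    }

    // Counterpart of 'contiguous': true when no word position is skipped.
    static boolean isContiguous(List<Integer> wordIndexes) {
        return gapCount(wordIndexes) == 0;
    }
}
```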
<br/>
<span id="nlpcraft:date" class="section-sub-title">Token ID <code>nlpcraft:date</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a date range including single days.
In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b>.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:date:from</b></code></td>
<td><code>java.lang.Long</code></td>
<td>
Start timestamp of the datetime range.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:date:to</b></code></td>
<td><code>java.lang.Long</code></td>
<td>
End timestamp of the datetime range.
</td>
</tr>
</tbody>
</table>
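<p>
A minimal sketch of consuming these two properties is shown below. It assumes that both timestamps are
milliseconds since the Unix epoch, which the table above does not state explicitly, and it reads them from
a plain map standing in for the token metadata. The class and method names are hypothetical.
</p>

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

public class DateRangeExample {
    // Reads a timestamp property and converts it to an Instant.
    // Assumption: the value is epoch milliseconds.
    static Instant instant(Map<String, Object> meta, String name) {
        return Instant.ofEpochMilli((Long) meta.get(name));
    }

    // Length of the detected datetime range.
    static Duration rangeLength(Map<String, Object> meta) {
        Instant from = instant(meta, "nlpcraft:date:from");
        Instant to = instant(meta, "nlpcraft:date:to");

        return Duration.between(from, to);
    }
}
```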
<br/>
<span id="nlpcraft:num" class="section-sub-title">Token ID <code>nlpcraft:num</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a single numerical value or a numeric condition.
In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:num:from</b></code></td>
<td><code>java.lang.Double</code></td>
<td>
Start of numeric range that satisfies the condition (exclusive). Note that if <code>from</code>
and <code>to</code> are the same, this token represents a single value (whole or fractional), in
which case <code>isequalcondition</code> will be <code>true</code>.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:to</b></code></td>
<td><code>java.lang.Double</code></td>
<td>
End of numeric range that satisfies the condition (exclusive). Note that if <code>from</code>
and <code>to</code> are the same, this token represents a single value (whole or fractional), in
which case <code>isequalcondition</code> will be <code>true</code>.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:fromincl</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not the start of the numeric range is inclusive.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:toincl</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether or not the end of the numeric range is inclusive.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:isequalcondition</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this is an equality condition. Note that single numeric values also default to equality
condition and this property will be <code>true</code>. Indeed, <code>A is equal to 2</code> and
<code>A is 2</code> have the same meaning.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:isnotequalcondition</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this is a not-equality condition.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:isfromnegativeinfinity</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this range is from negative infinity.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:israngecondition</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this is a range condition.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:istopositiveinfinity</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this range is to positive infinity.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:isfractional</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether this token's value (a single numeric value or a range) is a fractional number rather than a whole one.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:unit</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>
Optional numeric value unit name (see below).
</td>
</tr>
<tr>
<td><code><b>nlpcraft:num:unittype</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>
Optional numeric value unit type (see below).
</td>
</tr>
</tbody>
</table>
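<p>
The range properties above can be combined into a simple membership check. The sketch below reads them from
a plain map standing in for the token metadata; the class <code>NumRangeExample</code> and its
<code>matches</code> method are hypothetical. It assumes that the two inclusiveness flags decide how each
bound is compared, and that a bound is ignored when the corresponding infinity flag is set.
</p>

```java
import java.util.Map;

public class NumRangeExample {
    // Reads a boolean flag, treating a missing property as false.
    static boolean flag(Map<String, Object> meta, String name) {
        return Boolean.TRUE.equals(meta.get(name));
    }

    // Checks whether value 'x' falls into the numeric range described by the metadata.
    static boolean matches(Map<String, Object> meta, double x) {
        double from = (Double) meta.get("nlpcraft:num:from");
        double to = (Double) meta.get("nlpcraft:num:to");

        // Assumption: an infinite side means that bound is not checked at all.
        boolean lowerOk = flag(meta, "nlpcraft:num:isfromnegativeinfinity")
            || (flag(meta, "nlpcraft:num:fromincl") ? x >= from : x > from);
        boolean upperOk = flag(meta, "nlpcraft:num:istopositiveinfinity")
            || (flag(meta, "nlpcraft:num:toincl") ? x <= to : x < to);

        return lowerOk && upperOk;
    }
}
```

<p>
For example, with <code>from</code> of 10, <code>to</code> of 20, <code>fromincl</code> set and
<code>toincl</code> unset, this check accepts 10 and 15.5 but rejects 20.
</p>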
<p>
The following table lists the possible values of the <code><b>nlpcraft:num:unit</b></code> and <code><b>nlpcraft:num:unittype</b></code>
properties:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>num:unittype</th>
<th>num:unit <sub>possible values</sub></th>
</tr>
</thead>
<tbody>
<tr><td><code>mass</code></td><td><code>feet per second</code><br/><code>grams</code><br/><code>kilogram</code><br/><code>grain</code><br/><code>dram</code><br/><code>ounce</code><br/><code>pound</code><br/><code>hundredweight</code><br/><code>ton</code><br/><code>tonne</code><br/><code>slug</code></td>
<tr><td><code>torque</code></td><td><code>newton meter</code></td>
<tr><td><code>area</code></td><td><code>square meter</code><br/><code>acre</code><br/><code>are</code><br/><code>hectare</code><br/><code>square inches</code><br/><code>square feet</code><br/><code>square yards</code><br/><code>square miles</code></td>
<tr><td><code>paper quantity</code></td><td><code>paper bale</code></td>
<tr><td><code>force</code></td><td><code>kilopond</code><br/><code>pond</code></td>
<tr><td><code>pressure</code></td><td><code>pounds per square inch</code></td>
<tr><td><code>solid angle</code></td><td><code>steradian</code></td>
<tr><td><code>pressure</code><br/><code>stress</code></td><td><code>pascal</code></td>
<tr><td><code>luminous</code></td><td><code>flux</code><br/><code>lumen</code></td>
<tr><td><code>amount of substance</code></td><td><code>mole</code></td>
<tr><td><code>luminance</code></td><td><code>candela per square metre</code></td>
<tr><td><code>angle</code></td><td><code>radian</code><br/><code>degree</code></td>
<tr><td><code>magnetic flux density</code><br/><code>magnetic field</code></td><td><code>tesla</code></td>
<tr><td><code>power</code><br/><code>radiant flux</code></td><td><code>watt</code></td>
<tr><td><code>datetime</code></td><td><code>second</code><br/><code>minute</code><br/><code>hour</code><br/><code>day</code><br/><code>week</code><br/><code>month</code><br/><code>year</code></td>
<tr><td><code>electrical inductance</code></td><td><code>henry</code></td>
<tr><td><code>electric charge</code></td><td><code>coulomb</code></td>
<tr><td><code>temperature</code></td><td><code>kelvin</code><br/><code>centigrade</code><br/><code>fahrenheit</code></td>
<tr><td><code>voltage</code><br/><code>electrical</code></td><td><code>volt</code></td>
<tr><td><code>momentum</code></td><td><code>kilogram meters per second</code></td>
<tr><td><code>amount of heat</code></td><td><code>calorie</code></td>
<tr><td><code>electrical capacitance</code></td><td><code>farad</code></td>
<tr><td><code>radioactive decay</code></td><td><code>becquerel</code></td>
<tr><td><code>electrical conductance</code></td><td><code>siemens</code></td>
<tr><td><code>luminous intensity</code></td><td><code>candela</code></td>
<tr><td><code>work</code><br/><code>energy</code></td><td><code>joule</code></td>
<tr><td><code>quantities</code></td><td><code>dozen</code></td>
<tr><td><code>density</code></td><td><code>density</code></td>
<tr><td><code>sound</code></td><td><code>decibel</code></td>
<tr><td><code>electrical resistance</code><br/><code>impedance</code></td><td><code>ohm</code></td>
<tr><td><code>force</code><br/><code>weight</code></td><td><code>newton</code></td>
<tr><td><code>light quantity</code></td><td><code>lumen seconds</code></td>
<tr><td><code>length</code></td><td><code>meter</code><br/><code>millimeter</code><br/><code>centimeter</code><br/><code>decimeter</code><br/><code>kilometer</code><br/><code>astronomical unit</code><br/><code>light year</code><br/><code>parsec</code><br/><code>inch</code><br/><code>foot</code><br/><code>yard</code><br/><code>mile</code><br/><code>nautical mile</code></td>
<tr><td><code>refractive index</code></td><td><code>diopter</code></td>
<tr><td><code>frequency</code></td><td><code>hertz</code><br/><code>angular frequency</code></td>
<tr><td><code>power</code></td><td><code>kilowatt</code><br/><code>horsepower</code><br/><code>bar</code></td>
<tr><td><code>magnetic flux</code></td><td><code>weber</code></td>
<tr><td><code>current</code></td><td><code>ampere</code></td>
<tr><td><code>acceleration of gravity</code></td><td><code>gravity imperial</code><br/><code>gravity metric</code></td>
<tr><td><code>volume</code></td><td><code>cubic meter</code><br/><code>liter</code><br/><code>milliliter</code><br/><code>centiliter</code><br/><code>deciliter</code><br/><code>hectoliter</code><br/><code>cubic inch</code><br/><code>cubic foot</code><br/><code>cubic yard</code><br/><code>acre-foot</code><br/><code>teaspoon</code><br/><code>tablespoon</code><br/><code>fluid ounce</code><br/><code>cup</code><br/><code>gill</code><br/><code>pint</code><br/><code>quart</code><br/><code>gallon</code></td>
<tr><td><code>speed</code></td><td><code>miles per hour</code><br/><code>meters per second</code></td>
<tr><td><code>illuminance</code></td><td><code>lux</code></td>
</tbody>
</table>
<br/>
<span id="nlpcraft:city" class="section-sub-title">Token ID <code>nlpcraft:city</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a city.
In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:city:city</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Name of the city.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:city:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Continent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:city:subcontinent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Subcontinent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:city:countrymeta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Supplemental metadata for city's country (see below).
</td>
</tr>
<tr>
<td><code><b>nlpcraft:city:citymeta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Supplemental metadata for city (see below).
</td>
</tr>
</tbody>
</table>
<p>
The following table lists the possible keys of the <code><b>nlpcraft:city:countrymeta</b></code> map. The data is
obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Key</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>iso</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>iso3</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO 3166 country code.</td>
</tr>
<tr>
<td><code><b>isocode</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>capital</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country capital city name.</td>
</tr>
<tr>
<td><code><b>area</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Double</code></td>
<td>Optional country surface area.</td>
</tr>
<tr>
<td><code><b>population</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Long</code></td>
<td>Optional country population.</td>
</tr>
<tr>
<td><code><b>continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Optional country continent.</td>
</tr>
<tr>
<td><code><b>currencycode</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency code.</td>
</tr>
<tr>
<td><code><b>currencyname</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency name.</td>
</tr>
<tr>
<td><code><b>phone</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country phone code.</td>
</tr>
<tr>
<td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code format.</td>
</tr>
<tr>
<td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code regular expression.</td>
</tr>
<tr>
<td><code><b>languages</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country list of languages.</td>
</tr>
<tr>
<td><code><b>neighbours</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country list of neighbours.</td>
</tr>
</tbody>
</table>
<p>
The following table lists the possible keys of the <code><b>nlpcraft:city:citymeta</b></code> map. The data is
obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Key</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>latitude</b></code></td>
<td><code>java.lang.Double</code></td>
<td>City latitude.</td>
</tr>
<tr>
<td><code><b>longitude</b></code></td>
<td><code>java.lang.Double</code></td>
<td>City longitude.</td>
</tr>
<tr>
<td><code><b>population</b></code></td>
<td><code>java.lang.Long</code></td>
<td>City population.</td>
</tr>
<tr>
<td><code><b>elevation</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Integer</code></td>
<td>Optional city elevation in meters.</td>
</tr>
<tr>
<td><code><b>timezone</b></code></td>
<td><code>java.lang.String</code></td>
<td>City timezone.</td>
</tr>
</tbody>
</table>
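<p>
Because some keys in these supplemental maps are optional, code that reads them should tolerate missing
entries. The sketch below uses a plain map as a stand-in for the token metadata and wraps the optional
<code>elevation</code> key in <code>java.util.Optional</code>. The class and method names are hypothetical,
and the string-keyed nested map type is an assumption since the table above only gives <code>java.util.Map</code>.
</p>

```java
import java.util.Map;
import java.util.Optional;

public class CityMetaExample {
    // Returns the nested 'citymeta' map, or an empty Optional if it is absent.
    @SuppressWarnings("unchecked")
    static Optional<Map<String, Object>> cityMeta(Map<String, Object> tokenMeta) {
        return Optional.ofNullable((Map<String, Object>) tokenMeta.get("nlpcraft:city:citymeta"));
    }

    // Optional city elevation in meters; empty when the key is not present.
    static Optional<Integer> elevation(Map<String, Object> tokenMeta) {
        return cityMeta(tokenMeta).map(m -> (Integer) m.get("elevation"));
    }
}
```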
<br/>
<span id="nlpcraft:continent" class="section-sub-title">Token ID <code>nlpcraft:continent</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a continent.
In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:continent:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Name of the continent.</td>
</tr>
</tbody>
</table>
<br/>
<span id="nlpcraft:subcontinent" class="section-sub-title">Token ID <code>nlpcraft:subcontinent</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a subcontinent.
In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:subcontinent:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Name of the continent.</td>
</tr>
<tr>
<td><code><b>nlpcraft:subcontinent:subcontinent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Name of the subcontinent.</td>
</tr>
</tbody>
</table>
<br/>
<span id="nlpcraft:metro" class="section-sub-title">Token ID <code>nlpcraft:metro</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a metro area.
In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:metro:metro</b></code></td>
<td><code>java.lang.String</code></td>
<td>Name of the metro area.</td>
</tr>
</tbody>
</table>
<br/>
<span id="nlpcraft:region" class="section-sub-title">Token ID <code>nlpcraft:region</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a geographical region.
In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:region:region</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Name of the region.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:region:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Continent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:region:subcontinent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Subcontinent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:region:countrymeta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Supplemental metadata for region's country (see below).
</td>
</tr>
</tbody>
</table>
<p>
The following table lists the possible keys of the <code><b>nlpcraft:region:countrymeta</b></code> map. The data is
obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Key</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>iso</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>iso3</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO 3166 country code.</td>
</tr>
<tr>
<td><code><b>isocode</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>capital</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country capital city name.</td>
</tr>
<tr>
<td><code><b>area</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Double</code></td>
<td>Optional country surface area.</td>
</tr>
<tr>
<td><code><b>population</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Long</code></td>
<td>Optional country population.</td>
</tr>
<tr>
<td><code><b>continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Optional country continent.</td>
</tr>
<tr>
<td><code><b>currencycode</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency code.</td>
</tr>
<tr>
<td><code><b>currencyname</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency name.</td>
</tr>
<tr>
<td><code><b>phone</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country phone code.</td>
</tr>
<tr>
<td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code format.</td>
</tr>
<tr>
<td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code regular expression.</td>
</tr>
<tr>
<td><code><b>languages</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country list of languages.</td>
</tr>
<tr>
<td><code><b>neighbours</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country list of neighbours.</td>
</tr>
</tbody>
</table>
<br/>
<span id="nlpcraft:country" class="section-sub-title">Token ID <code>nlpcraft:country</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a country.
In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:country:country</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Name of the country.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:country:continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Continent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:country:subcontinent</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Subcontinent name.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:country:countrymeta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Supplemental metadata for the country (see below).
</td>
</tr>
</tbody>
</table>
<p>
The following table lists the possible keys of the <code><b>nlpcraft:country:countrymeta</b></code> map. The data is
obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Key</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>iso</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>iso3</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO 3166 country code.</td>
</tr>
<tr>
<td><code><b>isocode</b></code></td>
<td><code>java.lang.String</code></td>
<td>ISO country code.</td>
</tr>
<tr>
<td><code><b>capital</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country capital city name.</td>
</tr>
<tr>
<td><code><b>area</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Double</code></td>
<td>Optional country surface area.</td>
</tr>
<tr>
<td><code><b>population</b></code> <sub>opt.</sub></td>
<td><code>java.lang.Long</code></td>
<td>Optional country population.</td>
</tr>
<tr>
<td><code><b>continent</b></code></td>
<td><code>java.lang.String</code></td>
<td>Optional country continent.</td>
</tr>
<tr>
<td><code><b>currencycode</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency code.</td>
</tr>
<tr>
<td><code><b>currencyname</b></code></td>
<td><code>java.lang.String</code></td>
<td>Country currency name.</td>
</tr>
<tr>
<td><code><b>phone</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country phone code.</td>
</tr>
<tr>
<td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code format.</td>
</tr>
<tr>
<td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country postal code regular expression.</td>
</tr>
<tr>
<td><code><b>languages</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country list of languages.</td>
</tr>
<tr>
<td><code><b>neighbours</b></code> <sub>opt.</sub></td>
<td><code>java.lang.String</code></td>
<td>Optional country list of neighbours.</td>
</tr>
</tbody>
</table>
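<p>
Because several <code>countrymeta</code> keys are optional, code that reads this map should tolerate absent keys.
The following is a minimal sketch only: it models the token metadata as a plain <code>java.util.Map</code>
instead of an NLPCraft token, and the class <code>CountryMetaExample</code> with its methods is hypothetical
and not part of the NLPCraft API.
</p>

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: token metadata is modeled as a plain map for illustration.
final class CountryMetaExample {
    // Reads a key from the 'nlpcraft:country:countrymeta' map, returning empty
    // when the key is absent or its value is not of the expected Java type.
    static <T> Optional<T> optional(Map<String, Object> countryMeta, String key, Class<T> type) {
        Object v = countryMeta.get(key);
        return type.isInstance(v) ? Optional.of(type.cast(v)) : Optional.empty();
    }

    // Builds a short description from the mandatory country name and two optional keys.
    static String describe(Map<String, Object> tokenMeta) {
        String name = (String) tokenMeta.get("nlpcraft:country:country");
        @SuppressWarnings("unchecked")
        Map<String, Object> cm = (Map<String, Object>) tokenMeta.get("nlpcraft:country:countrymeta");
        String capital = optional(cm, "capital", String.class).orElse("n/a");
        String population = optional(cm, "population", Long.class).map(String::valueOf).orElse("n/a");
        return name + " [capital=" + capital + ", population=" + population + "]";
    }
}
```

<p>
Checking the value type as well as its presence keeps the caller safe if a key is missing, at the cost of
silently ignoring a value of an unexpected type.
</p>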
<br/>
<span id="nlpcraft:coordinate" class="section-sub-title">Token ID <code>nlpcraft:coordinate</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a latitude and longitude coordinate.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:coordinate:latitude</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Coordinate latitude.</td>
</tr>
<tr>
<td><code><b>nlpcraft:coordinate:longitude</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Coordinate longitude.</td>
</tr>
</tbody>
</table>
<br/>
<span id="nlpcraft:sort" class="section-sub-title">Token ID <code>nlpcraft:sort</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a sorting or ordering function.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:sort:subjindexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>One or more indexes of the target tokens (i.e. the tokens being sorted).</td>
</tr>
<tr>
<td><code><b>nlpcraft:sort:byindexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>Zero or more (i.e. optional) indexes of the reference tokens (i.e. the tokens being sorted by).</td>
</tr>
<tr>
<td><code><b>nlpcraft:sort:asc</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether sorting is in ascending (<code>true</code>) or descending (<code>false</code>) order.
</td>
</tr>
</tbody>
</table>
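<p>
The sort properties refer to other tokens by index, so a consumer typically resolves those indexes first.
The sketch below is illustrative only and rests on several assumptions: the metadata is modeled as a plain
<code>java.util.Map</code>, the indexes are assumed to be positions in the list of tokens of the parsed sentence
(represented here by a list of token texts), and <code>nlpcraft:sort:asc</code> is assumed to be
<code>true</code> for ascending order. The class <code>SortMetaExample</code> is hypothetical.
</p>

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: resolves 'nlpcraft:sort' index lists against a sentence.
final class SortMetaExample {
    // Maps the index list stored under 'key' to token texts.
    // A missing key (e.g. the optional 'byindexes') yields an empty list.
    static List<String> resolve(Map<String, Object> sortMeta, String key, List<String> sentence) {
        @SuppressWarnings("unchecked")
        List<Integer> idxs = (List<Integer>) sortMeta.getOrDefault(key, List.of());
        return idxs.stream().map(sentence::get).collect(Collectors.toList());
    }

    // Produces a readable summary of what is sorted, by what, and in which direction.
    static String describe(Map<String, Object> sortMeta, List<String> sentence) {
        List<String> subj = resolve(sortMeta, "nlpcraft:sort:subjindexes", sentence);
        List<String> by = resolve(sortMeta, "nlpcraft:sort:byindexes", sentence);
        // Assumption: 'asc' == true means ascending order.
        boolean asc = Boolean.TRUE.equals(sortMeta.get("nlpcraft:sort:asc"));
        return "sort " + subj + (by.isEmpty() ? "" : " by " + by) + (asc ? " ascending" : " descending");
    }
}
```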
<br/>
<span id="nlpcraft:limit" class="section-sub-title">Token ID <code>nlpcraft:limit</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a numeric limit value (like in "top 10" or "bottom five").
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:limit:indexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>Index (always only one) of the reference token (i.e. the token being limited).</td>
</tr>
<tr>
<td><code><b>nlpcraft:limit:asc</b></code></td>
<td><code>java.lang.Boolean</code></td>
<td>
Whether limit order is ascending or descending.
</td>
</tr>
<tr>
<td><code><b>nlpcraft:limit:limit</b></code></td>
<td><code>java.lang.Integer</code></td>
<td>
Numeric value of the limit.
</td>
</tr>
</tbody>
</table>
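<p>
Since <code>nlpcraft:limit:indexes</code> is a list that always holds exactly one index, it can be convenient to
unpack the three limit properties into a typed value and validate that invariant once. The following is a
sketch under assumptions: the metadata is modeled as a plain <code>java.util.Map</code>, and the class
<code>LimitSpec</code> is hypothetical rather than part of the NLPCraft API.
</p>

```java
import java.util.List;
import java.util.Map;

// Hypothetical value class holding the three 'nlpcraft:limit' properties.
final class LimitSpec {
    final int tokenIndex; // Index of the token being limited.
    final boolean asc;    // Value of 'nlpcraft:limit:asc'.
    final int limit;      // Numeric value of the limit.

    private LimitSpec(int tokenIndex, boolean asc, int limit) {
        this.tokenIndex = tokenIndex;
        this.asc = asc;
        this.limit = limit;
    }

    // Unpacks the metadata map, rejecting an index list that is missing
    // or does not contain exactly one element.
    static LimitSpec fromMeta(Map<String, Object> meta) {
        @SuppressWarnings("unchecked")
        List<Integer> idxs = (List<Integer>) meta.get("nlpcraft:limit:indexes");
        if (idxs == null || idxs.size() != 1)
            throw new IllegalArgumentException("Expected exactly one index: " + idxs);
        // The remaining properties are mandatory, so they are read without defaults.
        return new LimitSpec(
            idxs.get(0),
            (Boolean) meta.get("nlpcraft:limit:asc"),
            (Integer) meta.get("nlpcraft:limit:limit")
        );
    }
}
```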
<br/>
<span id="nlpcraft:relation" class="section-sub-title">Token ID <code>nlpcraft:relation</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
This token denotes a relation function such as <code>compare</code> or <code>correlate</code>.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>nlpcraft:relation:indexes</b></code></td>
<td><code>java.util.List&lt;Integer&gt;</code></td>
<td>Index (always only one) of the reference token (i.e. the token being related to).</td>
</tr>
<tr>
<td><code><b>nlpcraft:relation:type</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Type of the relation. One of the following values:
<ul>
<li><code>compare</code></li>
<li><code>correlate</code></li>
</ul>
</td>
</tr>
</tbody>
</table>
<br/>
<span id="google:xxx" class="section-sub-title">Token ID <code>google:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
In these token IDs, <code>xxx</code> is the lowercase name of a named entity
in <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, e.g.
<code>google:person</code> or <code>google:location</code>.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>google:salience</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Correctness probability of this token by Google Natural Language.</td>
</tr>
<tr>
<td><code><b>google:meta</b></code></td>
<td><code>java.util.Map</code></td>
<td>
Map-based container for Google Natural Language specific properties.
</td>
</tr>
<tr>
<td><code><b>google:mentionsbeginoffsets</b></code></td>
<td><code>java.util.List&lt;String&gt;</code></td>
<td>
List of the mention begin offsets in the original normalized text.
</td>
</tr>
<tr>
<td><code><b>google:mentionscontents</b></code></td>
<td><code>java.util.List&lt;String&gt;</code></td>
<td>
List of the mentions.
</td>
</tr>
<tr>
<td><code><b>google:mentionstypes</b></code></td>
<td><code>java.util.List&lt;String&gt;</code></td>
<td>
List of the mention types.
</td>
</tr>
</tbody>
</table>
<br/>
<span id="stanford:xxx" class="section-sub-title">Token ID <code>stanford:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
In these token IDs, <code>xxx</code> is the lowercase name of a named entity
in <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, e.g.
<code>stanford:person</code> or <code>stanford:location</code>.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>stanford:confidence</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Correctness probability of this token by Stanford CoreNLP.</td>
</tr>
<tr>
<td><code><b>stanford:nne</b></code></td>
<td><code>java.lang.String</code></td>
<td>
Normalized Named Entity (NNE) text.
</td>
</tr>
</tbody>
</table>
<br/>
<span id="spacy:xxx" class="section-sub-title">Token ID <code>spacy:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
In these token IDs, <code>xxx</code> is the lowercase name of a named entity
in <a target=_ href="https://spacy.io/">spaCy</a>, e.g.
<code>spacy:person</code> or <code>spacy:location</code>.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>spacy:vector</b></code></td>
<td><code>java.lang.Double</code></td>
<td>spaCy span vector. </td>
</tr>
<tr>
<td><code><b>spacy:sentiment</b></code></td>
<td><code>java.lang.Double</code></td>
<td>
A scalar value indicating the positivity or negativity of the token.
</td>
</tr>
</tbody>
</table>
<br/>
<span id="opennlp:xxx" class="section-sub-title">Token ID <code>opennlp:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
In these token IDs, <code>xxx</code> is the lowercase name of a named entity
in <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, e.g.
<code>opennlp:person</code> or <code>opennlp:money</code>.
In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following
metadata properties, all of which are <b>mandatory</b> unless otherwise noted.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Property</th>
<th>Java Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code><b>opennlp:probability</b></code></td>
<td><code>java.lang.Double</code></td>
<td>Correctness probability of this token by OpenNLP.</td>
</tr>
</tbody>
</table>
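<p>
The Google, Stanford CoreNLP and OpenNLP tables each list a <code>java.lang.Double</code> property described as
a correctness probability, under a different name per provider. A model that mixes providers may want one
lookup keyed by the token ID prefix. The sketch below assumes the metadata is available as a plain
<code>java.util.Map</code>; the class <code>NerScoreExample</code>, its methods and the threshold parameter are
hypothetical and not part of the NLPCraft API.
</p>

```java
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical helper: uniform access to per-provider score properties.
final class NerScoreExample {
    // Score property for each provider prefix, as listed in the provider tables.
    private static final Map<String, String> SCORE_KEYS = Map.of(
        "google", "google:salience",
        "stanford", "stanford:confidence",
        "opennlp", "opennlp:probability"
    );

    // Returns the provider score for a token ID such as 'stanford:person',
    // or empty when the provider has no score property or the value is missing.
    static OptionalDouble score(String tokenId, Map<String, Object> meta) {
        int colon = tokenId.indexOf(':');
        if (colon < 0)
            return OptionalDouble.empty();
        String key = SCORE_KEYS.get(tokenId.substring(0, colon));
        Object v = key == null ? null : meta.get(key);
        return v instanceof Double ? OptionalDouble.of((Double) v) : OptionalDouble.empty();
    }

    // True only when a score is present and at least 'threshold'.
    static boolean isConfident(String tokenId, Map<String, Object> meta, double threshold) {
        OptionalDouble s = score(tokenId, meta);
        return s.isPresent() && s.getAsDouble() >= threshold;
    }
}
```

<p>
Tokens from providers without a listed score, such as <code>spacy:xxx</code>, yield an empty result here, so the
caller decides explicitly how to treat them.
</p>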
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Model Overview</a></li>
<li><a href="#dataflow">Model Dataflow</a></li>
<li><a href="#lifecycle">Model Lifecycle</a></li>
<li><a href="#config">Model Configuration</a></li>
<li><a href="#ne">Named Entities</a></li>
<li><a href="#elements">Model Elements</a></li>
<li><a class="toc2" href="#macros">Macros</a></li>
<li><a class="toc2" href="#regex">Regular Expressions</a></li>
<li><a class="toc2" href="#option-groups">Option Groups</a></li>
<li><a class="toc2" href="#dsl">IDL Expression</a></li>
<li><a class="toc2" href="#programmable_ners">Programmable NERs</a></li>
<li><a href="#logic">Model Logic</a></li>
<li><a href="#builtin">Built-In Tokens</a></li>
<li><a href="#meta">Token Metadata</a></li>
<li><a class="toc2" href="#nlpcraft:nlp"><code>nlpcraft:nlp</code></a></li>
<li><a class="toc2" href="#nlpcraft:date"><code>nlpcraft:date</code></a></li>
<li><a class="toc2" href="#nlpcraft:num"><code>nlpcraft:num</code></a></li>
<li><a class="toc2" href="#nlpcraft:city"><code>nlpcraft:city</code></a></li>
<li><a class="toc2" href="#nlpcraft:continent"><code>nlpcraft:continent</code></a></li>
<li><a class="toc2" href="#nlpcraft:subcontinent"><code>nlpcraft:subcontinent</code></a></li>
<li><a class="toc2" href="#nlpcraft:region"><code>nlpcraft:region</code></a></li>
<li><a class="toc2" href="#nlpcraft:country"><code>nlpcraft:country</code></a></li>
<li><a class="toc2" href="#nlpcraft:metro"><code>nlpcraft:metro</code></a></li>
<li><a class="toc2" href="#nlpcraft:coordinate"><code>nlpcraft:coordinate</code></a></li>
<li><a class="toc2" href="#nlpcraft:sort"><code>nlpcraft:sort</code></a></li>
<li><a class="toc2" href="#nlpcraft:limit"><code>nlpcraft:limit</code></a></li>
<li><a class="toc2" href="#nlpcraft:relation"><code>nlpcraft:relation</code></a></li>
<li><a class="toc2" href="#google:xxx"><code>google:xxx</code></a></li>
<li><a class="toc2" href="#stanford:xxx"><code>stanford:xxx</code></a></li>
<li><a class="toc2" href="#spacy:xxx"><code>spacy:xxx</code></a></li>
<li><a class="toc2" href="#opennlp:xxx"><code>opennlp:xxx</code></a></li>
{% include quick-links.html %}
</ul>
</div>