| --- |
| active_crumb: Data Model |
| layout: documentation |
| id: data_model |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column"> |
| <section id="overview"> |
| <h2 class="section-title">Model Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| A data model is a central concept in NLPCraft: it defines the natural language interface to your data sources, |
| such as a database or a SaaS application. |
| NLPCraft employs a <em>model-as-a-code</em> approach where the entire data model is an implementation of the |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface, which |
| can be developed using any JVM programming language like Java, Scala, Kotlin, or Groovy. |
| </p> |
| <p> |
| A data model defines: |
| </p> |
| <ul> |
| <li>Set of model <a href="#elements">elements</a> (a.k.a. named entities) to be detected in the user input.</li> |
| <li>Zero or more intents and their callbacks.</li> |
| <li>Common model configuration and various life-cycle callbacks.</li> |
| </ul> |
| <p> |
| Note that the model-as-a-code approach natively supports any software life |
| cycle tools and frameworks, such as build tools, CI/SCM tools, and IDEs. |
| You don't have to use additional web-based tools to manage some aspects of your |
| data models: your entire model and all of its components are part of your project source code. |
| </p> |
| <p> |
| Here are two quick examples of fully functional data model implementations (from the <a href="/examples/light_switch.html">Light Switch</a> and |
| <a href="/examples/alarm_clock.html">Alarm Clock</a> examples). You will find specific details about these |
| implementations in the following sections: |
| </p> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#lightswitch" role="tab"><b>LightSwitch <code><sub>ex</sub></code></b></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#alarm" role="tab"><b>Alarm <code><sub>ex</sub></code></b></a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="lightswitch" role="tabpanel"> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#lightswitch_scala_model" role="tab"><code>LightSwitchModel.scala</code></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#lightswitch_yaml_model" role="tab"><code>lightswitch_model.yaml</code></a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="lightswitch_scala_model" role="tabpanel"> |
| <pre class="brush: scala"> |
| package org.apache.nlpcraft.examples.lightswitch |
| |
| import org.apache.nlpcraft.model.{NCIntentTerm, _} |
| |
| class LightSwitchModel extends NCModelFileAdapter("lightswitch_model.yaml") { |
| @NCIntentRef("ls") |
| @NCIntentSample(Array( |
| "Turn the lights off in the entire house.", |
| "Switch on the illumination in the master bedroom closet.", |
| "Get the lights on.", |
| "Lights up in the kitchen.", |
| "Please, put the light out in the upstairs bedroom.", |
| "Set the lights on in the entire house.", |
| "Turn the lights off in the guest bedroom.", |
| "Could you please switch off all the lights?", |
| "Dial off illumination on the 2nd floor.", |
| "Please, no lights!", |
| "Kill off all the lights now!", |
| "No lights in the bedroom, please.", |
| "Light up the garage, please!" |
| )) |
| def onMatch( |
| @NCIntentTerm("act") actTok: NCToken, |
| @NCIntentTerm("loc") locToks: List[NCToken] |
| ): NCResult = { |
| val status = if (actTok.getId == "ls:on") "on" else "off" |
| val locations = |
| if (locToks.isEmpty) |
| "entire house" |
| else |
| locToks.map(_.meta[String]("nlpcraft:nlp:origtext")).mkString(", ") |
| |
| // Add HomeKit, Arduino or other integration here. |
| |
| // By default - return a descriptive action string. |
| NCResult.text(s"Lights are [$status] in [${locations.toLowerCase}].") |
| } |
| } |
| </pre> |
| </div> |
| <div class="tab-pane fade show" id="lightswitch_yaml_model" role="tabpanel"> |
| <pre class="brush: js"> |
| id: "nlpcraft.lightswitch.ex" |
| name: "Light Switch Example Model" |
| version: "1.0" |
| description: "NLI-powered light switch example model." |
| macros: |
| - name: "<ACTION>" |
| macro: "{turn|switch|dial|let|set|get|put}" |
| - name: "<KILL>" |
| macro: "{shut|kill|stop|eliminate}" |
| - name: "<ENTIRE_OPT>" |
| macro: "{entire|full|whole|total|_}" |
| - name: "<FLOOR_OPT>" |
| macro: "{upstairs|downstairs|{1st|first|2nd|second|3rd|third|4th|fourth|5th|fifth|top|ground} floor|_}" |
| - name: "<TYPE>" |
| macro: "{room|closet|attic|loft|{store|storage} {room|_}}" |
| - name: "<LIGHT>" |
| macro: "{all|_} {it|them|light|illumination|lamp|lamplight}" |
| enabledBuiltInTokens: [] # This example doesn't use any built-in tokens. |
| |
| # |
| # Allows for multi-word synonyms in this entire model |
| # to be sparse and permutate them for better detection. |
| # These two properties generally enable a free-form |
| # natural language comprehension. |
| # |
| permutateSynonyms: true |
| sparse: true |
| |
| elements: |
| - id: "ls:loc" |
| description: "Location of lights." |
| synonyms: |
| - "<ENTIRE_OPT> <FLOOR_OPT> {kitchen|library|closet|garage|office|playroom|{dining|laundry|play} <TYPE>}" |
| - "<ENTIRE_OPT> <FLOOR_OPT> {master|kid|children|child|guest|_} {bedroom|bathroom|washroom|storage} {<TYPE>|_}" |
| - "<ENTIRE_OPT> {house|home|building|{1st|first} floor|{2nd|second} floor}" |
| |
| - id: "ls:on" |
| groups: |
| - "act" |
| description: "Light switch ON action." |
| synonyms: |
| - "<ACTION> {on|up|_} <LIGHT> {on|up|_}" |
| - "<LIGHT> {on|up}" |
| |
| - id: "ls:off" |
| groups: |
| - "act" |
| description: "Light switch OFF action." |
| synonyms: |
| - "<ACTION> <LIGHT> {off|out|down}" |
| - "{<ACTION>|<KILL>} {off|out|down} <LIGHT>" |
| - "<KILL> <LIGHT>" |
| - "<LIGHT> <KILL>" |
| - "{out|no|off|down} <LIGHT>" |
| - "<LIGHT> {out|off|down}" |
| |
| intents: |
| - "intent=ls term(act)={has(tok_groups, 'act')} term(loc)={# == 'ls:loc'}*" |
| </pre> |
| </div> |
| </div> |
| </div> |
| <div class="tab-pane fade show" id="alarm" role="tabpanel"> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#alarm_java_model" role="tab"><code>AlarmModel.java</code></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#alarm_intents_idl" role="tab"><code>intents.idl</code></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#alarm_json_model" role="tab"><code>alarm_model.json</code></a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="alarm_java_model" role="tabpanel"> |
| <pre class="brush: java"> |
| package org.apache.nlpcraft.examples.alarm; |
| |
| import org.apache.nlpcraft.model.*; |
| |
| import java.time.*; |
| import java.time.format.DateTimeFormatter; |
| import java.util.*; |
| |
| import static java.time.temporal.ChronoUnit.MILLIS; |
| |
| public class AlarmModel extends NCModelFileAdapter { |
| private static final DateTimeFormatter FMT = |
| DateTimeFormatter.ofPattern("HH'h' mm'm' ss's'").withZone(ZoneId.systemDefault()); |
| |
| private final Timer timer = new Timer(); |
| |
| public AlarmModel() { |
| // Loading the model from the file. |
| super("alarm_model.json"); |
| } |
| |
| @NCIntentRef("alarm") // Intent is defined in JSON model file (alarm_model.json and intents.idl). |
| @NCIntentSampleRef("alarm_samples.txt") // Samples supplied in an external file. |
| NCResult onMatch( |
| NCIntentMatch ctx, |
| @NCIntentTerm("nums") List<NCToken> numToks |
| ) { |
| long ms = calculateTime(numToks); |
| |
| assert ms >= 0; |
| |
| timer.schedule( |
| new TimerTask() { |
| @Override |
| public void run() { |
| System.out.println( |
| "BEEP BEEP BEEP for: " + ctx.getContext().getRequest().getNormalizedText() + "" |
| ); |
| } |
| }, |
| ms |
| ); |
| |
| return NCResult.text("Timer set for: " + FMT.format(LocalDateTime.now().plus(ms, MILLIS))); |
| } |
| |
| @Override |
| public void onDiscard() { |
| // Clean up when model gets discarded (e.g. during testing). |
| timer.cancel(); |
| } |
| |
| public static long calculateTime(List<NCToken> numToks) { |
| LocalDateTime now = LocalDateTime.now(); |
| LocalDateTime dt = now; |
| |
| for (NCToken num : numToks) { |
| String unit = num.meta("nlpcraft:num:unit"); |
| |
| // Skip possible fractional to simplify. |
| long v = ((Double)num.meta("nlpcraft:num:from")).longValue(); |
| |
| if (v <= 0) |
| throw new NCRejection("Value must be positive: " + unit); |
| |
| switch (unit) { |
| case "second": { dt = dt.plusSeconds(v); break; } |
| case "minute": { dt = dt.plusMinutes(v); break; } |
| case "hour": { dt = dt.plusHours(v); break; } |
| case "day": { dt = dt.plusDays(v); break; } |
| case "week": { dt = dt.plusWeeks(v); break; } |
| case "month": { dt = dt.plusMonths(v); break; } |
| case "year": { dt = dt.plusYears(v); break; } |
| |
| default: |
| // It shouldn't be an assertion, because 'datetime' unit can be extended outside. |
| throw new NCRejection("Unsupported time unit: " + unit); |
| } |
| } |
| |
| return now.until(dt, MILLIS); |
| } |
| } |
| </pre> |
| </div> |
| <div class="tab-pane fade show" id="alarm_intents_idl" role="tabpanel"> |
| <pre class="brush: idl"> |
| // Fragments (mostly for demo purposes here). |
| fragment=buzz term~{# == 'x:alarm'} |
| fragment=when |
| term(nums)~{ |
| // Demonstrating term variables. |
| @type = meta_tok('nlpcraft:num:unittype') |
| @iseq = meta_tok('nlpcraft:num:isequalcondition') // Excludes conditional statements. |
| |
| # == 'nlpcraft:num' && @type == 'datetime' && @iseq == true |
| }[1,7] |
| |
| // Intents (using fragments). |
| intent=alarm |
| fragment(buzz) |
| fragment(when) |
| </pre> |
| </div> |
| <div class="tab-pane fade show" id="alarm_json_model" role="tabpanel"> |
| <pre class="brush: js"> |
| { |
| "id": "nlpcraft.alarm.ex", |
| "name": "Alarm Example Model", |
| "version": "1.0", |
| "description": "Alarm example model.", |
| "enabledBuiltInTokens": [ |
| "nlpcraft:num" |
| ], |
| "elements": [ |
| { |
| "id": "x:alarm", |
| "description": "Alarm token indicator.", |
| "synonyms": [ |
| "{ping|buzz|wake|call|hit} {me|up|me up|_}", |
| "{set|_} {my|_} {wake|wake up|_} {alarm|timer|clock|buzzer|call} {clock|_} {up|_}" |
| ] |
| } |
| ], |
| "intents": [ |
| "import('intents.idl')" // Import intents from external file. |
| ] |
| } |
| </pre> |
| </div> |
| </div> |
| </div> |
| </div> |
| <p> |
| Further sub-sections will provide details on model's static configuration and dynamic programmable |
| logic implementation. |
| </p> |
| </section> |
| <section id="dataflow"> |
| <h2 class="section-title">Model Dataflow <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <figure> |
| <img alt="data model dataflow" class="img-fluid" src="/images/homepage-fig1.1.png"> |
| <figcaption><b>Fig 1.</b> NLPCraft Architecture</figcaption> |
| </figure> |
| <p> |
| Let's review the general dataflow of a user request in NLPCraft (from right to left in the figure above). |
| A user request starts with the user application (like a chatbot or NLI-based system) making a |
| REST call using the <a href="/using-rest.html">NLPCraft REST API</a>. That REST call carries, among |
| other things, the input text and the data model ID, and it arrives first at the REST server. |
| </p> |
| <p> |
| Upon receiving the user request, the REST server performs NLP pre-processing converting the input |
| text into a sequence of tokens and enriching them with additional information. |
| Once finished, the sequence of tokens is sent further down to the probe where the requested data model |
| is deployed. |
| </p> |
| <p> |
| Upon receiving that sequence of tokens, the data probe further |
| enriches it based on the user data model and <a href="/intent-matching.html">matches</a> it against declared intents. When a matching |
| intent is found its callback method is called and its result travels back from the data probe to the |
| REST server and eventually to the user that made the REST call. |
| </p> |
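| <p> |
| As a rough illustration of the first step above, the following sketch builds (but does not send) the kind of |
| REST call a user application might make. The endpoint path <code>/api/v1/ask/sync</code>, the JSON field names |
| <code>acsTok</code>, <code>txt</code> and <code>mdlId</code>, and the <code>localhost:8081</code> address are |
| assumptions made for this example; check the <a href="/using-rest.html">REST API</a> documentation for the exact contract. |
| </p> |

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class AskRequestSketch {
    // Minimal JSON string quoting: escapes only backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the request body carrying the input text and the data model ID.
    // Field names are assumptions for this sketch, not taken from this page.
    static String askBody(String accessToken, String text, String modelId) {
        return "{" +
            "\"acsTok\": " + quote(accessToken) + ", " +
            "\"txt\": " + quote(text) + ", " +
            "\"mdlId\": " + quote(modelId) +
            "}";
    }

    // Builds the POST request; the endpoint path is assumed for illustration.
    static HttpRequest askRequest(String baseUrl, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/ask/sync"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        String body = askBody(
            "ACCESS_TOKEN", // Placeholder: a real token comes from signing in.
            "Turn the lights off in the entire house.",
            "nlpcraft.lightswitch.ex"
        );
        HttpRequest req = askRequest("http://localhost:8081", body);

        System.out.println(req.method() + " " + req.uri());
        System.out.println(body);
    }
}
```

| <p> |
| The request is only constructed here, so the sketch runs without a REST server. Sending it would additionally |
| require a running server and a valid access token. |
| </p> |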
| <div class="bq info"> |
| <p> |
| <b>Security <span class="amp">&</span> Isolation</b> |
| </p> |
| <p> |
| Note that in this architecture the user-defined data model is fully isolated from the REST server accepting |
| user calls. Users never access data probes, and hence data models, directly. Typically, the REST server |
| should be deployed in a DMZ and only <em>ingress connectivity is required</em> from the REST server to data probes. |
| </p> |
| </div> |
| </section> |
| <section id="lifecycle"> |
| <h2 class="section-title">Model Lifecycle <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| A data model is an implementation of the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface. |
| The <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface has |
| defaults for most of its methods. These are the only methods that must be implemented by its sub-class: |
| </p> |
| <ul> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId()">getId()</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName()">getName()</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion()">getVersion()</a></li> |
| </ul> |
| <p> |
| You can either implement <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> |
| interface directly or use one of the adapters (recommended in most cases): |
| </p> |
| <ul> |
| <li> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelAdapter.html">NCModelAdapter</a> - when |
| entire model definition is in sub-class source code. |
| </li> |
| <li> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> - when |
| using external JSON/YAML declaration for model definition. |
| </li> |
| </ul> |
| <p> |
| Note that you can also use 3rd party IoC frameworks like <a target=_ href="https://spring.io">Spring</a> to construct your data models. See |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFactory.html">NCModelFactory</a> for more information. |
| </p> |
| <div class="bq success"> |
| <div class="bq-idea-container"> |
| <div><div>💡</div></div> |
| <div> |
| <p> |
| <b>Using Adapters</b> |
| </p> |
| <p> |
| It is recommended to use one of the adapter classes when defining your |
| own data model in most use cases. |
| </p> |
| </div> |
| </div> |
| </div> |
| <h2 id="deployment" class="section-sub-title">Deployment <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Data models get <a href="/server-and-probe.html">deployed</a> to and hosted by data probes. A data probe is a lightweight |
| container whose job is to host data models and securely transfer requests between the REST server and the data |
| models. When a data probe starts, it reads its <a href="/server-and-probe.html">configuration</a> |
| to see which models to deploy. |
| </p> |
| <p> |
| Note that data probes don't support hot-redeployment. To redeploy a data model you need to restart |
| the data probe. Note also that a data probe can be started in <a href="/tools/embedded_probe.html">embedded mode</a>, i.e. it can be started |
| from within an existing JVM process such as the user application. |
| </p> |
| <h2 id="callbacks" class="section-sub-title">Callbacks <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| There are two lifecycle callbacks on the |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface |
| (by way of extending the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html">NCLifecycle</a> interface) that you can override to affect the default lifecycle behavior: |
| </p> |
| <ul> |
| <li> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onInit()">onInit()</a> - called |
| right after the model was loaded and deployed. |
| </li> |
| <li> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onDiscard()">onDiscard()</a> - called to |
| discard the data model when, and only when, the data probe is shutting down in an orderly fashion. |
| </li> |
| </ul> |
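| <p> |
| The two lifecycle callbacks above are typically used in pairs: a resource is acquired in <code>onInit()</code> |
| and released in <code>onDiscard()</code>. The sketch below shows that pattern in isolation. The <code>Lifecycle</code> |
| interface and the <code>TimerBackedModel</code> class are hypothetical local stand-ins so that the example compiles |
| without NLPCraft on the classpath; a real model would override the same two methods on its model class instead. |
| </p> |

```java
import java.util.Timer;

public class LifecycleSketch {
    // Hypothetical local stand-in for the two lifecycle callbacks described above.
    interface Lifecycle {
        default void onInit() {}
        default void onDiscard() {}
    }

    // Hypothetical model that owns a timer for the duration of its deployment.
    static class TimerBackedModel implements Lifecycle {
        private Timer timer;

        boolean isRunning() {
            return timer != null;
        }

        @Override
        public void onInit() {
            // Acquire resources once the model has been loaded and deployed.
            timer = new Timer("model-timer", true);
        }

        @Override
        public void onDiscard() {
            // Release resources when the model is discarded.
            if (timer != null) {
                timer.cancel();
                timer = null;
            }
        }
    }

    public static void main(String[] args) {
        TimerBackedModel mdl = new TimerBackedModel();

        mdl.onInit();
        System.out.println("Running after onInit: " + mdl.isRunning());

        mdl.onDiscard();
        System.out.println("Running after onDiscard: " + mdl.isRunning());
    }
}
```

| <p> |
| Creating the timer in <code>onInit()</code> rather than in a field initializer keeps resource acquisition tied |
| to deployment rather than to object construction. This is a design choice of this sketch, not a requirement. |
| </p> |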
| <p> |
| There are also several callbacks that you can override to affect model behavior during |
| <a href="/intent-matching.html#model_callbacks">intent matching</a> |
| to perform logging, debugging, statistics or usage collection, explicit update or initialization of |
| conversation context, security audit or validation: |
| </p> |
| <ul> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onParsedVariant(org.apache.nlpcraft.model.NCVariant)">onParsedVariant(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext(org.apache.nlpcraft.model.NCContext)">onContext(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent(org.apache.nlpcraft.model.NCIntentMatch)">onMatchedIntent(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onResult(org.apache.nlpcraft.model.NCIntentMatch,org.apache.nlpcraft.model.NCResult)">onResult(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onError(org.apache.nlpcraft.model.NCContext,java.lang.Throwable)">onError(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onRejection(org.apache.nlpcraft.model.NCIntentMatch,org.apache.nlpcraft.model.NCRejection)">onRejection(...)</a> |
| </li> |
| </ul> |
| <div class="bq info"> |
| <b>Conversation Reset</b> |
| <p> |
| Callbacks |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext(org.apache.nlpcraft.model.NCContext)">onContext(...)</a> and |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent(org.apache.nlpcraft.model.NCIntentMatch)">onMatchedIntent(...)</a> |
| are especially handy for performing a soft reset on the conversation context. Read their Javadoc documentation |
| to understand the protocol of these callbacks. |
| </p> |
| </div> |
| |
| <div class="bq info"> |
| <b>Lifecycle Components</b> |
| <p> |
| Note that both the server and the probe provide their own support for lifecycle components. When registered in |
| the probe or server configuration, the lifecycle components will be called |
| during various stages of the probe or server startup or shutdown procedures. These callbacks can be used |
| to control the lifecycle of external libraries and systems that the data probe or the server rely on, e.g. |
| <a href="metrics-and-tracing.html">OpenCensus exporters</a>, security environment, devops hooks, etc. |
| </p> |
| <p> |
| See server and probe <a href="">configuration</a>. |
| </p> |
| </div> |
| </section> |
| <section id="config"> |
| <h2 class="section-title">Model Configuration <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Apart from the mandatory model <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId()">ID</a>, |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName()">name</a> and |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion()">version</a> |
| there are a number of static model configuration properties that you can set. All of these properties have sensible |
| defaults that you can override, when required, either in sub-classes or via an external JSON/YAML declaration: |
| </p> |
| <ul> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getAdditionalStopWords()">getAdditionalStopWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">getEnabledBuiltInTokens</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getExcludedStopWords()">getExcludedStopWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxFreeWords()">getMaxFreeWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxSuspiciousWords()">getMaxSuspiciousWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTokens()">getMaxTokens</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTotalSynonyms()">getMaxTotalSynonyms</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxUnknownWords()">getMaxUnknownWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxWords()">getMaxWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMetadata()">getMetadata</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinNonStopwords()">getMinNonStopwords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinTokens()">getMinTokens</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinWords()">getMinWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getSuspiciousWords()">getSuspiciousWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isDupSynonymsAllowed()">isDupSynonymsAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed()">isNonEnglishAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoNounsAllowed()">isNoNounsAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNotLatinCharsetAllowed()">isNotLatinCharsetAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoUserTokensAllowed()">isNoUserTokensAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isPermutateSynonyms()">isPermutateSynonyms</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSparse()">isSparse</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed()">isSwearWordsAllowed</a></li> |
| </ul> |
| <h2 class="section-sub-title">External JSON/YAML Declaration <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| You can move all of the static model configuration into an external JSON or YAML file. To load that |
| configuration you need to use the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> |
| adapter when creating your data model. Below are JSON and YAML sample templates. You can find more details in the |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> Javadoc and in the |
| <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">examples</a>. |
| </p> |
| |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#model-json" role="tab">JSON</a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#model-yaml" role="tab">YAML</a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="model-json" role="tabpanel"> |
| <pre class="brush: js"> |
| { |
| "id": "user.defined.id", |
| "name": "User Defined Name", |
| "version": "1.0", |
| "description": "Short model description.", |
| "enabledBuiltInTokens": ["google:person", "google:location"], |
| "macros": [], |
| "metadata": {}, |
| "elements": [ |
| { |
| "id": "x:id", |
| "description": "", |
| "groups": [], |
| "parentId": "", |
| "synonyms": [], |
| "metadata": {}, |
| "values": [] |
| } |
| ], |
| ... |
| "intents": [] |
| } |
| </pre> |
| </div> |
| <div class="tab-pane fade show" id="model-yaml" role="tabpanel"> |
| <pre class="brush: js"> |
| id: "user.defined.id" |
| name: "User Defined Name" |
| version: "1.0" |
| description: "Short model description." |
| macros: |
| enabledBuiltInTokens: |
| elements: |
| - id: "x:id" |
| description: "" |
| synonyms: |
| groups: |
| values: |
| parentId: |
| metadata: |
| ... |
| intents: |
| </pre> |
| </div> |
| </div> |
| <div class="bq success"> |
| <div class="bq-idea-container"> |
| <div><div>💡</div></div> |
| <div> |
| Note that using JSON/YAML-based configuration is the <b>canonical way</b> of |
| creating data models in NLPCraft, as it cleanly separates static configuration from the model's |
| programmable logic. |
| </div> |
| </div> |
| </div> |
| </section> |
| <section id="ne"> |
| <h2 class="section-title">Named Entities <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| A named entity, also known as a model element or a token, is one of the main components defined by the NLPCraft data model. |
| A named entity is one or more individual words that have a consistent semantic meaning and typically denote a |
| real-world object, such as a person, location, number, date and time, organization, product, etc. Such |
| an object can be abstract or have a physical existence. |
| </p> |
| <p> |
| For example, in the following sentence: |
| </p> |
| <figure> |
| <img alt="named entities" class="img-fluid" src="/images/named-entities.png"> |
| <figcaption><b>Fig 2.</b> Named Entities</figcaption> |
| </figure> |
| <p> |
| the following named entities can be detected: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Words</th> |
| <th>Type</th> |
| <th>Normalized Value</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><b>Top 20</b></td> |
| <td><code>nlpcraft:limit</code></td> |
| <td>top 20</td> |
| </tr> |
| <tr> |
| <td><b>best pages</b></td> |
| <td><code>user:element</code></td> |
| <td>best pages</td> |
| </tr> |
| <tr> |
| <td><b>California USA</b></td> |
| <td><code>nlpcraft:geo</code></td> |
| <td>USA, California</td> |
| </tr> |
| <tr> |
| <td><b>last 3 months</b></td> |
| <td><code>nlpcraft:date</code></td> |
| <td>1/1/2021 - 4/1/2021</td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| In most cases named entities will have an associated <em>normalized value</em>. It is especially important for named entities that have many |
| notational forms such as time and date, currency, geographical locations, etc. For example, <code>New York</code>, |
| <code>New York City</code> and <code>NYC</code> all refer to the same "New York City, NY USA" location, which is the standard normalized form. |
| </p> |
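| <p> |
| The idea of a normalized value can be sketched as a simple lookup from notational forms to one standard form. |
| The class and method names below are hypothetical and this is not how NLPCraft implements normalization; the |
| sketch only illustrates the "many forms, one value" relationship using the New York example above. |
| </p> |

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class NormalizedValueSketch {
    private static final String NYC = "New York City, NY USA";

    // Known notational forms (lower-cased) mapped to their normalized value.
    private static final Map<String, String> FORMS = Map.of(
        "new york", NYC,
        "new york city", NYC,
        "nyc", NYC
    );

    // Returns the normalized value for the given surface form, if one is known.
    static Optional<String> normalize(String surface) {
        String key = surface.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");

        return Optional.ofNullable(FORMS.get(key));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"New York", "New York City", "NYC", "Boston"})
            System.out.println(s + " -> " + normalize(s).orElse("(no normalized value)"));
    }
}
```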
| <p> |
| The process of detecting named entities is called Named Entity Recognition (NER). A named entity can be detected in many ways: through a list of synonyms, by name, by rules, or by using |
| statistical techniques like neural networks trained on a large corpus of predefined data. NLPCraft natively supports synonym-based |
| named entity definitions as well as the ability to compose new named entities through the powerful <a href="/intent-matching.html">Intent Definition Language</a> (IDL) |
| by combining other named entities, including named entities from |
| <a href="/integrations.html">external projects</a> such as OpenNLP, spaCy or Stanford CoreNLP. |
| </p> |
| <p> |
| Named entities allow you to abstract from basic linguistic forms like nouns and verbs and deal with higher-level semantic |
| abstractions like geographical location or time when you are trying to understand the meaning of a sentence. |
| One of the main goals of named entities is to act as input ingredients for <a href="/intent-matching.html">intent matching</a>. |
| </p> |
| <div class="bq info"> |
| <p> |
| <b>😀 User Input → Named Entities → Parsing Variants → Intent Matcher → Winning Intent 🚀</b> |
| </p> |
| <p> |
| User input is parsed into a list of named entities. That list is then further transformed into one or more |
| parsing variants, where each variant represents a particular order and combination of detected named entities. |
| Finally, the list of variants acts as an input to intent matching, where each variant is matched against every intent |
| in the process of detecting the best matching intent for the original user input. |
| </p> |
| </div> |
| </section> |
| <section id="elements"> |
| <h2 class="section-title">Model Elements <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Data model element defines a named entity that will be detected in the user input. |
| Model element is an implementation of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> |
| interface. <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> provides |
| its elements via <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getElements()">getElements()</a> method. |
| Typically, you create model elements by either: |
| </p> |
| <ul> |
| <li> |
| Implementing <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> interface directly, or |
| </li> |
| <li> |
| Using JSON or YAML static model configuration (the preferred way in most cases). |
| </li> |
| </ul> |
| <p> |
| Note that when you use an external static model configuration in JSON or YAML you can still modify it after it was loaded |
| using the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> |
| adapter. This is particularly convenient when synonyms or values are loaded separately from, or in |
| addition to, the model elements themselves, e.g. from a database or another file. |
| </p> |
| <div class="bq info"> |
| <p> |
| <b>Model Element <span class="amp">&</span> Named Entity <span class="amp">&</span> Token</b> |
| </p> |
| <p> |
| The terms 'model element', 'named entity' and 'token' are used throughout this documentation relatively interchangeably: |
| </p> |
| <dl> |
| <dt>Model Element</dt> |
| <dd> |
| Denotes a named entity <em>declared</em> in NLPCraft model. |
| </dd> |
| <dt>Token</dt> |
| <dd> |
| Denotes a model element that was <em>detected</em> by NLPCraft in the user input. |
| </dd> |
| <dt>Named Entity</dt> |
| <dd> |
| Denotes a classic term, i.e. one or more individual words that have a |
| consistent semantic meaning and typically define a real-world object. |
| </dd> |
| </dl> |
| </div> |
| <p> |
| Although model element and named entity describe a similar concept, NLPCraft model |
| elements provide a much more powerful instrument. Unlike named entity support in other projects, |
| NLPCraft model elements have a number of unique capabilities: |
| </p> |
| <ul> |
| <li> |
| New model elements can be added declaratively via a subset of NLPCraft <a href="/intent-matching.html">IDL</a>, regex and macro expansion. |
| </li> |
| <li> |
| New model elements can also be added programmatically for ultimate flexibility. |
| </li> |
| <li> |
| Model elements can have many-to-many group memberships. |
| </li> |
| <li> |
| Model elements can form a hierarchical structure. |
| </li> |
| <li> |
| Model elements are composable, i.e. a model element can use other model elements in its definition. |
| </li> |
| <li> |
| Model elements can be declared with user-defined metadata. |
| </li> |
| <li> |
| Model elements provide normalized values and can define their own "proper nouns". |
| </li> |
| <li> |
| Model elements can compose named entities from many <a href="integrations.html#nlp">3rd party libraries</a>. |
| </li> |
| <li> |
| All properties of model elements (id, groups, parent & ancestors, values, and metadata) can be used in NLPCraft <a href="/intent-matching.html">IDL</a>. |
| </li> |
| </ul> |
| <h2 class="section-title">User vs. Built-In Elements <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| In addition to the model elements that are defined by the user in the data model (i.e. <em>user model elements</em>) |
| NLPCraft provides its own <a href="#builtin">built-in named entities</a> as well as integration with a number of <a href="integrations.html#nlp">3rd party projects</a>. You can think of these built-in elements as if they were implicitly defined in your model - you |
| can use them in exactly the same way as if you defined them yourself. |
| You can find more information on how to configure external token providers |
| in <a href="/integrations.html#nlp">Integrations</a> section. |
| </p> |
| <p> |
| Note that you can't directly change the group membership, parent-child relationship or metadata of the |
| built-in elements. You can, however, "wrap" a built-in entity in your own element by using the <code>^^{tok_id() == 'external.id'}^^</code> |
| <a href="/intent-matching.html">IDL</a> expression as its synonym and defining all necessary additional |
| configuration properties on the wrapping element (more on that below). |
| </p> |
| <span id="synonyms" class="section-sub-title">Synonyms <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| NLPCraft uses fully deterministic named entity recognition and is not based on statistical approaches that |
| would require pre-existing marked up data sets and extensive training. For each model element you can either provide a |
| set of synonyms to match on or specify a piece of code that would be responsible for detecting that named |
| entity (discussed below). A synonym can have one or more individual words. Note that an element's ID is its |
| implicit synonym, so even if no additional synonyms are defined at least one synonym always exists. Note |
| also that synonym matching is performed on <em>normalized</em> and <em>stemmatized</em> forms of both |
| a synonym and user input. |
| </p> |
| <p> |
| Here's an example of a simple model element definition in JSON: |
| </p> |
| <pre class="brush: js, highlight: [6,7,8,9,10,11,12]"> |
| ... |
| "elements": [ |
| { |
| "id": "transport.vehicle", |
| "description": "Transportation vehicle", |
| "synonyms": [ |
| "car", |
| "truck", |
| "light duty truck", |
| "heavy duty truck", |
| "sedan", |
| "coupe" |
| ] |
| } |
| ] |
| ... |
| </pre> |
| <p> |
| While adding multi-word synonyms looks trivial, in real models the naive approach can lead to |
| thousands or even tens of thousands of possible synonyms due to word, grammar, and linguistic |
| permutations, which quickly becomes untenable if performed manually. |
| </p> |
| <p> |
| NLPCraft provides effective tools for compact synonym representation. Instead of listing all possible |
| multi-word synonyms one by one you can use a combination of the following techniques: |
| </p> |
| <ul> |
| <li><a href="#macros">Macros</a></li> |
| <li><a href="#regex">Regular expressions</a></li> |
| <li><a href="#option-groups">Option Groups</a></li> |
| <li><a href="#dsl">IDL expressions</a></li> |
| <li><a href="#custom_ners">Programmable NERs</a></li> |
| </ul> |
| <p> |
| Each whitespace-separated string in the synonym can be either a regular word (as in the above transportation example, |
| where it will be matched using its normalized and stemmatized form) or one of the above expressions. |
| </p> |
| <p> |
| Note that this synonyms definition is also used in the following |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> methods: |
| </p> |
| <ul> |
| <li><code>getSynonyms()</code> - gets synonyms to match on.</li> |
| <li><code>getValues()</code> - gets values to match on (see <a href="#values">below</a>).</li> |
| </ul> |
| <span id="values" class="section-sub-title">Element Values <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| A model element can have an optional set of special synonyms called <em>values</em> or "proper nouns" for this element. |
| Unlike basic synonyms, each value is a pair of a name and a set of standard synonyms by which that value, |
| and ultimately its element, can be recognized in the user input. Note that the value name itself acts as an |
| implicit synonym even when no additional synonyms are added for that value. |
| </p> |
| <p> |
| When a model element is recognized it is made available to the model's matching logic as an instance of |
| the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> interface. |
| This interface has a method |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getValue()">getValue()</a> which |
| returns the name of the value, if any, by which |
| that model element was recognized. That value name can be further used in intent matching. |
| </p> |
| <p> |
| To understand the importance of the values consider the following changes to our transportation |
| example model: |
| </p> |
| <pre class="brush: js, highlight: [19,20,21,22,23,24,25,26,27,28,29,30]"> |
| ... |
| "macros": [ |
| { |
| "name": "<TRUCK_TYPE>", |
| "macro": "{light duty|heavy duty|half ton|1/2 ton|3/4 ton|one ton|super duty}" |
| } |
| ], |
| "elements": [ |
| { |
| "id": "transport.vehicle", |
| "description": "Transportation vehicle", |
| "synonyms": [ |
| "car", |
| "{<TRUCK_TYPE>|_} {pickup|_} truck", |
| "sedan", |
| "coupe" |
| ], |
| "values": [ |
| { |
| "value": "mercedes", |
| "synonyms": ["mercedes-ben{z|s}", "mb", "ben{z|s}"] |
| }, |
| { |
| "value": "bmw", |
| "synonyms": ["{bimmer|bimer|beemer}", "bayerische motoren werke"] |
| }, |
| { |
| "value": "chevrolet", |
| "synonyms": ["chevy"] |
| } |
| ] |
| } |
| ] |
| ... |
| </pre> |
| <p> |
| With that setup, the <code>transport.vehicle</code> element will be recognized by any of the following input strings: |
| </p> |
| <ul> |
| <li><code>car</code></li> |
| <li><code>benz</code> (with value <code>mercedes</code>)</li> |
| <li><code>3/4 ton pickup truck</code></li> |
| <li><code>light duty truck</code></li> |
| <li><code>chevy</code> (with value <code>chevrolet</code>)</li> |
| <li><code>bimmer</code> (with value <code>bmw</code>)</li> |
| <li><code>transport.vehicle</code></li> |
| </ul> |
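| <p> |
| The relationship between value synonyms and value names can be pictured as a reverse lookup table. |
| The following is a minimal sketch in plain Java, not NLPCraft's implementation: the class |
| <code>ValueLookupSketch</code> and its methods are hypothetical names invented for this illustration, |
| option groups are assumed to be already expanded, and normalization and stemming are reduced to trimming and lower-casing. |
| </p> |

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: maps already-expanded value synonyms
// back to the value name they belong to.
final class ValueLookupSketch {
    private final Map<String, String> synonymToValue = new HashMap<>();

    // Registers a value; the value name itself acts as an implicit synonym.
    void addValue(String name, List<String> synonyms) {
        synonymToValue.put(normalize(name), name);
        for (String syn : synonyms)
            synonymToValue.put(normalize(syn), name);
    }

    // Returns the value name for the given input, if any synonym matches.
    Optional<String> valueOf(String input) {
        return Optional.ofNullable(synonymToValue.get(normalize(input)));
    }

    // Simplified stand-in for real normalization and stemming (an assumption).
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ValueLookupSketch values = new ValueLookupSketch();
        values.addValue("mercedes", List.of("mercedes-benz", "mercedes-bens", "mb", "benz", "bens"));
        values.addValue("chevrolet", List.of("chevy"));
        System.out.println(values.valueOf("Chevy"));
        System.out.println(values.valueOf("car"));
    }
}
```

| <p> |
| In this sketch a plain synonym such as <code>car</code> yields no value name, which mirrors the fact that |
| an element can be recognized without any value being involved. |
| </p> |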
| <span id="groups" class="section-sub-title">Element Groups <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| Each model element always belongs to one or more groups. Model element provides its groups via |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html#getGroups()">getGroups()</a> method. |
| By default, if element group is not specified, the element ID will act as its default group ID. |
| Group membership is a quick and easy way to organise similar model elements together and use this |
| categorization in <a href="/intent-matching.html">IDL</a> intents. |
| </p> |
| <p> |
| Note that the proper grouping of the elements is also necessary for the correct operation of |
| Short-Term-Memory (STM) in the conversational context. Consider a |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> that |
| represents a previously found model element that is stored in the conversation. Such a token |
| will be overridden in the conversation by a more <b>recent token</b> |
| from the <b>same group</b> - a critical rule for maintaining the proper conversational context. |
| See |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a> |
| for more details. |
| </p> |
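| <p> |
| The override rule can be sketched in a few lines of plain Java. This is only an illustration and not the |
| NLPCraft conversation manager: <code>StmGroupSketch</code> and <code>Tok</code> are hypothetical names, and the |
| sketch assumes that sharing at least one group ID counts as being in the "same group". |
| </p> |

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the NLPCraft conversation manager.
final class StmGroupSketch {
    // Minimal stand-in for a detected token: its element ID and group IDs.
    static final class Tok {
        final String id;
        final Set<String> groups;

        Tok(String id, Set<String> groups) {
            this.id = id;
            this.groups = groups;
        }
    }

    private final List<Tok> stored = new ArrayList<>();

    // Stores a newly detected token, first evicting any older token
    // that shares at least one group with it (assumed rule).
    void add(Tok recent) {
        stored.removeIf(old -> !Collections.disjoint(old.groups, recent.groups));
        stored.add(recent);
    }

    // Returns a snapshot of the tokens currently kept in the conversation.
    List<Tok> tokens() {
        return List.copyOf(stored);
    }

    public static void main(String[] args) {
        StmGroupSketch stm = new StmGroupSketch();
        stm.add(new Tok("city.paris", Set.of("location")));
        stm.add(new Tok("weather.forecast", Set.of("weather.forecast")));
        stm.add(new Tok("city.london", Set.of("location")));
        // "city.paris" was evicted by the more recent "city.london" from the same group.
        stm.tokens().forEach(t -> System.out.println(t.id));
    }
}
```

| <p> |
| Note how the second token uses its own ID as its only group, matching the default described above, and is |
| therefore never displaced by tokens of other elements. |
| </p> |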
| <span id="parent" class="section-sub-title">Element Parent <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| Each model element can form an optional hierarchical relationship with another element by specifying its |
| parent element ID via <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html#getParentId()">getParentId()</a> method. |
| The main idea here is that sometimes model elements can act not only individually but |
| their place in the hierarchy can be important too. |
| </p> |
| <p> |
| For example, we could have designed our transportation example model in a different way by using |
| multiple model elements linked with this hierarchy: |
| </p> |
| <pre> |
| +-- vehicle |
| | +--truck |
| | | |-- light.duty.truck |
| | | |-- heavy.duty.truck |
| | | +-- medium.duty.truck |
| | +--car |
| | | |-- coupe |
| | | |-- sedan |
| | | |-- hatchback |
| | | +-- wagon |
| </pre> |
| <p> |
| Then in our intent, for example, we could look for any token with root parent ID <code>vehicle</code> |
| or immediate parent ID <code>truck</code> or <code>car</code> without a need to match on all current and |
| future individual sub-IDs. For example: |
| </p> |
| <pre class="brush: idl"> |
| intent=vehicle.intent term~{has(tok_ancestors, 'vehicle')} |
| intent=truck.intent term~{tok_parent == 'truck'} |
| intent=car.intent term~{tok_parent == 'car'} |
| </pre> |
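| <p> |
| To make the difference between "parent" and "ancestors" concrete, here is a small sketch in plain Java that |
| derives the ancestor chain from parent links. It is an illustration under stated assumptions, not NLPCraft code: |
| <code>ElementHierarchySketch</code>, <code>link()</code> and <code>ancestors()</code> are hypothetical names, and |
| the parent links are assumed to contain no cycles. |
| </p> |

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: element ID -> parent ID links.
final class ElementHierarchySketch {
    private final Map<String, String> parentOf = new HashMap<>();

    // Declares that element 'childId' has parent 'parentId'.
    void link(String childId, String parentId) {
        parentOf.put(childId, parentId);
    }

    // Walks up the parent links and returns the ancestors ordered from
    // the immediate parent to the root. Assumes there are no cycles.
    List<String> ancestors(String id) {
        List<String> result = new ArrayList<>();
        for (String p = parentOf.get(id); p != null; p = parentOf.get(p))
            result.add(p);
        return result;
    }

    public static void main(String[] args) {
        ElementHierarchySketch h = new ElementHierarchySketch();
        h.link("truck", "vehicle");
        h.link("car", "vehicle");
        h.link("light.duty.truck", "truck");
        h.link("sedan", "car");
        System.out.println(h.ancestors("light.duty.truck"));
        System.out.println(h.ancestors("sedan").contains("vehicle"));
    }
}
```

| <p> |
| In this sketch, a check for the immediate parent looks only at the first entry of the chain, while a check |
| for an ancestor looks at the whole chain, which is why a single <code>vehicle</code> test covers every |
| current and future sub-element. |
| </p> |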
| </section> |
| <section id="syns-tools"> |
| <span id="macros" class="section-sub-title">Macros <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| Listing all possible multi-word synonyms for a given element can be a time-consuming task. Macros |
| together with option groups allow for significant simplification of this task. |
| Macros allow you to give a name to an often used set of words or option groups and reuse it without |
| repeating those words or option groups again and again. A model provides a list of macros via |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMacros()">getMacros()</a> method. |
| Each macro has a name in the form of <code><X></code> where <code>X</code> |
| is any string, and a string value. Note that macros can be nested (but not recursive), i.e. a macro value can include |
| references to other macros. When macro name <code><X></code> is encountered in the synonym it gets recursively |
| replaced with its value. |
| </p> |
| <p> |
| Here's an example of macro definitions in JSON: |
| </p> |
| <pre class="brush: js"> |
| "macros": [ |
| { |
| "name": "<A>", |
| "macro": "aaa" |
| }, |
| { |
| "name": "<B>", |
| "macro": "<A> bbb" |
| }, |
| { |
| "name": "<C>", |
| "macro": "<A> bbb {z|w}" |
| } |
| ] |
| </pre> |
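| <p> |
| The nested replacement described above can be sketched as repeated textual substitution. The following plain Java |
| example is a simplified illustration, not the NLPCraft macro processor: <code>MacroExpansionSketch</code> and its |
| methods are hypothetical names, and the pass limit used to detect recursive macros is a design choice of this sketch. |
| </p> |

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the NLPCraft macro processor.
final class MacroExpansionSketch {
    private final Map<String, String> macros = new LinkedHashMap<>();

    // Registers a macro, e.g. add("<A>", "aaa").
    void add(String name, String value) {
        macros.put(name, value);
    }

    // Replaces macro references until a pass changes nothing. Non-recursive
    // macros need at most macros.size() changing passes, so exceeding that
    // limit means some macro refers to itself directly or indirectly.
    String expand(String synonym) {
        String text = synonym;
        for (int pass = 0; pass <= macros.size(); pass++) {
            String next = text;
            for (Map.Entry<String, String> m : macros.entrySet())
                next = next.replace(m.getKey(), m.getValue());
            if (next.equals(text))
                return text;
            text = next;
        }
        throw new IllegalArgumentException("Recursive macro in: " + synonym);
    }

    public static void main(String[] args) {
        MacroExpansionSketch m = new MacroExpansionSketch();
        m.add("<A>", "aaa");
        m.add("<B>", "<A> bbb");
        m.add("<C>", "<A> bbb {z|w}");
        System.out.println(m.expand("<B> {b|_} c"));
        System.out.println(m.expand("<C>"));
    }
}
```

| <p> |
| Note that the sketch only substitutes macro values. Any option groups in the result, such as |
| <code>{z|w}</code>, are left in place for a later expansion step. |
| </p> |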
| <span id="option-groups" class="section-sub-title">Option Groups <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| Option groups are similar to wildcard patterns that operate on a single-word basis. One line of |
| option group expands into one or more individual synonyms. Option groups are the key mechanism for shortened |
| synonym notation. The following examples demonstrate how to use option groups. |
| </p> |
| <p> |
| Consider the following macros (note that macros <code><B></code> and <code><C></code> |
| are nested): |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Name</th> |
| <th>Value</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><A></code></td> |
| <td><code>aaa</code></td> |
| </tr> |
| <tr> |
| <td><code><B></code></td> |
| <td><code><A> bbb</code></td> |
| </tr> |
| <tr> |
| <td><code><C></code></td> |
| <td><code><A> bbb {z|w}</code></td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| Then the following option group expansions will occur in these examples: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Synonym</th> |
| <th>Synonym Expansions</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><A> {b|_} c</code></td> |
| <td> |
| <code>"aaa b c"</code><br> |
| <code>"aaa c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code><A> {b|a}[1,2] c</code></td> |
| <td> |
| <code>"aaa b c"</code><br> |
| <code>"aaa b b c"</code><br> |
| <code>"aaa a c"</code><br> |
| <code>"aaa a a c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <code><B> {b|_} c</code><br> |
| or<br> |
| <code><B> {b}[0,1] c</code> |
| </td> |
| <td> |
| <code>"aaa bbb b c"</code><br> |
| <code>"aaa bbb c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>{b|\{\_\}}</code></td> |
| <td> |
| <code>"b"</code><br> |
| <code>"b {_}"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>a {b|_}. c</code></td> |
| <td> |
| <code>"a b. c"</code><br> |
| <code>"a . c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>a .{b, |_}. c</code></td> |
| <td> |
| <code>"a .b, . c"</code><br> |
| <code>"a .. c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code> |
| {% raw %}a {{b|c}|_}.{% endraw %}</code></td> |
| <td> |
| <code>"a ."</code><br> |
| <code>"a b."</code><br> |
| <code>"a c."</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>a {% raw %}{{{<C>}}|{_}}{% endraw %} c</code></td> |
| <td> |
| <code>"a aaa bbb z c"</code><br> |
| <code>"a aaa bbb w c"</code><br> |
| <code>"a c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>{% raw %}{{{a}}} {b||_|{{_}}||_}{% endraw %}</code></td> |
| <td> |
| <code>"a b"</code><br> |
| <code>"a"</code> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| Specifically: |
| </p> |
| <ul> |
| <li><code>{A|B}</code> denotes either <code>A</code> or <code>B</code>.</li> |
| <li> |
| <code>{A|B|_}</code> denotes either <code>A</code> or <code>B</code> or nothing. |
| <ul> |
| <li>Symbol <code>_</code> can appear anywhere in the list of options, i.e. <code>{A|B|_}</code> is equal to <code>{A|_|B}</code>.</li> |
| </ul> |
| </li> |
| <li> |
| <code>{C}[x,y]</code> denotes an option group with quantifier, i.e. group <code>C</code> appearing from <code>x</code> to <code>y</code> times inclusive. |
| <ul> |
| <li>For example, <code>{C}[1,3]</code> is the same as <code>{C|C C|C C C}</code> notation.</li> |
| <li>Note that <code>{C|_}</code> is equal to <code>{C}[0,1]</code></li> |
| </ul> |
| </li> |
| <li>Redundant curly brackets are ignored when it is safe to do so.</li> |
| <li>Macros cannot be recursive but can be nested.</li> |
| <li>Option groups can be nested.</li> |
| <li> |
| <code>'\'</code> (backslash) can be used to escape <code>'{'</code>, <code>'}'</code>, <code>'|'</code> and |
| <code>'_'</code> special symbols used by the option groups. |
| </li> |
| <li>Excess whitespace is trimmed when expanding option groups.</li> |
| </ul> |
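| <p> |
| To illustrate how such expansion can work, here is a small recursive-descent expander in plain Java. It is a |
| deliberately reduced sketch and not NLPCraft's parser: the class <code>OptionGroupSketch</code> is a hypothetical |
| name, macros are assumed to be already substituted, and quantifiers (<code>[x,y]</code>) and backslash escapes |
| are not supported. It handles alternatives, nesting, the <code>_</code> symbol, empty alternatives and whitespace trimming. |
| </p> |

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: no quantifiers, no escapes, no macros.
final class OptionGroupSketch {
    private final String src;
    private int pos = 0;

    private OptionGroupSketch(String src) {
        this.src = src;
    }

    // Expands all option groups in the synonym into individual synonyms.
    static Set<String> expand(String synonym) {
        OptionGroupSketch parser = new OptionGroupSketch(synonym);
        List<String> raw = parser.sequence();
        if (parser.pos != synonym.length())
            throw new IllegalArgumentException("Unexpected '|' or '}' in: " + synonym);
        Set<String> result = new LinkedHashSet<>();
        for (String s : raw)
            result.add(s.trim().replaceAll("\\s+", " ")); // Trims excess whitespace.
        return result;
    }

    // Parses text up to '|', '}' or end of input and returns every
    // combination of its literal parts and nested groups.
    private List<String> sequence() {
        List<String> acc = new ArrayList<>(List.of(""));
        while (pos < src.length() && src.charAt(pos) != '|' && src.charAt(pos) != '}') {
            List<String> part;
            if (src.charAt(pos) == '{') {
                part = group();
            } else {
                int start = pos;
                while (pos < src.length() && "{|}".indexOf(src.charAt(pos)) < 0)
                    pos++;
                part = List.of(src.substring(start, pos));
            }
            List<String> next = new ArrayList<>();
            for (String a : acc)
                for (String b : part)
                    next.add(a + b);
            acc = next;
        }
        return acc;
    }

    // Parses '{' alt ('|' alt)* '}' and returns the union of all alternatives.
    private List<String> group() {
        List<String> alts = new ArrayList<>();
        do {
            pos++; // Skips '{' on the first pass and '|' on later passes.
            for (String s : sequence())
                alts.add(s.trim().equals("_") ? "" : s); // '_' means "nothing".
        } while (pos < src.length() && src.charAt(pos) == '|');
        if (pos >= src.length() || src.charAt(pos) != '}')
            throw new IllegalArgumentException("Missing '}' in: " + src);
        pos++; // Skips '}'.
        return alts;
    }

    public static void main(String[] args) {
        expand("{light|heavy} duty {pickup|_} truck").forEach(System.out::println);
    }
}
```

| <p> |
| The expander returns a set, so duplicate synonyms produced by redundant alternatives collapse into one entry. |
| </p> |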
| <p> |
| We can rewrite our transportation model element in a more efficient way using macros and option groups. |
| Even though the length of the definition hasn't changed much, it now auto-generates many dozens of synonyms |
| that we would otherwise have to write out manually: |
| </p> |
| <pre class="brush: js, highlight: [4,5,14]"> |
| ... |
| "macros": [ |
| { |
| "name": "<TRUCK_TYPE>", |
| "macro": "{ {light|super|heavy|medium} duty|half ton|1/2 ton|3/4 ton|one ton}" |
| } |
| ], |
| "elements": [ |
| { |
| "id": "transport.vehicle", |
| "description": "Transportation vehicle", |
| "synonyms": [ |
| "car", |
| "{<TRUCK_TYPE>|_} {pickup|_} truck", |
| "sedan", |
| "coupe" |
| ] |
| } |
| ] |
| ... |
| </pre> |
| <span id="regex" class="section-sub-title">Regular Expressions <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| Any individual synonym word that starts and ends with <code>//</code> (two forward slashes) is |
| considered to be a Java regular expression as defined in <code>java.util.regex.Pattern</code>. Note that |
| a regular expression can only span a single word, i.e. only individual words from the user input will be |
| matched against the given regular expression and no whitespace is allowed within the regular expression. Note |
| also that option group special symbols <code>{</code>, <code>}</code>, |
| <code>|</code> and <code>_</code> have to be escaped in the regular expression using <code>\</code> |
| (backslash). |
| </p> |
| <p> |
| For example, the following synonym: |
| </p> |
| <pre class="brush: js"> |
| "synonyms": [ |
| "{foo|//bar.+//}" |
| ] |
| </pre> |
| <p> |
| will match the word <code>foo</code> or any other word that starts with <code>bar</code>; the regular |
| expression is applied to individual words, so the matched string never contains whitespace. |
| </p> |
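| <p> |
| The mapping from a <code>//...//</code> synonym word to <code>java.util.regex.Pattern</code> can be sketched as follows. |
| This is an illustration only: <code>RegexSynonymSketch</code> and its methods are hypothetical names, and the use of a |
| full-word match (<code>Matcher.matches()</code>) is an assumption of this sketch rather than documented NLPCraft behavior. |
| </p> |

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: how a '//regex//' synonym word
// could be turned into a java.util.regex.Pattern.
final class RegexSynonymSketch {
    // Returns true if the synonym word is written in the '//regex//' form
    // with a non-empty expression between the markers.
    static boolean isRegex(String synonymWord) {
        return synonymWord.length() > 4
            && synonymWord.startsWith("//")
            && synonymWord.endsWith("//");
    }

    // Compiles the expression found between the '//' markers.
    static Pattern compile(String synonymWord) {
        if (!isRegex(synonymWord))
            throw new IllegalArgumentException("Not a regex synonym: " + synonymWord);
        return Pattern.compile(synonymWord.substring(2, synonymWord.length() - 2));
    }

    // Matches one input word in full against the expression (assumed semantics).
    static boolean matches(String synonymWord, String inputWord) {
        return compile(synonymWord).matcher(inputWord).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("//bar.+//", "barley"));
        System.out.println(matches("//bar.+//", "foo"));
    }
}
```

| <p> |
| Under the full-match assumption, an expression such as <code>bar.+</code> accepts <code>barley</code> but not |
| <code>bar</code> itself, because <code>.+</code> requires at least one more character. |
| </p> |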
| <div class="bq info"> |
| <b>Regular Expressions Performance</b> |
| <p> |
| It's important to note that regular expressions can significantly affect the performance of |
| NLPCraft processing if used without restraint. Use them with caution and test the performance |
| of your model to ensure it meets your requirements. |
| </p> |
| </div> |
| <h2 id="dsl" class="section-sub-title">IDL Expressions <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Any individual synonym word that starts and ends with <code>^^</code> is an |
| <a href="/intent-matching.html#idl">IDL expression.</a> An IDL |
| expression inside of <code>^^ ... ^^</code> markers allows you to define a predicate on an already parsed and detected token. |
| It is very important to note that unlike all other synonyms the IDL expression operates on an |
| already detected <em>token</em>, not on an individual unparsed <em>word</em>. |
| </p> |
| <p> |
| IDL expressions allow you to <em>compose</em> named entities, i.e. use one named entity when defining another one. For example, |
| we could define a model element for the race car using our previous transportation example (note how synonym on |
| <b>line 18</b> |
| references the element defined on <b>line 4</b>): |
| </p> |
| <pre class="brush: js, highlight: [4, 18]"> |
| ... |
| "elements": [ |
| { |
| "id": "transport.vehicle", |
| "description": "Transportation vehicle", |
| "synonyms": [ |
| "car", |
| "truck", |
| "{light|heavy|super|medium} duty {pickup|_} truck", |
| "sedan", |
| "coupe" |
| ] |
| }, |
| { |
| "id": "race.vehicle", |
| "description": "Race vehicle", |
| "synonyms": [ |
| "{race|speed|track} ^^{# == 'transport.vehicle'}^^" |
| ] |
| } |
| |
| ] |
| ... |
| </pre> |
| <div class="bq warn"> |
| <p> |
| <b>Greedy NERs <span class="amp">&</span> Synonyms Conflicts</b> |
| </p> |
| <p> |
| Note that in the above example you need to ensure that words <code>race</code>, |
| <code>speed</code> or <code>track</code> are not part of the <code>transport.vehicle</code> |
| token. It is particularly important for the 3rd party NERs where specific rules about what |
| words can or cannot be part of the token are unclear or undefined. In such cases the only remedy is |
| to extensively test with 3rd party NERs and verify the synonyms recognition in data probe logs. |
| </p> |
| </div> |
| <p> |
| Another use case is to wrap 3rd party named entities to add group membership, metadata or hierarchical |
| relationship to the externally defined named entity. For example, you can wrap <code>google:location</code> |
| token and add group membership for <code>my_group</code> group: |
| </p> |
| <pre class="brush: js, highlight: [6,8]"> |
| ... |
| "elements": [ |
| { |
| "id": "google.loc.wrap", |
| "description": "Wrapper for google location", |
| "groups": ["my_group"], |
| "synonyms": [ |
| "^^{# == 'google:location'}^^" |
| ] |
| } |
| ] |
| ... |
| </pre> |
| <b>IDL Expression Syntax</b> |
| <p> |
| IDL expressions are a subset of the overall <a href="/intent-matching.html#idl">IDL syntax</a>. You can |
| review the formal |
| <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft/src/main/scala/org/apache/nlpcraft/model/intent/compiler/antlr4/NCIdl.g4">ANTLR4 grammar</a> |
| but basically |
| an IDL expression for a synonym is a term expression with an optional alias at the beginning. |
| Here's an example of an IDL expression defining a synonym for the population of any city in France: |
| </p> |
| <pre class="brush: js"> |
| "synonyms": [ |
| "population {of|for} ^^[city]{# == 'nlpcraft:city' && lowercase(meta_tok('city:country')) == 'france'}^^" |
| ] |
| </pre> |
| <b>NOTES:</b> |
| <ul> |
| <li>Optional alias <code>city</code> can be used to access a constituent part token (with ID <code>nlpcraft:city</code>).</li> |
| <li> |
| The expression between <code>{</code> and <code>}</code> brackets is a standard IDL term expression. |
| </li> |
| </ul> |
| <h2 id="custom_ners" class="section-sub-title">Custom NERs <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| By default, the data model detects its elements by their synonyms, regular expressions or IDL expressions. However, in some cases |
| these methods are either not expressive enough or cannot be used, for example, when detecting model elements based |
| on neural networks or when integrating with non-standard 3rd-party NER components. In such cases, a user-defined parser |
| can be provided for the model, allowing the user to define their own arbitrary NER logic to detect the model elements |
| in the user input programmatically. Note that a custom parser can detect any number of model elements. |
| </p> |
| <p> |
| Model provides its custom parsers via <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getParsers()">getParsers()</a> method. |
| </p> |
| </section> |
| <section id="logic"> |
| <h2 class="section-title">Model Logic <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| When a user sends a request via the REST API it is received by the REST server. Upon receipt, |
| the REST server does the basic NLP processing and enriching. Once finished, the REST server |
| sends the enriched request down to a specific data probe selected based on the requested data model. |
| </p> |
| <p> |
| The model logic is defined in <a href="intent-matching.html">intents</a>, specifically in the intent callbacks that get called when |
| their intent is chosen as a winning match against the user request. |
| Below we will quickly discuss the key APIs that are essential for developing intent callbacks. |
| Note that this does not replace the more detailed <a target=_ href="/apis/latest/index.html">Javadoc</a> |
| documentation that you are encouraged to read through as well: |
| </p> |
| <ul> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></li> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></li> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></li> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></li> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></li> |
| <li>Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></li> |
| </ul> |
| <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| This interface provides a read-only view of the data model. The model view defines the declarative, or configurable, part of the model. |
| All properties in this interface can be defined or overridden in an external JSON/YAML |
| representation when used with <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> adapter. |
| </p> |
| <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| This interface defines a context of a particular intent match. It can be passed into the callback of the matched intent |
| and provides the following: |
| </p> |
| <ul> |
| <li>ID of the matched intent.</li> |
| <li>Specific parsing variant that was matched against this intent.</li> |
| <li>Access to the original query context (<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a>).</li> |
| <li>Various access APIs for intent tokens.</li> |
| </ul> |
| <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| This interface provides all available data about the parsed user input and all its |
| supplemental information. It's accessible from the <code>NCIntentMatch</code> interface and |
| provides a large amount of information to the intent callback logic: |
| </p> |
| <ul> |
| <li> |
| Server request ID. Server request is defined as a processing of one user input sentence. |
| </li> |
| <li> |
| Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a> |
| for controlling STM of conversation manager and dialog flow. |
| </li> |
| <li> |
| Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a> |
| instance that the intent callback method belongs to giving access to entire static model configuration. |
| </li> |
| <li> |
| Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> that |
| provides detailed information about the user input. |
| </li> |
| <li> |
| List of parsing variants provided |
| by <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants()">getVariants()</a> |
| method. When the user sentence gets parsed into individual tokens (i.e. detected model elements) there is generally |
| more than one way to do it. This ambiguity is perfectly fine because only the data model has all the |
| necessary information to select one parsing variant that fits that model the best. Without the data model |
| there isn't enough context to determine which variant is the best fitting. |
| Method <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants()">getVariants()</a> |
| returns the list of all parsing variants for a given user input. |
| </li> |
| </ul> |
| <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> interface |
| is one of the several important entities in Data Model API that you as a model developer will be working with. You |
| should review its <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">Javadoc</a> but |
| here is an outline of the information it provides: |
| </p> |
| <ul> |
| <li> |
| Information about the user that issued the request. |
| </li> |
| <li> |
| User agent and remote address, if available, of the user's application that made the initial REST call. |
| </li> |
| <li> |
| Original request text, timestamp of its receipt, and server request ID. |
| </li> |
| </ul> |
| <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> object is another |
| key abstraction in Data Model API. A token is a detected model element and is a part of a fully parsed user input. |
| A sequence of tokens represents the parsed user input. A single token corresponds to one or more words, sequential |
| or not, in the user sentence. |
| </p> |
| <p> |
| Most of the token's information is stored in map-based metadata accessible via |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html#getMetadata()">getMetadata()</a> method. |
| Depending on the token ID each token will have a different set of <a href="#meta">metadata properties</a>. Some common NLP properties |
| are always present for tokens of all types. |
| </p> |
| <h2 class="section-sub-title">Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| This class defines the result returned from model's intent callbacks. Result consists of the |
| text body and the type. The result types are similar in notion to MIME type and have specific meaning only for REST applications |
| that interpret them accordingly. For example, a REST client interfacing between NLPCraft and Amazon Alexa or Apple HomeKit might |
| accept only the text result type and ignore everything else. |
| </p> |
| <h2 class="section-sub-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html">NCMetadata</a> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html">NCMetadata</a> |
| provides support for mutable runtime-only metadata. This interface can be used to attach user-defined runtime data |
| to a variety of different objects in the NLPCraft API. This interface is implemented by the following types: |
| </p> |
| <ul> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCompany.html">NCCompany</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCUser.html">NCUser</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomElement.html">NCCustomElement</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCDialogFlowItem.html">NCDialogFlowItem</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCVariant.html">NCVariant</a></li> |
| </ul> |
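| <p> |
| As a minimal sketch of the idea only, the snippet below models runtime metadata as a mutable, string-keyed map. |
| The class <code>RuntimeMeta</code>, its helper <code>metaOrDefault</code> and the <code>myapp:</code> keys are hypothetical |
| stand-ins introduced for illustration; this is not NLPCraft code, so consult the Javadoc linked above for the actual methods. |
| </p> |

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an object that carries runtime-only metadata.
// It only mirrors the idea of a mutable, string-keyed map; it is NOT NLPCraft code.
final class RuntimeMeta {
    private final Map<String, Object> meta = new HashMap<>();

    // Returns the live, mutable metadata map.
    Map<String, Object> getMetadata() {
        return meta;
    }

    // Typed read that falls back to a default when the key is absent.
    @SuppressWarnings("unchecked")
    <T> T metaOrDefault(String key, T dflt) {
        Object v = meta.get(key);
        return v != null ? (T) v : dflt;
    }

    public static void main(String[] args) {
        RuntimeMeta user = new RuntimeMeta();

        // Attach user-defined runtime data under an application-specific key.
        user.getMetadata().put("myapp:visits", 3);

        int visits = user.metaOrDefault("myapp:visits", 0);
        String tz = user.metaOrDefault("myapp:timezone", "UTC");

        System.out.println(visits + " " + tz);
    }
}
```

| <p> |
| The data lives only as long as the owning object does, which is what "runtime-only" means in this sketch. |
| </p> |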
| </section> |
| <section id="builtin"> |
| <h2 class="section-title">Built-In Tokens <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| NLPCraft provides a number of built-in model elements (i.e. tokens) including the |
| <a href="integrations.html">integration</a> with several popular 3rd party NER frameworks. The table |
| below provides information about these built-in tokens. The section about <a href="#meta">token metadata</a> provides |
| further information about <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCMetadata.html#getMetadata()">metadata</a> that each type of token carries. |
| </p> |
| <p> |
| Built-in tokens have to be explicitly enabled on both the REST server and in the model. See |
| <code>nlpcraft.server.tokenProviders</code> configuration property and |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView#getEnabledBuiltInTokens()</a> |
| method for more details. By default, only NLPCraft tokens are enabled (token ID |
| starting with <code>nlpcraft</code>). |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Token ID</th> |
| <th>Description</th> |
| <th>Example</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code>nlpcraft:nlp</code></td> |
| <td> |
| <p> |
| This token denotes a word (always a single word) that is not a part of any other token. It's |
| also called a free word, i.e. a word that is not linked to any other detected model element. |
| </p> |
| <p> |
| <b>NOTE:</b> the metadata from this token defines a common set of NLP properties and |
| is present in every other token as well. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li>Jamie goes <code>home</code> (assuming that a word 'home' does not belong to any model element).</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:date</code></td> |
| <td> |
| This token denotes a date range. It recognizes dates from 1900 up to 2023. Note that it does not |
| currently recognize a time component. |
| </td> |
| <td> |
| <ul> |
| <li>Meeting <code>next tuesday</code>.</li> |
| <li>Report for entire <code>2018 year</code>.</li> |
| <li>Data <code>from 1/1/2017 to 12/31/2018</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:num</code></td> |
| <td> |
| This token denotes a single numeric value or numeric condition. |
| </td> |
| <td> |
| <ul> |
| <li>Price <code>&gt; 100</code>.</li> |
| <li>Price is <code>less than $100</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:continent</code></td> |
| <td> |
| This token denotes a geographical continent. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Africa</code>.</li> |
| <li>Surface area of <code>America</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:subcontinent</code></td> |
| <td> |
| This token denotes a geographical subcontinent. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Alaskan peninsula</code>.</li> |
| <li>Surface area of <code>South America</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:region</code></td> |
| <td> |
| This token denotes a geographical region/state. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>California</code>.</li> |
| <li>Surface area of <code>South Dakota</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:country</code></td> |
| <td> |
| This token denotes a country. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>France</code>.</li> |
| <li>Surface area of <code>USA</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:city</code></td> |
| <td> |
| This token denotes a city. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Paris</code>.</li> |
| <li>Surface area of <code>Washington DC</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:metro</code></td> |
| <td> |
| This token denotes a metro area. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Cedar Rapids-Waterloo-Iowa City &amp; Dubuque, IA</code> metro area.</li> |
| <li>Surface area of <code>Norfolk-Portsmouth-Newport News, VA</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:sort</code></td> |
| <td> |
| This token denotes a sorting or ordering. |
| </td> |
| <td> |
| <ul> |
| <li>Report <code>sorted from top to bottom</code>.</li> |
| <li>Analysis <code>sorted in descending order</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:limit</code></td> |
| <td> |
| This token denotes a numerical limit. |
| </td> |
| <td> |
| <ul> |
| <li>Show <code>top 5</code> brands.</li> |
| <li>Show <code>several</code> brands.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:coordinate</code></td> |
| <td> |
| This token denotes latitude and longitude coordinates. |
| </td> |
| <td> |
| <ul> |
| <li>Route the path to <code>55.7558, 37.6173</code> location.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:relation</code></td> |
| <td> |
| This token denotes a relation function: |
| <code>compare</code> or |
| <code>correlate</code>. Note that this token always needs two other tokens that it references. |
| </td> |
| <td> |
| <ul> |
| <li> |
| What is the <code><b>correlation between</b></code> <code>price</code> <code><b>and</b></code> <code>location</code> |
| (assuming that 'price' and 'location' are also detected tokens). |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>google:xxx</code></td> |
| <td> |
| <p> |
| These tokens denote <code>xxx</code> that is a lower case name of the named entity |
| in <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, e.g. |
| <code>google:person</code>, <code>google:location</code>, etc. |
| </p> |
| <p> |
| See <a href="integrations.html#google">integration</a> section for more details on how |
| to configure Google named entity provider. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li> |
| Articles by <code>Ken Thompson</code>. |
| </li> |
| <li> |
| Best restaurants in <code>Paris</code>. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>opennlp:xxx</code></td> |
| <td> |
| <p> |
| These tokens denote <code>xxx</code> that is a lower case name of the named entity |
| in <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, e.g. |
| <code>opennlp:person</code>, <code>opennlp:money</code>, etc. |
| </p> |
| <p> |
| See <a href="integrations.html#opennlp">integration</a> section for more details on how |
| to configure Apache OpenNLP named entity provider. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li> |
| Articles by <code>Ken Thompson</code>. |
| </li> |
| <li> |
| Best restaurants under <code>100$</code>. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>spacy:xxx</code></td> |
| <td> |
| <p> |
| These tokens denote <code>xxx</code> that is a lower case name of the named entity |
| in <a target=_ href="https://spacy.io/">spaCy</a>, e.g. |
| <code>spacy:person</code>, <code>spacy:location</code>, etc. |
| </p> |
| <p> |
| See <a href="integrations.html#spacy">integration</a> section for more details on how |
| to configure spaCy named entity provider. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li> |
| Articles by <code>Ken Thompson</code>. |
| </li> |
| <li> |
| Best restaurants in <code>Paris</code>. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>stanford:xxx</code></td> |
| <td> |
| <p> |
| These tokens denote <code>xxx</code> that is a lower case name of the named entity |
| in <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, e.g. |
| <code>stanford:person</code>, <code>stanford:location</code>, etc. |
| </p> |
| <p> |
| See <a href="integrations.html#stanford">integration</a> section for more details on how |
| to configure Stanford CoreNLP named entity provider. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li> |
| Articles by <code>Ken Thompson</code>. |
| </li> |
| <li> |
| Best restaurants in <code>Paris</code>. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </section> |
| <section id="meta"> |
| <h2 class="section-title">Token Metadata <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Each token has a different set of metadata. The sections below describe metadata for each built-in token |
| supported by NLPCraft: |
| </p> |
| <ul> |
| <li><a href="#nlpcraft:nlp">Token ID <code>nlpcraft:nlp</code></a></li> |
| <li><a href="#nlpcraft:date">Token ID <code>nlpcraft:date</code></a></li> |
| <li><a href="#nlpcraft:num">Token ID <code>nlpcraft:num</code></a></li> |
| <li><a href="#nlpcraft:city">Token ID <code>nlpcraft:city</code></a></li> |
| <li><a href="#nlpcraft:continent">Token ID <code>nlpcraft:continent</code></a></li> |
| <li><a href="#nlpcraft:subcontinent">Token ID <code>nlpcraft:subcontinent</code></a></li> |
| <li><a href="#nlpcraft:region">Token ID <code>nlpcraft:region</code></a></li> |
| <li><a href="#nlpcraft:country">Token ID <code>nlpcraft:country</code></a></li> |
| <li><a href="#nlpcraft:metro">Token ID <code>nlpcraft:metro</code></a></li> |
| <li><a href="#nlpcraft:coordinate">Token ID <code>nlpcraft:coordinate</code></a></li> |
| <li><a href="#nlpcraft:sort">Token ID <code>nlpcraft:sort</code></a></li> |
| <li><a href="#nlpcraft:limit">Token ID <code>nlpcraft:limit</code></a></li> |
| <li><a href="#nlpcraft:relation">Token ID <code>nlpcraft:relation</code></a></li> |
| <li><a href="#stanford:xxx">Token ID <code>stanford:xxx</code></a></li> |
| <li><a href="#spacy:xxx">Token ID <code>spacy:xxx</code></a></li> |
| <li><a href="#google:xxx">Token ID <code>google:xxx</code></a></li> |
| <li><a href="#opennlp:xxx">Token ID <code>opennlp:xxx</code></a></li> |
| </ul> |
| <div class="bq info"> |
| <p> |
| <b>Metadata Name Conflicts</b> |
| </p> |
| <p> |
| Note that model element metadata gets merged into the same map container as common NLP token metadata |
| (see <code>nlpcraft:nlp:xxx</code> properties below). |
| In other words, they share the same namespace. It is important to remember that and choose unique names |
| for user-defined metadata properties. One possible way that is used by NLPCraft internally is to prefix |
| metadata name with some unique prefix based on the token ID. |
| </p> |
| </div> |
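| <p> |
| The prefixing advice above can be sketched as follows. This is an illustration under stated assumptions, not NLPCraft code: |
| the class <code>MetaKeys</code>, its methods and the element ID <code>x:order</code> are hypothetical, and a plain |
| <code>java.util.Map</code> stands in for the merged metadata container. |
| </p> |

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for building collision-free metadata names.
final class MetaKeys {
    // Builds a namespaced key such as "x:order:index" from an element ID and a property name.
    static String key(String elementId, String prop) {
        return elementId + ":" + prop;
    }

    // Adds a property and fails loudly if the name is already taken.
    static void putUnique(Map<String, Object> meta, String key, Object value) {
        if (meta.putIfAbsent(key, value) != null)
            throw new IllegalStateException("Metadata name conflict: " + key);
    }

    public static void main(String[] args) {
        // Stand-in for the shared container that already holds common NLP properties.
        Map<String, Object> merged = new HashMap<>();
        merged.put("nlpcraft:nlp:index", 2);

        // A prefixed user-defined property does not clash with the common one.
        putUnique(merged, key("x:order", "index"), 7);

        try {
            // Reusing an existing name is detected instead of silently overwriting it.
            putUnique(merged, "nlpcraft:nlp:index", 9);
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```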
| <span id="nlpcraft:nlp" class="section-sub-title">Token ID <code>nlpcraft:nlp</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token's metadata provides common basic NLP properties that are part of any token. |
| <b>All tokens</b> without exception have these metadata properties. This metadata |
| represents a common set of NLP properties for a given token. All these metadata properties are <b>mandatory</b>. |
| Note also that interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> |
| provides a direct access to most of these properties. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:nlp:unid</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Internal globally unique system ID of the token.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:bracketed</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td>Whether or not this token is surrounded by any of <code>'['</code>, <code>']'</code>, <code>'{'</code>, <code>'}'</code>, <code>'('</code>, <code>')'</code> brackets.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:freeword</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td>Whether or not this token represents a free word. A free word is a token that was detected as part of neither a user-defined nor a system token.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:direct</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td>Whether or not this token was matched on direct (not permutated) synonym.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:english</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this token represents an English word. Note that this only checks that token's text |
| consists of characters of the English alphabet, i.e. the text doesn't necessarily have to be a |
| known valid English word. See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed()" target="javadoc">NCModelView.isNonEnglishAllowed()</a> method |
| for corresponding model configuration. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:lemma</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Lemma of this token, i.e. a canonical form of this word. Note that stemming and |
| lemmatization reduce inflectional forms and sometimes derivationally related forms |
| of a word to a common base form. Lemmatization refers to the use of a vocabulary and |
| morphological analysis of words, normally aiming to remove inflectional endings only and to |
| return the base or dictionary form of a word, which is known as the lemma. |
| Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a> |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:stem</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Stem of this token. Note that stemming and lemmatization reduce inflectional forms |
| and sometimes derivationally related forms of a word to a common base form. Unlike lemma, |
| stemming is a basic heuristic process that chops off the ends of words in the hope of |
| achieving this goal correctly most of the time, and often includes the removal of derivational |
| affixes. |
| Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a> |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:pos</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Penn Treebank POS tag for this token. Note that in addition to the standard Penn Treebank POS |
| tags, NLPCraft introduces a synthetic '---' tag to indicate a POS tag for multiword tokens. |
| Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a> |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:posdesc</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Description of Penn Treebank POS tag. |
| Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a> |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:swear</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token is a swear word. NLPCraft has a built-in list of common English swear words. |
| See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed()" target="javadoc">NCModelView.isSwearWordsAllowed()</a> for corresponding model configuration. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:origtext</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Original user input text for this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:normtext</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Normalized user input text for this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:sparsity</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Numeric value of how sparse the token is. Sparsity zero means that all individual words in |
| the token follow each other. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:minindex</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Index of the first word in this token. Note that a token may not be contiguous. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:maxindex</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Index of the last word in this token. Note that a token may not be contiguous. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:wordindexes</b></code></td> |
| <td><code>java.util.List<Integer></code></td> |
| <td> |
| List of original word indexes in this token. Note that a token can have words that are not |
| contiguous in the original sentence. Always has at least one element in it. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:wordlength</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Number of individual words in this token. Equal to the size of <code>wordindexes</code> list. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:contiguous</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token has zero sparsity, i.e. consists of contiguous words. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:start</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Start character index of this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:end</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| End character index of this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:index</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Index of this token in the sentence. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:charlength</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Character length of this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:quoted</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token is surrounded by single or double quotes. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:stopword</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token is a stopword. Stopwords are extremely common words that |
| add little value in understanding user input and are excluded from the processing entirely. |
| For example, words like a, the, can, of, about, over, etc. are typical stopwords in English. |
| NLPCraft has a built-in set of stopwords. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:dict</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token is found in Princeton WordNet database. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
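| <p> |
| To illustrate how the word-index properties above relate to each other, here is a small sketch. It assumes the list of |
| word indexes has already been read from the token metadata, and it uses the number of skipped word positions as a |
| sparsity-like measure. That measure is an assumption made for this example: the table only states that zero sparsity |
| means the words follow each other. The class <code>WordIndexes</code> is hypothetical. |
| </p> |

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper working on a list shaped like 'nlpcraft:nlp:wordindexes'.
final class WordIndexes {
    // Number of word positions skipped between the first and the last word.
    // Assumes a non-empty list of distinct indexes.
    static int gaps(List<Integer> idxs) {
        int span = Collections.max(idxs) - Collections.min(idxs) + 1;
        return span - idxs.size();
    }

    // A token is contiguous when no position is skipped.
    static boolean isContiguous(List<Integer> idxs) {
        return gaps(idxs) == 0;
    }

    public static void main(String[] args) {
        List<Integer> adjacent = Arrays.asList(1, 2, 3); // words follow each other
        List<Integer> split = Arrays.asList(1, 3, 4);    // word at index 2 is not part of the token

        System.out.println(isContiguous(adjacent) + " " + gaps(adjacent));
        System.out.println(isContiguous(split) + " " + gaps(split));
    }
}
```

| <p> |
| In this sketch the smallest and largest entries play the role of <code>minindex</code> and <code>maxindex</code>, |
| and the list size plays the role of <code>wordlength</code>. |
| </p> |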
| <br/> |
| <span id="nlpcraft:date" class="section-sub-title">Token ID <code>nlpcraft:date</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a date range including single days. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties this type of token will have the following |
| metadata properties all of which are <b>mandatory</b>. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:date:from</b></code></td> |
| <td><code>java.lang.Long</code></td> |
| <td> |
| Start timestamp of the datetime range. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:date:to</b></code></td> |
| <td><code>java.lang.Long</code></td> |
| <td> |
| End timestamp of the datetime range. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
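| <p> |
| A short sketch of consuming these two properties follows. It rests on two assumptions that the table does not state: |
| that the timestamps are milliseconds since the Unix epoch, and that the end of the range is exclusive. |
| The class <code>DateRange</code> is hypothetical and a plain map stands in for the token metadata. |
| </p> |

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for the 'nlpcraft:date:from' and 'nlpcraft:date:to' properties.
final class DateRange {
    // Converts a timestamp to a calendar date in UTC (assumes epoch milliseconds).
    static LocalDate toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Checks whether a timestamp falls into [from, to); the exclusive end is an assumption.
    static boolean contains(Map<String, Object> meta, long epochMillis) {
        long from = (Long) meta.get("nlpcraft:date:from");
        long to = (Long) meta.get("nlpcraft:date:to");

        return epochMillis >= from && epochMillis < to;
    }

    public static void main(String[] args) {
        // Stand-in metadata for a phrase like "entire 2018 year".
        Map<String, Object> meta = new HashMap<>();
        meta.put("nlpcraft:date:from", 1514764800000L); // 2018-01-01T00:00:00Z
        meta.put("nlpcraft:date:to", 1546300800000L);   // 2019-01-01T00:00:00Z

        System.out.println(toUtcDate((Long) meta.get("nlpcraft:date:from")));
        System.out.println(contains(meta, 1530000000000L));
    }
}
```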
| <br/> |
| <span id="nlpcraft:num" class="section-sub-title">Token ID <code>nlpcraft:num</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a single numerical value or a numeric condition. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties this type of token will have the following |
| metadata properties all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:num:from</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td> |
| Start of numeric range that satisfies the condition (exclusive). Note that if <code>from</code> |
| and <code>to</code> are the same this token represents a single value (whole or fractional) in |
| which case <code>isequalcondition</code> will be <code>true</code>. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:to</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td> |
| End of numeric range that satisfies the condition (exclusive). Note that if <code>from</code> |
| and <code>to</code> are the same this token represents a single value (whole or fractional) in |
| which case <code>isequalcondition</code> will be <code>true</code>. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:fromincl</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not start of the numeric range is inclusive. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:toincl</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not end of the numeric range is inclusive. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:isequalcondition</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this is an equality condition. Note that single numeric values also default to equality |
| condition and this property will be <code>true</code>. Indeed, <code>A is equal to 2</code> and |
| <code>A is 2</code> have the same meaning. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:isnotequalcondition</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this is a not-equality condition. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:isfromnegativeinfinity</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this range is from negative infinity. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:israngecondition</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this is a range condition. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:istopositiveinfinity</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this range is to positive infinity. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:isfractional</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this token's value (a single numeric value or a range) is a fractional number rather than a whole number. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:unit</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Optional numeric value unit name (see below). |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:unittype</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Optional numeric value unit type (see below). |
| </td> |
| </tr> |
| </tbody> |
| </table> |
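| <p> |
| The range properties above can be combined into a simple check of whether a number satisfies the detected condition. |
| The sketch below is an illustration, not NLPCraft code: the class <code>NumCondition</code> is hypothetical, a plain map |
| stands in for the token metadata, and open-ended ranges are assumed to carry infinite <code>from</code> or <code>to</code> |
| values. Not-equal conditions are deliberately left out. |
| </p> |

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper interpreting the 'nlpcraft:num:*' range properties.
final class NumCondition {
    // Returns true if 'x' lies within the range described by the metadata.
    static boolean matches(Map<String, Object> meta, double x) {
        double from = (Double) meta.get("nlpcraft:num:from");
        double to = (Double) meta.get("nlpcraft:num:to");
        boolean fromIncl = (Boolean) meta.get("nlpcraft:num:fromincl");
        boolean toIncl = (Boolean) meta.get("nlpcraft:num:toincl");

        boolean aboveFrom = fromIncl ? x >= from : x > from;
        boolean belowTo = toIncl ? x <= to : x < to;

        return aboveFrom && belowTo;
    }

    public static void main(String[] args) {
        // Stand-in metadata for a phrase like "less than 100".
        // Using negative infinity for the open start is an assumption of this sketch.
        Map<String, Object> meta = new HashMap<>();
        meta.put("nlpcraft:num:from", Double.NEGATIVE_INFINITY);
        meta.put("nlpcraft:num:to", 100.0);
        meta.put("nlpcraft:num:fromincl", false);
        meta.put("nlpcraft:num:toincl", false);

        System.out.println(matches(meta, 50.0));  // 50 is below the exclusive upper bound
        System.out.println(matches(meta, 100.0)); // 100 itself is excluded
    }
}
```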
| <p> |
| The following table provides possible values for <code><b>nlpcraft:num:unit</b></code> and <code><b>nlpcraft:num:unittype</b></code> |
| properties: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>num:unittype</th> |
| <th>num:unit <sub>possible values</sub></th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr><td><code>mass</code></td><td><code>feet per second</code><br/><code>grams</code><br/><code>kilogram</code><br/><code>grain</code><br/><code>dram</code><br/><code>ounce</code><br/><code>pound</code><br/><code>hundredweight</code><br/><code>ton</code><br/><code>tonne</code><br/><code>slug</code></td> |
| <tr><td><code>torque</code></td><td><code>newton meter</code></td> |
| <tr><td><code>area</code></td><td><code>square meter</code><br/><code>acre</code><br/><code>are</code><br/><code>hectare</code><br/><code>square inches</code><br/><code>square feet</code><br/><code>square yards</code><br/><code>square miles</code></td> |
| <tr><td><code>paper quantity</code></td><td><code>paper bale</code></td> |
| <tr><td><code>force</code></td><td><code>kilopond</code><br/><code>pond</code></td> |
| <tr><td><code>pressure</code></td><td><code>pounds per square inch</code></td> |
| <tr><td><code>solid angle</code></td><td><code>steradian</code></td> |
| <tr><td><code>pressure</code><br/><code>stress</code></td><td><code>pascal</code></td> |
| <tr><td><code>luminous</code></td><td><code>flux</code><br/><code>lumen</code></td> |
| <tr><td><code>amount of substance</code></td><td><code>mole</code></td> |
| <tr><td><code>luminance</code></td><td><code>candela per square metre</code></td> |
| <tr><td><code>angle</code></td><td><code>radian</code><br/><code>degree</code></td> |
| <tr><td><code>magnetic flux density</code><br/><code>magnetic field</code></td><td><code>tesla</code></td> |
| <tr><td><code>power</code><br/><code>radiant flux</code></td><td><code>watt</code></td> |
| <tr><td><code>datetime</code></td><td><code>second</code><br/><code>minute</code><br/><code>hour</code><br/><code>day</code><br/><code>week</code><br/><code>month</code><br/><code>year</code></td> |
| <tr><td><code>electrical inductance</code></td><td><code>henry</code></td> |
| <tr><td><code>electric charge</code></td><td><code>coulomb</code></td> |
| <tr><td><code>temperature</code></td><td><code>kelvin</code><br/><code>centigrade</code><br/><code>fahrenheit</code></td> |
| <tr><td><code>voltage</code><br/><code>electrical</code></td><td><code>volt</code></td> |
| <tr><td><code>momentum</code></td><td><code>kilogram meters per second</code></td> |
| <tr><td><code>amount of heat</code></td><td><code>calorie</code></td> |
| <tr><td><code>electrical capacitance</code></td><td><code>farad</code></td> |
| <tr><td><code>radioactive decay</code></td><td><code>becquerel</code></td> |
| <tr><td><code>electrical conductance</code></td><td><code>siemens</code></td> |
| <tr><td><code>luminous intensity</code></td><td><code>candela</code></td> |
| <tr><td><code>work</code><br/><code>energy</code></td><td><code>joule</code></td> |
| <tr><td><code>quantities</code></td><td><code>dozen</code></td> |
| <tr><td><code>density</code></td><td><code>density</code></td> |
| <tr><td><code>sound</code></td><td><code>decibel</code></td> |
| <tr><td><code>electrical resistance</code><br/><code>impedance</code></td><td><code>ohm</code></td> |
| <tr><td><code>force</code><br/><code>weight</code></td><td><code>newton</code></td> |
| <tr><td><code>light quantity</code></td><td><code>lumen seconds</code></td> |
| <tr><td><code>length</code></td><td><code>meter</code><br/><code>millimeter</code><br/><code>centimeter</code><br/><code>decimeter</code><br/><code>kilometer</code><br/><code>astronomical unit</code><br/><code>light year</code><br/><code>parsec</code><br/><code>inch</code><br/><code>foot</code><br/><code>yard</code><br/><code>mile</code><br/><code>nautical mile</code></td> |
| <tr><td><code>refractive index</code></td><td><code>diopter</code></td> |
| <tr><td><code>frequency</code></td><td><code>hertz</code><br/><code>angular frequency</code></td> |
| <tr><td><code>power</code></td><td><code>kilowatt</code><br/><code>horsepower</code><br/><code>bar</code></td> |
| <tr><td><code>magnetic flux</code></td><td><code>weber</code></td> |
| <tr><td><code>current</code></td><td><code>ampere</code></td> |
| <tr><td><code>acceleration of gravity</code></td><td><code>gravity imperial</code><br/><code>gravity metric</code></td> |
| <tr><td><code>volume</code></td><td><code>cubic meter</code><br/><code>liter</code><br/><code>milliliter</code><br/><code>centiliter</code><br/><code>deciliter</code><br/><code>hectoliter</code><br/><code>cubic inch</code><br/><code>cubic foot</code><br/><code>cubic yard</code><br/><code>acre-foot</code><br/><code>teaspoon</code><br/><code>tablespoon</code><br/><code>fluid ounce</code><br/><code>cup</code><br/><code>gill</code><br/><code>pint</code><br/><code>quart</code><br/><code>gallon</code></td> |
| <tr><td><code>speed</code></td><td><code>miles per hour</code><br/><code>meters per second</code></td> |
| <tr><td><code>illuminance</code></td><td><code>lux</code></td> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:city" class="section-sub-title">Token ID <code>nlpcraft:city</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a city. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties this type of token will have the following |
| metadata properties all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:city:city</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Name of the city. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:city:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Continent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:city:subcontinent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Subcontinent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:city:countrymeta</b></code></td> |
| <td><code>java.util.Map</code></td> |
| <td> |
| Supplemental metadata for city's country (see below). |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:city:citymeta</b></code></td> |
| <td><code>java.util.Map</code></td> |
| <td> |
| Supplemental metadata for city (see below). |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The following table describes the possible entries of the <code><b>nlpcraft:city:countrymeta</b></code> map. The data is |
| obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Key</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>iso</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>iso3</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO 3166 country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>isocode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>capital</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country capital city name.</td> |
| </tr> |
| <tr> |
| <td><code><b>area</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Optional country surface area.</td> |
| </tr> |
| <tr> |
| <td><code><b>population</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Long</code></td> |
| <td>Optional country population.</td> |
| </tr> |
| <tr> |
| <td><code><b>continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country continent.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencycode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency code.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencyname</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency name.</td> |
| </tr> |
| <tr> |
| <td><code><b>phone</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country phone code.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code format.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code regular expression.</td> |
| </tr> |
| <tr> |
| <td><code><b>languages</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of languages.</td> |
| </tr> |
| <tr> |
| <td><code><b>neighbours</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of neighbours.</td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The following table lists the possible keys of the <code><b>nlpcraft:city:citymeta</b></code> map. The data is |
| obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Key</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>latitude</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>City latitude.</td> |
| </tr> |
| <tr> |
| <td><code><b>longitude</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>City longitude.</td> |
| </tr> |
| <tr> |
| <td><code><b>population</b></code></td> |
| <td><code>java.lang.Long</code></td> |
| <td>City population.</td> |
| </tr> |
| <tr> |
| <td><code><b>elevation</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Integer</code></td> |
| <td>Optional city elevation in meters.</td> |
| </tr> |
| <tr> |
| <td><code><b>timezone</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>City timezone.</td> |
| </tr> |
| </tbody> |
| </table> |
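| <p> |
| The nested maps described above can be read as in the following minimal sketch. It is not tied to the |
| NLPCraft API: a plain <code>java.util.Map</code> stands in for the token metadata, and the class and |
| method names (<code>CityMetaExample</code>, <code>describe</code>) are hypothetical. Keys marked |
| <sub>opt.</sub> are checked for <code>null</code> before use: |
| </p> |

```java
import java.util.Map;

final class CityMetaExample {
    /** Summarizes a city token from its metadata map (a stand-in for real token metadata). */
    static String describe(Map<String, ?> tokMeta) {
        Map<?, ?> cityMeta = (Map<?, ?>) tokMeta.get("nlpcraft:city:citymeta");
        Map<?, ?> countryMeta = (Map<?, ?>) tokMeta.get("nlpcraft:city:countrymeta");

        // Mandatory keys can be read directly.
        double lat = (Double) cityMeta.get("latitude");
        double lon = (Double) cityMeta.get("longitude");

        // Keys marked 'opt.' may be absent, so fall back to a placeholder.
        Object elevation = cityMeta.get("elevation");
        Object capital = countryMeta.get("capital");

        return "lat=" + lat + ", lon=" + lon
            + ", elevation=" + (elevation != null ? elevation + "m" : "n/a")
            + ", capital=" + (capital != null ? capital : "n/a");
    }
}
```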
| <br/> |
| <span id="nlpcraft:continent" class="section-sub-title">Token ID <code>nlpcraft:continent</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a continent. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:continent:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Name of the continent.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:subcontinent" class="section-sub-title">Token ID <code>nlpcraft:subcontinent</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a subcontinent. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:subcontinent:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Name of the continent.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:subcontinent:subcontinent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Name of the subcontinent.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:metro" class="section-sub-title">Token ID <code>nlpcraft:metro</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a metro area. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:metro:metro</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Name of the metro area.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:region" class="section-sub-title">Token ID <code>nlpcraft:region</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a geographical region. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:region:region</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Name of the region. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:region:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Continent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:region:subcontinent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Subcontinent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:region:countrymeta</b></code></td> |
| <td><code>java.util.Map</code></td> |
| <td> |
| Supplemental metadata for region's country (see below). |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The following table lists the possible keys of the <code><b>nlpcraft:region:countrymeta</b></code> map. The data is |
| obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Key</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>iso</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>iso3</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO 3166 country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>isocode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>capital</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country capital city name.</td> |
| </tr> |
| <tr> |
| <td><code><b>area</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Optional country surface area.</td> |
| </tr> |
| <tr> |
| <td><code><b>population</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Long</code></td> |
| <td>Optional country population.</td> |
| </tr> |
| <tr> |
| <td><code><b>continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country continent.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencycode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency code.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencyname</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency name.</td> |
| </tr> |
| <tr> |
| <td><code><b>phone</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country phone code.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code format.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code regular expression.</td> |
| </tr> |
| <tr> |
| <td><code><b>languages</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of languages.</td> |
| </tr> |
| <tr> |
| <td><code><b>neighbours</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of neighbours.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:country" class="section-sub-title">Token ID <code>nlpcraft:country</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a country. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:country:country</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Name of the country. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:country:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Continent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:country:subcontinent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Subcontinent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:country:countrymeta</b></code></td> |
| <td><code>java.util.Map</code></td> |
| <td> |
| Supplemental metadata for the country (see below). |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The following table lists the possible keys of the <code><b>nlpcraft:country:countrymeta</b></code> map. The data is |
| obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Key</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>iso</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>iso3</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO 3166 country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>isocode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>capital</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country capital city name.</td> |
| </tr> |
| <tr> |
| <td><code><b>area</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Optional country surface area.</td> |
| </tr> |
| <tr> |
| <td><code><b>population</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Long</code></td> |
| <td>Optional country population.</td> |
| </tr> |
| <tr> |
| <td><code><b>continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country continent.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencycode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency code.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencyname</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency name.</td> |
| </tr> |
| <tr> |
| <td><code><b>phone</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country phone code.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code format.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code regular expression.</td> |
| </tr> |
| <tr> |
| <td><code><b>languages</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of languages.</td> |
| </tr> |
| <tr> |
| <td><code><b>neighbours</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of neighbours.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:coordinate" class="section-sub-title">Token ID <code>nlpcraft:coordinate</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a latitude and longitude coordinate. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:coordinate:latitude</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Coordinate latitude.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:coordinate:longitude</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Coordinate longitude.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:sort" class="section-sub-title">Token ID <code>nlpcraft:sort</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a sorting or ordering function. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:sort:subjindexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td>One or more indexes of the target tokens (i.e. the tokens being sorted).</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:sort:byindexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td>Zero or more (i.e. optional) indexes of the reference tokens (i.e. the tokens being sorted by).</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:sort:asc</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether sorting is in ascending or descending order. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
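| <p> |
| As an illustration of how these three properties fit together, the following sketch turns sort metadata |
| into a readable description. It assumes that the indexes are positions in the list of tokens of the parsed |
| sentence and that a missing <code>byindexes</code> entry is equivalent to an empty list. The plain map and the |
| names <code>SortMetaExample</code>, <code>words</code> and <code>describe</code> are hypothetical and not part of NLPCraft: |
| </p> |

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SortMetaExample {
    /** Resolves token indexes into the words they point at. */
    static String words(List<String> sentence, List<Integer> indexes) {
        return indexes.stream().map(sentence::get).collect(Collectors.joining(" "));
    }

    /** Builds a description such as "sort 'orders' by 'date' descending". */
    @SuppressWarnings("unchecked")
    static String describe(List<String> sentence, Map<String, ?> sortMeta) {
        List<Integer> subj = (List<Integer>) sortMeta.get("nlpcraft:sort:subjindexes");

        // 'byindexes' may hold zero indexes; treating a missing key as empty is an assumption.
        Object byRaw = sortMeta.get("nlpcraft:sort:byindexes");
        List<Integer> by = byRaw == null ? List.<Integer>of() : (List<Integer>) byRaw;

        boolean asc = (Boolean) sortMeta.get("nlpcraft:sort:asc");

        String res = "sort '" + words(sentence, subj) + "'";
        if (!by.isEmpty())
            res += " by '" + words(sentence, by) + "'";
        return res + (asc ? " ascending" : " descending");
    }
}
```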
| <br/> |
| <span id="nlpcraft:limit" class="section-sub-title">Token ID <code>nlpcraft:limit</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a numeric limit value (like in "top 10" or "bottom five"). |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:limit:indexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td>Index (always only one) of the reference token (i.e. the token being limited).</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:limit:asc</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether limit order is ascending or descending. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:limit:limit</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Numeric value of the limit. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
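| <p> |
| One possible way to apply these properties to a result set is sketched below. It assumes that |
| <code>asc = true</code> selects the smallest values first (as in "bottom five") and <code>asc = false</code> the |
| largest (as in "top 10"); that reading, the plain map standing in for token metadata, and the names |
| <code>LimitMetaExample</code> and <code>apply</code> are assumptions made for illustration only: |
| </p> |

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LimitMetaExample {
    /** Orders the values according to 'asc' and keeps the first 'limit' of them. */
    static List<Integer> apply(List<Integer> values, Map<String, ?> limitMeta) {
        int limit = (Integer) limitMeta.get("nlpcraft:limit:limit");
        boolean asc = (Boolean) limitMeta.get("nlpcraft:limit:asc");

        Comparator<Integer> order = asc
            ? Comparator.<Integer>naturalOrder()
            : Comparator.<Integer>reverseOrder();

        return values.stream().sorted(order).limit(limit).collect(Collectors.toList());
    }
}
```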
| <br/> |
| <span id="nlpcraft:relation" class="section-sub-title">Token ID <code>nlpcraft:relation</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| This token denotes a relation between tokens, i.e. <code>compare</code> or <code>correlate</code>. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:relation:indexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td>Index (always only one) of the reference token (i.e. the token being related to).</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:relation:type</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Type of the relation. One of the following values: |
| <ul> |
| <li><code>compare</code></li> |
| <li><code>correlate</code></li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="google:xxx" class="section-sub-title">Token ID <code>google:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| In these token IDs, <code>xxx</code> is the lowercase name of a named entity |
| in <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, e.g. |
| <code>google:person</code> or <code>google:location</code>. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>google:salience</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Correctness probability of this token by Google Natural Language.</td> |
| </tr> |
| <tr> |
| <td><code><b>google:meta</b></code></td> |
| <td><code>java.util.Map&lt;String&gt;</code></td> |
| <td> |
| Map-based container for Google Natural Language specific properties. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>google:mentionsbeginoffsets</b></code></td> |
| <td><code>java.util.List&lt;String&gt;</code></td> |
| <td> |
| List of the mention begin offsets in the original normalized text. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>google:mentionscontents</b></code></td> |
| <td><code>java.util.List&lt;String&gt;</code></td> |
| <td> |
| List of the mentions. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>google:mentionstypes</b></code></td> |
| <td><code>java.util.List&lt;String&gt;</code></td> |
| <td> |
| List of the mention types. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="stanford:xxx" class="section-sub-title">Token ID <code>stanford:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| In these token IDs, <code>xxx</code> is the lowercase name of a named entity |
| in <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, e.g. |
| <code>stanford:person</code> or <code>stanford:location</code>. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>stanford:confidence</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Correctness probability of this token by Stanford CoreNLP.</td> |
| </tr> |
| <tr> |
| <td><code><b>stanford:nne</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Normalized Named Entity (NNE) text. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="spacy:xxx" class="section-sub-title">Token ID <code>spacy:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| In these token IDs, <code>xxx</code> is the lowercase name of a named entity |
| in <a target=_ href="https://spacy.io/">spaCy</a>, e.g. |
| <code>spacy:person</code> or <code>spacy:location</code>. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>spacy:vector</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>spaCy span vector. </td> |
| </tr> |
| <tr> |
| <td><code><b>spacy:sentiment</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td> |
| A scalar value indicating the positivity or negativity of the token. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="opennlp:xxx" class="section-sub-title">Token ID <code>opennlp:xxx</code> <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
| In these token IDs, <code>xxx</code> is the lowercase name of a named entity |
| in <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, e.g. |
| <code>opennlp:person</code> or <code>opennlp:money</code>. |
| In addition to the <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>opennlp:probability</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Correctness probability of this token by OpenNLP.</td> |
| </tr> |
| </tbody> |
| </table> |
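| <p> |
| The score properties of the external NER providers use different names (<code>google:salience</code>, |
| <code>stanford:confidence</code>, <code>opennlp:probability</code>), and the <code>spacy:xxx</code> table lists none. |
| The following sketch shows one way to look them up uniformly by token ID prefix. Treating these three |
| scores as comparable against a single threshold is an assumption, as are the plain map standing in for |
| token metadata and the names <code>NerScoreExample</code>, <code>score</code> and <code>isReliable</code>: |
| </p> |

```java
import java.util.Map;
import java.util.Optional;

final class NerScoreExample {
    // Score property per provider prefix, taken from the property tables above.
    private static final Map<String, String> SCORE_KEYS = Map.of(
        "google", "google:salience",
        "stanford", "stanford:confidence",
        "opennlp", "opennlp:probability"
    );

    /** Returns the provider score for the given token ID, if the provider defines one. */
    static Optional<Double> score(String tokenId, Map<String, ?> meta) {
        int colon = tokenId.indexOf(':');
        if (colon < 0)
            return Optional.empty();

        String key = SCORE_KEYS.get(tokenId.substring(0, colon));
        if (key == null)
            return Optional.empty();

        Object v = meta.get(key);
        return v instanceof Double ? Optional.of((Double) v) : Optional.empty();
    }

    /** True only when a score is present and is at least the threshold. */
    static boolean isReliable(String tokenId, Map<String, ?> meta, double threshold) {
        return score(tokenId, meta).map(s -> s >= threshold).orElse(false);
    }
}
```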
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Model Overview</a></li> |
| <li><a href="#dataflow">Model Dataflow</a></li> |
| <li><a href="#lifecycle">Model Lifecycle</a></li> |
| <li><a href="#config">Model Configuration</a></li> |
| <li><a href="#ne">Named Entities</a></li> |
| <li><a href="#elements">Model Elements</a></li> |
| <li><a class="toc2" href="#macros">Macros</a></li> |
| <li><a class="toc2" href="#regex">Regular Expressions</a></li> |
| <li><a class="toc2" href="#option-groups">Option Groups</a></li> |
| <li><a class="toc2" href="#dsl">IDL Expression</a></li> |
| <li><a class="toc2" href="#custom_ners">Custom NERs</a></li> |
| <li><a href="#logic">Model Logic</a></li> |
| <li><a href="#builtin">Built-In Tokens</a></li> |
| <li><a href="#meta">Token Metadata</a></li> |
| <li><a class="toc2" href="#nlpcraft:nlp"><code><b>nlpcraft:nlp</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:date"><code><b>nlpcraft:date</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:num"><code><b>nlpcraft:num</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:city"><code><b>nlpcraft:city</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:continent"><code><b>nlpcraft:continent</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:subcontinent"><code><b>nlpcraft:subcontinent</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:region"><code><b>nlpcraft:region</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:country"><code><b>nlpcraft:country</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:metro"><code><b>nlpcraft:metro</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:coordinate"><code><b>nlpcraft:coordinate</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:sort"><code><b>nlpcraft:sort</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:limit"><code><b>nlpcraft:limit</b></code></a></li> |
| <li><a class="toc2" href="#nlpcraft:relation"><code><b>nlpcraft:relation</b></code></a></li> |
| <li><a class="toc2" href="#stanford:xxx"><code><b>stanford:xxx</b></code></a></li> |
| <li><a class="toc2" href="#spacy:xxx"><code><b>spacy:xxx</b></code></a></li> |
| <li><a class="toc2" href="#google:xxx"><code><b>google:xxx</b></code></a></li> |
| <li><a class="toc2" href="#opennlp:xxx"><code><b>opennlp:xxx</b></code></a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |