| --- |
| active_crumb: Data Model |
| layout: documentation |
| id: data_model |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column"> |
| <section id="overview"> |
| <h2 class="section-title">Model Overview</h2> |
| <p> |
| The data model is a central concept in NLPCraft: it defines the interface to your data sources, |
| such as a database or a SaaS application. |
| NLPCraft employs a <em>model-as-a-code</em> approach where the entire data model is an implementation of the |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface, which |
| can be developed in any JVM programming language such as Java, Scala, Kotlin, or Groovy. |
| </p> |
| <p> |
| A data model defines: |
| </p> |
| <ul> |
| <li>Set of model <a href="#elements">elements</a> (a.k.a. named entities) to be detected in the user input.</li> |
| <li>Zero or more intent callbacks.</li> |
| <li>Common model configuration and various life-cycle callbacks.</li> |
| </ul> |
| <p> |
| Note that model-as-a-code approach natively supports any software life |
| cycle tools and frameworks like various build tools, CI/SCM tools, IDEs, etc. |
| You don't have to use additional web-based tools to manage some aspects of your |
| data models - your entire model and all of its components are part of your project source code. |
| </p> |
| </section> |
| <section id="dataflow"> |
| <h2 class="section-title">Model Dataflow</h2> |
| <figure> |
| <img alt="data model dataflow" class="img-fluid" src="/images/homepage-fig1.1.png"> |
| <figcaption><b>Fig 1.</b> NLPCraft Architecture</figcaption> |
| </figure> |
| <p> |
| A user request starts with the user application (like a chatbot or NLI-based system) making a |
| REST call using the <a href="/using-rest.html">NLPCraft REST API</a>. That REST call carries, among |
| other things, the input text and the data model ID, and it arrives first at the REST server. |
| </p> |
| <p> |
| Upon receiving the user request, the REST server performs NLP pre-processing converting the input |
| text into a sequence of tokens and enriching them with additional information. |
| </p> |
| <p> |
| Once finished, the encrypted sequence of tokens is sent further down to the probe where the requested data model |
| is deployed. |
| </p> |
| <p> |
| Upon receiving that sequence of tokens, the data probe further |
| enriches it based on the user data model and matches it against declared intents. When a matching |
| intent is found its callback method is called and its result travels back from the data probe to the |
| REST server and eventually to the user that made the REST call. |
| </p> |
| <div class="bq info"> |
| <p> |
| <b>Security <span class="amp">&</span> Isolation</b> |
| </p> |
| <p> |
| Note that in this architecture the user-defined data model is fully isolated from the REST server accepting |
| user calls. Users never access data probes and hence user data models directly. Typically REST server |
| should be deployed in DMZ and only ingress connectivity is necessary between REST server and the data probes. |
| </p> |
| </div> |
| </section> |
| <section id="lifecycle"> |
| <h2 class="section-title">Model Lifecycle</h2> |
| <p> |
| Data model is an implementation of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface. |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface has |
| defaults for most of its methods. These are the only methods that need to be implemented by its sub-class: |
| </p> |
| <ul> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId--">getId()</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName--">getName()</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion--">getVersion()</a></li> |
| </ul> |
| <p> |
| You can either implement <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> |
| interface directly or use one of the adapters (recommended in most cases): |
| </p> |
| <ul> |
| <li> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelAdapter.html">NCModelAdapter</a> - when |
| entire model definition is in sub-class source code. |
| </li> |
| <li> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> - when |
| using external JSON/YAML declaration for model definition. |
| </li> |
| </ul> |
| <p> |
| Note that you can also use 3rd party IoC frameworks like <a target=_ href="https://spring.io">Spring</a> to construct your data models. See |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFactory.html">NCModelFactory</a> for more information. |
| </p> |
| <div class="bq success"> |
| <p> |
| <b>Using Adapters</b> |
| </p> |
| <p> |
| It is recommended to use one of the adapter classes when defining your |
| own data model in most use cases. |
| </p> |
| </div> |
| <h3 class="section-title">Deployment</h3> |
| <p> |
| Data models get <a href="/server-and-probe.html">deployed</a> to and hosted by the data probes - a lightweight |
| container whose job is to host data models and securely transfer requests between REST server and the data |
| models. When a data probe starts it reads its <a href="/server-and-probe.html">configuration</a> |
| to see which models to deploy. |
| </p> |
| <p> |
| Note that data probes don't support hot-redeployment. To redeploy the data model you need to restart |
| the data probe. Note also that data probe can be started in embedded mode, i.e. it can be started |
| from within an existing JVM process like user application. |
| </p> |
| <h3 class="section-title">Callbacks</h3> |
| <p> |
| There are two callbacks on |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> interface |
| (by way of extending <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html">NCLifecycle</a> interface) that you can optionally override to affect the default lifecycle behavior: |
| </p> |
| <ul> |
| <li> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onInit--">onInit()</a> - called |
| right after the model was loaded and deployed. |
| </li> |
| <li> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCLifecycle.html#onDiscard--">onDiscard()</a> - called to |
| discard the data model when and only when data probe is orderly shutting down. |
| </li> |
| </ul> |
| <p> |
| Note that there are also several callbacks that you can override to affect model behavior |
| to perform logging, debugging, statistic or usage collection, explicit update or initialization of |
| conversation context, security audit or validation: |
| </p> |
| <ul> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onParsedVariant-org.apache.nlpcraft.model.NCVariant-">onParsedVariant(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext-org.apache.nlpcraft.model.NCContext-">onContext(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent-org.apache.nlpcraft.model.NCIntentMatch-">onMatchedIntent(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onResult-org.apache.nlpcraft.model.NCIntentMatch-org.apache.nlpcraft.model.NCResult-">onResult(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onError-org.apache.nlpcraft.model.NCContext-java.lang.Throwable-">onError(...)</a> |
| </li> |
| <li> |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onRejection-org.apache.nlpcraft.model.NCIntentMatch-org.apache.nlpcraft.model.NCRejection-">onRejection(...)</a> |
| </li> |
| </ul> |
| <div class="bq info"> |
| <b>Conversation Reset</b> |
| <p> |
| Callbacks |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onContext-org.apache.nlpcraft.model.NCContext-">onContext(...)</a> and |
| <a target="javadoc" |
| href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent-org.apache.nlpcraft.model.NCIntentMatch-">onMatchedIntent(...)</a> |
| are especially handy for performing a soft reset on the conversation context. Read their Javadoc documentation |
| to understand the protocol of these callbacks. |
| </p> |
| </div> |
| |
| <div class="bq info"> |
| <b>Lifecycle Components</b> |
| <p> |
| Note that both the server and the probe provide their own lifecycle components support. When registered in |
| the probe or server configuration the lifecycle components will be called |
| during various stages of the probe or server startup or shutdown procedures. These callbacks can be used |
| to control the lifecycle of external libraries and systems that the data probe or the server rely on, e.g. |
| <a href="metrics-and-tracing.html">OpenCensus exporters</a>, |
| security environment, devops hooks, etc. |
| </p> |
| <p> |
| See server and probe <a href="/server-and-probe.html">configuration</a> as well as <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCProbeLifecycle.html">NCProbeLifecycle</a> |
| interface for more details. |
| </p> |
| </div> |
| </section> |
| <section id="config"> |
| <h2 class="section-title">Model Configuration</h2> |
| <p> |
| Apart from mandatory model <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getId--">ID</a>, |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getName--">name</a> and |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getVersion--">version</a> |
| there are a number of static model configuration properties that you can set. All of these properties have sensible |
| defaults that you can override, when required, either in sub-classes or via an external JSON/YAML declaration: |
| </p> |
| <ul> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getAdditionalStopWords--">getAdditionalStopWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens--">getEnabledBuiltInTokens</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getExamples--">getExamples</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getExcludedStopWords--">getExcludedStopWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getJiggleFactor--">getJiggleFactor</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxFreeWords--">getMaxFreeWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxSuspiciousWords--">getMaxSuspiciousWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTokens--">getMaxTokens</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxTotalSynonyms--">getMaxTotalSynonyms</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxUnknownWords--">getMaxUnknownWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMaxWords--">getMaxWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMetadata--">getMetadata</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinNonStopwords--">getMinNonStopwords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinTokens--">getMinTokens</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMinWords--">getMinWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getSuspiciousWords--">getSuspiciousWords</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isDupSynonymsAllowed--">isDupSynonymsAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed--">isNonEnglishAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoNounsAllowed--">isNoNounsAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNotLatinCharsetAllowed--">isNotLatinCharsetAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNoUserTokensAllowed--">isNoUserTokensAllowed</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isPermutateSynonyms--">isPermutateSynonyms</a></li> |
| <li><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed--">isSwearWordsAllowed</a></li> |
| </ul> |
| <h3 class="section-title">External JSON/YAML Declaration</h3> |
| <p> |
| You can move out all the static model configuration into an external JSON or YAML file. To load that |
| configuration you need to use <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> |
| adapter when creating your data model. Here are JSON and YAML templates and you can find more details in |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> Javadoc and in |
| <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/src/main/scala/org/apache/nlpcraft/examples">examples</a>. |
| </p> |
| |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#model-json" role="tab" aria-controls="nav-home" aria-selected="true">JSON</a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#model-yaml" role="tab" aria-controls="nav-home" aria-selected="false">YAML</a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="model-json" role="tabpanel"> |
| <pre class="brush: js"> |
| { |
| "id": "user.defined.id", |
| "name": "User Defined Name", |
| "version": "1.0", |
| "description": "Short model description.", |
| "enabledBuiltInTokens": ["google:person", "google:location"], |
| "examples": [], |
| "macros": [], |
| "metadata": {}, |
| "elements": [ |
| { |
| "id": "x:id", |
| "description": "", |
| "groups": [], |
| "parentId": "", |
| "synonyms": [], |
| "metadata": {}, |
| "values": [] |
| } |
| ], |
| ... |
| "intents": [] |
| } |
| </pre> |
| </div> |
| <div class="tab-pane fade show" id="model-yaml" role="tabpanel"> |
| <pre class="brush: js"> |
| id: "user.defined.id" |
| name: "User Defined Name" |
| version: "1.0" |
| description: "Short model description." |
| examples: |
| macros: |
| enabledBuiltInTokens: |
| elements: |
| - id: "x:id" |
| description: "" |
| synonyms: |
| groups: |
| values: |
| parentId: |
| metadata: |
| ... |
| intents: |
| </pre> |
| </div> |
| </div> |
| <div class="bq success"> |
| <p> |
| Note that using JSON/YAML-based configuration is the <b>canonical way</b> of |
| creating data models in NLPCraft, as it lets you cleanly separate static configuration from the model's |
| programmable logic. |
| </p> |
| </div> |
| </section> |
| <section id="elements"> |
| <h2 class="section-title">Model Elements</h2> |
| <p> |
| Data model element defines a semantic entity that will be detected in the user input. |
| A model element typically is one or more individual words that have a consistent semantic meaning and typically denote a |
| real-world object, such as persons, locations, number, date and time, organizations, products, etc. Such |
| object can be abstract or have a physical existence. |
| </p> |
| <p> |
| Model element is an implementation of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> |
| interface. <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a> provides |
| its elements via <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getElements--">getElements()</a> method. |
| Typically, you create model elements by either: |
| </p> |
| <ul> |
| <li> |
| Implementing <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> interface directly, or |
| </li> |
| <li> |
| Using JSON or YAML static model configuration (the preferred way in most cases). |
| </li> |
| </ul> |
| <p> |
| Note that when you use external static model configuration with JSON or YAML you can still modify it after it was loaded |
| using <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> |
| adapter. This is particularly convenient when synonyms or values are loaded separately from, or in |
| addition to, the model elements themselves, e.g. from a database or another file. |
| </p> |
| <div class="bq info"> |
| <p> |
| <b>Model Element <span class="amp">&</span> Named Entity <span class="amp">&</span> Token</b> |
| </p> |
| <p> |
| Terms 'model element', 'named entity' and 'token' are used throughout this documentation relatively interchangeably: |
| </p> |
| <dl> |
| <dt>Model Element</dt> |
| <dd> |
| Denotes a named entity <em>declared</em> in NLPCraft model. |
| </dd> |
| <dt>Token</dt> |
| <dd> |
| Denotes a named entity that was <em>detected</em> by NLPCraft in the user input. |
| </dd> |
| <dt>Named Entity</dt> |
| <dd> |
| Denotes a classic term, i.e. one or more individual words that have a |
| consistent semantic meaning and typically define a real-world object. |
| </dd> |
| </dl> |
| </div> |
| <p> |
| Although model element and named entity describe a similar concept, the NLPCraft model |
| elements provide a much more powerful instrument. Unlike named entity support in other projects, |
| NLPCraft model elements have a number of unique capabilities: |
| </p> |
| <ul> |
| <li> |
| New model elements can be added declaratively via token DSL, regex and macro expansion. |
| </li> |
| <li> |
| New model elements can be also added programmatically for ultimate flexibility. |
| </li> |
| <li> |
| Model elements can have many-to-many group memberships. |
| </li> |
| <li> |
| Model elements can form a hierarchical structure. |
| </li> |
| <li> |
| Model elements are composable, i.e. a model element can use other model elements in its definition. |
| </li> |
| <li> |
| Model elements can be declared with user defined metadata. |
| </li> |
| <li> |
| Model elements provide normalized values and can define their own "proper nouns". |
| </li> |
| <li> |
| Model elements can compose named entities from many <a href="integrations.html#nlp">3rd party libraries</a>. |
| </li> |
| <li> |
| All properties of model elements (id, groups, parent & ancestors, values, and metadata) can be used in token and intent DSLs. |
| </li> |
| </ul> |
| <h3 class="section-title">User vs. Built-In Elements</h3> |
| <p> |
| In addition to the model elements that are defined by the user in the data model (i.e. <em>user model elements</em>) |
| NLPCraft provides <a href="#builtin">its own named entities</a> as well as integration with a number of <a href="integrations.html#nlp">3rd party projects</a>. You can think of these built-in elements as if they were implicitly defined in your model - you |
| can use them in exactly the same way as if you defined them yourself. |
| You can find more information on how to configure external token providers |
| in <a href="/integrations.html#nlp">Integrations</a> section. |
| </p> |
| <p> |
| Note that you can't directly change group membership, parent-child relationship or metadata of the |
| built-in elements. You can, however, "wrap" built-in entity into your own one using <code>^^id == 'external.id'^^</code> |
| token DSL expression where you can define all necessary additional configuration properties (more on that below). |
| </p> |
| <span id="synonyms" class="section-sub-title">Synonyms</span> |
| <p> |
| NLPCraft uses fully deterministic named entity recognition and is not based on statistical approaches that |
| would require pre-existing marked up data sets and extensive training. For each model element you can either provide a |
| set of synonyms to match on or specify a piece of code that would be responsible for detecting that named |
| entity (discussed below). A synonym can have one or more individual words. Note that element's ID is its |
| implicit synonym so that even if no additional synonyms are defined at least one synonym always exists. Note |
| also that synonym matching is performed on <em>normalized</em> and <em>stemmatized</em> forms of both |
| a synonym and user input. |
| </p> |
| <p> |
| Here's an example of a simple model element definition in JSON: |
| </p> |
| <pre class="brush: js, highlight: [6,7,8,9,10,11,12]"> |
| ... |
| "elements": [ |
| { |
| "id": "transport.vehicle", |
| "description": "Transportation vehicle", |
| "synonyms": [ |
| "car", |
| "truck", |
| "light duty truck", |
| "heavy duty truck", |
| "sedan", |
| "coupe" |
| ] |
| } |
| ] |
| ... |
| </pre> |
| <p> |
| During synonym matching NLPCraft uses <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getJiggleFactor--">jiggle factor</a> to rearrange (or "jiggle") |
| the individual words in the user input in an attempt to match a given synonym. Jiggle factor is a measure of |
| how much sparsity is allowed when user input words are reordered in an attempt to match the multi-word |
| synonyms. Zero means no reordering is allowed. One means that a word can move only one |
| position left or right, and so on. Empirically the value of 2 proved to be a good default value in |
| most cases. Note that larger values mean that synonym words can be almost in any random place in the user |
| input which makes synonym matching less meaningful. |
| </p> |
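| <p> |
| The reordering rule described above can be illustrated with a small sketch. This is only one possible reading of |
| the description, not NLPCraft's actual matching algorithm: the class <code>JiggleSketch</code> and its methods are |
| hypothetical, and the sketch assumes the input span and the synonym have the same number of words. |
| </p> |

```java
import java.util.List;

/** Hypothetical illustration of the jiggle factor; not part of the NLPCraft API. */
final class JiggleSketch {
    /**
     * Returns true if 'input' can be reordered into 'synonym' such that
     * no word moves more than 'jiggle' positions left or right.
     */
    static boolean matches(List<String> input, List<String> synonym, int jiggle) {
        if (input.size() != synonym.size())
            return false;
        return place(input, synonym, jiggle, 0, new boolean[input.size()]);
    }

    // Backtracking: for each synonym position pick an unused, equal input word within reach.
    private static boolean place(List<String> input, List<String> synonym, int jiggle,
                                 int target, boolean[] used) {
        if (target == synonym.size())
            return true;
        for (int i = 0; i < input.size(); i++) {
            if (used[i] || Math.abs(i - target) > jiggle || !input.get(i).equals(synonym.get(target)))
                continue;
            used[i] = true;
            if (place(input, synonym, jiggle, target + 1, used))
                return true;
            used[i] = false;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> synonym = List.of("light", "duty", "truck");
        // Two neighbouring words swapped: each moved by one position.
        System.out.println(matches(List.of("duty", "light", "truck"), synonym, 0)); // false
        System.out.println(matches(List.of("duty", "light", "truck"), synonym, 1)); // true
        // "truck" moved by two positions.
        System.out.println(matches(List.of("truck", "light", "duty"), synonym, 1)); // false
        System.out.println(matches(List.of("truck", "light", "duty"), synonym, 2)); // true
    }
}
```

| <p> |
| Under this reading, a jiggle factor of zero accepts only the exact word order, and each increment widens the |
| distance a word may travel by one position. |
| </p> |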
| <p> |
| While adding multi-word synonyms looks somewhat |
| trivial - in real models, the naive approach can lead to thousands and even tens of thousands of |
| possible synonyms due to words, grammar, and linguistic permutations - which quickly becomes untenable if |
| performed manually. |
| </p> |
| <p> |
| NLPCraft provides an effective tool for a compact synonyms representation. Instead of listing all possible |
| multi-word synonyms one by one you can use a combination of the following expressions: |
| </p> |
| <ul> |
| <li><a href="#macros">Macros</a></li> |
| <li><a href="#regex">Regular expressions</a></li> |
| <li><a href="#option-groups">Option Groups</a></li> |
| <li><a href="#dsl">Token DSL</a></li> |
| </ul> |
| <p> |
| Each whitespace-separated string in the synonym can be either a regular word (like in the above transportation example |
| where it will be matched on using its normalized and stemmatized form) or one of the above expressions. |
| </p> |
| <p> |
| Note that this universal synonyms definition is used in the following |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCElement.html">NCElement</a> methods: |
| </p> |
| <ul> |
| <li><code>getSynonyms()</code> - gets synonyms to match on.</li> |
| <li><code>getValues()</code> - gets values to match on (see <a href="#values">below</a>).</li> |
| </ul> |
| <span id="macros" class="section-sub-title">Macros</span> |
| <p> |
| Listing all possible multi-word synonyms for a given element can be a time-consuming task. Macros |
| together with option groups allow for significant simplification of this process. |
| Macros let you give a name to an often used set of words or option groups and reuse it without |
| repeating those words or option groups again and again. A model provides a list of macros via |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getMacros--">getMacros()</a> method on |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a> interface. Each macro |
| has a name in a form of <code><X></code> where <code>X</code> |
| is just any string, and a string value. Note that macros can be nested (but not recursive), i.e. macro value can include |
| references to other macros. When macro name <code>X</code> is encountered in the synonym it gets recursively |
| replaced with its value. |
| </p> |
| <p> |
| Here's a code snippet of macro definitions using JSON definition: |
| </p> |
| <pre class="brush: js"> |
| "macros": [ |
| { |
| "name": "<A>", |
| "macro": "aaa" |
| }, |
| { |
| "name": "<B>", |
| "macro": "<A> bbb" |
| }, |
| { |
| "name": "<C>", |
| "macro": "<A> bbb {z|w}" |
| } |
| ] |
| </pre> |
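| <p> |
| To make the nested substitution concrete, here is a minimal sketch of how such macros could be expanded. |
| The class <code>MacroExpander</code> is hypothetical and is not part of the NLPCraft API; it performs plain |
| textual replacement and rejects recursive macros by bounding the number of passes. |
| </p> |

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of the NLPCraft API: expands nested macros in a synonym. */
final class MacroExpander {
    private final Map<String, String> macros = new LinkedHashMap<>();

    /** Registers a macro, e.g. add("<A>", "aaa"). */
    MacroExpander add(String name, String value) {
        macros.put(name, value);
        return this;
    }

    /** Replaces macro names with their values until no macro name remains. */
    String expand(String text) {
        String current = text;
        // Non-recursive macros settle within macros.size() passes; one more pass confirms it.
        for (int pass = 0; pass <= macros.size(); pass++) {
            String next = current;
            for (Map.Entry<String, String> m : macros.entrySet())
                next = next.replace(m.getKey(), m.getValue());
            if (next.equals(current))
                return current;
            current = next;
        }
        throw new IllegalArgumentException("Recursive macro detected in: " + text);
    }

    public static void main(String[] args) {
        MacroExpander expander = new MacroExpander()
            .add("<A>", "aaa")
            .add("<B>", "<A> bbb")
            .add("<C>", "<A> bbb {z|w}");
        System.out.println(expander.expand("<B> ccc")); // prints: aaa bbb ccc
        System.out.println(expander.expand("<C>"));     // prints: aaa bbb {z|w}
    }
}
```

| <p> |
| Note that the option group <code>{z|w}</code> in macro <code>&lt;C&gt;</code> is left untouched by this step; |
| option groups are expanded separately. |
| </p> |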
| <span id="option-groups" class="section-sub-title">Option Groups</span> |
| <p> |
| Option groups are similar to wildcard patterns that operate on a single-word basis. One line of |
| option groups expands into one or more individual synonyms. Option groups are the key mechanism for shortened |
| synonym notation. The following examples demonstrate how to use option groups. |
| </p> |
| <p> |
| Consider the following macros defined below (note that macros <code><B></code> and <code><C></code> |
| are nested): |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Name</th> |
| <th>Value</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><A></code></td> |
| <td><code>aaa</code></td> |
| </tr> |
| <tr> |
| <td><code><B></code></td> |
| <td><code><A> bbb</code></td> |
| </tr> |
| <tr> |
| <td><code><C></code></td> |
| <td><code><A> bbb {z|w}</code></td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| Then the following option group expansions will occur in these examples: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Synonym</th> |
| <th>Synonym Expansions</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><A> {b|*} c</code></td> |
| <td> |
| <code>"aaa b c"</code><br> |
| <code>"aaa c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code><B> {b|*} c</code></td> |
| <td> |
| <code>"aaa bbb b c"</code><br> |
| <code>"aaa bbb c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>{b|\{\*\}}</code></td> |
| <td> |
| <code>"b"</code><br> |
| <code>"{*}"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>a {b|*}. c</code></td> |
| <td> |
| <code>"a b. c"</code><br> |
| <code>"a . c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>a .{b, |*}. c</code></td> |
| <td> |
| <code>"a .b, . c"</code><br> |
| <code>"a .. c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code> |
| {% raw %}a {{b|c}|*}.{% endraw %}</code></td> |
| <td> |
| <code>"a ."</code><br> |
| <code>"a b."</code><br> |
| <code>"a c."</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>a {% raw %}{{{<C>}}|{*}}{% endraw %} c</code></td> |
| <td> |
| <code>"a aaa bbb z c"</code><br> |
| <code>"a aaa bbb w c"</code><br> |
| <code>"a c"</code> |
| </td> |
| </tr> |
| <tr> |
| <td><code>{% raw %}{{{a}}} {b||*|{{*}}||*}{% endraw %}</code></td> |
| <td> |
| <code>"a b"</code><br> |
| <code>"a"</code> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| Specifically: |
| </p> |
| <ul> |
| <li><code>{A|B}</code> denotes either <code>A</code> or <code>B</code>.</li> |
| <li><code>{A|B|*}</code> denotes either <code>A</code> or <code>B</code> or nothing.</li> |
| <li>Excessive curly brackets are ignored, when safe to do so.</li> |
| <li>Macros cannot be recursive but can be nested.</li> |
| <li>Option groups can be nested.</li> |
| <li> |
| <code>'\'</code> (backslash) can be used to escape <code>'{'</code>, <code>'}'</code>, <code>'|'</code> and |
| <code>'*'</code> special symbols used by the option groups. |
| </li> |
| <li>Excessive whitespaces are trimmed when expanding option groups.</li> |
| </ul> |
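| <p> |
| The rules above can be sketched as a small recursive expander. This is an illustration only, not the parser |
| NLPCraft uses: the class <code>OptionGroups</code> is hypothetical, it expects macros to be substituted |
| beforehand, and it treats every unescaped <code>*</code> as "nothing". |
| </p> |

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper, not part of the NLPCraft API: expands option groups in a synonym. */
final class OptionGroups {
    private final String src;
    private int pos = 0;

    private OptionGroups(String src) {
        this.src = src;
    }

    /** Returns all distinct expansions with whitespace trimmed and collapsed. */
    static List<String> expand(String synonym) {
        OptionGroups parser = new OptionGroups(synonym);
        List<String> raw = parser.sequence();
        if (parser.pos < synonym.length())
            throw new IllegalArgumentException(
                "Unexpected '" + synonym.charAt(parser.pos) + "' in: " + synonym);
        LinkedHashSet<String> distinct = new LinkedHashSet<>();
        for (String s : raw)
            distinct.add(s.trim().replaceAll("\\s+", " "));
        return new ArrayList<>(distinct);
    }

    // Reads up to an unescaped '|' or '}' (or the end) and returns every expansion of that run.
    private List<String> sequence() {
        List<String> acc = new ArrayList<>();
        acc.add("");
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '|' || c == '}')
                break;
            List<String> part;
            if (c == '{') {
                part = group();
            } else if (c == '*') {
                pos++;
                part = List.of(""); // Unescaped '*' stands for "nothing".
            } else {
                if (c == '\\' && pos + 1 < src.length())
                    c = src.charAt(++pos); // Escaped symbol is taken literally.
                pos++;
                part = List.of(String.valueOf(c));
            }
            acc = product(acc, part);
        }
        return acc;
    }

    // Reads '{' alternative ('|' alternative)* '}' and returns the union of all alternatives.
    private List<String> group() {
        pos++; // Skip '{'.
        List<String> alternatives = new ArrayList<>(sequence());
        while (pos < src.length() && src.charAt(pos) == '|') {
            pos++;
            alternatives.addAll(sequence());
        }
        if (pos >= src.length() || src.charAt(pos) != '}')
            throw new IllegalArgumentException("Missing '}' in: " + src);
        pos++; // Skip '}'.
        return alternatives;
    }

    // Concatenates every left expansion with every right expansion.
    private static List<String> product(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left.size() * right.size());
        for (String l : left)
            for (String r : right)
                out.add(l + r);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("aaa {b|*} c")); // prints: [aaa b c, aaa c]
        String truckType = "{ {light|super|heavy|medium} duty|half ton|1/2 ton|3/4 ton|one ton}";
        System.out.println(expand("{" + truckType + "|*} {pickup|*} truck").size()); // prints: 18
    }
}
```

| <p> |
| Expansions are de-duplicated after whitespace is collapsed, which is why redundant braces and repeated empty |
| alternatives do not produce extra synonyms in this sketch. |
| </p> |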
| <p> |
| We can rewrite our transportation model element in a more efficient way using macros and option groups. |
| Even though the length of the definition hasn't changed much, the single truck synonym now expands into |
| 18 synonyms that we would otherwise have to write out manually: |
| </p> |
| <pre class="brush: js, highlight: [4,5,14]"> |
| ... |
| "macros": [ |
| { |
| "name": "<TRUCK_TYPE>", |
| "macro": "{ {light|super|heavy|medium} duty|half ton|1/2 ton|3/4 ton|one ton}" |
| } |
| ], |
| "elements": [ |
| { |
| "id": "transport.vehicle", |
| "description": "Transportation vehicle", |
| "synonyms": [ |
| "car", |
| "{<TRUCK_TYPE>|*} {pickup|*} truck", |
| "sedan", |
| "coupe" |
| ] |
| } |
| ] |
| ... |
| </pre> |
| <span id="regex" class="section-sub-title">Regular Expressions</span> |
| <p> |
| Any individual synonym word that starts and ends with <code>//</code> (two forward slashes) is |
| considered to be a Java regular expression as defined in <code>java.util.regex.Pattern</code>. Note that a |
| regular expression can only span a single word, i.e. only individual words from the user input will be |
| matched against the given regular expression and no whitespace is allowed within the regular expression. Note |
| also that option group special symbols <code>{</code>, <code>}</code>, |
| <code>|</code> and <code>*</code> have to be escaped in the regular expression using <code>\</code> |
| (backslash). |
| </p> |
| <p> |
| For example, the following synonym: |
| </p> |
| <pre class="brush: js"> |
| "synonyms": [ |
| "{foo|//bar.+//}" |
| ] |
| </pre> |
| <p> |
| will match the word <code>foo</code> or any other string that starts with <code>bar</code> as long as |
| this string doesn't contain whitespace. |
| </p> |
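| <p> |
| The single-word behavior can be checked directly with <code>java.util.regex.Pattern</code>. The sketch below |
| assumes the regular expression is intended to be <code>bar.+</code> (a character class such as |
| <code>[bar]</code> would instead match a single <code>b</code>, <code>a</code> or <code>r</code>). The class |
| <code>RegexSynonymSketch</code> is hypothetical, and the sketch ignores normalization and stemming. |
| </p> |

```java
import java.util.regex.Pattern;

/** Hypothetical illustration: checks one input word against the synonym {foo|//bar.+//}. */
final class RegexSynonymSketch {
    // Assumption: the regex between the // markers is meant to be bar.+
    private static final Pattern BAR_PREFIX = Pattern.compile("bar.+");

    static boolean matchesWord(String word) {
        // Regex synonyms apply to single words only, so anything with whitespace is rejected.
        if (word.isEmpty() || word.chars().anyMatch(Character::isWhitespace))
            return false;
        // Either alternative of the option group may match.
        return word.equals("foo") || BAR_PREFIX.matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWord("foo"));      // true
        System.out.println(matchesWord("barbecue")); // true
        System.out.println(matchesWord("bar"));      // false: ".+" needs at least one more character
        System.out.println(matchesWord("bar none")); // false: contains whitespace
    }
}
```

| <p> |
| Note that <code>matches()</code> requires the whole word to match the pattern, which is the usual choice when a |
| pattern describes a complete token. |
| </p> |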
| <div class="bq info"> |
| <b>Regular Expressions Performance</b> |
| <p> |
| It's important to note that regular expressions can significantly affect the performance of the |
| underlying NLPCraft implementation if used without restraint. Use them with caution and test the performance |
| of your model to ensure it meets your expectations. |
| </p> |
| </div> |
| <span id="values" class="section-sub-title">Element Values</span> |
| <p> |
| Model element can have an optional set of special synonyms called <em>values</em> or proper nouns for this element. |
| Unlike basic synonyms, each value is a pair of a name and a set of standard synonyms by which that value, |
| and ultimately its element, can be recognized in the user input. Note that the value name itself acts as an |
| implicit synonym even when no additional synonyms are added for that value. |
| </p> |
| <p> |
| When a model element is recognized it is made available to the model's matching logic as an instance of |
| the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> interface. |
| This interface has a method |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getValue--">getValue()</a> which |
| returns the name of the value, if any, by which |
| that model element was recognized. That value name can be further used in intent matching. |
| </p> |
| <p> |
| To understand the importance of the values consider the following changes to our transportation |
| example model: |
| </p> |
| <pre class="brush: js, highlight: [19,20,21,22,23,24,25,26,27,28,29,30]"> |
| ... |
| "macros": [ |
| { |
| "name": "<TRUCK_TYPE>", |
| "macro": "{light duty|heavy duty|half ton|1/2 ton|3/4 ton|one ton|super duty}" |
| } |
| ], |
| "elements": [ |
| { |
| "id": "transport.vehicle", |
| "description": "Transportation vehicle", |
| "synonyms": [ |
| "car", |
| "{<TRUCK_TYPE>|*} {pickup|*} truck", |
| "sedan", |
| "coupe" |
| ], |
| "values": [ |
| { |
| "value": "mercedes", |
| "synonyms": ["mercedes-ben{z|s}", "mb", "ben{z|s}"] |
| }, |
| { |
| "value": "bmw", |
| "synonyms": ["{bimmer|bimer|beemer}", "bayerische motoren werke"] |
| }, |
| { |
| "value": "chevrolet", |
| "synonyms": ["chevy"] |
| } |
| ] |
| } |
| ] |
| ... |
| </pre> |
| <p> |
| With that setup <code>transport.vehicle</code> element will be recognized by any of the following input strings: |
| </p> |
| <ul> |
| <li><code>car</code></li> |
| <li><code>benz</code> (with value <code>mercedes</code>)</li> |
| <li><code>3/4 ton pickup truck</code></li> |
| <li><code>light duty truck</code></li> |
| <li><code>chevy</code> (with value <code>chevrolet</code>)</li> |
| <li><code>bimmer</code> (with value <code>bmw</code>)</li> |
| <li><code>transport.vehicle</code></li> |
| </ul> |
| <p> |
| Note that element value can be used in token and intent DSLs. |
| </p> |
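| <p> |
| As a minimal sketch of such usage, the hypothetical element below (its ID <code>german.vehicle</code> is an assumption made for |
| illustration) relies on <a href="#dsl">token DSL</a> to match only those <code>transport.vehicle</code> tokens that were |
| recognized via the <code>mercedes</code> or <code>bmw</code> value: |
| </p> |
| <pre class="brush: js"> |
| { |
| "id": "german.vehicle", |
| "description": "Vehicle of a German brand (hypothetical element).", |
| "synonyms": [ |
| "^^id == 'transport.vehicle' && value == ('mercedes', 'bmw')^^" |
| ] |
| } |
| </pre> |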
| <span id="groups" class="section-sub-title">Element Groups</span> |
| <p> |
| Each model element belongs to one or more groups. Model element provides its groups via |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelElement.html#getGroups--">getGroups()</a> method. |
| By default, if element group is not specified, the element ID will act as its default group ID. |
| </p> |
| <p> |
| Group membership is a quick and easy way to organise similar model elements together and use this |
| categorization in token and intent DSL. |
| </p> |
| <p> |
| Note that the proper grouping of the elements is also necessary for the correct operation of |
| Short-Term-Memory (STM) in the conversational context |
| when using intent-based matching. See |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a> |
| for more details. |
| </p> |
| <p> |
| Consider an <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> that |
| represents a previously found model element that is stored in the conversation. Such token |
| will be overridden in the conversation by the more <b>recent token</b> |
| from the <b>same group</b> - a critical rule of maintaining the proper conversational context. |
| </p> |
| <p> |
| Note that token's groups can be used in token and intent DSLs. |
| </p> |
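| <p> |
| For illustration, the sketch below places two elements into one shared group using the <code>groups</code> property. The |
| element IDs and the <code>transport</code> group name are assumptions. Under the rule described above, a more recent |
| <code>transport.bike</code> token would override a <code>transport.car</code> token stored in the conversation because |
| both belong to the same group: |
| </p> |
| <pre class="brush: js"> |
| "elements": [ |
| { |
| "id": "transport.car", |
| "groups": ["transport"], |
| "synonyms": ["car", "sedan", "coupe"] |
| }, |
| { |
| "id": "transport.bike", |
| "groups": ["transport"], |
| "synonyms": ["bike", "bicycle"] |
| } |
| ] |
| </pre> |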
| <span id="parent" class="section-sub-title">Element Parent</span> |
| <p> |
| Each model element can form an optional hierarchical relationship with another element by specifying its |
| parent element ID via <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelElement.html#getParentId--">getParentId()</a> method. The main idea here is that model elements sometimes act not only individually: |
| their place in the hierarchy can also be important for token and intent DSL. |
| </p> |
| <p> |
| For example, we could have designed our transportation example model in a different way by using |
| multiple model elements linked with this hierarchy: |
| </p> |
| <pre> |
| +-- vehicle |
| | +--truck |
| | | |-- light.duty.truck |
| | | |-- heavy.duty.truck |
| | | +-- medium.duty.truck |
| | +--car |
| | | |-- coupe |
| | | |-- sedan |
| | | |-- hatchback |
| | | +-- wagon |
| </pre> |
| <p> |
| Then in our intent DSL, for example, we could look for any token with root parent ID <code>vehicle</code> |
| or immediate parent ID <code>truck</code> or <code>car</code> without a need to match on all current and |
| future individual sub-IDs: |
| </p> |
| <pre class="brush: plain"> |
| "intent=vehicle.intent term={ancestors @@ 'vehicle'}" |
| "intent=truck.intent term={parent == 'truck'}" |
| "intent=car.intent term={parent == 'car'}" |
| </pre> |
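| <p> |
| A minimal sketch of how part of this hierarchy might be declared in a JSON model follows. The <code>parentId</code> |
| property name is an assumption that mirrors the getter name and should be checked against the JSON model schema of |
| your NLPCraft version. The synonyms are placeholders: |
| </p> |
| <pre class="brush: js"> |
| "elements": [ |
| { |
| "id": "vehicle", |
| "synonyms": ["vehicle"] |
| }, |
| { |
| "id": "truck", |
| "parentId": "vehicle", |
| "synonyms": ["truck"] |
| }, |
| { |
| "id": "light.duty.truck", |
| "parentId": "truck", |
| "synonyms": ["light duty truck"] |
| } |
| ] |
| </pre> |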
| </section> |
| <section id="dsl" > |
| <h2 class="section-title">Token DSL</h2> |
| <p> |
| Any individual synonym word that starts and ends with <code>^^</code> is a token DSL expression. A token |
| DSL expression inside of <code>^^ ... ^^</code> markers allows you to define a predicate on an already parsed and detected token. It is very important to |
| note that unlike all other synonyms the token DSL predicate operates on an already detected <em>token</em>, not on an |
| individual unparsed <em>word</em>. |
| </p> |
| <p> |
| Token DSL allows you to <em>compose</em> named entities, i.e. use one named entity when defining another one. For example, |
| we could define a model element for the race car using our previous transportation example (note how synonym on |
| <b>line 18</b> |
| references the element defined on <b>line 4</b>): |
| </p> |
| <pre class="brush: js, highlight: [4, 18]"> |
| ... |
| "elements": [ |
| { |
| "id": "transport.vehicle", |
| "description": "Transportation vehicle", |
| "synonyms": [ |
| "car", |
| "truck", |
| "{light|heavy|super|medium} duty {pickup|*} truck", |
| "sedan", |
| "coupe" |
| ] |
| }, |
| { |
| "id": "race.vehicle", |
| "description": "Race vehicle", |
| "synonyms": [ |
| "{race|speed|track} ^^id == 'transport.vehicle'^^" |
| ] |
| } |
| |
| ] |
| ... |
| </pre> |
| <div class="bq warn"> |
| <p> |
| <b>Greedy NERs <span class="amp">&</span> Synonyms Conflicts</b> |
| </p> |
| <p> |
| Note that in the above example you need to ensure that words <code>race</code>, |
| <code>speed</code> or <code>track</code> are not part of the <code>transport.vehicle</code> |
| token. It is particularly important for the 3rd party NERs where specific rules about what |
| words can or cannot be part of the token are unclear or undefined. In such cases the only remedy is |
| to extensively test with 3rd party NERs and verify the synonyms recognition in data probe logs. |
| </p> |
| </div> |
| <p> |
| Another common use case is to wrap 3rd party named entities to add group membership, metadata or hierarchical |
| relationship to the externally detected named entity. For example, you can wrap <code>google:location</code> |
| token and add group membership for <code>my_group</code> group: |
| </p> |
| <pre class="brush: js, highlight: [6,8]"> |
| ... |
| "elements": [ |
| { |
| "id": "google.loc.wrap", |
| "description": "Wrapper for google location", |
| "groups": ["my_group"], |
| "synonyms": [ |
| "^^id == 'google:location'^^" |
| ] |
| } |
| |
| ] |
| ... |
| </pre> |
| <span id="dsl-syntax" class="section-sub-title">Token DSL Syntax</span> |
| <p> |
| Token DSL is a simple expression language for defining a single predicate over a token - a detected model |
| element. Remember that unlike token DSL all other types of synonyms work with simple words (vs. tokens). |
| Here's a full <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/src/main/scala/org/apache/nlpcraft/probe/mgrs/model/antlr4/NCSynonymDsl.g4">ANTLR4 grammar</a> for token DSL. |
| Note that this is exactly the same syntax as |
| used by <a href="intent-matching.html#syntax">intent DSL</a> for token predicates in intents - except for |
| aliases which we will explain below. |
| </p> |
| <p> |
| Here's an example of token DSL defining a synonym for the population of any city in France: |
| </p> |
| <pre class="brush: js"> |
| "synonyms": [ |
| "population {of|for} ^^[city](id == 'nlpcraft:city' && lowercase(~city:country) == 'france')^^" |
| ] |
| </pre> |
| <p> |
| A few notes on token DSL syntax: |
| </p> |
| <ul> |
| <li> |
| This synonym defines a composed named entity, i.e. named entity that consists of other named entities. |
| In our example, we utilize token <code>nlpcraft:city</code> along with other basic synonym words. |
| </li> |
| <li> |
| Token DSL expression always results in one and only one token when matched, however, the synonym can have multiple |
| token DSL expressions. |
| </li> |
| <li> |
| Token DSL expression can have optional alias (<code>[city]</code>) that can be used in other token DSL |
| expressions when referencing the token matched by that expression. |
| </li> |
| <li> |
| You can get all participant nested tokens, if required, using <code>NCToken#getPartTokens()</code> method call chain. |
| You can also reference participant tokens in the token DSL expression itself by using dot-notation (see below) |
| with either token IDs or aliases. |
| </li> |
| <li> |
| All string values should be placed in single quotes, as in <code>'some string'</code>. |
| For numeric literals you can use underscores to help readability, e.g. <code>~list:size >= <b>1_000_000</b></code>. |
| </li> |
| <li> |
| You can use <code>null</code>, <code>true</code> and <code>false</code> literals as values. |
| </li> |
| <li> |
| Individual token expressions can be combined with <code>&&</code>, <code>||</code> and <code>!</code> |
| logical combinators and <code>(</code> <code>)</code> brackets that obey standard precedence rules. |
| </li> |
| </ul> |
| <p> |
| The individual token DSL expression can be one of the following forms: |
| </p> |
| <pre class="brush: js"> |
| {qual}param op value |
| func({qual}param) op value |
| </pre> |
| <p> |
| The <code>{qual}param</code> is the left side parameter and it can have an optional qualifier (<code>qual</code>). |
| A qualifier lets you reference participant tokens either by their ID or by their DSL expression's alias using |
| dot-notation. For example: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Qualifier</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td> |
| <code><b>partId.</b>groups @@ 'my_grp'</code> |
| </td> |
| <td> |
| There must be a participant token (i.e. constituent token) with either token ID or alias |
| of <code>partId</code>. That participant token should belong to group <code>my_grp</code>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <code><b>alias1.alias2.</b>~meta['key'] >= 10</code> |
| </td> |
| <td> |
| There must be two nested participant tokens with either token ID or alias |
| of <code>alias1</code> and <code>alias2</code>. The second (inner-most <code>alias2</code>) participant token |
| should have metadata property <code>meta</code> of type map with key <code>key</code> whose value |
| should be greater than or equal to 10. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="bq warn"> |
| <p> |
| <b>NOTE:</b> If qualifier is present it <b>must</b> be valid and found, i.e. the participant tokens this qualifier |
| is referencing must be present. If qualifier is present but referenced participant tokens cannot be |
| found - the processing will abort with an exception rather than simply rejecting given synonym. In other |
| words, if specified - qualifiers are not optional. |
| </p> |
| </div> |
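| <p> |
| The qualifier syntax can be sketched with two hypothetical elements (both element IDs are assumptions made for |
| illustration). The first element composes a <code>nlpcraft:city</code> token under the alias <code>city</code>. The second |
| one references that participant token through the <code>city.</code> qualifier to check its metadata. The <code>id</code> |
| check is listed first on purpose, but whether this ordering alone avoids the exception described above for tokens |
| without a <code>city</code> participant should be verified by testing: |
| </p> |
| <pre class="brush: js"> |
| "elements": [ |
| { |
| "id": "city.population", |
| "synonyms": [ |
| "population {of|for} ^^[city](id == 'nlpcraft:city')^^" |
| ] |
| }, |
| { |
| "id": "french.city.population", |
| "synonyms": [ |
| "^^id == 'city.population' && lowercase(city.~city:country) == 'france'^^" |
| ] |
| } |
| ] |
| </pre> |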
| <p> |
| The <code>param</code> itself can be one of the following literals: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Parameter</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code>id</code></td> |
| <td> |
| <p> |
| Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getId--">ID</a> as |
| a <code>java.lang.String</code> object. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>id</b> == 'nlpcraft:city'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>groups</code></td> |
| <td> |
| <p> |
| Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getGroups--">groups</a> |
| as <code>java.util.Collection</code> of group IDs. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>groups</b> @@ 'my_group'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>aliases</code></td> |
| <td> |
| <p> |
| Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getAliases--">aliases</a> |
| as <code>java.util.Collection</code> of token aliases. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>aliases</b> @@ 'my_alias'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>startidx</code></td> |
| <td> |
| <p> |
| Token start character <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getStartCharIndex--">index</a> in the original text. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>startidx</b> > 5^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>endidx</code></td> |
| <td> |
| <p> |
| Token end character <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getEndCharIndex--">index</a> in the original text. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>endidx</b> < 15^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>parent</code></td> |
| <td> |
| <p> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getParentId--">ID</a> of |
| the parent token as a <code>java.lang.String</code> object. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>parent</b> == 'root'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>ancestors</code></td> |
| <td> |
| <p> |
| <code>java.util.List</code> of all token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getParentId--">parent IDs</a> |
| from the current one to the root. The list can be empty if the current token has no parent ID. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>ancestors</b> @@ 'tok:id'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>value</code></td> |
| <td> |
| <p> |
| Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getValue--">value</a> |
| as a <code>java.lang.String</code> object. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>value</b> == 'brand_name'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>~propName</code></td> |
| <td> |
| <p> |
| Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getMetadata--">metadata</a> |
| property for given <code>propName</code>. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^~<b>city:country</b> == 'france'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>~propName[key]</code></td> |
| <td> |
| <p> |
| Token <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getMetadata--">metadata</a> |
| property for given <code>propName</code> |
| of type <code>java.util.List</code> or <code>java.util.Map</code>. |
| Returns indexed or keyed value. Note that <code>key</code> should be integer |
| for <code>java.util.List</code> and string for <code>java.util.Map</code>. |
| Nested indexing is not allowed. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^~<b>my:list[0]</b> >= 1_000_000^^</code><br> |
| <code>^^~<b>my:map['key']</b> >= 1_000_000^^</code> |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The optional <code>func</code> function can alter the value of the left-side parameter. Only one function call is allowed, i.e. |
| function calls cannot be nested. The primary use case for functions is dealing with 3rd party metadata where you |
| don't have direct control over the values supplied by 3rd party named entity providers. The following functions are |
| supported: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Function Name</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code>keys</code></td> |
| <td> |
| <p> |
| Calling <code>java.util.Map#keySet()</code> function on given parameter to get a collection of |
| map keys. Applicable to <code>java.util.Map</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>keys</b>(~my:map) @@ 'my_key'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>values</code></td> |
| <td> |
| <p> |
| Calling <code>java.util.Map#values()</code> function on given parameter to get a collection |
| of map values. Applicable to <code>java.util.Map</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>values</b>(~my:map) @@ (200_000, 100_000)^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>trim</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.String#trim()</code> function on given parameter. |
| Applicable to <code>java.lang.String</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>trim</b>(~nlp:origtext) == '//^[Pp]aris$//'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>isalpha</code></td> |
| <td> |
| <p> |
| Checks that given string parameter contains only Unicode letters. |
| Applicable to <code>java.lang.String</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>isalpha</b>(~nlp:origtext) == true^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>isalphanum</code></td> |
| <td> |
| <p> |
| Checks that given string parameter contains only Unicode letters or digits. |
| Applicable to <code>java.lang.String</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>isalphanum</b>(~nlp:origtext) == true^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>isnumeric</code></td> |
| <td> |
| <p> |
| Checks that given string parameter contains only Unicode digits. |
| Applicable to <code>java.lang.String</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>isnumeric</b>(~zipcode) == true^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>iswhitespace</code></td> |
| <td> |
| <p> |
| Checks that given string parameter contains only whitespaces. |
| Applicable to <code>java.lang.String</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>iswhitespace</b>(~my_txt) == false^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>uppercase</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.String#toUpperCase()</code> function on given parameter. |
| Applicable to <code>java.lang.String</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>uppercase</b>(~nlp:origtext) == 'PARIS'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>lowercase</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.String#toLowerCase()</code> function on given parameter. |
| Applicable to <code>java.lang.String</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>lowercase</b>(~nlp:origtext) == 'paris'^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>ceil</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.Math#ceil()</code> function on given parameter. |
| Applicable to <code>java.lang.Double</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>ceil</b>(~custom:double) > 1.0^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>floor</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.Math#floor()</code> function on given parameter. |
| Applicable to <code>java.lang.Double</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>floor</b>(~custom:double) > 1.0^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>rint</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.Math#rint()</code> function on given parameter. |
| Applicable to <code>java.lang.Double</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>rint</b>(~custom:double) > 1.0^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>round</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.Math#round()</code> function on given parameter. |
| Applicable to <code>java.lang.Double</code> and <code>java.lang.Float</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>round</b>(~custom:double) > 1.0^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>size</code>, <code>count</code> or <code>length</code></td> |
| <td> |
| <p> |
| Getting size of the <code>java.util.Collection</code> or <code>java.util.Map</code>, or number |
| of characters for <code>java.lang.String</code> parameter. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>size</b>(~custom:coll) > 0^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>signum</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.Math#signum()</code> function on given parameter. |
| Applicable to <code>java.lang.Double</code> and <code>java.lang.Float</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>signum</b>(~custom:double) == -1^^</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><code>abs</code></td> |
| <td> |
| <p> |
| Calling <code>java.lang.Math#abs()</code> function on given parameter. |
| Applicable to <code>java.lang.Double</code>, <code>java.lang.Float</code>, |
| <code>java.lang.Long</code> and <code>java.lang.Integer</code> parameters only. |
| </p> |
| <p> |
| <b>Example:</b><br/> |
| <code>^^<b>abs</b>(~custom:int) > 10_000^^</code> |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
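| <p> |
| As a brief sketch of functions in use, the hypothetical element below (the <code>x:zipcode</code> ID is an assumption) |
| combines two separate, non-nested function calls to accept any already detected token whose original text consists |
| of exactly five digits: |
| </p> |
| <pre class="brush: js"> |
| { |
| "id": "x:zipcode", |
| "description": "Five-digit ZIP code (hypothetical element).", |
| "synonyms": [ |
| "^^isnumeric(~nlp:origtext) == true && size(~nlp:origtext) == 5^^" |
| ] |
| } |
| </pre> |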
| <p> |
| The <code>op</code> (operation) can be one of the following: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Operation</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td> |
| <code>==</code><br/> |
| <code>!=</code> |
| </td> |
| <td> |
| <p> |
| Both operators perform equality check and work differently depending on the type of the left |
| and right parameter: |
| </p> |
| <ul> |
| <li> |
| <p> |
| If both left and right parameters are of type <code>java.util.Collection</code> then |
| it checks that both collections contain (do not contain) |
| exactly the same elements with exactly the same cardinalities. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^~col <b>==</b> (1, 2, 3)^^</code></dt> |
| <dd>'col' metadata collection should contain only three elements: 1, 2, and 3.</dd> |
| <dt><code>^^groups <b>!=</b> ('null', 'void')^^</code></dt> |
| <dd>Token cannot belong to the exact two groups 'null' and 'void'.</dd> |
| <dt><code>^^keys(~map) <b>==</b> ('key1', 'key2')^^</code></dt> |
| <dd>'map' metadata map should contain only two keys: 'key1' and 'key2'.</dd> |
| </dl> |
| </li> |
| <li> |
| <p> |
| If only the right parameter is of type <code>java.util.Collection</code> and the left |
| parameter is a single value then it checks that given single value is (is not) present |
| in the right side collection. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^id <b>==</b> ('id1', 'id2')^^</code></dt> |
| <dd>'id' should be either 'id1' or 'id2'.</dd> |
| <dt><code>^^~index <b>!=</b> (-1, 0)^^</code></dt> |
| <dd> |
| 'index' metadata should NOT be either -1 or 0. |
| </dd> |
| </dl> |
| </li> |
| <li> |
| <p> |
| If both left and right parameters are of type <code>java.lang.Number</code> |
| then method <code>java.lang.Double.compare()</code> is used to compare two numbers. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^~score <b>==</b> 100_000^^</code></dt> |
| <dd> |
| 'score' metadata (of any numeric type) should be equal to 100,000 when compared using |
| double values. |
| </dd> |
| </dl> |
| </li> |
| <li> |
| <p> |
| If both left and right parameters are of type <code>java.lang.String</code> |
| and either one is a regular expression written using <code>//</code> prefix and suffix |
| syntax then that regular expression is used to perform equality check. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^~txt <b>==</b> '//^[tT]ext$//'^^</code></dt> |
| <dd>'txt' metadata matches given regex.</dd> |
| <dt><code>^^~my_regex <b>!=</b> 'test'^^</code></dt> |
| <dd> |
| 'my_regex' metadata regex string does not match 'test' value. Note that 'my_regex' metadata string |
| should use <code>//</code>...<code>//</code> syntax for regular expression. |
| </dd> |
| </dl> |
| </li> |
| <li> |
| <p> |
| In all other cases the standard Java <code>java.lang.Object.equals()</code> equality check |
| is used. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^value <b>==</b> null^^</code></dt> |
| <dd>Token does not have a value.</dd> |
| <dt><code>^^parent <b>!=</b> null^^</code></dt> |
| <dd>Token's parent ID is not null.</dd> |
| <dt><code>^^~flag <b>==</b> true^^</code></dt> |
| <dd>'flag' metadata is true.</dd> |
| </dl> |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <code>@@</code><br/> |
| <code>!@</code> |
| </td> |
| <td> |
| <p> |
| Both operators perform collection containment check and work differently depending on the type of the left |
| and right parameter: |
| </p> |
| <ul> |
| <li> |
| <p> |
| If left parameter is of type <code>java.util.Collection</code> and the right side |
| parameter is a single value then it checks that given collection contains (does not |
| contain) given single value. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^~col <b>@@</b> 100_000^^</code></dt> |
| <dd>'col' metadata collection should contain 100,000 value.</dd> |
| <dt><code>^^groups <b>!@</b> 'null'^^</code></dt> |
| <dd>Token should not belong to 'null' group.</dd> |
| </dl> |
| </li> |
| <li> |
| <p> |
| If both left and right parameters are of type <code>java.util.Collection</code> then |
| it checks that a left side collection contains (does not contain) <b>all elements</b> |
| from the right side collection. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^~col <b>@@</b> (1, 2, 3)^^</code></dt> |
| <dd>'col' metadata collection should contain all three elements: 1, 2, and 3.</dd> |
| <dt><code>^^groups <b>!@</b> ('null', 'void')^^</code></dt> |
| <dd> |
| Token should not belong to both 'null' and 'void' groups at the same time. |
| Note that it can belong to other groups. |
| </dd> |
| </dl> |
| </li> |
| <li> |
| <p> |
| If both left and right parameters are of type <code>java.lang.String</code> then |
| it checks that a left side string contains (does not contain) the right side string as |
| its sub-string. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^id <b>@@</b> 'sub'^^</code></dt> |
| <dd>Token ID should contain 'sub' sub-string.</dd> |
| <dt><code>^^~name <b>!@</b> 'nlp'^^</code></dt> |
| <dd> |
| Metadata 'name' should not contain 'nlp' substring. |
| </dd> |
| </dl> |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <code>></code><br/> |
| <code>>=</code><br/> |
| <code><=</code><br/> |
| <code><</code> |
| </td> |
| <td> |
| <p> |
| Standard relational operators that are applicable to <code>java.lang.Number</code> left and |
| right side values only. |
| </p> |
| <b>Example:</b> |
| <dl> |
| <dt><code>^^startidx <b>>=</b> 10^^</code></dt> |
| <dd>Token start index should be greater than or equal to 10.</dd> |
| <dt><code>^^~score <b><</b> 100_000^^</code></dt> |
| <dd> |
| Metadata 'score' should be less than 100,000. |
| </dd> |
| </dl> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
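| <p> |
| The difference between equality and containment on collections can be sketched with three hypothetical elements |
| (all element IDs and group names are assumptions). The first matches tokens whose groups are exactly |
| <code>premium</code> and <code>retail</code>, the second matches tokens that belong to at least both of these groups, |
| and the third matches tokens that belong to either one: |
| </p> |
| <pre class="brush: js"> |
| "elements": [ |
| { |
| "id": "client.exact", |
| "synonyms": ["^^groups == ('premium', 'retail')^^"] |
| }, |
| { |
| "id": "client.both", |
| "synonyms": ["^^groups @@ ('premium', 'retail')^^"] |
| }, |
| { |
| "id": "client.either", |
| "synonyms": ["^^groups @@ 'premium' || groups @@ 'retail'^^"] |
| } |
| ] |
| </pre> |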
| <span id="combinators" class="section-sub-title">Logical Combinators</span> |
| <p> |
| Individual token expressions can be combined with <code>&&</code>, <code>||</code> and <code>!</code> |
| logical combinators and <code>( )</code> brackets that obey standard precedence rules as well as short-circuit |
| evaluation of logical <code>&&</code> and <code>||</code> combinators. For example: |
| </p> |
| <p> |
| <code>^^[alias](~my:list[0] >= 1_000_000 <b>&&</b> alias1.groups @@ 'clients')^^</code><br> |
| <code>^^<b>(</b>id == 'myid' && ~score > 10<b>)</b> <b>||</b> <b>(</b>alias1.groups @@ 'clients' && ~score <= 10<b>)</b>^^</code><br> |
| </p> |
| <span id="custom" class="section-sub-title">Custom Parsers</span> |
| <p> |
| In cases when declarative synonyms (macros, option groups, regexp and token DSL) are not expressive enough |
| you can create your model element recognizer programmatically: |
| </p> |
| <ul> |
| <li> |
| Model provides its custom parsers via <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getParsers--">getParsers()</a> method. |
| </li> |
| <li> |
| Custom parser is defined by the following classes: |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomElement.html">NCCustomElement</a>, |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomParser.html">NCCustomParser</a> and |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCCustomWord.html">NCCustomWord</a>. |
| </li> |
| </ul> |
| </section> |
| <section id="logic"> |
| <h2 class="section-title">Model Logic</h2> |
| <p> |
| When a user sends their request via REST API it is received by the REST server. Upon receipt, |
| the REST server performs basic NLP processing and enrichment. Once finished, the REST server |
| sends the enriched request down to a specific data probe selected based on the requested data model. |
| </p> |
| <p> |
| The model logic is defined in <a href="intent-matching.html">intents</a>, specifically in the intent callbacks that get called when |
| their intent is chosen as a winning match against the user request. |
| Below we will quickly discuss the key APIs that are essential for developing intent callbacks. |
| Note that this does not replace the more detailed <a target=_ href="/apis/latest/index.html">Javadoc</a> |
| documentation that you are encouraged to read through as well: |
| </p> |
| <ul> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></li> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></li> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></li> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></li> |
| <li>Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></li> |
| <li>Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></li> |
| </ul> |
| <h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a></h3> |
| <p> |
| This interface provides a read-only view of the data model. The model view defines the declarative, or configurable, part of the model. |
| All properties in this interface can be defined or overridden in an external JSON/YAML |
| representation when used with <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelFileAdapter.html">NCModelFileAdapter</a> adapter. |
| </p> |
| <h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a></h3> |
| <p> |
| This interface defines a context of a particular intent match. It can be passed into the callback of the matched intent |
| and provides the following: |
| </p> |
| <ul> |
| <li>ID of the matched intent.</li> |
| <li>Specific parsing variant that was matched against this intent.</li> |
| <li>Access to the original query context (<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a>).</li> |
| <li>Various access APIs for intent tokens.</li> |
| </ul> |
| <h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html">NCContext</a></h3> |
| <p> |
| This interface provides all available data about the parsed user input and all its |
| supplemental information. It's accessible from <code>NCIntentMatch</code> interface and |
| provides a large amount of information to the intent callback logic: |
| </p> |
| <ul> |
| <li> |
| Server request ID. Server request is defined as a processing of one user input sentence. |
| </li> |
| <li> |
| Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCConversation.html">NCConversation</a> |
| for controlling STM of conversation manager and dialog flow. |
| </li> |
| <li> |
| Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html">NCModelView</a> |
| instance that the intent callback method belongs to giving access to entire static model configuration. |
| </li> |
| <li> |
| Reference to <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> that |
| provides detailed information about the user input. |
| </li> |
| <li> |
| List of parsing variants provided |
| by <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants--">getVariants()</a> |
| method. When the user sentence gets parsed into individual tokens (i.e. detected model elements) there is generally |
| more than one way to do it. This ambiguity is perfectly fine because only the data model has all the |
| necessary information to select one parsing variant that fits that model the best. Without the data model |
| there isn't enough context to determine which variant is the best fitting. |
| Method <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCContext.html#getVariants--">getVariants()</a> |
| returns list of all parsing variants for a given user input. |
| </li> |
| </ul> |
| <h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a></h3> |
| <p> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">NCRequest</a> interface |
| is one of the several important entities in Data Model API that you as a model developer will be working with. You |
| should review its <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html">Javadoc</a> but |
| here is an outline of the information it provides: |
| </p> |
| <ul> |
| <li> |
| Information about the user that issued the request. |
| </li> |
| <li> |
| User agent and remote address, if any available, of the user's application that made the initial REST call. |
| </li> |
| <li> |
| Original request text, timestamp of its receipt, and server request ID. |
| </li> |
| </ul> |
| <h3 class="section-title">Interface <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a></h3> |
| <p> |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> object is another |
| key abstraction in Data Model API. A token is a detected model element and is a part of a fully parsed user input. |
| A sequence of tokens represents the parsed user input. A single token corresponds to one or more words, sequential |
| or not, in the user sentence. |
| </p> |
| <p> |
| Most of the token's information is stored in map-based metadata accessible via |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html#getMetadata--">getMetadata()</a> method. |
| Depending on the token ID, each token will have a different set of <a href="#meta">metadata properties</a>. Some common NLP properties |
| are always present for tokens of all types. |
| </p> |
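| <p> |
| The map-based access described above can be sketched as follows. This is a minimal illustration only: a plain |
| <code>java.util.Map</code> stands in for the value returned by <code>getMetadata()</code>, the class |
| <code>TokenMetaSketch</code> and its methods are hypothetical helpers that are not part of the NLPCraft API, and the sample values are made up. |
| </p> |

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the NLPCraft API. A plain map stands in
// for the metadata map that a token exposes.
final class TokenMetaSketch {
    // Returns the property cast to the expected type.
    // Throws IllegalArgumentException if it is absent, ClassCastException if mistyped.
    static <T> T meta(Map<String, Object> md, String name, Class<T> type) {
        Object v = md.get(name);
        if (v == null)
            throw new IllegalArgumentException("Missing metadata property: " + name);
        return type.cast(v);
    }

    // Builds made-up sample metadata for a free word 'home'.
    static Map<String, Object> sample() {
        Map<String, Object> md = new HashMap<>();
        md.put("nlpcraft:nlp:origtext", "home");
        md.put("nlpcraft:nlp:freeword", Boolean.TRUE);
        md.put("nlpcraft:nlp:index", 2);
        return md;
    }

    public static void main(String[] args) {
        Map<String, Object> md = sample();
        String text = meta(md, "nlpcraft:nlp:origtext", String.class);
        boolean free = meta(md, "nlpcraft:nlp:freeword", Boolean.class);
        System.out.println(text + " freeword=" + free);
    }
}
```

| <p> |
| Reading properties through one typed accessor keeps the casts in a single place and fails early when an expected property is missing. |
| </p> |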
| <h3 class="section-title">Class <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a></h3> |
| <p> |
| This class defines data model result returned from model's intent callbacks. Result consists of the |
| text body and the type. The type is similar in notion to MIME types. Intent callbacks must use this class |
| to provide their results. |
| </p> |
| </section> |
| <section id="builtin"> |
| <h2 class="section-title">Built-In Tokens</h2> |
| <p> |
| NLPCraft provides a number of built-in model elements (i.e. tokens) including the |
| <a href="integrations.html">integration</a> with several popular 3rd party NER frameworks. Table |
| below provides information about these built-in tokens. Section about <a href="#meta">token metadata</a> provides |
| further information about metadata that each type of token carries. |
| </p> |
| <p> |
| Built-in tokens have to be explicitly enabled on both the REST server and in the model. See |
| <code>nlpcraft.server.tokenProviders</code> configuration property and |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens--">NCModelView#getEnabledBuiltInTokens()</a> |
| method for more details. By default, only NLPCraft tokens are enabled (token ID |
| starting with <code>nlpcraft</code>). |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Token ID</th> |
| <th>Description</th> |
| <th>Example</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code>nlpcraft:nlp</code></td> |
| <td> |
| <p> |
| This token denotes a word (always a single word) that is not a part of any other token. It is |
| also called a free word, i.e. a word that is not linked to any other detected model element. |
| </p> |
| <p> |
| <b>NOTE:</b> the metadata from this token defines a common set of NLP properties and |
| is present in every other token as well. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li>Jamie goes <code>home</code> (assuming that a word 'home' does not belong to any model element).</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:date</code></td> |
| <td> |
| This token denotes a date range. It recognizes dates from 1900 up to 2023. Note that it does not |
| currently recognize time component. |
| </td> |
| <td> |
| <ul> |
| <li>Meeting <code>next tuesday</code>.</li> |
| <li>Report for entire <code>2018 year</code>.</li> |
| <li>Data <code>from 1/1/2017 to 12/31/2018</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:num</code></td> |
| <td> |
| This token denotes a single numeric value or numeric condition. |
| </td> |
| <td> |
| <ul> |
| <li>Price <code>&gt; 100</code>.</li> |
| <li>Price is <code>less than $100</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:continent</code></td> |
| <td> |
| This token denotes a geographical continent. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Africa</code>.</li> |
| <li>Surface area of <code>America</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:subcontinent</code></td> |
| <td> |
| This token denotes a geographical subcontinent. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Alaskan peninsula</code>.</li> |
| <li>Surface area of <code>South America</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:region</code></td> |
| <td> |
| This token denotes a geographical region/state. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>California</code>.</li> |
| <li>Surface area of <code>South Dakota</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:country</code></td> |
| <td> |
| This token denotes a country. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>France</code>.</li> |
| <li>Surface area of <code>USA</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:city</code></td> |
| <td> |
| This token denotes a city. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Paris</code>.</li> |
| <li>Surface area of <code>Washington DC</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:metro</code></td> |
| <td> |
| This token denotes a metro area. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Cedar Rapids-Waterloo-Iowa City &amp; Dubuque, IA</code> metro area.</li> |
| <li>Surface area of <code>Norfolk-Portsmouth-Newport News, VA</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:sort</code></td> |
| <td> |
| This token denotes a sorting or ordering. |
| </td> |
| <td> |
| <ul> |
| <li>Report <code>sorted from top to bottom</code>.</li> |
| <li>Analysis <code>sorted in descending order</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:limit</code></td> |
| <td> |
| This token denotes a numerical limit. |
| </td> |
| <td> |
| <ul> |
| <li>Show <code>top 5</code> brands.</li> |
| <li>Show <code>several</code> brands.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:coordinate</code></td> |
| <td> |
| This token denotes latitude and longitude coordinates. |
| </td> |
| <td> |
| <ul> |
| <li>Route the path to <code>55.7558, 37.6173</code> location.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:relation</code></td> |
| <td> |
| This token denotes a relation function: |
| <code>compare</code> or |
| <code>correlate</code>. Note that this token always needs two other tokens that it references. |
| </td> |
| <td> |
| <ul> |
| <li> |
| What is the <code><b>correlation between</b></code> <code>price</code> <code><b>and</b></code> <code>location</code> |
| (assuming that 'price' and 'location' are also detected tokens). |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>google:xxx</code></td> |
| <td> |
| <p> |
| These tokens denote <code>xxx</code> that is a lower case name of the named entity |
| in <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, e.g. |
| <code>google:person</code>, <code>google:location</code>, etc. |
| </p> |
| <p> |
| See <a href="integrations.html#google">integration</a> section for more details on how |
| to configure Google named entity provider. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li> |
| Articles by <code>Ken Thompson</code>. |
| </li> |
| <li> |
| Best restaurants in <code>Paris</code>. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>opennlp:xxx</code></td> |
| <td> |
| <p> |
| These tokens denote <code>xxx</code> that is a lower case name of the named entity |
| in <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, e.g. |
| <code>opennlp:person</code>, <code>opennlp:money</code>, etc. |
| </p> |
| <p> |
| See <a href="integrations.html#opennlp">integration</a> section for more details on how |
| to configure Apache OpenNLP named entity provider. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li> |
| Articles by <code>Ken Thompson</code>. |
| </li> |
| <li> |
| Best restaurants under <code>100$</code>. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>spacy:xxx</code></td> |
| <td> |
| <p> |
| These tokens denote <code>xxx</code> that is a lower case name of the named entity |
| in <a target=_ href="https://spacy.io/">spaCy</a>, e.g. |
| <code>spacy:person</code>, <code>spacy:location</code>, etc. |
| </p> |
| <p> |
| See <a href="integrations.html#spacy">integration</a> section for more details on how |
| to configure spaCy named entity provider. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li> |
| Articles by <code>Ken Thompson</code>. |
| </li> |
| <li> |
| Best restaurants in <code>Paris</code>. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>stanford:xxx</code></td> |
| <td> |
| <p> |
| These tokens denote <code>xxx</code> that is a lower case name of the named entity |
| in <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, e.g. |
| <code>stanford:person</code>, <code>stanford:location</code>, etc. |
| </p> |
| <p> |
| See <a href="integrations.html#stanford">integration</a> section for more details on how |
| to configure Stanford CoreNLP named entity provider. |
| </p> |
| </td> |
| <td> |
| <ul> |
| <li> |
| Articles by <code>Ken Thompson</code>. |
| </li> |
| <li> |
| Best restaurants in <code>Paris</code>. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </section> |
| <section id="meta"> |
| <h2 class="section-title">Token Metadata</h2> |
| <p> |
| Each type of token has a different set of metadata. The sections below describe the metadata for each built-in token |
| supported by NLPCraft: |
| </p> |
| <ul> |
| <li><a href="#nlpcraft:nlp">Token ID <code>nlpcraft:nlp</code></a></li> |
| <li><a href="#nlpcraft:date">Token ID <code>nlpcraft:date</code></a></li> |
| <li><a href="#nlpcraft:num">Token ID <code>nlpcraft:num</code></a></li> |
| <li><a href="#nlpcraft:city">Token ID <code>nlpcraft:city</code></a></li> |
| <li><a href="#nlpcraft:continent">Token ID <code>nlpcraft:continent</code></a></li> |
| <li><a href="#nlpcraft:subcontinent">Token ID <code>nlpcraft:subcontinent</code></a></li> |
| <li><a href="#nlpcraft:region">Token ID <code>nlpcraft:region</code></a></li> |
| <li><a href="#nlpcraft:country">Token ID <code>nlpcraft:country</code></a></li> |
| <li><a href="#nlpcraft:metro">Token ID <code>nlpcraft:metro</code></a></li> |
| <li><a href="#nlpcraft:coordinate">Token ID <code>nlpcraft:coordinate</code></a></li> |
| <li><a href="#nlpcraft:sort">Token ID <code>nlpcraft:sort</code></a></li> |
| <li><a href="#nlpcraft:limit">Token ID <code>nlpcraft:limit</code></a></li> |
| <li><a href="#nlpcraft:relation">Token ID <code>nlpcraft:relation</code></a></li> |
| <li><a href="#stanford:xxx">Token ID <code>stanford:xxx</code></a></li> |
| <li><a href="#spacy:xxx">Token ID <code>spacy:xxx</code></a></li> |
| <li><a href="#google:xxx">Token ID <code>google:xxx</code></a></li> |
| <li><a href="#opennlp:xxx">Token ID <code>opennlp:xxx</code></a></li> |
| </ul> |
| <div class="bq info"> |
| <p> |
| <b>Metadata Name Conflicts</b> |
| </p> |
| <p> |
| Note that model element metadata gets merged into the same map container as common NLP token metadata |
| (see <code>nlpcraft:nlp:xxx</code> properties below). |
| In other words, they share the same namespace. It is important to remember this and to choose unique names |
| for user-defined metadata properties. One possible way, used by NLPCraft internally, is to prefix |
| each metadata name with a unique prefix based on the token ID. |
| </p> |
| </div> |
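| <p> |
| The prefixing convention mentioned above can be sketched as follows. This is only an illustration under stated assumptions: |
| the class <code>MetaNamespaceSketch</code>, its methods, and the element ID <code>my:price</code> are hypothetical and not part |
| of the NLPCraft API, and a plain <code>java.util.Map</code> stands in for the shared metadata container. |
| </p> |

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the NLPCraft API.
final class MetaNamespaceSketch {
    // Prefixes a user-defined property name with the element (token) ID.
    static String qualify(String elementId, String prop) {
        return elementId + ":" + prop;
    }

    // Merges user-defined metadata into the shared map under qualified names.
    // Throws IllegalStateException if a qualified name is already taken.
    static void mergeInto(Map<String, Object> common, String elementId, Map<String, Object> user) {
        for (Map.Entry<String, Object> e : user.entrySet()) {
            String key = qualify(elementId, e.getKey());
            if (common.putIfAbsent(key, e.getValue()) != null)
                throw new IllegalStateException("Metadata name conflict: " + key);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> common = new HashMap<>();
        common.put("nlpcraft:nlp:lemma", "price");

        Map<String, Object> user = new HashMap<>();
        user.put("lemma", "cost"); // Would be ambiguous without a prefix.

        mergeInto(common, "my:price", user);
        System.out.println(common.keySet());
    }
}
```
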
| <span id="nlpcraft:nlp" class="section-sub-title">Token ID <code>nlpcraft:nlp</code></span> |
| <p> |
| This token's metadata provides the common set of basic NLP properties for a given token. |
| <b>All tokens</b> without exception have these metadata properties, |
| and all of these metadata properties are <b>mandatory</b>. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:nlp:unid</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Internal globally unique system ID of the token.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:bracketed</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td>Whether or not this token is surrounded by any of <code>'['</code>, <code>']'</code>, <code>'{'</code>, <code>'}'</code>, <code>'('</code>, <code>')'</code> brackets.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:freeword</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td>Whether or not this token represents a free word. A free word is a token that was detected neither as a part of a user-defined token nor as a part of a system token.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:direct</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td>Whether or not this token was matched on direct (not permutated) synonym.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:english</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this token represents an English word. Note that this only checks that the token's text |
| consists of characters of the English alphabet, i.e. the text does not necessarily have to be a |
| known valid English word. See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isNonEnglishAllowed--" target="javadoc">NCModelView.isNonEnglishAllowed()</a> method |
| for the corresponding model configuration. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:lemma</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Lemma of this token, i.e. a canonical form of this word. Note that stemming and |
| lemmatization reduce inflectional forms and sometimes derivationally related forms |
| of a word to a common base form. Lemmatization refers to the use of a vocabulary and |
| morphological analysis of words, normally aiming to remove inflectional endings only and to |
| return the base or dictionary form of a word, which is known as the lemma. |
| Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a> |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:stem</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Stem of this token. Note that stemming and lemmatization reduce inflectional forms |
| and sometimes derivationally related forms of a word to a common base form. Unlike lemmatization, |
| stemming is a basic heuristic process that chops off the ends of words in the hope of |
| achieving this goal correctly most of the time, and often includes the removal of derivational |
| affixes. |
| Learn more at <a target=_ href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html</a> |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:pos</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Penn Treebank POS tag for this token. Note that in addition to the standard Penn Treebank POS |
| tags NLPCraft introduces a synthetic '---' tag to indicate a POS tag for multiword tokens. |
| Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a> |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:posdesc</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Description of Penn Treebank POS tag. |
| Learn more at <a target=_ href="http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</a> |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:swear</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token is a swear word. NLPCraft has built-in list of common English swear words. |
| See <a href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#isSwearWordsAllowed--" target="javadoc">NCModelView.isSwearWordsAllowed()</a> for the corresponding model configuration. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:origtext</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Original user input text for this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:normtext</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Normalized user input text for this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:sparsity</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Numeric value of how sparse the token is. Sparsity zero means that all individual words in |
| the token follow each other. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:minindex</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Index of the first word in this token. Note that token may not be contiguous. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:maxindex</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Index of the last word in this token. Note that token may not be contiguous. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:wordindexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td> |
| List of original word indexes in this token. Note that a token can have words that are not |
| contiguous in the original sentence. Always has at least one element in it. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:wordlength</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Number of individual words in this token. Equal to the size of <code>wordindexes</code> list. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:contiguous</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token has zero sparsity, i.e. consists of contiguous words. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:start</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Start character index of this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:end</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| End character index of this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:index</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Index of this token in the sentence. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:charlength</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Character length of this token. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:quoted</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token is surrounded by single or double quotes. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:stopword</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token is a stopword. Stopwords are extremely common words which |
| add little value in helping to understand the user input and are excluded from the processing entirely. |
| For example, words like a, the, can, of, about, over, etc. are typical stopwords in English. |
| NLPCraft has built-in set of stopwords. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:nlp:dict</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not this token is found in Princeton WordNet database. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
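| <p> |
| The relationship between the word-index properties in the table above can be sketched as follows. This is a minimal illustration |
| that assumes <code>wordindexes</code> holds distinct word positions and that a token is contiguous exactly when those positions form |
| an unbroken run. The class <code>WordIndexSketch</code> is hypothetical and not part of the NLPCraft API, and the sketch |
| deliberately does not compute <code>sparsity</code> because its exact formula is not given here. |
| </p> |

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, NOT part of the NLPCraft API.
final class WordIndexSketch {
    // Counterpart of 'minindex': position of the first word of the token.
    static int minIndex(List<Integer> wordIndexes) {
        return Collections.min(wordIndexes);
    }

    // Counterpart of 'maxindex': position of the last word of the token.
    static int maxIndex(List<Integer> wordIndexes) {
        return Collections.max(wordIndexes);
    }

    // Counterpart of 'wordlength': size of the 'wordindexes' list.
    static int wordLength(List<Integer> wordIndexes) {
        return wordIndexes.size();
    }

    // Assumption: indexes are distinct, so the token is contiguous when
    // the span from min to max contains exactly 'wordlength' positions.
    static boolean contiguous(List<Integer> wordIndexes) {
        return maxIndex(wordIndexes) - minIndex(wordIndexes) + 1 == wordLength(wordIndexes);
    }

    public static void main(String[] args) {
        List<Integer> adjacent = Arrays.asList(1, 2, 3);
        List<Integer> split = Arrays.asList(1, 3);
        System.out.println(contiguous(adjacent) + " " + contiguous(split));
    }
}
```
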
| <br/> |
| <span id="nlpcraft:date" class="section-sub-title">Token ID <code>nlpcraft:date</code></span> |
| <p> |
| This token denotes a date range including single days. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties this type of token will have the following |
| metadata properties all of which are <b>mandatory</b>. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:date:from</b></code></td> |
| <td><code>java.lang.Long</code></td> |
| <td> |
| Start timestamp of the datetime range. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:date:to</b></code></td> |
| <td><code>java.lang.Long</code></td> |
| <td> |
| End timestamp of the datetime range. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
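| <p> |
| A handler might consume the two timestamps above roughly as follows. This sketch rests on assumptions that the table does not state: |
| the <code>Long</code> values are treated as epoch milliseconds, both ends of the range are treated as inclusive, and UTC is used for |
| conversion. The class <code>DateRangeSketch</code> and its sample data are hypothetical and not part of the NLPCraft API. |
| </p> |

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the NLPCraft API.
final class DateRangeSketch {
    // Assumption: the timestamp is in epoch milliseconds.
    static LocalDate toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Assumption: both ends of the range are inclusive.
    static boolean covers(Map<String, Object> md, long epochMillis) {
        long from = (Long) md.get("nlpcraft:date:from");
        long to = (Long) md.get("nlpcraft:date:to");
        return epochMillis >= from && epochMillis <= to;
    }

    // Made-up sample metadata covering one whole calendar year in UTC.
    static Map<String, Object> yearRange(int year) {
        long from = LocalDate.of(year, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long next = LocalDate.of(year + 1, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        Map<String, Object> md = new HashMap<>();
        md.put("nlpcraft:date:from", from);
        md.put("nlpcraft:date:to", next - 1);
        return md;
    }

    public static void main(String[] args) {
        Map<String, Object> md = yearRange(2018);
        System.out.println(toUtcDate((Long) md.get("nlpcraft:date:from"))
            + " .. " + toUtcDate((Long) md.get("nlpcraft:date:to")));
    }
}
```
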
| <br/> |
| <span id="nlpcraft:num" class="section-sub-title">Token ID <code>nlpcraft:num</code></span> |
| <p> |
| This token denotes a single numerical value or a numeric condition. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties this type of token will have the following |
| metadata properties all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:num:from</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td> |
| Start of numeric range that satisfies the condition (exclusive). Note that if <code>from</code> |
| and <code>to</code> are the same this token represents a single value (whole or fractional) in |
| which case <code>isequalcondition</code> will be <code>true</code>. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:to</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td> |
| End of numeric range that satisfies the condition (exclusive). Note that if <code>from</code> |
| and <code>to</code> are the same this token represents a single value (whole or fractional) in |
| which case <code>isequalcondition</code> will be <code>true</code>. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:fromincl</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not the start of the numeric range is inclusive. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:toincl</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether or not the end of the numeric range is inclusive. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:isequalcondition</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this is an equality condition. Note that single numeric values also default to equality |
| condition and this property will be <code>true</code>. Indeed, <code>A is equal to 2</code> and |
| <code>A is 2</code> have the same meaning. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:isnotequalcondition</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this is a not-equality condition. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:isfromnegativeinfinity</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this range is from negative infinity. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:israngecondition</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this is a range condition. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:istopositiveinfinity</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this range is to positive infinity. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:isfractional</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether this token's value (a single numeric value or a range) is a fractional rather than a whole number. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:unit</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Optional numeric value unit name (see below). |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:num:unittype</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Optional numeric value unit type (see below). |
| </td> |
| </tr> |
| </tbody> |
| </table> |
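| <p> |
| The range properties in the table above can be combined to test a concrete value against the detected condition, as in the |
| sketch below. It is an illustration under assumptions: an open lower bound is modeled as <code>Double.NEGATIVE_INFINITY</code> |
| (how NLPCraft actually encodes it is not stated here), not-equality conditions are ignored, and the class |
| <code>NumConditionSketch</code> with its sample data is hypothetical and not part of the NLPCraft API. |
| </p> |

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the NLPCraft API.
final class NumConditionSketch {
    // Checks a value against the range, honoring the inclusive flags.
    // Not-equality conditions are deliberately not handled in this sketch.
    static boolean satisfies(Map<String, Object> md, double v) {
        double from = (Double) md.get("nlpcraft:num:from");
        double to = (Double) md.get("nlpcraft:num:to");
        boolean fromIncl = (Boolean) md.get("nlpcraft:num:fromincl");
        boolean toIncl = (Boolean) md.get("nlpcraft:num:toincl");

        boolean aboveFrom = fromIncl ? v >= from : v > from;
        boolean belowTo = toIncl ? v <= to : v < to;
        return aboveFrom && belowTo;
    }

    // Made-up sample metadata for a condition like "less than 100".
    // Assumption: an open lower bound is encoded as negative infinity.
    static Map<String, Object> lessThan(double bound) {
        Map<String, Object> md = new HashMap<>();
        md.put("nlpcraft:num:from", Double.NEGATIVE_INFINITY);
        md.put("nlpcraft:num:to", bound);
        md.put("nlpcraft:num:fromincl", Boolean.FALSE);
        md.put("nlpcraft:num:toincl", Boolean.FALSE);
        return md;
    }

    public static void main(String[] args) {
        Map<String, Object> md = lessThan(100.0);
        System.out.println(satisfies(md, 99.5) + " " + satisfies(md, 100.0));
    }
}
```
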
| <p> |
| The following table provides possible values for <code><b>nlpcraft:num:unit</b></code> and <code><b>nlpcraft:num:unittype</b></code> |
| properties: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>num:unittype</th> |
| <th>num:unit <sub>possible values</sub></th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr><td><code>mass</code></td><td><code>feet per second</code><br/><code>grams</code><br/><code>kilogram</code><br/><code>grain</code><br/><code>dram</code><br/><code>ounce</code><br/><code>pound</code><br/><code>hundredweight</code><br/><code>ton</code><br/><code>tonne</code><br/><code>slug</code></td> |
| <tr><td><code>torque</code></td><td><code>newton meter</code></td> |
| <tr><td><code>area</code></td><td><code>square meter</code><br/><code>acre</code><br/><code>are</code><br/><code>hectare</code><br/><code>square inches</code><br/><code>square feet</code><br/><code>square yards</code><br/><code>square miles</code></td> |
| <tr><td><code>paper quantity</code></td><td><code>paper bale</code></td> |
| <tr><td><code>force</code></td><td><code>kilopond</code><br/><code>pond</code></td> |
| <tr><td><code>pressure</code></td><td><code>pounds per square inch</code></td> |
| <tr><td><code>solid angle</code></td><td><code>steradian</code></td> |
| <tr><td><code>pressure</code><br/><code>stress</code></td><td><code>pascal</code></td> |
| <tr><td><code>luminous</code></td><td><code>flux</code><br/><code>lumen</code></td> |
| <tr><td><code>amount of substance</code></td><td><code>mole</code></td> |
| <tr><td><code>luminance</code></td><td><code>candela per square metre</code></td> |
| <tr><td><code>angle</code></td><td><code>radian</code><br/><code>degree</code></td> |
| <tr><td><code>magnetic flux density</code><br/><code>magnetic field</code></td><td><code>tesla</code></td> |
| <tr><td><code>power</code><br/><code>radiant flux</code></td><td><code>watt</code></td> |
| <tr><td><code>datetime</code></td><td><code>second</code><br/><code>minute</code><br/><code>hour</code><br/><code>day</code><br/><code>week</code><br/><code>month</code><br/><code>year</code></td> |
| <tr><td><code>electrical inductance</code></td><td><code>henry</code></td> |
| <tr><td><code>electric charge</code></td><td><code>coulomb</code></td> |
| <tr><td><code>temperature</code></td><td><code>kelvin</code><br/><code>centigrade</code><br/><code>fahrenheit</code></td> |
| <tr><td><code>voltage</code><br/><code>electrical</code></td><td><code>volt</code></td> |
| <tr><td><code>momentum</code></td><td><code>kilogram meters per second</code></td> |
| <tr><td><code>amount of heat</code></td><td><code>calorie</code></td> |
| <tr><td><code>electrical capacitance</code></td><td><code>farad</code></td> |
| <tr><td><code>radioactive decay</code></td><td><code>becquerel</code></td> |
| <tr><td><code>electrical conductance</code></td><td><code>siemens</code></td> |
| <tr><td><code>luminous intensity</code></td><td><code>candela</code></td> |
| <tr><td><code>work</code><br/><code>energy</code></td><td><code>joule</code></td> |
| <tr><td><code>quantities</code></td><td><code>dozen</code></td> |
| <tr><td><code>density</code></td><td><code>density</code></td> |
| <tr><td><code>sound</code></td><td><code>decibel</code></td> |
| <tr><td><code>electrical resistance</code><br/><code>impedance</code></td><td><code>ohm</code></td> |
| <tr><td><code>force</code><br/><code>weight</code></td><td><code>newton</code></td> |
| <tr><td><code>light quantity</code></td><td><code>lumen seconds</code></td> |
| <tr><td><code>length</code></td><td><code>meter</code><br/><code>millimeter</code><br/><code>centimeter</code><br/><code>decimeter</code><br/><code>kilometer</code><br/><code>astronomical unit</code><br/><code>light year</code><br/><code>parsec</code><br/><code>inch</code><br/><code>foot</code><br/><code>yard</code><br/><code>mile</code><br/><code>nautical mile</code></td> |
| <tr><td><code>refractive index</code></td><td><code>diopter</code></td> |
| <tr><td><code>frequency</code></td><td><code>hertz</code><br/><code>angular frequency</code></td> |
| <tr><td><code>power</code></td><td><code>kilowatt</code><br/><code>horsepower</code><br/><code>bar</code></td> |
| <tr><td><code>magnetic flux</code></td><td><code>weber</code></td> |
| <tr><td><code>current</code></td><td><code>ampere</code></td> |
| <tr><td><code>acceleration of gravity</code></td><td><code>gravity imperial</code><br/><code>gravity metric</code></td> |
| <tr><td><code>volume</code></td><td><code>cubic meter</code><br/><code>liter</code><br/><code>milliliter</code><br/><code>centiliter</code><br/><code>deciliter</code><br/><code>hectoliter</code><br/><code>cubic inch</code><br/><code>cubic foot</code><br/><code>cubic yard</code><br/><code>acre-foot</code><br/><code>teaspoon</code><br/><code>tablespoon</code><br/><code>fluid ounce</code><br/><code>cup</code><br/><code>gill</code><br/><code>pint</code><br/><code>quart</code><br/><code>gallon</code></td> |
| <tr><td><code>speed</code></td><td><code>miles per hour</code><br/><code>meters per second</code></td> |
| <tr><td><code>illuminance</code></td><td><code>lux</code></td> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:city" class="section-sub-title">Token ID <code>nlpcraft:city</code></span> |
| <p> |
| This token denotes a city. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:city:city</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Name of the city. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:city:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Continent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:city:subcontinent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Subcontinent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:city:countrymeta</b></code></td> |
| <td><code>java.util.Map</code></td> |
| <td> |
| Supplemental metadata for city's country (see below). |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:city:citymeta</b></code></td> |
| <td><code>java.util.Map</code></td> |
| <td> |
| Supplemental metadata for city (see below). |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The following table lists the possible entries of the <code><b>nlpcraft:city:countrymeta</b></code> map. The data is |
| obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Key</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>iso</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>iso3</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO 3166 country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>isocode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>capital</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country capital city name.</td> |
| </tr> |
| <tr> |
| <td><code><b>area</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Optional country surface area.</td> |
| </tr> |
| <tr> |
| <td><code><b>population</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Long</code></td> |
| <td>Optional country population.</td> |
| </tr> |
| <tr> |
| <td><code><b>continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country continent.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencycode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency code.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencyname</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency name.</td> |
| </tr> |
| <tr> |
| <td><code><b>phone</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country phone code.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code format.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code regular expression.</td> |
| </tr> |
| <tr> |
| <td><code><b>languages</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of languages.</td> |
| </tr> |
| <tr> |
| <td><code><b>neighbours</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of neighbours.</td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The following table lists the possible entries of the <code><b>nlpcraft:city:citymeta</b></code> map. The data is |
| obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Key</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>latitude</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>City latitude.</td> |
| </tr> |
| <tr> |
| <td><code><b>longitude</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>City longitude.</td> |
| </tr> |
| <tr> |
| <td><code><b>population</b></code></td> |
| <td><code>java.lang.Long</code></td> |
| <td>City population.</td> |
| </tr> |
| <tr> |
| <td><code><b>elevation</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Integer</code></td> |
| <td>Optional city elevation in meters.</td> |
| </tr> |
| <tr> |
| <td><code><b>timezone</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>City timezone.</td> |
| </tr> |
| </tbody> |
| </table> |
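| <p> |
| The nested layout above can be illustrated with a minimal sketch. It assumes the token metadata is available as a |
| plain <code>java.util.Map</code>; the class <code>CityMetaSketch</code> and its <code>describe</code> method are |
| hypothetical names used for illustration only and are not part of the NLPCraft API. |
| </p> |

```java
import java.util.Map;
import java.util.Optional;

final class CityMetaSketch {
    /**
     * Formats a one-line summary from the metadata of an nlpcraft:city token.
     * Assumption: 'meta' is a plain map standing in for the token's metadata.
     */
    static String describe(Map<String, Object> meta) {
        String city = (String) meta.get("nlpcraft:city:city");
        String continent = (String) meta.get("nlpcraft:city:continent");

        // The city-specific details live in a nested map.
        @SuppressWarnings("unchecked")
        Map<String, Object> cityMeta = (Map<String, Object>) meta.get("nlpcraft:city:citymeta");

        double lat = (Double) cityMeta.get("latitude");
        double lon = (Double) cityMeta.get("longitude");

        // 'elevation' is marked optional, so it may be absent from the map.
        Optional<Integer> elevation = Optional.ofNullable((Integer) cityMeta.get("elevation"));

        return city + ", " + continent + " @ " + lat + "," + lon
            + elevation.map(e -> ", " + e + " m").orElse("");
    }
}
```

| <p> |
| Wrapping the optional key in <code>Optional</code> keeps the mandatory and optional properties visibly distinct in the calling code. |
| </p> |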
| <br/> |
| <span id="nlpcraft:continent" class="section-sub-title">Token ID <code>nlpcraft:continent</code></span> |
| <p> |
| This token denotes a continent. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:continent:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Name of the continent.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:subcontinent" class="section-sub-title">Token ID <code>nlpcraft:subcontinent</code></span> |
| <p> |
| This token denotes a subcontinent. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:subcontinent:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Name of the continent.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:subcontinent:subcontinent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Name of the subcontinent.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:metro" class="section-sub-title">Token ID <code>nlpcraft:metro</code></span> |
| <p> |
| This token denotes a metro area. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:metro:metro</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Name of the metro area.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:region" class="section-sub-title">Token ID <code>nlpcraft:region</code></span> |
| <p> |
| This token denotes a geographical region. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:region:region</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Name of the region. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:region:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Continent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:region:subcontinent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Subcontinent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:region:countrymeta</b></code></td> |
| <td><code>java.util.Map</code></td> |
| <td> |
| Supplemental metadata for region's country (see below). |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The following table lists the possible entries of the <code><b>nlpcraft:region:countrymeta</b></code> map. The data is |
| obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Key</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>iso</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>iso3</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO 3166 country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>isocode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>capital</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country capital city name.</td> |
| </tr> |
| <tr> |
| <td><code><b>area</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Optional country surface area.</td> |
| </tr> |
| <tr> |
| <td><code><b>population</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Long</code></td> |
| <td>Optional country population.</td> |
| </tr> |
| <tr> |
| <td><code><b>continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country continent.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencycode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency code.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencyname</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency name.</td> |
| </tr> |
| <tr> |
| <td><code><b>phone</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country phone code.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code format.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code regular expression.</td> |
| </tr> |
| <tr> |
| <td><code><b>languages</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of languages.</td> |
| </tr> |
| <tr> |
| <td><code><b>neighbours</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of neighbours.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:country" class="section-sub-title">Token ID <code>nlpcraft:country</code></span> |
| <p> |
| This token denotes a country. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:country:country</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Name of the country. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:country:continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Continent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:country:subcontinent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Subcontinent name. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:country:countrymeta</b></code></td> |
| <td><code>java.util.Map</code></td> |
| <td> |
| Supplemental metadata for the country (see below). |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| The following table lists the possible entries of the <code><b>nlpcraft:country:countrymeta</b></code> map. The data is |
| obtained from <a href="http://unstats.un.org" target=_>The United Nations Statistics Division</a> datasets: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Key</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>iso</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>iso3</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO 3166 country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>isocode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>ISO country code.</td> |
| </tr> |
| <tr> |
| <td><code><b>capital</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country capital city name.</td> |
| </tr> |
| <tr> |
| <td><code><b>area</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Optional country surface area.</td> |
| </tr> |
| <tr> |
| <td><code><b>population</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.Long</code></td> |
| <td>Optional country population.</td> |
| </tr> |
| <tr> |
| <td><code><b>continent</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country continent.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencycode</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency code.</td> |
| </tr> |
| <tr> |
| <td><code><b>currencyname</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td>Country currency name.</td> |
| </tr> |
| <tr> |
| <td><code><b>phone</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country phone code.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcodeformat</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code format.</td> |
| </tr> |
| <tr> |
| <td><code><b>postalcoderegex</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country postal code regular expression.</td> |
| </tr> |
| <tr> |
| <td><code><b>languages</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of languages.</td> |
| </tr> |
| <tr> |
| <td><code><b>neighbours</b></code> <sub>opt.</sub></td> |
| <td><code>java.lang.String</code></td> |
| <td>Optional country list of neighbours.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:coordinate" class="section-sub-title">Token ID <code>nlpcraft:coordinate</code></span> |
| <p> |
| This token denotes a latitude and longitude coordinate. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:coordinate:latitude</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Coordinate latitude.</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:coordinate:longitude</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Coordinate longitude.</td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="nlpcraft:sort" class="section-sub-title">Token ID <code>nlpcraft:sort</code></span> |
| <p> |
| This token denotes a sorting or ordering function. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:sort:subjindexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td>One or more indexes of the target tokens (i.e. the tokens being sorted).</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:sort:byindexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td>Zero or more (i.e. optional) indexes of the reference token (i.e. the token being sorted by).</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:sort:asc</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| <code>true</code> if sorting is in ascending order, <code>false</code> if descending. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
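| <p> |
| As a rough illustration of how these indexes can be resolved, the sketch below turns the sort metadata into a |
| readable description. It assumes the metadata is a plain <code>java.util.Map</code>, that indexes refer to positions |
| in a list of token texts, and that <code>true</code> in <code>nlpcraft:sort:asc</code> means ascending order. The class |
| <code>SortMetaSketch</code> is a hypothetical name, not part of the NLPCraft API. |
| </p> |

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SortMetaSketch {
    /**
     * Describes a sort request using the texts of the referenced tokens.
     * Assumption: indexes in the metadata are positions in 'tokenTexts'.
     */
    @SuppressWarnings("unchecked")
    static String describe(List<String> tokenTexts, Map<String, Object> meta) {
        List<Integer> subj = (List<Integer>) meta.get("nlpcraft:sort:subjindexes");

        // 'byindexes' may hold zero entries; a missing key is treated as empty here.
        List<Integer> by = (List<Integer>) meta.getOrDefault("nlpcraft:sort:byindexes", List.of());

        // Assumption: true means ascending, false means descending.
        boolean asc = (Boolean) meta.get("nlpcraft:sort:asc");

        String what = subj.stream().map(tokenTexts::get).collect(Collectors.joining(", "));
        String byWhat = by.stream().map(tokenTexts::get).collect(Collectors.joining(", "));

        return "sort " + what
            + (byWhat.isEmpty() ? "" : " by " + byWhat)
            + (asc ? " asc" : " desc");
    }
}
```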
| <br/> |
| <span id="nlpcraft:limit" class="section-sub-title">Token ID <code>nlpcraft:limit</code></span> |
| <p> |
| This token denotes a numeric limit value (like in "top 10" or "bottom five"). |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:limit:indexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td>Index (always only one) of the reference token (i.e. the token being limited).</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:limit:asc</b></code></td> |
| <td><code>java.lang.Boolean</code></td> |
| <td> |
| Whether limit order is ascending or descending. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:limit:limit</b></code></td> |
| <td><code>java.lang.Integer</code></td> |
| <td> |
| Numeric value of the limit. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
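| <p> |
| One possible way to apply these properties is sketched below. It assumes the metadata is a plain |
| <code>java.util.Map</code> and interprets <code>nlpcraft:limit:asc</code> as "order ascending before cutting", which |
| is an assumption made for this example rather than documented behavior. The class <code>LimitMetaSketch</code> is a |
| hypothetical name. |
| </p> |

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class LimitMetaSketch {
    /**
     * Orders the values and keeps at most 'nlpcraft:limit:limit' of them.
     * Assumption: asc == true keeps the smallest values, asc == false the largest.
     */
    static List<Integer> apply(List<Integer> values, Map<String, Object> meta) {
        int limit = (Integer) meta.get("nlpcraft:limit:limit");
        boolean asc = (Boolean) meta.get("nlpcraft:limit:asc");

        Comparator<Integer> order = asc ? Comparator.naturalOrder() : Comparator.reverseOrder();

        // Copy first so the caller's list is left untouched.
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(order);

        // Clamp the limit to the valid range of the list.
        int n = Math.max(0, Math.min(limit, sorted.size()));

        return new ArrayList<>(sorted.subList(0, n));
    }
}
```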
| <br/> |
| <span id="nlpcraft:relation" class="section-sub-title">Token ID <code>nlpcraft:relation</code></span> |
| <p> |
| This token denotes a relation between tokens, either <code>compare</code> or <code>correlate</code>. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>nlpcraft:relation:indexes</b></code></td> |
| <td><code>java.util.List&lt;Integer&gt;</code></td> |
| <td>Index (always only one) of the reference token (i.e. the token being related to).</td> |
| </tr> |
| <tr> |
| <td><code><b>nlpcraft:relation:type</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Type of the relation. One of the following values: |
| <ul> |
| <li><code>compare</code></li> |
| <li><code>correlate</code></li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
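| <p> |
| The following sketch shows one way to validate the relation type and resolve the referenced token. It assumes the |
| metadata is a plain <code>java.util.Map</code> and that the index refers to a position in a list of token texts; |
| <code>RelationMetaSketch</code> is a hypothetical name used only for this example. |
| </p> |

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RelationMetaSketch {
    // The two relation types listed in the table above.
    private static final Set<String> TYPES = Set.of("compare", "correlate");

    /**
     * Returns "<type> -> <referenced token text>" for a relation token.
     * Assumption: the index in the metadata is a position in 'tokenTexts'.
     */
    @SuppressWarnings("unchecked")
    static String describe(List<String> tokenTexts, Map<String, Object> meta) {
        String type = (String) meta.get("nlpcraft:relation:type");

        // Check for null first: immutable sets reject null lookups.
        if (type == null || !TYPES.contains(type))
            throw new IllegalArgumentException("Unexpected relation type: " + type);

        List<Integer> indexes = (List<Integer>) meta.get("nlpcraft:relation:indexes");

        // The table states that the list always holds exactly one index.
        int refIdx = indexes.get(0);

        return type + " -> " + tokenTexts.get(refIdx);
    }
}
```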
| <br/> |
| <span id="google:xxx" class="section-sub-title">Token ID <code>google:xxx</code></span> |
| <p> |
| In these token IDs, <code>xxx</code> is the lowercase name of the named entity |
| in <a target=_ href="https://cloud.google.com/natural-language/">Google APIs</a>, e.g. |
| <code>google:person</code>, <code>google:location</code>, etc. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>google:salience</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Correctness probability of this token by Google Natural Language.</td> |
| </tr> |
| <tr> |
| <td><code><b>google:meta</b></code></td> |
| <td><code>java.util.Map&lt;String&gt;</code></td> |
| <td> |
| Map-based container for Google Natural Language specific properties. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>google:mentionsbeginoffsets</b></code></td> |
| <td><code>java.util.List&lt;String&gt;</code></td> |
| <td> |
| List of the mention begin offsets in the original normalized text. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>google:mentionscontents</b></code></td> |
| <td><code>java.util.List&lt;String&gt;</code></td> |
| <td> |
| List of the mentions. |
| </td> |
| </tr> |
| <tr> |
| <td><code><b>google:mentionstypes</b></code></td> |
| <td><code>java.util.List&lt;String&gt;</code></td> |
| <td> |
| List of the mention types. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="stanford:xxx" class="section-sub-title">Token ID <code>stanford:xxx</code></span> |
| <p> |
| In these token IDs, <code>xxx</code> is the lowercase name of the named entity |
| in <a target=_ href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a>, e.g. |
| <code>stanford:person</code>, <code>stanford:location</code>, etc. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>stanford:confidence</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Correctness probability of this token by Stanford CoreNLP.</td> |
| </tr> |
| <tr> |
| <td><code><b>stanford:nne</b></code></td> |
| <td><code>java.lang.String</code></td> |
| <td> |
| Normalized Named Entity (NNE) text. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="spacy:xxx" class="section-sub-title">Token ID <code>spacy:xxx</code></span> |
| <p> |
| In these token IDs, <code>xxx</code> is the lowercase name of the named entity |
| in <a target=_ href="https://spacy.io/">spaCy</a>, e.g. |
| <code>spacy:person</code>, <code>spacy:location</code>, etc. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>spacy:vector</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>spaCy span vector. </td> |
| </tr> |
| <tr> |
| <td><code><b>spacy:sentiment</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td> |
| A scalar value indicating the positivity or negativity of the token. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <br/> |
| <span id="opennlp:xxx" class="section-sub-title">Token ID <code>opennlp:xxx</code></span> |
| <p> |
| In these token IDs, <code>xxx</code> is the lowercase name of the named entity |
| in <a target=_ href="https://opennlp.apache.org/">Apache OpenNLP</a>, e.g. |
| <code>opennlp:person</code>, <code>opennlp:money</code>, etc. |
| In addition to <code><b>nlpcraft:nlp:xxx</b></code> properties, this type of token has the following |
| metadata properties, all of which are <b>mandatory</b> unless otherwise noted. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Property</th> |
| <th>Java Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code><b>opennlp:probability</b></code></td> |
| <td><code>java.lang.Double</code></td> |
| <td>Correctness probability of this token by OpenNLP.</td> |
| </tr> |
| </tbody> |
| </table> |
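| <p> |
| Several of the third-party token types above expose a correctness probability under a provider-specific key. |
| The sketch below shows one way to read it uniformly and filter out low-confidence tokens. It assumes the metadata |
| is a plain <code>java.util.Map</code>; <code>NerScoreSketch</code> and the threshold parameter are hypothetical and |
| not part of the NLPCraft API. |
| </p> |

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

final class NerScoreSketch {
    // Provider-specific probability keys taken from the tables above.
    private static final List<String> SCORE_KEYS =
        List.of("google:salience", "stanford:confidence", "opennlp:probability");

    /** Returns the first probability found under a known key, if any. */
    static OptionalDouble score(Map<String, Object> meta) {
        for (String key : SCORE_KEYS) {
            Object v = meta.get(key);

            if (v instanceof Double)
                return OptionalDouble.of((Double) v);
        }

        // No probability property is listed for this token type (e.g. spaCy).
        return OptionalDouble.empty();
    }

    /** True only when a probability is present and meets the threshold. */
    static boolean isConfident(Map<String, Object> meta, double threshold) {
        OptionalDouble s = score(meta);

        return s.isPresent() && s.getAsDouble() >= threshold;
    }
}
```

| <p> |
| Tokens without a listed probability property are treated as not confident in this sketch; a caller may prefer to |
| accept them instead. |
| </p> |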
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Model Overview</a></li> |
| <li><a href="#dataflow">Model Dataflow</a></li> |
| <li><a href="#lifecycle">Model Lifecycle</a></li> |
| <li><a href="#config">Model Configuration</a></li> |
| <li><a href="#elements">Model Elements</a></li> |
| <li><a href="#dsl">Token DSL</a></li> |
| <li><a href="#logic">Model Logic</a></li> |
| <li><a href="#builtin">Built-In Tokens</a></li> |
| <li><a href="#meta">Token Metadata</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |
| |
| |