| --- |
| active_crumb: Docs |
| layout: documentation |
| id: key_concepts |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column" xmlns="http://www.w3.org/1999/html"> |
| <section id="overview"> |
| <h2 class="section-title">Key Concepts<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| NLPCraft is based on three main concepts: |
| </p> |
| <ul> |
| <li> |
| {% scaladoc NCModel NCModel %} is a user-configured object responsible for input interpretation. |
| </li> |
| <li> |
| {% scaladoc NCPipeline NCPipeline %} is a part of the model configuration that defines |
| specifics of the user input processing. |
| </li> |
| <li> |
| {% scaladoc NCModelClient NCModelClient %} is responsible for communication with the data model. |
| </li> |
| </ul> |
| |
| <p>Here's the typical code structure when working with NLPCraft:</p> |
| |
| <pre class="brush: scala, highlight: []"> |
| // Init data model. |
| val mdl = new CustomNlpModel() |
| |
| // Creates client for given model. |
| val cli = new NCModelClient(mdl) |
| |
| // Sends text request to model by user ID "user01". |
| val result = client.ask("Some user command", "user01") |
| </pre> |
| </section> |
| |
| <section id="terminology"> |
| <h2 class="section-title">Terminology<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Let's start with the nomenclature of the main NLPCraft types: |
| </p> |
| |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><b>{% scaladoc NCToken NCToken %}</b></td> |
| <td> |
| <code>Token</code> is simple string, part of user input, which is obtained by splitting user input |
| according to some rules. For example, the user input "<b>Where is it?</b>" contains four tokens: |
| "<code>Where</code>", "<code>is</code>", "<code>it</code>", "<code>?</code>". |
| Usually <code>tokens</code> are words and punctuation symbols which also contain additional |
| information like point of speech tags, relative position in the overall input text, stopword flag, |
| stem and lemma forms, etc. List of parsed <code>tokens</code> serves as an input for parsing <code>entities</code>. |
| </td> |
| </tr> |
| <tr> |
| <td><b>{% scaladoc NCEntity NCEntity %}</b></td> |
| <td> |
| <code>Entity</code> typically represents a real-world object, such as a person, location, organization, |
| or product that can often be denoted with a proper name. It can be abstract or have a physical existence. |
| Each <code>entity</code> consists of zero or more <code>tokens</code>. Combination of entities form one or more parsing |
| <code>variants</code>. |
| </td> |
| </tr> |
| <tr> |
| <td><b>{% scaladoc NCVariant NCVariant %}</b></td> |
| <td> |
| <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group |
| of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible |
| interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>. |
| For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>, |
| one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>. |
| Set of <code>variants</code> ultimately serves as an input to <a href="intent-matching.html">intent matching</a>. |
| </td> |
| </tr> |
| <tr> |
| <td><b>{% scaladoc NCPipeline NCPipeline %}</b></td> |
| <td> |
| <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group |
| of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible |
| interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>. |
| For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>, |
| one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>. |
| </td> |
| </tr> |
| <tr> |
| <td><b><a target="scaladoc" href="/apis/latest/">@NCIntent</a></b></td> |
| <td> |
| <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group |
| of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible |
| interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>. |
| For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>, |
| one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>. |
| </td> |
| </tr> |
| |
| </tbody> |
| </table> |
| |
| <figure> |
| <img alt="named entities" class="img-fluid" src="/images/text-tokens-entities2.png"> |
| <figcaption><b>Fig 1.</b> Text -> Tokens -> Entities -> Parsing Variants.</figcaption> |
| </figure> |
| |
| <p> |
| When <code>Variant</code> is prepared, the suitable <code>Intent</code> is trying to matched with it. |
| </p> |
| |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Term</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| |
| <tr> |
| <td><code>Intent</code></td> |
| <td> |
| <code>Intent</code> is user defined callback method and rule according to which this callback should be called. |
| Most often rule is some template based on expected set of <code>entities</code> in user input, |
| but it can be defined more flexible. |
| Parameters extracted from user text input are passed into callback method. |
| This method execution result is provided to user as answer on his request. |
| <code>Intent</code> callbacks are methods defined in <code>Data Model</code> class annotated by |
| <code>intent</code> rules via <a href="intent-matching.html">IDL</a>. |
| </td> |
| </tr> |
| <tr> |
| <td><code>IDL</code></td> |
| <td> |
| IDL, Intent Definition Language, is a relatively straightforward declarative language which |
| defines a match between the parsed user input represented as the collection of tokens, |
| and the user-define callback method. |
| IDL intents are bound to their callbacks via Java annotation and can be located |
| in the same Java annotations or placed in model YAML/JSON file as well as in external *.idl files. |
| </td> |
| </tr> |
| <tr> |
| <td><code>Callback</code></td> |
| <td> |
| The user defined Scala method which mapped to the <code>intent</code>. |
| This method receives as its parameters normalized values from user input text according to |
| IDL matched terms. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <p> |
| So, <code>Data Model</code> must be able to do tree following things: |
| </p> |
| |
| <ul> |
| <li> |
| Parse user input text as the <code>tokens</code>. |
| They are input for searching <code>named entities</code>. |
| <code>Tokens</code> parsing components should be included into <a href="#model-pipeline">Model pipeline</a>. |
| </li> |
| <li> |
| Find <code>named entities</code> based on these parsed <code>tokens</code>. |
| They are input for searching <code>intents</code>. |
| <code>Entity</code> parsing components should be included into <a href="#model-pipeline">Model pipeline</a>. |
| </li> |
| <li> |
| Prepare <code>intents</code> with their callbacks methods which contain business logic. |
| These methods should be defined directly in the model class definition or the model should have references on them. |
| It will be described below. Callback can de defined in model scala class directly or via references. |
| Look at the chapter <a href="intent-matching.html">Intent Matching</a> content for get more details. |
| </li> |
| </ul> |
| |
| <p> |
| As example, let's prepare the system which can call persons from your contact list. |
| Typical commands are: "<b>Please call to John Smith</b>" or "<b>Connect me with Barbara Dillan</b>". |
| For solving this task this model should be able to recognize in user text following entities: |
| <code>command</code> and <code>person</code> to apply this command. |
| </p> |
| |
| <p> |
| So, when request "<b>Please call to John Smith</b>" received, our model should be able to: |
| </p> |
| |
| <ul> |
| <li> |
| Parse tokens splitting user text input: |
| "<code>please</code>", "<code>call</code>", "<code>to</code>", "<code>john</code>", "<code>smith</code>". |
| </li> |
| <li> |
| Find two named entities: |
| <ul> |
| <li> |
| <code>command</code> by token "<code>call</code>". |
| </li> |
| <li> |
| <code>person</code> by tokens "<code>john</code>" and "<code>smith</code>". |
| </li> |
| </ul> |
| </li> |
| <li> |
| Have prepared intent: |
| <pre class="brush: scala, highlight: [1, 2, 5, 6]"> |
| @NCIntent("intent=call term(command)={# == 'command'} term(person)={# == 'person'}") |
| def onCommand( |
| ctx: NCContext, |
| im: NCIntentMatch, |
| @NCIntentTerm("command") command: NCEntity, |
| @NCIntentTerm("person") person: NCEntity |
| ): NCResult = ? // Implement business logic here. |
| </pre> |
| |
| <ul> |
| <li> |
| <code>Line 1</code> defines intent <code>call</code> with two conditions |
| which expects two named entities in user input text. |
| </li> |
| <li> |
| <code>Line 2</code> defines related callback method <code>onCommand()</code>. |
| </li> |
| <li> |
| <code>Lines 4 and 5</code> define two callback method's arguments which are corresponded to |
| <code>call</code> intent terms conditions. You can extract normalized value |
| <code>john smith</code> from the <code>person</code> parameter and use it in the method body |
| for getting his phone number etc. |
| </li> |
| </ul> |
| </li> |
| </ul> |
| </section> |
| |
| <section id="model-configuration"> |
| <h2 class="section-title">Model Configuration<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| <code>Data Model</code> configuration represented as |
| {% scaladoc NCModelConfig NCModelConfig %} |
| contains set of parameters which are described below. |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Name</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code>id</code>, <code>name</code> and <code>version</code></td> |
| <td> |
| Mandatory model properties. |
| </td> |
| </tr> |
| <tr> |
| <td><code>description</code>, <code>origin</code></td> |
| <td> |
| Optional model properties. |
| </td> |
| </tr> |
| <tr> |
| <td><code>conversationTimeout</code></td> |
| <td> |
| Timeout of the user's conversation. |
| If user doesn't communicate with the model this time period STM is going to be cleared. |
| Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details. |
| It is the mandatory parameter with default value. |
| </td> |
| </tr> |
| <tr> |
| <td><code>conversationDepth</code></td> |
| <td> |
| Maximum supported depth the user's conversation. |
| Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details. |
| It is the mandatory parameter with default value. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </section> |
| |
| <section id="model-pipeline"> |
| <h2 class="section-title">Model Pipeline<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| Model <code>Pipeline</code> is represented as {% scaladoc NCPipeline NCPipeline %} and |
| contains following components: |
| </p> |
| |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Component</th> |
| <th>Mandatory</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>{% scaladoc NCTokenParser NCTokenParser %}</td> |
| <td>Mandatory single</td> |
| <td> |
| <code>Token parser</code> should be able to parse user input plain text and split this text |
| into <code>tokens</code> list. |
| NLPCraft provides two default English language implementations of token parser. |
| Also, project contains examples for <a href="examples/light_switch_fr.html">French</a> and |
| <a href="examples/light_switch_ru.html">Russia</a> languages token parser implementations. |
| </td> |
| </tr> |
| <tr> |
| <td> {% scaladoc NCTokenEnricher NCTokenEnricher %}</td> |
| <td>Optional list</td> |
| <td> |
| <code>Tokens enricher</code> is a component which allow to add additional properties for prepared tokens, |
| like part of speech, quote, stop-words flags or any other. |
| NLPCraft provides built-in English language set of token enrichers implementations. |
| Here is an <a href="custom-components.html#token-enrichers">example</a>. |
| </td> |
| </tr> |
| <tr> |
| <td> {% scaladoc NCTokenValidator NCTokenValidator %}</td> |
| <td>Optional list</td> |
| <td> |
| <code>Token validator</code> is a component which allow to inspect prepared tokens and |
| throw an exception to break user input processing. |
| Here is an <a href="custom-components.html#token-validators">example</a>. |
| </td> |
| </tr> |
| <tr> |
| <td> {% scaladoc NCEntityParser NCEntityParser %}</td> |
| <td>Mandatory list</td> |
| <td> |
| <code>Entity parser</code> is a component which allow to find user defined named entities |
| based on prepared tokens as input. |
| NLPCraft provides wrappers for named-entity recognition components of |
| <a href="https://opennlp.apache.org/">Apache OpenNLP</a> and |
| <a href="https://nlp.stanford.edu/">Stanford NLP</a> and its own implementations. |
| Note that at least one entity parser must be defined. |
| Here is an <a href="custom-components.html#entity-parsers">example</a>. |
| </td> |
| </tr> |
| <tr> |
| <td> {% scaladoc NCEntityEnricher NCEntityEnricher %}</td> |
| <td>Optional list</td> |
| <td> |
| <code>Entity enricher</code> is component which allows to add additional properties for prepared entities. |
| Can be useful for extending existing entity enrichers functionality. |
| Here is an <a href="custom-components.html#entity-enrichers">example</a>. |
| </td> |
| </tr> |
| <tr> |
| <td> {% scaladoc NCEntityMapper NCEntityMapper %}</td> |
| <td>Optional list</td> |
| <td> |
| <code>Entity mappers</code> is component which allows to map one set of entities to another after the entities |
| were parsed and enriched. Can be useful for building complex parsers based on existing. |
| Here is an <a href="custom-components.html#entity-mappers">example</a>. |
| </td> |
| </tr> |
| <tr> |
| <td> {% scaladoc NCEntityValidator NCEntityValidator %}</td> |
| <td>Optional list</td> |
| <td> |
| <code>Entity validator</code> is a component which allow to inspect prepared entities and |
| throw an exception to break user input processing. |
| Here is an <a href="custom-components.html#entity-validators">example</a>. |
| </td> |
| </tr> |
| <tr> |
| <td> {% scaladoc NCVariantFilter NCVariantFilter %}</td> |
| <td>Optional single</td> |
| <td> |
| <code>Variant filter</code> is a component which allows filtering detected variants and |
| rejecting undesirable. |
| Here is an <a href="custom-components.html#variant-filters">example</a>. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <figure> |
| <img alt="pipeline" class="img-fluid" src="/images/pipeline.png"> |
| <figcaption><b>Fig 2.</b> Pipeline</figcaption> |
| </figure> |
| |
| <p> |
| Below {% scaladoc NCModel NCModel %} creation example. |
| {% scaladoc NCPipeline NCPipeline %} is prepared using |
| {% scaladoc NCPipelineBuilder NCPipelineBuilder %} class helper. |
| </p> |
| |
| <pre class="brush: scala, highlight: []"> |
| val pipeline = |
| new NCPipelineBuilder(). |
| withTokenParser(new NCFrTokenParser()). |
| withTokenEnricher(new NCFrLemmaPosTokenEnricher()). |
| withTokenEnricher(new NCFrStopWordsTokenEnricher()). |
| withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")). |
| build |
| val cfg = NCModelConfig("nlpcraft.lightswitch.fr.ex", "LightSwitch Example Model FR", "1.0") |
| |
| val mdl = new NCModel(cfg, pipeline): |
| // Add your callbacks definition or references on them here. |
| </pre> |
| |
| <p> |
| This flexible system allows to create any pipelines on any language. |
| You can collect NLPCraft predefined components, write your own and easy reuse custom components. |
| </p> |
| </section> |
| |
| <section id="model-behavior"> |
| <h2 class="section-title">Model Behavior Overriding<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| There are also several {% scaladoc NCModel NCModel %} |
| callbacks that you can override to affect model behavior during |
| <a href="/intent-matching.html#model_callbacks">intent matching</a> |
| to perform logging, debugging, statistic or usage collection, explicit update or initialization of |
| conversation context, security audit or validation: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Method</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>{% scaladoc NCModel#onContext-38d onContext() %}</td> |
| <td> |
| Overriding this method allows to prepare result before intent matching. |
| </td> |
| </tr> |
| <tr> |
| <td>{% scaladoc NCModel#onMatchedIntent-946 onMatchedIntent() %}</td> |
| <td> |
| Overriding this method allows to reject matched intent and continue matching process. |
| </td> |
| </tr> |
| <tr> |
| <td>{% scaladoc NCModel#onResult-fffffaf3 onResult() %}</td> |
| <td> |
| Overriding this method allows to replace callback method execution result. |
| </td> |
| </tr> |
| <tr> |
| <td>{% scaladoc NCModel#onRejection-4fa onRejection() %}</td> |
| <td> |
| Overriding this method allows to change operation result when rejection occurs. |
| </td> |
| </tr> |
| <tr> |
| <td>{% scaladoc NCModel#onError-fffff759 onError() %}</td> |
| <td> |
| Overriding this method allows to change operation result when any error occurs. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </section> |
| |
| <section id="client"> |
| <h2 class="section-title">Client Responsibility<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| <code>Client</code> represented as {% scaladoc NCModelClient NCModelClient %} |
| is necessary for communication with the <code>Data Model</code>. Base client methods are described below. |
| </p> |
| |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Method</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>{% scaladoc NCModelClient#ask-fffff9ce ask() %}</td> |
| <td> |
| Passes user text input to the model and receives back execution |
| {% scaladoc NCResult NCResult %} or |
| rejection exception if there isn't any triggered intents. |
| {% scaladoc NCResult NCResult %} is wrapper on |
| callback method execution result with additional information. |
| </td> |
| </tr> |
| <tr> |
| <td>{% scaladoc NCModelClient#debugAsk-fffff96c debugAsk() %}</td> |
| <td> |
| Passes user text input to the model and receives back callback and its parameters or |
| rejection exception if there isn't any triggered intents. |
| Main difference from <code>ask</code> that triggered intent callback method is not called. |
| This method and this parameter can be useful in tests scenarios. |
| </td> |
| </tr> |
| <tr> |
| <td>{% scaladoc NCModelClient#clearStm-571 clearStm() %}</td> |
| <td> |
| Clears STM state. Memory is cleared wholly or with some predicate. |
| Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details. |
| Second variant of given method with another parameters is here - {% scaladoc NCModelClient#clearStm-1d8 clearStm() %}. |
| </td> |
| </tr> |
| <tr> |
| <td>{% scaladoc NCModelClient#clearDialog-571 clearDialog() %}</td> |
| <td> |
| Clears dialog state. Dialog is cleared wholly or with some predicate. |
| Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details. |
| Second variant of given method with another parameters is here - {% scaladoc NCModelClient#clearDialog-1d8 clearDialog() %}. |
| </td> |
| </tr> |
| <tr> |
| <td>{% scaladoc NCModelClient#close-94c close() %}</td> |
| <td> |
| Closes client. You can't call another client's methods after this method was closed. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Key Concepts</a></li> |
| <li><a href="#terminology">Terminology</a></li> |
| <!-- <li><a href="#model-configuration">Model Configuration</a></li> --> |
| <!-- <li><a href="#model-pipeline">Model Pipeline</a></li> --> |
| <!-- <li><a href="#model-behavior">Model Behavior Overriding</a></li> --> |
| <!-- <li><a href="#client">Client Responsibility</a></li> --> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |