WIP
diff --git a/key-concepts-old.html b/key-concepts-old.html
new file mode 100644
index 0000000..3bd5ed8
--- /dev/null
+++ b/key-concepts-old.html
@@ -0,0 +1,587 @@
+---
+active_crumb: Docs
+layout: documentation
+id: key_concepts
+---
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<div class="col-md-8 second-column" xmlns="http://www.w3.org/1999/html">
+ <section id="overview">
+ <h2 class="section-title">Key Concepts<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+ <p>
+ NLPCraft is based on three main concepts:
+ </p>
+ <ul>
+ <li>
+ {% scaladoc NCModel NCModel %} is a user-configured object responsible for input interpretation.
+ </li>
+ <li>
+ {% scaladoc NCPipeline NCPipeline %} is a part of the model configuration that defines
+ specifics of the user input processing.
+ </li>
+ <li>
+ {% scaladoc NCModelClient NCModelClient %} is responsible for interaction with the data model.
+ </li>
+ </ul>
+
+ <p>Here's the typical code structure when working with NLPCraft:</p>
+
+ <pre class="brush: scala, highlight: []">
+ // Init data model.
+ val mdl = new CustomNlpModel()
+
+ // Creates client for given model.
+ val cli = new NCModelClient(mdl)
+
+ // Sends text request to model by user ID "user01".
+ val result = cli.ask("Some user command", "user01")
+ </pre>
+ </section>
+
+ <section id="terminology">
+ <h2 class="section-title">Terminology<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <p>
+ Let's start with the nomenclature of the main NLPCraft types:
+ </p>
+
+ <table class="gradient-table">
+ <thead>
+ <tr>
+ <th>Type</th>
+ <th>Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><b>{% scaladoc NCModel NCModel %}</b></td>
+ <td>
+ <code>Model</code> is the main component in NLPCraft. A user-defined data model contains its {% scaladoc NCModelConfig NCModelConfig %},
+ input processing {% scaladoc NCPipeline NCPipeline %} and life-cycle callbacks.
+ NLPCraft employs a model-as-a-code approach where the entire data model is an implementation of just
+ this interface. An instance of this interface is passed to the {% scaladoc NCModelClient NCModelClient %} class.
+ Note that the model-as-a-code approach natively supports any software life cycle tools and frameworks
+ like various build tools, CI/SCM tools, IDEs, etc. You don't need any additional tools to manage some
+ aspects of your data models - your entire model and all of its components are part of your project's source code.
+ Note that in most cases, one would use a convenient {% scaladoc NCModelAdapter NCModelAdapter %} adapter to implement this interface.
+ </td>
+ </tr>
+ <tr>
+ <td><b>{% scaladoc NCToken NCToken %}</b></td>
+ <td>
+ <code>Token</code> is a simple string, a part of the user input, which is obtained by splitting the user input
+ according to some rules. For example, the user input "<b>Where is it?</b>" contains four tokens:
+ "<code>Where</code>", "<code>is</code>", "<code>it</code>", "<code>?</code>".
+ Usually <code>tokens</code> are words and punctuation symbols which also contain additional
+ information like part-of-speech tags, relative position in the overall input text, stopword flag,
+ stem and lemma forms, etc. The list of parsed <code>tokens</code> serves as an input for parsing <code>entities</code>.
+ </td>
+ </tr>
+ <tr>
+ <td><b>{% scaladoc NCEntity NCEntity %}</b></td>
+ <td>
+ <code>Entity</code> typically represents a real-world object, such as a person, location, organization,
+ or product that can often be denoted with a proper name. It can be abstract or have a physical existence.
+ Each <code>entity</code> consists of zero or more <code>tokens</code>. Combinations of entities form one or more parsing
+ <code>variants</code>.
+ </td>
+ </tr>
+ <tr>
+ <td><b>{% scaladoc NCVariant NCVariant %}</b></td>
+ <td>
+ <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group
+ of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible
+ interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>.
+ For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>,
+ one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>.
+ The set of <code>variants</code> ultimately serves as an input to <a href="intent-matching.html">intent matching</a>.
+ </td>
+ </tr>
+ <tr>
+ <td><b>{% scaladoc NCPipeline NCPipeline %}</b></td>
+ <td>
+ <code>Pipeline</code> is the main configuration property of the model. Pipeline consists of an ordered sequence
+ of <a href="/pipeline-components.html">pipeline components</a>. User input starts at the first component of the
+ pipeline as simple text and exits the end of the pipeline as one or more parsing <code>variants</code>.
+ The output of the pipeline is further passed as an input to <a href="intent-matching.html">intent matching</a>.
+ </td>
+ </tr>
+ <tr>
+ <td><b>{% scaladoc NCModelConfig NCModelConfig %}</b></td>
+ <td>
+ <code>Model configuration</code> holds the basic properties of the model: the mandatory <code>id</code>,
+ <code>name</code> and <code>version</code>, the optional <code>description</code> and <code>origin</code>,
+ and the <code>conversationTimeout</code> and <code>conversationDepth</code> parameters that control
+ the user's <a href="short-term-memory.html">conversation</a>.
+ </td>
+ </tr>
+ <tr>
+ <td><b><a target="scaladoc" href="/apis/latest/">@NCIntent</a></b></td>
+ <td>
+ The <code>@NCIntent</code> annotation binds an <code>intent</code> to its callback method.
+ An <code>intent</code> is a user-defined rule, written in <a href="intent-matching.html">IDL</a>, together with
+ the callback method that should be called when this rule matches. Most often the rule is a template
+ based on the expected set of <code>entities</code> in the user input. Parameters extracted from
+ the user input are passed into the callback method.
+ </td>
+ </tr>
+
+ </tbody>
+ </table>
+
+ <figure>
+ <img alt="named entities" class="img-fluid" src="/images/text-tokens-entities2.png">
+ <figcaption><b>Fig 1.</b> Text -> Tokens -> Entities -> Parsing Variants.</figcaption>
+ </figure>
+
+ <p>
+ Once the <code>variants</code> are prepared, NLPCraft tries to match a suitable <code>intent</code> against them.
+ </p>
+
+ <table class="gradient-table">
+ <thead>
+ <tr>
+ <th>Term</th>
+ <th>Description</th>
+ </tr>
+ </thead>
+ <tbody>
+
+ <tr>
+ <td><code>Intent</code></td>
+ <td>
+ <code>Intent</code> is a user-defined callback method together with a rule that determines when this callback should be called.
+ Most often the rule is a template based on the expected set of <code>entities</code> in the user input,
+ but it can be defined more flexibly.
+ Parameters extracted from the user input text are passed into the callback method.
+ The result of this method is returned to the user as the answer to their request.
+ <code>Intent</code> callbacks are methods defined in the <code>Data Model</code> class and annotated with
+ <code>intent</code> rules written in <a href="intent-matching.html">IDL</a>.
+ </td>
+ </tr>
+ <tr>
+ <td><code>IDL</code></td>
+ <td>
+ IDL, Intent Definition Language, is a relatively straightforward declarative language which
+ defines a match between the parsed user input, represented as a collection of tokens,
+ and the user-defined callback method.
+ IDL intents are bound to their callbacks via Java annotations and can be located
+ in the same Java annotations or placed in the model YAML/JSON file as well as in external *.idl files.
+ </td>
+ </tr>
+ <tr>
+ <td><code>Callback</code></td>
+ <td>
+ A user-defined Scala method which is mapped to the <code>intent</code>.
+ This method receives as its parameters the normalized values from the user input text according to
+ the matched IDL terms.
+ </td>
+ </tr>
+ </tbody>
+ </table>
+
+ <p>
+ So, the <code>Data Model</code> must be able to do the following three things:
+ </p>
+
+ <ul>
+ <li>
+ Parse the user input text into <code>tokens</code>.
+ They serve as the input for finding <code>named entities</code>.
+ <code>Token</code> parsing components should be included in the <a href="#model-pipeline">Model pipeline</a>.
+ </li>
+ <li>
+ Find <code>named entities</code> based on these parsed <code>tokens</code>.
+ They serve as the input for matching <code>intents</code>.
+ <code>Entity</code> parsing components should be included in the <a href="#model-pipeline">Model pipeline</a>.
+ </li>
+ <li>
+ Prepare <code>intents</code> with their callback methods, which contain the business logic.
+ These methods should be defined directly in the model class definition, or the model should
+ hold references to them.
+ See the <a href="intent-matching.html">Intent Matching</a> chapter for more details.
+ </li>
+ </ul>
+
+ <p>
+ As an example, let's prepare a system which can call people from your contact list.
+ Typical commands are: "<b>Please call to John Smith</b>" or "<b>Connect me with Barbara Dillan</b>".
+ To solve this task, the model should be able to recognize the following entities in the user text:
+ <code>command</code> and the <code>person</code> to apply this command to.
+ </p>
+
+ <p>
+ So, when the request "<b>Please call to John Smith</b>" is received, our model should be able to:
+ </p>
+
+ <ul>
+ <li>
+ Split the user input text into tokens:
+ "<code>please</code>", "<code>call</code>", "<code>to</code>", "<code>john</code>", "<code>smith</code>".
+ </li>
+ <li>
+ Find two named entities:
+ <ul>
+ <li>
+ <code>command</code> by token "<code>call</code>".
+ </li>
+ <li>
+ <code>person</code> by tokens "<code>john</code>" and "<code>smith</code>".
+ </li>
+ </ul>
+ </li>
+ <li>
+ Have a prepared intent:
+ <pre class="brush: scala, highlight: [1, 2, 5, 6]">
+ @NCIntent("intent=call term(command)={# == 'command'} term(person)={# == 'person'}")
+ def onCommand(
+ ctx: NCContext,
+ im: NCIntentMatch,
+ @NCIntentTerm("command") command: NCEntity,
+ @NCIntentTerm("person") person: NCEntity
+ ): NCResult = ??? // Implement business logic here.
+ </pre>
+
+ <ul>
+ <li>
+ <code>Line 1</code> defines the intent <code>call</code> with two term conditions,
+ which expect two named entities in the user input text.
+ </li>
+ <li>
+ <code>Line 2</code> defines the related callback method <code>onCommand()</code>.
+ </li>
+ <li>
+ <code>Lines 5 and 6</code> define the two callback method arguments which correspond to
+ the term conditions of the <code>call</code> intent. You can extract the normalized value
+ <code>john smith</code> from the <code>person</code> parameter and use it in the method body
+ to look up his phone number, etc.
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </section>
+
+ <section id="model-configuration">
+ <h2 class="section-title">Model Configuration<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+ <p>
+ The <code>Data Model</code> configuration, represented as
+ {% scaladoc NCModelConfig NCModelConfig %},
+ contains a set of parameters which are described below.
+ </p>
+ <table class="gradient-table">
+ <thead>
+ <tr>
+ <th>Name</th>
+ <th>Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><code>id</code>, <code>name</code> and <code>version</code></td>
+ <td>
+ Mandatory model properties.
+ </td>
+ </tr>
+ <tr>
+ <td><code>description</code>, <code>origin</code></td>
+ <td>
+ Optional model properties.
+ </td>
+ </tr>
+ <tr>
+ <td><code>conversationTimeout</code></td>
+ <td>
+ Timeout of the user's conversation.
+ If the user doesn't communicate with the model during this time period, the STM is cleared.
+ Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
+ It is a mandatory parameter with a default value.
+ </td>
+ </tr>
+ <tr>
+ <td><code>conversationDepth</code></td>
+ <td>
+ Maximum supported depth of the user's conversation.
+ Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
+ It is a mandatory parameter with a default value.
+ </td>
+ </tr>
+ </tbody>
+ </table>
+ </section>
+
+ <section id="model-pipeline">
+ <h2 class="section-title">Model Pipeline<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+ <p>
+ The model <code>Pipeline</code> is represented as {% scaladoc NCPipeline NCPipeline %} and
+ contains the following components:
+ </p>
+
+ <table class="gradient-table">
+ <thead>
+ <tr>
+ <th>Component</th>
+ <th>Mandatory</th>
+ <th>Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>{% scaladoc NCTokenParser NCTokenParser %}</td>
+ <td>Mandatory single</td>
+ <td>
+ <code>Token parser</code> should be able to parse the user input plain text and split this text
+ into a list of <code>tokens</code>.
+ NLPCraft provides two default English language implementations of the token parser.
+ Also, the project contains examples of <a href="examples/light_switch_fr.html">French</a> and
+ <a href="examples/light_switch_ru.html">Russian</a> language token parser implementations.
+ </td>
+ </tr>
+ <tr>
+ <td> {% scaladoc NCTokenEnricher NCTokenEnricher %}</td>
+ <td>Optional list</td>
+ <td>
+ <code>Token enricher</code> is a component which allows adding additional properties to prepared tokens,
+ like part of speech, quote, stop-word flags or any other.
+ NLPCraft provides a built-in set of English language token enricher implementations.
+ Here is an <a href="custom-components.html#token-enrichers">example</a>.
+ </td>
+ </tr>
+ <tr>
+ <td> {% scaladoc NCTokenValidator NCTokenValidator %}</td>
+ <td>Optional list</td>
+ <td>
+ <code>Token validator</code> is a component which allows inspecting prepared tokens and
+ throwing an exception to break the user input processing.
+ Here is an <a href="custom-components.html#token-validators">example</a>.
+ </td>
+ </tr>
+ <tr>
+ <td> {% scaladoc NCEntityParser NCEntityParser %}</td>
+ <td>Mandatory list</td>
+ <td>
+ <code>Entity parser</code> is a component which allows finding user-defined named entities
+ based on prepared tokens as input.
+ NLPCraft provides wrappers for the named-entity recognition components of
+ <a href="https://opennlp.apache.org/">Apache OpenNLP</a> and
+ <a href="https://nlp.stanford.edu/">Stanford NLP</a>, as well as its own implementations.
+ Note that at least one entity parser must be defined.
+ Here is an <a href="custom-components.html#entity-parsers">example</a>.
+ </td>
+ </tr>
+ <tr>
+ <td> {% scaladoc NCEntityEnricher NCEntityEnricher %}</td>
+ <td>Optional list</td>
+ <td>
+ <code>Entity enricher</code> is a component which allows adding additional properties to prepared entities.
+ It can be useful for extending the functionality of existing entity enrichers.
+ Here is an <a href="custom-components.html#entity-enrichers">example</a>.
+ </td>
+ </tr>
+ <tr>
+ <td> {% scaladoc NCEntityMapper NCEntityMapper %}</td>
+ <td>Optional list</td>
+ <td>
+ <code>Entity mapper</code> is a component which allows mapping one set of entities to another after the entities
+ were parsed and enriched. It can be useful for building complex parsers based on existing ones.
+ Here is an <a href="custom-components.html#entity-mappers">example</a>.
+ </td>
+ </tr>
+ <tr>
+ <td> {% scaladoc NCEntityValidator NCEntityValidator %}</td>
+ <td>Optional list</td>
+ <td>
+ <code>Entity validator</code> is a component which allows inspecting prepared entities and
+ throwing an exception to break the user input processing.
+ Here is an <a href="custom-components.html#entity-validators">example</a>.
+ </td>
+ </tr>
+ <tr>
+ <td> {% scaladoc NCVariantFilter NCVariantFilter %}</td>
+ <td>Optional single</td>
+ <td>
+ <code>Variant filter</code> is a component which allows filtering detected variants and
+ rejecting undesirable ones.
+ Here is an <a href="custom-components.html#variant-filters">example</a>.
+ </td>
+ </tr>
+ </tbody>
+ </table>
+
+ <figure>
+ <img alt="pipeline" class="img-fluid" src="/images/pipeline.png">
+ <figcaption><b>Fig 2.</b> Pipeline</figcaption>
+ </figure>
+
+ <p>
+ Below is an example of {% scaladoc NCModel NCModel %} creation.
+ The {% scaladoc NCPipeline NCPipeline %} is prepared using
+ the {% scaladoc NCPipelineBuilder NCPipelineBuilder %} helper class.
+ </p>
+
+ <pre class="brush: scala, highlight: []">
+ val pipeline =
+ new NCPipelineBuilder().
+ withTokenParser(new NCFrTokenParser()).
+ withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
+ withTokenEnricher(new NCFrStopWordsTokenEnricher()).
+ withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
+ build
+ val cfg = NCModelConfig("nlpcraft.lightswitch.fr.ex", "LightSwitch Example Model FR", "1.0")
+
+ val mdl = new NCModel(cfg, pipeline):
+ // Add your callback definitions, or references to them, here.
+ </pre>
+
+ <p>
+ This flexible system allows you to create pipelines for any language.
+ You can combine NLPCraft's predefined components, write your own, and easily reuse custom components.
+ </p>
+ </section>
+
+ <section id="model-behavior">
+ <h2 class="section-title">Model Behavior Overriding<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+ <p>
+ There are also several {% scaladoc NCModel NCModel %}
+ callbacks that you can override to affect model behavior during
+ <a href="/intent-matching.html#model_callbacks">intent matching</a>
+ to perform logging, debugging, statistics or usage collection, explicit update or initialization of
+ conversation context, security audit or validation:
+ </p>
+ <table class="gradient-table">
+ <thead>
+ <tr>
+ <th>Method</th>
+ <th>Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>{% scaladoc NCModel#onContext-38d onContext() %}</td>
+ <td>
+ Overriding this method allows you to prepare the result before intent matching.
+ </td>
+ </tr>
+ <tr>
+ <td>{% scaladoc NCModel#onMatchedIntent-946 onMatchedIntent() %}</td>
+ <td>
+ Overriding this method allows you to reject the matched intent and continue the matching process.
+ </td>
+ </tr>
+ <tr>
+ <td>{% scaladoc NCModel#onResult-fffffaf3 onResult() %}</td>
+ <td>
+ Overriding this method allows you to replace the callback method execution result.
+ </td>
+ </tr>
+ <tr>
+ <td>{% scaladoc NCModel#onRejection-4fa onRejection() %}</td>
+ <td>
+ Overriding this method allows you to change the operation result when a rejection occurs.
+ </td>
+ </tr>
+ <tr>
+ <td>{% scaladoc NCModel#onError-fffff759 onError() %}</td>
+ <td>
+ Overriding this method allows you to change the operation result when any error occurs.
+ </td>
+ </tr>
+ </tbody>
+ </table>
+ </section>
+
+ <section id="client">
+ <h2 class="section-title">Client Responsibility<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+ <p>
+ <code>Client</code>, represented as {% scaladoc NCModelClient NCModelClient %},
+ is necessary for communication with the <code>Data Model</code>. The basic client methods are described below.
+ </p>
+
+ <table class="gradient-table">
+ <thead>
+ <tr>
+ <th>Method</th>
+ <th>Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>{% scaladoc NCModelClient#ask-fffff9ce ask() %}</td>
+ <td>
+ Passes the user text input to the model and receives back the execution
+ {% scaladoc NCResult NCResult %} or
+ a rejection exception if no intent is triggered.
+ {% scaladoc NCResult NCResult %} is a wrapper around
+ the callback method execution result with additional information.
+ </td>
+ </tr>
+ <tr>
+ <td>{% scaladoc NCModelClient#debugAsk-fffff96c debugAsk() %}</td>
+ <td>
+ Passes the user text input to the model and receives back the callback and its parameters or
+ a rejection exception if no intent is triggered.
+ The main difference from <code>ask</code> is that the triggered intent callback method is not called.
+ This method can be useful in test scenarios.
+ </td>
+ </tr>
+ <tr>
+ <td>{% scaladoc NCModelClient#clearStm-571 clearStm() %}</td>
+ <td>
+ Clears the STM state. The memory is cleared entirely or according to some predicate.
+ Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
+ A second variant of this method with different parameters is here - {% scaladoc NCModelClient#clearStm-1d8 clearStm() %}.
+ </td>
+ </tr>
+ <tr>
+ <td>{% scaladoc NCModelClient#clearDialog-571 clearDialog() %}</td>
+ <td>
+ Clears the dialog state. The dialog is cleared entirely or according to some predicate.
+ Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
+ A second variant of this method with different parameters is here - {% scaladoc NCModelClient#clearDialog-1d8 clearDialog() %}.
+ </td>
+ </tr>
+ <tr>
+ <td>{% scaladoc NCModelClient#close-94c close() %}</td>
+ <td>
+ Closes the client. You cannot call any other client methods after the client has been closed.
+ </td>
+ </tr>
+ </tbody>
+ </table>
+ </section>
+</div>
+<div class="col-md-2 third-column">
+ <ul class="side-nav">
+ <li class="side-nav-title">On This Page</li>
+ <li><a href="#overview">Key Concepts</a></li>
+ <li><a href="#terminology">Terminology</a></li>
+<!-- <li><a href="#model-configuration">Model Configuration</a></li> -->
+<!-- <li><a href="#model-pipeline">Model Pipeline</a></li> -->
+<!-- <li><a href="#model-behavior">Model Behavior Overriding</a></li> -->
+<!-- <li><a href="#client">Client Responsibility</a></li> -->
+ {% include quick-links.html %}
+ </ul>
+</div>
+
+
+
+
diff --git a/key-concepts.html b/key-concepts.html
index ec3a5ae..b3b3b18 100644
--- a/key-concepts.html
+++ b/key-concepts.html
@@ -37,7 +37,7 @@
specifics of the user input processing.
</li>
<li>
- {% scaladoc NCModelClient NCModelClient %} is responsible for communication with the data model.
+ {% scaladoc NCModelClient NCModelClient %} is responsible for interaction with the data model.
</li>
</ul>
@@ -56,7 +56,7 @@
</section>
<section id="terminology">
- <h2 class="section-title">Terminology<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <h2 class="section-title">Main Types<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Let's start with the nomenclature of the main NLPCraft types:
</p>
@@ -98,8 +98,10 @@
<td>
<code>Entity</code> typically represents a real-world object, such as a person, location, organization,
or product that can often be denoted with a proper name. It can be abstract or have a physical existence.
- Each <code>entity</code> consists of zero or more <code>tokens</code>. Combination of entities form one or more parsing
- <code>variants</code>.
+ Each <code>entity</code> consists of zero or more <code>tokens</code> and therefore is represented by zero
+ or more substrings from the original input text. Note that entities may have only a very loose mapping back
+ to the original text as entities represent higher-level abstractions compared to tokens. Combinations of
+ entities form one or more parsing <code>variants</code>.
</td>
</tr>
<tr>
@@ -123,461 +125,33 @@
</td>
</tr>
<tr>
- <td><b>{% scaladoc NCModelCofig NCModelConfig %}</b></td>
- <td>
- <code>Pipeline</code> is the main configuration property of the model. Pipeline consists of an ordered sequence
- of <a href="/pipeline-components.html">pipeline components</a>. User input starts at the first component of the
- pipeline as a simple text and exits the end of the pipeline as a one or more parsing <code>variants</code>.
- The output of the pipeline is further passed as an input to <a href="intent-matching.html">intent matching</a>.
- </td>
- </tr>
- <tr>
<td><b><a target="scaladoc" href="/apis/latest/">@NCIntent</a></b></td>
<td>
- <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group
- of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible
- interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>.
- For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>,
- one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>.
+ <a target="scaladoc" href="/apis/latest/">@NCIntent</a> annotation binds a declarative intent to its
+ callback method. The intent generally refers to the goal that the end-user had in mind when speaking
+ or typing the input utterance. The intent has a <em>declarative part or template</em> written in <a href="/intent-matching.html#idl">IDL - Intent Definition Language</a>
+ that strictly defines a particular form of the user input.
+ Intent is also bound to a callback method that will be executed
+ when that intent, i.e. its template, is detected as the best match for a given input.
</td>
</tr>
</tbody>
</table>
-
+ <p>
+ Here's an illustration of how user input text is transformed into a set of parsing variants:
+ </p>
<figure>
<img alt="named entities" class="img-fluid" src="/images/text-tokens-entities2.png">
<figcaption><b>Fig 1.</b> Text -> Tokens -> Entities -> Parsing Variants.</figcaption>
</figure>
-
- <p>
- When <code>Variant</code> is prepared, the suitable <code>Intent</code> is trying to matched with it.
- </p>
-
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Term</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
-
- <tr>
- <td><code>Intent</code></td>
- <td>
- <code>Intent</code> is user defined callback method and rule according to which this callback should be called.
- Most often rule is some template based on expected set of <code>entities</code> in user input,
- but it can be defined more flexible.
- Parameters extracted from user text input are passed into callback method.
- This method execution result is provided to user as answer on his request.
- <code>Intent</code> callbacks are methods defined in <code>Data Model</code> class annotated by
- <code>intent</code> rules via <a href="intent-matching.html">IDL</a>.
- </td>
- </tr>
- <tr>
- <td><code>IDL</code></td>
- <td>
- IDL, Intent Definition Language, is a relatively straightforward declarative language which
- defines a match between the parsed user input represented as the collection of tokens,
- and the user-define callback method.
- IDL intents are bound to their callbacks via Java annotation and can be located
- in the same Java annotations or placed in model YAML/JSON file as well as in external *.idl files.
- </td>
- </tr>
- <tr>
- <td><code>Callback</code></td>
- <td>
- The user defined Scala method which mapped to the <code>intent</code>.
- This method receives as its parameters normalized values from user input text according to
- IDL matched terms.
- </td>
- </tr>
- </tbody>
- </table>
-
- <p>
- So, <code>Data Model</code> must be able to do tree following things:
- </p>
-
- <ul>
- <li>
- Parse user input text as the <code>tokens</code>.
- They are input for searching <code>named entities</code>.
- <code>Tokens</code> parsing components should be included into <a href="#model-pipeline">Model pipeline</a>.
- </li>
- <li>
- Find <code>named entities</code> based on these parsed <code>tokens</code>.
- They are input for searching <code>intents</code>.
- <code>Entity</code> parsing components should be included into <a href="#model-pipeline">Model pipeline</a>.
- </li>
- <li>
- Prepare <code>intents</code> with their callbacks methods which contain business logic.
- These methods should be defined directly in the model class definition or the model should have references on them.
- It will be described below. Callback can de defined in model scala class directly or via references.
- Look at the chapter <a href="intent-matching.html">Intent Matching</a> content for get more details.
- </li>
- </ul>
-
- <p>
- As example, let's prepare the system which can call persons from your contact list.
- Typical commands are: "<b>Please call to John Smith</b>" or "<b>Connect me with Barbara Dillan</b>".
- For solving this task this model should be able to recognize in user text following entities:
- <code>command</code> and <code>person</code> to apply this command.
- </p>
-
- <p>
- So, when request "<b>Please call to John Smith</b>" received, our model should be able to:
- </p>
-
- <ul>
- <li>
- Parse tokens splitting user text input:
- "<code>please</code>", "<code>call</code>", "<code>to</code>", "<code>john</code>", "<code>smith</code>".
- </li>
- <li>
- Find two named entities:
- <ul>
- <li>
- <code>command</code> by token "<code>call</code>".
- </li>
- <li>
- <code>person</code> by tokens "<code>john</code>" and "<code>smith</code>".
- </li>
- </ul>
- </li>
- <li>
- Have prepared intent:
- <pre class="brush: scala, highlight: [1, 2, 5, 6]">
- @NCIntent("intent=call term(command)={# == 'command'} term(person)={# == 'person'}")
- def onCommand(
- ctx: NCContext,
- im: NCIntentMatch,
- @NCIntentTerm("command") command: NCEntity,
- @NCIntentTerm("person") person: NCEntity
- ): NCResult = ? // Implement business logic here.
- </pre>
-
- <ul>
- <li>
- <code>Line 1</code> defines intent <code>call</code> with two conditions
- which expects two named entities in user input text.
- </li>
- <li>
- <code>Line 2</code> defines related callback method <code>onCommand()</code>.
- </li>
- <li>
- <code>Lines 4 and 5</code> define two callback method's arguments which are corresponded to
- <code>call</code> intent terms conditions. You can extract normalized value
- <code>john smith</code> from the <code>person</code> parameter and use it in the method body
- for getting his phone number etc.
- </li>
- </ul>
- </li>
- </ul>
- </section>
-
- <section id="model-configuration">
- <h2 class="section-title">Model Configuration<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-
- <p>
- <code>Data Model</code> configuration represented as
- {% scaladoc NCModelConfig NCModelConfig %}
- contains set of parameters which are described below.
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Name</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code>id</code>, <code>name</code> and <code>version</code></td>
- <td>
- Mandatory model properties.
- </td>
- </tr>
- <tr>
- <td><code>description</code>, <code>origin</code></td>
- <td>
- Optional model properties.
- </td>
- </tr>
- <tr>
- <td><code>conversationTimeout</code></td>
- <td>
- Timeout of the user's conversation.
- If user doesn't communicate with the model this time period STM is going to be cleared.
- Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
- It is the mandatory parameter with default value.
- </td>
- </tr>
- <tr>
- <td><code>conversationDepth</code></td>
- <td>
- Maximum supported depth the user's conversation.
- Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
- It is the mandatory parameter with default value.
- </td>
- </tr>
- </tbody>
- </table>
- </section>
-
- <section id="model-pipeline">
- <h2 class="section-title">Model Pipeline<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-
- <p>
- The model <code>Pipeline</code> is represented by {% scaladoc NCPipeline NCPipeline %} and
- contains the following components:
- </p>
-
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Component</th>
- <th>Mandatory</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>{% scaladoc NCTokenParser NCTokenParser %}</td>
- <td>Mandatory single</td>
- <td>
- <code>Token parser</code> parses the user's plain-text input and splits it
- into a list of <code>tokens</code>.
- NLPCraft provides two default token parser implementations for English.
- The project also contains example token parser implementations for <a href="examples/light_switch_fr.html">French</a> and
- <a href="examples/light_switch_ru.html">Russian</a>.
- </td>
- </tr>
- <tr>
- <td> {% scaladoc NCTokenEnricher NCTokenEnricher %}</td>
- <td>Optional list</td>
- <td>
- <code>Token enricher</code> is a component that adds properties to prepared tokens,
- such as part-of-speech, quote and stop-word flags, or any other properties.
- NLPCraft provides a built-in set of token enricher implementations for English.
- Here is an <a href="custom-components.html#token-enrichers">example</a>.
- </td>
- </tr>
- <tr>
- <td> {% scaladoc NCTokenValidator NCTokenValidator %}</td>
- <td>Optional list</td>
- <td>
- <code>Token validator</code> is a component that inspects prepared tokens and
- can throw an exception to stop user input processing.
- Here is an <a href="custom-components.html#token-validators">example</a>.
- </td>
- </tr>
- <tr>
- <td> {% scaladoc NCEntityParser NCEntityParser %}</td>
- <td>Mandatory list</td>
- <td>
- <code>Entity parser</code> is a component that finds user-defined named entities
- using prepared tokens as input.
- NLPCraft provides its own implementations as well as wrappers for the named-entity recognition components of
- <a href="https://opennlp.apache.org/">Apache OpenNLP</a> and
- <a href="https://nlp.stanford.edu/">Stanford NLP</a>.
- Note that at least one entity parser must be defined.
- Here is an <a href="custom-components.html#entity-parsers">example</a>.
- </td>
- </tr>
- <tr>
- <td> {% scaladoc NCEntityEnricher NCEntityEnricher %}</td>
- <td>Optional list</td>
- <td>
- <code>Entity enricher</code> is a component that adds properties to prepared entities.
- It can be useful for extending the functionality of existing entity enrichers.
- Here is an <a href="custom-components.html#entity-enrichers">example</a>.
- </td>
- </tr>
- <tr>
- <td> {% scaladoc NCEntityMapper NCEntityMapper %}</td>
- <td>Optional list</td>
- <td>
- <code>Entity mapper</code> is a component that maps one set of entities to another after the entities
- have been parsed and enriched. It can be useful for building complex parsers on top of existing ones.
- Here is an <a href="custom-components.html#entity-mappers">example</a>.
- </td>
- </tr>
- <tr>
- <td> {% scaladoc NCEntityValidator NCEntityValidator %}</td>
- <td>Optional list</td>
- <td>
- <code>Entity validator</code> is a component that inspects prepared entities and
- can throw an exception to stop user input processing.
- Here is an <a href="custom-components.html#entity-validators">example</a>.
- </td>
- </tr>
- <tr>
- <td> {% scaladoc NCVariantFilter NCVariantFilter %}</td>
- <td>Optional single</td>
- <td>
- <code>Variant filter</code> is a component that filters detected variants and
- rejects undesirable ones.
- Here is an <a href="custom-components.html#variant-filters">example</a>.
- </td>
- </tr>
- </tbody>
- </table>
-
- <figure>
- <img alt="pipeline" class="img-fluid" src="/images/pipeline.png">
- <figcaption><b>Fig 2.</b> Pipeline</figcaption>
- </figure>
-
- <p>
- Below is an example of {% scaladoc NCModel NCModel %} creation.
- The {% scaladoc NCPipeline NCPipeline %} is prepared using the
- {% scaladoc NCPipelineBuilder NCPipelineBuilder %} helper class.
- </p>
-
- <pre class="brush: scala, highlight: []">
- val pipeline =
- new NCPipelineBuilder().
- withTokenParser(new NCFrTokenParser()).
- withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
- withTokenEnricher(new NCFrStopWordsTokenEnricher()).
- withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
- build
- val cfg = NCModelConfig("nlpcraft.lightswitch.fr.ex", "LightSwitch Example Model FR", "1.0")
-
- val mdl = new NCModel(cfg, pipeline):
- // Add your callbacks definition or references on them here.
- </pre>
-
- <p>
- This flexible system allows you to create pipelines for any language.
- You can combine NLPCraft's predefined components, write your own, and easily reuse custom components.
- </p>
- </section>
-
- <section id="model-behavior">
- <h2 class="section-title">Model Behavior Overriding<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-
- <p>
- There are also several {% scaladoc NCModel NCModel %}
- callbacks that you can override to affect model behavior during
- <a href="/intent-matching.html#model_callbacks">intent matching</a>
- to perform logging, debugging, statistics or usage collection, explicit update or initialization of
- the conversation context, security audit or validation:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Method</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>{% scaladoc NCModel#onContext-38d onContext() %}</td>
- <td>
- Overriding this method allows you to prepare a result before intent matching.
- </td>
- </tr>
- <tr>
- <td>{% scaladoc NCModel#onMatchedIntent-946 onMatchedIntent() %}</td>
- <td>
- Overriding this method allows you to reject the matched intent and continue the matching process.
- </td>
- </tr>
- <tr>
- <td>{% scaladoc NCModel#onResult-fffffaf3 onResult() %}</td>
- <td>
- Overriding this method allows you to replace the result of the callback method execution.
- </td>
- </tr>
- <tr>
- <td>{% scaladoc NCModel#onRejection-4fa onRejection() %}</td>
- <td>
- Overriding this method allows you to change the operation result when a rejection occurs.
- </td>
- </tr>
- <tr>
- <td>{% scaladoc NCModel#onError-fffff759 onError() %}</td>
- <td>
- Overriding this method allows you to change the operation result when an error occurs.
- </td>
- </tr>
- </tbody>
- </table>
- </section>
-
- <section id="client">
- <h2 class="section-title">Client Responsibility<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-
- <p>
- The <code>Client</code>, represented by {% scaladoc NCModelClient NCModelClient %},
- is required for communication with the <code>Data Model</code>. The basic client methods are described below.
- </p>
-
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Method</th>
- <th>Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>{% scaladoc NCModelClient#ask-fffff9ce ask() %}</td>
- <td>
- Passes user text input to the model and receives back the execution
- {% scaladoc NCResult NCResult %} or a
- rejection exception if no intent was triggered.
- {% scaladoc NCResult NCResult %} is a wrapper around the
- callback method execution result with additional information.
- </td>
- </tr>
- <tr>
- <td>{% scaladoc NCModelClient#debugAsk-fffff96c debugAsk() %}</td>
- <td>
- Passes user text input to the model and receives back the callback and its parameters or
- a rejection exception if no intent was triggered.
- The main difference from <code>ask</code> is that the callback method of the triggered intent is not called.
- This method can be useful in test scenarios.
- </td>
- </tr>
- <tr>
- <td>{% scaladoc NCModelClient#clearStm-571 clearStm() %}</td>
- <td>
- Clears the STM state. The memory is cleared either entirely or according to a predicate.
- See the <a href="short-term-memory.html">Conversation</a> chapter for more details.
- A second variant of this method, with different parameters, is {% scaladoc NCModelClient#clearStm-1d8 clearStm() %}.
- </td>
- </tr>
- <tr>
- <td>{% scaladoc NCModelClient#clearDialog-571 clearDialog() %}</td>
- <td>
- Clears the dialog state. The dialog is cleared either entirely or according to a predicate.
- See the <a href="short-term-memory.html">Conversation</a> chapter for more details.
- A second variant of this method, with different parameters, is {% scaladoc NCModelClient#clearDialog-1d8 clearDialog() %}.
- </td>
- </tr>
- <tr>
- <td>{% scaladoc NCModelClient#close-94c close() %}</td>
- <td>
- Closes the client. You can't call any other client methods after this method has been called.
- </td>
- </tr>
- </tbody>
- </table>
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Key Concepts</a></li>
- <li><a href="#terminology">Terminology</a></li>
-<!-- <li><a href="#model-configuration">Model Configuration</a></li> -->
-<!-- <li><a href="#model-pipeline">Model Pipeline</a></li> -->
-<!-- <li><a href="#model-behavior">Model Behavior Overriding</a></li> -->
-<!-- <li><a href="#client">Client Responsibility</a></li> -->
+ <li><a href="#terminology">Main Types</a></li>
{% include quick-links.html %}
</ul>
</div>