WIP

commit: 9f7395422317f4246b8e3f5c3dd051d22c36d2c7 [log] [tgz]
author: Aaron Radzinski <aradzinski@datalingvo.com> Thu Nov 24 13:49:56 2022 -0800
committer: Aaron Radzinski <aradzinski@datalingvo.com> Thu Nov 24 13:49:56 2022 -0800
tree: b8c82a611a8b8832ab19dd0306dc89494c82a06b
parent: 353331ecbb0ad979f0515869c76c3cef20988da9 [diff]
diff --git a/key-concepts-old.html b/key-concepts-old.html
new file mode 100644
index 0000000..3bd5ed8
--- /dev/null
+++ b/key-concepts-old.html

@@ -0,0 +1,587 @@
+---
+active_crumb: Docs
+layout: documentation
+id: key_concepts
+---
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<div class="col-md-8 second-column" xmlns="http://www.w3.org/1999/html">
+    <section id="overview">
+        <h2 class="section-title">Key Concepts<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+        <p>
+            NLPCraft is based on three main concepts:
+        </p>
+        <ul>
+            <li>
+                {% scaladoc NCModel NCModel %} is a user-configured object responsible for input interpretation.
+            </li>
+            <li>
+                {% scaladoc NCPipeline NCPipeline %} is a part of the model configuration that defines
+                specifics of the user input processing.
+            </li>
+            <li>
+                {% scaladoc NCModelClient NCModelClient %} is responsible for interaction with the data model.
+            </li>
+        </ul>
+
+        <p>Here's the typical code structure when working with NLPCraft:</p>
+
+        <pre class="brush: scala, highlight: []">
+              // Init data model.
+              val mdl = new CustomNlpModel()
+
+              // Creates client for given model.
+              val cli = new NCModelClient(mdl)
+
+              // Sends text request to model by user ID "user01".
+              val result = client.ask("Some user command", "user01")
+        </pre>
+    </section>
+
+    <section id="terminology">
+        <h2 class="section-title">Terminology<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+        <p>
+            Let's start with the nomenclature of the main NLPCraft types:
+        </p>
+
+        <table class="gradient-table">
+            <thead>
+            <tr>
+                <th>Type</th>
+                <th>Description</th>
+            </tr>
+            </thead>
+            <tbody>
+            <tr>
+                <td><b>{% scaladoc NCModel NCModel %}</b></td>
+                <td>
+                    <code>Model</code> is the main component in NLPCraft. User-define data model contains its {% scaladoc NCModelConfig NCModelConfig %},
+                    input processing {% scaladoc NCPipeline NCPipeline %} and life-cycle callbacks.
+                    NLPCraft employs model-as-a-code approach where entire data model is an implementation of just
+                    this interface. The instance of this interface is passed to {% scaladoc NCModelClient NCModelClient %} class.
+                    Note that the model-as-a-code approach natively supports any software life cycle tools and frameworks
+                    like various build tools, CI/SCM tools, IDEs, etc. You don't need any additional tools to manage some
+                    aspects of your data models - your entire model and all of its components are part of your project's source code.
+                    Note that in most cases, one would use a convenient {% scaladoc NCModelAdapter NCModelAdapter %} adapter to implement this interface.
+                </td>
+            </tr>
+            <tr>
+                <td><b>{% scaladoc NCToken NCToken %}</b></td>
+                <td>
+                    <code>Token</code> is simple string, part of user input, which is obtained by splitting user input
+                    according to some rules. For example, the user input "<b>Where is it?</b>" contains four tokens:
+                    "<code>Where</code>", "<code>is</code>", "<code>it</code>", "<code>?</code>".
+                    Usually <code>tokens</code> are words and punctuation symbols which also contain additional
+                    information like point of speech tags, relative position in the overall input text, stopword flag,
+                    stem and lemma forms, etc. List of parsed <code>tokens</code> serves as an input for parsing <code>entities</code>.
+                </td>
+            </tr>
+            <tr>
+                <td><b>{% scaladoc NCEntity NCEntity %}</b></td>
+                <td>
+                    <code>Entity</code> typically represents a real-world object, such as a person, location, organization,
+                    or product that can often be denoted with a proper name. It can be abstract or have a physical existence.
+                    Each <code>entity</code> consists of zero or more <code>tokens</code>. Combination of entities form one or more parsing
+                    <code>variants</code>.
+                </td>
+            </tr>
+            <tr>
+                <td><b>{% scaladoc NCVariant NCVariant %}</b></td>
+                <td>
+                    <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group
+                    of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible
+                    interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>.
+                    For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>,
+                    one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>.
+                    Set of <code>variants</code> ultimately serves as an input to <a href="intent-matching.html">intent matching</a>.
+                </td>
+            </tr>
+            <tr>
+                <td><b>{% scaladoc NCPipeline NCPipeline %}</b></td>
+                <td>
+                    <code>Pipeline</code> is the main configuration property of the model. Pipeline consists of an ordered sequence
+                    of <a href="/pipeline-components.html">pipeline components</a>. User input starts at the first component of the
+                    pipeline as a simple text and exits the end of the pipeline as a one or more parsing <code>variants</code>.
+                    The output of the pipeline is further passed as an input to <a href="intent-matching.html">intent matching</a>.
+                </td>
+            </tr>
+            <tr>
+                <td><b>{% scaladoc NCModelCofig NCModelConfig %}</b></td>
+                <td>
+                    <code>Pipeline</code> is the main configuration property of the model. Pipeline consists of an ordered sequence
+                    of <a href="/pipeline-components.html">pipeline components</a>. User input starts at the first component of the
+                    pipeline as a simple text and exits the end of the pipeline as a one or more parsing <code>variants</code>.
+                    The output of the pipeline is further passed as an input to <a href="intent-matching.html">intent matching</a>.
+                </td>
+            </tr>
+            <tr>
+                <td><b><a target="scaladoc" href="/apis/latest/">@NCIntent</a></b></td>
+                <td>
+                    <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group
+                    of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible
+                    interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>.
+                    For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>,
+                    one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>.
+                </td>
+            </tr>
+
+            </tbody>
+        </table>
+
+        <figure>
+            <img alt="named entities" class="img-fluid" src="/images/text-tokens-entities2.png">
+            <figcaption><b>Fig 1.</b> Text -> Tokens -> Entities -> Parsing Variants.</figcaption>
+        </figure>
+
+        <p>
+            When <code>Variant</code> is prepared, the suitable  <code>Intent</code> is trying to matched with it.
+        </p>
+
+        <table class="gradient-table">
+            <thead>
+            <tr>
+                <th>Term</th>
+                <th>Description</th>
+            </tr>
+            </thead>
+            <tbody>
+
+            <tr>
+                <td><code>Intent</code></td>
+                <td>
+                    <code>Intent</code> is user defined callback method and rule according to which this callback should be called.
+                    Most often rule is some template based on expected set of <code>entities</code> in user input,
+                    but it can be defined more flexible.
+                    Parameters extracted from user text input are passed into callback method.
+                    This method execution result is provided to user as answer on his request.
+                    <code>Intent</code> callbacks are methods defined in <code>Data Model</code> class annotated by
+                    <code>intent</code> rules via <a href="intent-matching.html">IDL</a>.
+                </td>
+            </tr>
+            <tr>
+                <td><code>IDL</code></td>
+                <td>
+                    IDL, Intent Definition Language, is a relatively straightforward declarative language which
+                    defines a match between the parsed user input represented as the collection of tokens,
+                    and the user-define callback method.
+                    IDL intents are bound to their callbacks via Java annotation and can be located
+                    in the same Java annotations or placed in model YAML/JSON file as well as in external *.idl files.
+                </td>
+            </tr>
+            <tr>
+                <td><code>Callback</code></td>
+                <td>
+                    The user defined Scala method which mapped to the <code>intent</code>.
+                    This method receives as its parameters normalized values from user input text according to
+                    IDL matched terms.
+                </td>
+            </tr>
+            </tbody>
+        </table>
+
+        <p>
+            So, <code>Data Model</code> must be able to do tree following things:
+        </p>
+
+        <ul>
+            <li>
+                Parse user input text as the <code>tokens</code>.
+                They are input for searching <code>named entities</code>.
+                <code>Tokens</code> parsing components should be included into <a href="#model-pipeline">Model pipeline</a>.
+            </li>
+            <li>
+                Find <code>named entities</code> based on these parsed <code>tokens</code>.
+                They are input for searching <code>intents</code>.
+                <code>Entity</code> parsing components should be included into <a href="#model-pipeline">Model pipeline</a>.
+            </li>
+            <li>
+                Prepare <code>intents</code> with their callbacks methods which contain business logic.
+                These methods should be defined directly in the model class definition or the model should have references on them.
+                It will be described below. Callback can de defined in model scala class directly or via references.
+                Look at the chapter <a href="intent-matching.html">Intent Matching</a> content for get more details.
+            </li>
+        </ul>
+
+        <p>
+            As example, let's prepare the system which can call persons from your contact list.
+            Typical commands are: "<b>Please call to John Smith</b>" or "<b>Connect me with Barbara Dillan</b>".
+            For solving this task this model should be able to recognize in user text following entities:
+            <code>command</code> and <code>person</code> to apply this command.
+        </p>
+
+        <p>
+            So, when request "<b>Please call to John Smith</b>" received, our model should be able to:
+        </p>
+
+        <ul>
+            <li>
+                Parse tokens splitting user text input:
+                "<code>please</code>", "<code>call</code>", "<code>to</code>", "<code>john</code>", "<code>smith</code>".
+            </li>
+            <li>
+                Find two named entities:
+                <ul>
+                    <li>
+                        <code>command</code> by token "<code>call</code>".
+                    </li>
+                    <li>
+                        <code>person</code> by tokens "<code>john</code>" and "<code>smith</code>".
+                    </li>
+                </ul>
+            </li>
+            <li>
+                Have prepared intent:
+                <pre class="brush: scala, highlight: [1, 2, 5, 6]">
+                    @NCIntent("intent=call term(command)={# == 'command'} term(person)={# == 'person'}")
+                    def onCommand(
+                        ctx: NCContext,
+                        im: NCIntentMatch,
+                        @NCIntentTerm("command") command: NCEntity,
+                        @NCIntentTerm("person") person: NCEntity
+                    ): NCResult = ? // Implement business logic here.
+                </pre>
+
+                <ul>
+                    <li>
+                        <code>Line 1</code> defines intent <code>call</code> with two conditions
+                        which expects two named entities in user input text.
+                    </li>
+                    <li>
+                        <code>Line 2</code> defines related callback method <code>onCommand()</code>.
+                    </li>
+                    <li>
+                        <code>Lines 4 and 5</code> define two callback method's arguments which are corresponded to
+                        <code>call</code> intent terms conditions. You can extract normalized value
+                        <code>john smith</code> from the <code>person</code> parameter and use it in the method body
+                        for getting his phone number etc.
+                    </li>
+                </ul>
+            </li>
+        </ul>
+    </section>
+
+    <section id="model-configuration">
+        <h2 class="section-title">Model Configuration<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+        <p>
+            <code>Data Model</code> configuration represented as
+            {% scaladoc NCModelConfig NCModelConfig %}
+            contains set of parameters which are described below.
+        </p>
+        <table class="gradient-table">
+            <thead>
+            <tr>
+                <th>Name</th>
+                <th>Description</th>
+            </tr>
+            </thead>
+            <tbody>
+            <tr>
+                <td><code>id</code>, <code>name</code> and <code>version</code></td>
+                <td>
+                    Mandatory model properties.
+                </td>
+            </tr>
+            <tr>
+                <td><code>description</code>, <code>origin</code></td>
+                <td>
+                    Optional model properties.
+                </td>
+            </tr>
+            <tr>
+                <td><code>conversationTimeout</code></td>
+                <td>
+                    Timeout of the user's conversation.
+                    If user doesn't communicate with the model this time period STM is going to be cleared.
+                    Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
+                    It is the mandatory parameter with default value.
+                </td>
+            </tr>
+            <tr>
+                <td><code>conversationDepth</code></td>
+                <td>
+                    Maximum supported depth the user's conversation.
+                    Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
+                    It is the mandatory parameter with default value.
+                </td>
+            </tr>
+            </tbody>
+        </table>
+    </section>
+
+    <section id="model-pipeline">
+        <h2 class="section-title">Model Pipeline<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+        <p>
+            Model <code>Pipeline</code> is represented as {% scaladoc NCPipeline NCPipeline %} and
+            contains following components:
+        </p>
+
+        <table class="gradient-table">
+            <thead>
+            <tr>
+                <th>Component</th>
+                <th>Mandatory</th>
+                <th>Description</th>
+            </tr>
+            </thead>
+            <tbody>
+            <tr>
+                <td>{% scaladoc NCTokenParser NCTokenParser %}</td>
+                <td>Mandatory single</td>
+                <td>
+                    <code>Token parser</code> should be able to parse user input plain text and split this text
+                    into <code>tokens</code> list.
+                    NLPCraft provides two default English language implementations of token parser.
+                    Also, project contains examples for <a href="examples/light_switch_fr.html">French</a> and
+                    <a href="examples/light_switch_ru.html">Russia</a> languages token parser implementations.
+                </td>
+            </tr>
+            <tr>
+                <td> {% scaladoc NCTokenEnricher NCTokenEnricher %}</td>
+                <td>Optional list</td>
+                <td>
+                    <code>Tokens enricher</code> is a component which allow to add additional properties for prepared tokens,
+                    like part of speech, quote, stop-words flags or any other.
+                    NLPCraft provides built-in English language set of token enrichers implementations.
+                    Here is an <a href="custom-components.html#token-enrichers">example</a>.
+                </td>
+            </tr>
+            <tr>
+                <td> {% scaladoc NCTokenValidator NCTokenValidator %}</td>
+                <td>Optional list</td>
+                <td>
+                    <code>Token validator</code> is a component which allow to inspect prepared tokens and
+                    throw an exception to break user input processing.
+                    Here is an <a href="custom-components.html#token-validators">example</a>.
+                </td>
+            </tr>
+            <tr>
+                <td> {% scaladoc NCEntityParser NCEntityParser %}</td>
+                <td>Mandatory list</td>
+                <td>
+                    <code>Entity parser</code> is a component which allow to find user defined named entities
+                    based on prepared tokens as input.
+                    NLPCraft provides wrappers for named-entity recognition components of
+                    <a href="https://opennlp.apache.org/">Apache OpenNLP</a> and
+                    <a href="https://nlp.stanford.edu/">Stanford NLP</a> and its own implementations.
+                    Note that at least one entity parser must be defined.
+                    Here is an <a href="custom-components.html#entity-parsers">example</a>.
+                </td>
+            </tr>
+            <tr>
+                <td> {% scaladoc NCEntityEnricher NCEntityEnricher %}</td>
+                <td>Optional list</td>
+                <td>
+                    <code>Entity enricher</code> is component which allows to add additional properties for prepared entities.
+                    Can be useful for extending existing entity enrichers functionality.
+                    Here is an <a href="custom-components.html#entity-enrichers">example</a>.
+                </td>
+            </tr>
+            <tr>
+                <td> {% scaladoc NCEntityMapper NCEntityMapper %}</td>
+                <td>Optional list</td>
+                <td>
+                    <code>Entity mappers</code> is component which allows to map one set of entities to another after the entities
+                    were parsed and enriched. Can be useful for building complex parsers based on existing.
+                    Here is an <a href="custom-components.html#entity-mappers">example</a>.
+                </td>
+            </tr>
+            <tr>
+                <td> {% scaladoc NCEntityValidator NCEntityValidator %}</td>
+                <td>Optional list</td>
+                <td>
+                    <code>Entity validator</code> is a component which allow to inspect prepared entities and
+                    throw an exception to break user input processing.
+                    Here is an <a href="custom-components.html#entity-validators">example</a>.
+                </td>
+            </tr>
+            <tr>
+                <td> {% scaladoc NCVariantFilter NCVariantFilter %}</td>
+                <td>Optional single</td>
+                <td>
+                    <code>Variant filter</code> is a component which allows filtering detected variants and
+                    rejecting undesirable.
+                    Here is an <a href="custom-components.html#variant-filters">example</a>.
+                </td>
+            </tr>
+            </tbody>
+        </table>
+
+        <figure>
+            <img alt="pipeline" class="img-fluid" src="/images/pipeline.png">
+            <figcaption><b>Fig 2.</b> Pipeline</figcaption>
+        </figure>
+
+        <p>
+            Below {% scaladoc NCModel NCModel %} creation example.
+            {% scaladoc NCPipeline NCPipeline %} is prepared using
+            {% scaladoc NCPipelineBuilder NCPipelineBuilder %} class helper.
+        </p>
+
+        <pre class="brush: scala, highlight: []">
+            val pipeline =
+                new NCPipelineBuilder().
+                    withTokenParser(new NCFrTokenParser()).
+                    withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
+                    withTokenEnricher(new NCFrStopWordsTokenEnricher()).
+                    withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
+                    build
+            val cfg = NCModelConfig("nlpcraft.lightswitch.fr.ex", "LightSwitch Example Model FR", "1.0")
+
+            val mdl = new NCModel(cfg, pipeline):
+                // Add your callbacks definition or references on them here.
+        </pre>
+
+        <p>
+            This flexible system allows to create any pipelines on any language.
+            You can collect NLPCraft predefined components, write your own and easy reuse custom components.
+        </p>
+    </section>
+
+    <section id="model-behavior">
+        <h2 class="section-title">Model Behavior Overriding<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+        <p>
+            There are also several {% scaladoc NCModel NCModel %}
+            callbacks that you can override to affect model behavior during
+            <a href="/intent-matching.html#model_callbacks">intent matching</a>
+            to perform logging, debugging, statistic or usage collection, explicit update or initialization of
+            conversation context, security audit or validation:
+        </p>
+        <table class="gradient-table">
+            <thead>
+            <tr>
+                <th>Method</th>
+                <th>Description</th>
+            </tr>
+            </thead>
+            <tbody>
+            <tr>
+                <td>{% scaladoc NCModel#onContext-38d onContext() %}</td>
+                <td>
+                    Overriding this method allows to prepare result before intent matching.
+                </td>
+            </tr>
+            <tr>
+                <td>{% scaladoc NCModel#onMatchedIntent-946 onMatchedIntent() %}</td>
+                <td>
+                    Overriding this method allows to reject matched intent and continue matching process.
+                </td>
+            </tr>
+            <tr>
+                <td>{% scaladoc NCModel#onResult-fffffaf3 onResult() %}</td>
+                <td>
+                    Overriding this method allows to replace callback method execution result.
+                </td>
+            </tr>
+            <tr>
+                <td>{% scaladoc NCModel#onRejection-4fa onRejection() %}</td>
+                <td>
+                    Overriding this method allows to change operation result when rejection occurs.
+                </td>
+            </tr>
+            <tr>
+                <td>{% scaladoc NCModel#onError-fffff759 onError() %}</td>
+                <td>
+                    Overriding this method allows to change operation result when any error occurs.
+                </td>
+            </tr>
+            </tbody>
+        </table>
+    </section>
+
+    <section id="client">
+        <h2 class="section-title">Client Responsibility<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+
+        <p>
+            <code>Client</code>  represented as {% scaladoc NCModelClient NCModelClient %}
+            is necessary for communication with the <code>Data Model</code>. Base client methods  are described below.
+        </p>
+
+        <table class="gradient-table">
+            <thead>
+            <tr>
+                <th>Method</th>
+                <th>Description</th>
+            </tr>
+            </thead>
+            <tbody>
+            <tr>
+                <td>{% scaladoc NCModelClient#ask-fffff9ce ask() %}</td>
+                <td>
+                    Passes user text input to the model and receives back execution
+                    {% scaladoc NCResult NCResult %} or
+                    rejection exception if there isn't any triggered intents.
+                    {% scaladoc NCResult NCResult %} is wrapper on
+                    callback method execution result with additional information.
+                </td>
+            </tr>
+            <tr>
+                <td>{% scaladoc NCModelClient#debugAsk-fffff96c debugAsk() %}</td>
+                <td>
+                    Passes user text input to the model and receives back callback and its parameters or
+                    rejection exception if there isn't any triggered intents.
+                    Main difference from <code>ask</code> that triggered intent callback method is not called.
+                    This method and this parameter can be useful in tests scenarios.
+                </td>
+            </tr>
+            <tr>
+                <td>{% scaladoc NCModelClient#clearStm-571 clearStm() %}</td>
+                <td>
+                    Clears STM state. Memory is cleared wholly or with some predicate.
+                    Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
+                    Second variant of given method with another parameters is here - {% scaladoc NCModelClient#clearStm-1d8 clearStm() %}.
+                </td>
+            </tr>
+            <tr>
+                <td>{% scaladoc NCModelClient#clearDialog-571 clearDialog() %}</td>
+                <td>
+                    Clears dialog state. Dialog is cleared wholly or with some predicate.
+                    Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
+                    Second variant of given method with another parameters is here - {% scaladoc NCModelClient#clearDialog-1d8 clearDialog() %}.
+                </td>
+            </tr>
+            <tr>
+                <td>{% scaladoc NCModelClient#close-94c close() %}</td>
+                <td>
+                    Closes client. You can't call another client's methods after this method was closed.
+                </td>
+            </tr>
+            </tbody>
+        </table>
+    </section>
+</div>
+<div class="col-md-2 third-column">
+    <ul class="side-nav">
+        <li class="side-nav-title">On This Page</li>
+        <li><a href="#overview">Key Concepts</a></li>
+        <li><a href="#terminology">Terminology</a></li>
+<!--         <li><a href="#model-configuration">Model Configuration</a></li> -->
+<!--         <li><a href="#model-pipeline">Model Pipeline</a></li> -->
+<!--         <li><a href="#model-behavior">Model Behavior Overriding</a></li> -->
+<!--         <li><a href="#client">Client Responsibility</a></li> -->
+        {% include quick-links.html %}
+    </ul>
+</div>
+
+
+
+

diff --git a/key-concepts.html b/key-concepts.html
index ec3a5ae..b3b3b18 100644
--- a/key-concepts.html
+++ b/key-concepts.html

@@ -37,7 +37,7 @@
                 specifics of the user input processing.
             </li>
             <li>
-                {% scaladoc NCModelClient NCModelClient %} is responsible for communication with the data model.
+                {% scaladoc NCModelClient NCModelClient %} is responsible for interaction with the data model.
             </li>
         </ul>
 
@@ -56,7 +56,7 @@
     </section>
 
     <section id="terminology">
-        <h2 class="section-title">Terminology<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+        <h2 class="section-title">Main Types<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
             Let's start with the nomenclature of the main NLPCraft types:
         </p>
@@ -98,8 +98,10 @@
                 <td>
                     <code>Entity</code> typically represents a real-world object, such as a person, location, organization,
                     or product that can often be denoted with a proper name. It can be abstract or have a physical existence.
-                    Each <code>entity</code> consists of zero or more <code>tokens</code>. Combination of entities form one or more parsing
-                    <code>variants</code>.
+                    Each <code>entity</code> consists of zero or more <code>tokens</code> and therefore is represented by zero
+                    or more substrings from the original input text. Note that entities may have only a very loose mapping back
+                    to the original text as entities represent a higher-level abstractions compared to tokens. Combination of
+                    entities form one or more parsing <code>variants</code>.
                 </td>
             </tr>
             <tr>
@@ -123,461 +125,33 @@
                 </td>
             </tr>
             <tr>
-                <td><b>{% scaladoc NCModelCofig NCModelConfig %}</b></td>
-                <td>
-                    <code>Pipeline</code> is the main configuration property of the model. Pipeline consists of an ordered sequence
-                    of <a href="/pipeline-components.html">pipeline components</a>. User input starts at the first component of the
-                    pipeline as a simple text and exits the end of the pipeline as a one or more parsing <code>variants</code>.
-                    The output of the pipeline is further passed as an input to <a href="intent-matching.html">intent matching</a>.
-                </td>
-            </tr>
-            <tr>
                 <td><b><a target="scaladoc" href="/apis/latest/">@NCIntent</a></b></td>
                 <td>
-                    <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group
-                    of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible
-                    interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>.
-                    For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>,
-                    one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>.
+                    <a target="scaladoc" href="/apis/latest/">@NCIntent</a> annotation binds a declarative intent to its
+                    callback method. The intent generally refers to the goal that the end-user had in mind when speaking
+                    or typing the input utterance. The intent has a <em>declarative part or template</em> written in <a href="/intent-matching.html#idl">IDL - Intent Definition Language</a>
+                    that strictly defines a particular form the user input.
+                    Intent is also bound to a callback method that will be executed
+                    when that intent, i.e. its template, is detected as the best match for a given input.
                 </td>
             </tr>
 
             </tbody>
         </table>
-
+        <p>
+            Here's the illustration on how a user input text transforms into a set of parsing variants:
+        </p>
         <figure>
             <img alt="named entities" class="img-fluid" src="/images/text-tokens-entities2.png">
             <figcaption><b>Fig 1.</b> Text -> Tokens -> Entities -> Parsing Variants.</figcaption>
         </figure>
-
-        <p>
-            When <code>Variant</code> is prepared, the suitable  <code>Intent</code> is trying to matched with it.
-        </p>
-
-        <table class="gradient-table">
-            <thead>
-            <tr>
-                <th>Term</th>
-                <th>Description</th>
-            </tr>
-            </thead>
-            <tbody>
-
-            <tr>
-                <td><code>Intent</code></td>
-                <td>
-                    <code>Intent</code> is user defined callback method and rule according to which this callback should be called.
-                    Most often rule is some template based on expected set of <code>entities</code> in user input,
-                    but it can be defined more flexible.
-                    Parameters extracted from user text input are passed into callback method.
-                    This method execution result is provided to user as answer on his request.
-                    <code>Intent</code> callbacks are methods defined in <code>Data Model</code> class annotated by
-                    <code>intent</code> rules via <a href="intent-matching.html">IDL</a>.
-                </td>
-            </tr>
-            <tr>
-                <td><code>IDL</code></td>
-                <td>
-                    IDL, Intent Definition Language, is a relatively straightforward declarative language which
-                    defines a match between the parsed user input represented as the collection of tokens,
-                    and the user-define callback method.
-                    IDL intents are bound to their callbacks via Java annotation and can be located
-                    in the same Java annotations or placed in model YAML/JSON file as well as in external *.idl files.
-                </td>
-            </tr>
-            <tr>
-                <td><code>Callback</code></td>
-                <td>
-                    The user defined Scala method which mapped to the <code>intent</code>.
-                    This method receives as its parameters normalized values from user input text according to
-                    IDL matched terms.
-                </td>
-            </tr>
-            </tbody>
-        </table>
-
-        <p>
-            So, <code>Data Model</code> must be able to do tree following things:
-        </p>
-
-        <ul>
-            <li>
-                Parse user input text as the <code>tokens</code>.
-                They are input for searching <code>named entities</code>.
-                <code>Tokens</code> parsing components should be included into <a href="#model-pipeline">Model pipeline</a>.
-            </li>
-            <li>
-                Find <code>named entities</code> based on these parsed <code>tokens</code>.
-                They are input for searching <code>intents</code>.
-                <code>Entity</code> parsing components should be included into <a href="#model-pipeline">Model pipeline</a>.
-            </li>
-            <li>
-                Prepare <code>intents</code> with their callbacks methods which contain business logic.
-                These methods should be defined directly in the model class definition or the model should have references on them.
-                It will be described below. Callback can de defined in model scala class directly or via references.
-                Look at the chapter <a href="intent-matching.html">Intent Matching</a> content for get more details.
-            </li>
-        </ul>
-
-        <p>
-            As example, let's prepare the system which can call persons from your contact list.
-            Typical commands are: "<b>Please call to John Smith</b>" or "<b>Connect me with Barbara Dillan</b>".
-            For solving this task this model should be able to recognize in user text following entities:
-            <code>command</code> and <code>person</code> to apply this command.
-        </p>
-
-        <p>
-            So, when request "<b>Please call to John Smith</b>" received, our model should be able to:
-        </p>
-
-        <ul>
-            <li>
-                Parse tokens splitting user text input:
-                "<code>please</code>", "<code>call</code>", "<code>to</code>", "<code>john</code>", "<code>smith</code>".
-            </li>
-            <li>
-                Find two named entities:
-                <ul>
-                    <li>
-                        <code>command</code> by token "<code>call</code>".
-                    </li>
-                    <li>
-                        <code>person</code> by tokens "<code>john</code>" and "<code>smith</code>".
-                    </li>
-                </ul>
-            </li>
-            <li>
-                Have prepared intent:
-                <pre class="brush: scala, highlight: [1, 2, 5, 6]">
-                    @NCIntent("intent=call term(command)={# == 'command'} term(person)={# == 'person'}")
-                    def onCommand(
-                        ctx: NCContext,
-                        im: NCIntentMatch,
-                        @NCIntentTerm("command") command: NCEntity,
-                        @NCIntentTerm("person") person: NCEntity
-                    ): NCResult = ? // Implement business logic here.
-                </pre>
-
-                <ul>
-                    <li>
-                        <code>Line 1</code> defines intent <code>call</code> with two conditions
-                        which expects two named entities in user input text.
-                    </li>
-                    <li>
-                        <code>Line 2</code> defines related callback method <code>onCommand()</code>.
-                    </li>
-                    <li>
-                        <code>Lines 4 and 5</code> define two callback method's arguments which are corresponded to
-                        <code>call</code> intent terms conditions. You can extract normalized value
-                        <code>john smith</code> from the <code>person</code> parameter and use it in the method body
-                        for getting his phone number etc.
-                    </li>
-                </ul>
-            </li>
-        </ul>
-    </section>
-
-    <section id="model-configuration">
-        <h2 class="section-title">Model Configuration<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-
-        <p>
-            <code>Data Model</code> configuration represented as
-            {% scaladoc NCModelConfig NCModelConfig %}
-            contains set of parameters which are described below.
-        </p>
-        <table class="gradient-table">
-            <thead>
-            <tr>
-                <th>Name</th>
-                <th>Description</th>
-            </tr>
-            </thead>
-            <tbody>
-            <tr>
-                <td><code>id</code>, <code>name</code> and <code>version</code></td>
-                <td>
-                    Mandatory model properties.
-                </td>
-            </tr>
-            <tr>
-                <td><code>description</code>, <code>origin</code></td>
-                <td>
-                    Optional model properties.
-                </td>
-            </tr>
-            <tr>
-                <td><code>conversationTimeout</code></td>
-                <td>
-                    Timeout of the user's conversation.
-                    If user doesn't communicate with the model this time period STM is going to be cleared.
-                    Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
-                    It is the mandatory parameter with default value.
-                </td>
-            </tr>
-            <tr>
-                <td><code>conversationDepth</code></td>
-                <td>
-                    Maximum supported depth the user's conversation.
-                    Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
-                    It is the mandatory parameter with default value.
-                </td>
-            </tr>
-            </tbody>
-        </table>
-    </section>
-
-    <section id="model-pipeline">
-        <h2 class="section-title">Model Pipeline<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-
-        <p>
-            Model <code>Pipeline</code> is represented as {% scaladoc NCPipeline NCPipeline %} and
-            contains following components:
-        </p>
-
-        <table class="gradient-table">
-            <thead>
-            <tr>
-                <th>Component</th>
-                <th>Mandatory</th>
-                <th>Description</th>
-            </tr>
-            </thead>
-            <tbody>
-            <tr>
-                <td>{% scaladoc NCTokenParser NCTokenParser %}</td>
-                <td>Mandatory single</td>
-                <td>
-                    <code>Token parser</code> should be able to parse user input plain text and split this text
-                    into <code>tokens</code> list.
-                    NLPCraft provides two default English language implementations of token parser.
-                    Also, project contains examples for <a href="examples/light_switch_fr.html">French</a> and
-                    <a href="examples/light_switch_ru.html">Russia</a> languages token parser implementations.
-                </td>
-            </tr>
-            <tr>
-                <td> {% scaladoc NCTokenEnricher NCTokenEnricher %}</td>
-                <td>Optional list</td>
-                <td>
-                    <code>Tokens enricher</code> is a component which allow to add additional properties for prepared tokens,
-                    like part of speech, quote, stop-words flags or any other.
-                    NLPCraft provides built-in English language set of token enrichers implementations.
-                    Here is an <a href="custom-components.html#token-enrichers">example</a>.
-                </td>
-            </tr>
-            <tr>
-                <td> {% scaladoc NCTokenValidator NCTokenValidator %}</td>
-                <td>Optional list</td>
-                <td>
-                    <code>Token validator</code> is a component which allow to inspect prepared tokens and
-                    throw an exception to break user input processing.
-                    Here is an <a href="custom-components.html#token-validators">example</a>.
-                </td>
-            </tr>
-            <tr>
-                <td> {% scaladoc NCEntityParser NCEntityParser %}</td>
-                <td>Mandatory list</td>
-                <td>
-                    <code>Entity parser</code> is a component which allow to find user defined named entities
-                    based on prepared tokens as input.
-                    NLPCraft provides wrappers for named-entity recognition components of
-                    <a href="https://opennlp.apache.org/">Apache OpenNLP</a> and
-                    <a href="https://nlp.stanford.edu/">Stanford NLP</a> and its own implementations.
-                    Note that at least one entity parser must be defined.
-                    Here is an <a href="custom-components.html#entity-parsers">example</a>.
-                </td>
-            </tr>
-            <tr>
-                <td> {% scaladoc NCEntityEnricher NCEntityEnricher %}</td>
-                <td>Optional list</td>
-                <td>
-                    <code>Entity enricher</code> is component which allows to add additional properties for prepared entities.
-                    Can be useful for extending existing entity enrichers functionality.
-                    Here is an <a href="custom-components.html#entity-enrichers">example</a>.
-                </td>
-            </tr>
-            <tr>
-                <td> {% scaladoc NCEntityMapper NCEntityMapper %}</td>
-                <td>Optional list</td>
-                <td>
-                    <code>Entity mappers</code> is component which allows to map one set of entities to another after the entities
-                    were parsed and enriched. Can be useful for building complex parsers based on existing.
-                    Here is an <a href="custom-components.html#entity-mappers">example</a>.
-                </td>
-            </tr>
-            <tr>
-                <td> {% scaladoc NCEntityValidator NCEntityValidator %}</td>
-                <td>Optional list</td>
-                <td>
-                    <code>Entity validator</code> is a component which allow to inspect prepared entities and
-                    throw an exception to break user input processing.
-                    Here is an <a href="custom-components.html#entity-validators">example</a>.
-                </td>
-            </tr>
-            <tr>
-                <td> {% scaladoc NCVariantFilter NCVariantFilter %}</td>
-                <td>Optional single</td>
-                <td>
-                    <code>Variant filter</code> is a component which allows filtering detected variants and
-                    rejecting undesirable.
-                    Here is an <a href="custom-components.html#variant-filters">example</a>.
-                </td>
-            </tr>
-            </tbody>
-        </table>
-
-        <figure>
-            <img alt="pipeline" class="img-fluid" src="/images/pipeline.png">
-            <figcaption><b>Fig 2.</b> Pipeline</figcaption>
-        </figure>
-
-        <p>
-            Below {% scaladoc NCModel NCModel %} creation example.
-            {% scaladoc NCPipeline NCPipeline %} is prepared using
-            {% scaladoc NCPipelineBuilder NCPipelineBuilder %} class helper.
-        </p>
-
-        <pre class="brush: scala, highlight: []">
-            val pipeline =
-                new NCPipelineBuilder().
-                    withTokenParser(new NCFrTokenParser()).
-                    withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
-                    withTokenEnricher(new NCFrStopWordsTokenEnricher()).
-                    withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
-                    build
-            val cfg = NCModelConfig("nlpcraft.lightswitch.fr.ex", "LightSwitch Example Model FR", "1.0")
-
-            val mdl = new NCModel(cfg, pipeline):
-                // Add your callbacks definition or references on them here.
-        </pre>
-
-        <p>
-            This flexible system allows to create any pipelines on any language.
-            You can collect NLPCraft predefined components, write your own and easy reuse custom components.
-        </p>
-    </section>
-
-    <section id="model-behavior">
-        <h2 class="section-title">Model Behavior Overriding<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-
-        <p>
-            There are also several {% scaladoc NCModel NCModel %}
-            callbacks that you can override to affect model behavior during
-            <a href="/intent-matching.html#model_callbacks">intent matching</a>
-            to perform logging, debugging, statistic or usage collection, explicit update or initialization of
-            conversation context, security audit or validation:
-        </p>
-        <table class="gradient-table">
-            <thead>
-            <tr>
-                <th>Method</th>
-                <th>Description</th>
-            </tr>
-            </thead>
-            <tbody>
-            <tr>
-                <td>{% scaladoc NCModel#onContext-38d onContext() %}</td>
-                <td>
-                    Overriding this method allows to prepare result before intent matching.
-                </td>
-            </tr>
-            <tr>
-                <td>{% scaladoc NCModel#onMatchedIntent-946 onMatchedIntent() %}</td>
-                <td>
-                    Overriding this method allows to reject matched intent and continue matching process.
-                </td>
-            </tr>
-            <tr>
-                <td>{% scaladoc NCModel#onResult-fffffaf3 onResult() %}</td>
-                <td>
-                    Overriding this method allows to replace callback method execution result.
-                </td>
-            </tr>
-            <tr>
-                <td>{% scaladoc NCModel#onRejection-4fa onRejection() %}</td>
-                <td>
-                    Overriding this method allows to change operation result when rejection occurs.
-                </td>
-            </tr>
-            <tr>
-                <td>{% scaladoc NCModel#onError-fffff759 onError() %}</td>
-                <td>
-                    Overriding this method allows to change operation result when any error occurs.
-                </td>
-            </tr>
-            </tbody>
-        </table>
-    </section>
-
-    <section id="client">
-        <h2 class="section-title">Client Responsibility<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
-
-        <p>
-            <code>Client</code>  represented as {% scaladoc NCModelClient NCModelClient %}
-            is necessary for communication with the <code>Data Model</code>. Base client methods  are described below.
-        </p>
-
-        <table class="gradient-table">
-            <thead>
-            <tr>
-                <th>Method</th>
-                <th>Description</th>
-            </tr>
-            </thead>
-            <tbody>
-            <tr>
-                <td>{% scaladoc NCModelClient#ask-fffff9ce ask() %}</td>
-                <td>
-                    Passes user text input to the model and receives back execution
-                    {% scaladoc NCResult NCResult %} or
-                    rejection exception if there isn't any triggered intents.
-                    {% scaladoc NCResult NCResult %} is wrapper on
-                    callback method execution result with additional information.
-                </td>
-            </tr>
-            <tr>
-                <td>{% scaladoc NCModelClient#debugAsk-fffff96c debugAsk() %}</td>
-                <td>
-                    Passes user text input to the model and receives back callback and its parameters or
-                    rejection exception if there isn't any triggered intents.
-                    Main difference from <code>ask</code> that triggered intent callback method is not called.
-                    This method and this parameter can be useful in tests scenarios.
-                </td>
-            </tr>
-            <tr>
-                <td>{% scaladoc NCModelClient#clearStm-571 clearStm() %}</td>
-                <td>
-                    Clears STM state. Memory is cleared wholly or with some predicate.
-                    Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
-                    Second variant of given method with another parameters is here - {% scaladoc NCModelClient#clearStm-1d8 clearStm() %}.
-                </td>
-            </tr>
-            <tr>
-                <td>{% scaladoc NCModelClient#clearDialog-571 clearDialog() %}</td>
-                <td>
-                    Clears dialog state. Dialog is cleared wholly or with some predicate.
-                    Loot at <a href="short-term-memory.html">Conversation</a> chapter to get more details.
-                    Second variant of given method with another parameters is here - {% scaladoc NCModelClient#clearDialog-1d8 clearDialog() %}.
-                </td>
-            </tr>
-            <tr>
-                <td>{% scaladoc NCModelClient#close-94c close() %}</td>
-                <td>
-                    Closes client. You can't call another client's methods after this method was closed.
-                </td>
-            </tr>
-            </tbody>
-        </table>
     </section>
 </div>
 <div class="col-md-2 third-column">
     <ul class="side-nav">
         <li class="side-nav-title">On This Page</li>
         <li><a href="#overview">Key Concepts</a></li>
-        <li><a href="#terminology">Terminology</a></li>
-<!--         <li><a href="#model-configuration">Model Configuration</a></li> -->
-<!--         <li><a href="#model-pipeline">Model Pipeline</a></li> -->
-<!--         <li><a href="#model-behavior">Model Behavior Overriding</a></li> -->
-<!--         <li><a href="#client">Client Responsibility</a></li> -->
+        <li><a href="#terminology">Main Types</a></li>
         {% include quick-links.html %}
     </ul>
 </div>
commit	9f7395422317f4246b8e3f5c3dd051d22c36d2c7	[log] [tgz]
author	Aaron Radzinski <aradzinski@datalingvo.com>	Thu Nov 24 13:49:56 2022 -0800
committer	Aaron Radzinski <aradzinski@datalingvo.com>	Thu Nov 24 13:49:56 2022 -0800
tree	b8c82a611a8b8832ab19dd0306dc89494c82a06b
parent	353331ecbb0ad979f0515869c76c3cef20988da9 [diff]