---
active_crumb: Docs
layout: documentation
id: api-components
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column" xmlns="http://www.w3.org/1999/html">
<section id="overview">
<h2 class="section-title">API Components<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The NLPCraft project is based on two main concepts, <code>Data Model</code> and <code>Client</code>,
which are represented in the API by
{% scaladoc NCModel NCModel %} and
{% scaladoc NCModelClient NCModelClient %}.
To work with the system, you first prepare an {% scaladoc NCModel NCModel %} instance,
which is built from a configuration and a list of components called the <code>Pipeline</code>.
After that, you communicate with the prepared model through the client's methods.
</p>
<ul>
<li>
<code>Data Model</code> is a domain-specific object responsible for interpreting user input.
</li>
<li>
<code>Client</code> is an object that lets you communicate with a given data model.
</li>
</ul>
<p>A typical usage fragment:</p>
<pre class="brush: scala, highlight: []">
// Initializes the prepared domain model.
val mdl = new CustomNlpModel()
// Creates a client for the given model.
val client = new NCModelClient(mdl)
// Sends a text request to the model on behalf of the user with ID "userId".
val result = client.ask("Some user command", "userId")
// Clears the dialog session for the user with ID "userId".
client.clearDialog("userId")
</pre>
</section>
<section id="model">
<h2 class="section-title">Data Model Responsibility<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Let's start with the terminology and describe how the system processes a request.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Term</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Token</code></td>
<td>
A <code>Token</code>, represented as {% scaladoc NCToken NCToken %},
is a simple string, a part of the user input. The input is split into tokens according to a set of rules,
for instance by spaces plus additional conditions that depend on the language and on the expected input.
So the user input "<b>Where is it?</b>" contains four tokens:
"<code>Where</code>", "<code>is</code>", "<code>it</code>", "<code>?</code>".
Usually <code>tokens</code> are words and punctuation symbols, and they can carry additional
information such as the part of speech.
<code>Tokens</code> are the input for the <code>entity</code> search.
</td>
</tr>
<tr>
<td><code>Entity</code></td>
<td>
According to Wikipedia, a <code>named entity</code> is a real-world object, such as a person, location, organization,
product, etc., that can be denoted with a proper name. It can be abstract or have a physical existence.
Each <code>entity</code> can contain one or more tokens.
<code>Entities</code>, represented as
{% scaladoc NCEntity NCEntity %}, are the input for the <code>intent</code> search according to the <a href="intent-matching.html">Intent matching</a> conditions.
</td>
</tr>
<tr>
<td><code>Variant</code></td>
<td>
A <code>Variant</code>, represented as {% scaladoc NCVariant NCVariant %},
is a list of <code>entities</code>. Potentially, each <code>token</code> or group
of <code>tokens</code> can be recognized as different <code>entities</code>,
so one user input can be processed as a set of <code>variants</code>.
For example, the user input <b>look at this crane</b> can be processed as two <code>variants</code>:
one contains the <code>entity</code> <b>bird</b> and the other contains the <code>entity</code> <b>mechanism</b>.
When no words are claimed by more than one <code>entity</code>, only one
<code>variant</code> is detected.
</td>
</tr>
</tbody>
</table>
<figure>
<img alt="named entities" class="img-fluid" src="/images/text-tokens-entities.png">
<figcaption><b>Fig 1.</b> Text -> Tokens -> Named Entities.</figcaption>
</figure>
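<p>
To make the terms above more concrete, here is a minimal plain-Scala sketch of how text can be split into tokens
and how one ambiguous word can produce several variants. The types <code>Tok</code> and <code>Ent</code>,
the regular-expression tokenizer and the <code>dictionary</code> are assumptions made for illustration only;
they are not the NLPCraft <code>NCToken</code>, <code>NCEntity</code> or <code>NCVariant</code> types, and real
token and entity parsers are considerably more elaborate.
</p>
<pre class="brush: scala, highlight: []">
// Hypothetical stand-ins for illustration; these are NOT the NLPCraft API types.
case class Tok(text: String, index: Int)
case class Ent(kind: String, tokens: List[Tok])

// Naive tokenizer (an assumption): words and single punctuation marks become tokens.
def tokenize(text: String): List[Tok] =
    "[A-Za-z0-9]+|[^\\sA-Za-z0-9]".r.findAllIn(text).toList.zipWithIndex.map { case (s, i) => Tok(s, i) }

// Hypothetical dictionary: one word may be recognized as several entity kinds.
val dictionary: Map[String, List[String]] = Map("crane" -> List("bird", "mechanism"))

// Builds every combination of entity readings; each combination is one variant.
def variants(tokens: List[Tok]): List[List[Ent]] =
    tokens.foldRight(List(List.empty[Ent])) { (tok, acc) =>
        dictionary.get(tok.text.toLowerCase) match
            case Some(kinds) => kinds.flatMap(kind => acc.map(rest => Ent(kind, List(tok)) :: rest))
            case None => acc
    }
</pre>
<p>
In this sketch, <code>tokenize("Where is it?")</code> yields the four tokens from the table, and
<code>variants(tokenize("look at this crane"))</code> yields two variants, one with a <code>bird</code> entity
and one with a <code>mechanism</code> entity.
</p>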
<p>
When a <code>Variant</code> is prepared, the system tries to match a suitable <code>Intent</code> against it.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Term</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>Intent</code></td>
<td>
An <code>Intent</code> is a user-defined callback method together with a rule that determines when this callback should be called.
Most often the rule is a template based on the set of <code>entities</code> expected in the user input,
but it can be defined more flexibly.
Parameters extracted from the user text input are passed into the callback method.
The result of this method is returned to the user as the answer to the request.
<code>Intent</code> callbacks are methods defined in the <code>Data Model</code> class and annotated with
<code>intent</code> rules written in <a href="intent-matching.html">IDL</a>.
</td>
</tr>
<tr>
<td><code>IDL</code></td>
<td>
IDL, the Intent Definition Language, is a relatively straightforward declarative language which
defines a match between the parsed user input, represented as a collection of tokens,
and the user-defined callback method.
IDL intents are bound to their callbacks via Java annotations and can be written
directly in those annotations, placed in the model YAML/JSON file, or kept in external *.idl files.
</td>
</tr>
<tr>
<td><code>Callback</code></td>
<td>
The user-defined Scala method that is mapped to the <code>intent</code>.
This method receives as its parameters the normalized values from the user input text that correspond to
the matched IDL terms.
</td>
</tr>
</tbody>
</table>
<p>
So, the <code>Data Model</code> must be able to do the following three things:
</p>
<ul>
<li>
Parse the user input text into <code>tokens</code>.
They are the input for the <code>named entity</code> search.
<code>Token</code> parsing components should be included in the <a href="#model-pipeline">Model pipeline</a>.
</li>
<li>
Find <code>named entities</code> based on these parsed <code>tokens</code>.
They are the input for the <code>intent</code> search.
<code>Entity</code> parsing components should be included in the <a href="#model-pipeline">Model pipeline</a>.
</li>
<li>
Prepare <code>intents</code> with their callback methods, which contain the business logic.
These methods can be defined directly in the model Scala class, or the model can hold references to them.
See the <a href="intent-matching.html">Intent Matching</a> chapter for more details.
</li>
</ul>
<p>
As an example, let's prepare a system that can call people from your contact list.
Typical commands are "<b>Please call to John Smith</b>" or "<b>Connect me with Barbara Dillan</b>".
To solve this task, the model should be able to recognize the following entities in the user text:
a <code>command</code> and the <code>person</code> to apply this command to.
</p>
<p>
So, when the request "<b>Please call to John Smith</b>" is received, our model should be able to:
</p>
<ul>
<li>
Split the user text input into tokens:
"<code>please</code>", "<code>call</code>", "<code>to</code>", "<code>john</code>", "<code>smith</code>".
</li>
<li>
Find two named entities:
<ul>
<li>
<code>command</code> by token "<code>call</code>".
</li>
<li>
<code>person</code> by tokens "<code>john</code>" and "<code>smith</code>".
</li>
</ul>
</li>
<li>
Have a prepared intent:
<pre class="brush: scala, highlight: [1, 2, 5, 6]">
@NCIntent("intent=call term(command)={# == 'command'} term(person)={# == 'person'}")
def onCommand(
    ctx: NCContext,
    im: NCIntentMatch,
    @NCIntentTerm("command") command: NCEntity,
    @NCIntentTerm("person") person: NCEntity
): NCResult = ??? // Implement business logic here.
</pre>
<ul>
<li>
<code>Line 1</code> defines the intent <code>call</code> with two terms,
each of which expects one named entity in the user input text.
</li>
<li>
<code>Line 2</code> defines the related callback method <code>onCommand()</code>.
</li>
<li>
<code>Lines 5 and 6</code> define the two callback method arguments that correspond to
the terms of the <code>call</code> intent. You can extract the normalized value
<code>john smith</code> from the <code>person</code> parameter and use it in the method body,
for example to look up the phone number.
</li>
</ul>
</li>
</ul>
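<p>
The three steps above can be sketched end to end in plain Scala. This is only an illustration of the flow:
the <code>Entity</code> type, the whitespace tokenizer, the <code>contacts</code> map with its made-up phone numbers,
the <code>commandWords</code> set and the <code>handle</code> function are all hypothetical, and in a real model
the tokenizing, entity search and intent matching are performed by the pipeline components and the IDL rule shown above.
</p>
<pre class="brush: scala, highlight: []">
// Hypothetical stand-in type; the real callback receives NCEntity arguments.
case class Entity(kind: String, value: String)

// Step 1: a deliberately naive tokenizer that lowercases and splits on whitespace.
def tokens(text: String): List[String] =
    text.toLowerCase.split("\\s+").toList.filter(_.nonEmpty)

// Hypothetical contact list and command vocabulary (made-up data).
val contacts: Map[String, String] = Map("john smith" -> "555-0100", "barbara dillan" -> "555-0101")
val commandWords: Set[String] = Set("call", "connect")

// Step 2: finds a `command` entity and a `person` entity among the tokens.
def entities(toks: List[String]): List[Entity] =
    val command = toks.find(commandWords.contains).map(t => Entity("command", t))
    // A person is any pair of adjacent tokens that matches a contact name.
    val person = toks.sliding(2).map(_.mkString(" ")).find(contacts.contains).map(n => Entity("person", n))
    command.toList ++ person.toList

// Step 3: plays the role of the intent callback and holds the business logic.
def onCommand(command: Entity, person: Entity): String =
    s"${command.value} ${person.value} at ${contacts(person.value)}"

// Plays the role of intent matching: one `command` and one `person` are required.
def handle(text: String): Option[String] =
    entities(tokens(text)) match
        case List(cmd @ Entity("command", _), prs @ Entity("person", _)) => Some(onCommand(cmd, prs))
        case _ => None
</pre>
<p>
With these definitions, <code>handle("Please call to John Smith")</code> returns
<code>Some("call john smith at 555-0100")</code>, while an input without both entities returns <code>None</code>.
</p>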
</section>
<section id="model-configuration">
<h2 class="section-title">Model Configuration<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The <code>Data Model</code> configuration, represented as
{% scaladoc NCModelConfig NCModelConfig %},
contains a set of parameters, which are described below.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>id</code>, <code>name</code> and <code>version</code></td>
<td>
Mandatory model properties.
</td>
</tr>
<tr>
<td><code>description</code>, <code>origin</code></td>
<td>
Optional model properties.
</td>
</tr>
<tr>
<td><code>conversationTimeout</code></td>
<td>
Timeout of the user's conversation.
If the user doesn't communicate with the model during this time period, the STM is cleared.
See the <a href="short-term-memory.html">Conversation</a> chapter for more details.
This parameter is mandatory and has a default value.
</td>
</tr>
<tr>
<td><code>conversationDepth</code></td>
<td>
Maximum supported depth of the user's conversation.
See the <a href="short-term-memory.html">Conversation</a> chapter for more details.
This parameter is mandatory and has a default value.
</td>
</tr>
</tbody>
</table>
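<p>
The shape of such a configuration can be sketched in plain Scala as follows. <code>ModelConfig</code> and
<code>isConversationExpired</code> are hypothetical names used for illustration, not the
<code>NCModelConfig</code> API, and the default values shown are assumptions rather than the library defaults.
The sketch only illustrates the split between mandatory properties, optional properties and parameters with defaults,
and how a conversation timeout could be checked.
</p>
<pre class="brush: scala, highlight: []">
// Hypothetical stand-in for illustration; it is NOT the NCModelConfig class.
// The default values below are assumptions, not the library defaults.
case class ModelConfig(
    id: String,                                    // mandatory
    name: String,                                  // mandatory
    version: String,                               // mandatory
    description: Option[String] = None,            // optional
    origin: Option[String] = None,                 // optional
    conversationTimeoutMs: Long = 60 * 60 * 1000L, // always set, defaults when omitted
    conversationDepth: Int = 3                     // always set, defaults when omitted
)

// True when the user has been silent for longer than the conversation timeout,
// i.e. when the short-term memory of that user should be cleared.
def isConversationExpired(cfg: ModelConfig, lastRequestMs: Long, nowMs: Long): Boolean =
    nowMs - lastRequestMs > cfg.conversationTimeoutMs
</pre>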
</section>
<section id="model-pipeline">
<h2 class="section-title">Model Pipeline<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The model <code>Pipeline</code> is represented as {% scaladoc NCPipeline NCPipeline %} and
contains the following components:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Component</th>
<th>Mandatory</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>{% scaladoc NCTokenParser NCTokenParser %}</td>
<td>Mandatory single</td>
<td>
The <code>token parser</code> parses the plain text of the user input and splits it
into a list of <code>tokens</code>.
NLPCraft provides two default token parser implementations for the English language.
The project also contains example token parser implementations for the
<a href="examples/light_switch_fr.html">French</a> and
<a href="examples/light_switch_ru.html">Russian</a> languages.
</td>
</tr>
<tr>
<td> {% scaladoc NCTokenEnricher NCTokenEnricher %}</td>
<td>Optional list</td>
<td>
A <code>token enricher</code> is a component that adds properties to prepared tokens,
such as part of speech, quote or stop-word flags, or anything else.
NLPCraft provides a built-in set of token enricher implementations for the English language.
Here is an <a href="custom-components.html#token-enrichers">example</a>.
</td>
</tr>
<tr>
<td> {% scaladoc NCTokenValidator NCTokenValidator %}</td>
<td>Optional list</td>
<td>
A <code>token validator</code> is a component that inspects prepared tokens and
can throw an exception to stop the processing of the user input.
Here is an <a href="custom-components.html#token-validators">example</a>.
</td>
</tr>
<tr>
<td> {% scaladoc NCEntityParser NCEntityParser %}</td>
<td>Mandatory list</td>
<td>
An <code>entity parser</code> is a component that finds user-defined named entities
using the prepared tokens as input.
NLPCraft provides wrappers for the named-entity recognition components of
<a href="https://opennlp.apache.org/">Apache OpenNLP</a> and
<a href="https://nlp.stanford.edu/">Stanford NLP</a>, as well as its own implementations.
Note that at least one entity parser must be defined.
Here is an <a href="custom-components.html#entity-parsers">example</a>.
</td>
</tr>
<tr>
<td> {% scaladoc NCEntityEnricher NCEntityEnricher %}</td>
<td>Optional list</td>
<td>
An <code>entity enricher</code> is a component that adds properties to prepared entities.
It can be useful for extending the functionality of existing entity parsers.
Here is an <a href="custom-components.html#entity-enrichers">example</a>.
</td>
</tr>
<tr>
<td> {% scaladoc NCEntityMapper NCEntityMapper %}</td>
<td>Optional list</td>
<td>
An <code>entity mapper</code> is a component that maps one set of entities to another after the entities
have been parsed and enriched. It can be useful for building complex parsers on top of existing ones.
Here is an <a href="custom-components.html#entity-mappers">example</a>.
</td>
</tr>
<tr>
<td> {% scaladoc NCEntityValidator NCEntityValidator %}</td>
<td>Optional list</td>
<td>
An <code>entity validator</code> is a component that inspects prepared entities and
can throw an exception to stop the processing of the user input.
Here is an <a href="custom-components.html#entity-validators">example</a>.
</td>
</tr>
<tr>
<td> {% scaladoc NCVariantFilter NCVariantFilter %}</td>
<td>Optional single</td>
<td>
The <code>variant filter</code> is a component that filters the detected variants and
rejects the undesirable ones.
Here is an <a href="custom-components.html#variant-filters">example</a>.
</td>
</tr>
</tbody>
</table>
<figure>
<img alt="pipeline" class="img-fluid" src="/images/pipeline.png">
<figcaption><b>Fig 2.</b> Pipeline</figcaption>
</figure>
<p>
Below is an example of creating an {% scaladoc NCModel NCModel %}.
The {% scaladoc NCPipeline NCPipeline %} is prepared using
the {% scaladoc NCPipelineBuilder NCPipelineBuilder %} helper class.
</p>
<pre class="brush: scala, highlight: []">
val pipeline =
    new NCPipelineBuilder().
        withTokenParser(new NCFrTokenParser()).
        withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
        withTokenEnricher(new NCFrStopWordsTokenEnricher()).
        withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
        build
val cfg = NCModelConfig("nlpcraft.lightswitch.fr.ex", "LightSwitch Example Model FR", "1.0")
val mdl = new NCModel(cfg, pipeline):
    // Add your callback definitions, or references to them, here.
</pre>
<p>
This flexible system lets you create pipelines for any language.
You can combine predefined NLPCraft components, write your own, and easily reuse custom components.
</p>
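<p>
To illustrate how such components could fit together, here is a simplified plain-Scala sketch of a pipeline.
The <code>Token</code>, <code>Entity</code> and <code>Pipeline</code> types below are hypothetical and model each
component as a plain function; they are not the NLPCraft interfaces. The execution order is an assumption based
on the order of the table above, and the variant filter is left out for brevity.
</p>
<pre class="brush: scala, highlight: []">
// Hypothetical simplified types; the real components are NCTokenParser, NCTokenEnricher, etc.
case class Token(text: String, props: Map[String, String] = Map.empty)
case class Entity(kind: String, tokens: List[Token])

case class Pipeline(
    tokenParser: String => List[Token],                        // mandatory, single
    entityParsers: List[List[Token] => List[Entity]],          // mandatory, at least one
    tokenEnrichers: List[Token => Token] = Nil,                // optional list
    tokenValidators: List[List[Token] => Unit] = Nil,          // optional list, may throw
    entityEnrichers: List[Entity => Entity] = Nil,             // optional list
    entityMappers: List[List[Entity] => List[Entity]] = Nil,   // optional list
    entityValidators: List[List[Entity] => Unit] = Nil         // optional list, may throw
):
    require(entityParsers.nonEmpty, "at least one entity parser must be defined")

    // Runs the components in the assumed order and returns the detected entities.
    def run(text: String): List[Entity] =
        val toks = tokenParser(text).map(t => tokenEnrichers.foldLeft(t)((acc, f) => f(acc)))
        tokenValidators.foreach(v => v(toks))
        val parsed = entityParsers.flatMap(p => p(toks))
        val enriched = parsed.map(e => entityEnrichers.foldLeft(e)((acc, f) => f(acc)))
        val mapped = entityMappers.foldLeft(enriched)((acc, f) => f(acc))
        entityValidators.foreach(v => v(mapped))
        mapped

// Usage: a whitespace token parser, one enricher and one entity parser (all made up).
val pipeline = Pipeline(
    tokenParser = text => text.split("\\s+").toList.filter(_.nonEmpty).map(s => Token(s)),
    entityParsers = List(toks => toks.filter(_.text.equalsIgnoreCase("lamp")).map(t => Entity("device", List(t)))),
    tokenEnrichers = List(t => t.copy(props = t.props + ("lower" -> t.text.toLowerCase)))
)
</pre>
<p>
Here <code>pipeline.run("turn on the Lamp")</code> returns a single <code>device</code> entity whose token
carries the extra <code>lower</code> property added by the enricher.
</p>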
</section>
<section id="model-behavior">
<h2 class="section-title">Model Behavior Overriding<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
There are also several {% scaladoc NCModel NCModel %}
callbacks that you can override to affect model behavior during
<a href="/intent-matching.html#model_callbacks">intent matching</a>
to perform logging, debugging, statistics or usage collection, explicit update or initialization of
conversation context, security audit or validation:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Method</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>{% scaladoc NCModel#onContext-38d onContext() %}</td>
<td>
Override this method to prepare a result before intent matching takes place.
</td>
</tr>
<tr>
<td>{% scaladoc NCModel#onMatchedIntent-946 onMatchedIntent() %}</td>
<td>
Override this method to reject a matched intent and continue the matching process.
</td>
</tr>
<tr>
<td>{% scaladoc NCModel#onResult-fffffaf3 onResult() %}</td>
<td>
Override this method to replace the result of the callback method.
</td>
</tr>
<tr>
<td>{% scaladoc NCModel#onRejection-4fa onRejection() %}</td>
<td>
Override this method to change the operation result when a rejection occurs.
</td>
</tr>
<tr>
<td>{% scaladoc NCModel#onError-fffff759 onError() %}</td>
<td>
Override this method to change the operation result when any error occurs.
</td>
</tr>
</tbody>
</table>
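<p>
The role of these callbacks can be sketched with a small plain-Scala model. The <code>Hooks</code> trait,
the <code>Intent</code> type and the <code>process</code> function are hypothetical: the method names follow
the table above, but the signatures, the <code>String</code> results and the returned text for rejections are
simplifying assumptions and do not reproduce the actual <code>NCModel</code> API.
</p>
<pre class="brush: scala, highlight: []">
// Hypothetical simplified hooks; names follow the table, signatures are assumptions.
trait Hooks:
    def onContext(text: String): Option[String] = None      // Some(result) skips intent matching
    def onMatchedIntent(intentId: String): Boolean = true    // false rejects this intent
    def onResult(result: String): Option[String] = None      // Some(result) replaces the callback result
    def onRejection(reason: String): Option[String] = None   // Some(result) replaces the rejection
    def onError(error: Throwable): Option[String] = None     // Some(result) replaces the error

// Hypothetical intent: an ID, a matching rule and a callback.
case class Intent(id: String, matches: String => Boolean, callback: String => String)

def process(text: String, intents: List[Intent], hooks: Hooks): String =
    def matchAndRun(): String =
        // Takes the first matching intent that the model does not reject.
        intents.filter(_.matches(text)).find(i => hooks.onMatchedIntent(i.id)) match
            case Some(intent) =>
                val res = intent.callback(text)
                hooks.onResult(res).getOrElse(res)
            case None =>
                hooks.onRejection("no matching intent").getOrElse("rejected")
    try hooks.onContext(text).getOrElse(matchAndRun())
    catch
        case e: Exception => hooks.onError(e).getOrElse("error")

// Usage: uppercases every result and hides errors behind a friendly message.
object AuditHooks extends Hooks:
    override def onResult(result: String): Option[String] = Some(result.toUpperCase)
    override def onError(error: Throwable): Option[String] = Some("Something went wrong")

val intents = List(
    Intent("greet", _.contains("hello"), _ => "Hi there"),
    Intent("fail", _.contains("boom"), _ => throw new IllegalStateException("callback failed"))
)
</pre>
<p>
In this sketch, <code>process("hello", intents, AuditHooks)</code> returns <code>"HI THERE"</code>, and
<code>process("boom", intents, AuditHooks)</code> returns <code>"Something went wrong"</code> because the
failing callback is routed through <code>onError</code>.
</p>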
</section>
<section id="client">
<h2 class="section-title">Client Responsibility<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The <code>Client</code>, represented as {% scaladoc NCModelClient NCModelClient %},
is needed to communicate with the <code>Data Model</code>. The basic client methods are described below.
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Method</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>{% scaladoc NCModelClient#ask-fffff9ce ask() %}</td>
<td>
Passes the user text input to the model and returns the execution
{% scaladoc NCResult NCResult %}, or throws a
rejection exception if no intent is triggered.
{% scaladoc NCResult NCResult %} is a wrapper around the
callback method execution result with additional information.
</td>
</tr>
<tr>
<td>{% scaladoc NCModelClient#debugAsk-fffff96c debugAsk() %}</td>
<td>
Passes the user text input to the model and returns the callback and its parameters, or throws a
rejection exception if no intent is triggered.
The main difference from <code>ask</code> is that the callback method of the triggered intent is not called.
This method can be useful in test scenarios.
</td>
</tr>
<tr>
<td>{% scaladoc NCModelClient#clearStm-571 clearStm() %}</td>
<td>
Clears the STM state. The memory is cleared either entirely or selectively, according to a predicate.
See the <a href="short-term-memory.html">Conversation</a> chapter for more details.
A second variant of this method, with different parameters, is available: {% scaladoc NCModelClient#clearStm-1d8 clearStm() %}.
</td>
</tr>
<tr>
<td>{% scaladoc NCModelClient#clearDialog-571 clearDialog() %}</td>
<td>
Clears the dialog state. The dialog is cleared either entirely or selectively, according to a predicate.
See the <a href="short-term-memory.html">Conversation</a> chapter for more details.
A second variant of this method, with different parameters, is available: {% scaladoc NCModelClient#clearDialog-1d8 clearDialog() %}.
</td>
</tr>
<tr>
<td>{% scaladoc NCModelClient#close-94c close() %}</td>
<td>
Closes the client. You can't call any other client methods after this method has been called.
</td>
</tr>
</tbody>
</table>
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#model">Data Model Responsibility</a></li>
<li><a href="#model-configuration">Model Configuration</a></li>
<li><a href="#model-pipeline">Model Pipeline</a></li>
<li><a href="#model-behavior">Model Behavior Overriding</a></li>
<li><a href="#client">Client Responsibility</a></li>
{% include quick-links.html %}
</ul>
</div>