---
active_crumb: Docs
layout: documentation
id: overview
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column">
<section id="overview">
<h2 class="section-title">Library API overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
NlpCraft library is based on two main concepts, <code>Model</code> and <code>Client</code>,
which are represented in the API by
<a href="apis/latest/org/apache/nlpcraft/NCModel.html">NCModel</a> and
<a href="apis/latest/org/apache/nlpcraft/NCModelClient.html">NCModelClient</a>.
To work with the system, you first prepare a model by configuring its parameters and defining its components.
You then communicate with this model via the client's methods.
</p>
<ul>
<li>
<code>Model</code> is a domain-specific object which is responsible for interpreting user input.
</li>
<li>
<code>Client</code> is an object which allows you to communicate with the given model.
</li>
</ul>
<p>A typical usage looks like this:</p>
<pre class="brush: scala, highlight: []">
// Initializes the prepared domain model.
val mdl = new CustomNlpModel()
// Creates a client for the given model.
val client = new NCModelClient(mdl)
// Sends a text request to the model on behalf of the user with ID "userId".
val result = client.ask("Some user command", "userId")
// Clears the dialog session for the user with ID "userId".
client.clearDialog("userId")
</pre>
</section>
<section id="model">
<h2 class="section-title">Model responsibility overview<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Let's start with the terminology and describe the system workflow.
</p>
<ul>
<li>
<code>Token</code> is represented as <a href="apis/latest/org/apache/nlpcraft/NCToken.html">NCToken</a>.
It is a simple string, a part of the user input, which is split according to some rules,
for instance by spaces and some additional conditions that depend on the language and some expectations.
So the user input "<b>Where is it?</b>" contains four tokens:
"<code>Where</code>", "<code>is</code>", "<code>it</code>", "<code>?</code>".
Usually <code>tokens</code> are words and punctuation symbols, which can also carry some additional
information such as part of speech.
<code>Tokens</code> are the input for searching for <code>entities</code>.
</li>
<li>
<code>Entity</code> is represented as <a href="apis/latest/org/apache/nlpcraft/NCEntity.html">NCEntity</a>.
According to Wikipedia, a named entity is a real-world object, such as a person, location, organization,
product, etc., that can be denoted with a proper name. It can be abstract or have a physical existence.
Each <code>entity</code> can contain one or more tokens.
<code>Entities</code> are the input for searching for <code>intents</code> according to <a href="intent-matching.html">Intent matching</a> conditions.
</li>
<li>
<code>Variant</code> is represented as <a href="apis/latest/org/apache/nlpcraft/NCVariant.html">NCVariant</a>.
It is a list of <code>entities</code>. Potentially, each <code>token</code> can be recognized as
different <code>entities</code>,
so user input can be processed as a set of <code>variants</code>.
For example, the user input "Mercedes" can be processed as two <code>variants</code>,
each of which contains a single-element list of <code>entities</code>: <b>car brand</b> or <b>Spanish female name</b>.
When no words are claimed by more than one <code>entity</code>, only one
<code>variant</code> is detected.
</li>
<li>
<code>Intent</code> is a user-defined callback together with a rule that determines when this callback should be called.
The rule is most often a template based on the expected set of <code>entities</code> in the user input,
but it can be more flexible.
Parameters extracted from the user text input are passed into the callback methods.
The results of these methods are returned to the user as the answer to the request.
<code>Intent</code> callbacks are methods defined in the <code>Model</code> class and annotated with
<code>intent</code> rules written in <a href="intent-matching.html">IDL</a>.
</li>
</ul>
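<p>
The terms above can be illustrated with a small, library-independent sketch.
The <code>Token</code>, <code>Entity</code> and <code>Variant</code> types, the regular expression used for splitting
and the one-word dictionary are all hypothetical stand-ins chosen for illustration; they are not the NlpCraft API
and only handle single-token input when building variants.
</p>
<pre class="brush: scala, highlight: []">
object TerminologySketch {
    // Hypothetical stand-ins for illustration only; these are not the NlpCraft types.
    final case class Token(text: String)
    final case class Entity(id: String, tokens: List[Token])
    type Variant = List[Entity]

    // Splits text into word tokens and single punctuation-symbol tokens.
    def tokenize(text: String): List[Token] =
        """\w+|[^\w\s]""".r.findAllIn(text).map(Token(_)).toList

    // Hypothetical dictionary: one word may be readable as several entities.
    val dictionary: Map[String, List[String]] =
        Map("mercedes" -> List("car:brand", "person:name"))

    // Builds one single-entity variant per possible reading of a single-token input.
    def variants(text: String): List[Variant] =
        tokenize(text) match {
            case tok :: Nil =>
                dictionary.getOrElse(tok.text.toLowerCase, Nil).map(id => List(Entity(id, List(tok))))
            case _ => Nil
        }
}
</pre>
<p>
With these assumptions, <code>tokenize("Where is it?")</code> yields four tokens and
<code>variants("Mercedes")</code> yields two variants, each holding a single entity.
</p>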
<p>
<code>Model</code> must be able to do the following three things:
</p>
<ul>
<li>
Parse the user input text into <code>tokens</code>.
They are the input for searching for <code>named entities</code>.
<code>Token</code> parsing components should be included in the <a href="#model-pipeline">Model pipeline</a>.
</li>
<li>
Find <code>named entities</code> based on these parsed <code>tokens</code>.
They are the input for searching for <code>intents</code>.
<code>Entity</code> parsing components should be included in the <a href="#model-pipeline">Model pipeline</a>.
</li>
<li>
Prepare <code>intents</code> with their callback methods, which contain the business logic.
These methods should be defined directly in the model class definition, or the model should hold references to them.
This is described below.
</li>
</ul>
<p>
As an example, let's prepare a system which can call people from your contact list.
Typical commands are: "<b>Please call to John Smith</b>" or "<b>Connect me with Barbara Dillan</b>".
To solve this task, the model should be able to recognize the following entities in the user text:
<code>command</code> and the <code>person</code> to apply this command to.
</p>
<p>
So, when the request "<b>Please call to John Smith</b>" is received, our model should be able to:
</p>
<ul>
<li>
Parse tokens by splitting the user text input:
"<code>please</code>", "<code>call</code>", "<code>to</code>", "<code>john</code>", "<code>smith</code>".
</li>
<li>
Find two named entities:
<ul>
<li>
<code>command</code> by token "<code>call</code>".
</li>
<li>
<code>person</code> by tokens "<code>john</code>" and "<code>smith</code>".
</li>
</ul>
</li>
<li>
Have a prepared intent:
<pre class="brush: scala, highlight: [1, 2, 5, 6]">
@NCIntent("intent=call term(command)={# == 'command'} term(person)={# == 'person'}")
def onCommand(
    ctx: NCContext,
    im: NCIntentMatch,
    @NCIntentTerm("command") command: NCEntity,
    @NCIntentTerm("person") person: NCEntity
): NCResult = ??? // Implement business logic here.
</pre>
<ul>
<li>
<code>Line 1</code> defines the intent <code>call</code> with two conditions.
</li>
<li>
<code>Line 2</code> defines the related callback method <code>onCommand()</code>.
</li>
<li>
<code>Lines 5 and 6</code> define two callback method arguments which correspond to the
<code>call</code> intent term conditions. You can extract the normalized value
<code>john smith</code> from the <code>person</code> parameter and use it in the method body,
for example to get his phone number.
</li>
</ul>
<p>
Note that there are many ways to define intents and their callback methods.
They can be defined in configuration files, nested classes, etc.
See the <a href="intent-matching.html">Intent Matching</a> chapter for more details.
</p>
</li>
</ul>
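<p>
The three steps above can be sketched without the library as plain functions.
This is only an illustration under stated assumptions: the <code>Entity</code> type, the command vocabulary,
the contact list and the returned message are hypothetical, and the real model performs these steps
through its pipeline and intent matching rather than through code like this.
</p>
<pre class="brush: scala, highlight: []">
object CallSketch {
    // Hypothetical stand-in for illustration; not the NlpCraft NCEntity type.
    final case class Entity(id: String, value: String)

    // Hypothetical command vocabulary and contact list.
    val commands: Set[String] = Set("call", "connect")
    val contacts: Set[String] = Set("john smith", "barbara dillan")

    // Step 1: lower-cases the input and splits it on whitespace.
    def tokenize(text: String): List[String] =
        text.toLowerCase.split("\\s+").filter(_.nonEmpty).toList

    // Step 2: finds the first command word and the first pair of adjacent tokens matching a contact.
    def findEntities(tokens: List[String]): List[Entity] = {
        val command = tokens.find(commands.contains).map(Entity("command", _))
        val person = tokens.sliding(2).map(_.mkString(" ")).find(contacts.contains).map(Entity("person", _))
        command.toList ++ person.toList
    }

    // Step 3: plays the role of the intent callback; it fires only when both terms are present.
    def onCommand(entities: List[Entity]): Option[String] =
        (entities.find(_.id == "command"), entities.find(_.id == "person")) match {
            case (Some(_), Some(person)) => Some(s"Calling ${person.value}")
            case _ => None
        }
}
</pre>
<p>
Under these assumptions, the input "Please call to John Smith" produces a <code>command</code> entity for
<code>call</code> and a <code>person</code> entity with the normalized value <code>john smith</code>,
which the callback can then use.
</p>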
</section>
<section id="client">
<h2 class="section-title">Client responsibility overview<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The client, represented as <a href="apis/latest/org/apache/nlpcraft/NCModelClient.html">NCModelClient</a>,
is necessary for communication with the model. The base client methods are described below.
</p>
<ul>
<li>
<code>ask()</code> passes the user text input to the model and returns the execution
<a href="apis/latest/org/apache/nlpcraft/NCResult.html">NCResult</a>, or
throws a rejection exception if no intent is triggered.
<a href="apis/latest/org/apache/nlpcraft/NCResult.html">NCResult</a> is a wrapper around the
callback method execution result with additional information.
</li>
<li>
<code>debugAsk()</code> passes the user text input to the model and returns the matched callback and its parameters, or
throws a rejection exception if no intent is triggered.
The main difference from <code>ask()</code> is that the triggered intent callback method is not called.
This method can be useful for test scenarios.
</li>
<li>
<code>clearStm()</code> clears the STM (short-term memory) state. Memory is cleared either completely or according to a predicate.
Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
</li>
<li>
<code>clearDialog()</code> clears the dialog state. The dialog is cleared either completely or according to a predicate.
Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
</li>
<li>
<code>close()</code> - closes the client. You can't call any other client methods after the client is closed.
</li>
</ul>
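<p>
The lifecycle described above (send requests, handle a possible rejection, always close the client)
can be sketched as follows. The <code>Client</code> trait is a hypothetical stand-in that mirrors only
the <code>ask()</code> and <code>close()</code> method names from the list; the real
<code>NCModelClient</code> signatures, result type and exception types may differ.
</p>
<pre class="brush: scala, highlight: []">
import scala.util.control.NonFatal

object ClientLifecycleSketch {
    // Hypothetical stand-in; the real NCModelClient returns NCResult and has more methods.
    trait Client {
        def ask(text: String, userId: String): String
        def close(): Unit
    }

    // Sends one request; a rejection (or any other non-fatal error) is reported as Left.
    def askSafely(client: Client, text: String, userId: String): Either[String, String] =
        try Right(client.ask(text, userId))
        catch { case NonFatal(e) => Left(String.valueOf(e.getMessage)) }

    // Sends all requests for one user and closes the client even if something goes wrong.
    def runSession(client: Client, userId: String, texts: List[String]): List[Either[String, String]] =
        try texts.map(text => askSafely(client, text, userId))
        finally client.close()
}
</pre>
<p>
Closing in a <code>finally</code> block is a design choice of this sketch: it keeps the rule
"no calls after <code>close()</code>" easy to follow because the client never escapes the session.
</p>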
</section>
<section id="model-configuration">
<h2 class="section-title">Model configuration <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Model configuration <a href="apis/latest/org/apache/nlpcraft/NCModelConfig.html">NCModelConfig</a> represents a set of model parameter values.
Its properties are described below.
</p>
<ul>
<li>
<code>id</code>, <code>name</code> and <code>version</code> are mandatory model descriptors.
</li>
<li>
<code>description</code> and <code>origin</code> are optional model descriptors.
</li>
<li>
<code>conversationTimeout</code> - timeout of the user's conversation.
If the user doesn't communicate with the model during this time period, STM is cleared.
Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
Mandatory parameter with a default value.
</li>
<li>
<code>conversationDepth</code> - maximum supported depth of the user's conversation.
Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
Mandatory parameter with a default value.
</li>
</ul>
</section>
<section id="model-pipeline">
<h2 class="section-title">Model pipeline <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Model <code>Pipeline</code> is represented as <a href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a> and
contains the following components:
</p>
<ul>
<li>
<code>Token parser</code> is represented as <a href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a>.
It is a mandatory pipeline component, required for parsing plain text (the user input) and splitting this text
into a list of tokens. NlpCraft provides a default EN implementation of the token parser.
The project also contains examples for the <a href="examples/light_switch_fr.html">French</a> and
<a href="examples/light_switch_ru.html">Russian</a> languages.
</li>
<li>
<code>Token enrichers</code> are represented as an optional list of <a href="apis/latest/org/apache/nlpcraft/NCTokenEnricher.html">NCTokenEnricher</a>.
A token enricher is a component which adds additional properties to prepared tokens,
such as part of speech, quote and stop-word flags, or any other.
NlpCraft provides a default set of EN token enricher implementations.
</li>
<li>
<code>Token validators</code> are represented as an optional list of <a href="apis/latest/org/apache/nlpcraft/NCTokenValidator.html">NCTokenValidator</a>.
A token validator is a user-defined component in which tokens are inspected and an exception can be thrown
from user code to stop the processing of the user input.
</li>
<li>
<code>Entity parsers</code> are represented as a mandatory list of <a href="apis/latest/org/apache/nlpcraft/NCEntityParser.html">NCEntityParser</a>.
At least one entity parser must be defined. Taking prepared tokens as input,
each entity parser tries to find user-defined named entities.
NlpCraft provides wrappers for the named-entity recognition components of the OpenNLP and Stanford libraries.
</li>
<li>
<code>Entity enrichers</code> are represented as an optional list of <a href="apis/latest/org/apache/nlpcraft/NCEntityEnricher.html">NCEntityEnricher</a>.
An entity enricher is a component which adds additional properties to prepared entities.
It can be useful for extending the functionality of existing entity enrichers.
</li>
<li>
<code>Entity mappers</code> are represented as an optional list of <a href="apis/latest/org/apache/nlpcraft/NCEntityMapper.html">NCEntityMapper</a>.
An entity mapper is a component which maps one set of entities into another after the entities
have been parsed and enriched. It can be useful for building complex parsers based on existing ones.
</li>
<li>
<code>Entity validators</code> are represented as an optional list of <a href="apis/latest/org/apache/nlpcraft/NCEntityValidator.html">NCEntityValidator</a>.
An entity validator is a user-defined component in which prepared entities are inspected and
exceptions can be thrown from user code to stop the processing of the user input.
</li>
<li>
<code>Variant filter</code> is represented as <a href="apis/latest/org/apache/nlpcraft/NCVariantFilter.html">NCVariantFilter</a>.
It is an optional component which filters the detected variants and rejects undesirable ones.
</li>
</ul>
<p>
Below is a <code>Model</code> creation example.
The <code>Pipeline</code> is prepared using the <code>NCPipelineBuilder</code> helper class.
</p>
<pre class="brush: scala, highlight: []">
val pipeline =
    new NCPipelineBuilder().
        withTokenParser(new NCFrTokenParser()).
        withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
        withTokenEnricher(new NCFrStopWordsTokenEnricher()).
        withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
        build
val cfg = NCModelConfig("nlpcraft.lightswitch.fr.ex", "LightSwitch Example Model FR", "1.0")
val mdl = new NCModelAdapter(cfg, pipeline)
</pre>
<p>
This flexible system allows you to create pipelines for any language.
You can combine NlpCraft predefined components, write your own, and easily reuse custom components.
</p>
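<p>
To make the idea of composable stages more concrete, here is a minimal, library-independent sketch of a pipeline
built from plain functions. All names in it (<code>Token</code>, <code>Entity</code>, <code>Pipeline</code>,
<code>run</code>) are hypothetical, only four of the component kinds are modeled, and the sketch assumes
that stages run in the order in which they are listed above.
</p>
<pre class="brush: scala, highlight: []">
object PipelineSketch {
    // Hypothetical stand-ins for illustration; not the NlpCraft pipeline types.
    final case class Token(text: String, props: Map[String, String] = Map.empty)
    final case class Entity(id: String, tokens: List[Token])

    final case class Pipeline(
        tokenParser: String => List[Token],               // Mandatory.
        entityParsers: List[List[Token] => List[Entity]], // Mandatory, at least one.
        tokenEnrichers: List[Token => Token] = Nil,       // Optional.
        tokenValidators: List[List[Token] => Unit] = Nil  // Optional, may throw to stop processing.
    ) {
        require(entityParsers.nonEmpty, "At least one entity parser must be defined.")

        def run(text: String): List[Entity] = {
            val parsed = tokenParser(text)
            // Each token goes through all enrichers in order.
            val enriched = parsed.map(tok => tokenEnrichers.foldLeft(tok)((t, enrich) => enrich(t)))
            // Any validator may throw an exception, which aborts the whole run.
            tokenValidators.foreach(validate => validate(enriched))
            // Every entity parser sees the same enriched tokens.
            entityParsers.flatMap(parse => parse(enriched))
        }
    }
}
</pre>
<p>
Because each stage is just a value, a custom component written for one pipeline can be passed to another
one unchanged, which is the kind of reuse described above.
</p>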
</section>
<section id="model-behavior">
<h2 class="section-title">Model behavior overriding<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
There are also several <a href="/apis/latest/org/apache/nlpcraft/NCModel.html">NCModel</a>
callbacks that you can override to affect model behavior during
<a href="/intent-matching.html#model_callbacks">intent matching</a>
to perform logging, debugging, statistics or usage collection, explicit update or initialization of
conversation context, security audit or validation:
</p>
<ul>
<li>
Overriding <code>onMatchedIntent()</code> allows you to reject a matched intent and continue the matching process.
</li>
<li>
Overriding <code>onResult()</code> allows you to replace the callback method execution result.
</li>
<li>
Overriding <code>onRejection()</code> allows you to change the operation result when a rejection occurs.
</li>
<li>
Overriding <code>onError()</code> allows you to change the operation result when any error occurs.
</li>
</ul>
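<p>
The effect of the first two callbacks can be sketched with a toy matcher. The <code>Hooks</code> trait,
its simplified signatures and the <code>process</code> function are hypothetical and exist only to show
the control flow; the real <code>NCModel</code> callbacks take different parameters.
</p>
<pre class="brush: scala, highlight: []">
object CallbackHooksSketch {
    // Hypothetical stand-in; the real NCModel callback signatures may differ.
    trait Hooks {
        // Returning false rejects the matched intent, so matching continues with the next one.
        def onMatchedIntent(intentId: String): Boolean = true
        // Returning Some(...) replaces the callback result; None keeps it as is.
        def onResult(intentId: String, result: String): Option[String] = None
    }

    // Tries intents in order and executes the first one accepted by onMatchedIntent().
    def process(intents: List[(String, () => String)], hooks: Hooks): Option[String] =
        intents.collectFirst {
            case (id, callback) if hooks.onMatchedIntent(id) =>
                val result = callback()
                hooks.onResult(id, result).getOrElse(result)
        }
}
</pre>
<p>
With the default hooks the first intent wins; overriding <code>onMatchedIntent()</code> to return
<code>false</code> for it lets the next intent be selected instead.
</p>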
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#model">Model responsibility overview</a></li>
<li><a href="#client">Client responsibility overview</a></li>
<li><a href="#model-configuration">Model configuration</a></li>
<li><a href="#model-pipeline">Model pipeline</a></li>
<li><a href="#model-behavior">Model behavior overriding</a></li>
{% include quick-links.html %}
</ul>
</div>