---
active_crumb: Intent Matching
layout: documentation
id: intent_matching
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column">
<section id="overview">
<h2 class="section-title">Overview</h2>
<p>
As discussed in the <a href="/data-model.html#logic">Data Model</a> section, the processing logic of the data model is
encoded in intents and their callbacks. The sections below explain how to declare an intent and how to develop
the callback method that is invoked when its intent is matched.
</p>
</section>
<section id="matching">
<h2 class="section-title">Intent-Based Matching</h2>
<p>
Intent matching is based on the idea of defining one or more templates for the user input and letting the
algorithm choose the best matching one for a given input. Such a template is called an <em>intent</em>.
Each intent defines a pattern of the user input and the associated action to take when that pattern is detected
and selected as the best match. NLPCraft uses sophisticated NLP algorithms to select the best matching
intent. When more than one intent matches the user input, the system selects the
most specific one as its best match.
</p>
<p>
In NLPCraft intents are specified using three Java annotations:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Annotation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/intent/NCIntent.html">@NCIntent</a></td>
<td>
This annotation defines an intent in-place on the method serving as its callback. It takes a string value
that specifies an intent using intent <a href="#syntax">DSL syntax</a>.
</td>
</tr>
<tr>
<td><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/intent/NCIntentRef.html">@NCIntentRef</a></td>
<td>
This annotation references an intent defined in an external JSON or YAML model definition.
As with the <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/intent/NCIntent.html">@NCIntent</a>
annotation, the intent should be defined using <a href="#syntax">DSL syntax</a>.
</td>
</tr>
<tr>
<td><a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/intent/NCIntentTerm.html">@NCIntentTerm</a></td>
<td>
This annotation marks a formal callback method parameter to receive the term's tokens.
</td>
</tr>
</tbody>
</table>
<p>
Here are a couple of examples that illustrate the basics of intent declaration and usage.
</p>
<p>
An intent from
<a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/src/main/scala/org/apache/nlpcraft/examples/lightswitch">Light Switch</a> Scala example:
</p>
<pre class="brush: java">
&#64;NCIntent("intent=act conv=false term(act)={groups @@ 'act'} term(loc)={trim(id) == 'ls:loc'}*")
def onMatch(
&#64;NCIntentTerm("act") actTok: NCToken,
&#64;NCIntentTerm("loc") locToks: List[NCToken]
): NCResult = {
...
}
</pre>
<p>
<b>NOTES:</b>
</p>
<ul>
<li>
The intent is defined in-place using <code>@NCIntent</code> annotation.
</li>
<li>
A term match is defined as one or more tokens. Term can be optional if its min quantifier is zero.
</li>
<li>
The non-conversational intent <code>act</code> has one mandatory term and one term
that can match zero or more tokens. Method <code>onMatch(...)</code> is its callback.
</li>
<li>
Method <code>onMatch(...)</code> will be called when this intent is selected as the best match.
</li>
<li>
Note that terms have <code>min=1, max=1</code> quantifiers by default, i.e. one and only one.
</li>
<li>
First term defines any single token that belongs to the group <code>act</code>. Note that model elements
can belong to multiple groups.
</li>
<li>
Second term would match zero or more tokens with ID <code>ls:loc</code>. Note that we use function <code>trim</code>
on the token ID.
</li>
<li>
Note that both terms have IDs (<code>act</code> and <code>loc</code>) that are used in <code>onMatch(...)</code>
method parameters to automatically assign terms' tokens to the formal method parameters using <code>@NCIntentTerm</code>
annotations.
</li>
<li>
Given the <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/src/main/scala/org/apache/nlpcraft/examples/lightswitch/lightswitch_model.yaml">data model</a>
from this <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/src/main/scala/org/apache/nlpcraft/examples/lightswitch/lightswitch_model.yaml">example</a> the following sentences will be matched by this intent:
<ul>
<li><code>Turn the lights off in the entire house.</code></li>
<li><code>Switch on the illumination in the master bedroom closet.</code></li>
<li><code>Get the lights on.</code></li>
<li><code>Please, put the light out in the upstairs bedroom.</code></li>
<li><code>Set the lights on in the entire house.</code></li>
<li><code>Turn the lights off in the guest bedroom.</code></li>
<li><code>Could you please switch off all the lights?</code></li>
<li><code>Dial off illumination on the 2nd floor.</code></li>
<li><code>Please, no lights!</code></li>
<li><code>Kill off all the lights now!</code></li>
<li><code>No lights in the bedroom, please.</code></li>
</ul>
</li>
</ul>
<br/>
<p>
In the following <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/src/main/scala/org/apache/nlpcraft/examples/alarm">Alarm Clock</a> Java example
the intent is defined in the JSON model definition and referenced in Java code using the <code>@NCIntentRef</code>
annotation:
</p>
<pre class="brush: js, highlight: [19]">
{
"id": "nlpcraft.alarm.ex",
"name": "Alarm Example Model",
"version": "1.0",
"enabledBuiltInTokens": [
"nlpcraft:num"
],
"elements": [
{
"id": "x:alarm",
"description": "Alarm token indicator.",
"synonyms": [
"{ping|buzz|wake|call|hit} {me|up|me up|*}",
"{set|*} {my|*} {wake|wake up|*} {alarm|timer|clock|buzzer|call} {up|*}"
]
}
],
"intents": [
"intent=alarm term={id=='x:alarm'} term(nums)={id=='nlpcraft:num' && ~nlpcraft:num:unittype=='datetime' && ~nlpcraft:num:isequalcondition==true}[0,7]"
]
}
</pre>
<pre class="brush: java, highlight: [1]">
&#64;NCIntentRef("alarm")
private NCResult onMatch(
NCIntentMatch ctx,
&#64;NCIntentTerm("nums") List&lt;NCToken&gt; numToks
) {
...
}
</pre>
<p>
<b>NOTES:</b>
</p>
<ul>
<li>
Intent is defined in the external JSON model declaration (see line 19 in JSON file).
</li>
<li>
This intent is referenced by annotation <code>@NCIntentRef("alarm")</code> with method <code>onMatch(...)</code>
as its callback.
</li>
<li>
This example defines an intent with two terms. The first term has to be found for the
intent to match, while the second term is optional because its min quantifier is zero.
</li>
<li>
Method <code>onMatch(...)</code> will be called when this intent is the best match detected.
</li>
<li>
Note that terms have <code>min=1, max=1</code> quantifiers by default.
</li>
<li>
First term is defined as a single mandatory (<code>min=1, max=1</code>) user token with ID <code>x:alarm</code>
whose element is defined in the model.
</li>
<li>
Second term is defined as zero to seven built-in numeric <code>nlpcraft:num</code> tokens that
have unit type of <code>datetime</code> and are single numbers. Note that <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/src/main/scala/org/apache/nlpcraft/examples/alarm">Alarm Clock</a>
model allows zero tokens in this term which would mean the current time.
</li>
<li>
Given data model definition above the following sentences will be matched by this intent:
<ul>
<li><code>Ping me in 3 minutes</code></li>
<li><code>Buzz me in an hour and 15mins</code></li>
<li><code>Set my alarm for 30s</code></li>
</ul>
</li>
</ul>
</section>
<section id="syntax">
<h2 class="section-title">Intent DSL</h2>
<p>
Regardless of how an intent is defined, directly in the <code>@NCIntent</code> annotation or in an external model
declaration, it follows exactly the same syntax (here's the full <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/src/main/scala/org/apache/nlpcraft/model/intent/impl/antlr4/NCIntentDsl.g4">ANTLR4 grammar</a>).
</p>
<p>
Intent DSL grammar can be informally described using the following example (order of individual declarations is important):
</p>
<pre class="brush: js">
intent=my_intent
conv=true
ordered=true
flow='id* >> (id2|id3)[2,3]'
term(term1)={group @@ 'my_group'}?
term(term2)={trim(partId.partAlias.id) == 'token1:id'}[1,3]
</pre>
<dl>
<dt><code>intent=my_intent</code></dt>
<dd>
Mandatory intent ID. Any arbitrary unique string matching the following template: <code>(UNDERSCORE|[a-z]|[A-Z])+([a-z]|[A-Z]|[0-9]|COLON|MINUS|UNDERSCORE)*</code>
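<p>
The template above can be expressed as a regular expression. The following is a minimal sketch, assuming
<code>UNDERSCORE</code>, <code>COLON</code> and <code>MINUS</code> stand for the characters <code>_</code>,
<code>:</code> and <code>-</code>. The class <code>IntentIdCheck</code> and its method are hypothetical names used
for illustration only and are not part of the NLPCraft API.
</p>

```java
import java.util.regex.Pattern;

/** Illustrative check of the intent ID template; hypothetical helper, not part of NLPCraft. */
final class IntentIdCheck {
    // (UNDERSCORE|[a-z]|[A-Z])+ followed by ([a-z]|[A-Z]|[0-9]|COLON|MINUS|UNDERSCORE)*
    // The trailing '-' inside the character class is a literal minus sign.
    private static final Pattern ID = Pattern.compile("[_a-zA-Z]+[a-zA-Z0-9:_-]*");

    /** Returns true if the whole string conforms to the ID template. */
    static boolean isValidId(String s) {
        return s != null && ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String id : new String[] {"my_intent", "ls:loc", "2fast", "has space"})
            System.out.println(id + " -> " + isValidId(id));
    }
}
```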
</dd>
<dt><code>conv=true</code></dt>
<dd>
<em>Optional.</em> Whether or not this intent supports
<a href="basic-concepts.html#stm">conversation</a>. Default is <code>true</code>.
</dd>
<dt><code>ordered=true</code></dt>
<dd>
<em>Optional.</em>
Whether or not this intent is ordered. Default is <code>false</code>.
For ordered intent the specified order of terms is important for matching this intent.
If intent is unordered its terms can be found anywhere in the input text and in any order.
Note that ordered intent significantly limits the user input it can match. In most cases
the ordered intent is only applicable to processing formal strict grammar (like a programming language)
and unsuitable for natural language processing.
</dd>
<dt><code>flow='id* >> (id2|id3)[2,3]'</code></dt>
<dd>
<p>
<em>Optional.</em> Dialog flow is a history of previously matched intents to match on. If provided,
the intent will match not only on the current user input but also on the history of the previously matched
intents.
</p>
<p>
A dialog flow pattern consists of one or more groups of intent IDs separated by the <code>&gt;&gt;</code> symbol and ordered from
the most recent to the oldest.
Multiple IDs in a group should be placed in <code>(</code> <code>)</code> brackets and separated by the <code>|</code>
symbol. Each group of IDs can have an optional quantifier (e.g. <code>[2,3]</code>) that specifies how many times its intents should appear
in the matching history:
</p>
<ul>
<li><code>[n,m]</code> - intent should appear at least <code>n</code> times and at most <code>m</code> times.</li>
<li><code>*</code> is equal to <code>[0,∞]</code></li>
<li><code>+</code> is equal to <code>[1,∞]</code></li>
<li><code>?</code> is equal to <code>[0,1]</code></li>
<li>No quantifier defaults to <code>[1,1]</code></li>
</ul>
<p>
For the dialog flow to match, the history of the matched intents (for the given user and data model) should
match the dialog flow pattern. Note that if a dialog flow is defined and it doesn't match the history, the terms
of the intent won't be tested at all.
</p>
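<p>
To make the pattern semantics above more concrete, here is a rough sketch of how such a pattern could be parsed and
checked against a history of intent IDs. It is not NLPCraft's implementation: the class <code>FlowSketch</code> and all
of its members are hypothetical, the parser assumes a well-formed pattern, and the sketch assumes that the pattern
has to account for the entire history (listed from the most recent to the oldest).
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a hypothetical helper, not NLPCraft's implementation. */
final class FlowSketch {
    /** One group of the pattern: alternative intent IDs plus an inclusive [min,max] quantifier. */
    static final class Item {
        final Set<String> ids;
        final int min;
        final int max;

        Item(Set<String> ids, int min, int max) {
            this.ids = ids;
            this.min = min;
            this.max = max;
        }
    }

    /** Parses a well-formed pattern such as "id1* >> (id2|id3)[2,3]" into its groups. */
    static List<Item> parse(String flow) {
        List<Item> items = new ArrayList<>();
        for (String raw : flow.split(">>")) {
            String s = raw.trim();
            int min = 1, max = 1; // No quantifier defaults to [1,1].
            if (s.endsWith("*") || s.endsWith("+") || s.endsWith("?")) {
                if (!s.endsWith("+")) min = 0;                 // '*' and '?' allow zero.
                if (!s.endsWith("?")) max = Integer.MAX_VALUE; // '*' and '+' are unbounded.
                s = s.substring(0, s.length() - 1);
            }
            else if (s.endsWith("]")) {
                int open = s.lastIndexOf('[');
                String[] bounds = s.substring(open + 1, s.length() - 1).split(",");
                min = Integer.parseInt(bounds[0].trim());
                max = Integer.parseInt(bounds[1].trim());
                s = s.substring(0, open);
            }
            // Strip the optional brackets and split the alternatives.
            s = s.trim().replace("(", "").replace(")", "");
            Set<String> ids = new HashSet<>();
            for (String id : s.split("\\|")) ids.add(id.trim());
            items.add(new Item(ids, min, max));
        }
        return items;
    }

    /** Checks the history (most recent intent ID first) against the parsed pattern. */
    static boolean matches(List<Item> items, List<String> history) {
        return matches(items, 0, history, 0);
    }

    // Backtracking: let group 'i' consume 0, 1, 2, ... history entries and try the rest.
    private static boolean matches(List<Item> items, int i, List<String> hist, int pos) {
        if (i == items.size()) return pos == hist.size(); // Assumption: whole history is consumed.
        Item it = items.get(i);
        int count = 0;
        while (true) {
            if (count >= it.min && matches(items, i + 1, hist, pos + count)) return true;
            boolean canTakeMore = count < it.max
                && pos + count < hist.size()
                && it.ids.contains(hist.get(pos + count));
            if (!canTakeMore) return false;
            count++;
        }
    }

    public static void main(String[] args) {
        List<Item> pattern = parse("id1* >> (id1|id2)[1,2]");
        System.out.println(matches(pattern, Arrays.asList("id1", "id1", "id2")));
        System.out.println(matches(pattern, Arrays.asList("id3")));
    }
}
```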
</dd>
<dt>
<code>term(term1)={group @@ 'my_group'}?</code><br>
<code>term(term2)={trim(partId.partAlias.id) == 'token1:id'}[1,3]</code>
</dt>
<dd>
<p>
A term, also known as a slot, is a building block of the intent. A term has an optional ID, a predicate and a quantifier. It can
represent one or more tokens, sequential or not, detected in the user input. An intent has a list of terms
(always at least one) that all have to be found in the user input for the intent to match. Note that a term
can be optional if its min quantifier is zero. Whether or not the order of the terms is important
for matching is governed by the <code>ordered=true</code> parameter.
</p>
<p>
Term ID (<code>term1</code> and <code>term2</code>) is optional, and when provided, is used by <code>@NCIntentTerm</code>
annotation to link term's tokens to a formal parameter of the callback method. Note that term ID follows
the same lexical rules as intent ID.
</p>
<p>
Inside of curly brackets <code>{</code> <code>}</code> is a <a href="data-model.html#dsl">token DSL</a>
expression: <code>group @@ 'my_group'</code> and <code>trim(partId.partAlias.id) == 'token1:id'</code>.
Note that exactly the same syntax is used for token DSL as well as for intent DSL for defining a token predicate.
Consult <a href="data-model.html#dsl">token DSL</a> documentation for details on its syntax.
</p>
<p>
<code>?</code> and <code>[1,3]</code> define an inclusive quantifier for that term (how many times this term should appear
for it to be considered found). You can also use the following abbreviations:
</p>
<ul>
<li><code>*</code> is equal to <code>[0,∞]</code></li>
<li><code>+</code> is equal to <code>[1,∞]</code></li>
<li><code>?</code> is equal to <code>[0,1]</code></li>
<li>No quantifier defaults to <code>[1,1]</code></li>
</ul>
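<p>
The interplay of term predicates and quantifiers can be illustrated with a deliberately simplified sketch. It reduces
tokens to their IDs and uses a greedy, unordered strategy; the actual NLPCraft matching is considerably more involved.
The class <code>TermSketch</code> and its members are hypothetical and are not part of the NLPCraft API.
</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: a simplified, hypothetical model of terms and quantifiers. */
final class TermSketch {
    /** A term: a predicate over token IDs plus an inclusive [min,max] quantifier. */
    static final class Term {
        final Predicate<String> pred;
        final int min;
        final int max;

        Term(Predicate<String> pred, int min, int max) {
            this.pred = pred;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Unordered, greedy matching: each term claims up to 'max' not-yet-claimed tokens
     * that satisfy its predicate and fails if it ends up with fewer than 'min' tokens.
     */
    static boolean matches(List<Term> terms, List<String> tokenIds) {
        boolean[] used = new boolean[tokenIds.size()];
        for (Term t : terms) {
            int found = 0;
            for (int i = 0; i < tokenIds.size() && found < t.max; i++) {
                if (!used[i] && t.pred.test(tokenIds.get(i))) {
                    used[i] = true;
                    found++;
                }
            }
            if (found < t.min) return false; // A mandatory term was not found.
        }
        return true;
    }

    public static void main(String[] args) {
        // Roughly: term={id == 'x:alarm'} term(nums)={id == 'nlpcraft:num'}[0,7]
        List<Term> terms = Arrays.asList(
            new Term("x:alarm"::equals, 1, 1),
            new Term("nlpcraft:num"::equals, 0, 7)
        );
        System.out.println(matches(terms, Arrays.asList("x:alarm", "nlpcraft:num")));
        System.out.println(matches(terms, Arrays.asList("nlpcraft:num")));
    }
}
```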
</dd>
</dl>
<div class="bq success">
<p>
<b>Token DSL</b>
</p>
<p>
Intent DSL is built on top of the <a href="data-model.html#dsl">Token DSL</a> that provides comprehensive support for
defining predicates over detected model elements.
Make sure to read about <a href="data-model.html#dsl">Token DSL</a> as well.
</p>
</div>
</section>
<section id="callback">
<h2 class="section-title">Intent Callback</h2>
<p>
Whether the intent is defined directly in <code>@NCIntent</code> annotation or indirectly via <code>@NCIntentRef</code>
annotation - it is always attached to a callback method:
</p>
<ul>
<li>
Callback can only be an instance method on the class implementing <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a>
interface.
</li>
<li>
Method must have return type of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCResult.html">NCResult</a>.
</li>
<li>
Method can have zero or more parameters:
<ul>
<li>
Parameter of type <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a>,
if present, must be first.
</li>
<li>
Any other parameters (other than the first optional
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a>) must
have <code>@NCIntentTerm</code> annotation.
</li>
</ul>
</li>
<li>
Method must support reflection-based invocation.
</li>
</ul>
<p>
The <code>@NCIntentTerm</code> annotation marks a callback parameter to receive the term's tokens. This annotation can
only be used on the parameters of callbacks, i.e. methods that are annotated with <code>@NCIntent</code> or
<code>@NCIntentRef</code>. <code>@NCIntentTerm</code> takes a term ID as its only mandatory parameter and
should be applied to callback method parameters to get the tokens associated with that term (if and when
the intent was matched and that callback was invoked).
</p>
<p>
Depending on the term quantifier the method parameter type can only be one of the following types:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Quantifier</th>
<th>Java Type</th>
<th>Scala Type</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>[1,1]</code></td>
<td><code>NCToken</code></td>
<td><code>NCToken</code></td>
</tr>
<tr>
<td><code>[0,1]</code></td>
<td><code>Optional&lt;NCToken&gt;</code></td>
<td><code>Option[NCToken]</code></td>
</tr>
<tr>
<td><code>[1,∞]</code> or <code>[0,∞]</code></td>
<td><code>java.util.List&lt;NCToken&gt;</code></td>
<td><code>List[NCToken]</code></td>
</tr>
</tbody>
</table>
<p>
For example:
</p>
<pre class="brush: java">
&#64;NCIntent("intent=id term(termId)={id == 'my_token'}?")
private NCResult onMatch(
&#64;NCIntentTerm("termId") Optional&lt;NCToken&gt; myTok
) {
...
}
</pre>
<p><b>NOTES:</b></p>
<ul>
<li>
Term <code>termId</code> has <code>[0,1]</code> quantifier (it's optional).
</li>
<li>
The formal parameter on the callback has a type of <code>Optional&lt;NCToken&gt;</code> because the
term's quantifier is <code>[0,1]</code>.
</li>
<li>
Note that callback doesn't have an optional <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a>
parameter.
</li>
</ul>
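<p>
The quantifier-to-type table above can be summarized as a small decision function. This is only a sketch: the class
<code>ParamTypeSketch</code> is hypothetical, and treating every quantifier other than <code>[1,1]</code> and
<code>[0,1]</code> as a list is an assumption based on the table and on the Alarm Clock example, where a
<code>[0,7]</code> term is received as <code>List&lt;NCToken&gt;</code>.
</p>

```java
/** Illustrative only: hypothetical helper mirroring the quantifier-to-type table. */
final class ParamTypeSketch {
    /** Stand-in for the unbounded upper limit. */
    static final int INF = Integer.MAX_VALUE;

    /** Returns the Java parameter type expected for a term with the given [min,max] quantifier. */
    static String javaParamType(int min, int max) {
        if (min == 1 && max == 1) return "NCToken";           // Exactly one token.
        if (min == 0 && max == 1) return "Optional<NCToken>"; // Zero or one token.
        return "java.util.List<NCToken>";                     // Assumption: everything else is a list.
    }

    public static void main(String[] args) {
        System.out.println("[1,1] -> " + javaParamType(1, 1));
        System.out.println("[0,1] -> " + javaParamType(0, 1));
        System.out.println("[0,inf] -> " + javaParamType(0, INF));
    }
}
```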
<h3 class="section-title"><code>NCRejection</code> and <code>NCIntentSkip</code> Exceptions</h3>
<p>
There are two exceptions that can be used by intent callback logic to control intent matching process.
</p>
<p>
When <a href="/apis/latest/org/apache/nlpcraft/model/NCRejection.html">NCRejection</a> exception is thrown by the callback it indicates that user input cannot
be processed as is. This exception typically indicates that user has not provided enough information in
the input string to have it processed automatically. In most cases this means that the user's input is
either too short or too simple, too long or too complex, missing required context, or is unrelated to the
requested data model.
</p>
<p>
<a href="/apis/latest/org/apache/nlpcraft/model/NCIntentSkip.html">NCIntentSkip</a> is a control flow exception to
skip current intent. This exception can be thrown by the intent callback to indicate that current intent
should be skipped (even though it was matched and its callback was called). If there's more than one intent
matched the next best matching intent will be selected and its callback will be called.
</p>
<p>
This exception becomes useful when it is hard or impossible to encode the entire matching logic using just
the declarative intent DSL. In these cases the intent definition can be relaxed and the "last mile" of intent
matching can happen inside the intent callback's user logic. If it is determined that the intent does not
in fact match, throwing this exception lets the next best matching intent, if any, be tried.
</p>
<p>
Note that there's a significant difference between <a href="/apis/latest/org/apache/nlpcraft/model/NCIntentSkip.html">NCIntentSkip</a>
exception and model's <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html#onMatchedIntent-org.apache.nlpcraft.model.NCIntentMatch-">onMatchedIntent(...)</a>
callback. Unlike this callback, the exception does not force re-matching of all intents, it simply
picks the next best intent from the list of already matched ones. The model's callback can force
a full reevaluation of all intents against the user input.
</p>
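<p>
The skip-and-try-next behavior described above can be modeled with a small self-contained sketch. It does not use the
NLPCraft API: <code>IntentSkip</code> and <code>Rejection</code> are hypothetical stand-ins for
<code>NCIntentSkip</code> and <code>NCRejection</code>, callbacks are reduced to suppliers of a string result, and
rejecting the input when every matched intent has been skipped is an assumption made for this illustration.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative only: hypothetical model of the skip-and-try-next control flow. */
final class SkipSketch {
    /** Hypothetical stand-in for NCIntentSkip. */
    static final class IntentSkip extends RuntimeException {
        IntentSkip(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for NCRejection. */
    static final class Rejection extends RuntimeException {
        Rejection(String msg) { super(msg); }
    }

    /** Calls callbacks in best-match-first order, moving on whenever one signals a skip. */
    static String dispatch(List<Supplier<String>> callbacksBestFirst) {
        for (Supplier<String> cb : callbacksBestFirst) {
            try {
                return cb.get();
            }
            catch (IntentSkip e) {
                // The intent matched declaratively but its callback declined: try the next one.
            }
        }
        // Assumption for this sketch: nothing left to try means the input is rejected.
        throw new Rejection("No matching intent found.");
    }

    /** "Last mile" check in user code: this callback only applies during working hours. */
    static String workHoursOnly(int hour) {
        if (hour < 9 || hour >= 17) throw new IntentSkip("Outside working hours.");
        return "work-hours intent";
    }

    /** Builds the list of already matched callbacks and dispatches for the given hour. */
    static String handle(int hour) {
        List<Supplier<String>> matched = new ArrayList<>();
        matched.add(() -> workHoursOnly(hour)); // Best match.
        matched.add(() -> "fallback intent");   // Next best match.
        return dispatch(matched);
    }

    public static void main(String[] args) {
        System.out.println(handle(10));
        System.out.println(handle(20));
    }
}
```

<p>
Note that the dispatcher only walks the list of already matched callbacks; it does not re-run any matching, which mirrors
the difference from the model-level callback described above.
</p>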
<div class="bq info">
<p>
<b>Intent DSL Expressiveness</b>
</p>
<p>
Note that using the <code>NCIntentSkip</code> exception (or even the model's life-cycle callbacks) is a
required technique when you cannot express the desired matching logic with intent DSL alone.
Intent DSL is a high-level declarative language and it does
not support programmable logic or other types of complex matching algorithms. In such cases, you can
define a broad intent that would match and then implement the rest of the more complex matching logic in the callback,
using the <code>NCIntentSkip</code> exception to effectively indicate that the intent doesn't match (and other
intents have to be tried).
</p>
<p>
There are many use cases where the DSL is not expressive enough. For example, if your intent matching depends
on the date and time, financial market conditions, weather, state from external systems or the user's current geographical
location, you will need to use <code>NCIntentSkip</code>-based logic or the model's callbacks to support
that type of matching.
</p>
</div>
<h3 class="section-title"><code>NCIntentMatch</code> Interface</h3>
<p>
The <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentMatch.html">NCIntentMatch</a> interface
can be passed into the intent callback as its first parameter. This interface provides runtime information
about the intent that was matched (i.e. the intent with which this callback was annotated). Note also that the
intent context can only be the first parameter of the callback; if it is not declared as such, it won't be passed in.
</p>
</section>
<section id="examples">
<h2 class="section-title">Intent Examples</h2>
<p>
Here are a number of intent examples with explanations. These intents can be defined directly in the <code>@NCIntent</code>
annotation or in an external JSON or YAML model definition.
</p>
<p>
<b>Example 1:</b>
</p>
<pre class="brush: js">
intent=id1
term={id == 'x:id'}
term(nums)={id == 'nlpcraft:num' && lowercase(~nlpcraft:num:unittype) == 'datetime'}[0,2]
</pre>
<p><b>NOTES:</b></p>
<ul>
<li>
Intent has ID <code>id1</code>.
</li>
<li>
Intent uses default conversational support (<code>true</code>) and default order (<code>false</code>).
</li>
<li>
Intent has two terms that have to be found for the intent to match. Note that second term is optional as it
has <code>[0,2]</code> quantifier.
</li>
<li>
First term matches any single token with ID <code>x:id</code>.
</li>
<li>
Second term can appear zero, one or two times and it matches a token with ID <code>nlpcraft:num</code> with
<code>nlpcraft:num:unittype</code> metadata property equal to <code>'datetime'</code> string.
</li>
<li>
Function <code>lowercase</code> is used on the <code>nlpcraft:num:unittype</code> metadata property value. Note
that built-in functions can only be used on left values.
</li>
<li>
Note that since the second term has an ID (<code>nums</code>), it can be referenced by the <code>@NCIntentTerm</code>
annotation on the callback's formal parameter.
</li>
<li>
Read more on built-in functions and token metadata properties in <a href="data-model.html#dsl">Token DSL</a>
section.
</li>
</ul>
<br/>
<p>
<b>Example 2:</b>
</p>
<pre class="brush: js">
intent=id2
conv=false
flow='id1* >> (id1|id2)[1,2]'
term={id == 'mytok' && signum(~score['best']) != -1}
term={(groups @@ 'actors' || groups @@ 'owners') && size(partAlias.~text) > 10}
</pre>
<p><b>NOTES:</b></p>
<ul>
<li>
Intent has ID <code>id2</code>.
</li>
<li>
Intent overrides default conversational support via <code>conv=false</code> and uses default order (<code>false</code>).
</li>
<li>
Intent has dialog flow pattern to match: <code>'id1* >> (id1|id2)[1,2]'</code>. It expects zero or more
<code>id1</code> intents to be matched immediately prior to this one and either one or two of <code>id1</code> or
<code>id2</code> intents before that.
</li>
<li>
Intent has two terms. Both terms have to be present exactly once (their implicit quantifiers are <code>[1,1]</code>).
</li>
<li>
First term should be a token with ID <code>mytok</code> and have metadata property <code>score</code> of type
map. This map should have a value with the string key <code>'best'</code>. <code>signum</code> of this map value
should not equal <code>-1</code>. Note that built-in functions (i.e. <code>signum</code>) can only be
used on the left values.
</li>
<li>
Second term should be a token that belongs to either <code>actors</code> or <code>owners</code> group.
It should have a part token whose <a href="data-model.html#dsl">Token DSL</a> expression should have alias <code>partAlias</code>. That
part token should have metadata property <code>text</code> of type string or collection. The length of
this string or collection should be greater than <code>10</code>.
</li>
<li>
Read more on built-in functions and token metadata properties in <a href="data-model.html#dsl">Token DSL</a>
section.
</li>
</ul>
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#matching">Intent Matching</a></li>
<li><a href="#syntax">Intent DSL</a></li>
<li><a href="#callback">Intent Callback</a></li>
<li><a href="#examples">Intent Examples</a></li>
{% include quick-links.html %}
</ul>
</div>