blob: 2e8d9235db8c34c1fa2d41ba36e25ecf1ec4c5bc [file] [log] [blame]
---
active_crumb: NLPCraft IDL - Intent Definition Language
layout: blog
blog_title: NLPCraft IDL - Intent Definition Language
author_name: Aaron Radzinski
author_avatar: images/lion.jpg
author_twitter_id: aaron_radzinski
publish_date: June 3, 2021
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<section>
<div class="bq info">
This blog is an English adaptation of Sergey Kamov's <a target=habr href="https://habr.com/ru/post/559716">blog</a> written in Russian.
</div>
{% include latest-ver-blog-warn.html %}
<p>
This article is a second part of the article <a target=habr href="https://habr.com/ru/post/534034/">Проектируем интенты с Apache NLPCraft</a>
<img alt="" style="vertical-align: baseline" src="/images/ru-flag-16.png"> and contains a detailed description of
NLPCraft IDL - Intent Definition Language, created for NLP projects based on the Apache NlpCraft.
NLPCraft IDL support has been added to the system since version <a href="/download.html">0.7.5</a>.
</p>
<p>
The declarative Intent Definition Language, called <a href="/intent-matching.html">NLPCraft IDL</a>, significantly simplifies the process of
working with intents in NLP-based dialog and search systems developed using Apache NLPsCraft and at the same time
expands the capabilities of them.
</p>
<div class="bq success">
<p>
Note that at the point of choosing the best matching intent, NLP systems, in general, already have a parsed and processed
user input request, which contains combinations of request’s tokens and other related information.
</p>
</div>
</section>
<section>
<h2 class="section-title">Examples <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Let's start with examples to demonstrate the general capabilities of the language, provide the necessary explanations,
and then describe the design of the language a bit more formally.
</p>
<pre class="brush: idl">
intent=xa
flow="^(?:login)(^:logout)*$"
meta={'enabled': true}
term(a)={# != "z"}[1,3]
term(b)={
meta_intent('enabled') == true && // Must be active.
month == 1 // January.
}
term(c)~{
// Variables.
@usrTypes = meta_model('user_types')
// Predicate.
(# == 'order' || # == 'order_cancel') &&
has_all(@usrTypes, list(1, 2, 3) &&
abs(meta_tok('order:size')) > 10)
</pre>
<p><b>NOTES:</b></p>
<ul class="fixed">
<li>
The intent name is <code>xa</code>.</li>
<li>
The intent contains three terms. Term is an element that defines a predicate, each of which must
pass for the intent to be selected:
<ul class="fixed">
<li>
<code>term(a)</code> - the parsed request must contain between one and three tokens with identifiers
other than <code>z</code>, without taking into account data from the dialogue history (term type <code>=</code>).
</li>
<li>
<code>term(b)</code> - the intent must be active - the flag <code>enabled</code> from the model metadata.
In addition, such an intent can only be triggered in January - the <code>month()</code> built-in function.
</li>
<li>
<code>term(c)</code> - the token with identifier <code>order</code> or <code>order_cancel</code> should
be found in the request or in the dialogue history (term type <code>~</code>). There are additional
restrictions in the model metadata values and also the order size absolute value should be more than 10.
In the definition of this term we use IDL variables - and we will talk about them a little bit later.
</li>
</ul>
</li>
<li>
<code>flow=</code> This section defines an additional rule, according to which for the intent to
be triggered, the intent with the <code>login</code> identifier must have been selected at least
once within the current session, and the intent with the identifier <code>logout</code> should have never
been selected in this session. The rule is defined as a regular expression based on the intents IDs
of the previous intents in the user session. Other ways to create similar rules will also be
described below.
</li>
<li>
<code>meta=</code> For the given intent, a certain set of data can be defined - in this case a
configuration - with which you can enable or disable the intent.
</li>
</ul>
<pre class="brush: idl">
intent=xb
flow=/#flowModelMethod/
term(a)=/org.mypackage.MyClass#termMethod/?
fragment(frag)
</pre>
<p><b>NOTES:</b></p>
<ul class="fixed">
<li>
The intent name is <code>xb</code>.
</li>
<li>
<code>term(a)</code> The intent can contain one optional term (<code>?</code> quantifier,
detailed explanations will be given below), defined in the code - <code>org.mypackage.MyClass#termMethod</code> method.</li>
<li>
<code>fragment(frag)</code> Fragment with identifier <code>frag</code> extends the list of terms of the intent with
additional terms that are defined elsewhere (in <code>fragment</code> expression). The <code>frag</code> element must
be defined above in the code, or accessible via import statement.
</li>
<li>
<code>flow=</code> Flow contains the condition specified in the model code in the method <code>flowModelMethod</code>.
</li>
</ul>
<p>
You can find more examples <a href="https://nlpcraft.apache.org/intent-matching.html">here</a>
</p>
</section>
<section>
<h2 class="section-title">Deeper Dive<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<div class="bq info">
<p>
Note that this article is only a brief overview of the capabilities of NLPCraft IDL and is not trying to
be a comprehensive manual. Before starting to work with NLPCraft, it is recommended to study a
detailed description of the syntax and all the features review of the language in the corresponding sections
of the <a href="https://nlpcraft.apache.org/intent-matching.html">documentation</a>.
</p>
</div>
<h2 class="section-sub-title">The Places Where Intents Are Defined <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<ul class="fixed">
<li>
Intents defined with NLPCraft IDL can be declared directly in static model definition JSON or YAML files.
This approach is very convenient for simple cases. An example is available <a target=github href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/lightswitch/src/main/resources/lightswitch_model.yaml">here</a>.
</li>
<li>
Intents can also be defined directly in the model code using <a target="javadoc" href="https://nlpcraft.apache.org/apis/latest/org/apache/nlpcraft/model/NCIntent.html">@NCIntent</a>
annotations. An example can be found <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/time/src/main/java/org/apache/nlpcraft/examples/time/TimeModel.java">here</a>.
</li>
<li>
In addition, intents can be defined in separate special files (usually with <code>*.idl</code> extension).
In this case, the model will refer to these intents according to the specified path to these files or
URL resources using the import statement. This approach is convenient when working with large models,
when syntax highlighting and other features provided by the IDE may be useful (for example, Intellij IDEA
provides keyword highlighting, hints and syntax checking for files for the configured types). In addition,
this approach can be useful for specialists who do not have the ability or desire to edit the code directly.
An example is available at <a target=github href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/alarm/src/main/resources/alarm_model.json">here</a>
and <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/alarm/src/main/resources/intents.idl">here</a>.
</li>
</ul>
<h2 class="section-sub-title">Keywords <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
NLPCraft IDL has only 10 keywords: <code>flow, fragment, import, intent, meta, ordered, term, true, false, null.</code>
</p>
<ul class="fixed">
<li><code>intent, flow, fragment, meta, options, term</code> are parts of the intent definition.</li>
<li><code>fragment</code> keyword is also can be used to create named terms lists to include in intent definitions (a-la macros).</li>
<li><code>import</code> - required for including external files with fragment, intent or imports statements.</li>
<li><code>true, false, null</code> - used when working with built-in functions.</li>
</ul>
<h2 class="section-sub-title">Lifecycle <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
An intent compiled when the <a href="/docs.html#data-model">model</a> loaded and can be debugged only when the
model is being debugged. To define complex intents, it is recommended to create them in their own separated
files and use the editing capabilities provided by the IDE to ensure initial validation of the syntax.
</p>
<h2 class="section-sub-title">Program Structure <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The program contains a set of the following optional elements, in no particular order:
</p>
<ul class="fixed">
<li>
<code>import</code> statement
</li>
<li><code>fragment</code> statement</li>
<li><code>intent</code> statement</li>
</ul>
<b><code>import</code> Statement</b>
<p>
It contains the <code>import</code> keyword and the file name or URL resource in parentheses. It allows to import
other IDL definitions from external files or URLs. For example:
</p>
<pre class="brush: idl">
import('http://mysite.com/nlp/idls/external.idl)
</pre>
<b><code>fragment</code> Statement</b>
<p>
It contains the <code>fragment</code> keyword with the name and a list of terms. Terms can be parameterized.
An example of a simple fragment:
</p>
<pre class="brush: idl">
fragment=buzz term~{tok_id() == 'x:alarm'}
</pre>
<p>
An example of a parameterized fragment with arguments <code>a</code> and <code>b</code>:
</p>
<pre class="brush: idl">
fragment=p1
term={
meta_frag('a') &&
has_any(get(meta_frag('b'), 'Москва'), list(1, 2))
}
</pre>
<p>
Below is an example of using this fragment in an intent. Note that fragment’s parameters (its metadata) is passed using JSON format:
</p>
<pre class="brush: idl">
intent=i1
fragment(p1, {'a': true, 'b': {'Москва': [1, 2, 3]}})
</pre>
<b><code>intent</code> Statement</b>
<p>
This statement is the core element of IDL allowing to declare the intent. Here's the example
of the simple intent:
</p>
<pre class="brush: idl">
intent=xa
flow="^(?:login)(^:logout)*$"
meta={'enabled': true}
term(a)={# != "z"}[1,3]
</pre>
<p>
Every intent statement consists of the following elements:
</p>
<ul class="fixed">
<li>
<p><b>Intent Name</b></p>
<p>
Required element. The name is a unique identifier within the model. Below is an example of using a reference
to the <code>xa</code> intent.
</p>
<pre class="brush: java">
@NCIntentRef("xa")
fun onTimeMatch(
ctx: NCIntentMatch,
@NCIntentTerm("a") tok: NCToken
): NCResult { ... }
</pre>
</li>
<li>
<p><b>Intent Terms</b></p>
<p>
At least one term is required. Term is the main element of the intent definition. Term consists of
the predicate over a token (term body). The constituent parts of the predicate can be based on
the token or on some other factors.
</p>
<p>
Here's how an intent's term is defined:
</p>
<ul class="fixed">
<li>The <code>term</code> keyword. Required element.</li>
<li>
Name in parentheses. Optional. Used to create references to the found token in the callback arguments,
see the example above, token <code>a</code>.
</li>
<li>
<p>
Term type. Required element. Two term types are supported:
</p>
<ul class="fixed">
<li><code>~</code> the token can be obtained from the history of the dialog or from the current request (i.e. the term is conversational).</li>
<li><code>=</code> the token should be obtained only from the current request.</li>
</ul>
<p>
Example terms:
</p>
<pre class="brush: idl">
term(nums)~{# == 'nlpcraft:num'} // Conversational '~' term.
term(nums)={# != 'z'} // Non-conversational '=' term.
</pre>
</li>
<li>
<p>
The term body. Required element. There are two ways to define the term body:
using IDL script with built-in functions or in external Java-based code:
</p>
<pre class="brush: idl">
term(nums)={# == 'nlpcraft:num'} // IDL script.
term(yes)~{true} // IDL script.
term~/org.mypackage.MyClass#termMethod/? // Reference to external code.
</pre>
<p>
Note the special syntax for the last term. The existing set of built-in functions are typically
enough to define many terms of the intent. But, if necessary, the programmer can create her
own predicates in Java, Scala, Kotlin, Groovy or any other Java based language and write them
in the term body. That is, NLPCraft IDL can have snippets of code written entirely in other languages:
</p>
<pre class="brush: idl">
term~/org.mypackage.MyClass#termMethod/? // Reference to external code.
</pre>
<p>
This function <code>termMethod</code> receives an argument that contains all the necessary
data on the user's request, and as the output it should return the value of a specific type.
</p>
</li>
<li>
<p>
Quantifier. Optional. The default value is <code>[1, 1]</code>. The following types of quantifiers are supported:
</p>
<ul class="fixed">
<li><code>[M, N]</code> - the term must be found from <code>N</code> to <code>M</code> times.</li>
<li><code>*</code> - the term must be found at least once, is equivalent to <code>[0, ∞]</code></li>
<li><code>+</code> - the term must be found more than once, is it equivalent to <code>[1, ∞]</code></li>
<li><code>?</code> - the term must be found 0 or 1 time, equivalent to <code>[0, 1]</code></li>
</ul>
<p>
Examples:
</p>
<pre class="brush: idl">
term(nums)={# == 'nlpcraft:num'}[1,2] // The request must contain one or two tokens with the ID “nlpcraft: num”.
term(nums)={# == 'nlpcraft:num'}* // The request must contain zero or more tokens with the ID “nlpcraft: num”.
</pre>
</li>
</ul>
</li>
<li>
<p><b>IDL Built-In Functions</b></p>
<p>
NLPCraft IDL provides over 140 <a href="/intent-matching.html#idl_functions">built-in functions</a> that can
be used in term definition. These functions can be conventionally classified into the following categories:
</p>
<ul class="fixed">
<li>Based on base tokens properties - token IDs, groups, parent, hierarchy, etc. Examples: <code>tok_id</code>, <code>tok_groups</code>, <code>tok_parent</code>.</li>
<li>NLP-based tokens properties - stemmas, lemmas, parts of speech, stop words. Examples: <code>tok_lemma</code>, <code>tok_is_wordnet</code>, <code>tok_swear</code>.</li>
<li>Based on information about how the token was found in the user's request - synonym values, etc. Examples: <code>tok_value</code>, <code>tok_is_permutated</code>, <code>tok_is_direct</code>.</li>
<li>Based on user request data - request time, user agent type. Examples: <code>req_tstamp</code>, <code>req_addr</code>, <code>req_agent</code>.</li>
<li>Based on various metadata - tokens, model, request, etc. Examples: <code>meta_model('my: prop')</code>, <code>meta_tok('nlpcraft: num: unit')</code>, <code>meta_user('my: prop')</code>.</li>
<li>Based on data provided by NER token providers. Example, for <code>geo:city</code> token it can be the number of city residents or coordinates obtained from metadata.</li>
<li>Based on the user and his company - admin status, registration time. Examples: <code>user_admin</code>, <code>comp_name</code>, <code>user_signup_tstamp</code>.</li>
<li>Based on system/environment variables, system time, etc. Examples:<code> meta_sys('java.home'), now, day_of_week</code>.</li>
<li>Math functions, text functions, collection functions, etc. Examples: <code>lowercase("TeXt")</code>, <code>abs(-1.5)</code>, <code>distinct(list(1, 2, 2, 3, 1))</code>.</li>
</ul>
<p>
More detailed information and a description of each function can be found <a href="/intent-matching.html#idl_functions">here</a>.
</p>
</li>
<li>
<p><b>Term Variables</b></p>
<p>
The term body written in IDL script is a predicate written using IDL built-in functions. To avoid the
unnecessary repetitive computations when working with built-in functions, local term variables can be used:
</p>
<pre class="brush: idl">
term(t2)={
@a = meta_model('a')
@list = list(1, 2, 3, 4)
(@a == 42 || @a == 44) && has_all(@list, list(3, 2))
}
</pre>
<p>
Local variables are defined and used with the special prefix <code>@</code>. They used to avoid
repeated computations and shortening/simplifying of the IDL term predicate expressions.
</p>
</li>
<li>
<p><b>Intent Fragments</b></p>
<p>
Fragment is a named set of terms that is created to be reusable across intents.
See example by following this <a href="/intent-matching.html#idl">link</a>.
</p>
</li>
<li>
<p><b>Intent Flow</b></p>
<p>
Here we define an additional intent selection rule based on data from previous intent matches
within the current session. This rule can be defined using a regex based on the IDs of
the previously matched intents:
</p>
<pre class="brush: idl">
flow="^(?:login)(^:logout)*$"
</pre>
<p>
This rule means that for the intent to be matched, it is necessary that within the current session
the intent with the <code>login</code> identifier has already been matched before, and
not with identifier <code>logout</code>.
</p>
<p>
If it is necessary to define more complex logic, it can also be moved into user code
written in any Java-based language, like the term body:
</p>
<pre class="brush: scala">
@NCIntent("intent=x
flow=/com.company.dialog.Flow#customFlow/
term~{# == 'some_id'}"
)
def onX(): NCResult = { .. }
</pre>
<p>
The predicate defined in the method <code>customFlow()</code> receives at the input a list with all
intents information, previously matched within the current session, and returns a boolean value.
</p>
</li>
<li>
<p><b>Intent Metadata</b></p>
<p>
Optional element. A additional dataset that can be used by term predicates presented in JSON format.
</p>
</li>
</ul>
</section>
<section>
<h2 class="section-title">Why Do We Need NLPCraft IDL? <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
All the logic for creating intents defined using NLPCraft IDL, can be written in any Java based language.
Why, then, is this new language needed at all? Even if its syntax is short, simple and straightforward, you
still have to spend some time studying it.
</p>
<p>
Below are some reasons for using NLPCraft IDL:
</p>
<ul>
<li>
The program terseness. A custom DSL code is always shorter than a general programming language code with
the same logic. For intents with non-trivial rules, this can be important.
</li>
<li>
If the NLPCraft IDL code is defined in a separate file, then editing the IDL program does not require the rebuild of the code of the model and its callbacks.
</li>
<li>
Separating the logic of writing callbacks and the logic of matching intents. Different people can
work with these tasks. Due to the deliberate limited IDL features, DSL is easier to learn by a non-programmer.
</li>
<li>
Currently, models can be created in any Java based language. Apache NLPCraft plans to expand the list of
supported languages in the near future. Using NLPCraft IDL will allow you to maintain one common language
for defining intents for different types of models within the same project.
</li>
</ul>
</section>
<section>
<h2 class="section-title">Conclusion <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
I hope you were able to get a first impression of the capabilities of the NLPCraft IDL language and the
types of tasks that can be solved using it. <a href="/intent-matching.html">Here</a> you will find a detailed description of the
language and its capabilities. Additional examples of models created in Java, Kotlin, Groovy and Scala,
using NLPCraft IDL to define intents, are available in the <a href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">GitHub</a> project.
</p>
</section>