| --- |
| active_crumb: NLPCraft IDL - Intent Definition Language |
| layout: blog |
| blog_title: NLPCraft IDL - Intent Definition Language |
| author_name: Aaron Radzinski |
| author_avatar: images/lion.jpg |
| author_twitter_id: aaron_radzinski |
| publish_date: June 3, 2021 |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <section> |
| <div class="bq info"> |
| This blog is an English adaptation of Sergey Kamov's <a target=habr href="https://habr.com/ru/post/559716">blog</a> written in Russian. |
| </div> |
| {% include latest-ver-blog-warn.html %} |
| <p> |
| This article is a second part of the article <a target=habr href="https://habr.com/ru/post/534034/">Проектируем интенты с Apache NLPCraft</a> |
| <img alt="" style="vertical-align: baseline" src="/images/ru-flag-16.png"> and contains a detailed description of |
| NLPCraft IDL - Intent Definition Language, created for NLP projects based on the Apache NlpCraft. |
| NLPCraft IDL support has been added to the system since version <a href="/download.html">0.7.5</a>. |
| </p> |
| <p> |
| The declarative Intent Definition Language, called <a href="/intent-matching.html">NLPCraft IDL</a>, significantly simplifies the process of |
| working with intents in NLP-based dialog and search systems developed using Apache NLPsCraft and at the same time |
| expands the capabilities of them. |
| </p> |
| <div class="bq success"> |
| <p> |
| Note that at the point of choosing the best matching intent, NLP systems, in general, already have a parsed and processed |
| user input request, which contains combinations of request’s tokens and other related information. |
| </p> |
| </div> |
| </section> |
| <section> |
| <h2 class="section-title">Examples <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Let's start with examples to demonstrate the general capabilities of the language, provide the necessary explanations, |
| and then describe the design of the language a bit more formally. |
| </p> |
| <pre class="brush: idl"> |
| intent=xa |
| flow="^(?:login)(^:logout)*$" |
| meta={'enabled': true} |
| term(a)={# != "z"}[1,3] |
| term(b)={ |
| meta_intent('enabled') == true && // Must be active. |
| month == 1 // January. |
| } |
| term(c)~{ |
| // Variables. |
| @usrTypes = meta_model('user_types') |
| |
| // Predicate. |
| (# == 'order' || # == 'order_cancel') && |
| has_all(@usrTypes, list(1, 2, 3) && |
| abs(meta_tok('order:size')) > 10) |
| </pre> |
| <p><b>NOTES:</b></p> |
| <ul class="fixed"> |
| <li> |
| The intent name is <code>xa</code>.</li> |
| <li> |
| The intent contains three terms. Term is an element that defines a predicate, each of which must |
| pass for the intent to be selected: |
| <ul class="fixed"> |
| <li> |
| <code>term(a)</code> - the parsed request must contain between one and three tokens with identifiers |
| other than <code>z</code>, without taking into account data from the dialogue history (term type <code>=</code>). |
| </li> |
| <li> |
| <code>term(b)</code> - the intent must be active - the flag <code>enabled</code> from the model metadata. |
| In addition, such an intent can only be triggered in January - the <code>month()</code> built-in function. |
| </li> |
| <li> |
| <code>term(c)</code> - the token with identifier <code>order</code> or <code>order_cancel</code> should |
| be found in the request or in the dialogue history (term type <code>~</code>). There are additional |
| restrictions in the model metadata values and also the order size absolute value should be more than 10. |
| In the definition of this term we use IDL variables - and we will talk about them a little bit later. |
| </li> |
| </ul> |
| </li> |
| <li> |
| <code>flow=</code> This section defines an additional rule, according to which for the intent to |
| be triggered, the intent with the <code>login</code> identifier must have been selected at least |
| once within the current session, and the intent with the identifier <code>logout</code> should have never |
| been selected in this session. The rule is defined as a regular expression based on the intents IDs |
| of the previous intents in the user session. Other ways to create similar rules will also be |
| described below. |
| </li> |
| <li> |
| <code>meta=</code> For the given intent, a certain set of data can be defined - in this case a |
| configuration - with which you can enable or disable the intent. |
| </li> |
| </ul> |
| <pre class="brush: idl"> |
| intent=xb |
| flow=/#flowModelMethod/ |
| term(a)=/org.mypackage.MyClass#termMethod/? |
| fragment(frag) |
| </pre> |
| <p><b>NOTES:</b></p> |
| <ul class="fixed"> |
| <li> |
| The intent name is <code>xb</code>. |
| </li> |
| <li> |
| <code>term(a)</code> The intent can contain one optional term (<code>?</code> quantifier, |
| detailed explanations will be given below), defined in the code - <code>org.mypackage.MyClass#termMethod</code> method.</li> |
| <li> |
| <code>fragment(frag)</code> Fragment with identifier <code>frag</code> extends the list of terms of the intent with |
| additional terms that are defined elsewhere (in <code>fragment</code> expression). The <code>frag</code> element must |
| be defined above in the code, or accessible via import statement. |
| </li> |
| <li> |
| <code>flow=</code> Flow contains the condition specified in the model code in the method <code>flowModelMethod</code>. |
| </li> |
| </ul> |
| <p> |
| You can find more examples <a href="https://nlpcraft.apache.org/intent-matching.html">here</a> |
| </p> |
| </section> |
| <section> |
| <h2 class="section-title">Deeper Dive<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <div class="bq info"> |
| <p> |
| Note that this article is only a brief overview of the capabilities of NLPCraft IDL and is not trying to |
| be a comprehensive manual. Before starting to work with NLPCraft, it is recommended to study a |
| detailed description of the syntax and all the features review of the language in the corresponding sections |
| of the <a href="https://nlpcraft.apache.org/intent-matching.html">documentation</a>. |
| </p> |
| </div> |
| <h2 class="section-sub-title">The Places Where Intents Are Defined <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <ul class="fixed"> |
| <li> |
| Intents defined with NLPCraft IDL can be declared directly in static model definition JSON or YAML files. |
| This approach is very convenient for simple cases. An example is available <a target=github href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/lightswitch/src/main/resources/lightswitch_model.yaml">here</a>. |
| </li> |
| <li> |
| Intents can also be defined directly in the model code using <a target="javadoc" href="https://nlpcraft.apache.org/apis/latest/org/apache/nlpcraft/model/NCIntent.html">@NCIntent</a> |
| annotations. An example can be found <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/time/src/main/java/org/apache/nlpcraft/examples/time/TimeModel.java">here</a>. |
| </li> |
| <li> |
| In addition, intents can be defined in separate special files (usually with <code>*.idl</code> extension). |
| In this case, the model will refer to these intents according to the specified path to these files or |
| URL resources using the import statement. This approach is convenient when working with large models, |
| when syntax highlighting and other features provided by the IDE may be useful (for example, Intellij IDEA |
| provides keyword highlighting, hints and syntax checking for files for the configured types). In addition, |
| this approach can be useful for specialists who do not have the ability or desire to edit the code directly. |
| An example is available at <a target=github href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/alarm/src/main/resources/alarm_model.json">here</a> |
| and <a target="github" href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/alarm/src/main/resources/intents.idl">here</a>. |
| </li> |
| </ul> |
| <h2 class="section-sub-title">Keywords <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| NLPCraft IDL has only 10 keywords: <code>flow, fragment, import, intent, meta, ordered, term, true, false, null.</code> |
| </p> |
| <ul class="fixed"> |
| <li><code>intent, flow, fragment, meta, options, term</code> are parts of the intent definition.</li> |
| <li><code>fragment</code> keyword is also can be used to create named terms lists to include in intent definitions (a-la macros).</li> |
| <li><code>import</code> - required for including external files with fragment, intent or imports statements.</li> |
| <li><code>true, false, null</code> - used when working with built-in functions.</li> |
| </ul> |
| <h2 class="section-sub-title">Lifecycle <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| An intent compiled when the <a href="/docs.html#data-model">model</a> loaded and can be debugged only when the |
| model is being debugged. To define complex intents, it is recommended to create them in their own separated |
| files and use the editing capabilities provided by the IDE to ensure initial validation of the syntax. |
| </p> |
| <h2 class="section-sub-title">Program Structure <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| The program contains a set of the following optional elements, in no particular order: |
| </p> |
| <ul class="fixed"> |
| <li> |
| <code>import</code> statement |
| </li> |
| <li><code>fragment</code> statement</li> |
| <li><code>intent</code> statement</li> |
| </ul> |
| <b><code>import</code> Statement</b> |
| <p> |
| It contains the <code>import</code> keyword and the file name or URL resource in parentheses. It allows to import |
| other IDL definitions from external files or URLs. For example: |
| </p> |
| <pre class="brush: idl"> |
| import('http://mysite.com/nlp/idls/external.idl) |
| </pre> |
| <b><code>fragment</code> Statement</b> |
| <p> |
| It contains the <code>fragment</code> keyword with the name and a list of terms. Terms can be parameterized. |
| An example of a simple fragment: |
| </p> |
| <pre class="brush: idl"> |
| fragment=buzz term~{tok_id() == 'x:alarm'} |
| </pre> |
| <p> |
| An example of a parameterized fragment with arguments <code>a</code> and <code>b</code>: |
| </p> |
| <pre class="brush: idl"> |
| fragment=p1 |
| term={ |
| meta_frag('a') && |
| has_any(get(meta_frag('b'), 'Москва'), list(1, 2)) |
| } |
| </pre> |
| <p> |
| Below is an example of using this fragment in an intent. Note that fragment’s parameters (its metadata) is passed using JSON format: |
| </p> |
| <pre class="brush: idl"> |
| intent=i1 |
| fragment(p1, {'a': true, 'b': {'Москва': [1, 2, 3]}}) |
| </pre> |
| <b><code>intent</code> Statement</b> |
| <p> |
| This statement is the core element of IDL allowing to declare the intent. Here's the example |
| of the simple intent: |
| </p> |
| <pre class="brush: idl"> |
| intent=xa |
| flow="^(?:login)(^:logout)*$" |
| meta={'enabled': true} |
| term(a)={# != "z"}[1,3] |
| </pre> |
| <p> |
| Every intent statement consists of the following elements: |
| </p> |
| <ul class="fixed"> |
| <li> |
| <p><b>Intent Name</b></p> |
| <p> |
| Required element. The name is a unique identifier within the model. Below is an example of using a reference |
| to the <code>xa</code> intent. |
| </p> |
| <pre class="brush: java"> |
| @NCIntentRef("xa") |
| fun onTimeMatch( |
| ctx: NCIntentMatch, |
| @NCIntentTerm("a") tok: NCToken |
| ): NCResult { ... } |
| </pre> |
| </li> |
| <li> |
| <p><b>Intent Terms</b></p> |
| <p> |
| At least one term is required. Term is the main element of the intent definition. Term consists of |
| the predicate over a token (term body). The constituent parts of the predicate can be based on |
| the token or on some other factors. |
| </p> |
| <p> |
| Here's how an intent's term is defined: |
| </p> |
| <ul class="fixed"> |
| <li>The <code>term</code> keyword. Required element.</li> |
| <li> |
| Name in parentheses. Optional. Used to create references to the found token in the callback arguments, |
| see the example above, token <code>a</code>. |
| </li> |
| <li> |
| <p> |
| Term type. Required element. Two term types are supported: |
| </p> |
| <ul class="fixed"> |
| <li><code>~</code> the token can be obtained from the history of the dialog or from the current request (i.e. the term is conversational).</li> |
| <li><code>=</code> the token should be obtained only from the current request.</li> |
| </ul> |
| <p> |
| Example terms: |
| </p> |
| <pre class="brush: idl"> |
| term(nums)~{# == 'nlpcraft:num'} // Conversational '~' term. |
| term(nums)={# != 'z'} // Non-conversational '=' term. |
| </pre> |
| </li> |
| <li> |
| <p> |
| The term body. Required element. There are two ways to define the term body: |
| using IDL script with built-in functions or in external Java-based code: |
| </p> |
| <pre class="brush: idl"> |
| term(nums)={# == 'nlpcraft:num'} // IDL script. |
| term(yes)~{true} // IDL script. |
| term~/org.mypackage.MyClass#termMethod/? // Reference to external code. |
| </pre> |
| <p> |
| Note the special syntax for the last term. The existing set of built-in functions are typically |
| enough to define many terms of the intent. But, if necessary, the programmer can create her |
| own predicates in Java, Scala, Kotlin, Groovy or any other Java based language and write them |
| in the term body. That is, NLPCraft IDL can have snippets of code written entirely in other languages: |
| </p> |
| <pre class="brush: idl"> |
| term~/org.mypackage.MyClass#termMethod/? // Reference to external code. |
| </pre> |
| <p> |
| This function <code>termMethod</code> receives an argument that contains all the necessary |
| data on the user's request, and as the output it should return the value of a specific type. |
| </p> |
| </li> |
| <li> |
| <p> |
| Quantifier. Optional. The default value is <code>[1, 1]</code>. The following types of quantifiers are supported: |
| </p> |
| <ul class="fixed"> |
| <li><code>[M, N]</code> - the term must be found from <code>N</code> to <code>M</code> times.</li> |
| <li><code>*</code> - the term must be found at least once, is equivalent to <code>[0, ∞]</code></li> |
| <li><code>+</code> - the term must be found more than once, is it equivalent to <code>[1, ∞]</code></li> |
| <li><code>?</code> - the term must be found 0 or 1 time, equivalent to <code>[0, 1]</code></li> |
| </ul> |
| <p> |
| Examples: |
| </p> |
| <pre class="brush: idl"> |
| term(nums)={# == 'nlpcraft:num'}[1,2] // The request must contain one or two tokens with the ID “nlpcraft: num”. |
| term(nums)={# == 'nlpcraft:num'}* // The request must contain zero or more tokens with the ID “nlpcraft: num”. |
| </pre> |
| </li> |
| </ul> |
| </li> |
| <li> |
| <p><b>IDL Built-In Functions</b></p> |
| <p> |
| NLPCraft IDL provides over 140 <a href="/intent-matching.html#idl_functions">built-in functions</a> that can |
| be used in term definition. These functions can be conventionally classified into the following categories: |
| </p> |
| <ul class="fixed"> |
| <li>Based on base tokens properties - token IDs, groups, parent, hierarchy, etc. Examples: <code>tok_id</code>, <code>tok_groups</code>, <code>tok_parent</code>.</li> |
| <li>NLP-based tokens properties - stemmas, lemmas, parts of speech, stop words. Examples: <code>tok_lemma</code>, <code>tok_is_wordnet</code>, <code>tok_swear</code>.</li> |
| <li>Based on information about how the token was found in the user's request - synonym values, etc. Examples: <code>tok_value</code>, <code>tok_is_permutated</code>, <code>tok_is_direct</code>.</li> |
| <li>Based on user request data - request time, user agent type. Examples: <code>req_tstamp</code>, <code>req_addr</code>, <code>req_agent</code>.</li> |
| <li>Based on various metadata - tokens, model, request, etc. Examples: <code>meta_model('my: prop')</code>, <code>meta_tok('nlpcraft: num: unit')</code>, <code>meta_user('my: prop')</code>.</li> |
| <li>Based on data provided by NER token providers. Example, for <code>geo:city</code> token it can be the number of city residents or coordinates obtained from metadata.</li> |
| <li>Based on the user and his company - admin status, registration time. Examples: <code>user_admin</code>, <code>comp_name</code>, <code>user_signup_tstamp</code>.</li> |
| <li>Based on system/environment variables, system time, etc. Examples:<code> meta_sys('java.home'), now, day_of_week</code>.</li> |
| <li>Math functions, text functions, collection functions, etc. Examples: <code>lowercase("TeXt")</code>, <code>abs(-1.5)</code>, <code>distinct(list(1, 2, 2, 3, 1))</code>.</li> |
| </ul> |
| <p> |
| More detailed information and a description of each function can be found <a href="/intent-matching.html#idl_functions">here</a>. |
| </p> |
| </li> |
| <li> |
| <p><b>Term Variables</b></p> |
| <p> |
| The term body written in IDL script is a predicate written using IDL built-in functions. To avoid the |
| unnecessary repetitive computations when working with built-in functions, local term variables can be used: |
| </p> |
| <pre class="brush: idl"> |
| term(t2)={ |
| @a = meta_model('a') |
| @list = list(1, 2, 3, 4) |
| |
| (@a == 42 || @a == 44) && has_all(@list, list(3, 2)) |
| } |
| </pre> |
| <p> |
| Local variables are defined and used with the special prefix <code>@</code>. They used to avoid |
| repeated computations and shortening/simplifying of the IDL term predicate expressions. |
| </p> |
| </li> |
| <li> |
| <p><b>Intent Fragments</b></p> |
| <p> |
| Fragment is a named set of terms that is created to be reusable across intents. |
| See example by following this <a href="/intent-matching.html#idl">link</a>. |
| </p> |
| </li> |
| <li> |
| <p><b>Intent Flow</b></p> |
| <p> |
| Here we define an additional intent selection rule based on data from previous intent matches |
| within the current session. This rule can be defined using a regex based on the IDs of |
| the previously matched intents: |
| </p> |
| <pre class="brush: idl"> |
| flow="^(?:login)(^:logout)*$" |
| </pre> |
| <p> |
| This rule means that for the intent to be matched, it is necessary that within the current session |
| the intent with the <code>login</code> identifier has already been matched before, and |
| not with identifier <code>logout</code>. |
| </p> |
| <p> |
| If it is necessary to define more complex logic, it can also be moved into user code |
| written in any Java-based language, like the term body: |
| </p> |
| <pre class="brush: scala"> |
| @NCIntent("intent=x |
| flow=/com.company.dialog.Flow#customFlow/ |
| term~{# == 'some_id'}" |
| ) |
| def onX(): NCResult = { .. } |
| </pre> |
| <p> |
| The predicate defined in the method <code>customFlow()</code> receives at the input a list with all |
| intents information, previously matched within the current session, and returns a boolean value. |
| </p> |
| </li> |
| <li> |
| <p><b>Intent Metadata</b></p> |
| <p> |
| Optional element. A additional dataset that can be used by term predicates presented in JSON format. |
| </p> |
| </li> |
| </ul> |
| </section> |
| <section> |
| <h2 class="section-title">Why Do We Need NLPCraft IDL? <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| All the logic for creating intents defined using NLPCraft IDL, can be written in any Java based language. |
| Why, then, is this new language needed at all? Even if its syntax is short, simple and straightforward, you |
| still have to spend some time studying it. |
| </p> |
| <p> |
| Below are some reasons for using NLPCraft IDL: |
| </p> |
| <ul> |
| <li> |
| The program terseness. A custom DSL code is always shorter than a general programming language code with |
| the same logic. For intents with non-trivial rules, this can be important. |
| </li> |
| <li> |
| If the NLPCraft IDL code is defined in a separate file, then editing the IDL program does not require the rebuild of the code of the model and its callbacks. |
| </li> |
| <li> |
| Separating the logic of writing callbacks and the logic of matching intents. Different people can |
| work with these tasks. Due to the deliberate limited IDL features, DSL is easier to learn by a non-programmer. |
| </li> |
| <li> |
| Currently, models can be created in any Java based language. Apache NLPCraft plans to expand the list of |
| supported languages in the near future. Using NLPCraft IDL will allow you to maintain one common language |
| for defining intents for different types of models within the same project. |
| </li> |
| </ul> |
| </section> |
| <section> |
| <h2 class="section-title">Conclusion <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| I hope you were able to get a first impression of the capabilities of the NLPCraft IDL language and the |
| types of tasks that can be solved using it. <a href="/intent-matching.html">Here</a> you will find a detailed description of the |
| language and its capabilities. Additional examples of models created in Java, Kotlin, Groovy and Scala, |
| using NLPCraft IDL to define intents, are available in the <a href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">GitHub</a> project. |
| </p> |
| </section> |
| |
| |