basic-concepts.html - incubator-nlpcraft-website - Git at Google

 ---
 active_crumb: Basic Concepts
 layout: documentation
 id: basic_concepts
 ---

 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->

 <div id="basic-concepts" class="col-md-8 second-column">
     <section id="overview">
         <h2 class="section-title">Basic Concepts</h2>
         <p>
             Below we’ll cover some of the key concepts that are important for NLPCraft:
         </p>
         <ul>
             <li><a href="#model">Data Model</a></li>
             <li><a href="#ne">Named Entities</a></li>
             <li><a href="#intent">Intent Matching</a></li>
             <li><a href="#stm">Conversation <span class="amp">&amp;</span> STM</a></li>
         </ul>
     </section>
     <section id="model">
         <h3 class="section-sub-title">Data Model</h3>
         <p>
             Data model is a central concept in NLPCraft. It defines natural language interface to your public or
             private data sources like on-premise database or a cloud SaaS application.
             NLPCraft employs <em>model-as-a-code</em> approach where entire data model is
             an implementation of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a>
             interface which can be developed using any JVM programming language like Java, Scala, Kotlin, or Groovy.
         </p>
         <p>
             A data model defines:
         </p>
         <ul>
             <li>Set of model <a href="data-model.html">elements</a> (a.k.a. <em>named entities</em>) to be detected in the user input.</li>
             <li>How to query a particular data source based on detected named entities.</li>
             <li>Common model configuration and <a href="data-model.html">life-cycle</a> callbacks.</li>
         </ul>
         <p>
             Note that model-as-a-code approach allows you to use any software lifecycle tools and
             frameworks like various build tools, CI/SCM tools, IDEs, etc. to develop and maintain your data model.
             You don't have to use additional web-based tools to manage some aspects of your
             data models - your entire model and all of its components are part of your project source code.
         </p>
         <p>
             Read more about data models <a href="data-model.html">here</a>.
         </p>
     </section>
     <section id="ne">
         <h3 class="section-sub-title">Named Entities</h3>
         <p>
             Named entity, also known as a model element or a token, is main a component defined by the NLPCraft data model. A named
             entity is one or more individual words that have a consistent semantic meaning and typically denote a
             real-world object, such as persons, locations, number, date and time, organizations, products, etc. Such
             object can be abstract or have a physical existence.
         </p>
         <p>
             For example, in the following sentence:
         </p>
         <p>
             <i class="fa fa-fw fa-angle-right"></i><code>Meeting is set for 12pm today in San Francisco.</code>
         </p>
         <p>
             the following named entities can be detected:
         </p>
         <table class="gradient-table">
             <thead>
             <tr>
                 <th>Words</th>
                 <th>Type</th>
                 <th>Normalized Value</th>
             </tr>
             </thead>
             <tbody>
             <tr>
                 <td><code>Meeting</code></td>
                 <td>CUSTOM_OBJ</td>
                 <td>meeting</td>
             </tr>
             <tr>
                 <td><code>set</code></td>
                 <td>CUSTOM_ACT</td>
                 <td>set</td>
             </tr>
             <tr>
                 <td><code>12pm today</code></td>
                 <td>DATE_TIME</td>
                 <td>12:00 September 1, 2019 GMT</td>
             </tr>
             <tr>
                 <td><code>San Francisco</code></td>
                 <td>GEO_CITY</td>
                 <td>San Francisco, CA USA</td>
             </tr>
             </tbody>
         </table>
         <p>
             In most cases named entities will have associated <em>normalized value</em>. It is especially important for named entities that have many
             different notational forms such as time and date, currency, geographical locations, etc. For example, <code>New York</code>,
             <code>New York City</code> and <code>NYC</code> all refer to the same "New York City, NY USA" location which is a valid normalized form.
         </p>
         <p>
             The process of detecting named entities is called Named Entity Recognition (NER). There are many
             different ways of how a certain named entity can be detected: through list of synonyms, by name, rule-based or by using
             statistical techniques like neural networks with large corpus of predefined data. NLPCraft allows you define
             named entities through powerful DSL and also supports named entities that can be composed from other named entities
             including named entities from external projects such OpenNLP, spaCy or Stanford CoreNLP.
         </p>
         <p>
             Named entities allow you to abstract from basic linguistic forms like nouns and verbs to deal with the higher level semantic
             abstractions like geographical location or time when you are trying to understand the meaning of the sentence.
             One of the main goals of named entities is to act as an input ingredients for intent matching.
         </p>
         <p>
             Read more in-depth about named entities <a href="data-model.html">here</a>.
         </p>
     </section>
     <section id="intent">
         <h3 class="section-sub-title">Intent Matching</h3>
         <p>
             You can think of intent matching as regular expression matching where instead of characters you deal with detected named entities.
             Intent defines a pattern in terms of detected named entities (or tokens) and a callback to call when submitted sentence
             matches that pattern.
         </p>
         <p>
             Intents can also match on the <em>dialog flow</em> additionally to the matching on the current user sentence.
             Dialog flow matching means matching an intent based on what intents were matched previously for the same user
             and data model, i.e. the flow of the dialog. Note that you should not confuse dialog flow intent matching with
             conversational STM that is used to fill in missing tokens from memory.
         </p>
         <div class="bq success">
             <p>
                 You can think of NLPCraft data model as a mechanism to define named entities and intents that use
                 these named entities to pattern match the user input.
             </p>
         </div>
         <p>
             Learn more details about intent matching <a href="intent-matching.html">here</a>.
         </p>
     </section>
     <section id="stm">
         <h3 class="section-sub-title">Conversation <span class="amp">&amp;</span> STM</h3>
         <p>
             NLPCraft provides automatic conversation management right out of the box.
             Conversation management is based on the idea of short-term-memory (STM). STM is automatically
             maintained by NLPCraft per each user and data model. Essentially, NLPCraft "remembers"
             the context of the conversation and can supply the currently missing elements from its memory (i.e. from STM).
             STM implementation is also fully integrated with intent matching.
         </p>
         <p>
             Maintaining conversation state is necessary for effective context resolution, so that users
             could ask, for example, the following sequence of questions using example weather model:
         </p>
         <dl class="stm-example">
             <dd><i class="fa fa-fw fa-angle-right"></i>What’s the weather in London today?</dd>
             <dt>
                 <p>
                     User gets the current London’s weather.
                     STM is empty at this moment so NLPCraft expects to get all necessary information from
                     the user sentence. Meaningful parts of the sentence get stored in STM.
                 </p>
                 <div class="stm-state">
                     <div class="stm">
                         <label>STM Before:</label>
                         <span>&nbsp;</span>
                     </div>
                     <div class="stm">
                         <label>STM After:</label>
                         <span>weather</span>
                         <span>London</span>
                         <span>today</span>
                     </div>
                 </div>
             </dt>
             <dd><i class="fa fa-fw fa-angle-right"></i>And what about Berlin?</dd>
             <dt>
                 <p>
                     User gets the current Berlin’s weather.
                     The only useful data in the user sentence is name of the city <code>Berlin</code>. But since
                     NLPCraft now has data from the previous question in its STM it can safely deduce that we
                     are asking about <code>weather</code> for <code>today</code>.
                     <code>Berlin</code> overrides <code>London</code> in STM.
                 </p>
                 <div class="stm-state">
                     <div class="stm">
                         <label>STM Before:</label>
                         <span>weather</span>
                         <span>London</span>
                         <span>today</span>
                     </div>
                     <div class="stm">
                         <label>STM After:</label>
                         <span>weather</span>
                         <span><b>Berlin</b></span>
                         <span>today</span>
                     </div>
                 </div>
             </dt>
             <dd><i class="fa fa-fw fa-angle-right"></i>Next week forecast?</dd>
             <dt>
                 <p>
                     User gets the next week forecast for Berlin.
                     Again, the only useful data in the user sentence is <code>next week</code> and <code>forecast</code>.
                     STM supplies <code>Berlin</code>. <code>Next week</code> override <code>today</code>, and
                     <code>forecast</code> override <code>weather</code> in STM.
                 </p>
                 <div class="stm-state">
                     <div class="stm">
                         <label>STM Before:</label>
                         <span>weather</span>
                         <span>Berlin</span>
                         <span>today</span>
                     </div>
                     <div class="stm">
                         <label>STM After:</label>
                         <span><b>forecast</b></span>
                         <span>Berlin</span>
                         <span><b>Next week</b></span>
                     </div>
                 </div>
             </dt>
         </dl>
         <p>
             Note that STM is maintained per user and per data model.
             Conversation management implementation is also smart enough to clear STM after certain
             period of time, i.e. it “forgets” the conversational context after few minutes of inactivity.
             Note also that conversational context can also be cleared explicitly
             via <a href="https://github.com/apache/incubator-nlpcraft/blob/master/openapi/nlpcraft_swagger.yml" target="github">REST API</a>.
         </p>
     </section>
 </div>
 <div class="col-md-2 third-column">
     <ul class="side-nav">
         <li class="side-nav-title">On This Page</li>
         <li><a href="#model">Data Model</a></li>
         <li><a href="#ne">Named Entities</a></li>
         <li><a href="#intent">Intent Matching</a></li>
         <li><a href="#stm">Conversation <span class="amp">&amp;</span> STM</a></li>
         {% include quick-links.html %}
     </ul>
 </div>
	---
	active_crumb: Basic Concepts
	layout: documentation
	id: basic_concepts
	---

	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->

	<div id="basic-concepts" class="col-md-8 second-column">
	<section id="overview">
	<h2 class="section-title">Basic Concepts</h2>
	<p>
	Below we’ll cover some of the key concepts that are important for NLPCraft:
	</p>
	<ul>
	<li><a href="#model">Data Model</a></li>
	<li><a href="#ne">Named Entities</a></li>
	<li><a href="#intent">Intent Matching</a></li>
	<li><a href="#stm">Conversation <span class="amp">&</span> STM</a></li>
	</ul>
	</section>
	<section id="model">
	<h3 class="section-sub-title">Data Model</h3>
	<p>
	Data model is a central concept in NLPCraft. It defines natural language interface to your public or
	private data sources like on-premise database or a cloud SaaS application.
	NLPCraft employs <em>model-as-a-code</em> approach where entire data model is
	an implementation of <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a>
	interface which can be developed using any JVM programming language like Java, Scala, Kotlin, or Groovy.
	</p>
	<p>
	A data model defines:
	</p>
	<ul>
	<li>Set of model <a href="data-model.html">elements</a> (a.k.a. <em>named entities</em>) to be detected in the user input.</li>
	<li>How to query a particular data source based on detected named entities.</li>
	<li>Common model configuration and <a href="data-model.html">life-cycle</a> callbacks.</li>
	</ul>
	<p>
	Note that model-as-a-code approach allows you to use any software lifecycle tools and
	frameworks like various build tools, CI/SCM tools, IDEs, etc. to develop and maintain your data model.
	You don't have to use additional web-based tools to manage some aspects of your
	data models - your entire model and all of its components are part of your project source code.
	</p>
	<p>
	Read more about data models <a href="data-model.html">here</a>.
	</p>
	</section>
	<section id="ne">
	<h3 class="section-sub-title">Named Entities</h3>
	<p>
	Named entity, also known as a model element or a token, is main a component defined by the NLPCraft data model. A named
	entity is one or more individual words that have a consistent semantic meaning and typically denote a
	real-world object, such as persons, locations, number, date and time, organizations, products, etc. Such
	object can be abstract or have a physical existence.
	</p>
	<p>
	For example, in the following sentence:
	</p>
	<p>
	<i class="fa fa-fw fa-angle-right"></i><code>Meeting is set for 12pm today in San Francisco.</code>
	</p>
	<p>
	the following named entities can be detected:
	</p>
	<table class="gradient-table">
	<thead>
	<tr>
	<th>Words</th>
	<th>Type</th>
	<th>Normalized Value</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td><code>Meeting</code></td>
	<td>CUSTOM_OBJ</td>
	<td>meeting</td>
	</tr>
	<tr>
	<td><code>set</code></td>
	<td>CUSTOM_ACT</td>
	<td>set</td>
	</tr>
	<tr>
	<td><code>12pm today</code></td>
	<td>DATE_TIME</td>
	<td>12:00 September 1, 2019 GMT</td>
	</tr>
	<tr>
	<td><code>San Francisco</code></td>
	<td>GEO_CITY</td>
	<td>San Francisco, CA USA</td>
	</tr>
	</tbody>
	</table>
	<p>
	In most cases named entities will have associated <em>normalized value</em>. It is especially important for named entities that have many
	different notational forms such as time and date, currency, geographical locations, etc. For example, <code>New York</code>,
	<code>New York City</code> and <code>NYC</code> all refer to the same "New York City, NY USA" location which is a valid normalized form.
	</p>
	<p>
	The process of detecting named entities is called Named Entity Recognition (NER). There are many
	different ways of how a certain named entity can be detected: through list of synonyms, by name, rule-based or by using
	statistical techniques like neural networks with large corpus of predefined data. NLPCraft allows you define
	named entities through powerful DSL and also supports named entities that can be composed from other named entities
	including named entities from external projects such OpenNLP, spaCy or Stanford CoreNLP.
	</p>
	<p>
	Named entities allow you to abstract from basic linguistic forms like nouns and verbs to deal with the higher level semantic
	abstractions like geographical location or time when you are trying to understand the meaning of the sentence.
	One of the main goals of named entities is to act as an input ingredients for intent matching.
	</p>
	<p>
	Read more in-depth about named entities <a href="data-model.html">here</a>.
	</p>
	</section>
	<section id="intent">
	<h3 class="section-sub-title">Intent Matching</h3>
	<p>
	You can think of intent matching as regular expression matching where instead of characters you deal with detected named entities.
	Intent defines a pattern in terms of detected named entities (or tokens) and a callback to call when submitted sentence
	matches that pattern.
	</p>
	<p>
	Intents can also match on the <em>dialog flow</em> additionally to the matching on the current user sentence.
	Dialog flow matching means matching an intent based on what intents were matched previously for the same user
	and data model, i.e. the flow of the dialog. Note that you should not confuse dialog flow intent matching with
	conversational STM that is used to fill in missing tokens from memory.
	</p>
	<div class="bq success">
	<p>
	You can think of NLPCraft data model as a mechanism to define named entities and intents that use
	these named entities to pattern match the user input.
	</p>
	</div>
	<p>
	Learn more details about intent matching <a href="intent-matching.html">here</a>.
	</p>
	</section>
	<section id="stm">
	<h3 class="section-sub-title">Conversation <span class="amp">&</span> STM</h3>
	<p>
	NLPCraft provides automatic conversation management right out of the box.
	Conversation management is based on the idea of short-term-memory (STM). STM is automatically
	maintained by NLPCraft per each user and data model. Essentially, NLPCraft "remembers"
	the context of the conversation and can supply the currently missing elements from its memory (i.e. from STM).
	STM implementation is also fully integrated with intent matching.
	</p>
	<p>
	Maintaining conversation state is necessary for effective context resolution, so that users
	could ask, for example, the following sequence of questions using example weather model:
	</p>
	<dl class="stm-example">
	<dd><i class="fa fa-fw fa-angle-right"></i>What’s the weather in London today?</dd>
	<dt>
	<p>
	User gets the current London’s weather.
	STM is empty at this moment so NLPCraft expects to get all necessary information from
	the user sentence. Meaningful parts of the sentence get stored in STM.
	</p>
	<div class="stm-state">
	<div class="stm">
	<label>STM Before:</label>
	<span> </span>
	</div>
	<div class="stm">
	<label>STM After:</label>
	<span>weather</span>
	<span>London</span>
	<span>today</span>
	</div>
	</div>
	</dt>
	<dd><i class="fa fa-fw fa-angle-right"></i>And what about Berlin?</dd>
	<dt>
	<p>
	User gets the current Berlin’s weather.
	The only useful data in the user sentence is name of the city <code>Berlin</code>. But since
	NLPCraft now has data from the previous question in its STM it can safely deduce that we
	are asking about <code>weather</code> for <code>today</code>.
	<code>Berlin</code> overrides <code>London</code> in STM.
	</p>
	<div class="stm-state">
	<div class="stm">
	<label>STM Before:</label>
	<span>weather</span>
	<span>London</span>
	<span>today</span>
	</div>
	<div class="stm">
	<label>STM After:</label>
	<span>weather</span>
	<span><b>Berlin</b></span>
	<span>today</span>
	</div>
	</div>
	</dt>
	<dd><i class="fa fa-fw fa-angle-right"></i>Next week forecast?</dd>
	<dt>
	<p>
	User gets the next week forecast for Berlin.
	Again, the only useful data in the user sentence is <code>next week</code> and <code>forecast</code>.
	STM supplies <code>Berlin</code>. <code>Next week</code> override <code>today</code>, and
	<code>forecast</code> override <code>weather</code> in STM.
	</p>
	<div class="stm-state">
	<div class="stm">
	<label>STM Before:</label>
	<span>weather</span>
	<span>Berlin</span>
	<span>today</span>
	</div>
	<div class="stm">
	<label>STM After:</label>
	<span><b>forecast</b></span>
	<span>Berlin</span>
	<span><b>Next week</b></span>
	</div>
	</div>
	</dt>
	</dl>
	<p>
	Note that STM is maintained per user and per data model.
	Conversation management implementation is also smart enough to clear STM after certain
	period of time, i.e. it “forgets” the conversational context after few minutes of inactivity.
	Note also that conversational context can also be cleared explicitly
	via <a href="https://github.com/apache/incubator-nlpcraft/blob/master/openapi/nlpcraft_swagger.yml" target="github">REST API</a>.
	</p>
	</section>
	</div>
	<div class="col-md-2 third-column">
	<ul class="side-nav">
	<li class="side-nav-title">On This Page</li>
	<li><a href="#model">Data Model</a></li>
	<li><a href="#ne">Named Entities</a></li>
	<li><a href="#intent">Intent Matching</a></li>
	<li><a href="#stm">Conversation <span class="amp">&</span> STM</a></li>
	{% include quick-links.html %}
	</ul>
	</div>