key-concepts.html - incubator-nlpcraft-website - Git at Google

 ---
 active_crumb: Docs
 layout: documentation
 id: key_concepts
 ---

 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->

 <div class="col-md-8 second-column" xmlns="http://www.w3.org/1999/html">
     <section id="overview">
         <h2 class="section-title">Key Concepts<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>

         <p>
             NLPCraft is based on three main concepts:
         </p>
         <ul>
             <li>
                 {% scaladoc NCModel NCModel %} is a user-configured object responsible for input interpretation.
             </li>
             <li>
                 {% scaladoc NCPipeline NCPipeline %} is a part of the model that defines
                 specifics of the step-by-step user input processing.
             </li>
             <li>
                 {% scaladoc NCModelClient NCModelClient %} is responsible for submitting user input to be
                 processed by the model.
             </li>
         </ul>

         <p>Here's the typical code structure when working with NLPCraft:</p>

         <pre class="brush: scala, highlight: []">
               // Initialize data model including its pipeline.
               val mdl = new CustomNlpModel()

               // Creates client instance for given model.
               val cli = new NCModelClient(mdl)

               // Sends text request to model by user ID "user01".
               val result = client.ask("Some user command", "user01")
         </pre>
     </section>

     <section id="terminology">
         <h2 class="section-title">Main Types<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
             Here's the list of the main NLPCraft types:
         </p>

         <table class="gradient-table">
             <thead>
             <tr>
                 <th>Type</th>
                 <th>Description</th>
             </tr>
             </thead>
             <tbody>
             <tr>
                 <td><b>{% scaladoc NCModel NCModel %}</b></td>
                 <td>
                     <code>Model</code> is the main component in NLPCraft. User-define data model contains its {% scaladoc NCModelConfig NCModelConfig %},
                     input processing {% scaladoc NCPipeline NCPipeline %} and life-cycle callbacks.
                     NLPCraft employs model-as-a-code approach where entire data model is an implementation of just
                     this interface. The instance of this interface is passed to {% scaladoc NCModelClient NCModelClient %} class.
                     Note that the model-as-a-code approach natively supports any software life cycle tools and frameworks
                     like various build tools, CI/SCM tools, IDEs, etc. You don't need any additional tools to manage some
                     aspects of your data models - your entire model and all of its components are part of your project's source code.
                     Note that in most cases, one would use a convenient {% scaladoc NCModelAdapter NCModelAdapter %} adapter to implement this interface.
                 </td>
             </tr>
             <tr>
                 <td><b>{% scaladoc NCToken NCToken %}</b></td>
                 <td>
                     <p>
                         <code>Token</code> is simple string, part of user input, which is obtained by splitting user input
                         according to some rules. For example, the user input "<b>Where is it?</b>" contains four tokens:
                         "<code>Where</code>", "<code>is</code>", "<code>it</code>", "<code>?</code>".
                         Usually <code>tokens</code> are words and punctuation symbols which also contain additional
                         information like point of speech tags, relative position in the overall input text, stopword flag,
                         stem and lemma forms, etc. List of parsed <code>tokens</code> serves as an input for parsing <code>entities</code>.
                     </p>
                     <p>
                         Tokens are produced by {% scaladoc NCTokenParser %} that is part of the processing
                         pipeline.
                     </p>
                 </td>
             </tr>
             <tr>
                 <td><b>{% scaladoc NCEntity NCEntity %}</b></td>
                 <td>
                     <p>
                         <code>Entity</code> typically represents a real-world object, such as a person, location, organization,
                         or product that can often be denoted with a proper name. It can be abstract or have a physical existence.
                         Each <code>entity</code> consists of zero or more <code>tokens</code> and therefore is represented by zero
                         or more substrings from the original input text. Note that entities may have only a very loose mapping back
                         to the original text as entities represent a higher-level abstractions compared to tokens. Combination of
                         entities form one or more parsing <code>variants</code>.
                     </p>
                     <p>
                         Entities are produced by {% scaladoc NCEntityParser %} that is part of the processing
                         pipeline.
                     </p>
                 </td>
             </tr>
             <tr>
                 <td><b>{% scaladoc NCVariant NCVariant %}</b></td>
                 <td>
                     <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group
                     of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible
                     interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>.
                     For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>,
                     one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>.
                     Set of <code>variants</code> ultimately serves as an input to <a href="intent-matching.html">intent matching</a>.
                 </td>
             </tr>
             <tr>
                 <td><b>{% scaladoc NCPipeline NCPipeline %}</b></td>
                 <td>
                     <code>Pipeline</code> is the main property of the model. A pipeline consists of an ordered sequence
                     of pipeline components. User input starts at the first component of the
                     pipeline as a simple text and exits the end of the pipeline as a one or more filtered and validated parsing <code>variants</code>.
                     The output of the pipeline is further passed as an input to <a href="intent-matching.html">intent matching</a>.
                 </td>
             </tr>
             <tr>
                 <td><b><a target="scaladoc" href="/apis/latest/">@NCIntent</a></b></td>
                 <td>
                     <a target="scaladoc" href="/apis/latest/">@NCIntent</a> annotation binds a declarative intent to its
                     callback method on the model. The intent generally refers to the goal that the end-user had in mind when speaking
                     or typing the input utterance. The intent has a declarative part or a template written in <a href="/intent-matching.html#idl">IDL - Intent Definition Language</a>
                     that strictly defines a particular form the user input.
                     Intent is also bound to a callback method that will be executed
                     when that intent, i.e. its template, is detected as the best match for a given input.
                 </td>
             </tr>

             </tbody>
         </table>
         <p>
             Here's the illustration on how a user input text transforms into a set of parsing variants:
         </p>
         <figure>
             <img alt="named entities" class="img-fluid" src="/images/text-tokens-entities2.png">
             <figcaption><b>Fig 1.</b> Text -> Tokens -> Entities -> Parsing Variants.</figcaption>
         </figure>
     </section>
 </div>
 <div class="col-md-2 third-column">
     <ul class="side-nav">
         <li class="side-nav-title">On This Page</li>
         <li><a href="#overview">Key Concepts</a></li>
         <li><a href="#terminology">Main Types</a></li>
         {% include quick-links.html %}
     </ul>
 </div>
	---
	active_crumb: Docs
	layout: documentation
	id: key_concepts
	---

	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->

	<div class="col-md-8 second-column" xmlns="http://www.w3.org/1999/html">
	<section id="overview">
	<h2 class="section-title">Key Concepts<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>

	<p>
	NLPCraft is based on three main concepts:
	</p>
	<ul>
	<li>
	{% scaladoc NCModel NCModel %} is a user-configured object responsible for input interpretation.
	</li>
	<li>
	{% scaladoc NCPipeline NCPipeline %} is a part of the model that defines
	specifics of the step-by-step user input processing.
	</li>
	<li>
	{% scaladoc NCModelClient NCModelClient %} is responsible for submitting user input to be
	processed by the model.
	</li>
	</ul>

	<p>Here's the typical code structure when working with NLPCraft:</p>

	<pre class="brush: scala, highlight: []">
	// Initialize data model including its pipeline.
	val mdl = new CustomNlpModel()

	// Creates client instance for given model.
	val cli = new NCModelClient(mdl)

	// Sends text request to model by user ID "user01".
	val result = client.ask("Some user command", "user01")
	</pre>
	</section>

	<section id="terminology">
	<h2 class="section-title">Main Types<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
	<p>
	Here's the list of the main NLPCraft types:
	</p>

	<table class="gradient-table">
	<thead>
	<tr>
	<th>Type</th>
	<th>Description</th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td><b>{% scaladoc NCModel NCModel %}</b></td>
	<td>
	<code>Model</code> is the main component in NLPCraft. User-define data model contains its {% scaladoc NCModelConfig NCModelConfig %},
	input processing {% scaladoc NCPipeline NCPipeline %} and life-cycle callbacks.
	NLPCraft employs model-as-a-code approach where entire data model is an implementation of just
	this interface. The instance of this interface is passed to {% scaladoc NCModelClient NCModelClient %} class.
	Note that the model-as-a-code approach natively supports any software life cycle tools and frameworks
	like various build tools, CI/SCM tools, IDEs, etc. You don't need any additional tools to manage some
	aspects of your data models - your entire model and all of its components are part of your project's source code.
	Note that in most cases, one would use a convenient {% scaladoc NCModelAdapter NCModelAdapter %} adapter to implement this interface.
	</td>
	</tr>
	<tr>
	<td><b>{% scaladoc NCToken NCToken %}</b></td>
	<td>
	<p>
	<code>Token</code> is simple string, part of user input, which is obtained by splitting user input
	according to some rules. For example, the user input "<b>Where is it?</b>" contains four tokens:
	"<code>Where</code>", "<code>is</code>", "<code>it</code>", "<code>?</code>".
	Usually <code>tokens</code> are words and punctuation symbols which also contain additional
	information like point of speech tags, relative position in the overall input text, stopword flag,
	stem and lemma forms, etc. List of parsed <code>tokens</code> serves as an input for parsing <code>entities</code>.
	</p>
	<p>
	Tokens are produced by {% scaladoc NCTokenParser %} that is part of the processing
	pipeline.
	</p>
	</td>
	</tr>
	<tr>
	<td><b>{% scaladoc NCEntity NCEntity %}</b></td>
	<td>
	<p>
	<code>Entity</code> typically represents a real-world object, such as a person, location, organization,
	or product that can often be denoted with a proper name. It can be abstract or have a physical existence.
	Each <code>entity</code> consists of zero or more <code>tokens</code> and therefore is represented by zero
	or more substrings from the original input text. Note that entities may have only a very loose mapping back
	to the original text as entities represent a higher-level abstractions compared to tokens. Combination of
	entities form one or more parsing <code>variants</code>.
	</p>
	<p>
	Entities are produced by {% scaladoc NCEntityParser %} that is part of the processing
	pipeline.
	</p>
	</td>
	</tr>
	<tr>
	<td><b>{% scaladoc NCVariant NCVariant %}</b></td>
	<td>
	<code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group
	of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible
	interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>.
	For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>,
	one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>.
	Set of <code>variants</code> ultimately serves as an input to <a href="intent-matching.html">intent matching</a>.
	</td>
	</tr>
	<tr>
	<td><b>{% scaladoc NCPipeline NCPipeline %}</b></td>
	<td>
	<code>Pipeline</code> is the main property of the model. A pipeline consists of an ordered sequence
	of pipeline components. User input starts at the first component of the
	pipeline as a simple text and exits the end of the pipeline as a one or more filtered and validated parsing <code>variants</code>.
	The output of the pipeline is further passed as an input to <a href="intent-matching.html">intent matching</a>.
	</td>
	</tr>
	<tr>
	<td><b><a target="scaladoc" href="/apis/latest/">@NCIntent</a></b></td>
	<td>
	<a target="scaladoc" href="/apis/latest/">@NCIntent</a> annotation binds a declarative intent to its
	callback method on the model. The intent generally refers to the goal that the end-user had in mind when speaking
	or typing the input utterance. The intent has a declarative part or a template written in <a href="/intent-matching.html#idl">IDL - Intent Definition Language</a>
	that strictly defines a particular form the user input.
	Intent is also bound to a callback method that will be executed
	when that intent, i.e. its template, is detected as the best match for a given input.
	</td>
	</tr>

	</tbody>
	</table>
	<p>
	Here's the illustration on how a user input text transforms into a set of parsing variants:
	</p>
	<figure>
	<img alt="named entities" class="img-fluid" src="/images/text-tokens-entities2.png">
	<figcaption><b>Fig 1.</b> Text -> Tokens -> Entities -> Parsing Variants.</figcaption>
	</figure>
	</section>
	</div>
	<div class="col-md-2 third-column">
	<ul class="side-nav">
	<li class="side-nav-title">On This Page</li>
	<li><a href="#overview">Key Concepts</a></li>
	<li><a href="#terminology">Main Types</a></li>
	{% include quick-links.html %}
	</ul>
	</div>