| --- |
| active_crumb: Docs |
| layout: documentation |
| id: key_concepts |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column" xmlns="http://www.w3.org/1999/html"> |
| <section id="overview"> |
| <h2 class="section-title">Key Concepts<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| NLPCraft is based on three main concepts: |
| </p> |
| <ul> |
| <li> |
| {% scaladoc NCModel NCModel %} is a user-configured object responsible for input interpretation. |
| </li> |
| <li> |
| {% scaladoc NCPipeline NCPipeline %} is a part of the model that defines |
| specifics of the step-by-step user input processing. |
| </li> |
| <li> |
| {% scaladoc NCModelClient NCModelClient %} is responsible for submitting user input to be |
| processed by the model. |
| </li> |
| </ul> |
| |
| <p>Here's the typical code structure when working with NLPCraft:</p> |
| |
| <pre class="brush: scala, highlight: []"> |
| // Initialize data model including its pipeline. |
| val mdl = new CustomNlpModel() |
| |
| // Creates client instance for given model. |
| val cli = new NCModelClient(mdl) |
| |
| // Sends text request to model by user ID "user01". |
| val result = client.ask("Some user command", "user01") |
| </pre> |
| </section> |
| |
| <section id="terminology"> |
| <h2 class="section-title">Main Types<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Here's the list of the main NLPCraft types: |
| </p> |
| |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Type</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><b>{% scaladoc NCModel NCModel %}</b></td> |
| <td> |
| <code>Model</code> is the main component in NLPCraft. User-define data model contains its {% scaladoc NCModelConfig NCModelConfig %}, |
| input processing {% scaladoc NCPipeline NCPipeline %} and life-cycle callbacks. |
| NLPCraft employs model-as-a-code approach where entire data model is an implementation of just |
| this interface. The instance of this interface is passed to {% scaladoc NCModelClient NCModelClient %} class. |
| Note that the model-as-a-code approach natively supports any software life cycle tools and frameworks |
| like various build tools, CI/SCM tools, IDEs, etc. You don't need any additional tools to manage some |
| aspects of your data models - your entire model and all of its components are part of your project's source code. |
| Note that in most cases, one would use a convenient {% scaladoc NCModelAdapter NCModelAdapter %} adapter to implement this interface. |
| </td> |
| </tr> |
| <tr> |
| <td><b>{% scaladoc NCToken NCToken %}</b></td> |
| <td> |
| <p> |
| <code>Token</code> is simple string, part of user input, which is obtained by splitting user input |
| according to some rules. For example, the user input "<b>Where is it?</b>" contains four tokens: |
| "<code>Where</code>", "<code>is</code>", "<code>it</code>", "<code>?</code>". |
| Usually <code>tokens</code> are words and punctuation symbols which also contain additional |
| information like point of speech tags, relative position in the overall input text, stopword flag, |
| stem and lemma forms, etc. List of parsed <code>tokens</code> serves as an input for parsing <code>entities</code>. |
| </p> |
| <p> |
| Tokens are produced by {% scaladoc NCTokenParser %} that is part of the processing |
| <a href="/pipeline-components.html">pipeline</a>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><b>{% scaladoc NCEntity NCEntity %}</b></td> |
| <td> |
| <p> |
| <code>Entity</code> typically represents a real-world object, such as a person, location, organization, |
| or product that can often be denoted with a proper name. It can be abstract or have a physical existence. |
| Each <code>entity</code> consists of zero or more <code>tokens</code> and therefore is represented by zero |
| or more substrings from the original input text. Note that entities may have only a very loose mapping back |
| to the original text as entities represent a higher-level abstractions compared to tokens. Combination of |
| entities form one or more parsing <code>variants</code>. |
| </p> |
| <p> |
| Entities are produced by {% scaladoc NCEntityParser %} that is part of the processing |
| <a href="/pipeline-components.html">pipeline</a>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td><b>{% scaladoc NCVariant NCVariant %}</b></td> |
| <td> |
| <code>Variant</code> is a unique set of <code>entities</code>. In many cases, a <code>token</code> or a group |
| of <code>tokens</code> can be recognized as more than one <code>entity</code> - resulting in multiple possible |
| interpretations of the original sequence of tokens. Each such interpretation is defined as a parsing <code>variant</code>. |
| For example, user input <b>"Look at this crane."</b> can be interpreted as two <code>variants</code>, |
| one of them containing <code>entity</code> <b>BIRD<sub>[crane]</sub></b> and another containing <code>entity</code> <b>MACHINE<sub>[crane]</sub></b>. |
| Set of <code>variants</code> ultimately serves as an input to <a href="intent-matching.html">intent matching</a>. |
| </td> |
| </tr> |
| <tr> |
| <td><b>{% scaladoc NCPipeline NCPipeline %}</b></td> |
| <td> |
| <code>Pipeline</code> is the main property of the model. A pipeline consists of an ordered sequence |
| of <a href="/pipeline-components.html">pipeline components</a>. User input starts at the first component of the |
| pipeline as a simple text and exits the end of the pipeline as a one or more filtered and validated parsing <code>variants</code>. |
| The output of the pipeline is further passed as an input to <a href="intent-matching.html">intent matching</a>. |
| </td> |
| </tr> |
| <tr> |
| <td><b><a target="scaladoc" href="/apis/latest/">@NCIntent</a></b></td> |
| <td> |
| <a target="scaladoc" href="/apis/latest/">@NCIntent</a> annotation binds a declarative intent to its |
| callback method on the model. The intent generally refers to the goal that the end-user had in mind when speaking |
| or typing the input utterance. The intent has a declarative part or a template written in <a href="/intent-matching.html#idl">IDL - Intent Definition Language</a> |
| that strictly defines a particular form the user input. |
| Intent is also bound to a callback method that will be executed |
| when that intent, i.e. its template, is detected as the best match for a given input. |
| </td> |
| </tr> |
| |
| </tbody> |
| </table> |
| <p> |
| Here's the illustration on how a user input text transforms into a set of parsing variants: |
| </p> |
| <figure> |
| <img alt="named entities" class="img-fluid" src="/images/text-tokens-entities2.png"> |
| <figcaption><b>Fig 1.</b> Text -> Tokens -> Entities -> Parsing Variants.</figcaption> |
| </figure> |
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Key Concepts</a></li> |
| <li><a href="#terminology">Main Types</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |