---
active_crumb: Docs
layout: documentation
id: built-in-entity-parser
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column">
<section id="overview">
<h2 class="section-title">Overview<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The {% scaladoc NCEntityParser NCEntityParser %} trait is part of the <a href="api-components.html#model-pipeline">Model Pipeline</a>.
Its implementations find user-defined named entities
in the prepared input tokens.
</p>
<p>
The following built-in parsers are provided:
</p>
<ul>
<li>
<a href="#parser-opennlp">Wrapper</a> for the <a href="https://opennlp.apache.org/">Apache OpenNLP</a> named entity finder, whose
prepared models support English and some other languages.
</li>
<li>
<a href="#parser-stanford">Wrapper</a> for the <a href="https://nlp.stanford.edu/">Stanford NLP</a> named entity finder, whose
prepared models support English and some other languages.
</li>
<li>
NLP data <a href="#parser-nlp">wrapper</a> implementation. It does not depend on the language.
</li>
<li>
Semantic <a href="#parser-semantic">implementation</a> for the English language.
</li>
</ul>
</section>
<section id="parser-opennlp">
<h2 class="section-title">OpenNLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
{% scaladoc nlp/parsers/NCOpenNLPEntityParser NCOpenNLPEntityParser %} is a wrapper for <a href="https://opennlp.apache.org/">Apache OpenNLP</a> NER components.
See the supported name finder models <a href="https://opennlp.sourceforge.net/models-1.5/">here</a>.
For example, the following models are available for English: <code>Location</code>, <code>Money</code>,
<code>Person</code>, <code>Organization</code>, <code>Date</code>, <code>Time</code> and <code>Percentage</code>.
Models for other languages are also available.
</p>
</section>
<section id="parser-stanford">
<h2 class="section-title">Stanford NLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
<code>NCStanfordNLPEntityParser</code> is a wrapper for <a href="https://nlp.stanford.edu/">Stanford NLP</a> NER components.
See the supported NER models <a href="https://nlp.stanford.edu/software/CRF-NER.shtml">here</a>.
For example, the following models are available for English: <code>Location</code>, <code>Money</code>,
<code>Person</code>, <code>Organization</code>, <code>Date</code>, <code>Time</code> and <code>Percent</code>.
Models for other languages are also available.
</p>
</section>
<section id="parser-nlp">
<h2 class="section-title">NLP Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
{% scaladoc nlp/parsers/NCNLPEntityParser NCNLPEntityParser %} converts NLP tokens into entities with four mandatory properties:
<code>nlp:token:text</code>, <code>nlp:token:index</code>, <code>nlp:token:startCharIndex</code> and
<code>nlp:token:endCharIndex</code>.
In addition, if any {% scaladoc NCTokenEnricher NCTokenEnricher %} components
are registered in the {% scaladoc NCPipeline NCPipeline %}
and they add other properties to the tokens,
these properties are also copied, with their names prefixed with <code>nlp:token:</code>.
This component is language independent.
Note that the set of converted tokens can be restricted by a predicate.
</p>
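<p>
The naming scheme described above can be illustrated with a minimal sketch. It is not the library's implementation:
<code>SimpleToken</code>, <code>toEntityProps</code> and <code>convert</code> are hypothetical stand-ins,
and the <code>pos</code> enricher property is an assumed name used only for illustration.
</p>
<pre class="brush: scala">
// Hypothetical stand-in for a parsed token; not an NLPCraft type.
case class SimpleToken(
    text: String,
    index: Int,
    startCharIndex: Int,
    endCharIndex: Int,
    extra: Map[String, Any] = Map.empty // Properties added by token enrichers.
)

// Builds the entity properties: the four mandatory ones plus every
// enricher property, all named with the "nlp:token:" prefix.
def toEntityProps(t: SimpleToken): Map[String, Any] = {
    val mandatory = Map[String, Any](
        ("text", t.text),
        ("index", t.index),
        ("startCharIndex", t.startCharIndex),
        ("endCharIndex", t.endCharIndex)
    )
    (mandatory ++ t.extra).map { case (name, value) => (s"nlp:token:$name", value) }
}

// Converts only the tokens accepted by the predicate.
def convert(toks: Seq[SimpleToken], accept: SimpleToken => Boolean): Seq[Map[String, Any]] =
    toks.filter(accept).map(toEntityProps)

// Usage: "pos" is an assumed enricher property name.
val tok = SimpleToken("cars", 0, 0, 4, Map(("pos", "NNS")))
val props = toEntityProps(tok)
</pre>
<p>
Here <code>props</code> holds five entries: the four mandatory properties and <code>nlp:token:pos</code>.
</p>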
</section>
<section id="parser-semantic">
<h2 class="section-title">Semantic Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The semantic entity parser
{% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}
is a synonym-based implementation of {% scaladoc NCEntityParser NCEntityParser %}.
This parser provides a simple but very powerful way to find domain-specific data in the input text.
It defines a list of {% scaladoc nlp/parsers/NCSemanticElement NCSemanticElement %} instances,
which represent <a href="https://en.wikipedia.org/wiki/Named-entity_recognition">named entities</a>.
We will refer to this list as the <code>Semantic Model</code>.
</p>
<p>
Let's talk a little bit more about <a href="https://en.wikipedia.org/wiki/Named-entity_recognition">Named entities</a>.
</p>
<section id="parser-semantic-ne">
<h3 class="sub-section-title">Named Entities</h3>
<p>
A named entity, also known as a semantic element or a token, is one of the main components defined by the NLPCraft data model.
A named entity is one or more individual words that have a consistent semantic meaning and typically denote a
real-world object, such as a person, location, number, date and time, organization, product, etc. Such an
object can be abstract or have a physical existence.
</p>
<p>
For example, in the following sentence:
</p>
<figure>
<img alt="named entities" class="img-fluid" src="/images/named-entities.png">
<figcaption><b>Fig 2.</b> Named Entities</figcaption>
</figure>
<p>
the following named entities can be detected:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Words</th>
<th>Type</th>
<th>Normalized Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><b>Top 20</b></td>
<td><code>user:element:1</code></td>
<td>top 20</td>
</tr>
<tr>
<td><b>best pages</b></td>
<td><code>user:element:2</code></td>
<td>best pages</td>
</tr>
<tr>
<td><b>California USA</b></td>
<td><code>stanford:city</code></td>
<td>USA, California</td>
</tr>
<tr>
<td><b>last 3 months</b></td>
<td><code>stanford:date</code></td>
<td>1/1/2021 - 4/1/2021</td>
</tr>
</tbody>
</table>
<p>
In most cases a named entity has an associated <em>normalized value</em>. It is especially important for named entities that have many
notational forms such as time and date, currency, geographical locations, etc. For example, <code>New York</code>,
<code>New York City</code> and <code>NYC</code> all refer to the same "New York City, NY USA" location, which is the standard normalized form.
</p>
<p>
The process of detecting named entities is called Named Entity Recognition (NER). A named entity can be detected in several ways: through a list of synonyms, by name, with rules, or by using
statistical techniques like neural networks trained on a large corpus of predefined data. NLPCraft natively supports synonym-based
named entity definitions as well as the ability to compose new named entities through the powerful <a href="/intent-matching.html">Intent Definition Language</a> (IDL),
combining other named entities, including named entities from
external projects such as OpenNLP or Stanford CoreNLP (see the <a href="built-in-entity-parser.html">Built-in Entity Parser</a> chapter).
</p>
<p>
Named entities allow you to abstract from basic linguistic forms like nouns and verbs and to deal with higher-level semantic
abstractions like geographical location or time when you are trying to understand the meaning of the sentence.
One of the main goals of named entities is to act as input ingredients for <a href="/intent-matching.html">intent matching</a>.
</p>
<div class="bq info">
<p>
<b>😀 User Input → Named Entities → Parsing Variants → Intent Matcher → Winning Intent 🚀</b>
</p>
<p>
User input is parsed into the list of named entities. That list is then further transformed into one or more
parsing variants where each variant represents a particular order and combination of detected named entities.
Finally, the list of variants acts as an input to intent matching where each variant is matched against every intent
in the process of detecting the best matching intent for the original user input.
</p>
</div>
</section>
<section id="parser-semantic-elements">
<h3 class="sub-section-title">Elements</h3>
<p>
{% scaladoc nlp/parsers/NCSemanticElement NCSemanticElement %} describes
a named entity so that it can be detected in the user input.
</p>
<div class="bq info">
<p>
<b>Semantic Element <span class="amp">&</span> Named Entity <span class="amp">&</span> Token</b>
</p>
<p>
Terms 'semantic element', 'named entity' and 'token' are used throughout this documentation relatively interchangeably:
</p>
<dl>
<dt>Semantic Element</dt>
<dd>
Denotes a named entity <em>declared</em> in NLPCraft model.
</dd>
<dt>Token</dt>
<dd>
Denotes a semantic element that was <em>detected</em> by NLPCraft in the user input.
</dd>
<dt>Named Entity</dt>
<dd>
Denotes a classic term, i.e. one or more individual words that have a
consistent semantic meaning and typically define a real-world object.
</dd>
</dl>
</div>
<p>
Each {% scaladoc nlp/parsers/NCSemanticElement NCSemanticElement %}
is described by its <code>id</code>, <code>groups</code>, <code>synonyms</code>, <code>values</code> and <code>properties</code>.
</p>
<span id="synonyms" class="section-sub-title">Synonyms <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
NLPCraft uses fully deterministic named entity recognition and is not based on statistical approaches that
would require pre-existing marked up data sets and extensive training. For each semantic element you can either provide a
set of synonyms to match on or specify a piece of code that would be responsible for detecting that named
entity (discussed below). A synonym can have one or more individual words. Note that an element's ID is its
implicit synonym, so even if no additional synonyms are defined at least one synonym always exists.
Note also that synonym matching is performed in two phases. In the first phase the <em>normalized</em> and <em>stemmatized</em> forms of
the synonym and the user input are compared. If that attempt is not successful, the <em>stemmatized</em> forms
of the synonyms are matched against the <em>stemmatized</em> forms of the user input words that were <em>lemmatized</em> beforehand.
This approach provides more accurate matching and doesn't force users to write synonyms in their initial word forms.
</p>
<p>
Here's an example of a simple semantic element definition in JSON:
</p>
<pre class="brush: js, highlight: [6,7,8,9,10,11,12]">
...
"elements": [
    {
        "id": "transport.vehicle",
        "description": "Transportation vehicle",
        "synonyms": [
            "car",
            "truck",
            "light duty truck",
            "heavy duty truck",
            "sedan",
            "coupe"
        ]
    }
]
...
</pre>
<p>
Adding multi-word synonyms looks somewhat
trivial, but in real models the naive approach can lead to thousands and even tens of thousands of
possible synonyms due to word, grammar, and linguistic permutations, which quickly becomes untenable if
done manually.
</p>
<p>
NLPCraft provides an effective tool for a compact representation of synonyms. Instead of listing all possible
multi-word synonyms one by one you can use a combination of the following techniques:
</p>
<ul>
<li><a href="#macros">Macros</a></li>
<li><a href="#regex">Regular expressions</a></li>
<li><a href="#option-groups">Option Groups</a></li>
</ul>
<p>
Each whitespace-separated string in the synonym can be either a regular word (like in the above transportation example,
where it will be matched using its normalized and stemmatized form) or one of the above expressions.
</p>
<p>
Note that this synonyms definition is also used in the following
{% scaladoc nlp/parsers/NCSemanticElement NCSemanticElement %} methods:
</p>
<ul>
<li><code>getSynonyms()</code> - gets synonyms to match on.</li>
<li><code>getValues()</code> - gets values to match on (see <a href="#values">below</a>).</li>
</ul>
<span id="values" class="section-sub-title">Element Values <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
A semantic element can have an optional set of special synonyms called <em>values</em> or "proper nouns" for this element.
Unlike basic synonyms, each value is a pair of a name and a set of standard synonyms by which that value,
and ultimately its element, can be recognized in the user input. Note that the value name itself acts as an
implicit synonym even when no additional synonyms are added for that value.
</p>
<p>
When a semantic element is recognized it is made available to the model's matching logic as an instance of
the {% scaladoc NCToken NCToken %} interface.
This interface has a method
{% scaladoc NCToken getValue() %} which
returns the name of the value, if any, by which
that semantic element was recognized. That value name can be further used in intent matching.
</p>
<p>
To understand the importance of the values consider the following changes to our transportation
example model:
</p>
<pre class="brush: js, highlight: [19,20,21,22,23,24,25,26,27,28,29,30]">
...
"macros": [
    {
        "name": "&lt;TRUCK_TYPE&gt;",
        "macro": "{light duty|heavy duty|half ton|1/2 ton|3/4 ton|one ton|super duty}"
    }
],
"elements": [
    {
        "id": "transport.vehicle",
        "description": "Transportation vehicle",
        "synonyms": [
            "car",
            "{&lt;TRUCK_TYPE&gt;|_} {pickup|_} truck",
            "sedan",
            "coupe"
        ],
        "values": [
            {
                "value": "mercedes",
                "synonyms": ["mercedes-ben{z|s}", "mb", "ben{z|s}"]
            },
            {
                "value": "bmw",
                "synonyms": ["{bimmer|bimer|beemer}", "bayerische motoren werke"]
            },
            {
                "value": "chevrolet",
                "synonyms": ["chevy"]
            }
        ]
    }
]
...
</pre>
<p>
With that setup the <code>transport.vehicle</code> element will be recognized by any of the following input strings:
</p>
<ul>
<li><code>car</code></li>
<li><code>benz</code> (with value <code>mercedes</code>)</li>
<li><code>3/4 ton pickup truck</code></li>
<li><code>light duty truck</code></li>
<li><code>chevy</code> (with value <code>chevrolet</code>)</li>
<li><code>bimmer</code> (with value <code>bmw</code>)</li>
<li><code>transport.vehicle</code></li>
</ul>
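<p>
The relation between an input string and the value name it resolves to can be sketched as a simple lookup.
This is only an illustration under simplifying assumptions: the <code>values</code> map and the <code>valueOf</code>
function are hypothetical, the option groups are already written out by hand, and no stemming is applied.
</p>
<pre class="brush: scala">
// Hypothetical, simplified view of an element's values:
// each value name is paired with its already expanded synonyms.
val values: Map[String, Set[String]] = Map(
    ("mercedes", Set("mercedes-benz", "mercedes-bens", "mb", "benz", "bens")),
    ("bmw", Set("bimmer", "bimer", "beemer", "bayerische motoren werke")),
    ("chevrolet", Set("chevy"))
)

// Returns the value name by which the input was recognized, if any.
// The value name itself acts as an implicit synonym.
def valueOf(input: String): Option[String] = {
    val in = input.trim.toLowerCase
    values.collectFirst { case (name, syns) if name == in || syns.contains(in) => name }
}

val byBenz = valueOf("benz")      // Some("mercedes")
val byName = valueOf("chevrolet") // Some("chevrolet")
val noValue = valueOf("car")      // None: "car" is a plain synonym without a value.
</pre>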
<span id="groups" class="section-sub-title">Element Groups <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Each semantic element always belongs to one or more groups. Semantic element provides its groups via
{% scaladoc nlp/parsers/NCSemanticElement getGroups() %} method.
By default, if element group is not specified, the element ID will act as its default group ID.
Group membership is a quick and easy way to organise similar semantic elements together and use this
categorization in <a href="/intent-matching.html">IDL</a> intents.
</p>
<p>
Note that the proper grouping of the elements is also necessary for the correct operation of
Short-Term-Memory (STM) in the conversational context. Consider a
{% scaladoc NCToken NCToken %} that
represents a previously found semantic element that is stored in the conversation. Such a token
will be overridden in the conversation by a more <b>recent token</b>
from the <b>same group</b>, which is a critical rule for maintaining the proper conversational context.
See
{% scaladoc NCConversation NCConversation %}
for more details.
</p>
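<p>
The override rule can be pictured with a small sketch. It is not how NLPCraft implements its conversation
memory: <code>Detected</code> and <code>remember</code> are hypothetical names, and treating "same group" as
"shares at least one group" is an assumption made for this illustration.
</p>
<pre class="brush: scala">
// Hypothetical stand-in for a detected element; not an NLPCraft type.
case class Detected(id: String, groups: Set[String])

// Keeps the conversation entries in order of arrival. A new entry evicts
// every stored entry that shares at least one group with it.
def remember(stm: List[Detected], fresh: Detected): List[Detected] =
    stm.filterNot(old => old.groups.intersect(fresh.groups).nonEmpty) :+ fresh

val s1 = remember(Nil, Detected("city.paris", Set("location")))
val s2 = remember(s1, Detected("weather", Set("weather")))
val s3 = remember(s2, Detected("city.london", Set("location")))
// s3 keeps "weather" and "city.london"; "city.paris" was overridden.
</pre>
<p>
If both cities were left in their default groups (their own IDs), neither would replace the other,
which is why grouping related elements matters for the conversation.
</p>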
<span id="macros" class="section-sub-title">Macros<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Listing all possible multi-word synonyms for a given element can be a time-consuming task. Macros
together with option groups allow for significant simplification of this task.
Macros allow you to give a name to an often used set of words or option groups and reuse it without
repeating those words or option groups again and again. A model provides a list of macros via the
{% scaladoc nlp/parsers/NCSemanticEntityParser macros %} method.
Each macro has a name in the form of <code>&lt;X&gt;</code>, where <code>X</code>
is any string, and a string value. Note that macros can be nested (but not recursive), i.e. a macro value can include
references to other macros. When the macro name <code>&lt;X&gt;</code> is encountered in a synonym it gets recursively
replaced with its value.
</p>
<p>
Here's a snippet of macro definitions in JSON:
</p>
<pre class="brush: js">
"macros": [
    {
        "name": "&lt;A&gt;",
        "macro": "aaa"
    },
    {
        "name": "&lt;B&gt;",
        "macro": "&lt;A&gt; bbb"
    },
    {
        "name": "&lt;C&gt;",
        "macro": "&lt;A&gt; bbb {z|w}"
    }
]
</pre>
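<p>
The replacement of nested macro references can be sketched as repeated textual substitution. This is a minimal
illustration that assumes non-recursive macros, not the parser's actual code, and <code>expandMacros</code> is a hypothetical helper name.
</p>
<pre class="brush: scala">
// Replaces macro references with their values until nothing changes.
// Assumes the macros are not recursive; otherwise this would never terminate.
@scala.annotation.tailrec
def expandMacros(synonym: String, macros: Map[String, String]): String = {
    val next = macros.foldLeft(synonym) { case (s, (name, value)) => s.replace(name, value) }
    if (next == synonym) synonym else expandMacros(next, macros)
}

val macros = Map(
    ("&lt;A&gt;", "aaa"),
    ("&lt;B&gt;", "&lt;A&gt; bbb")
)

// The nested reference is resolved in a second pass.
val expanded = expandMacros("&lt;B&gt; c", macros)
// expanded == "aaa bbb c"
</pre>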
<span id="option-groups" class="section-sub-title">Option Groups <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Option groups are similar to wildcard patterns that operate on a single-word basis. One line of
an option group expands into one or more individual synonyms. Option groups are the key mechanism for shortened
synonym notation. The following examples demonstrate how to use option groups.
</p>
<p>
Consider the macros defined below (note that macros <code>&lt;B&gt;</code> and <code>&lt;C&gt;</code>
are nested):
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>&lt;A&gt;</code></td>
<td><code>aaa</code></td>
</tr>
<tr>
<td><code>&lt;B&gt;</code></td>
<td><code>&lt;A&gt; bbb</code></td>
</tr>
<tr>
<td><code>&lt;C&gt;</code></td>
<td><code>&lt;A&gt; bbb {z|w}</code></td>
</tr>
</tbody>
</table>
<p>
Then the following option group expansions will occur in these examples:
</p>
<table class="gradient-table">
<thead>
<tr>
<th>Synonym</th>
<th>Synonym Expansions</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>&lt;A&gt; {b|_} c</code></td>
<td>
<code>"aaa b c"</code><br>
<code>"aaa c"</code>
</td>
</tr>
<tr>
<td><code>&lt;A&gt; {b|a}[1,2] c</code></td>
<td>
<code>"aaa b c"</code><br>
<code>"aaa b b c"</code><br>
<code>"aaa a c"</code><br>
<code>"aaa a a c"</code><br>
<code>"aaa c"</code>
</td>
</tr>
<tr>
<td>
<code>&lt;B&gt; {b|_} c</code><br>
or<br>
<code>&lt;B&gt; {b}[0,1] c</code>
</td>
<td>
<code>"aaa bbb b c"</code><br>
<code>"aaa bbb c"</code>
</td>
</tr>
<tr>
<td><code>{b|\{\_\}}</code></td>
<td>
<code>"b"</code><br>
<code>"b {_}"</code>
</td>
</tr>
<tr>
<td><code>a {b|_}. c</code></td>
<td>
<code>"a b. c"</code><br>
<code>"a . c"</code>
</td>
</tr>
<tr>
<td><code>a .{b, |_}. c</code></td>
<td>
<code>"a .b, . c"</code><br>
<code>"a .. c"</code>
</td>
</tr>
<tr>
<td><code>
{% raw %}a {{b|c}|_}.{% endraw %}</code></td>
<td>
<code>"a ."</code><br>
<code>"a b."</code><br>
<code>"a c."</code>
</td>
</tr>
<tr>
<td><code>a {% raw %}{{{&lt;C&gt;}}|{_}}{% endraw %} c</code></td>
<td>
<code>"a aaa bbb z c"</code><br>
<code>"a aaa bbb w c"</code><br>
<code>"a c"</code>
</td>
</tr>
<tr>
<td><code>{% raw %}{{{a}}} {b||_|{{_}}||_}{% endraw %}</code></td>
<td>
<code>"a b"</code><br>
<code>"a"</code>
</td>
</tr>
</tbody>
</table>
<p>
Specifically:
</p>
<ul>
<li><code>{A|B}</code> denotes either <code>A</code> or <code>B</code>.</li>
<li>
<code>{A|B|_}</code> denotes either <code>A</code> or <code>B</code> or nothing.
<ul>
<li>Symbol <code>_</code> can appear anywhere in the list of options, i.e. <code>{A|B|_}</code> is equal to <code>{A|_|B}</code>.</li>
</ul>
</li>
<li>
<code>{C}[x,y]</code> denotes an option group with quantifier, i.e. group <code>C</code> appearing from <code>x</code> to <code>y</code> times inclusive.
<ul>
<li>For example, <code>{C}[1,3]</code> is the same as <code>{C|C C|C C C}</code> notation.</li>
<li>Note that <code>{C|_}</code> is equal to <code>{C}[0,1]</code>.</li>
</ul>
</li>
<li>Excessive curly brackets are ignored, when safe to do so.</li>
<li>Macros cannot be recursive but can be nested.</li>
<li>Option groups can be nested.</li>
<li>
<code>'\'</code> (backslash) can be used to escape <code>'{'</code>, <code>'}'</code>, <code>'|'</code> and
<code>'_'</code> special symbols used by the option groups.
</li>
<li>Excessive whitespaces are trimmed when expanding option groups.</li>
</ul>
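<p>
To make the expansion rules more tangible, here is a minimal sketch of an expander for a simplified subset of the
notation: plain words, <code>|</code> alternatives, the <code>_</code> empty option and nested groups. Macros,
quantifiers and escaping are deliberately left out, the input is assumed to be well formed, and <code>expand</code> is a
hypothetical function rather than part of the NLPCraft API.
</p>
<pre class="brush: scala">
// Expands a synonym with (possibly nested) option groups into plain synonyms.
def expand(synonym: String): List[String] = {
    var pos = 0 // Current read position.

    def peek: Option[Char] = if (pos == synonym.length) None else Some(synonym(pos))

    // Reads text and groups until '|', '}' or the end of input.
    // Returns every string this sequence can expand to.
    def sequence(): List[String] = {
        var acc = List("")
        var go = true
        while (go) {
            peek match {
                case Some('{') =>
                    pos += 1 // Skip the opening bracket.
                    val opts = alternatives()
                    require(peek.contains('}'), "Unbalanced option group")
                    pos += 1 // Skip the closing bracket.
                    acc = acc.flatMap(a => opts.map(o => a + o))
                case Some('|') | Some('}') | None =>
                    go = false
                case Some(c) =>
                    pos += 1
                    acc = acc.map(_ + c)
            }
        }
        acc
    }

    // Reads the body of a group; a lone '_' alternative means "nothing".
    def alternatives(): List[String] = {
        var opts = List.empty[String]
        var more = true
        while (more) {
            opts = opts ++ sequence().map(s => if (s.trim == "_") "" else s)
            if (peek.contains('|')) pos += 1 else more = false
        }
        opts
    }

    val raw = sequence()
    require(pos == synonym.length, s"Unexpected character at position $pos")
    // Excessive whitespace is trimmed and duplicates are dropped.
    raw.map(_.trim.split("\\s+").mkString(" ")).distinct
}

val simple = expand("a {b|_} c")     // List("a b c", "a c")
val nested = expand("{x {b|_}|c} d") // List("x b d", "x d", "c d")
</pre>
<p>
The sketch builds the expansions left to right as a cross product of the parts, which is why a few short
option groups can stand in for a large number of hand-written synonyms.
</p>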
<p>
We can rewrite our transportation semantic element in a more efficient way using macros and option groups.
Even though the actual length of definition hasn't changed much it now auto-generates many dozens of synonyms
we would have to write out manually otherwise:
</p>
<pre class="brush: js, highlight: [4,5,14]">
...
"macros": [
    {
        "name": "&lt;TRUCK_TYPE&gt;",
        "macro": "{ {light|super|heavy|medium} duty|half ton|1/2 ton|3/4 ton|one ton}"
    }
],
"elements": [
    {
        "id": "transport.vehicle",
        "description": "Transportation vehicle",
        "synonyms": [
            "car",
            "{&lt;TRUCK_TYPE&gt;|_} {pickup|_} truck",
            "sedan",
            "coupe"
        ]
    }
]
...
</pre>
<span id="regex" class="section-sub-title">Regular Expressions <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span>
<p>
Any individual synonym word that starts and ends with <code>//</code> (two forward slashes) is
considered to be a Java regular expression as defined in <code>java.util.regex.Pattern</code>. Note that
a regular expression can only span a single word, i.e. only individual words from the user input will be
matched against the given regular expression and no whitespace is allowed within the regular expression. Note
also that option group special symbols <code>{</code>, <code>}</code>,
<code>|</code> and <code>_</code> have to be escaped in the regular expression using <code>\</code>
(backslash).
</p>
<p>
For example, the following synonym:
</p>
<pre class="brush: js">
"synonyms": [
    "{foo|//[bar].+//}"
]
</pre>
<p>
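will match the word <code>foo</code> or any other word of two or more characters that starts with
<code>b</code>, <code>a</code> or <code>r</code>, because <code>[bar]</code> is a character class that matches
a single one of these letters. To require the literal prefix <code>bar</code>, use <code>//bar.+//</code> instead.
In both cases the matched string cannot contain whitespace.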
</p>
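<p>
The word-level check can be sketched as follows. This is an illustration only: <code>matchesWord</code> is a
hypothetical helper, requiring the whole word to match the pattern is an assumption, and the plain-word branch
ignores the normalization and stemming that the parser applies.
</p>
<pre class="brush: scala">
import java.util.regex.Pattern

// Matches one input word against one synonym word. A synonym word wrapped in
// "//" is treated as a regular expression, anything else as plain text.
def matchesWord(synonymWord: String, inputWord: String): Boolean = {
    val marker = "//"
    val body = synonymWord.stripPrefix(marker).stripSuffix(marker)
    // Both markers were present if exactly four characters were stripped.
    if (synonymWord.length - body.length == 4)
        Pattern.matches(body, inputWord) // The whole word must match.
    else
        synonymWord == inputWord
}

val m1 = matchesWord("//[bar].+//", "bar")   // true: 'b' followed by "ar".
val m2 = matchesWord("//[bar].+//", "apple") // true: 'a' followed by "pple".
val m3 = matchesWord("//[bar].+//", "foo")   // false: 'f' is not in the class.
val m4 = matchesWord("foo", "foo")           // true: plain word comparison.
</pre>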
<div class="bq info">
<b>Regular Expressions Performance</b>
<p>
It's important to note that regular expressions can significantly affect the performance of
NLPCraft processing if used carelessly. Use them with caution and test the performance
of your model to ensure it meets your requirements.
</p>
</div>
</section>
<section id="parser-semantic-examples">
<h3 class="sub-section-title">Examples</h3>
<p>
The following example shows how to build a model programmatically.
</p>
<pre class="brush: scala, highlight: [3, 5, 10]">
val mdl = new NCModel(
    NCModelConfig("test.id", "Test Model", "1.0"),
    new NCPipelineBuilder().withSemantic(
        "en",
        Map(
            "&lt;OF&gt;" -&gt; "{of|for|per}",
            "&lt;CUR&gt;" -&gt; "{current|present|now|local}",
            "&lt;TIME&gt;" -&gt; "{time &lt;OF&gt; day|day time|date|time|moment|datetime|hour|o'clock|clock|date time|date and time|time and date}",
        ),
        List(
            new NCSemanticElement():
                override def getId: String = "time"
                override def getSynonyms: Set[String] = Set("{&lt;CUR&gt;|_} &lt;TIME&gt;", "what &lt;TIME&gt; {is it now|now|is it|_}")
        )
    ).build
):
    // Add your callbacks definition or references on them here.
</pre>
<ul>
<li>
<code>Line 5</code> shows the definition of the <code>macro</code> parameter.
</li>
<li>
<code>Line 10</code> shows the list of {% scaladoc nlp/parsers/NCSemanticElement NCSemanticElement %} instances passed as the elements parameter.
</li>
<li>
Note that using the {% scaladoc NCPipelineBuilder#withSemantic-fffff4b0 withSemantic() %}
method shown on <code>line 3</code> is optional.
You can add {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}
as a usual {% scaladoc NCEntityParser NCEntityParser %}
when you define your {% scaladoc NCPipeline NCPipeline %}.
</li>
</ul>
<p>
The following example is based on YAML semantic elements representation.
</p>
<pre class="brush: js, highlight: []">
macros:
    "&lt;OF&gt;": "{of|for|per}"
    "&lt;CUR&gt;": "{current|present|now|local}"
    "&lt;TIME&gt;": "{time &lt;OF&gt; day|day time|date|time|moment|datetime|hour|o'clock|clock|date time|date and time|time and date}"
elements:
    - id: "x:time"
      description: "Date and/or time token indicator."
      synonyms:
          - "{&lt;CUR&gt;|_} &lt;TIME&gt;"
          - "what &lt;TIME&gt; {is it now|now|is it|_}"
</pre>
<ul>
<li>
The same macros and the same element as in the previous example are defined here in the
<code>time_model.yaml</code> YAML file.
</li>
</ul>
<pre class="brush: scala, highlight: [3]">
val mdl = new NCModel(
    NCModelConfig("test.id", "Test Model", "1.0"),
    new NCPipelineBuilder().withSemantic("en", "time_model.yaml").build
):
    // Add your callbacks definition or references on them here.
</pre>
<ul>
<li>
<code>Line 3</code> creates a semantic model whose elements are defined in the <code>time_model.yaml</code> YAML file.
</li>
</ul>
<p>
If you want to use {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}
with a language other than English, you have to provide custom
{% scaladoc nlp/parsers/NCSemanticStemmer NCSemanticStemmer %} and
{% scaladoc NCTokenParser NCTokenParser %}
implementations for the required language. See the <a href="examples/light_switch_fr.html">Light Switch FR</a> example
for more details.
</p>
<pre class="brush: scala, highlight: [4, 7, 8]">
package demo

import opennlp.tools.stemmer.snowball.SnowballStemmer
import demo.nlp.token.parser.NCFrTokenParser
import org.apache.nlpcraft.nlp.parsers.*

class NCFrSemanticEntityParser(src: String) extends NCSemanticEntityParser(
    new NCSemanticStemmer:
        private val stemmer = new SnowballStemmer(SnowballStemmer.ALGORITHM.FRENCH)
        override def stem(txt: String): String = stemmer.synchronized { stemmer.stem(txt.toLowerCase).toString }
    ,
    new NCFrTokenParser(),
    mdlSrcOpt = Option(src)
)
</pre>
<ul>
<li>
<code>Line 4</code> imports <code>NCFrTokenParser</code>.
It is a custom {% scaladoc NCTokenParser NCTokenParser %}
implementation for the French language, described here: <a href="examples/light_switch_fr.html">Light Switch FR</a>.
</li>
<li>
<code>Line 8</code> defines a custom {% scaladoc nlp/parsers/NCSemanticStemmer NCSemanticStemmer %}
implementation for the French language.
</li>
<li>
As you can see, <code>NCFrSemanticEntityParser</code> is a very simple extension of the
{% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}
base class (see <code>line 7</code>).
</li>
</ul>
</section>
<section id="parser-semantic-extending">
<h3 class="sub-section-title">Language Extensions</h3>
<p>
If you want to use
{% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}
with any language other than English, you have to provide custom
{% scaladoc nlp/parsers/NCSemanticStemmer NCSemanticStemmer %} and
{% scaladoc NCTokenParser NCTokenParser %}
implementations for that language.
See the <a href="examples/light_switch_fr.html">Light Switch FR</a> example for more details.
</p>
</section>
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#parser-opennlp">OpenNLP Based Parser</a></li>
<li><a href="#parser-stanford">Stanford NLP Based Parser</a></li>
<li><a href="#parser-nlp">NLP Parser</a></li>
<li><a href="#parser-semantic">Semantic Parser</a></li>
<li><a href="#parser-semantic-ne">Semantic Parser Named Entities</a></li>
<li><a href="#parser-semantic-elements">Semantic Parser Elements</a></li>
<li><a href="#parser-semantic-examples">Semantic Parser Examples</a></li>
<li><a href="#parser-semantic-extending">Semantic Parser Language Extensions</a></li>
{% include quick-links.html %}
</ul>
</div>