---
active_crumb: Synonyms Tool
layout: documentation
id: syn_tool
fa_icon: fa-tools
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column">
<section id="overview">
<h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The synonym suggester tool takes an existing model, analyses its synonyms and intents, and produces
a list of synonyms that are currently missing and that you might want to add to your model.
</p>
<p>
This tool is accessed via a REST call. It is based on Google's BERT and Facebook's fastText
models. It requires <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSample.html">@NCIntentSample</a> or
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSampleRef.html">@NCIntentSampleRef</a> annotations to be present on intent
callbacks. When invoked, the tool scans the given data model for intents and these annotations and, based on these samples, tries to determine
which synonyms are missing in the model.
</p>
<div class="bq info">
<p>
<b>Single Word Synonyms</b>
</p>
<p>
The synonym suggester tool analyses only single-word synonyms and ignores any multi-word synonyms. You
can often convert a named element with multi-word synonyms into a combination of multiple named
elements, each with single-word synonyms, using the <a href="/data-model.html#dsl">Composable NERs</a>
technique.
</p>
</div>
</section>
<section id="usage">
<h2 class="section-title">Usage <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
To use this tool, both the <code>ctxword</code> server and the NLPCraft server must be started, and
the NLPCraft server's configuration may need to be updated.
</p>
<h2 class="section-sub-title"><code>ctxword</code> Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<div class="bq warn">
<b>Python 3.6-3.8</b>
<p>
As of this writing (Dec 2020) the <code>ctxword</code> server and its dependencies work only with Python versions 3.6-3.8.
</p>
</div>
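<p>
A minimal sketch of a pre-flight check for the version constraint above, which could be run before installing
the dependencies. The function name <code>is_supported</code> and the two constants are hypothetical and are not
part of the <code>ctxword</code> module; the supported range is copied from the note above.
</p>

```python
import sys

# Supported range taken from the note above (as of Dec 2020).
# These names are illustrative only; they do not exist in 'ctxword'.
MIN_SUPPORTED = (3, 6)
MAX_SUPPORTED = (3, 8)


def is_supported(version=None):
    """Return True if the (major, minor) version is within 3.6-3.8.

    'version' is any tuple starting with (major, minor); when omitted,
    the version of the running interpreter is used.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


# Typical use before running the install script:
#     if not is_supported():
#         sys.exit("ctxword requires Python 3.6-3.8")
```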
<p>
The <code>ctxword</code> server is a Python-based module that provides a BERT- and fastText-based implementation
for finding contextually related words for a given word from the input sentence. The NLPCraft server interacts
with the <code>ctxword</code> server via an internal REST interface. To configure the NLPCraft server and start the Python-based
<code>ctxword</code> server, follow these steps:
</p>
<ol>
<li>
<p>
Install necessary dependencies by running the following commands from the NLPCraft installation
directory:<br/>
<b>NOTE:</b> this step should only be performed once.
</p>
<nav>
<div class="nav nav-tabs" role="tablist">
<a class="nav-item nav-link active" data-toggle="tab" href="#nav-nix" role="tab">Linux/Unix/MacOS</a>
<a class="nav-item nav-link" data-toggle="tab" href="#nav-win" role="tab">Windows</a>
</div>
</nav>
<div class="tab-content">
<div class="tab-pane fade show active" id="nav-nix" role="tabpanel">
<pre class="brush: bash">
$ cd nlpcraft/src/main/python/ctxword
$ bin/install_dependencies.sh
</pre>
</div>
<div class="tab-pane fade show" id="nav-win" role="tabpanel">
<p></p>
<p>
Read <code>src\main\python\ctxword\bin\WINDOWS_SETUP.md</code> file for manual installation instructions.
</p>
</div>
</div>
</li>
<li>
<em>Optional.</em>
<br/>
Configure the <code>nlpcraft.server.ctxword.url</code> property in the <code>nlpcraft.conf</code> file (or your own configuration file).
This property comes with a default endpoint, and you only need to change it if you change the
<code>ctxword</code> module implementation.
</li>
<li>
Start the <code>ctxword</code> server by running the following commands from the NLPCraft installation directory:
<pre class="brush: bash">
$ cd nlpcraft/src/main/python/ctxword
$ bin/start_server.{sh|cmd}
</pre>
<div class="bq info">
<p>
<b>1st Start</b>
</p>
Note that on the first start the server will try to load the compressed BERT model, which is not yet
available. It will then download and compress the model, which will take several minutes
and may require 10 GB+ of available memory. Subsequent starts will skip this step, and the
server will start much faster.
</div>
</li>
</ol>
<h2 class="section-sub-title">REST Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The NLPCraft REST server must be <a href="/server-and-probe.html#server">started</a>.
</p>
<h2 class="section-sub-title">Running <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The synonyms tool can be run in two different ways:
</p>
<nav>
<div class="nav nav-tabs" role="tablist">
<a class="nav-item nav-link active" data-toggle="tab" href="#nav-script" role="tab">NLPCraft CLI</a>
<a class="nav-item nav-link" data-toggle="tab" href="#nav-rest" role="tab">REST Call</a>
</div>
</nav>
<div class="tab-content">
<div class="tab-pane fade show active" id="nav-script" role="tabpanel">
<pre class="brush: bash">
$ bin/nlpcraft.sh help --cmd=model-sugsyn
$ bin/nlpcraft.sh model-sugsyn --mdlId=nlpcraft.alarm.ex --minScore=0.5
</pre>
<p>
<b>NOTES:</b>
</p>
<ul>
<li>
<code>mdlId</code> parameter is only required if there is more than one model
deployed in the connected data probe. If the data probe has only one model you can
omit this parameter.
</li>
<li>
<code>minScore</code> - Optional minimum confidence score for a suggestion to be included in the result, ranging from 0 to 1; the default is 0.
A <code>minScore</code> of 0 will include all results, and a <code>minScore</code> of 1 will include only results
with the absolutely highest confidence score. Values between 0.5 and 0.7 are generally suggested.
</li>
<li>
<a href="/tools/script.html">NLPCraft CLI</a> is available as <code>nlpcraft.sh</code> for
<i class="fab fa-fw fa-linux"></i> and <code>nlpcraft.cmd</code>
for <i class="fab fa-fw fa-windows"></i>.
</li>
<li>
Run <code class="script">bin/nlpcraft.sh help --cmd=model-sugsyn</code> to get full help on this command.
</li>
</ul>
</div>
<div class="tab-pane fade show" id="nav-rest" role="tabpanel">
<p></p>
<p>
The <a href="/using-rest.html">REST API</a> accepts only <code>POST</code> HTTP calls and the <code>application/json</code> content type
for JSON payloads and responses. When issuing a REST call for this tool you will be using the following URL:
</p>
<pre class="brush: plain">
https://localhost:8081/api/v1/model/sugsyn
</pre>
<p>
where:
</p>
<dl>
<dt><code>https</code></dt>
<dd>Either <code>http</code> or <code>https</code> protocol.</dd>
<dt><code>localhost:8081</code></dt>
<dd>Host and port on which REST server is started. <code>localhost:8081</code> is the default configuration and can be <a href="/server-and-probe.html">changed</a>.</dd>
<dt><code>/api/v1</code></dt>
<dd>Mandatory prefix indicating API version.</dd>
<dt><code>model/sugsyn</code></dt>
<dd>Synonym suggester REST call.</dd>
</dl>
<p>
The parameters should be passed in as JSON:
</p>
<pre class="brush: js">
{
    "acsTok": "qweqw9123uqwe",
    "mdlId": "nlpcraft.alarm.ex",
    "minScore": 0.5
}
</pre>
<p>
where:
</p>
<ul>
<li>
<code>acsTok</code> - access token obtained via a previous <code>'/signin'</code> call.
</li>
<li>
<code>mdlId</code> - ID of the model to run synonym suggester on.
</li>
<li>
<code>minScore</code> - Optional minimum confidence score for a suggestion to be included in the result, ranging from 0 to 1; the default is 0.
A <code>minScore</code> of 0 will include all results, and a <code>minScore</code> of 1 will include only results
with the absolutely highest confidence score. Values between 0.5 and 0.7 are generally suggested.
</li>
</ul>
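<p>
As an illustration of the call described above, the following is a minimal Python sketch that builds and sends the
<code>model/sugsyn</code> request using only the standard library. The function names
<code>build_sugsyn_request</code> and <code>suggest_synonyms</code> are hypothetical, the base URL assumes the
default <code>localhost:8081</code> endpoint over plain <code>http</code>, and the access token is assumed to have
been obtained from an earlier <code>'/signin'</code> call.
</p>

```python
import json
import urllib.request


def build_sugsyn_request(acs_tok, mdl_id, min_score=0.0,
                         base_url="http://localhost:8081/api/v1"):
    """Build (but do not send) a POST request for the 'model/sugsyn' call.

    'base_url' is an assumption: adjust the protocol, host and port to
    match your REST server configuration.
    """
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("minScore must be between 0 and 1")

    # Parameter names are the ones documented above.
    payload = {"acsTok": acs_tok, "mdlId": mdl_id, "minScore": min_score}

    return urllib.request.Request(
        url=base_url + "/model/sugsyn",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def suggest_synonyms(acs_tok, mdl_id, min_score=0.0):
    """Send the request to a running REST server and return the decoded JSON."""
    req = build_sugsyn_request(acs_tok, mdl_id, min_score)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

<p>
Keeping request construction separate from sending makes it possible to inspect the payload without a running server;
<code>suggest_synonyms</code> requires both the REST server and the <code>ctxword</code> server to be up.
</p>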
</div>
</div>
<p>
Either way, the synonym suggester returns a JSON result like the following (for the <code>nlpcraft.alarm.ex</code>
model from the <a href="/examples/alarm_clock.html">Alarm</a> example):
</p>
<pre class="brush: js">
{
    "status": "API_OK",
    "result": {
        "modelId": "nlpcraft.alarm.ex",
        "minScore": 0.5,
        "durationMs": 424.0,
        "timestamp": 1.60091239852E12,
        "suggestions": [
            {
                "x:alarm": [
                    {
                        "score": 1.0,
                        "synonym": "ask"
                    },
                    {
                        "score": 0.9477103542042674,
                        "synonym": "join"
                    },
                    {
                        "score": 0.8882341083867801,
                        "synonym": "get"
                    },
                    {
                        "score": 0.7330826349218547,
                        "synonym": "remember"
                    },
                    {
                        "score": 0.6902880910527778,
                        "synonym": "contact"
                    },
                    {
                        "score": 0.6014764219771813,
                        "synonym": "time"
                    },
                    {
                        "score": 0.5816398376889104,
                        "synonym": "follow"
                    },
                    {
                        "score": 0.5640882890681899,
                        "synonym": "watch"
                    },
                    {
                        "score": 0.5139855649326083,
                        "synonym": "stop"
                    },
                    {
                        "score": 0.5136895804732818,
                        "synonym": "kill"
                    },
                    {
                        "score": 0.5001167992233122,
                        "synonym": "send"
                    }
                ]
            }
        ],
        "warnings": [
            "Model has too few (3) intents samples. It will negatively affect the quality of suggestions. Try to increase overall sample count to at least 20."
        ]
    }
}
</pre>
<p>
The result is structured as a list of proposed synonyms, with their corresponding scores, for each model
element.
You should analyse the results for their fitness for your model and its existing synonyms. The tool cannot guarantee
that every suggested synonym is appropriate or valid, but it provides a good "courtesy" check for potentially
missing synonyms.
</p>
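<p>
One way to review such a result is to group the suggestions by element and keep only those above a chosen score.
The sketch below assumes the response has already been decoded into a Python dictionary with the structure shown
above; the function name <code>filter_suggestions</code> is hypothetical and is not part of NLPCraft.
</p>

```python
def filter_suggestions(response, min_score=0.5):
    """Return {element_id: [(synonym, score), ...]} sorted by descending score.

    'response' is the decoded JSON result of the synonym suggester,
    following the structure shown in the example above.
    """
    if response.get("status") != "API_OK":
        raise ValueError("unexpected status: %r" % response.get("status"))

    picked = {}
    # Each entry of 'suggestions' maps an element ID to its candidates.
    for group in response["result"]["suggestions"]:
        for elem_id, candidates in group.items():
            for cand in candidates:
                if cand["score"] >= min_score:
                    picked.setdefault(elem_id, []).append(
                        (cand["synonym"], cand["score"])
                    )

    # Highest-confidence suggestions first for each element.
    for pairs in picked.values():
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return picked
```

<p>
Filtering on the client side like this is only a convenience for reviewing a result that has already been fetched;
passing <code>minScore</code> in the request achieves the same cut-off on the server.
</p>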
<div class="bq info">
<p>
<b>Run Periodically</b>
</p>
<p>
It is a good idea to run this tool periodically if you are actively changing the model. With dozens or hundreds
of model elements it is very hard to manually maintain a high-quality set of synonyms. With a good list of
user input samples for each intent, this tool can be indispensable for easy maintenance of the synonyms.
</p>
</div>
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#usage">Usage</a></li>
{% include quick-links.html %}
</ul>
</div>