tools/syn_tool.html - incubator-nlpcraft-website - Git at Google

 ---
 active_crumb: Synonyms Tool
 layout: documentation
 id: syn_tool
 fa_icon: fa-tools
 ---

 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->

 <div class="col-md-8 second-column">
     <section id="overview">
         <h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
             Synonym suggester tool takes an existing model, analyses its synonyms and intents and comes up with
             a list of synonyms that are currently missing that you might want to add to your model.
         </p>
         <p>
             This tool is accessed via REST call. It is based on Google's BERT and Facebook fasttext
             models. It requires <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSample.html">@NCIntentSample</a> or
             <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSampleRef.html">@NCIntentSampleRef</a> annotations present on intent
             callbacks. When invoked, the tool scans the given data model for intents and these annotations, and based on these samples tries to determine
             which synonyms are missing in the model.
         </p>
         <div class="bq info">
             <p>
                 <b>Single Word Synonyms</b>
             </p>
             <p>
                 Synonym suggester tool analyses only single word synonyms ignoring any multi-word synonyms. You
                 can often convert a named element with multi-word synonyms into a combination of multiple named
                 elements each with a single word synonyms using <a href="/data-model.html#dsl">Composable NERs</a>
                 technique.
             </p>
         </div>
     </section>
     <section id="usage">
         <h2 class="section-title">Usage <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
             In order to use this tool the <code>ctxword</code> server and NLPCraft server should be started as well as
             the server's configuration should potentially be updated.
         </p>
         <h2 class="section-sub-title"><code>ctxword</code> Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <div class="bq warn">
             <b>Python 3.6-3.8</b>
             <p>
                 As of this writing (Dec 2020) the <code>ctxword</code> server and its dependencies work only with Python 3.6-3.8 version.
             </p>
         </div>
         <p>
             'ctxword' server is a Python-based module that provides BERT and fasttext based implementation
             for finding a contextually related words for a given word from the input sentence. NLPCraft server interacts
             with 'ctxword' server via internal REST interface. To configure NLPCraft server and start 'ctxword' Python-based
             server follow these steps:
         </p>
         <ol>
             <li>
                 <p>
                     Install necessary dependencies by running the following commands from the NLPCraft installation
                     directory:<br/>
                     <b>NOTE:</b> this step should only be performed once.
                 </p>
                 <nav>
                     <div class="nav nav-tabs" role="tablist">
                         <a class="nav-item nav-link active" data-toggle="tab" href="#nav-nix" role="tab">Linux/Unix/MacOS</a>
                         <a class="nav-item nav-link" data-toggle="tab" href="#nav-win" role="tab">Windows</a>
                     </div>
                 </nav>
                 <div class="tab-content">
                     <div class="tab-pane fade show active" id="nav-nix" role="tabpanel">
                         <pre class="brush: bash">
                             $ cd nlpcraft/src/main/python/ctxword
                             $ bin/install_dependencies.sh
                         </pre>
                     </div>
                     <div class="tab-pane fade show" id="nav-win" role="tabpanel">
                         <p></p>
                         <p>
                             Read <code>src\main\python\ctxword\bin\WINDOWS_SETUP.md</code> file for manual installation instructions.
                         </p>
                     </div>
                 </div>
             </li>
             <li>
                 <em>Optional.</em>
                 <br/>
                 Configure <code>nlpcraft.server.ctxword.url</code> property in <code>nlpcraft.conf</code> file (or your own configuration file).
                 This property comes with a default endpoint and you only need to change it if you change the
                 'ctxword' module implementation.
             </li>
             <li>
                 Start the 'ctxword' server by running the following commands from NLPCraft installation directory:
                 <pre class="brush: bash">
                     $ cd nlpcraft/src/main/python/ctxword
                     $ bin/start_server.{sh|cmd}
                 </pre>
                 <div class="bq info">
                     <p>
                         <b>1st Start</b>
                     </p>
                     Note that on the first start the server will try to load compressed BERT model which is not yet
                     available. It will then download this library and compress it which will take a several minutes
                     and may require 10 GB+ of available memory. Subsequent starts will skip this step, and the
                     server will start much faster.
                 </div>
             </li>
         </ol>
         <h2 class="section-sub-title">REST Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
             REST server should be <a href="/server-and-probe.html#server">started</a>.
         </p>
         <h2 class="section-sub-title">Running <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
             Synonyms tool can be run in two different ways:
         </p>
         <nav>
             <div class="nav nav-tabs" role="tablist">
                 <a class="nav-item nav-link active" data-toggle="tab" href="#nav-script" role="tab">NLPCraft CLI</a>
                 <a class="nav-item nav-link" data-toggle="tab" href="#nav-rest" role="tab">REST Call</a>
             </div>
         </nav>
         <div class="tab-content">
             <div class="tab-pane fade show active" id="nav-script" role="tabpanel">
                 <pre class="brush: bash">
                     $ bin/nlpcraft.sh help --cmd=model-sugsyn
                     $ bin/nlpcraft.sh model-sugsyn --mdlId=nlpcraft.alarm.ex --minScore=0.5
                 </pre>
                 <p>
                     <b>NOTES:</b>
                 </p>
                 <ul>
                     <li>
                         <code>mldId</code> parameter is only required if there is more than one model
                         deployed in the connected data probe. If the data probe has only one model you can
                         ommit this parameter.
                     </li>
                     <li>
                         <code>minScore</code> - Optional minimum confidence score to include into the result, ranging from 0 to 1, default is 0.
                         <code>minScore</code> of 0 will include all results, and <code>minScore</code> of 1 will include only results
                         with the absolutely highest confidence score. Values between 0.5 and 0.7 is generally suggested.
                     </li>
                     <li>
                         <a href="/tools/script.html">NLPCraft CLI</a> is available as <code>nlpcraft.sh</code> for
                         <i class="fab fa-fw fa-linux"></i> and <code>nlpcraft.cmd</code>
                         for <i class="fab fa-fw fa-windows"></i>.
                     </li>
                     <li>
                         Run <code class="script">bin/nlpcraft.sh help --cmd=model-sugsyn</code> to get a full help on this command.
                     </li>
                 </ul>
             </div>
             <div class="tab-pane fade show" id="nav-rest" role="tabpanel">
                 <p></p>
                 <p>
                     <a href="/using-rest.html">REST API</a> accepts only <code>POST</code> HTTP calls and <code>application/json</code> content type
                     for JSON payload and responses. When issuing a REST call for this tool you will be using the following URL:
                 </p>
                 <pre class="brush: plain">
                     https://localhost:8081/api/v1/model/sugsyn
                 </pre>
                 <p>
                     where:
                 <dl>
                     <dt><code>http</code></dt>
                     <dd>Either <code>http</code> or <code>https</code> protocol.</dd>
                     <dt><code>localhost:8081</code></dt>
                     <dd>Host and port on which REST server is started. <code>localhost:8081</code> is the default configuration and can be <a href="/server-and-probe.html">changed</a>.</dd>
                     <dt><code>/api/v1</code></dt>
                     <dd>Mandatory prefix indicating API version.</dd>
                     <dt><code>model/sugsyn</code></dt>
                     <dd>Synonym suggester REST call.</dd>
                 </dl>
                 <p>
                     The parameters should be passed in as JSON:
                 </p>
                 <pre class="brush: js">
         {
             "acsTok": "qweqw9123uqwe",
             "mdlId": "nlpcraft.alarm.ex",
             "minScore": 0.5
         }
                 </pre>
                 <p>
                     where:
                 </p>
                 <ul>
                     <li>
                         <code>acsTok</code> - access token obtain via previous <code>'/signin'</code> call.
                     </li>
                     <li>
                         <code>mdlId</code> - ID of the model to run synonym suggester on.
                     </li>
                     <li>
                         <code>minScore</code> - Optional minimum confidence score to include into the result, ranging from 0 to 1, default is 0.
                         <code>minScore</code> of 0 will include all results, and <code>minScore</code> of 1 will include only results
                         with the absolutely highest confidence score. Values between 0.5 and 0.7 is generally suggested.
                     </li>
                 </ul>
             </div>
         </div>
         <p>
             Either way the synonym suggester returns the following JSON result (<code>nlpcraft.alarm.ex</code>
             model from <a href="/examples/alarm_clock.html">Alarm</a> example):
         </p>
         <pre class="brush: js">
 {
 "status": "API_OK",
 "result": {
   "modelId": "nlpcraft.alarm.ex",
   "minScore": 0.5,
   "durationMs": 424.0,
   "timestamp": 1.60091239852E12,
   "suggestions": [
     {
       "x:alarm": [
         {
           "score": 1.0,
           "synonym": "ask"
         },
         {
           "score": 0.9477103542042674,
           "synonym": "join"
         },
         {
           "score": 0.8882341083867801,
           "synonym": "get"
         },
         {
           "score": 0.7330826349218547,
           "synonym": "remember"
         },
         {
           "score": 0.6902880910527778,
           "synonym": "contact"
         },
         {
           "score": 0.6014764219771813,
           "synonym": "time"
         },
         {
           "score": 0.5816398376889104,
           "synonym": "follow"
         },
         {
           "score": 0.5640882890681899,
           "synonym": "watch"
         },
         {
           "score": 0.5139855649326083,
           "synonym": "stop"
         },
         {
           "score": 0.5136895804732818,
           "synonym": "kill"
         },
         {
           "score": 0.5001167992233122,
           "synonym": "send"
         }
       ]
     }
   ],
   "warnings": [
     "Model has too few (3) intents samples. It will negatively affect the quality of suggestions. Try to increase overall sample count to at least 20."
   ]
 }
         </pre>
         <p>
             The result is structured as a list of proposed synonyms with their corresponding scores for each model's
             element.
             You should analyse the results for their fitness for your model and its existing synonyms. The tool cannot guarantee
             that every suggested synonym is appropriate or valid - but it gives a good "courtesy" check for potentially
             missing synonyms.
         </p>
         <div class="bq info">
             <p>
                 <b>Run Periodically</b>
             </p>
             <p>
                 It is a good idea to run this tool periodically if you are actively changing the model. With dozens or hundreds
                 of model elements it is very hard to manually maintain quality set of synonyms. With a good list of
                 user input samples for each intent this tool can be indispensable for easy maintenance of the synonyms.
             </p>
         </div>
     </section>
 </div>
 <div class="col-md-2 third-column">
     <ul class="side-nav">
         <li class="side-nav-title">On This Page</li>
         <li><a href="#overview">Overview</a></li>
         <li><a href="#usage">Usage</a></li>
         {% include quick-links.html %}
     </ul>
 </div>
	---
	active_crumb: Synonyms Tool
	layout: documentation
	id: syn_tool
	fa_icon: fa-tools
	---

	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->

	<div class="col-md-8 second-column">
	<section id="overview">
	<h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
	<p>
	Synonym suggester tool takes an existing model, analyses its synonyms and intents and comes up with
	a list of synonyms that are currently missing that you might want to add to your model.
	</p>
	<p>
	This tool is accessed via REST call. It is based on Google's BERT and Facebook fasttext
	models. It requires <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSample.html">@NCIntentSample</a> or
	<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSampleRef.html">@NCIntentSampleRef</a> annotations present on intent
	callbacks. When invoked, the tool scans the given data model for intents and these annotations, and based on these samples tries to determine
	which synonyms are missing in the model.
	</p>
	<div class="bq info">
	<p>
	<b>Single Word Synonyms</b>
	</p>
	<p>
	Synonym suggester tool analyses only single word synonyms ignoring any multi-word synonyms. You
	can often convert a named element with multi-word synonyms into a combination of multiple named
	elements each with a single word synonyms using <a href="/data-model.html#dsl">Composable NERs</a>
	technique.
	</p>
	</div>
	</section>
	<section id="usage">
	<h2 class="section-title">Usage <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
	<p>
	In order to use this tool the <code>ctxword</code> server and NLPCraft server should be started as well as
	the server's configuration should potentially be updated.
	</p>
	<h2 class="section-sub-title"><code>ctxword</code> Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
	<div class="bq warn">
	<b>Python 3.6-3.8</b>
	<p>
	As of this writing (Dec 2020) the <code>ctxword</code> server and its dependencies work only with Python 3.6-3.8 version.
	</p>
	</div>
	<p>
	'ctxword' server is a Python-based module that provides BERT and fasttext based implementation
	for finding a contextually related words for a given word from the input sentence. NLPCraft server interacts
	with 'ctxword' server via internal REST interface. To configure NLPCraft server and start 'ctxword' Python-based
	server follow these steps:
	</p>
	<ol>
	<li>
	<p>
	Install necessary dependencies by running the following commands from the NLPCraft installation
	directory:<br/>
	<b>NOTE:</b> this step should only be performed once.
	</p>
	<nav>
	<div class="nav nav-tabs" role="tablist">
	<a class="nav-item nav-link active" data-toggle="tab" href="#nav-nix" role="tab">Linux/Unix/MacOS</a>
	<a class="nav-item nav-link" data-toggle="tab" href="#nav-win" role="tab">Windows</a>
	</div>
	</nav>
	<div class="tab-content">
	<div class="tab-pane fade show active" id="nav-nix" role="tabpanel">
	<pre class="brush: bash">
	$ cd nlpcraft/src/main/python/ctxword
	$ bin/install_dependencies.sh
	</pre>
	</div>
	<div class="tab-pane fade show" id="nav-win" role="tabpanel">
	<p></p>
	<p>
	Read <code>src\main\python\ctxword\bin\WINDOWS_SETUP.md</code> file for manual installation instructions.
	</p>
	</div>
	</div>
	</li>
	<li>
	<em>Optional.</em>
	<br/>
	Configure <code>nlpcraft.server.ctxword.url</code> property in <code>nlpcraft.conf</code> file (or your own configuration file).
	This property comes with a default endpoint and you only need to change it if you change the
	'ctxword' module implementation.
	</li>
	<li>
	Start the 'ctxword' server by running the following commands from NLPCraft installation directory:
	<pre class="brush: bash">
	$ cd nlpcraft/src/main/python/ctxword
	$ bin/start_server.{sh\|cmd}
	</pre>
	<div class="bq info">
	<p>
	<b>1st Start</b>
	</p>
	Note that on the first start the server will try to load compressed BERT model which is not yet
	available. It will then download this library and compress it which will take a several minutes
	and may require 10 GB+ of available memory. Subsequent starts will skip this step, and the
	server will start much faster.
	</div>
	</li>
	</ol>
	<h2 class="section-sub-title">REST Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
	<p>
	REST server should be <a href="/server-and-probe.html#server">started</a>.
	</p>
	<h2 class="section-sub-title">Running <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
	<p>
	Synonyms tool can be run in two different ways:
	</p>
	<nav>
	<div class="nav nav-tabs" role="tablist">
	<a class="nav-item nav-link active" data-toggle="tab" href="#nav-script" role="tab">NLPCraft CLI</a>
	<a class="nav-item nav-link" data-toggle="tab" href="#nav-rest" role="tab">REST Call</a>
	</div>
	</nav>
	<div class="tab-content">
	<div class="tab-pane fade show active" id="nav-script" role="tabpanel">
	<pre class="brush: bash">
	$ bin/nlpcraft.sh help --cmd=model-sugsyn
	$ bin/nlpcraft.sh model-sugsyn --mdlId=nlpcraft.alarm.ex --minScore=0.5
	</pre>
	<p>
	<b>NOTES:</b>
	</p>
	<ul>
	<li>
	<code>mldId</code> parameter is only required if there is more than one model
	deployed in the connected data probe. If the data probe has only one model you can
	ommit this parameter.
	</li>
	<li>
	<code>minScore</code> - Optional minimum confidence score to include into the result, ranging from 0 to 1, default is 0.
	<code>minScore</code> of 0 will include all results, and <code>minScore</code> of 1 will include only results
	with the absolutely highest confidence score. Values between 0.5 and 0.7 is generally suggested.
	</li>
	<li>
	<a href="/tools/script.html">NLPCraft CLI</a> is available as <code>nlpcraft.sh</code> for
	<i class="fab fa-fw fa-linux"></i> and <code>nlpcraft.cmd</code>
	for <i class="fab fa-fw fa-windows"></i>.
	</li>
	<li>
	Run <code class="script">bin/nlpcraft.sh help --cmd=model-sugsyn</code> to get a full help on this command.
	</li>
	</ul>
	</div>
	<div class="tab-pane fade show" id="nav-rest" role="tabpanel">
	<p></p>
	<p>
	<a href="/using-rest.html">REST API</a> accepts only <code>POST</code> HTTP calls and <code>application/json</code> content type
	for JSON payload and responses. When issuing a REST call for this tool you will be using the following URL:
	</p>
	<pre class="brush: plain">
	https://localhost:8081/api/v1/model/sugsyn
	</pre>
	<p>
	where:
	<dl>
	<dt><code>http</code></dt>
	<dd>Either <code>http</code> or <code>https</code> protocol.</dd>
	<dt><code>localhost:8081</code></dt>
	<dd>Host and port on which REST server is started. <code>localhost:8081</code> is the default configuration and can be <a href="/server-and-probe.html">changed</a>.</dd>
	<dt><code>/api/v1</code></dt>
	<dd>Mandatory prefix indicating API version.</dd>
	<dt><code>model/sugsyn</code></dt>
	<dd>Synonym suggester REST call.</dd>
	</dl>
	<p>
	The parameters should be passed in as JSON:
	</p>
	<pre class="brush: js">
	{
	"acsTok": "qweqw9123uqwe",
	"mdlId": "nlpcraft.alarm.ex",
	"minScore": 0.5
	}
	</pre>
	<p>
	where:
	</p>
	<ul>
	<li>
	<code>acsTok</code> - access token obtain via previous <code>'/signin'</code> call.
	</li>
	<li>
	<code>mdlId</code> - ID of the model to run synonym suggester on.
	</li>
	<li>
	<code>minScore</code> - Optional minimum confidence score to include into the result, ranging from 0 to 1, default is 0.
	<code>minScore</code> of 0 will include all results, and <code>minScore</code> of 1 will include only results
	with the absolutely highest confidence score. Values between 0.5 and 0.7 is generally suggested.
	</li>
	</ul>
	</div>
	</div>
	<p>
	Either way the synonym suggester returns the following JSON result (<code>nlpcraft.alarm.ex</code>
	model from <a href="/examples/alarm_clock.html">Alarm</a> example):
	</p>
	<pre class="brush: js">
	{
	"status": "API_OK",
	"result": {
	"modelId": "nlpcraft.alarm.ex",
	"minScore": 0.5,
	"durationMs": 424.0,
	"timestamp": 1.60091239852E12,
	"suggestions": [
	{
	"x:alarm": [
	{
	"score": 1.0,
	"synonym": "ask"
	},
	{
	"score": 0.9477103542042674,
	"synonym": "join"
	},
	{
	"score": 0.8882341083867801,
	"synonym": "get"
	},
	{
	"score": 0.7330826349218547,
	"synonym": "remember"
	},
	{
	"score": 0.6902880910527778,
	"synonym": "contact"
	},
	{
	"score": 0.6014764219771813,
	"synonym": "time"
	},
	{
	"score": 0.5816398376889104,
	"synonym": "follow"
	},
	{
	"score": 0.5640882890681899,
	"synonym": "watch"
	},
	{
	"score": 0.5139855649326083,
	"synonym": "stop"
	},
	{
	"score": 0.5136895804732818,
	"synonym": "kill"
	},
	{
	"score": 0.5001167992233122,
	"synonym": "send"
	}
	]
	}
	],
	"warnings": [
	"Model has too few (3) intents samples. It will negatively affect the quality of suggestions. Try to increase overall sample count to at least 20."
	]
	}
	</pre>
	<p>
	The result is structured as a list of proposed synonyms with their corresponding scores for each model's
	element.
	You should analyse the results for their fitness for your model and its existing synonyms. The tool cannot guarantee
	that every suggested synonym is appropriate or valid - but it gives a good "courtesy" check for potentially
	missing synonyms.
	</p>
	<div class="bq info">
	<p>
	<b>Run Periodically</b>
	</p>
	<p>
	It is a good idea to run this tool periodically if you are actively changing the model. With dozens or hundreds
	of model elements it is very hard to manually maintain quality set of synonyms. With a good list of
	user input samples for each intent this tool can be indispensable for easy maintenance of the synonyms.
	</p>
	</div>
	</section>
	</div>
	<div class="col-md-2 third-column">
	<ul class="side-nav">
	<li class="side-nav-title">On This Page</li>
	<li><a href="#overview">Overview</a></li>
	<li><a href="#usage">Usage</a></li>
	{% include quick-links.html %}
	</ul>
	</div>