| --- |
| active_crumb: Synonyms Tool |
| layout: documentation |
| id: syn_tool |
| fa_icon: fa-tools |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column"> |
| <section id="overview"> |
| <h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| The synonym suggester tool takes an existing model, analyses its synonyms and intents, and comes up with |
| a list of currently missing synonyms that you might want to add to your model. |
| </p> |
| <p> |
| This tool is accessed via a REST call. It is based on Google's BERT and Facebook's fastText |
| models. It requires <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSample.html">@NCIntentSample</a> or |
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSampleRef.html">@NCIntentSampleRef</a> annotations to be present on intent |
| callbacks. When invoked, the tool scans the given data model for intents and these annotations and, based on the samples they provide, tries to determine |
| which synonyms are missing from the model. |
| </p> |
| <div class="bq info"> |
| <p> |
| <b>Single Word Synonyms</b> |
| </p> |
| <p> |
| The synonym suggester tool analyses only single-word synonyms and ignores any multi-word synonyms. You |
| can often convert a named element with multi-word synonyms into a combination of multiple named |
| elements, each with single-word synonyms, using the <a href="/data-model.html#dsl">Composable NERs</a> |
| technique. |
| </p> |
| </div> |
| </section> |
| <section id="usage"> |
| <h2 class="section-title">Usage <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To use this tool, both the <code>ctxword</code> server and the NLPCraft server must be started. |
| The NLPCraft server's configuration may also need to be updated. |
| </p> |
| <h2 class="section-sub-title"><code>ctxword</code> Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <div class="bq warn"> |
| <b>Python 3.6-3.8</b> |
| <p> |
| As of this writing (Dec 2020) the <code>ctxword</code> server and its dependencies work only with Python versions 3.6 through 3.8. |
| </p> |
| </div> |
| <p> |
| The <code>ctxword</code> server is a Python-based module that provides a BERT and fastText based implementation |
| for finding contextually related words for a given word from the input sentence. The NLPCraft server interacts |
| with the <code>ctxword</code> server via an internal REST interface. To configure the NLPCraft server and start the Python-based |
| <code>ctxword</code> server, follow these steps: |
| </p> |
| <ol> |
| <li> |
| <p> |
| Install necessary dependencies by running the following commands from the NLPCraft installation |
| directory:<br/> |
| <b>NOTE:</b> this step should only be performed once. |
| </p> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#nav-nix" role="tab">Linux/Unix/MacOS</a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-win" role="tab">Windows</a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="nav-nix" role="tabpanel"> |
| <pre class="brush: bash"> |
| $ cd nlpcraft/src/main/python/ctxword |
| $ bin/install_dependencies.sh |
| </pre> |
| </div> |
| <div class="tab-pane fade show" id="nav-win" role="tabpanel"> |
| <p></p> |
| <p> |
| Read <code>src\main\python\ctxword\bin\WINDOWS_SETUP.md</code> file for manual installation instructions. |
| </p> |
| </div> |
| </div> |
| </li> |
| <li> |
| <em>Optional.</em> |
| <br/> |
| Configure the <code>nlpcraft.server.ctxword.url</code> property in the <code>nlpcraft.conf</code> file (or your own configuration file). |
| This property comes with a default endpoint, and you only need to change it if you change the |
| <code>ctxword</code> module implementation. |
| </li> |
| <li> |
| Start the <code>ctxword</code> server by running the following commands from the NLPCraft installation directory: |
| <pre class="brush: bash"> |
| $ cd nlpcraft/src/main/python/ctxword |
| $ bin/start_server.{sh|cmd} |
| </pre> |
| <div class="bq info"> |
| <p> |
| <b>1st Start</b> |
| </p> |
| Note that on the first start the server will try to load a compressed BERT model, which is not yet |
| available. It will then download the model and compress it, which will take several minutes |
| and may require 10 GB+ of available memory. Subsequent starts will skip this step, and the |
| server will start much faster. |
| </div> |
| </li> |
| </ol> |
| <h2 class="section-sub-title">REST Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| The NLPCraft REST server must be <a href="/server-and-probe.html#server">started</a>. |
| </p> |
| <h2 class="section-sub-title">Running <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| The synonyms tool can be run in two ways: |
| </p> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#nav-script" role="tab">NLPCraft CLI</a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-rest" role="tab">REST Call</a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="nav-script" role="tabpanel"> |
| <pre class="brush: bash"> |
| $ bin/nlpcraft.sh help --cmd=model-sugsyn |
| $ bin/nlpcraft.sh model-sugsyn --mdlId=nlpcraft.alarm.ex --minScore=0.5 |
| </pre> |
| <p> |
| <b>NOTES:</b> |
| </p> |
| <ul> |
| <li> |
| <code>mdlId</code> parameter is only required if there is more than one model |
| deployed in the connected data probe. If the data probe has only one model you can |
| omit this parameter. |
| </li> |
| <li> |
| <code>minScore</code> - Optional minimum confidence score for a suggestion to be included in the result, ranging from 0 to 1; the default is 0. |
| A <code>minScore</code> of 0 will include all results, and a <code>minScore</code> of 1 will include only results |
| with the highest possible confidence score. Values between 0.5 and 0.7 are generally suggested. |
| </li> |
| <li> |
| <a href="/tools/script.html">NLPCraft CLI</a> is available as <code>nlpcraft.sh</code> for |
| <i class="fab fa-fw fa-linux"></i> and <code>nlpcraft.cmd</code> |
| for <i class="fab fa-fw fa-windows"></i>. |
| </li> |
| <li> |
| Run <code class="script">bin/nlpcraft.sh help --cmd=model-sugsyn</code> to get full help on this command. |
| </li> |
| </ul> |
| </div> |
| <div class="tab-pane fade show" id="nav-rest" role="tabpanel"> |
| <p></p> |
| <p> |
| <a href="/using-rest.html">REST API</a> accepts only <code>POST</code> HTTP calls and <code>application/json</code> content type |
| for JSON payload and responses. When issuing a REST call for this tool you will be using the following URL: |
| </p> |
| <pre class="brush: plain"> |
| http://localhost:8081/api/v1/model/sugsyn |
| </pre> |
| <p> |
| where: |
| </p> |
| <dl> |
| <dt><code>http</code></dt> |
| <dd>Either <code>http</code> or <code>https</code> protocol.</dd> |
| <dt><code>localhost:8081</code></dt> |
| <dd>Host and port on which REST server is started. <code>localhost:8081</code> is the default configuration and can be <a href="/server-and-probe.html">changed</a>.</dd> |
| <dt><code>/api/v1</code></dt> |
| <dd>Mandatory prefix indicating API version.</dd> |
| <dt><code>model/sugsyn</code></dt> |
| <dd>Synonym suggester REST call.</dd> |
| </dl> |
| <p> |
| The parameters should be passed in as JSON: |
| </p> |
| <pre class="brush: js"> |
| { |
| "acsTok": "qweqw9123uqwe", |
| "mdlId": "nlpcraft.alarm.ex", |
| "minScore": 0.5 |
| } |
| </pre> |
| <p> |
| where: |
| </p> |
| <ul> |
| <li> |
| <code>acsTok</code> - access token obtained via a previous <code>'/signin'</code> call. |
| </li> |
| <li> |
| <code>mdlId</code> - ID of the model to run synonym suggester on. |
| </li> |
| <li> |
| <code>minScore</code> - Optional minimum confidence score for a suggestion to be included in the result, ranging from 0 to 1; the default is 0. |
| A <code>minScore</code> of 0 will include all results, and a <code>minScore</code> of 1 will include only results |
| with the highest possible confidence score. Values between 0.5 and 0.7 are generally suggested. |
| </li> |
| </ul> |
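<p>
As a rough illustration, the call described above could be assembled in Python using only the standard library. This is a minimal sketch, not part of NLPCraft: the helper names <code>build_sugsyn_request</code> and <code>suggest_synonyms</code> are hypothetical, and the base URL assumes the default <code>localhost:8081</code> endpoint over plain <code>http</code>.
</p>

```python
import json
import urllib.request

# Assumption: default host, port and API prefix as described above.
BASE_URL = "http://localhost:8081/api/v1"


def build_sugsyn_request(acs_tok, mdl_id, min_score=None, base_url=BASE_URL):
    """Builds (but does not send) the POST request for the 'model/sugsyn' call."""
    payload = {"acsTok": acs_tok, "mdlId": mdl_id}
    if min_score is not None:
        # The documented range for 'minScore' is 0 to 1.
        if not 0 <= min_score <= 1:
            raise ValueError("minScore must be between 0 and 1")
        payload["minScore"] = min_score
    return urllib.request.Request(
        url=f"{base_url}/model/sugsyn",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def suggest_synonyms(acs_tok, mdl_id, min_score=None):
    """Sends the request to a running REST server and returns the decoded JSON."""
    req = build_sugsyn_request(acs_tok, mdl_id, min_score)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

<p>
Keeping request construction separate from sending makes it possible to inspect the payload without a running server; <code>suggest_synonyms</code> requires both servers to be up and a valid access token.
</p>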
| </div> |
| </div> |
| <p> |
| Either way the synonym suggester returns the following JSON result (<code>nlpcraft.alarm.ex</code> |
| model from <a href="/examples/alarm_clock.html">Alarm</a> example): |
| </p> |
| <pre class="brush: js"> |
| { |
| "status": "API_OK", |
| "result": { |
| "modelId": "nlpcraft.alarm.ex", |
| "minScore": 0.5, |
| "durationMs": 424.0, |
| "timestamp": 1.60091239852E12, |
| "suggestions": [ |
| { |
| "x:alarm": [ |
| { |
| "score": 1.0, |
| "synonym": "ask" |
| }, |
| { |
| "score": 0.9477103542042674, |
| "synonym": "join" |
| }, |
| { |
| "score": 0.8882341083867801, |
| "synonym": "get" |
| }, |
| { |
| "score": 0.7330826349218547, |
| "synonym": "remember" |
| }, |
| { |
| "score": 0.6902880910527778, |
| "synonym": "contact" |
| }, |
| { |
| "score": 0.6014764219771813, |
| "synonym": "time" |
| }, |
| { |
| "score": 0.5816398376889104, |
| "synonym": "follow" |
| }, |
| { |
| "score": 0.5640882890681899, |
| "synonym": "watch" |
| }, |
| { |
| "score": 0.5139855649326083, |
| "synonym": "stop" |
| }, |
| { |
| "score": 0.5136895804732818, |
| "synonym": "kill" |
| }, |
| { |
| "score": 0.5001167992233122, |
| "synonym": "send" |
| } |
| ] |
| } |
| ], |
| "warnings": [ |
| "Model has too few (3) intents samples. It will negatively affect the quality of suggestions. Try to increase overall sample count to at least 20." |
| ] |
| } |
| } |
| </pre> |
| <p> |
| The result is structured as a list of proposed synonyms, with their corresponding scores, for each model |
| element. |
| You should analyse the results for their fitness for your model and its existing synonyms. The tool cannot guarantee |
| that every suggested synonym is appropriate or valid, but it provides a good "courtesy" check for potentially |
| missing synonyms. |
| </p> |
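<p>
To help with reviewing the result, the response can be post-processed to keep only the strongest candidates per element. The following is a minimal sketch assuming the response shape shown above; the function name <code>top_suggestions</code> and its <code>min_score</code> and <code>limit</code> parameters are hypothetical and not part of NLPCraft.
</p>

```python
def top_suggestions(response, min_score=0.7, limit=5):
    """Returns {element_id: [synonym, ...]} for candidates scoring at least min_score."""
    if response.get("status") != "API_OK":
        raise ValueError(f"Unexpected status: {response.get('status')}")
    picked = {}
    # 'suggestions' is a list of objects, each mapping an element ID to its candidates.
    for group in response["result"]["suggestions"]:
        for elem_id, candidates in group.items():
            ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
            kept = [c["synonym"] for c in ranked if c["score"] >= min_score]
            picked[elem_id] = kept[:limit]
    return picked


# Abbreviated copy of the sample response shown above.
sample = {
    "status": "API_OK",
    "result": {
        "modelId": "nlpcraft.alarm.ex",
        "suggestions": [
            {
                "x:alarm": [
                    {"score": 1.0, "synonym": "ask"},
                    {"score": 0.9477103542042674, "synonym": "join"},
                    {"score": 0.8882341083867801, "synonym": "get"},
                    {"score": 0.7330826349218547, "synonym": "remember"},
                    {"score": 0.6902880910527778, "synonym": "contact"},
                ]
            }
        ],
    },
}

print(top_suggestions(sample))
# prints {'x:alarm': ['ask', 'join', 'get', 'remember']}
```

<p>
The filtered list is still only a set of candidates: each suggestion needs a manual check against the element's intended meaning before it is added to the model.
</p>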
| <div class="bq info"> |
| <p> |
| <b>Run Periodically</b> |
| </p> |
| <p> |
| It is a good idea to run this tool periodically if you are actively changing the model. With dozens or hundreds |
| of model elements it is very hard to manually maintain a quality set of synonyms. With a good list of |
| user input samples for each intent, this tool can be indispensable for easy maintenance of the synonyms. |
| </p> |
| </div> |
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Overview</a></li> |
| <li><a href="#usage">Usage</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |
| |
| |