ctxword
module provides a Python-based internal tool for finding contextually related words for a given word in an input sentence. This utility exposes a single REST endpoint and is based on Google's BERT models and Facebook's FastText library.
To install the necessary dependencies, run the
src/main/python/ctxword/bin/install_dependencies.sh
script. See the WINDOWS_SETUP.md file for manual installation.

To start the 'ctxword' module REST server, run the
src/main/python/ctxword/bin/start_server.{sh|cmd}
script.

NOTE: on the first start, the server tries to load a compressed BERT model that is not yet available. It then downloads the model and compresses it, which takes several minutes and may require 10 GB+ of available memory. Subsequent starts skip this step, so the server starts much faster.
Once the REST server is started, you can issue REST calls to get suggestions for contextually related words. The REST server provides a single application/json endpoint:
/suggestions
(POST) Returns contextual word replacement(s) for the specified word in the input sentence. Accepts a JSON object parameter with the following fields:
"sentences": array of sentence objects, each with the following fields:
    "text": represents the sentence text.
    "indexes": array of positions in the sentence of the words to generate suggestions for.
"simple": defaults to false. If set to true, returns simple objects. If set to false, returns expanded objects with total, BERT and FastText scores.
"limit"
"min_score"
"min_ftext"
"min_bert"
The endpoint returns one or more JSON objects with the following fields (depending on the "simple" request field):
"simple" set to true: [word1, word2, ...]
"simple" set to false: [{word1, total_score1, ft_score1, bert_score1}, {...}]
Here are sample request and response JSON objects:
{"simple": true, "sentences": [{"text": "foo bar baz", "indexes": [0, 2]}, {"text": "sample second sentence", "indexes": [1]}]}
[["word1", "word2", "word3"]]
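The same request can also be issued from Python. The following is a minimal sketch using only the standard library, not part of the module itself. The server address in BASE_URL is a hypothetical placeholder, since the host and port are not stated here, and the helper names build_payload and post_suggestions are made up for illustration. Optional fields such as "limit" or "min_score" are passed through unchanged.

```python
import json
import urllib.request

# Hypothetical address: the actual host and port are not stated in this document.
BASE_URL = "http://localhost:5000"


def build_payload(sentences, simple=True, **options):
    """Build the JSON body for POST /suggestions.

    `sentences` is a list of (text, indexes) pairs. Extra keyword arguments
    (for example limit or min_score) are copied into the payload as-is.
    """
    payload = {
        "simple": simple,
        "sentences": [
            {"text": text, "indexes": list(indexes)} for text, indexes in sentences
        ],
    }
    payload.update(options)
    return payload


def post_suggestions(payload, base_url=BASE_URL):
    """Send the payload to /suggestions and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + "/suggestions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Build the sample request shown above; no network call is made here.
payload = build_payload(
    [("foo bar baz", [0, 2]), ("sample second sentence", [1])]
)
print(json.dumps(payload))
```

With the server running, post_suggestions(payload) would send the request. Keeping payload construction separate from the network call makes the request easy to inspect before it is sent.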
suggest.{sh|cmd}
You can use the curl-based src/main/python/ctxword/bin/suggest.{sh|cmd}
scripts to get suggestions for single sentences from the command line. The following call returns a list of contextual suggestions for the word at index 5 (counting from zero, here "rain") in the given sentence:
$ bin/suggest.sh "what is the chance of rain tomorrow?" 5
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   214  100   104  100   110    954   1009 --:--:-- --:--:-- --:--:--  1963
[
    [
        "rain",
        "snow",
        "rainfall",
        "precipitation",
        "rains",
        "flooding",
        "storms",
        "raining",
        "sunshine",
        "showers"
    ]
]
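The index argument and the returned list can be handled in Python as sketched below. This is an illustration only: it assumes that word positions correspond to whitespace-separated tokens, which matches the call above but is not confirmed here, and that the response holds one list per requested index. The helper names word_index and drop_original are hypothetical.

```python
def word_index(sentence, word):
    """Return the zero-based position of `word` in a whitespace-split sentence.

    Assumption: the server counts positions the same way. Trailing
    punctuation is stripped only for the comparison.
    """
    tokens = [token.strip("?.,!;:") for token in sentence.split()]
    return tokens.index(word)  # raises ValueError if the word is absent


def drop_original(suggestions, word):
    """Remove the original word from a suggestion list, keeping the order."""
    return [s for s in suggestions if s != word]


sentence = "what is the chance of rain tomorrow?"
index = word_index(sentence, "rain")

# Response copied from the suggest.sh call for this sentence.
response = [
    ["rain", "snow", "rainfall", "precipitation", "rains",
     "flooding", "storms", "raining", "sunshine", "showers"]
]
alternatives = drop_original(response[0], "rain")
print(index, alternatives)
```

Since the suggestion list in this example includes the original word itself, filtering it out leaves only true alternatives.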
Copyright (C) 2020 Apache Software Foundation