

Overview

The ctxword module provides an internal Python-based tool for finding contextually related words for a given word in an input sentence. It exposes a single REST endpoint and is based on Google's BERT models and Facebook's FastText library.

Dependencies

To install the necessary dependencies:

  • Linux/MacOS:
    • $ cd nlpcraft/src/main/python/ctxword
    • $ bin/install_dependencies.sh
  • Windows: follow the manual installation instructions in the nlpcraft\src\main\python\ctxword\WINDOWS_SETUP.md file.

Start REST Server

To start the ‘ctxword’ module REST server:

  • $ cd nlpcraft/src/main/python/ctxword
  • $ bin/start_server.{sh|cmd}

NOTE: on the first start, the server tries to load the compressed BERT model, which is not yet available. It then downloads the model and compresses it, which takes several minutes and may require 10 GB+ of available memory. Subsequent starts skip this step, so the server starts much faster.
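Because the first start can take several minutes, a client script may want to wait until the server is listening before sending requests. The following is a minimal sketch using only the Python standard library; the function name `wait_for_port` is hypothetical, and the host and port you pass in are assumptions (this README does not state them; check `server.py` or `bin/start_server.{sh|cmd}`). An open port only means something is listening, not necessarily that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds elapse.

    Returns True as soon as the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            # Wait a bit before the next attempt instead of busy-looping.
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5000)` would block for up to ten minutes, where `localhost:5000` is only a placeholder address, not a documented default.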

REST API

Once the REST server starts, you can issue REST calls to get suggestions for contextually related words. The REST server provides a single application/json endpoint:

/suggestions (POST)

Returns contextual word replacement(s) for the specified word(s) in the input sentence(s). Accepts a JSON object with the following fields:

  • "sentences"
    • List of sentences. Each sentence is encoded as an object with the following fields:
      • "text": the sentence text.
      • "indexes": array of zero-based positions of the words in the sentence to generate suggestions for.
  • "simple"
    • Optional, defaults to false. If set to true, returns plain lists of words. If set to false, returns expanded objects with the total, BERT and FastText scores.
  • "limit"
    • Optional, defaults to 10. Sets the maximum number of suggested words.
  • "min_score"
    • Optional, defaults to 0. Sets the minimum required total score.
  • "min_ftext"
    • Optional, defaults to 0.25. Sets the minimum required FastText score.
  • "min_bert"
    • Optional, defaults to 0. Sets the minimum required BERT score.
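The request fields above can be assembled programmatically. The sketch below is a hypothetical helper (`build_suggestions_body` is not part of ctxword) that serializes a request body and leaves out any optional field you don't set, so the server applies the defaults listed above. It assumes negative indexes are always invalid, which follows from the indexes being zero-based positions.

```python
import json
from typing import Optional, Sequence, Tuple


def build_suggestions_body(
    sentences: Sequence[Tuple[str, Sequence[int]]],
    simple: bool = False,
    limit: Optional[int] = None,
    min_score: Optional[float] = None,
    min_ftext: Optional[float] = None,
    min_bert: Optional[float] = None,
) -> str:
    """Serialize a /suggestions request body as a JSON string.

    Optional fields left as None are omitted, so the server falls back to its
    documented defaults (limit=10, min_score=0, min_ftext=0.25, min_bert=0).
    """
    for text, indexes in sentences:
        # Indexes are zero-based word positions, so negative values are rejected early.
        if any(i < 0 for i in indexes):
            raise ValueError(f"negative index in {list(indexes)} for {text!r}")
    body = {
        "sentences": [{"text": text, "indexes": list(idx)} for text, idx in sentences],
        "simple": simple,
    }
    optional = {"limit": limit, "min_score": min_score, "min_ftext": min_ftext, "min_bert": min_bert}
    # Only send the optional fields that were explicitly set.
    body.update({name: value for name, value in optional.items() if value is not None})
    return json.dumps(body)
```

For instance, `build_suggestions_body([("what is the chance of rain tomorrow?", [5])], simple=True, limit=5)` produces a body asking for at most five plain-word suggestions for "rain".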

Depending on the "simple" request field, the endpoint returns JSON in one of the following shapes:

  • If "simple" is set to true: [word1, word2, ...]
  • If "simple" is set to false: [{word1, total_score1, ft_score1, bert_score1}, {...}]
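With expanded results, you may want to tighten the cut-offs on the client without re-querying the server. The sketch below is one possible interpretation of how `limit`, `min_score`, `min_ftext` and `min_bert` combine; it is not the server's code. It assumes all thresholds must be met at once, that they are inclusive, and that results are ranked by total score. The `Suggestion` field names are hypothetical, since this README only gives the schematic `{word1, total_score1, ft_score1, bert_score1}`.

```python
from typing import List, NamedTuple


class Suggestion(NamedTuple):
    # Hypothetical field names; map them from the server's actual JSON keys.
    word: str
    total_score: float
    ftext_score: float
    bert_score: float


def refine(
    suggestions: List[Suggestion],
    limit: int = 10,
    min_score: float = 0.0,
    min_ftext: float = 0.25,
    min_bert: float = 0.0,
) -> List[str]:
    """Re-apply score cut-offs to expanded results and return the surviving words."""
    kept = [
        s
        for s in suggestions
        # Assumption: every threshold must be satisfied (logical AND, inclusive).
        if s.total_score >= min_score and s.ftext_score >= min_ftext and s.bert_score >= min_bert
    ]
    # Assumption: higher total score means a better suggestion.
    kept.sort(key=lambda s: s.total_score, reverse=True)
    return [s.word for s in kept[:limit]]
```

The defaults mirror the documented request defaults, so calling `refine` with no keyword arguments applies the same nominal cut-offs as an unconfigured request.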

Examples

Here are sample request and response JSON objects:

  • Request JSON:
    • {"simple": true, "sentences": [{"text": "foo bar baz", "indexes": [0, 2]}, {"text": "sample second sentence", "indexes": [1]}]}
  • Response JSON:
    • [["word1", "word2", "word3"]]
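The sample request can also be sent from Python without third-party packages. This is a minimal sketch using `urllib.request`; the URL `http://localhost:5000/suggestions` is an assumption (the README does not state the server's host or port), and `make_request`/`suggest` are hypothetical helper names.

```python
import json
import urllib.request

# Hypothetical address: check server.py or bin/start_server.{sh|cmd} for the real host and port.
SUGGESTIONS_URL = "http://localhost:5000/suggestions"


def make_request(payload: dict, url: str = SUGGESTIONS_URL) -> urllib.request.Request:
    """Wrap a /suggestions payload in a JSON POST request (not sent yet)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def suggest(payload: dict, url: str = SUGGESTIONS_URL, timeout: float = 30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# The sample request from above, as a Python dict.
example = {
    "simple": True,
    "sentences": [
        {"text": "foo bar baz", "indexes": [0, 2]},
        {"text": "sample second sentence", "indexes": [1]},
    ],
}
```

With the server running, `suggest(example)` should return the response JSON decoded into Python lists; splitting request construction from sending keeps the request easy to inspect without a live server.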

suggest.{sh|cmd}

You can use the Curl-based nlpcraft/src/main/python/ctxword/bin/suggest.{sh|cmd} scripts to get suggestions for a single sentence from the command line. The following call returns a list of contextual suggestions for the word at index 5 (counting from zero), i.e. "rain", in the given sentence:

$ bin/suggest.sh "what is the chance of rain tomorrow?" 5
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   214  100   104  100   110    954   1009 --:--:-- --:--:-- --:--:--  1963
[
    [
        "rain",
        "snow",
        "rainfall",
        "precipitation",
        "rains",
        "flooding",
        "storms",
        "raining",
        "sunshine",
        "showers"
    ]
]
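Since the index argument is a zero-based word position, it helps to check which word it points at before sending a request. The helpers below are a hypothetical Python counterpart of the single-sentence call. They assume words are split on whitespace (so punctuation stays attached, e.g. "tomorrow?") and that a simple-mode request is sent; this README confirms neither the server's tokenization nor the exact payload the script builds.

```python
def word_at(sentence: str, index: int) -> str:
    """Return the word a zero-based index refers to, assuming whitespace tokenization."""
    words = sentence.split()
    if not 0 <= index < len(words):
        raise IndexError(f"index {index} is out of range for {len(words)} words")
    return words[index]


def single_sentence_payload(sentence: str, index: int) -> dict:
    """Build a one-sentence, one-index request body in simple mode (hypothetical helper)."""
    word_at(sentence, index)  # fail early if the index does not point at a word
    return {"sentences": [{"text": sentence, "indexes": [index]}], "simple": True}
```

For the sentence above, `word_at("what is the chance of rain tomorrow?", 5)` picks "rain", matching the word the sample output suggests replacements for.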

Copyright

Copyright (C) 2021 Apache Software Foundation