| --- |
| active_crumb: Integrations |
| layout: documentation |
| id: integrations |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <!--suppress CheckImageSize --> |
| <div id="integrations" class="col-md-8 second-column"> |
| <section> |
| <span id="overview" class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
NLPCraft provides several integration points: <a href="#nlp">NLP functionality</a>, the underlying SQL storage
(<a href="#mysql">MySQL</a>, <a href="#postgres">PostgreSQL</a>, <a href="#oracle">Oracle</a>) and <a href="#gridgain">GridGain Control Center</a>.
| </p> |
| <span id="nlp" class="section-title">NLP Functionality <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></span> |
| <p> |
NLPCraft comes with integrations for several 3rd party NLP libraries and projects. External
integrations can be used for two distinct purposes within NLPCraft:
| </p> |
| <ul> |
| <li> |
| <b>Base NLP Engine</b> |
| <p> |
As a base NLP engine, the external project is responsible for all basic NLP pre-processing
such as tokenization, lemmatization, stemming, PoS tagging, etc. A base NLP engine
has strict performance requirements and therefore cannot rely on APIs that
require a network round trip.
| </p> |
| </li> |
| <li> |
| <b>Token Provider</b> |
| <p> |
As a token provider, the external project is used to detect named entities.
| </p> |
| </li> |
| </ul> |
| <p> |
| Note that the same external project can be used for both roles, and projects can be mixed and matched |
| together through NLPCraft configuration. You can only have one base NLP engine but you can configure |
| multiple token providers. The following table shows supported 3rd party integrations and |
| their roles: |
| </p> |
| <table class="gradient-table checks"> |
| <thead> |
| <tr> |
| <th>Project</th> |
| <th>Base NLP Engine</th> |
| <th>Token Provider</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>NLPCraft</td> |
| <td style="text-align: center;"><i class="fas fa-times"></i></td> |
| <td style="text-align: center;"><i class="fas fa-check-double"></i></td> |
| </tr> |
| <tr> |
| <td><a href="#opennlp">OpenNLP</a></td> |
| <td style="text-align: center;"><i class="fas fa-check-double"></i></td> |
| <td style="text-align: center;"><i class="fas fa-check"></i></td> |
| </tr> |
| <tr> |
| <td><a href="#google">Google Natural Language</a></td> |
| <td style="text-align: center;"><i class="fas fa-times"></i></td> |
| <td style="text-align: center;"><i class="fas fa-check"></i></td> |
| </tr> |
| <tr> |
| <td><a href="#stanford">Stanford CoreNLP</a></td> |
| <td style="text-align: center;"><i class="fas fa-check"></i></td> |
| <td style="text-align: center;"><i class="fas fa-check"></i></td> |
| </tr> |
| <tr> |
| <td><a href="#spacy">spaCy</a></td> |
| <td style="text-align: center;"><i class="fas fa-times"></i></td> |
| <td style="text-align: center;"><i class="fas fa-check"></i></td> |
| </tr> |
| </tbody> |
| </table> |
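<p>
The rules in the table above (exactly one base NLP engine, any number of token providers) can be sketched as a
small pre-flight check. The Python code below is a hypothetical illustration, not an NLPCraft API: the helper name
<code>check_nlp_setup</code> is made up, and the short names are the provider names used in the configuration
notes on this page (<code>nlpcraft</code>, <code>opennlp</code>, <code>google</code>, <code>stanford</code>, <code>spacy</code>).
</p>
<pre class="brush: python">
from typing import List

# Hypothetical encoding of the roles table above (not an NLPCraft API).
BASE_NLP_ENGINES = {"opennlp", "stanford"}
TOKEN_PROVIDERS = {"nlpcraft", "opennlp", "google", "stanford", "spacy"}


def check_nlp_setup(nlp_engine: str, token_providers: List[str]) -> List[str]:
    """Return a list of problems with a proposed setup (empty if it looks valid).

    The base NLP engine is a single value, while any number of token
    providers (including none) can be listed.
    """
    problems = []
    if nlp_engine not in BASE_NLP_ENGINES:
        problems.append(f"'{nlp_engine}' cannot be used as a base NLP engine")
    for provider in token_providers:
        if provider not in TOKEN_PROVIDERS:
            problems.append(f"'{provider}' is not a supported token provider")
    if len(set(token_providers)) != len(token_providers):
        problems.append("token providers must not be listed twice")
    return problems


# Stanford CoreNLP as the base engine plus three token providers.
print(check_nlp_setup("stanford", ["nlpcraft", "google", "spacy"]))
# spaCy only supports the token provider role.
print(check_nlp_setup("spacy", ["nlpcraft"]))
</pre>
<p>
Passing the base engine as a single string mirrors the one-engine, many-providers rule directly. NLPCraft's own
handling of invalid combinations may differ from this sketch.
</p>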
| |
| <div class="bq warn"> |
| <b>Configuring Token Providers</b> |
| <p> |
The REST server configuration supports zero or more token providers. Data models also have to specify
which built-in tokens they expect the REST server and the data probe to detect. This limits
unnecessary processing, since implicitly enabling all token providers and all tokens can
significantly slow down processing:
| </p> |
| <ul> |
| <li> |
REST server <a href="/server-and-probe.html">configuration property</a> <code>nlpcraft.server.tokenProviders</code> provides the list of enabled token providers.
| </li> |
| <li> |
A data model provides its required tokens via the
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView.getEnabledBuiltInTokens()</a> method. |
| </li> |
| </ul> |
| </div> |
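<p>
Because every built-in token ID starts with its provider name (for example <code>nlpcraft:date</code> or
<code>google:person</code>), a mismatch between the two settings above can be spotted by comparing prefixes.
The Python sketch below is a hypothetical pre-flight check, not part of NLPCraft: it takes the token IDs a model
would return from <code>getEnabledBuiltInTokens()</code> and the values of <code>nlpcraft.server.tokenProviders</code>
as plain Python collections, and lists the tokens whose provider is not enabled.
</p>
<pre class="brush: python">
from typing import Iterable, List


def tokens_without_provider(enabled_tokens: Iterable[str],
                            token_providers: Iterable[str]) -> List[str]:
    """Return model tokens whose provider prefix is not enabled on the REST server.

    enabled_tokens  -- token IDs required by the model, e.g. 'nlpcraft:date'
    token_providers -- values of nlpcraft.server.tokenProviders, e.g. 'nlpcraft'
    """
    providers = set(token_providers)
    # The provider is the part before the first ':' ('google:person' gives 'google').
    return sorted(t for t in enabled_tokens if t.split(":", 1)[0] not in providers)


model_tokens = {"nlpcraft:date", "nlpcraft:num", "google:person", "opennlp:money"}
server_providers = ["nlpcraft", "opennlp"]
print(tokens_without_provider(model_tokens, server_providers))
</pre>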
| </section> |
| <section> |
| <img id="nlpcraft" class="img-title" src="/images/nlpcraft_logo_black.gif" height="48px" alt=""> |
| <p> |
NLPCraft is an open source library for adding a natural language interface to any application.
| </p> |
| <h2 class="section-title">Base NLP Engine <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| N/A |
| </p> |
| <h2 class="section-title">Token Provider <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
NLPCraft provides its own set of built-in tokens. NLPCraft token IDs start with <code>nlpcraft</code>. Note
also that all NLPCraft built-in tokens are normalized named entities (NNE), i.e. they provide normalized
information and not just their IDs:
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Token ID</th> |
| <th>Description</th> |
| <th>Example</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code>nlpcraft:nlp</code></td> |
| <td> |
| <p> |
This token denotes a word (always a single word) that is not a part of any other token. It is
also called a free word, i.e. a word that is not linked to any other detected model element.
| </p> |
| <p> |
| <b>NOTE:</b> the metadata from this token defines a common set of NLP properties and |
| is present in every other token as well. |
| </p> |
| </td> |
| <td> |
| <ul> |
<li>Jamie goes <code>home</code> (assuming that the word 'home' does not belong to any model element).</li>
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:date</code></td> |
| <td> |
This token denotes a date range. It recognizes dates from 1900 up to 2023. Note that it does not
currently recognize the time component.
| </td> |
| <td> |
| <ul> |
| <li>Meeting <code>next tuesday</code>.</li> |
| <li>Report for entire <code>2018 year</code>.</li> |
| <li>Data <code>from 1/1/2017 to 12/31/2018</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:num</code></td> |
| <td> |
| This token denotes a single numeric value or numeric condition. |
| </td> |
| <td> |
| <ul> |
| <li>Price <code>> 100</code>.</li> |
| <li>Price is <code>less than $100</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:continent</code></td> |
| <td> |
| This token denotes a geographical continent. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Africa</code>.</li> |
| <li>Surface area of <code>America</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:subcontinent</code></td> |
| <td> |
| This token denotes a geographical subcontinent. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Alaskan peninsula</code>.</li> |
| <li>Surface area of <code>South America</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:region</code></td> |
| <td> |
| This token denotes a geographical region/state. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>California</code>.</li> |
| <li>Surface area of <code>South Dakota</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:country</code></td> |
| <td> |
| This token denotes a country. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>France</code>.</li> |
| <li>Surface area of <code>USA</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:city</code></td> |
| <td> |
| This token denotes a city. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Paris</code>.</li> |
| <li>Surface area of <code>Washington DC</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:metro</code></td> |
| <td> |
| This token denotes a metro area. |
| </td> |
| <td> |
| <ul> |
| <li>Population of <code>Cedar Rapids-Waterloo-Iowa City & Dubuque, IA</code> metro area.</li> |
| <li>Surface area of <code>Norfolk-Portsmouth-Newport News, VA</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:sort</code></td> |
| <td> |
This token denotes sorting or ordering.
| </td> |
| <td> |
| <ul> |
| <li>Report <code>sorted from top to bottom</code>.</li> |
| <li>Analysis <code>sorted in descending order</code>.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:limit</code></td> |
| <td> |
| This token denotes a numerical limit. |
| </td> |
| <td> |
| <ul> |
| <li>Show <code>top 5</code> brands.</li> |
| <li>Show <code>several</code> brands.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:coordinate</code></td> |
| <td> |
This token denotes a pair of latitude and longitude coordinates.
| </td> |
| <td> |
| <ul> |
| <li>Route the path to <code>55.7558, 37.6173</code> location.</li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td><code>nlpcraft:relation</code></td> |
| <td> |
| This token denotes a relation function: |
| <code>compare</code> or |
<code>correlate</code>. Note that this token always references two other tokens.
| </td> |
| <td> |
| <ul> |
| <li> |
| What is the <code><b>correlation between</b></code> <code>price</code> <code><b>and</b></code> <code>location</code> |
| (assuming that 'price' and 'location' are also detected tokens). |
| </li> |
| </ul> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| <p> |
| <b>NOTES:</b> |
| </p> |
| <ul> |
| <li> |
See <a href="data-model.html#meta">token metadata</a> documentation for detailed information
on token metadata properties.
| </li> |
| <li> |
Make sure to enable the <code>nlpcraft</code> token provider in the REST server configuration
using the <code>nlpcraft.server.tokenProviders</code> property.
| </li> |
| <li> |
Make sure to also properly configure the required tokens in your model configuration via the
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView.getEnabledBuiltInTokens()</a> method. |
| </li> |
| </ul> |
| </section> |
| <section> |
| <img id="opennlp" class="img-title" src="/images/opennlp-logo.png" height="48px" alt=""> |
| <p> |
<a href="https://opennlp.apache.org">Apache OpenNLP</a> is an open-source library for machine learning based
processing of natural language text.
| </p> |
| <h2 class="section-title">Base NLP Engine <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
<a href="https://opennlp.apache.org">Apache OpenNLP</a> is used by NLPCraft as the default base NLP engine. You can also set
it explicitly on the REST server and the data probe via the configuration property <code>nlpcraft.nlpEngine=opennlp</code>.
| </p> |
| <h2 class="section-title">Token Provider <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
OpenNLP can be used independently as a token provider (even if another library is used as the base NLP engine).
OpenNLP provides its own set of built-in tokens supported by NLPCraft.
OpenNLP token IDs have the form <code>opennlp:xxx</code>, where <code>xxx</code> is the lower case
name of the named entity in OpenNLP.
| </p> |
| <p> |
| Configuration notes: |
| </p> |
| <ul> |
| <li> |
| <p> |
OpenNLP integration is configured with the following pre-trained English OpenNLP
| <a target="opennlp" href="https://opennlp.sourceforge.net/models-1.5/">models</a> version 1.5: |
| </p> |
| <table class="gradient-table"> |
| <thead> |
| <tr> |
| <th>Named Entity</th> |
| <th>OpenNLP Model</th> |
| <th>Token ID</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>Location</td> |
| <td><a target="opennlp" href="https://opennlp.sourceforge.net/models-1.5/">en-ner-location.bin</a></td> |
| <td><code>opennlp:location</code></td> |
| </tr> |
| <tr> |
| <td>Money</td> |
| <td><a target="opennlp" href="https://opennlp.sourceforge.net/models-1.5/">en-ner-money.bin</a></td> |
| <td><code>opennlp:money</code></td> |
| </tr> |
| <tr> |
| <td>Person</td> |
| <td><a target="opennlp" href="https://opennlp.sourceforge.net/models-1.5/">en-ner-person.bin</a></td> |
| <td><code>opennlp:person</code></td> |
| </tr> |
| <tr> |
| <td>Organization</td> |
| <td><a target="opennlp" href="https://opennlp.sourceforge.net/models-1.5/">en-ner-organization.bin</a></td> |
| <td><code>opennlp:organization</code></td> |
| </tr> |
| <tr> |
| <td>Date</td> |
| <td><a target="opennlp" href="https://opennlp.sourceforge.net/models-1.5/">en-ner-date.bin</a></td> |
| <td><code>opennlp:date</code></td> |
| </tr> |
| <tr> |
| <td>Time</td> |
| <td><a target="opennlp" href="https://opennlp.sourceforge.net/models-1.5/">en-ner-time.bin</a></td> |
| <td><code>opennlp:time</code></td> |
| </tr> |
| <tr> |
| <td>Percentage</td> |
| <td><a target="opennlp" href="https://opennlp.sourceforge.net/models-1.5/">en-ner-percentage.bin</a></td> |
| <td><code>opennlp:percentage</code></td> |
| </tr> |
| </tbody> |
| </table> |
| </li> |
| <li> |
| See <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> |
| documentation for token properties. |
| </li> |
| <li> |
Make sure to enable the <code>opennlp</code> token provider in the REST server configuration
using the <code>nlpcraft.server.tokenProviders</code> property.
| </li> |
| <li> |
Make sure to properly configure the required tokens in your model configuration via the
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView.getEnabledBuiltInTokens()</a> method. |
| </li> |
| </ul> |
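<p>
The model table above follows a simple naming convention: the entity name in the model file
<code>en-ner-xxx.bin</code> becomes the token ID <code>opennlp:xxx</code>. The hypothetical Python helper below only
makes that mapping explicit for reference; it is derived from the table and does not call OpenNLP or NLPCraft.
</p>
<pre class="brush: python">
import re

# Matches OpenNLP 1.5 NER model file names such as 'en-ner-location.bin'.
_NER_MODEL = re.compile(r"([a-z]{2})-ner-([a-z]+)\.bin")


def opennlp_token_id(model_file: str) -> str:
    """Map an OpenNLP NER model file name to the NLPCraft token ID from the table above."""
    match = _NER_MODEL.fullmatch(model_file)
    if match is None:
        raise ValueError(f"not an OpenNLP NER model file name: {model_file!r}")
    return "opennlp:" + match.group(2)


for f in ["en-ner-location.bin", "en-ner-money.bin", "en-ner-percentage.bin"]:
    print(f, opennlp_token_id(f))
</pre>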
| </section> |
| <section> |
| <img id="google" class="img-title" src="/images/google-cloud-logo-small.png" height="56px" alt=""> |
| <p> |
| <a href="https://cloud.google.com/natural-language/">Google Natural Language</a> uses machine learning |
| to reveal the structure and meaning of text. |
| </p> |
| <h2 class="section-title">Base NLP Engine <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| N/A |
| </p> |
| <h2 class="section-title">Token Provider <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
Google Natural Language provides its own set of built-in tokens.
To use the Google token provider, the environment variable <code>GOOGLE_APPLICATION_CREDENTIALS</code>
must point to a valid Google JSON credentials file (see
<a href="https://cloud.google.com/docs/authentication/production">Google documentation</a> for more details).
Google Natural Language token IDs have the form <code>google:xxx</code>, where <code>xxx</code> is the lower
case name of the named entity in Google APIs, e.g. <code>google:person</code>, <code>google:location</code>,
etc.
| </p> |
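<p>
A missing or mistyped credentials path is an easy mistake to make. The following Python sketch, which is not part
of NLPCraft, checks the same environment variable before you start the REST server. It only verifies that the
variable is set, that the file exists and that it parses as JSON; it does not validate the credentials with Google.
The function name <code>check_google_credentials</code> is hypothetical.
</p>
<pre class="brush: python">
import json
import os


def check_google_credentials() -> str:
    """Return the credentials path, or raise RuntimeError if it looks unusable."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"credentials file not found: {path}")
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
        raise RuntimeError(f"credentials file is not valid JSON: {path}") from e
    return path
</pre>
<p>
Calling <code>check_google_credentials()</code> from a startup script fails fast with a readable message instead
of surfacing an authentication error on the first request.
</p>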
| <p>Configuration notes:</p> |
| <ul> |
| <li> |
| See Google Natural Language |
| <a target="google" href="https://cloud.google.com/natural-language/docs/reference/rest/v1/Entity#Type">documentation</a> |
| for more details on supported tokens. |
| </li> |
| <li> |
| See <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> documentation for token properties. |
| </li> |
| <li> |
Make sure to enable the <code>google</code> token provider in the REST server configuration
using the <code>nlpcraft.server.tokenProviders</code> property.
| </li> |
| <li> |
Make sure to also properly configure the required tokens in your model configuration via the
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView.getEnabledBuiltInTokens()</a> method. |
| </li> |
| </ul> |
| </section> |
| <section> |
| <img id="stanford" class="img-title" src="/images/corenlp-logo.png" height="64px" alt=""> |
| <p> |
| <a href="https://stanfordnlp.github.io/CoreNLP">Stanford CoreNLP</a> is a set of human language technology tools. |
| </p> |
| <p> |
Because Stanford CoreNLP
is licensed under the <a target=_ href="https://www.gnu.org/licenses/gpl-3.0.en.html">GNU General Public License v3</a>, you need to add
both the Stanford CoreNLP
dependency and the NLPCraft Stanford CoreNLP integration to your project separately.
| </p> |
| <p> |
The default <code>pom.xml</code> shipped with the NLPCraft release contains the Stanford CoreNLP dependency in a separate
<code>stanford-corenlp</code> profile. To use it, enable this profile when building the project
from sources, e.g. <code class="script">mvn clean package -P stanford-corenlp</code>. Alternatively, add the
following dependencies to your own project's build configuration:
| </p> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#nav-stanfordnlp-maven" role="tab">Maven <sup>Java</sup></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-stanfordnlp-grape" role="tab" aria-controls="nav-profile" aria-selected="false">Grape <sup>Groovy</sup></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-stanfordnlp-gradle" role="tab" aria-controls="nav-profile" aria-selected="false">Gradle <sup>Kotlin</sup></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-stanfordnlp-sbt" role="tab" aria-controls="nav-contact" aria-selected="false">SBT <sup>Scala</sup></a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="nav-stanfordnlp-maven" role="tabpanel"> |
| <pre class="brush: xml, highlight: 4"> |
| <dependency> |
| <groupId>edu.stanford.nlp</groupId> |
| <artifactId>stanford-corenlp</artifactId> |
| <version>3.9.2</version> |
| </dependency> |
| <dependency> |
| <groupId>org.apache.nlpcraft</groupId> |
| <artifactId>nlpcraft-stanford</artifactId> |
| <version>{{site.latest_version}}</version> |
| </dependency> |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-stanfordnlp-grape" role="tabpanel"> |
| <pre class="brush: java"> |
| @Grab ('edu.stanford.nlp:stanford-corenlp:3.9.2') |
| @Grab ('org.apache.nlpcraft:nlpcraft-stanford:{{site.latest_version}}') |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-stanfordnlp-gradle" role="tabpanel"> |
| <pre class="brush: java"> |
| dependencies { |
    runtimeOnly group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.9.2'
    runtimeOnly group: 'org.apache.nlpcraft', name: 'nlpcraft-stanford', version: '{{site.latest_version}}'
| } |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-stanfordnlp-sbt" role="tabpanel"> |
| <pre class="brush: scala"> |
| libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % "3.9.2" |
| libraryDependencies += "org.apache.nlpcraft" % "nlpcraft-stanford" % "{{site.latest_version}}" |
| </pre> |
| </div> |
| </div> |
| <div class="bq warn"> |
Make sure to change the Stanford CoreNLP version <code>3.9.2</code> to the latest or required one.
| </div> |
| <p> |
Note that you can also <a target=_ href="https://stanfordnlp.github.io/CoreNLP/">download</a>
Stanford CoreNLP as a separate JAR file and add it to your
project classpath if you are not using a build tool or prefer not to use one for this dependency.
| </p> |
| <h2 class="section-title">Base NLP Engine <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| You can set Stanford CoreNLP as a base NLP engine: |
| </p> |
| <ul> |
| <li> |
Set the configuration property <code>nlpcraft.nlpEngine=stanford</code>.
| </li> |
| <li> |
| Stanford CoreNLP library must be available <b>on both</b> the REST server and the data probe. |
| </li> |
| </ul> |
| <h2 class="section-title">Token Provider <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
Stanford CoreNLP can be used as a token provider independently from the base NLP engine:
</p>
| <ul> |
| <li> |
| Stanford CoreNLP library should <b>only</b> be available on the data probe. |
| </li> |
| </ul> |
| <p> |
Stanford CoreNLP provides its own set of built-in tokens.
Stanford CoreNLP token IDs have the form <code>stanford:xxx</code>, where <code>xxx</code> is the lower
case name of the named entity in Stanford CoreNLP, e.g. <code>stanford:person</code>, <code>stanford:location</code>,
etc.
| </p> |
| <p>Configuration notes:</p> |
| <ul> |
| <li> |
| See Stanford CoreNLP Named Entity Recognition |
| <a target="_blank" href="https://stanfordnlp.github.io/CoreNLP/ner.html">documentation</a> |
| for more details on supported token types. |
| </li> |
| <li> |
| See <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> |
| documentation for token properties. |
| </li> |
| <li> |
Make sure to enable the <code>stanford</code> token provider in the REST server configuration
using the <code>nlpcraft.server.tokenProviders</code> property.
| </li> |
| <li> |
Make sure to also properly configure the required tokens in your model configuration via the
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView.getEnabledBuiltInTokens()</a> method. |
| </li> |
| </ul> |
| </section> |
| <section> |
| <img id="spacy" class="img-title" src="/images/spacy-logo.png" height="48px" alt=""> |
| <p> |
| <a href="https://spacy.io">spaCy</a> is a free open-source library for Natural Language Processing in Python. |
| </p> |
| <h2 class="section-title">Base NLP Engine <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| N/A |
| </p> |
| <h2 class="section-title">Token Provider <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
spaCy provides its own set of built-in tokens. NLPCraft integrates with spaCy via a local Python-based
REST server, <code>/src/main/python/spacy_proxy.py</code>. It is a very simple Flask-based implementation
that you can freely modify to change which spaCy models are loaded or which span extension attributes are exposed.
| </p> |
| <p> |
Here is the entire source code of this local REST server:
| </p> |
| <pre class="brush: python, highlight: [11, 29, 30, 58, 59]"> |
import urllib.parse

import spacy
from flask import Flask, request
from flask_restful import Resource, Api

#
# Add your own or modify spaCy models here.
# By default, the English model 'en_core_web_sm' is loaded.
#
nlp = spacy.load("en_core_web_sm")

app = Flask(__name__)
api = Api(app)


class Ner(Resource):
    @staticmethod
    def get():
        # Text to analyze; an empty string is used if the 'text' parameter is missing.
        doc = nlp(urllib.parse.unquote_plus(request.args.get('text', '')))
        res = []
        for e in doc.ents:
            meta = {}

            # Change the following two lines to implement your own logic for
            # filling up meta object with custom user attributes. 'meta' should be a dictionary (JSON)
            # with types 'string:string'.
            for key in e._.span_extensions:
                meta[key] = getattr(e._, key)

            res.append(
                {
                    "text": e.text,
                    "from": e.start_char,
                    "to": e.end_char,
                    "ner": e.label_,
                    "vector": str(e.vector_norm),
                    "sentiment": str(e.sentiment),
                    "meta": meta
                }
            )

        return res


api.add_resource(Ner, '/spacy')

#
# Default endpoint is 'localhost:5002'.
#
# If the endpoint here is changed make sure to provide
# the same endpoint via configuration property 'nlpcraft.server.spacy.proxy.url',
# e.g. 'nlpcraft.server.spacy.proxy.url=myhost:1234'
#
if __name__ == '__main__':
    app.run(
        host="localhost",
        port=5002
    )
| </pre> |
| <p> |
You need to start this REST server before you can use the spaCy integration in NLPCraft. Note that in a
production environment it is recommended to use a
<a target=_ href="https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface">WSGI-based server</a> instead of the Flask development server.
| </p> |
| <p> |
| Comments: |
| </p> |
| <ul> |
| <li> |
| On line 11 you can add or change spaCy models to be loaded. |
| </li> |
| <li> |
On lines 29-30 you can change how the spans' extension attributes are collected.
| </li> |
| <li> |
On lines 58-59 you can change the endpoint on which this REST server starts. Note that you must
also set the same endpoint on the NLPCraft REST server via the configuration property <code>nlpcraft.server.spacy.proxy.url</code>,
e.g. <code>nlpcraft.server.spacy.proxy.url=myhost:1234</code>.
| </li> |
| </ul> |
| <p> |
spaCy token IDs have the form <code>spacy:xxx</code>, where <code>xxx</code> is the lower case name of the named entity
in spaCy APIs, e.g. <code>spacy:person</code>, <code>spacy:location</code>, etc.
| </p> |
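<p>
For manual testing, you can query the proxy directly. The Python sketch below is hypothetical client code, not
how the NLPCraft server itself calls the proxy. It assumes the default endpoint <code>localhost:5002</code> from
<code>spacy_proxy.py</code>, URL-encodes the text twice because the proxy applies <code>unquote_plus</code> to a value
Flask has already decoded, and forms token IDs by lower-casing the <code>ner</code> field as described above.
The function names are made up for this example.
</p>
<pre class="brush: python">
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Default endpoint of spacy_proxy.py (see 'nlpcraft.server.spacy.proxy.url').
PROXY_URL = "http://localhost:5002/spacy"


def build_request_url(text: str, base_url: str = PROXY_URL) -> str:
    # Flask decodes the query string once and the proxy then calls
    # unquote_plus() again, so the text is URL-encoded twice here.
    return base_url + "?" + urllib.parse.urlencode({"text": urllib.parse.quote_plus(text)})


def to_tokens(entities: list) -> List[Tuple[str, str, int, int]]:
    """Turn the proxy's JSON response into (token ID, text, from, to) tuples."""
    return [("spacy:" + e["ner"].lower(), e["text"], e["from"], e["to"]) for e in entities]


def query_spacy_proxy(text: str, base_url: str = PROXY_URL) -> List[Tuple[str, str, int, int]]:
    """Send text to a running spacy_proxy.py and return the detected tokens."""
    with urllib.request.urlopen(build_request_url(text, base_url)) as resp:
        return to_tokens(json.load(resp))
</pre>
<p>
With the proxy running, <code>query_spacy_proxy("Jamie goes home")</code> may return a tuple such as
<code>("spacy:person", "Jamie", 0, 5)</code>, depending on the loaded model.
</p>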
| <p> |
| Configuration notes: |
| </p> |
| <ul> |
| <li> |
| See spaCy Named Entity Recognition |
| <a target="spacy" href="https://spacy.io/usage/linguistic-features#named-entities">documentation</a> |
| for more details on supported token types. |
| </li> |
| <li> |
| See <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCToken.html">NCToken</a> |
| documentation for token properties. |
| </li> |
| <li> |
Make sure to enable the <code>spacy</code> token provider in the REST server configuration
using the <code>nlpcraft.server.tokenProviders</code> property.
| </li> |
| <li> |
Make sure to also properly configure the required tokens in your model configuration via the
| <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCModelView.html#getEnabledBuiltInTokens()">NCModelView.getEnabledBuiltInTokens()</a> method. |
| </li> |
| </ul> |
| </section> |
| <section> |
| <img id="mysql" class="img-title" src="/images/mysql-logo.png" height="80px" alt=""> |
| <p> |
| You can install and use MySQL as a system database for the REST server instead of the built-in |
| distributed SQL storage from Apache Ignite that is used by default. Add the following dependency to your project: |
| </p> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#nav-mysql-maven" role="tab">Maven <img src="/images/java2-h20.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-mysql-grape" role="tab">Grape <img src="/images/groovy-h18.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-mysql-gradle" role="tab">Gradle <img src="/images/kotlin-h18.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-mysql-sbt" role="tab">SBT <img src="/images/scala-logo-h16.png" alt=""></a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="nav-mysql-maven" role="tabpanel"> |
| <pre class="brush: xml, highlight: 4"> |
| <dependency> |
| <groupId>mysql</groupId> |
| <artifactId>mysql-connector-java</artifactId> |
| <version>8.0.15</version> |
| </dependency> |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-mysql-grape" role="tabpanel"> |
| <pre class="brush: java"> |
| @Grab ('mysql:mysql-connector-java:8.0.15') |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-mysql-gradle" role="tabpanel"> |
| <pre class="brush: java"> |
| dependencies { |
    runtimeOnly group: 'mysql', name: 'mysql-connector-java', version: '8.0.15'
| } |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-mysql-sbt" role="tabpanel"> |
| <pre class="brush: scala"> |
| libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.15" |
| </pre> |
| </div> |
| </div> |
| <p> |
| Comments: |
| </p> |
| <ul> |
| <li> |
| Make sure to change <code>8.0.15</code> version to the latest or required one. |
| </li> |
| <li> |
Update the configuration property <code>nlpcraft.server.database.jdbc</code>
with the required JDBC driver class and JDBC URL.
| </li> |
| <li> |
Use the scripts from the <code>sql/mysql</code> folder to create the database and initialize the DB schema.
| </li> |
| <li> |
Note that you can also <a target=_ href="https://dev.mysql.com/downloads/connector/j">download</a> the MySQL
JDBC driver as a separate JAR file and add it to your
project classpath if you are not using a build tool or prefer not to use one for this dependency.
| </li> |
| </ul> |
| </section> |
| <section> |
| <img id="postgres" class="img-title" src="/images/postgresql-logo.png" height="80px" alt=""> |
| <p> |
| You can install and use PostgreSQL as a system database for the REST server instead of the built-in |
| distributed SQL storage from Apache Ignite that is used by default. Add the following dependency to your project: |
| </p> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#nav-postgres-maven" role="tab">Maven <img src="/images/java2-h20.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-postgres-grape" role="tab">Grape <img src="/images/groovy-h18.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-postgres-gradle" role="tab">Gradle <img src="/images/kotlin-h18.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-postgres-sbt" role="tab">SBT <img src="/images/scala-logo-h16.png" alt=""></a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="nav-postgres-maven" role="tabpanel"> |
| <pre class="brush: xml, highlight: 4"> |
| <dependency> |
| <groupId>org.postgresql</groupId> |
| <artifactId>postgresql</artifactId> |
| <version>42.2.5</version> |
| </dependency> |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-postgres-grape" role="tabpanel"> |
| <pre class="brush: java"> |
| @Grab ('org.postgresql:postgresql:42.2.5') |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-postgres-gradle" role="tabpanel"> |
| <pre class="brush: java"> |
| dependencies { |
    runtimeOnly group: 'org.postgresql', name: 'postgresql', version: '42.2.5'
| } |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-postgres-sbt" role="tabpanel"> |
| <pre class="brush: scala"> |
| libraryDependencies += "org.postgresql" % "postgresql" % "42.2.5" |
| </pre> |
| </div> |
| </div> |
| <p> |
| Comments: |
| </p> |
| <ul> |
| <li> |
| Make sure to change <code>42.2.5</code> version to the latest or required one. |
| </li> |
| <li> |
Update the configuration property <code>nlpcraft.server.database.jdbc</code>
with the required JDBC driver class and JDBC URL.
| </li> |
| <li> |
Use the scripts from the <code>sql/postgres</code> folder to create the database and initialize the DB schema.
| </li> |
| <li> |
Note that you can also <a target=_ href="https://jdbc.postgresql.org/">download</a> the PostgreSQL
JDBC driver as a separate JAR file and add it to your
project classpath if you are not using a build tool or prefer not to use one for this dependency.
| </li> |
| </ul> |
| </section> |
| <section> |
| <img id="oracle" class="img-title" src="/images/oracle-logo.png" width="200px" alt=""> |
| <p> |
| You can install and use Oracle RDBMS as a system database for the REST server instead of the built-in |
| distributed SQL storage from Apache Ignite that is used by default. Add the following dependency to your project: |
| </p> |
| <nav> |
| <div class="nav nav-tabs" role="tablist"> |
| <a class="nav-item nav-link active" data-toggle="tab" href="#nav-oracle-maven" role="tab">Maven <img src="/images/java2-h20.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-oracle-grape" role="tab">Grape <img src="/images/groovy-h18.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-oracle-gradle" role="tab">Gradle <img src="/images/kotlin-h18.png" alt=""></a> |
| <a class="nav-item nav-link" data-toggle="tab" href="#nav-oracle-sbt" role="tab">SBT <img src="/images/scala-logo-h16.png" alt=""></a> |
| </div> |
| </nav> |
| <div class="tab-content"> |
| <div class="tab-pane fade show active" id="nav-oracle-maven" role="tabpanel"> |
| <pre class="brush: xml, highlight: 4"> |
| <dependency> |
| <groupId>org.oracle</groupId> |
| <artifactId>ojdbc14</artifactId> |
| <version>10.2.0.4.0</version> |
| </dependency> |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-oracle-grape" role="tabpanel"> |
| <pre class="brush: java"> |
| @Grab ('org.oracle:ojdbc14:10.2.0.4.0') |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-oracle-gradle" role="tabpanel"> |
| <pre class="brush: java"> |
| dependencies { |
    runtimeOnly group: 'org.oracle', name: 'ojdbc14', version: '10.2.0.4.0'
| } |
| </pre> |
| </div> |
| <div class="tab-pane fade" id="nav-oracle-sbt" role="tabpanel"> |
| <pre class="brush: scala"> |
| libraryDependencies += "org.oracle" % "ojdbc14" % "10.2.0.4.0" |
| </pre> |
| </div> |
| </div> |
| <p> |
| Comments: |
| </p> |
| <ul> |
| <li> |
| Make sure to change <code>10.2.0.4.0</code> version to the latest or required one. |
| </li> |
| <li> |
Update the configuration property <code>nlpcraft.server.database.jdbc</code>
with the required JDBC driver class and JDBC URL.
| </li> |
| <li> |
Use the scripts from the <code>sql/oracle</code> folder to create the database and initialize the DB schema.
| </li> |
| </ul> |
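<p>
All three database sections above ask you to put a JDBC driver class and a JDBC URL into
<code>nlpcraft.server.database.jdbc</code>. As a sketch, the Python helper below assembles these two values from the
standard driver class names and URL formats of the drivers listed above (MySQL Connector/J 8.x, the PostgreSQL JDBC
driver, and the Oracle thin driver addressed by SID). The helper name, hosts, ports and database names are
illustrative assumptions; check your NLPCraft configuration file for the exact keys expected under that property.
</p>
<pre class="brush: python">
from typing import Optional, Tuple

# Driver class, URL template and default port for each database described above.
# For Oracle, '{db}' is the SID used by the classic thin-driver URL format.
_JDBC = {
    "mysql": ("com.mysql.cj.jdbc.Driver", "jdbc:mysql://{host}:{port}/{db}", 3306),
    "postgres": ("org.postgresql.Driver", "jdbc:postgresql://{host}:{port}/{db}", 5432),
    "oracle": ("oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@{host}:{port}:{db}", 1521),
}


def jdbc_settings(kind: str, host: str, db: str, port: Optional[int] = None) -> Tuple[str, str]:
    """Return (JDBC driver class, JDBC URL) for 'mysql', 'postgres' or 'oracle'."""
    if kind not in _JDBC:
        raise ValueError(f"unsupported database: {kind!r}")
    driver, template, default_port = _JDBC[kind]
    url = template.format(host=host, port=port if port is not None else default_port, db=db)
    return driver, url


for kind, db in [("mysql", "nlpcraft"), ("postgres", "nlpcraft"), ("oracle", "XE")]:
    print(kind, jdbc_settings(kind, "localhost", db))
</pre>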
| </section> |
| <section> |
| <img id="gridgain" class="img-title" src="/images/gridgain-logo.png" width="200px" alt=""> |
| <p> |
The NLPCraft server runs on top of <a target="_" href="https://ignite.apache.org/">Apache Ignite</a>.
<a target="_" href="https://www.gridgain.com/">GridGain Systems</a> develops an enterprise in-memory computing
platform that is based on Apache Ignite. GridGain also develops <a target="_" href="https://www.gridgain.com/products/software/control-center">GridGain Control Center</a>, which supports Apache Ignite
and is available for free to Apache Ignite users. To use GridGain Control Center to manage and monitor
NLPCraft server internals, you need to have <a target="_" href="https://www.gridgain.com/resources/download#controlcenter">GridGain Web Agent</a> installed and available on the NLPCraft server classpath.
| </p> |
| <p> |
The NLPCraft <code>pom.xml</code> comes with the necessary dependencies, located in a separate
<code>gridgain-agent</code> Maven profile. To enable GridGain Web Agent, you need to manually enable this Maven profile
when building NLPCraft from source code.
| </p> |
| <div class="bq warn"> |
| <p><b>GridGain Control Center</b></p> |
| <p> |
Note that GridGain Control Center is commercial software that is free for Apache Ignite users. Its integration is not included
in the standard Apache NLPCraft release. You need to manually enable the special <code>gridgain-agent</code>
Maven profile or <a target="_" href="https://www.gridgain.com/resources/download#controlcenter">download</a> and install GridGain Web Agent manually.
| </p> |
| </div> |
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#nlpcraft">NLPCraft</a></li> |
| <li><a href="#opennlp">OpenNLP</a></li> |
| <li><a href="#google">Google</a></li> |
| <li><a href="#stanford">Stanford CoreNLP</a></li> |
| <li><a href="#spacy">spaCy</a></li> |
| <li><a href="#mysql">MySQL</a></li> |
| <li><a href="#postgres">PostgreSQL</a></li> |
| <li><a href="#oracle">Oracle</a></li> |
| <li><a href="#gridgain">GridGain</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |