| --- |
| active_crumb: Docs |
| layout: documentation |
| id: built-in-components |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column"> |
| <section id="overview"> |
| <h2 class="section-title">Built-in components <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| The <a href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a> |
| is the base element of a model. It defines the chain of component traits that are responsible for sentence processing. |
| Built-in implementations of some of these traits are described below. |
| </p> |
| |
| <div class="bq info"> |
| <p><b>Built-in component licenses.</b></p> |
| <p> |
| All built-in components that are based on <a href="https://nlp.stanford.edu/">Stanford NLP</a> models and classes |
| are provided under the <a href="http://www.gnu.org/licenses/gpl-2.0.html">GNU General Public License</a>. |
| See the Stanford NLP <a href="https://nlp.stanford.edu/software/">Software</a> page. |
| All such components are placed in the separate project module <code>nlpcraft-stanford</code>. |
| All other components are provided under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. |
| </p> |
| </div> |
| |
| <ul> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a>. |
| Two built-in implementations are provided, both for the English language. |
| <ul> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>. |
| A token parser implementation that wraps the |
| <a href="https://opennlp.apache.org/">Apache OpenNLP</a> project tokenizer. |
| </li> |
| <li> |
| <code>NCStanfordNLPTokenParser</code>. A token parser implementation that wraps the |
| <a href="https://nlp.stanford.edu/">Stanford NLP</a> project tokenizer. |
| </li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/NCTokenEnricher.html">NCTokenEnricher</a>. |
| A number of built-in implementations are provided, all for the English language. |
| <ul> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a> - |
| this component adds <code>lemma</code> and <code>pos</code> values to each processed token. |
| See these links for more details: <a href="https://www.wikiwand.com/en/Lemma_(morphology)">Lemma</a> and |
| <a href="https://www.wikiwand.com/en/Part-of-speech_tagging">Part of speech</a>. |
| The current implementation is based on <a href="https://opennlp.apache.org/">Apache OpenNLP</a> project components. |
| It uses Apache OpenNLP models: the POS tagger models are available |
| <a href="http://opennlp.sourceforge.net/models-1.5/">here</a>, and the |
| English lemmatization model is available <a href="https://raw.githubusercontent.com/richardwilly98/elasticsearch-opennlp-auto-tagging/master/src/main/resources/models/en-lemmatizer.dict">here</a>. |
| You can use any models that are compatible with the Apache OpenNLP <a href="https://opennlp.apache.org/docs/2.0.0/apidocs/opennlp-tools/opennlp/tools/postag/POSTaggerME.html">POSTaggerME</a> and |
| <a href="https://opennlp.apache.org/docs/2.0.0/apidocs/opennlp-tools/opennlp/tools/lemmatizer/DictionaryLemmatizer.html">DictionaryLemmatizer</a> components. |
| </li> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnBracketsTokenEnricher.html">NCEnBracketsTokenEnricher</a> - |
| this component adds the <code>brackets</code> boolean flag to each processed token, indicating whether the token is enclosed in brackets. |
| </li> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnQuotesTokenEnricher.html">NCEnQuotesTokenEnricher</a> - |
| this component adds the <code>quoted</code> boolean flag to each processed token, indicating whether the token is enclosed in quotes. |
| </li> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnDictionaryTokenEnricher.html">NCEnDictionaryTokenEnricher</a> - |
| this component adds the <code>dict</code> boolean flag to each processed token, indicating whether the token is a known English dictionary word. |
| Note that it requires the <code>lemma</code> token property to be already set. |
| You can use <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a> or any other component that sets |
| <code>lemma</code> on the token. |
| </li> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a> - |
| this component adds the <code>stopword</code> boolean flag to each processed token, indicating whether the token is a stop word. |
| It is based on predefined rules for the English language, but it can also be extended with a custom list of additional stop words and a list of excluded words. |
| Note that it requires the <code>lemma</code> token property to be already set. |
| You can use <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a> or any other component that sets |
| <code>lemma</code> on the token. |
| </li> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnSwearWordsTokenEnricher.html">NCEnSwearWordsTokenEnricher</a> - |
| this component adds the <code>swear</code> boolean flag to each processed token, indicating whether the token is a swear word. |
| </li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/NCEntityParser.html">NCEntityParser</a>. |
| A number of language-independent built-in implementations are provided. |
| They use their own models, which can be for different languages. |
| <ul> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCNLPEntityParser.html">NCNLPEntityParser</a> converts NLP tokens into entities with four mandatory properties: |
| <code>nlp:token:text</code>, <code>nlp:token:index</code>, <code>nlp:token:startCharIndex</code> and |
| <code>nlp:token:endCharIndex</code>. In addition, any other properties added to the |
| processed tokens by <a href="apis/latest/org/apache/nlpcraft/NCTokenEnricher.html">NCTokenEnricher</a> components are also copied, with their names |
| prefixed with <code>nlp:token:</code>. |
| It is a language-independent component. |
| Note that the set of converted tokens can be restricted by a predicate. |
| </li> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPEntityParser.html">NCOpenNLPEntityParser</a> is a wrapper over <a href="https://opennlp.apache.org/">Apache OpenNLP</a> NER components. |
| See the supported <b>Name Finder</b> models <a href="https://opennlp.sourceforge.net/models-1.5/">here</a>. |
| For example, the following models are available for English: <code>Location</code>, <code>Money</code>, |
| <code>Person</code>, <code>Organization</code>, <code>Date</code>, <code>Time</code> and <code>Percentage</code>. |
| Models for some other languages are also available. |
| </li> |
| <li> |
| <code>NCStanfordNLPEntityParser</code> is a wrapper over <a href="https://nlp.stanford.edu/">Stanford NLP</a> NER components. |
| For example, the following models are available for English: <code>Location</code>, <code>Money</code>, |
| <code>Person</code>, <code>Organization</code>, <code>Date</code>, <code>Time</code> and <code>Percent</code>. |
| Models for some other languages are also available. |
| See the detailed information <a href="https://nlp.stanford.edu/software/CRF-NER.shtml">here</a>. |
| </li> |
| <li> |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a> is an entity parser based on a list of elements defined by their synonyms. |
| It is a very important component that allows you to solve a wide range of tasks. |
| If you want to use <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a> |
| with a language other than English, you have to provide custom |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticStemmer.html">NCSemanticStemmer</a> and |
| <a href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a> |
| implementations for that language. |
| See the <a href="examples/light_switch_fr.html">Light Switch FR</a> example for more details. |
| The separate chapter <a href="semantic.html">Semantic parser</a> is dedicated to its detailed description. |
| </li> |
| </ul> |
| </li> |
| </ul> |
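| <p> |
| To illustrate how these components cooperate, the standalone sketch below chains simplified versions of three steps: |
| a lemma step; a stop-word step that, like the built-in dictionary and stop-word enrichers, relies on <code>lemma</code> being set first; |
| and a conversion of tokens into entities that, like <code>NCNLPEntityParser</code>, keeps four mandatory properties, copies all other |
| token properties under the <code>nlp:token:</code> prefix and can be restricted by a predicate. |
| It does not use the NLPCraft API: the <code>Tok</code>, <code>Ent</code> and <code>PipelineSketch</code> names, the whitespace tokenizer, |
| the lower-casing "lemmatizer" and the tiny stop-word list are hypothetical simplifications, not the behavior of the built-in components. |
| </p> |
| |
| <pre class="brush: scala, highlight: []"> |
| import scala.collection.mutable |
| |
| // Hypothetical, simplified token (NOT org.apache.nlpcraft.NCToken). |
| // In this sketch the end offset is exclusive. |
| final case class Tok(text: String, index: Int, start: Int, end: Int) { |
|   val props: mutable.Map[String, Any] = mutable.Map.empty |
| } |
| |
| // Hypothetical, simplified entity: just a map of named properties. |
| final case class Ent(props: Map[String, Any]) |
| |
| object PipelineSketch { |
|   // Naive whitespace tokenizer standing in for a real token parser. |
|   def tokenize(txt: String): List[Tok] = |
|     "\\S+".r.findAllMatchIn(txt).toList.zipWithIndex.map { |
|       case (m, i) => Tok(m.matched, i, m.start, m.end) |
|     } |
| |
|   // Stand-in "lemmatizer": only lower-cases the text (a real one uses a model). |
|   def enrichLemma(toks: List[Tok]): Unit = |
|     toks.foreach(t => t.props.update("lemma", t.text.toLowerCase)) |
| |
|   // Tiny hypothetical stop-word list. |
|   private val stops = Set("a", "an", "the", "in", "on", "of") |
| |
|   // Requires "lemma" to be set by an earlier step; this sketch fails fast otherwise. |
|   def enrichStopword(toks: List[Tok]): Unit = |
|     toks.foreach { t => |
|       val lemma = t.props.getOrElse("lemma", sys.error(s"'lemma' is not set for token: ${t.text}")) |
|       t.props.update("stopword", stops.contains(lemma.toString)) |
|     } |
| |
|   // Converts tokens into entities: four mandatory properties plus every |
|   // property added by earlier steps, all prefixed with "nlp:token:". |
|   // The optional predicate restricts which tokens are converted. |
|   def toEntities(toks: List[Tok], pred: Tok => Boolean = _ => true): List[Ent] = |
|     toks.filter(pred).map { t => |
|       val mandatory = Map[String, Any]( |
|         "nlp:token:text" -> t.text, |
|         "nlp:token:index" -> t.index, |
|         "nlp:token:startCharIndex" -> t.start, |
|         "nlp:token:endCharIndex" -> t.end |
|       ) |
|       val extra = t.props.map { case (k, v) => s"nlp:token:$k" -> v } |
|       Ent(mandatory ++ extra) |
|     } |
| |
|   def main(args: Array[String]): Unit = { |
|     val toks = tokenize("Turn on the lights") |
|     enrichLemma(toks) |
|     enrichStopword(toks) |
|     // Restrict conversion to tokens that are not stop words. |
|     toEntities(toks, _.props.get("stopword").contains(false)).foreach(println) |
|   } |
| } |
| </pre> |
| |
| <p> |
| Running <code>main</code> prints two entities, for <code>Turn</code> and <code>lights</code>: the predicate filters out |
| the stop words <code>on</code> and <code>the</code>, and each remaining entity also carries the copied |
| <code>nlp:token:lemma</code> and <code>nlp:token:stopword</code> properties. |
| </p> |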
| |
| <p> |
| The following pipeline components have no built-in implementations because their logic depends on the concrete user model: |
| <a href="apis/latest/org/apache/nlpcraft/NCTokenValidator.html">NCTokenValidator</a>, |
| <a href="apis/latest/org/apache/nlpcraft/NCEntityEnricher.html">NCEntityEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/NCEntityValidator.html">NCEntityValidator</a>, |
| <a href="apis/latest/org/apache/nlpcraft/NCEntityMapper.html">NCEntityMapper</a> and |
| <a href="apis/latest/org/apache/nlpcraft/NCVariantFilter.html">NCVariantFilter</a>. |
| </p> |
| |
| <p> |
| The <a href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html">NCPipelineBuilder</a> class |
| is designed to simplify the preparation of an <a href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a> instance. |
| It contains a number of <code>withSemantic()</code> methods that prepare a pipeline instance based on |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a> and the configured language. |
| Currently only one language is supported: English. |
| These methods also add the following English components to the pipeline: |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnSwearWordsTokenEnricher.html">NCEnSwearWordsTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnQuotesTokenEnricher.html">NCEnQuotesTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnDictionaryTokenEnricher.html">NCEnDictionaryTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnBracketsTokenEnricher.html">NCEnBracketsTokenEnricher</a>. |
| </p> |
| </section> |
| |
| <section id="examples"> |
| <h2 class="section-title">Examples <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p><b>Simple example</b>:</p> |
| |
| <pre class="brush: scala, highlight: []"> |
| val pipeline = new NCPipelineBuilder().withSemantic("en", "lightswitch_model.yaml").build |
| </pre> |
| <ul> |
| <li> |
| It defines a pipeline with all default English language components and one semantic entity parser with |
| the model defined in <code>lightswitch_model.yaml</code>. |
| </li> |
| </ul> |
| |
| <p><b>Example with pipeline configured by built-in components:</b></p> |
| |
| <pre class="brush: scala, highlight: [2, 6, 7, 12, 13, 14, 15]"> |
| val pipeline = |
|     val stanford = |
|         val props = new Properties() |
|         props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner") |
|         new StanfordCoreNLP(props) |
|     val tokParser = new NCStanfordNLPTokenParser(stanford) |
|     val stemmer = new NCSemanticStemmer(): |
|         private val ps = new PorterStemmer |
|         override def stem(txt: String): String = ps.synchronized { ps.stem(txt) } |
| |
|     new NCPipelineBuilder(). |
|         withTokenParser(tokParser). |
|         withTokenEnricher(new NCEnStopWordsTokenEnricher()). |
|         withEntityParser(new NCStanfordNLPEntityParser(stanford, Set("number"))). |
|         withEntityParser(NCSemanticEntityParser(stemmer, tokParser, "lightswitch_model.yaml")). |
|         build |
| </pre> |
| <ul> |
| <li> |
| <code>Line 2</code> defines a configured <code>StanfordCoreNLP</code> class instance. |
| See the <a href="https://nlp.stanford.edu/">Stanford NLP</a> documentation for more details. |
| </li> |
| <li> |
| <code>Line 6</code> defines the token parser <code>NCStanfordNLPTokenParser</code>, a mandatory pipeline component. |
| Note that this single instance is used in two places: in the pipeline definition on <code>line 12</code> and |
| in the <code>NCSemanticEntityParser</code> definition on <code>line 15</code>. |
| </li> |
| <li> |
| <code>Line 7</code> defines a simple implementation of the semantic stemmer, which is a required part |
| of <code>NCSemanticEntityParser</code>. Calls to the shared <code>PorterStemmer</code> instance are synchronized |
| because that class keeps internal state between calls and is not thread-safe. |
| </li> |
| <li> |
| <code>Line 13</code> defines the <code>NCEnStopWordsTokenEnricher</code> token enricher with its default configuration. |
| </li> |
| <li> |
| <code>Line 14</code> defines the <code>NCStanfordNLPEntityParser</code> entity parser based on Stanford NER, |
| configured to detect number values. |
| </li> |
| <li> |
| <code>Line 15</code> defines the <code>NCSemanticEntityParser</code> entity parser, which uses the stemmer from |
| <code>line 7</code>, the token parser from <code>line 6</code> and the model defined in <code>lightswitch_model.yaml</code>. |
| </li> |
| <li> |
| <code>Line 16</code> builds the pipeline. |
| </li> |
| </ul> |
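| <p> |
| The stemmer on <code>line 7</code> is needed because the semantic parser compares element synonyms with the words of a |
| sentence by their stems rather than by their exact forms, so that, for example, <code>lamps</code> can match the synonym <code>lamp</code>. |
| The standalone sketch below illustrates only this idea. The <code>StemMatchSketch</code> object, its naive suffix-stripping |
| <code>stem</code> function (a stand-in, not the Porter algorithm) and the two-element model are hypothetical; the sketch does not use |
| the NLPCraft API and does not reproduce the actual matching rules of <code>NCSemanticEntityParser</code> (for example, multi-word synonyms are not handled). |
| </p> |
| |
| <pre class="brush: scala, highlight: []"> |
| object StemMatchSketch { |
|   // Hypothetical, deliberately naive stemmer that strips a few English |
|   // suffixes. It is NOT the Porter algorithm used in the example above. |
|   private val suffixes = List("ing", "es", "s") |
| |
|   def stem(word: String): String = { |
|     val w = word.toLowerCase |
|     // Strip the first matching suffix, keeping a stem of at least 3 chars. |
|     suffixes.find(s => w.endsWith(s) && w.length - s.length >= 3) match { |
|       case Some(s) => w.dropRight(s.length) |
|       case None    => w |
|     } |
|   } |
| |
|   // Hypothetical model: element ID -> single-word synonyms. |
|   val elements: Map[String, Set[String]] = Map( |
|     "ls:on"    -> Set("on", "enable", "activate"), |
|     "ls:light" -> Set("light", "lamp", "illumination") |
|   ) |
| |
|   // Synonym stems are computed once, when the model is loaded. |
|   private val synStems: Map[String, Set[String]] = |
|     elements.map { case (id, syns) => id -> syns.map(stem) } |
| |
|   // Returns (word, element ID) pairs for every word whose stem equals |
|   // the stem of some element synonym. |
|   def find(sentence: String): List[(String, String)] = |
|     sentence.trim.split("\\s+").toList.flatMap { word => |
|       val st = stem(word) |
|       synStems.toList.sortBy(_._1).collect { |
|         case (id, stems) if stems.contains(st) => word -> id |
|       } |
|     } |
| |
|   def main(args: Array[String]): Unit = |
|     println(find("Turn on the lamps")) // List((on,ls:on), (lamps,ls:light)) |
| } |
| </pre> |
| |
| <p> |
| Because synonyms are matched by stems, the quality of the stemmer directly affects which words are recognized, |
| which is why a language other than English needs its own <code>NCSemanticStemmer</code> implementation. |
| </p> |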
| |
| <p><b>Example with pipeline configured by custom components:</b></p> |
| |
| <pre class="brush: scala, highlight: []"> |
| val pipeline = |
|     new NCPipelineBuilder(). |
|         withTokenParser(new NCFrTokenParser()). |
|         withTokenEnricher(new NCFrLemmaPosTokenEnricher()). |
|         withTokenEnricher(new NCFrStopWordsTokenEnricher()). |
|         withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")). |
|         build |
| </pre> |
| |
| <ul> |
| <li> |
| This pipeline is created for working with the French language. All components of this pipeline are custom components. |
| You can find more information in the example description chapters: |
| <a href="examples/light_switch_fr.html">Light Switch FR</a> and |
| <a href="examples/light_switch_ru.html">Light Switch RU</a>. |
| Note that these custom components are mostly wrappers over existing solutions and |
| need to be prepared only once, when you start working with a new language. |
| </li> |
| </ul> |
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Overview</a></li> |
| <li><a href="#examples">Examples</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |