| --- |
| active_crumb: Docs |
| layout: documentation |
| id: built-in-builder |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column"> |
| <section id="overview"> |
| <h2 class="section-title">Pipeline builder<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| The <a href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html">NCPipelineBuilder</a> class |
| simplifies the creation of an <a href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a> instance. |
| It lets you construct an <a href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a> instance |
| by adding its components through the builder's methods. |
| It also provides several <a href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html#withSemantic-fffff4b0">withSemantic()</a> methods |
| that prepare a pipeline instance based on |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a> for the configured language. |
| Currently, only <b>English</b> is supported, with a broad set of built-in components: |
| <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnSwearWordsTokenEnricher.html">NCEnSwearWordsTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnQuotesTokenEnricher.html">NCEnQuotesTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnDictionaryTokenEnricher.html">NCEnDictionaryTokenEnricher</a>, |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnBracketsTokenEnricher.html">NCEnBracketsTokenEnricher</a>. |
| </p> |
| </section> |
| |
| <section id="examples"> |
| <h2 class="section-title">Examples <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p><b>Simple example</b>:</p> |
| |
| <pre class="brush: scala, highlight: []"> |
| val pipeline = new NCPipelineBuilder().withSemantic("en", "lightswitch_model.yaml").build |
| </pre> |
| <ul> |
| <li> |
| This defines a pipeline with all built-in English-language components and one semantic entity parser |
| whose model is defined in <code>lightswitch_model.yaml</code>. |
| </li> |
| </ul> |
| |
| <p><b>Example of a pipeline constructed from built-in components:</b></p> |
| |
| <pre class="brush: scala, highlight: [2, 6, 7, 12, 13, 14, 15, 16]"> |
| val pipeline = |
|     val stanford = |
|         val props = new Properties() |
|         props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner") |
|         new StanfordCoreNLP(props) |
|     val tokParser = new NCStanfordNLPTokenParser(stanford) |
|     val stemmer = new NCSemanticStemmer(): |
|         private val ps = new PorterStemmer |
|         override def stem(txt: String): String = ps.synchronized { ps.stem(txt) } |
| |
|     new NCPipelineBuilder(). |
|         withTokenParser(tokParser). |
|         withTokenEnricher(new NCEnStopWordsTokenEnricher()). |
|         withEntityParser(NCSemanticEntityParser(stemmer, tokParser, "pizzeria_model.yaml")). |
|         withEntityParser(new NCStanfordNLPEntityParser(stanford, Set("number"))). |
|         build |
| </pre> |
| <ul> |
| <li> |
| <code>Line 2</code> defines a configured <code>StanfordCoreNLP</code> instance. |
| See the <a href="https://nlp.stanford.edu/">Stanford NLP</a> documentation for more details. |
| </li> |
| <li> |
| <code>Line 6</code> defines the <code>NCStanfordNLPTokenParser</code> token parser, a mandatory pipeline component. |
| Note that this same instance is used in two places: in the pipeline definition on <code>line 12</code> and |
| in the <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a> definition on <code>line 14</code>. |
| </li> |
| |
| <li> |
| <code>Line 13</code> adds the |
| <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a> |
| token enricher. |
| </li> |
| <li> |
| <code>Line 14</code> defines an <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a> |
| configured by the YAML file <code>pizzeria_model.yaml</code>. |
| It also uses the simple <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticStemmer.html">NCSemanticStemmer</a> implementation |
| created on <code>line 7</code> and the token parser prepared on <code>line 6</code>. |
| </li> |
| <li> |
| <code>Line 15</code> defines an <code>NCStanfordNLPEntityParser</code>, based on Stanford NER, |
| configured to detect numeric values. |
| </li> |
| <li> |
| <code>Line 16</code> builds the pipeline. |
| </li> |
| </ul> |
| |
| <p><b>Example of a pipeline configured with custom components:</b></p> |
| |
| <pre class="brush: scala, highlight: []"> |
| val pipeline = |
|     new NCPipelineBuilder(). |
|         withTokenParser(new NCFrTokenParser()). |
|         withTokenEnricher(new NCFrLemmaPosTokenEnricher()). |
|         withTokenEnricher(new NCFrStopWordsTokenEnricher()). |
|         withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")). |
|         build |
| </pre> |
| |
| <ul> |
| <li> |
| This pipeline is created to work with the French language. All of its components are custom components. |
| For more information, see the example descriptions: |
| <a href="examples/light_switch_fr.html">Light Switch FR</a> and |
| <a href="examples/light_switch_ru.html">Light Switch RU</a>. |
| Note that these custom components are mostly wrappers around existing open-source or NLPCraft built-in solutions and |
| need to be prepared only once, when you start working with a new language. |
| </li> |
| </ul> |
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Overview</a></li> |
| <li><a href="#examples">Examples</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |