| --- |
| active_crumb: Docs |
| layout: documentation |
| id: built-in-builder |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column"> |
| <section id="overview"> |
| <h2 class="section-title">Pipeline Builder <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| The {% scaladoc NCPipelineBuilder NCPipelineBuilder %} class |
| simplifies the creation of {% scaladoc NCPipeline NCPipeline %} instances. |
| It lets you assemble an {% scaladoc NCPipeline NCPipeline %} instance |
| by adding pipeline components through its methods. |
| It also provides several {% scaladoc NCPipelineBuilder#withSemantic-fffff4b0 withSemantic() %} methods |
| that prepare a pipeline instance based on |
| {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %} and the configured language. |
| Currently, only <b>English</b> is supported. |
| The pipeline for <b>English</b> is created with a useful set of built-in components. |
| </p> |
| </section> |
| |
| <section id="examples"> |
| <h2 class="section-title">Examples <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p><b>Simple example</b>:</p> |
| |
| <pre class="brush: scala, highlight: []"> |
| val pipeline = new NCPipelineBuilder().withSemantic("en", "lightswitch_model.yaml").build |
| </pre> |
| <ul> |
| <li> |
| It defines a pipeline with all built-in English language components and one semantic entity parser |
| whose model is defined in <code>lightswitch_model.yaml</code>. |
| </li> |
| <li> |
| By default, it adds the token parser implementation |
| {% scaladoc nlp/parsers/NCOpenNLPTokenParser NCOpenNLPTokenParser %} and |
| the following token enricher implementations: |
| {% scaladoc nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher NCOpenNLPLemmaPosTokenEnricher %}, |
| {% scaladoc nlp/enrichers/NCEnStopWordsTokenEnricher NCEnStopWordsTokenEnricher %}, |
| {% scaladoc nlp/enrichers/NCEnSwearWordsTokenEnricher NCEnSwearWordsTokenEnricher %}, |
| {% scaladoc nlp/enrichers/NCEnQuotesTokenEnricher NCEnQuotesTokenEnricher %}, |
| {% scaladoc nlp/enrichers/NCEnDictionaryTokenEnricher NCEnDictionaryTokenEnricher %}, |
| {% scaladoc nlp/enrichers/NCEnBracketsTokenEnricher NCEnBracketsTokenEnricher %}. |
| </li> |
| </ul> |
| |
| <p><b>Example of a pipeline constructed from built-in components:</b></p> |
| |
| <pre class="brush: scala, highlight: [2, 6, 7, 12, 13, 14, 15, 16]"> |
| val pipeline = |
|     val stanford = |
|         val props = new Properties() |
|         props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner") |
|         new StanfordCoreNLP(props) |
|     val tokParser = new NCStanfordNLPTokenParser(stanford) |
|     val stemmer = new NCSemanticStemmer(): |
|         private val ps = new PorterStemmer |
|         override def stem(txt: String): String = ps.synchronized { ps.stem(txt) } |
| |
|     new NCPipelineBuilder(). |
|         withTokenParser(tokParser). |
|         withTokenEnricher(new NCEnStopWordsTokenEnricher()). |
|         withEntityParser(NCSemanticEntityParser(stemmer, tokParser, "pizzeria_model.yaml")). |
|         withEntityParser(new NCStanfordNLPEntityParser(stanford, Set("number"))). |
|         build |
| </pre> |
| <ul> |
| <li> |
| <code>Line 2</code> creates a configured <code>StanfordCoreNLP</code> instance. |
| See the <a href="https://nlp.stanford.edu/">Stanford NLP</a> documentation for more details. |
| </li> |
| <li> |
| <code>Line 6</code> defines the token parser <code>NCStanfordNLPTokenParser</code>, a mandatory pipeline component. |
| Note that this same instance is used in two places: in the pipeline definition on <code>line 12</code> and |
| in the {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %} definition on <code>line 14</code>. |
| </li> |
| |
| <li> |
| <code>Line 13</code> adds the |
| {% scaladoc nlp/enrichers/NCEnStopWordsTokenEnricher NCEnStopWordsTokenEnricher %} |
| token enricher. |
| </li> |
| <li> |
| <code>Line 14</code> defines the {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %} |
| configured by the YAML file <code>pizzeria_model.yaml</code>. |
| It also uses the simple {% scaladoc nlp/parsers/NCSemanticStemmer NCSemanticStemmer %} implementation |
| created on <code>line 7</code> and the token parser prepared on <code>line 6</code>. |
| </li> |
| <li> |
| <code>Line 15</code> defines the <code>NCStanfordNLPEntityParser</code>, based on Stanford NER |
| and configured to detect numeric values. |
| </li> |
| <li> |
| <code>Line 16</code> builds the pipeline. |
| </li> |
| </ul> |
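| |
| <p> |
| The builder mechanics used above (components accumulated through chained calls, with the token parser |
| as a mandatory component) can be pictured with the following minimal, self-contained sketch. |
| It is <b>not</b> the NLPCraft implementation: every <code>Toy*</code> name is hypothetical, and the |
| assumption that <code>build</code> fails when no token parser was supplied is made for illustration only. |
| </p> |
| |
| <pre class="brush: scala, highlight: []"> |
| // Toy stand-ins, NOT NLPCraft types: every name below is hypothetical. |
| trait ToyTokenParser: |
|     def tokenize(text: String): List[String] |
| |
| trait ToyTokenEnricher: |
|     def enrich(tokens: List[String]): List[String] |
| |
| final case class ToyPipeline(parser: ToyTokenParser, enrichers: List[ToyTokenEnricher]): |
|     // Tokenizes the text, then applies the enrichers in the order they were added. |
|     def process(text: String): List[String] = |
|         enrichers.foldLeft(parser.tokenize(text))((toks, e) => e.enrich(toks)) |
| |
| final class ToyPipelineBuilder: |
|     private var parser: Option[ToyTokenParser] = None |
|     private var enrichers = Vector.empty[ToyTokenEnricher] |
| |
|     def withTokenParser(p: ToyTokenParser): ToyPipelineBuilder = |
|         parser = Some(p) |
|         this |
| |
|     def withTokenEnricher(e: ToyTokenEnricher): ToyPipelineBuilder = |
|         enrichers = enrichers :+ e |
|         this |
| |
|     // Assumption for this sketch: the token parser is mandatory, so building without one fails. |
|     def build: ToyPipeline = |
|         ToyPipeline( |
|             parser.getOrElse(throw new IllegalStateException("Token parser is mandatory.")), |
|             enrichers.toList |
|         ) |
| |
| val whitespaceParser = new ToyTokenParser: |
|     def tokenize(text: String): List[String] = text.split("\\s+").toList |
| |
| val lowerCaser = new ToyTokenEnricher: |
|     def enrich(tokens: List[String]): List[String] = tokens.map(_.toLowerCase) |
| |
| val toyPipeline = |
|     new ToyPipelineBuilder(). |
|         withTokenParser(whitespaceParser). |
|         withTokenEnricher(lowerCaser). |
|         build |
| </pre> |
| |
| <p> |
| Returning <code>this</code> from each <code>with...</code> method is what allows the chained call style, |
| and a single component instance can be passed to several places, much as <code>tokParser</code> is in the example above. |
| </p> |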
| |
| <p><b>Example of a pipeline configured with custom components:</b></p> |
| |
| <pre class="brush: scala, highlight: []"> |
| val pipeline = |
|     new NCPipelineBuilder(). |
|         withTokenParser(new NCFrTokenParser()). |
|         withTokenEnricher(new NCFrLemmaPosTokenEnricher()). |
|         withTokenEnricher(new NCFrStopWordsTokenEnricher()). |
|         withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")). |
|         build |
| </pre> |
| |
| <ul> |
| <li> |
| This pipeline is created to work with the French language. All of its components are custom components. |
| You can find more information in the example description chapters: |
| <a href="examples/light_switch_fr.html">Light Switch FR</a> and |
| <a href="examples/light_switch_ru.html">Light Switch RU</a>. |
| Note that these custom components are mostly wrappers around existing open-source or NLPCraft built-in solutions and |
| need to be prepared only once, when you start working with a new language. |
| </li> |
| </ul> |
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Overview</a></li> |
| <li><a href="#examples">Examples</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |