 ---
 active_crumb: Docs
 layout: documentation
 id: built-in-builder
 ---

 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->

 <div class="col-md-8 second-column">
     <section id="overview">
         <h2 class="section-title">Pipeline Builder<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>

         <p>
             The {% scaladoc NCPipelineBuilder NCPipelineBuilder %} class
             simplifies the creation of {% scaladoc NCPipeline NCPipeline %} instances.
             It assembles an {% scaladoc NCPipeline NCPipeline %} instance
             by adding pipeline components one at a time via its methods.
             It also provides a number of {% scaladoc NCPipelineBuilder#withSemantic-fffff4b0 withSemantic() %} methods
             that prepare a pipeline instance based on
             {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %} and the configured language.
             Currently, only <b>English</b> is supported.
             The pipeline for <b>English</b> is created with a predefined set of built-in components.
         </p>
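         <p>
             The chaining style described above can be illustrated with a minimal, self-contained sketch.
             Note that <code>ToyPipeline</code> and <code>ToyPipelineBuilder</code> are hypothetical stand-ins
             written for this illustration only and are not part of the NLPCraft API. Components are represented
             by plain strings, and the only rule modeled is that the token parser is mandatory:
         </p>

         <pre class="brush: scala, highlight: []">
             import scala.collection.mutable.ListBuffer

             // Hypothetical stand-ins, NOT the NLPCraft API: they only mirror the shape of a builder.
             final case class ToyPipeline(tokenParser: String, tokenEnrichers: List[String], entityParsers: List[String])

             final class ToyPipelineBuilder:
                 private var tokParser: Option[String] = None
                 private val tokEnrichers = ListBuffer.empty[String]
                 private val entParsers = ListBuffer.empty[String]

                 // Each 'with...' method records one component and returns the builder for chaining.
                 def withTokenParser(name: String): ToyPipelineBuilder =
                     tokParser = Some(name)
                     this
                 def withTokenEnricher(name: String): ToyPipelineBuilder =
                     tokEnrichers += name
                     this
                 def withEntityParser(name: String): ToyPipelineBuilder =
                     entParsers += name
                     this

                 // The token parser is treated as mandatory, so 'build' fails fast when it is missing.
                 def build: ToyPipeline =
                     val tp = tokParser.getOrElse(throw new IllegalStateException("Token parser is mandatory."))
                     ToyPipeline(tp, tokEnrichers.toList, entParsers.toList)

             @main def toyBuilderDemo(): Unit =
                 val pipeline =
                     new ToyPipelineBuilder().
                         withTokenParser("tokenParser").
                         withTokenEnricher("stopWords").
                         withEntityParser("semantic").
                         build
                 println(pipeline)
         </pre>

         <p>
             In this sketch every <code>with...</code> method returns the builder itself, which is what makes
             the call chain possible, and the components are collected in the order in which they are added.
         </p>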
     </section>

     <section id="examples">
         <h2 class="section-title">Examples <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>

         <p><b>Simple example</b>:</p>

         <pre class="brush: scala, highlight: []">
             val pipeline = new NCPipelineBuilder().withSemantic("en", "lightswitch_model.yaml").build
         </pre>
         <ul>
             <li>
                 It defines a pipeline with all built-in English language components and one semantic entity parser
                 whose model is defined in <code>lightswitch_model.yaml</code>.
             </li>
             <li>
                 By default, it adds to the pipeline the token parser implementation
                 {% scaladoc nlp/parsers/NCOpenNLPTokenParser NCOpenNLPTokenParser %} and
                 the following token enricher implementations:
                 {% scaladoc nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher NCOpenNLPLemmaPosTokenEnricher %},
                 {% scaladoc nlp/enrichers/NCEnStopWordsTokenEnricher NCEnStopWordsTokenEnricher %},
                 {% scaladoc nlp/enrichers/NCEnSwearWordsTokenEnricher NCEnSwearWordsTokenEnricher %},
                 {% scaladoc nlp/enrichers/NCEnQuotesTokenEnricher NCEnQuotesTokenEnricher %},
                 {% scaladoc nlp/enrichers/NCEnDictionaryTokenEnricher NCEnDictionaryTokenEnricher %},
                 {% scaladoc nlp/enrichers/NCEnBracketsTokenEnricher NCEnBracketsTokenEnricher %}.
             </li>
         </ul>

         <p><b>Example of a pipeline constructed from built-in components:</b></p>

         <pre class="brush: scala, highlight: [2, 6, 7, 12, 13, 14, 15, 16]">
             val pipeline =
                 val stanford =
                     val props = new Properties()
                     props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner")
                     new StanfordCoreNLP(props)
                 val tokParser = new NCStanfordNLPTokenParser(stanford)
                 val stemmer = new NCSemanticStemmer():
                     private val ps = new PorterStemmer
                     override def stem(txt: String): String = ps.synchronized { ps.stem(txt) }

                 new NCPipelineBuilder().
                     withTokenParser(tokParser).
                     withTokenEnricher(new NCEnStopWordsTokenEnricher()).
                     withEntityParser(NCSemanticEntityParser(stemmer, tokParser, "pizzeria_model.yaml")).
                     withEntityParser(new NCStanfordNLPEntityParser(stanford, Set("number"))).
                     build
         </pre>
         <ul>
             <li>
                 <code>Line 2</code> defines a configured <code>StanfordCoreNLP</code> instance.
                 See the <a href="https://nlp.stanford.edu/">Stanford NLP</a> documentation for more details.
             </li>
             <li>
                 <code>Line 6</code> defines the token parser <code>NCStanfordNLPTokenParser</code>, a mandatory pipeline component.
                 Note that this single instance is used in two places: in the pipeline definition on <code>line 12</code> and
                 in the {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %} definition on <code>line 14</code>.
             </li>

             <li>
                 <code>Line 13</code> adds the
                 {% scaladoc nlp/enrichers/NCEnStopWordsTokenEnricher NCEnStopWordsTokenEnricher %}
                 token enricher.
             </li>
             <li>
                 <code>Line 14</code> defines a {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}
                 configured by the YAML file <code>pizzeria_model.yaml</code>.
                 It also uses the simple {% scaladoc nlp/parsers/NCSemanticStemmer NCSemanticStemmer %} implementation
                 created on <code>line 7</code> and the token parser prepared on <code>line 6</code>.
             </li>
             <li>
                 <code>Line 15</code> defines an <code>NCStanfordNLPEntityParser</code>, based on Stanford NER and
                 configured to detect numeric values.
             </li>
             <li>
                 <code>Line 16</code> builds the pipeline.
             </li>
         </ul>

         <p><b>Example of a pipeline configured with custom components:</b></p>

         <pre class="brush: scala, highlight: []">
             val pipeline =
                 new NCPipelineBuilder().
                     withTokenParser(new NCFrTokenParser()).
                     withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
                     withTokenEnricher(new NCFrStopWordsTokenEnricher()).
                     withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
                     build
         </pre>

         <ul>
             <li>
                 This pipeline is created for working with the French language. All of its components are custom components.
                 You can find more information in the example description chapters:
                 <a href="examples/light_switch_fr.html">Light Switch FR</a> and
                 <a href="examples/light_switch_ru.html">Light Switch RU</a>.
                 Note that these custom components are mostly wrappers around existing open source or NLPCraft built-in solutions and
                 need to be prepared only once, when you start working with a new language.
             </li>
         </ul>
     </section>
 </div>
 <div class="col-md-2 third-column">
     <ul class="side-nav">
         <li class="side-nav-title">On This Page</li>
         <li><a href="#overview">Overview</a></li>
         <li><a href="#examples">Examples</a></li>
         {% include quick-links.html %}
     </ul>
 </div>