---
active_crumb: Docs
layout: documentation
id: built-in-builder
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column">
<section id="overview">
<h2 class="section-title">Pipeline builder<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The <a href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html">NCPipelineBuilder</a> class
simplifies the creation of an <a href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a> instance.
It provides several <a href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html#withSemantic-fffff4b0">withSemantic()</a>
methods that prepare a pipeline instance based on
<a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser">NCSemanticEntityParser</a> and the configured language.
Currently, only English is supported.
These methods also add the following built-in English components to the pipeline:
<a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>,
<a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a>,
<a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a>,
<a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnSwearWordsTokenEnricher.html">NCEnSwearWordsTokenEnricher</a>,
<a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnQuotesTokenEnricher.html">NCEnQuotesTokenEnricher</a>,
<a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnDictionaryTokenEnricher.html">NCEnDictionaryTokenEnricher</a>,
<a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnBracketsTokenEnricher.html">NCEnBracketsTokenEnricher</a>.
</p>
</section>
<section id="examples">
<h2 class="section-title">Examples <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p><b>Simple example</b>:</p>
<pre class="brush: scala, highlight: []">
val pipeline = new NCPipelineBuilder().withSemantic("en", "lightswitch_model.yaml").build
</pre>
<ul>
<li>
It defines a pipeline with all default English components and one semantic entity parser whose
model is defined in <code>lightswitch_model.yaml</code>.
</li>
</ul>
<p><b>Example of a pipeline configured with built-in components:</b></p>
<pre class="brush: scala, highlight: [2, 6, 7, 12, 13, 14, 15]">
val pipeline =
    val stanford =
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner")
        new StanfordCoreNLP(props)
    val tokParser = new NCStanfordNLPTokenParser(stanford)
    val stemmer = new NCSemanticStemmer():
        private val ps = new PorterStemmer
        override def stem(txt: String): String = ps.synchronized { ps.stem(txt) }

    new NCPipelineBuilder().
        withTokenParser(tokParser).
        withTokenEnricher(new NCEnStopWordsTokenEnricher()).
        withEntityParser(new NCStanfordNLPEntityParser(stanford, Set("number"))).
        build
</pre>
<ul>
<li>
<code>Line 2</code> creates a configured <code>StanfordCoreNLP</code> instance.
See the <a href="https://nlp.stanford.edu/">Stanford NLP</a> documentation for more details.
</li>
<li>
<code>Line 6</code> defines the <code>NCStanfordNLPTokenParser</code> token parser, a mandatory pipeline component.
This instance is passed to the pipeline on <code>line 12</code>. The same instance can also be shared with
an <code>NCSemanticEntityParser</code> when one is added to the pipeline.
</li>
<li>
<code>Line 7</code> defines a simple implementation of a semantic stemmer, which is a required part
of <code>NCSemanticEntityParser</code>.
</li>
<li>
<code>Line 13</code> adds the <code>NCEnStopWordsTokenEnricher</code> token enricher.
</li>
<li>
<code>Line 14</code> adds the <code>NCStanfordNLPEntityParser</code> entity parser, which is based on Stanford NER
and configured to detect number values.
</li>
<li>
<code>Line 15</code> builds the pipeline.
</li>
</ul>
<p><b>Example of a pipeline configured with custom components:</b></p>
<pre class="brush: scala, highlight: []">
val pipeline =
    new NCPipelineBuilder().
        withTokenParser(new NCFrTokenParser()).
        withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
        withTokenEnricher(new NCFrStopWordsTokenEnricher()).
        withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
        build
</pre>
<ul>
<li>
This pipeline is built for the French language. All of its components are custom components.
For more information, see the example description chapters:
<a href="examples/light_switch_fr.html">Light Switch FR</a> and
<a href="examples/light_switch_ru.html">Light Switch RU</a>.
Note that these custom components are mostly wrappers around existing solutions and
need to be prepared only once, when you start working with a new language.
</li>
</ul>
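<p><b>Toy sketch of the builder pattern:</b></p>
<p>
    The examples above follow the same pattern: <code>with...()</code> calls collect components and
    <code>build</code> produces the pipeline. The self-contained sketch below models that pattern, including
    the rule that a token parser is mandatory. All types in it (<code>TokenParser</code>, <code>TokenEnricher</code>,
    <code>Pipeline</code>, <code>PipelineBuilder</code>) are hypothetical stand-ins invented for illustration.
    They are not NLPCraft classes, and throwing <code>IllegalStateException</code> from <code>build</code>
    is an assumption rather than documented NLPCraft behavior.
</p>
<pre class="brush: scala, highlight: []">
// Toy model only: hypothetical stand-ins, not NLPCraft classes.
trait TokenParser:
    def tokenize(txt: String): List[String]

trait TokenEnricher:
    def enrich(tok: String): String

final case class Pipeline(parser: TokenParser, enrichers: List[TokenEnricher]):
    // Tokenizes the text, then applies every enricher to each token in order.
    def process(txt: String): List[String] =
        parser.tokenize(txt).map(tok => enrichers.foldLeft(tok)((t, e) => e.enrich(t)))

final class PipelineBuilder:
    private var parser: Option[TokenParser] = None
    private val enrichers = scala.collection.mutable.ListBuffer.empty[TokenEnricher]

    // Each 'with' method stores a component and returns the builder for chaining.
    def withTokenParser(p: TokenParser): PipelineBuilder =
        parser = Some(p)
        this

    def withTokenEnricher(e: TokenEnricher): PipelineBuilder =
        enrichers += e
        this

    // Assumption: a missing token parser is reported when 'build' is called.
    def build: Pipeline =
        val p = parser.getOrElse(throw new IllegalStateException("Token parser is mandatory."))
        Pipeline(p, enrichers.toList)

// Hypothetical components: whitespace tokenizer and lower-casing enricher.
val whitespace = new TokenParser:
    def tokenize(txt: String): List[String] = txt.trim.split("\\s+").toList

val lowerCase = new TokenEnricher:
    def enrich(tok: String): String = tok.toLowerCase

val toyPipeline =
    new PipelineBuilder().
        withTokenParser(whitespace).
        withTokenEnricher(lowerCase).
        build
</pre>
<p>
    In this sketch, <code>toyPipeline.process("Turn ON lights")</code> returns
    <code>List("turn", "on", "lights")</code>, while calling <code>build</code> on a builder
    without a token parser fails.
</p>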
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#examples">Examples</a></li>
{% include quick-links.html %}
</ul>
</div>