---
active_crumb: Docs
layout: documentation
id: built-in-token-parser
---

 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->

 <div class="col-md-8 second-column">
     <section id="overview">
         <h2 class="section-title">Overview<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>

         <p>
            The {% scaladoc NCTokenParser NCTokenParser %} trait is part of the <a href="api-components.html#model-pipeline">Model Pipeline</a>.
            Its implementations parse the user's plain-text input and split it
            into a list of <code>tokens</code>.
            NLPCraft provides two English-language token parser implementations:
            the <a href="#parser-opennlp">Apache OpenNLP Based Parser</a> and
            the <a href="#parser-stanford">Stanford NLP Based Parser</a>.
            The project also contains example token parser implementations for the
            <a href="examples/light_switch_fr.html">French</a> and <a href="examples/light_switch_ru.html">Russian</a> languages.
         </p>
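        <p>
            The following minimal Java sketch shows the contract described above: plain text goes in, and an ordered token list comes out.
            It does not use NLPCraft. The <code>TokenParser</code> interface, the <code>Token</code> record and the toy
            <code>RegexTokenParser</code> are hypothetical stand-ins. They only illustrate splitting text into indexed tokens
            with character offsets. The actual <code>NCTokenParser</code> API and token fields may differ.
            The sketch assumes Java 16+ for records.
        </p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for illustration only; these are NOT NLPCraft's NCTokenParser/NCToken types.
public class TokenParserSketch {
    // A token: its text, its position in the token list, and its [start, end) character span in the input.
    record Token(String text, int index, int start, int end) {}

    // Hypothetical parser contract: plain text in, ordered token list out.
    interface TokenParser {
        List<Token> tokenize(String text);
    }

    // Toy English parser: runs of letters/digits (optionally with one apostrophe suffix such as "'s")
    // and single punctuation characters each become a token; whitespace is dropped.
    static final class RegexTokenParser implements TokenParser {
        private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+(?:'\\p{L}+)?|\\p{Punct}");

        @Override
        public List<Token> tokenize(String text) {
            List<Token> tokens = new ArrayList<>();
            Matcher m = TOKEN.matcher(text);
            while (m.find()) {
                tokens.add(new Token(m.group(), tokens.size(), m.start(), m.end()));
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        TokenParser parser = new RegexTokenParser();
        // Prints one line per token, starting with Token[text=Turn, index=0, start=0, end=4].
        for (Token t : parser.tokenize("Turn off the lights, please!")) {
            System.out.println(t);
        }
    }
}
```

        <p>
            For the input <code>Turn off the lights, please!</code> the toy parser yields seven tokens.
            The comma and the exclamation mark become tokens of their own.
        </p>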

     </section>

     <section id="parser-opennlp">
         <h2 class="section-title">Apache OpenNLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
            {% scaladoc nlp/parsers/NCOpenNLPTokenParser NCOpenNLPTokenParser %} is a wrapper around the tokenizer
            from the <a href="https://opennlp.apache.org/">Apache OpenNLP</a> project.
        </p>
     </section>

     <section id="parser-stanford">
         <h2 class="section-title">Stanford NLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
            <code>NCStanfordNLPTokenParser</code> is a wrapper around the tokenizer
            from the <a href="https://nlp.stanford.edu/">Stanford NLP</a> project.
         </p>
     </section>
     <section id="remarks">
         <h2 class="section-title">Remarks<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>

         <p>
            Two English-language implementations are provided because their tokenization algorithms differ,
            so they can produce different token lists for the same user input.
            Some built-in components require a token parser instance as a parameter:
         </p>
         <ul>
             <li>
                If you use <a href="https://opennlp.apache.org/">Apache OpenNLP</a>-based components,
                you should use the <a href="#parser-opennlp">Apache OpenNLP based parser</a> in your model pipeline.
            </li>
            <li>
                If you use <a href="https://nlp.stanford.edu/">Stanford NLP</a>-based components,
                you should use the <a href="#parser-stanford">Stanford NLP based parser</a> in your model pipeline.
             </li>
         </ul>
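        <p>
            The toy Java sketch below illustrates both remarks. The two tokenizers are hypothetical
            and are not the OpenNLP or Stanford algorithms. One splits on whitespace only. The other also separates
            punctuation and the <code>n't</code> contraction. The hypothetical <code>NegationDetector</code> stands in for a
            component that takes a token parser as a parameter. It looks for a standalone <code>n't</code> token, so it only
            works when it receives the tokenizer whose output it was designed around.
        </p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: neither tokenizer reproduces the OpenNLP or Stanford algorithms,
// and NegationDetector is a hypothetical component, not an NLPCraft class.
public class ParserPairingSketch {
    // Toy tokenizer A: whitespace only, so punctuation stays attached ("lights," is one token).
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("\\s+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    // Toy tokenizer B: also splits off punctuation and the "n't" contraction ("Don't" -> "Do", "n't").
    private static final Pattern SPLITTING = Pattern.compile("\\p{L}+(?=n't)|n't|[\\p{L}\\p{N}]+|\\p{Punct}");

    static List<String> punctuationAwareTokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = SPLITTING.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Hypothetical component that receives its token parser as a constructor parameter.
    static final class NegationDetector {
        private final Function<String, List<String>> parser;

        NegationDetector(Function<String, List<String>> parser) {
            this.parser = parser;
        }

        // Looks for a standalone "not" or "n't" token, i.e. it assumes tokenizer B's splitting rules.
        boolean hasNegation(String text) {
            for (String t : parser.apply(text)) {
                if (t.equalsIgnoreCase("not") || t.equalsIgnoreCase("n't")) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "Don't turn off the lights, please.";
        System.out.println(whitespaceTokens(text));       // [Don't, turn, off, the, lights,, please.]
        System.out.println(punctuationAwareTokens(text)); // [Do, n't, turn, off, the, lights, ,, please, .]
        // Same component logic, different parser: only the matching parser yields the tokens it expects.
        System.out.println(new NegationDetector(ParserPairingSketch::punctuationAwareTokens).hasNegation(text)); // true
        System.out.println(new NegationDetector(ParserPairingSketch::whitespaceTokens).hasNegation(text));       // false
    }
}
```

        <p>
            The same sentence yields 6 tokens with one toy tokenizer and 9 with the other.
            The detector's result flips depending on which parser it is given, which is the kind of mismatch the pairing advice above avoids.
        </p>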
     </section>
 </div>
 <div class="col-md-2 third-column">
     <ul class="side-nav">
         <li class="side-nav-title">On This Page</li>
         <li><a href="#overview">Overview</a></li>
         <li><a href="#parser-opennlp">Apache OpenNLP Based Parser</a></li>
         <li><a href="#parser-stanford">Stanford NLP Based Parser</a></li>
         <li><a href="#remarks">Remarks</a></li>
         {% include quick-links.html %}
     </ul>
 </div>