 ---
 active_crumb: Docs
 layout: documentation
 id: built-in-token-parser
 ---

 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->

 <div class="col-md-8 second-column">
     <section id="overview">
         <h2 class="section-title">Built-in Token Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>

         <p>
             An <a href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a>
             component implementation parses plain-text user input and splits it
             into a list of <code>tokens</code>.

             NLPCraft provides a default English-language implementation of the token parser.

             The project also contains example token parser implementations for the
             <a href="examples/light_switch_fr.html">French</a> and
             <a href="examples/light_switch_ru.html">Russian</a> languages.
         </p>
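         <p>
             To make the idea of splitting text into tokens concrete, here is a minimal, self-contained sketch.
             It does not use the NLPCraft API: the <code>SimpleToken</code> class and the <code>tokenize</code>
             method are hypothetical names introduced only for illustration, and the whitespace-only splitting
             rule is an assumption, not the algorithm of any built-in parser.
         </p>

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTokenParserSketch {
    /** Hypothetical token: its text, its position in the list, and its offsets in the input. */
    static final class SimpleToken {
        final String text;
        final int index;      // position of the token in the resulting list
        final int startChar;  // inclusive character offset in the input
        final int endChar;    // exclusive character offset in the input

        SimpleToken(String text, int index, int startChar, int endChar) {
            this.text = text;
            this.index = index;
            this.startChar = startChar;
            this.endChar = endChar;
        }
    }

    /** Splits the input on whitespace and records where each token was found. */
    static List<SimpleToken> tokenize(String input) {
        List<SimpleToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Skip any run of whitespace between tokens.
            if (Character.isWhitespace(input.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            // Consume characters until the next whitespace or the end of input.
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            tokens.add(new SimpleToken(input.substring(start, i), tokens.size(), start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (SimpleToken t : tokenize("Turn on the lights")) {
            System.out.println(t.index + ": '" + t.text + "' [" + t.startChar + ", " + t.endChar + ")");
        }
    }
}
```

         <p>
             A real token parser typically applies language-specific rules on top of this, which is why
             separate implementations exist for different languages.
         </p>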

     </section>

     <section id="parser-opennlp">
         <h2 class="section-title">Apache OpenNLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
             <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>
             is a token parser implementation that wraps the tokenizer of the
             <a href="https://opennlp.apache.org/">Apache OpenNLP</a> project.
         </p>
     </section>

     <section id="parser-stanford">
         <h2 class="section-title">Stanford NLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
         <p>
             <a href="NCStanfordNLPTokenParser.html">NCStanfordNLPTokenParser</a>
             is a token parser implementation that wraps the tokenizer of the
             <a href="https://nlp.stanford.edu/">Stanford NLP</a> project.
         </p>
     </section>
     <section id="remarks">
         <h2 class="section-title">Remarks<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>

         <p>
             NLPCraft includes two built-in token parsers for English because their algorithms differ
             and they can produce different token lists for the same user input text.
             Some built-in components require a token parser instance as a parameter.
         </p>
         <ul>
             <li>
                 If you use <a href="https://opennlp.apache.org/">Apache OpenNLP</a>-based components,
                 use the <a href="#parser-opennlp">Apache OpenNLP based parser</a> in your model pipeline.
             </li>
             <li>
                 If you use <a href="https://nlp.stanford.edu/">Stanford NLP</a>-based components,
                 use the <a href="#parser-stanford">Stanford NLP based parser</a> in your model pipeline.
             </li>
         </ul>
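         <p>
             The following self-contained sketch illustrates how two tokenizers can return different token
             lists for the same text. It uses neither Apache OpenNLP nor Stanford NLP: both splitting rules
             and all names (<code>whitespaceTokens</code>, <code>punctuationTokens</code>) are hypothetical
             and were chosen only to make the difference visible.
         </p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerMismatchSketch {
    // A token is either a run of word characters or a single punctuation character.
    private static final Pattern WORD_OR_PUNCT = Pattern.compile("\\w+|[^\\w\\s]");

    /** Hypothetical rule A: split on whitespace only, so punctuation stays attached to words. */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Hypothetical rule B: emit each punctuation character as its own token. */
    static List<String> punctuationTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD_OR_PUNCT.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Stop, please.";
        System.out.println("Rule A: " + whitespaceTokens(text));
        System.out.println("Rule B: " + punctuationTokens(text));
    }
}
```

         <p>
             For the input <code>Stop, please.</code> rule A yields two tokens and rule B yields four.
             A component that expects one tokenization can misalign with tokens produced by the other,
             which is why the parser should match the components used in the pipeline.
         </p>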
     </section>
 </div>
 <div class="col-md-2 third-column">
     <ul class="side-nav">
         <li class="side-nav-title">On This Page</li>
         <li><a href="#overview">Overview</a></li>
         <li><a href="#parser-opennlp">Apache OpenNLP Based Parser</a></li>
         <li><a href="#parser-stanford">Stanford NLP Based Parser</a></li>
         <li><a href="#remarks">Remarks</a></li>
         {% include quick-links.html %}
     </ul>
 </div>