---
active_crumb: Docs
layout: documentation
id: built-in-token-parser
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column">
<section id="overview">
<h2 class="section-title">Built-in Token Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
An <a href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a>
implementation parses plain-text user input and splits it
into a list of <code>tokens</code>.
NLPCraft provides a default token parser implementation for the English language.
The project also contains example token parser implementations for the <a href="examples/light_switch_fr.html">French</a> and
<a href="examples/light_switch_ru.html">Russian</a> languages.
</p>
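<p>
To make the idea of splitting text into tokens concrete, here is a minimal sketch in plain Java.
It is an illustration only: <code>TokenSketch</code>, <code>SimpleToken</code> and <code>parse</code> are hypothetical names
and are not part of the NLPCraft API, and the whitespace-only splitting rule is an assumption chosen for brevity.
Each token keeps its text, its position in the list and its character span in the input.
</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these types are NOT part of the NLPCraft API.
class TokenSketch {
    // A token: its text, its position in the token list and its character span.
    static final class SimpleToken {
        final String text;
        final int index;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        SimpleToken(String text, int index, int start, int end) {
            this.text = text;
            this.index = index;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    // Splits the input on whitespace and records where each token sits in it.
    static List<SimpleToken> parse(String text) {
        List<SimpleToken> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= n) break;
            int start = i;
            // Consume characters up to the next whitespace.
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            tokens.add(new SimpleToken(text.substring(start, i), tokens.size(), start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (SimpleToken t : parse("Turn on the lights")) {
            System.out.println(t.index + ": " + t);
        }
    }
}
```

<p>
A real token parser typically applies language-specific rules for punctuation, contractions and abbreviations,
which is why the concrete implementations described on this page delegate to existing NLP toolkits.
</p>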
</section>
<section id="parser-opennlp">
<h2 class="section-title">Apache OpenNLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
<a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>
is a token parser implementation that wraps the tokenizer from the
<a href="https://opennlp.apache.org/">Apache OpenNLP</a> project.
</p>
</section>
<section id="parser-stanford">
<h2 class="section-title">Stanford NLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
<a href="NCStanfordNLPTokenParser.html">NCStanfordNLPTokenParser</a>
is a token parser implementation that wraps the tokenizer from the
<a href="https://nlp.stanford.edu/">Stanford NLP</a> project.
</p>
</section>
<section id="remarks">
<h2 class="section-title">Remarks<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
NLPCraft provides two built-in token parsers for English because their algorithms differ,
so they can produce different lists of tokens for the same user input text.
Some built-in components require a token parser instance as a parameter.
</p>
<ul>
<li>
If you use <a href="https://opennlp.apache.org/">Apache OpenNLP</a> based components,
use the <a href="#parser-opennlp">Apache OpenNLP based parser</a> in your model pipeline.
</li>
<li>
If you use <a href="https://nlp.stanford.edu/">Stanford NLP</a> based components,
use the <a href="#parser-stanford">Stanford NLP based parser</a> in your model pipeline.
</li>
</ul>
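<p>
The sketch below illustrates why mixing tokenizers can cause trouble: two reasonable splitting strategies
return different token lists for the same text, so a component that expects one tokenization may misalign on the other.
The two strategies are deliberately simplified stand-ins written for this example. They are not the algorithms
used by Apache OpenNLP or Stanford NLP, and the class and method names are hypothetical.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: neither strategy reproduces OpenNLP or Stanford NLP behavior.
class TokenizerDifferenceSketch {
    // Strategy A: split on whitespace only, so punctuation stays attached to words.
    static List<String> splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new ArrayList<>();
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Strategy B: a run of letters or digits is one token,
    // and every other non-whitespace character is a token of its own.
    private static final Pattern WORD_OR_PUNCT =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    static List<String> splitWordsAndPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD_OR_PUNCT.matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Don't turn off the lights, please.";
        // Same input, two different token lists.
        System.out.println(splitOnWhitespace(text));
        System.out.println(splitWordsAndPunctuation(text));
    }
}
```

<p>
For the sample sentence, the first strategy keeps <code>lights,</code> as a single token while the second separates the comma,
so token counts and token indexes no longer match between the two results.
</p>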
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#parser-opennlp">Apache OpenNLP Based Parser</a></li>
<li><a href="#parser-stanford">Stanford NLP Based Parser</a></li>
<li><a href="#remarks">Remarks</a></li>
{% include quick-links.html %}
</ul>
</div>