| --- |
| active_crumb: Docs |
| layout: documentation |
| id: built-in-token-parser |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column"> |
| <section id="overview"> |
| <h2 class="section-title">Overview<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| The {% scaladoc NCTokenParser NCTokenParser %} trait is part of the <a href="api-components.html#model-pipeline">Model Pipeline</a>. |
| An implementation parses the plain text of the user input and splits it |
| into a list of <code>tokens</code>. |
| NLPCraft provides two token parser implementations for the English language: |
| the <a href="#parser-opennlp">Apache OpenNLP Based Parser</a> and |
| the <a href="#parser-stanford">Stanford NLP Based Parser</a>. |
| The project also contains example token parser implementations for the <a href="examples/light_switch_fr.html">French</a> and |
| <a href="examples/light_switch_ru.html">Russian</a> languages. |
| </p> |
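<p>
    As a rough illustration of what a token parser does, the sketch below splits input text on whitespace
    and records the position of each token. The names <code>SimpleToken</code>, <code>SimpleTokenParser</code>
    and <code>WhitespaceTokenParser</code> are hypothetical stand-ins invented for this example. They are not
    part of the NLPCraft API, and the actual <code>NCTokenParser</code> signatures may differ, so consult the
    scaladoc before relying on any detail shown here.
</p>

```scala
// Hypothetical, simplified types for illustration only; NOT the NLPCraft API.
final case class SimpleToken(text: String, index: Int, startCharIndex: Int, endCharIndex: Int)

trait SimpleTokenParser {
  // Splits plain input text into an ordered list of tokens.
  def tokenize(text: String): List[SimpleToken]
}

// Naive parser: every maximal run of non-whitespace characters becomes one token.
object WhitespaceTokenParser extends SimpleTokenParser {
  private val NonSpace = """\S+""".r

  def tokenize(text: String): List[SimpleToken] =
    NonSpace.findAllMatchIn(text).zipWithIndex.map { case (m, idx) =>
      // m.start is inclusive, m.end is exclusive.
      SimpleToken(m.matched, idx, m.start, m.end)
    }.toList
}
```

<p>
    Calling <code>WhitespaceTokenParser.tokenize("Turn on the lights")</code> yields four tokens in input order.
    Production tokenizers apply far more elaborate rules for punctuation, contractions and abbreviations.
</p>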
| |
| </section> |
| |
| <section id="parser-opennlp"> |
| <h2 class="section-title">Apache OpenNLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| The {% scaladoc nlp/parsers/NCOpenNLPTokenParser NCOpenNLPTokenParser %} implementation |
| is a wrapper around the tokenizer of the |
| <a href="https://opennlp.apache.org/">Apache OpenNLP</a> project. |
| </p> |
| </section> |
| |
| <section id="parser-stanford"> |
| <h2 class="section-title">Stanford NLP Based Parser<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| The <code>NCStanfordNLPTokenParser</code> implementation |
| is a wrapper around the tokenizer of the |
| <a href="https://nlp.stanford.edu/">Stanford NLP</a> project. |
| </p> |
| </section> |
| <section id="remarks"> |
| <h2 class="section-title">Remarks<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| Two different English language implementations are provided because their algorithms differ |
| and they can produce different lists of tokens for the same user input text. |
| Some built-in components require a token parser instance as a parameter. |
| </p> |
| <ul> |
| <li> |
| If you use <a href="https://opennlp.apache.org/">Apache OpenNLP</a> based components, |
| you should use the <a href="#parser-opennlp">Apache OpenNLP based parser</a> in your model pipeline. |
| </li> |
| <li> |
| If you use <a href="https://nlp.stanford.edu/">Stanford NLP</a> based components, |
| you should use the <a href="#parser-stanford">Stanford NLP based parser</a> in your model pipeline. |
| </li> |
| </ul> |
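<p>
    To see why the parser should match the other components in the pipeline, the sketch below runs two toy
    tokenizers over the same sentence. Both functions are hypothetical and written only for this example.
    Neither reproduces the actual Apache OpenNLP or Stanford NLP algorithms. They only show how different
    splitting rules lead to different token lists.
</p>

```scala
object TokenizerComparison {
  // Toy rule 1: split on whitespace only, so punctuation stays attached to words.
  def whitespaceTokens(text: String): List[String] =
    text.split("\\s+").filter(_.nonEmpty).toList

  // Toy rule 2: each run of word characters and each punctuation mark is its own token.
  def punctuationAwareTokens(text: String): List[String] =
    """\w+|[^\w\s]""".r.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    val text = "Don't stop, please."
    println(whitespaceTokens(text))
    println(punctuationAwareTokens(text))
  }
}
```

<p>
    For the sample sentence the first function returns three tokens and the second returns seven.
    A downstream component that expects one of these token lists would misalign its indexes if it were given the other.
</p>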
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Overview</a></li> |
| <li><a href="#parser-opennlp">Apache OpenNLP Based Parser</a></li> |
| <li><a href="#parser-stanford">Stanford NLP Based Parser</a></li> |
| <li><a href="#remarks">Remarks</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |