| --- |
| active_crumb: Docs |
| layout: documentation |
| id: custom-components |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column"> |
| <section id="overview"> |
| <h2 class="section-title">Custom components <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| NLPCraft provides a number of useful built-in components that cover a wide range of tasks |
| without coding. |
| However, you may need to extend the provided functionality and develop your own components. |
| This page walks through each kind of component, showing how to implement it and when doing so can be useful. |
| </p> |
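| <p> |
| A custom component is used by the model once it is registered in the model pipeline. |
| The following is a minimal sketch of that wiring, not a definitive setup. It assumes the |
| <code>NCPipelineBuilder</code> class and its <code>withXxx</code> methods as found in NLPCraft 1.0, |
| and <code>buildPipeline</code> is a hypothetical helper name. Check the API documentation of your version before relying on it. |
| </p> |

```scala
import org.apache.nlpcraft.*

// `buildPipeline` is a hypothetical helper; the builder calls assume the NLPCraft 1.0 API.
def buildPipeline(
    tokParser: NCTokenParser,
    entParser: NCEntityParser,
    tokValidator: NCTokenValidator,
    varFilter: NCVariantFilter
): NCPipeline =
    new NCPipelineBuilder()
        .withTokenParser(tokParser)       // Splits input text into tokens.
        .withTokenValidator(tokValidator) // May reject the input early.
        .withEntityParser(entParser)      // Finds entities for intent matching.
        .withVariantFilter(varFilter)     // Drops unwanted variants.
        .build
```

| <p> |
| The components passed in can be any implementations of the traits described on this page, |
| for example <code>new NCFrTokenParser</code> and <code>new CommentsEntityParser</code>. |
| </p> |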
| </section> |
| <section id="token-parser"> |
| <h2 class="section-title">Token parser <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To create a custom token parser, implement the <a href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a> trait. |
| </p> |
| <p> |
| You rarely need to write your own language tokenizer. |
| It is mostly necessary when you want to work with a new language. |
| You prepare the implementation once and can then reuse it in all projects for that language. |
| Usually, you only need to find an open source solution and wrap it in an |
| <a href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a> implementation. |
| </p> |
| <pre class="brush: scala, highlight: [2, 6]"> |
| import org.apache.nlpcraft.* |
| import org.languagetool.tokenizers.fr.FrenchWordTokenizer |
| import scala.jdk.CollectionConverters.* |
| |
| class NCFrTokenParser extends NCTokenParser: |
|     private val tokenizer = new FrenchWordTokenizer |
| |
|     override def tokenize(text: String): List[NCToken] = |
|         val toks = collection.mutable.ArrayBuffer.empty[NCToken] |
|         var sumLen = 0 |
| |
|         for ((word, idx) <- tokenizer.tokenize(text).asScala.zipWithIndex) |
|             val start = sumLen |
|             val end = sumLen + word.length |
| |
|             if word.strip.nonEmpty then |
|                 toks += new NCPropertyMapAdapter with NCToken: |
|                     override def getText: String = word |
|                     override def getIndex: Int = idx |
|                     override def getStartCharIndex: Int = start |
|                     override def getEndCharIndex: Int = end |
| |
|             sumLen = end |
| |
|         toks.toList |
| </pre> |
| <ul> |
| <li> |
| <code>NCFrTokenParser</code> is a simple wrapper that implements <code>NCTokenParser</code> on top of |
| the open source <a href="https://languagetool.org">LanguageTool</a> library. |
| </li> |
| </ul> |
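| <p> |
| The character offset bookkeeping used in the parser above can be tried in isolation. |
| The following is a minimal sketch in plain Scala with no NLPCraft or LanguageTool dependency. |
| <code>Tok</code> and <code>tokenizeWithOffsets</code> are hypothetical names used only for illustration, |
| and whitespace splitting stands in for a real tokenizer. |
| </p> |

```scala
// Hypothetical stand-in for a token; this is not an NLPCraft type.
final case class Tok(text: String, index: Int, start: Int, end: Int)

// Splits on whitespace and records character offsets into the original text.
def tokenizeWithOffsets(text: String): List[Tok] =
    val toks = collection.mutable.ArrayBuffer.empty[Tok]
    var pos = 0

    while pos < text.length do
        if text(pos).isWhitespace then
            pos += 1 // Skip the separator but keep counting characters.
        else
            val start = pos
            while pos < text.length && !text(pos).isWhitespace do pos += 1
            // `index` is the position among emitted tokens, `end` is exclusive.
            toks += Tok(text.substring(start, pos), toks.size, start, pos)

    toks.toList
```

| <p> |
| In this sketch the offsets always refer to the original text, so repeated separators do not shift them. |
| </p> |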
| </section> |
| |
| <section id="token-enricher"> |
| <h2 class="section-title">Token enricher <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To create a custom token enricher, implement the <a href="apis/latest/org/apache/nlpcraft/NCTokenEnricher.html">NCTokenEnricher</a> trait. |
| </p> |
| <p> |
| A token enricher is a component that adds properties to already parsed tokens. |
| These token properties are used later during entity detection. |
| </p> |
| <pre class="brush: scala, highlight: [25, 26]"> |
| import org.apache.nlpcraft.* |
| import org.languagetool.AnalyzedToken |
| import org.languagetool.tagging.ru.RussianTagger |
| import scala.jdk.CollectionConverters.* |
| |
| class NCRuLemmaPosTokenEnricher extends NCTokenEnricher: |
|     private def nvl(v: String, dflt: => String): String = if v != null then v else dflt |
| |
|     override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit = |
|         val tags = RussianTagger.INSTANCE.tag(toks.map(_.getText).asJava).asScala |
| |
|         require(toks.size == tags.size) |
| |
|         toks.zip(tags).foreach { case (tok, tag) => |
|             val readings = tag.getReadings.asScala |
| |
|             val (lemma, pos) = readings.size match |
|                 // No data. Lemma is word as is, POS is undefined. |
|                 case 0 => (tok.getText, "") |
|                 // Takes first. Other variants ignored. |
|                 case _ => |
|                     val aTok: AnalyzedToken = readings.head |
|                     (nvl(aTok.getLemma, tok.getText), nvl(aTok.getPOSTag, "")) |
| |
|             tok.put("pos", pos) |
|             tok.put("lemma", lemma) |
| |
|             () // Otherwise NPE. |
|         } |
| </pre> |
| <ul> |
| <li> |
| <code>Lines 25 and 26</code> enrich each <a href="apis/latest/org/apache/nlpcraft/NCToken.html">NCToken</a> |
| with two new properties, <code>pos</code> and <code>lemma</code>, which can be used later for <a href="intent-matching.html">intent matching</a>. |
| </li> |
| </ul> |
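| <p> |
| The "first reading, with fallbacks" rule from the enricher above can also be written with <code>Option</code>. |
| This is a minimal sketch in plain Scala. <code>Reading</code> and <code>lemmaAndPos</code> are hypothetical |
| stand-ins, not LanguageTool or NLPCraft types, and the fields are assumed to be nullable as they would be when coming from a Java API. |
| </p> |

```scala
// Hypothetical stand-in for a tagger reading; fields may be null.
final case class Reading(lemma: String, pos: String)

// Returns (lemma, pos) from the first reading, falling back to the word itself
// for a missing lemma and to an empty string for a missing POS tag.
def lemmaAndPos(word: String, readings: Seq[Reading]): (String, String) =
    readings.headOption match
        case Some(r) => (Option(r.lemma).getOrElse(word), Option(r.pos).getOrElse(""))
        case None => (word, "")
```

| <p> |
| <code>Option(x)</code> yields <code>None</code> for <code>null</code>, which plays the role of the <code>nvl</code> helper. |
| </p> |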
| </section> |
| |
| <section id="token-validator"> |
| <h2 class="section-title">Token validator <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To create a custom token validator, implement the <a href="apis/latest/org/apache/nlpcraft/NCTokenValidator.html">NCTokenValidator</a> trait. |
| </p> |
| |
| <p> |
| A token validator inspects the prepared tokens and can throw an exception from user code to stop processing of the user input. |
| </p> |
| |
| <pre class="brush: scala, highlight: [3]"> |
| new NCTokenValidator: |
|     override def validate(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit = |
|         if toks.exists(_.contains("restrictionFlag")) |
|         then throw new NCException("Sentence cannot be processed.") |
| </pre> |
| |
| <ul> |
| <li> |
| An anonymous instance of <a href="apis/latest/org/apache/nlpcraft/NCTokenValidator.html">NCTokenValidator</a> |
| is created. |
| </li> |
| <li> |
| <code>Line 3</code> defines the rule for when an exception is thrown and sentence processing stops. |
| </li> |
| </ul> |
| </section> |
| |
| <section id="entity-parser"> |
| <h2 class="section-title">Entity parser <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To create a custom entity parser, implement the <a href="apis/latest/org/apache/nlpcraft/NCEntityParser.html">NCEntityParser</a> trait. |
| </p> |
| |
| <p> |
| The entity parser is the most important component: it finds user-specific data in the input. |
| The detected entities are the input for <a href="intent-matching.html">intent matching</a> conditions. |
| If the built-in <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a> |
| is not enough, you can implement your own NER logic here. |
| This is also the integration point for neural networks or any other solutions that |
| help you find and mark your domain-specific named entities. |
| </p> |
| |
| <pre class="brush: scala, highlight: [5]"> |
| import org.apache.nlpcraft.* |
| |
| class CommentsEntityParser extends NCEntityParser: |
|     def parse(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): List[NCEntity] = |
|         if req.getText.trim.startsWith("--") then |
|             List( |
|                 new NCPropertyMapAdapter with NCEntity: |
|                     override def getTokens: List[NCToken] = toks |
|                     override def getRequestId: String = req.getRequestId |
|                     override def getId: String = "comment" |
|             ) |
|         else |
|             List.empty |
| </pre> |
| <ul> |
| <li> |
| In this example, the whole input sentence is marked as a single <code>comment</code> entity if |
| the condition on <code>line 5</code> is <code>true</code>. |
| </li> |
| </ul> |
| </section> |
| |
| <section id="entity-enricher"> |
| <h2 class="section-title">Entity enricher <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To create a custom entity enricher, implement the <a href="apis/latest/org/apache/nlpcraft/NCEntityEnricher.html">NCEntityEnricher</a> trait. |
| </p> |
| <p> |
| An entity enricher is a component that adds properties to already parsed entities. |
| It can be useful for extending the functionality of existing entity enrichers. |
| </p> |
| |
| <pre class="brush: scala, highlight: [4, 10, 11]"> |
| import org.apache.nlpcraft.* |
| |
| object CityPopulationEntityEnricher: |
|     val citiesPopulation: Map[String, Int] = someExternalService.getCitiesPopulation() |
| |
| import CityPopulationEntityEnricher.* |
| |
| class CityPopulationEntityEnricher extends NCEntityEnricher: |
|     def enrich(req: NCRequest, cfg: NCModelConfig, ents: List[NCEntity]): Unit = |
|         ents.filter(_.getId == "city"). |
|             foreach(e => e.put("city:population", citiesPopulation(e("city:name")))) |
| </pre> |
| |
| <ul> |
| <li> |
| <code>Line 4</code> loads the city population data from some external service. |
| </li> |
| <li> |
| <code>Line 10</code> filters entities by <code>ID</code>. |
| </li> |
| <li> |
| <code>Line 11</code> enriches the entities with the new <code>city:population</code> property. |
| </li> |
| </ul> |
| </section> |
| |
| <section id="entity-mapper"> |
| <h2 class="section-title">Entity mapper <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To create a custom entity mapper, implement the <a href="apis/latest/org/apache/nlpcraft/NCEntityMapper.html">NCEntityMapper</a> trait. |
| </p> |
| |
| <p> |
| An entity mapper is a component that maps one set of entities into another after the entities |
| have been parsed and enriched. It can be useful for building complex parsers on top of existing ones. |
| </p> |
| |
| <pre class="brush: scala, highlight: [4, 10, 12, 13, 14]"> |
| import org.apache.nlpcraft.* |
| |
| object CityPopulationEntityMapper: |
|     val citiesPopulation: Map[String, Int] = externalService.getCitiesPopulation() |
| |
| import CityPopulationEntityMapper.* |
| |
| class CityPopulationEntityMapper extends NCEntityMapper: |
|     def map(req: NCRequest, cfg: NCModelConfig, ents: List[NCEntity]): List[NCEntity] = |
|         val cities = ents.filter(_.getId == "city") |
| |
|         ents.filterNot(_.getId == "city") ++ |
|             cities ++ |
|             cities.filter(city => citiesPopulation(city("city:name")) > 1000000). |
|                 map(city => |
|                     new NCPropertyMapAdapter with NCEntity: |
|                         override def getTokens: List[NCToken] = city.getTokens |
|                         override def getRequestId: String = req.getRequestId |
|                         override def getId: String = "big-city" |
|                 ) |
| </pre> |
| <ul> |
| <li> |
| <code>Line 4</code> loads the city population data from some external service. |
| </li> |
| <li> |
| <code>Line 10</code> filters entities by <code>ID</code>. |
| </li> |
| <li> |
| <code>Lines 12, 13 and 14</code> define the resulting entity set. |
| It contains all non-city elements, the previously detected <code>city</code> elements and |
| the new <code>big-city</code> elements. |
| </li> |
| </ul> |
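| <p> |
| Note that <code>citiesPopulation(name)</code> uses <code>Map.apply</code>, which throws |
| <code>NoSuchElementException</code> when the name is not in the map. |
| If the external data may be incomplete, a lookup through <code>Map.get</code> avoids that. |
| The following is a minimal sketch in plain Scala. The map contents, the city names and <code>isBigCity</code> |
| are hypothetical and serve only as an illustration. |
| </p> |

```scala
// Hypothetical population table; in the examples on this page it comes from an external service.
val citiesPopulation: Map[String, Int] = Map("Metropolis" -> 2000000, "Smallville" -> 50000)

// True only for known cities above the threshold; unknown names are treated as not big.
def isBigCity(name: String, threshold: Int = 1000000): Boolean =
    citiesPopulation.get(name).exists(_ > threshold)
```

| <p> |
| In the mapper above, such a predicate could replace the direct comparison inside <code>cities.filter</code>. |
| </p> |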
| </section> |
| |
| <section id="entity-validator"> |
| <h2 class="section-title">Entity validator <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To create a custom entity validator, implement the <a href="apis/latest/org/apache/nlpcraft/NCEntityValidator.html">NCEntityValidator</a> trait. |
| </p> |
| <p> |
| An entity validator is a user-defined component that inspects the prepared entities and can throw |
| an exception from user code to stop processing of the user input. |
| </p> |
| |
| <pre class="brush: scala, highlight: [3]"> |
| new NCEntityValidator: |
|     override def validate(req: NCRequest, cfg: NCModelConfig, ents: List[NCEntity]): Unit = |
|         if ents.exists(_.getId == "restrictedID") |
|         then throw new NCException("Sentence cannot be processed.") |
| </pre> |
| |
| <ul> |
| <li> |
| An anonymous instance of <a href="apis/latest/org/apache/nlpcraft/NCEntityValidator.html">NCEntityValidator</a> |
| is created. |
| </li> |
| <li> |
| <code>Line 3</code> defines the rule for when an exception is thrown and sentence processing stops. |
| </li> |
| </ul> |
| </section> |
| |
| <section id="variant-filter"> |
| <h2 class="section-title">Variant filter <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| To create a custom variant filter, implement the <a href="apis/latest/org/apache/nlpcraft/NCVariantFilter.html">NCVariantFilter</a> trait. |
| </p> |
| |
| <p> |
| A variant filter is a component that filters the detected variants, rejecting undesirable ones. |
| </p> |
| |
| <pre class="brush: scala, highlight: [3]"> |
| new NCVariantFilter: |
|     def filter(req: NCRequest, cfg: NCModelConfig, vars: List[NCVariant]): List[NCVariant] = |
|         vars.filter(_.getEntities.exists(_.getId == "requiredID")) |
| </pre> |
| |
| <ul> |
| <li> |
| An anonymous instance of <a href="apis/latest/org/apache/nlpcraft/NCVariantFilter.html">NCVariantFilter</a> |
| is created. |
| </li> |
| <li> |
| <code>Line 3</code> defines the filter: |
| it passes only variants that contain <code>requiredID</code> elements. |
| </li> |
| </ul> |
| </section> |
| |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Overview</a></li> |
| <li><a href="#token-parser">Token parser</a></li> |
| <li><a href="#token-enricher">Token enricher</a></li> |
| <li><a href="#token-validator">Token validator</a></li> |
| <li><a href="#entity-parser">Entity parser</a></li> |
| <li><a href="#entity-enricher">Entity enricher</a></li> |
| <li><a href="#entity-mapper">Entity mapper</a></li> |
| <li><a href="#entity-validator">Entity validator</a></li> |
| <li><a href="#variant-filter">Variant filter</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |