| --- |
| active_crumb: Light Switch RU <code><sub>ex</sub></code> |
| layout: documentation |
| id: light_switch_ru |
| fa_icon: fa-cube |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <div class="col-md-8 second-column example"> |
| <section id="overview"> |
| <h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| This example provides a very simple Russian-language implementation of an NLI-powered light switch. You can say something like |
| "Выключи свет по всем доме" ("turn off the lights in the whole house") or "Включи свет в детской" ("turn on the light in the nursery"). |
| By modifying the intent callback to call, for example, HomeKit or Arduino-based controllers, you can add the |
| actual light switching. |
| </p> |
| <p> |
| <b>Complexity:</b> <span class="complexity-two-star"><i class="fas fa-square"></i> <i class="fas fa-square"></i> <i class="far fa-square"></i></span><br/> |
| <span class="ex-src">Source code: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples/lightswitch_ru">GitHub <i class="fab fa-fw fa-github"></i></a><br/></span> |
| <span class="ex-review-all">Review: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">All Examples at GitHub <i class="fab fa-fw fa-github"></i></a></span> |
| </p> |
| </section> |
| <section id="new_project"> |
| <h2 class="section-title">Create New Project <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| You can create a new Scala project in many ways; here we'll use SBT. |
| Make sure that the <code>build.sbt</code> file has the following content: |
| </p> |
| <pre class="brush: js, highlight: [7, 8, 9, 10]"> |
| ThisBuild / version := "0.1.0-SNAPSHOT" |
| ThisBuild / scalaVersion := "3.2.2" |
| lazy val root = (project in file(".")) |
|   .settings( |
|     name := "NLPCraft LightSwitch RU Example", |
|     version := "{{site.latest_version}}", |
|     libraryDependencies += "org.apache.nlpcraft" % "nlpcraft" % "{{site.latest_version}}", |
|     libraryDependencies += "org.apache.lucene" % "lucene-analyzers-common" % "8.11.2", |
|     libraryDependencies += "org.languagetool" % "languagetool-core" % "6.0", |
|     libraryDependencies += "org.languagetool" % "language-ru" % "6.0", |
|     libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.15" % "test" |
|   ) |
| </pre> |
| |
| <p> |
| <code>Lines 8, 9 and 10</code> add the libraries that provide basic NLP operations for the Russian language. |
| </p> |
| |
| <p><b>NOTE: </b>use the latest versions of Scala and ScalaTest.</p> |
| <p>Create the following files so that the resulting project structure looks like this:</p> |
| <ul> |
| <li><code>lightswitch_model_ru.yaml</code> - YAML configuration file which contains model description.</li> |
| <li><code>LightSwitchRuModel.scala</code> - Model implementation.</li> |
| <li> |
| <code>NCRuSemanticEntityParser.scala</code> - Semantic entity parser, custom implementation of |
| {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %} |
| for Russian language. |
| </li> |
| <li> |
| <code>NCRuLemmaPosTokenEnricher.scala</code> - Lemma and part-of-speech token enricher, custom implementation of |
| {% scaladoc NCTokenEnricher NCTokenEnricher %} |
| for Russian language. |
| </li> |
| <li> |
| <code>NCRuStopWordsTokenEnricher.scala</code> - Stop-words token enricher, custom implementation of |
| {% scaladoc NCTokenEnricher NCTokenEnricher %} |
| for Russian language. |
| </li> |
| <li> |
| <code>NCRuTokenParser.scala</code> - Token parser, custom implementation of |
| {% scaladoc NCTokenParser NCTokenParser %} |
| for Russian language. |
| </li> |
| <li><code>LightSwitchRuModelSpec.scala</code> - Test suite that verifies your model.</li> |
| </ul> |
| <pre class="brush: plain, highlight: [7, 10, 14, 17, 18, 20, 24]"> |
| |  build.sbt |
| +--project |
| |    build.properties |
| \--src |
|    +--main |
|    |  +--resources |
|    |  |  lightswitch_model_ru.yaml |
|    |  \--scala |
|    |     \--demo |
|    |        |  LightSwitchRuModel.scala |
|    |        \--nlp |
|    |           +--entity |
|    |           |  \--parser |
|    |           |       NCRuSemanticEntityParser.scala |
|    |           \--token |
|    |              +--enricher |
|    |              |    NCRuLemmaPosTokenEnricher.scala |
|    |              |    NCRuStopWordsTokenEnricher.scala |
|    |              \--parser |
|    |                   NCRuTokenParser.scala |
|    \--test |
|       \--scala |
|          \--demo |
|               LightSwitchRuModelSpec.scala |
| </pre> |
| </section> |
| <section id="model"> |
| <h2 class="section-title">Data Model <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| We are going to start with declaring the static part of our model using YAML which we will later load |
| in our Scala-based model implementation. |
| Open <code>src/main/resources/<b>lightswitch_model_ru.yaml</b></code> |
| file and replace its content with the following YAML: |
| </p> |
| <pre class="brush: js, highlight: [1, 8, 13, 21]"> |
| macros: |
|   "<TURN_ON>" : "{включить|включать|врубить|врубать|запустить|запускать|зажигать|зажечь}" |
|   "<TURN_OFF>" : "{погасить|загасить|гасить|выключить|выключать|вырубить|вырубать|отключить|отключать|убрать|убирать|приглушить|приглушать|стоп}" |
|   "<ENTIRE_OPT>" : "{весь|все|всё|повсюду|вокруг|полностью|везде|_}" |
|   "<LIGHT_OPT>" : "{это|лампа|бра|люстра|светильник|лампочка|лампа|освещение|свет|электричество|электрика|_}" |
| |
| elements: |
|   - type: "ls:loc" |
|     description: "Location of lights." |
|     synonyms: |
|       - "<ENTIRE_OPT> {здание|помещение|дом|кухня|детская|кабинет|гостиная|спальня|ванная|туалет|{большая|обеденная|ванная|детская|туалетная} комната}" |
| |
|   - type: "ls:on" |
|     groups: |
|       - "act" |
|     description: "Light switch ON action." |
|     synonyms: |
|       - "<LIGHT_OPT> <ENTIRE_OPT> <TURN_ON>" |
|       - "<TURN_ON> <ENTIRE_OPT> <LIGHT_OPT>" |
| |
|   - type: "ls:off" |
|     groups: |
|       - "act" |
|     description: "Light switch OFF action." |
|     synonyms: |
|       - "<LIGHT_OPT> <ENTIRE_OPT> <TURN_OFF>" |
|       - "<TURN_OFF> <ENTIRE_OPT> <LIGHT_OPT>" |
|       - "без <ENTIRE_OPT> <LIGHT_OPT>" |
| </pre> |
| |
| <ul> |
| <li> |
| <code>Line 1</code> defines several macros that are used throughout the model's elements |
| to shorten the synonym declarations. Note how macros coupled with option groups |
| shrink the synonym declarations dramatically compared to manually listing all possible word permutations. |
| </li> |
| <li> |
| <code>Lines 8, 13, 21</code> define three model elements: the location of the light, and actions to turn |
| the light on and off. Action elements belong to the same group <code>act</code> which |
| will be used in our intent, defined in <code>LightSwitchRuModel</code> class. Note that these model |
| elements are defined mostly through macros we have defined above. |
| |
| </li> |
| </ul> |
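| <p> |
| To make the effect of macros and option groups concrete, here is a minimal Scala sketch that expands a template into every word sequence it covers. |
| It is a simplified illustration, not NLPCraft's actual synonym expander: the <code>macros</code> map is deliberately shortened, |
| <code>expand()</code> is a hypothetical helper, nested groups are not handled, and the empty string stands in for the <code>_</code> option. |
| </p> |
| <pre class="brush: scala, highlight: []"> |
| // Shortened, hypothetical macro table: each macro maps to its alternatives. |
| // The empty string plays the role of the `_` (empty) option. |
| val macros: Map[String, List[String]] = Map( |
|     "<TURN_ON>"    -> List("включить", "зажечь"), |
|     "<ENTIRE_OPT>" -> List("весь", ""), |
|     "<LIGHT_OPT>"  -> List("свет", "лампа", "") |
| ) |
| |
| // Expands a space-separated template into all word sequences it covers. |
| // Parts that are not macros are kept as literal words. |
| def expand(template: String): List[String] = |
|     template.split(" ").toList |
|         .map(part => macros.getOrElse(part, List(part))) |
|         .foldLeft(List(List.empty[String])) { (acc, alts) => |
|             for |
|                 prefix <- acc |
|                 alt <- alts |
|             yield if alt.isEmpty then prefix else prefix :+ alt |
|         } |
|         .map(_.mkString(" ")) |
| |
| @main def synonymDemo(): Unit = |
|     expand("<TURN_ON> <ENTIRE_OPT> <LIGHT_OPT>").foreach(println) |
| </pre> |
| <p> |
| Even with these shortened macros, a single template line covers 2 x 2 x 3 = 12 phrases, which is why the full model stays so compact. |
| </p> |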
| <div class="bq info"> |
| <p><b>YAML vs. API</b></p> |
| <p> |
| This YAML-based static model definition is convenient but entirely optional. All element definitions |
| can also be provided programmatically inside the Scala model class <code>LightSwitchRuModel</code>. |
| </p> |
| </div> |
| </section> |
| <section id="code"> |
| <h2 class="section-title">Model Class <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| Open <code>src/main/scala/demo/<b>LightSwitchRuModel.scala</b></code> file and replace its content with the following code: |
| </p> |
| <pre class="brush: scala, highlight: [11, 12, 13, 20, 21, 24, 25, 32]"> |
| package demo |
| |
| import com.google.gson.Gson |
| import org.apache.nlpcraft.* |
| import org.apache.nlpcraft.annotations.* |
| import demo.nlp.entity.parser.NCRuSemanticEntityParser |
| import demo.nlp.token.enricher.* |
| import demo.nlp.token.parser.NCRuTokenParser |
| import scala.jdk.CollectionConverters.* |
| |
| class LightSwitchRuModel extends NCModel( |
|     NCModelConfig("nlpcraft.lightswitch.ru.ex", "LightSwitch Example Model RU", "1.0"), |
|     new NCPipelineBuilder(). |
|         withTokenParser(new NCRuTokenParser()). |
|         withTokenEnricher(new NCRuLemmaPosTokenEnricher()). |
|         withTokenEnricher(new NCRuStopWordsTokenEnricher()). |
|         withEntityParser(new NCRuSemanticEntityParser("lightswitch_model_ru.yaml")). |
|         build |
| ): |
|     @NCIntent("intent=ls term(act)={has(ent_groups, 'act')} term(loc)={# == 'ls:loc'}*") |
|     def onMatch( |
|         ctx: NCContext, |
|         im: NCIntentMatch, |
|         @NCIntentTerm("act") actEnt: NCEntity, |
|         @NCIntentTerm("loc") locEnts: List[NCEntity] |
|     ): NCResult = |
|         val action = if actEnt.getType == "ls:on" then "включить" else "выключить" |
|         val locations = if locEnts.isEmpty then "весь дом" else locEnts.map(_.mkText).mkString(", ") |
| |
|         // Add HomeKit, Arduino or other integration here. |
|         // By default - just return a descriptive action string. |
|         NCResult(new Gson().toJson(Map("locations" -> locations, "action" -> action).asJava)) |
| </pre> |
| <p> |
| The intent callback logic is very simple - we return a descriptive confirmation message |
| back (explaining what lights were changed). With action and location detected, you can add |
| the actual light switching using HomeKit or Arduino devices. Let's review this implementation step by step: |
| </p> |
| <ul> |
| <li> |
| On <code>line 11</code> our class extends {% scaladoc NCModel NCModel %} with two mandatory parameters. |
| </li> |
| <li> |
| <code>Line 12</code> creates the model configuration, leaving most parameters at their defaults. |
| </li> |
| <li> |
| <code>Line 13</code> creates the pipeline from custom Russian-language components: |
| <ul> |
| <li><code>NCRuTokenParser</code> - Token parser.</li> |
| <li><code>NCRuLemmaPosTokenEnricher</code> - Lemma and part-of-speech token enricher.</li> |
| <li><code>NCRuStopWordsTokenEnricher</code> - Stop-words token enricher.</li> |
| <li><code>NCRuSemanticEntityParser</code> - Semantic entity parser extending <code>NCSemanticEntityParser</code>.</li> |
| </ul> |
| Note that <code>NCRuSemanticEntityParser</code> is based on the semantic model definition |
| described in the <code>lightswitch_model_ru.yaml</code> file. |
| </li> |
| <li> |
| <code>Lines 20 and 21</code> annotate the intent <code>ls</code> and its callback method <code>onMatch()</code>. |
| Intent <code>ls</code> requires one action (an entity belonging to the group <code>act</code>) and an optional list of light locations |
| (zero or more entities of type <code>ls:loc</code>). When no location is given, the callback assumes the entire house. |
| </li> |
| <li> |
| <code>Lines 24 and 25</code> map terms from detected intent to the formal method parameters of the |
| <code>onMatch()</code> method. |
| </li> |
| <li> |
| On <code>line 32</code> the intent callback simply returns a confirmation message. |
| </li> |
| </ul> |
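| <p> |
| The comment in the callback marks where a real device integration would go. One possible shape for it is sketched below. |
| Everything here is hypothetical and independent of NLPCraft: <code>LightController</code>, <code>InMemoryLightController</code> |
| and <code>applyAction()</code> are illustrative names, and the in-memory class only stands in for a HomeKit or Arduino bridge. |
| </p> |
| <pre class="brush: scala, highlight: []"> |
| // Hypothetical integration point; not part of the NLPCraft API. |
| trait LightController: |
|     def switch(location: String, on: Boolean): Unit |
| |
| // In-memory stand-in for a HomeKit or Arduino bridge, handy for tests. |
| final class InMemoryLightController extends LightController: |
|     private val state = scala.collection.mutable.Map.empty[String, Boolean] |
|     override def switch(location: String, on: Boolean): Unit = state.update(location, on) |
|     def isOn(location: String): Boolean = state.getOrElse(location, false) |
| |
| // Mirrors the callback logic: the entity type selects the action, |
| // and an empty location list means the entire house. |
| def applyAction(ctrl: LightController, entityType: String, locations: List[String]): Map[String, String] = |
|     val on = entityType == "ls:on" |
|     val targets = if locations.isEmpty then List("весь дом") else locations |
|     targets.foreach(ctrl.switch(_, on)) |
|     Map("locations" -> targets.mkString(", "), "action" -> (if on then "включить" else "выключить")) |
| </pre> |
| <p> |
| Inside <code>onMatch()</code> such a helper could be called with <code>actEnt.getType</code> and the texts of <code>locEnts</code>, |
| and the returned map serialized to JSON exactly as the example already does. |
| </p> |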
| |
| </section> |
| <section id="custom"> |
| <h2 class="section-title">Custom Components <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| |
| <p> |
| Open <code>src/main/scala/demo/nlp/token/parser/<b>NCRuTokenParser.scala</b></code> file and replace its content with the following code: |
| </p> |
| |
| <pre class="brush: scala, highlight: [19]"> |
| package demo.nlp.token.parser |
| |
| import org.apache.nlpcraft.* |
| import org.languagetool.tokenizers.WordTokenizer |
| import scala.jdk.CollectionConverters.* |
| |
| class NCRuTokenParser extends NCTokenParser: |
|     private val tokenizer = new WordTokenizer |
| |
|     override def tokenize(text: String): List[NCToken] = |
|         val toks = collection.mutable.ArrayBuffer.empty[NCToken] |
|         var sumLen = 0 |
| |
|         for ((word, idx) <- tokenizer.tokenize(text).asScala.zipWithIndex) |
|             val start = sumLen |
|             val end = sumLen + word.length |
| |
|             if word.strip.nonEmpty then |
|                 toks += new NCPropertyMapAdapter with NCToken: |
|                     override def getText: String = word |
|                     override def getIndex: Int = idx |
|                     override def getStartCharIndex: Int = start |
|                     override def getEndCharIndex: Int = end |
| |
|             sumLen = end |
| |
|         toks.toList |
| </pre> |
| |
| <ul> |
| <li> |
| <code>NCRuTokenParser</code> is a simple wrapper which implements |
| {% scaladoc NCTokenParser NCTokenParser %} |
| based on the open source <a href="https://languagetool.org">LanguageTool</a> library. |
| </li> |
| <li> |
| <code>Line 19</code> creates the {% scaladoc NCToken NCToken %} instance. |
| </li> |
| </ul> |
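| <p> |
| The offset bookkeeping in the parser can be tried out without LanguageTool. The sketch below is an approximation under stated assumptions: |
| a regular expression replaces <code>WordTokenizer</code> and is assumed, like it, to return whitespace as separate pieces, |
| and <code>Tok</code> is a hypothetical case class used in place of <code>NCToken</code>. |
| </p> |
| <pre class="brush: scala, highlight: []"> |
| // Hypothetical stand-in for NCToken: text plus position data. |
| final case class Tok(text: String, index: Int, start: Int, end: Int) |
| |
| // Whitespace runs, letter/digit runs, or single other characters. |
| // Every character matches one alternative, so the pieces add up to the original text. |
| val piece = """\s+|[\p{L}\p{N}]+|[^\s\p{L}\p{N}]""".r |
| |
| def tokenize(text: String): List[Tok] = |
|     var offset = 0 |
|     piece.findAllIn(text).toList.zipWithIndex.flatMap { (word, idx) => |
|         val start = offset |
|         offset += word.length |
|         // Blank pieces advance the offset and consume an index but produce no token. |
|         if word.isBlank then None else Some(Tok(word, idx, start, offset)) |
|     } |
| |
| @main def tokenizeDemo(): Unit = |
|     tokenize("Включи свет!").foreach(println) |
| </pre> |
| <p> |
| For the input "Включи свет!" the sketch produces three tokens with character ranges 0-6, 7-11 and 11-12. |
| Their indexes are 0, 2 and 3 because the whitespace piece takes index 1, which follows the indexing used in the parser above. |
| </p> |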
| |
| <p> |
| Open <code>src/main/scala/demo/nlp/token/enricher/<b>NCRuLemmaPosTokenEnricher.scala</b></code> file and replace its content with the following code: |
| </p> |
| <pre class="brush: scala, highlight: [27, 28]"> |
| package demo.nlp.token.enricher |
| |
| import org.apache.nlpcraft.* |
| import org.languagetool.AnalyzedToken |
| import org.languagetool.tagging.ru.RussianTagger |
| import scala.jdk.CollectionConverters.* |
| |
| class NCRuLemmaPosTokenEnricher extends NCTokenEnricher: |
|     private def nvl(v: String, dflt : => String): String = if v != null then v else dflt |
| |
|     override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit = |
|         val tags = RussianTagger.INSTANCE.tag(toks.map(_.getText).asJava).asScala |
| |
|         require(toks.size == tags.size) |
| |
|         toks.zip(tags).foreach { case (tok, tag) => |
|             val readings = tag.getReadings.asScala |
| |
|             val (lemma, pos) = readings.size match |
|                 // No data. Lemma is word as is, POS is undefined. |
|                 case 0 => (tok.getText, "") |
|                 // Takes first. Other variants ignored. |
|                 case _ => |
|                     val aTok: AnalyzedToken = readings.head |
|                     (nvl(aTok.getLemma, tok.getText), nvl(aTok.getPOSTag, "")) |
| |
|             tok.put("pos", pos) |
|             tok.put("lemma", lemma) |
| |
|             () // Otherwise NPE. |
|         } |
| </pre> |
| |
| <ul> |
| <li> |
| <code>NCRuLemmaPosTokenEnricher</code> is a lemma and part-of-speech token enricher based on the |
| open source <a href="https://languagetool.org">LanguageTool</a> library. |
| </li> |
| <li> |
| On <code>lines 27 and 28</code> the tokens are enriched with <code>pos</code> and <code>lemma</code> data. |
| </li> |
| </ul> |
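| <p> |
| The fallback rules of the enricher (first reading wins, the word itself is the default lemma, an empty string is the default POS) |
| can be isolated into a small pure function. The sketch below assumes a hypothetical <code>Reading</code> case class in place of |
| LanguageTool's <code>AnalyzedToken</code> and uses <code>Option</code> where the original code checks for <code>null</code>. |
| </p> |
| <pre class="brush: scala, highlight: []"> |
| // Hypothetical stand-in for a tagger reading; either field may be missing. |
| final case class Reading(lemma: Option[String], pos: Option[String]) |
| |
| // First reading wins; the word itself and an empty POS are the fallbacks. |
| def lemmaAndPos(word: String, readings: Seq[Reading]): (String, String) = |
|     readings.headOption match |
|         case None => (word, "") |
|         case Some(r) => (r.lemma.getOrElse(word), r.pos.getOrElse("")) |
| </pre> |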
| |
| <p> |
| Open <code>src/main/scala/demo/nlp/token/enricher/<b>NCRuStopWordsTokenEnricher.scala</b></code> file and replace its content with the following code: |
| </p> |
| |
| <pre class="brush: scala, highlight: [17]"> |
| package demo.nlp.token.enricher |
| |
| import org.apache.lucene.analysis.ru.RussianAnalyzer |
| import org.apache.nlpcraft.* |
| |
| class NCRuStopWordsTokenEnricher extends NCTokenEnricher: |
|     private val stops = RussianAnalyzer.getDefaultStopSet |
| |
|     private def getPos(t: NCToken): String = t.get("pos").getOrElse(throw new NCException("POS not found in token.")) |
|     private def getLemma(t: NCToken): String = t.get("lemma").getOrElse(throw new NCException("Lemma not found in token.")) |
| |
|     override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit = |
|         for (t <- toks) |
|             val lemma = getLemma(t) |
|             lazy val pos = getPos(t) |
| |
|             t.put( |
|                 "stopword", |
|                 lemma.length == 1 && !Character.isLetter(lemma.head) && !Character.isDigit(lemma.head) || |
|                 stops.contains(lemma.toLowerCase) || |
|                 pos.startsWith("PARTICLE") || |
|                 pos.startsWith("INTERJECTION") || |
|                 pos.startsWith("PREP") |
|             ) |
| </pre> |
| |
| <ul> |
| <li> |
| <code>NCRuStopWordsTokenEnricher</code> is a stop-words tokens enricher based on |
| open source <a href="https://lucene.apache.org/">Apache Lucene</a> library. |
| </li> |
| <li> |
| On <code>line 17</code> each token is enriched with a <code>stopword</code> flag. |
| </li> |
| </ul> |
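| <p> |
| The stop-word rule itself is a plain boolean expression and can be checked in isolation. The following sketch restates it as a pure function; |
| <code>isStopword()</code> is a hypothetical helper, and the stop set is passed in as an ordinary Scala <code>Set</code> |
| rather than taken from Lucene's <code>RussianAnalyzer</code>. |
| </p> |
| <pre class="brush: scala, highlight: []"> |
| // A token counts as a stop word if it is a single non-alphanumeric symbol, |
| // if its lemma is in the stop set, or if its POS tag marks a particle, |
| // an interjection or a preposition. |
| def isStopword(lemma: String, pos: String, stops: Set[String]): Boolean = |
|     val isSingleSymbol = lemma.length == 1 && !Character.isLetterOrDigit(lemma.head) |
|     isSingleSymbol || |
|         stops.contains(lemma.toLowerCase) || |
|         List("PARTICLE", "INTERJECTION", "PREP").exists(prefix => pos.startsWith(prefix)) |
| </pre> |
| <p> |
| With a stop set containing "в", the call <code>isStopword("В", "", Set("в"))</code> is true because the lemma is lowercased before the lookup, |
| while an ordinary noun such as "свет" with a non-matching POS tag is not flagged. |
| </p> |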
| |
| <p> |
| Open <code>src/main/scala/demo/nlp/entity/parser/<b>NCRuSemanticEntityParser.scala</b></code> file and replace its content with the following code: |
| </p> |
| |
| <pre class="brush: scala, highlight: [8, 12]"> |
| package demo.nlp.entity.parser |
| |
| import opennlp.tools.stemmer.snowball.SnowballStemmer |
| import demo.nlp.token.parser.NCRuTokenParser |
| import org.apache.nlpcraft.nlp.parsers.* |
| import org.apache.nlpcraft.nlp.stemmer.NCStemmer |
| |
| class NCRuSemanticEntityParser(src: String) extends NCSemanticEntityParser( |
|     new NCStemmer: |
|         private val stemmer = new SnowballStemmer(SnowballStemmer.ALGORITHM.RUSSIAN) |
|         override def stem(txt: String): String = stemmer.synchronized { stemmer.stem(txt.toLowerCase).toString } |
|     , |
|     new NCRuTokenParser(), |
|     src |
| ) |
| </pre> |
| |
| <ul> |
| <li> |
| <code>NCRuSemanticEntityParser</code> extends {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}. |
| It uses stemmer implementation from <a href="https://opennlp.apache.org/">Apache OpenNLP</a> project. |
| </li> |
| </ul> |
| </section> |
| |
| <section id="testing"> |
| <h2 class="section-title">Testing <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| The test defined in <code>LightSwitchRuModelSpec</code> checks that all test input sentences are |
| processed correctly and trigger the expected intent <code>ls</code>: |
| </p> |
| <pre class="brush: scala, highlight: [9, 11]"> |
| package demo |
| |
| import org.apache.nlpcraft.* |
| import org.scalatest.funsuite.AnyFunSuite |
| import scala.util.Using |
| |
| class LightSwitchRuModelSpec extends AnyFunSuite: |
|     test("test") { |
|         Using.resource(new NCModelClient(new LightSwitchRuModel)) { client => |
|             def check(txt: String): Unit = |
|                 require(client.debugAsk(txt, "userId", true).getIntentId == "ls") |
| |
|             check("Выключи свет по всем доме") |
|             check("Выруби электричество!") |
|             check("Включи свет в детской") |
|             check("Включай повсюду освещение") |
|             check("Включайте лампы в детской комнате") |
|             check("Свет на кухне, пожалуйста, приглуши") |
|             check("Нельзя ли повсюду выключить свет?") |
|             check("Пожалуйста без света") |
|             check("Отключи электричество в ванной") |
|             check("Выключи, пожалуйста, тут всюду свет") |
|             check("Выключай все!") |
|             check("Свет пожалуйста везде включи") |
|             check("Зажги лампу на кухне") |
|         } |
|     } |
| </pre> |
| <ul> |
| <li> |
| <code>Line 9</code> creates the client for our model. |
| </li> |
| <li> |
| <code>Line 11</code> calls a special method |
| {% scaladoc NCModelClient#debugAsk-fffff96c debugAsk() %}. |
| It lets you check the winning intent and its callback parameters without actually |
| invoking the intent callback. |
| </li> |
| <li> |
| <code>Lines 13-25</code> define all the test input sentences that should all |
| trigger <code>ls</code> intent. |
| </li> |
| </ul> |
| <p> |
| You can run this test via the SBT task <code>executeTests</code> or from your IDE. |
| </p> |
| <pre class="brush: scala, highlight: []"> |
| $ sbt executeTests |
| </pre> |
| </section> |
| <section> |
| <h2 class="section-title">Done! 👌 <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> |
| <p> |
| You've created the light switch data model and tested it. |
| </p> |
| </section> |
| </div> |
| <div class="col-md-2 third-column"> |
| <ul class="side-nav"> |
| <li class="side-nav-title">On This Page</li> |
| <li><a href="#overview">Overview</a></li> |
| <li><a href="#new_project">New Project</a></li> |
| <li><a href="#model">Data Model</a></li> |
| <li><a href="#code">Model Class</a></li> |
| <li><a href="#custom">Custom Components</a></li> |
| <li><a href="#testing">Testing</a></li> |
| {% include quick-links.html %} |
| </ul> |
| </div> |
| |
| |
| |
| |
| |
| |