---
active_crumb: Light Switch RU <code><sub>ex</sub></code>
layout: documentation
id: light_switch_ru
fa_icon: fa-cube
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column example">
<section id="overview">
<h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
This example provides a very simple Russian-language implementation of an NLI-powered light switch. You can say something like
"Выключи свет по всем доме" ("Turn off the light in the whole house") or "Включи свет в детской" ("Turn on the light in the nursery").
By modifying the intent callbacks, e.g. with HomeKit or Arduino-based controllers, you can provide the
actual light switching.
</p>
<p>
<b>Complexity:</b> <span class="complexity-two-star"><i class="fas fa-square"></i> <i class="fas fa-square"></i> <i class="far fa-square"></i></span><br/>
<span class="ex-src">Source code: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples/lightswitch_ru">GitHub <i class="fab fa-fw fa-github"></i></a><br/></span>
<span class="ex-review-all">Review: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">All Examples at GitHub <i class="fab fa-fw fa-github"></i></a></span>
</p>
</section>
<section id="new_project">
<h2 class="section-title">Create New Project <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
You can create new Scala projects in many ways - we'll use SBT
to accomplish this task. Make sure that the <code>build.sbt</code> file has the following content:
</p>
<pre class="brush: js, highlight: [7, 8, 9, 10]">
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "3.2.2"
lazy val root = (project in file("."))
.settings(
name := "NLPCraft LightSwitch RU Example",
version := "{{site.latest_version}}",
libraryDependencies += "org.apache.nlpcraft" % "nlpcraft" % "{{site.latest_version}}",
libraryDependencies += "org.apache.lucene" % "lucene-analyzers-common" % "8.11.2",
libraryDependencies += "org.languagetool" % "languagetool-core" % "6.0",
libraryDependencies += "org.languagetool" % "language-ru" % "6.0",
libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.15" % "test"
)
</pre>
<p>
<code>Lines 8, 9 and 10</code> add the libraries used to support basic NLP operations for the Russian language.
</p>
<p><b>NOTE: </b>use the latest versions of Scala and ScalaTest.</p>
<p>Create the following files so that the resulting project structure looks like this:</p>
<ul>
<li><code>lightswitch_model_ru.yaml</code> - YAML configuration file containing the model description.</li>
<li><code>LightSwitchRuModel.scala</code> - Model implementation.</li>
<li>
<code>NCRuSemanticEntityParser.scala</code> - Semantic entity parser, custom implementation of
{% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}
for the Russian language.
</li>
<li>
<code>NCRuLemmaPosTokenEnricher.scala</code> - Lemma and part-of-speech token enricher, custom implementation of
{% scaladoc NCTokenEnricher NCTokenEnricher %}
for the Russian language.
</li>
<li>
<code>NCRuStopWordsTokenEnricher.scala</code> - Stop-words token enricher, custom implementation of
{% scaladoc NCTokenEnricher NCTokenEnricher %}
for the Russian language.
</li>
<li>
<code>NCRuTokenParser.scala</code> - Token parser, custom implementation of
{% scaladoc NCTokenParser NCTokenParser %}
for the Russian language.
</li>
<li><code>LightSwitchRuModelSpec.scala</code> - A test that verifies the model.</li>
</ul>
<pre class="brush: plain, highlight: [7, 10, 14, 17, 18, 20, 24]">
| build.sbt
+--project
| build.properties
\--src
+--main
| +--resources
| | lightswitch_model_ru.yaml
| \--scala
| \--demo
| | LightSwitchRuModel.scala
| \--nlp
| +--entity
| | \--parser
| | NCRuSemanticEntityParser.scala
| \--token
| +--enricher
| | NCRuLemmaPosTokenEnricher.scala
| | NCRuStopWordsTokenEnricher.scala
| \--parser
| NCRuTokenParser.scala
\--test
\--scala
\--demo
LightSwitchRuModelSpec.scala
</pre>
</section>
<section id="model">
<h2 class="section-title">Data Model<a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
We start by declaring the static part of our model in YAML, which we will later load
in our Scala-based model implementation.
Open <code>src/main/resources/<b>lightswitch_model_ru.yaml</b></code>
file and replace its content with the following YAML:
</p>
<pre class="brush: js, highlight: [1, 8, 13, 21]">
macros:
"&lt;TURN_ON&gt;" : "{включить|включать|врубить|врубать|запустить|запускать|зажигать|зажечь}"
"&lt;TURN_OFF&gt;" : "{погасить|загасить|гасить|выключить|выключать|вырубить|вырубать|отключить|отключать|убрать|убирать|приглушить|приглушать|стоп}"
"&lt;ENTIRE_OPT&gt;" : "{весь|все|всё|повсюду|вокруг|полностью|везде|_}"
"&lt;LIGHT_OPT&gt;" : "{это|лампа|бра|люстра|светильник|лампочка|лампа|освещение|свет|электричество|электрика|_}"
elements:
- type: "ls:loc"
description: "Location of lights."
synonyms:
- "&lt;ENTIRE_OPT&gt; {здание|помещение|дом|кухня|детская|кабинет|гостиная|спальня|ванная|туалет|{большая|обеденная|ванная|детская|туалетная} комната}"
- type: "ls:on"
groups:
- "act"
description: "Light switch ON action."
synonyms:
- "&lt;LIGHT_OPT&gt; &lt;ENTIRE_OPT&gt; &lt;TURN_ON&gt;"
- "&lt;TURN_ON&gt; &lt;ENTIRE_OPT&gt; &lt;LIGHT_OPT&gt;"
- type: "ls:off"
groups:
- "act"
description: "Light switch OFF action."
synonyms:
- "&lt;LIGHT_OPT&gt; &lt;ENTIRE_OPT&gt; &lt;TURN_OFF&gt;"
- "&lt;TURN_OFF&gt; &lt;ENTIRE_OPT&gt; &lt;LIGHT_OPT&gt;"
- "без &lt;ENTIRE_OPT&gt; &lt;LIGHT_OPT&gt;"
</pre>
<ul>
<li>
<code>Line 1</code> defines several macros that are used later throughout the model's elements
to shorten the synonym declarations. Note how macros coupled with option groups
shorten overall synonym declarations 1000:1 vs. manually listing all possible word permutations: for example, the
single synonym "&lt;TURN_ON&gt; &lt;ENTIRE_OPT&gt; &lt;LIGHT_OPT&gt;" covers phrases like "включить весь свет" as well as
just "включить", since the <code>_</code> option marks a word that may be omitted.
</li>
<li>
<code>Lines 8, 13, 21</code> define three model elements: the location of the lights, and the actions to turn
the lights on and off. Both action elements belong to the same group <code>act</code> which
will be used in our intent, defined in the <code>LightSwitchRuModel</code> class. Note that these model
elements are defined mostly through the macros defined above.
</li>
</ul>
<div class="bq info">
<p><b>YAML vs. API</b></p>
<p>
As usual, this YAML-based static model definition is convenient but entirely optional. All element definitions
can also be provided programmatically inside the <code>LightSwitchRuModel</code> Scala class.
</p>
</div>
</section>
<section id="code">
<h2 class="section-title">Model Class <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Open <code>src/main/scala/demo/<b>LightSwitchRuModel.scala</b></code> file and replace its content with the following code:
</p>
<pre class="brush: scala, highlight: [11, 12, 13, 20, 21, 24, 25, 32]">
package demo
import com.google.gson.Gson
import org.apache.nlpcraft.*
import org.apache.nlpcraft.annotations.*
import demo.nlp.entity.parser.NCRuSemanticEntityParser
import demo.nlp.token.enricher.*
import demo.nlp.token.parser.NCRuTokenParser
import scala.jdk.CollectionConverters.*
class LightSwitchRuModel extends NCModel(
NCModelConfig("nlpcraft.lightswitch.ru.ex", "LightSwitch Example Model RU", "1.0"),
new NCPipelineBuilder().
withTokenParser(new NCRuTokenParser()).
withTokenEnricher(new NCRuLemmaPosTokenEnricher()).
withTokenEnricher(new NCRuStopWordsTokenEnricher()).
withEntityParser(new NCRuSemanticEntityParser("lightswitch_model_ru.yaml")).
build
):
@NCIntent("intent=ls term(act)={has(ent_groups, 'act')} term(loc)={# == 'ls:loc'}*")
def onMatch(
ctx: NCContext,
im: NCIntentMatch,
@NCIntentTerm("act") actEnt: NCEntity,
@NCIntentTerm("loc") locEnts: List[NCEntity]
): NCResult =
val action = if actEnt.getType == "ls:on" then "включить" else "выключить"
val locations = if locEnts.isEmpty then "весь дом" else locEnts.map(_.mkText).mkString(", ")
// Add HomeKit, Arduino or other integration here.
// By default - just return a descriptive action string.
NCResult(new Gson().toJson(Map("locations" -> locations, "action" -> action).asJava))
</pre>
<p>
The intent callback logic is very simple: we return a descriptive confirmation message
explaining which lights were switched. With the action and locations detected, this is the place to add
the actual light switching via HomeKit or Arduino devices; a hypothetical integration sketch follows the list below.
Let's review this implementation step by step:
</p>
<ul>
<li>
On <code>line 11</code> our class extends {% scaladoc NCModel NCModel %} with two mandatory parameters.
</li>
<li>
<code>Line 12</code> creates the model configuration with mostly default parameters.
</li>
<li>
<code>Line 13</code> creates a pipeline based on custom Russian language components:
<ul>
<li><code>NCRuTokenParser</code> - Token parser.</li>
<li><code>NCRuLemmaPosTokenEnricher</code> - Lemma and part-of-speech token enricher.</li>
<li><code>NCRuStopWordsTokenEnricher</code> - Stop-words token enricher.</li>
<li><code>NCRuSemanticEntityParser</code> - Semantic entity parser extending <code>NCSemanticEntityParser</code>.</li>
</ul>
Note that <code>NCRuSemanticEntityParser</code> is based on the semantic model definition
described in the <code>lightswitch_model_ru.yaml</code> file.
</li>
<li>
<code>Lines 20 and 21</code> annotate the intent <code>ls</code> and its callback method <code>onMatch()</code>.
Intent <code>ls</code> requires one action (an entity belonging to the group <code>act</code>) and an optional list of light locations
(zero or more entities of type <code>ls:loc</code>); if no location is given, the entire house is assumed.
</li>
<li>
<code>Lines 24 and 25</code> map terms from the matched intent to the formal parameters of the
<code>onMatch()</code> method.
</li>
<li>
On <code>line 32</code> the intent callback simply returns a confirmation message.
</li>
</ul>
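<p>
    As a concrete, purely hypothetical illustration of where the real device integration could go, the sketch below
    defines a stub <code>switchLight()</code> helper. Neither <code>LightController</code> nor the controller URL come
    from NLPCraft or the example project - they are assumptions made for this sketch only:
</p>
<pre class="brush: scala, highlight: []">
package demo
// Hypothetical integration point: 'LightController' and the URL below are invented
// for this sketch and only show where a real HomeKit or Arduino call could be
// plugged into the onMatch() callback.
object LightController:
    private val CONTROLLER_URL = "http://192.168.1.50/api/lights" // Assumed local controller address.
    def switchLight(location: String, on: Boolean): Unit =
        // Replace this println with an actual HTTP/HomeKit/Arduino client call.
        println(s"POST $CONTROLLER_URL -> location='$location', on=$on")
</pre>
<p>
    Inside <code>onMatch()</code> you could then call <code>LightController.switchLight(loc.mkText, actEnt.getType == "ls:on")</code>
    for each detected location entity before returning the <code>NCResult</code>.
</p>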
</section>
<section id="custom">
<h2 class="section-title">Custom Components <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Open <code>src/main/scala/demo/nlp/token/parser/<b>NCRuTokenParser.scala</b></code> file and replace its content with the following code:
</p>
<pre class="brush: scala, highlight: [19]">
package demo.nlp.token.parser
import org.apache.nlpcraft.*
import org.languagetool.tokenizers.WordTokenizer
import scala.jdk.CollectionConverters.*
class NCRuTokenParser extends NCTokenParser:
private val tokenizer = new WordTokenizer
override def tokenize(text: String): List[NCToken] =
val toks = collection.mutable.ArrayBuffer.empty[NCToken]
var sumLen = 0
for ((word, idx) <- tokenizer.tokenize(text).asScala.zipWithIndex)
val start = sumLen
val end = sumLen + word.length
if word.strip.nonEmpty then
toks += new NCPropertyMapAdapter with NCToken:
override def getText: String = word
override def getIndex: Int = idx
override def getStartCharIndex: Int = start
override def getEndCharIndex: Int = end
sumLen = end
toks.toList
</pre>
<ul>
<li>
<code>NCRuTokenParser</code> is a simple wrapper which implements
{% scaladoc NCTokenParser NCTokenParser %}
based on the open source <a href="https://languagetool.org">LanguageTool</a> library.
</li>
<li>
<code>Line 19</code> creates an {% scaladoc NCToken NCToken %} instance for each tokenized word; a short usage sketch follows this list.
</li>
</ul>
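<p>
    A quick way to see what this parser produces is to feed it a sample sentence and print each token's text and
    character span. This is just a minimal standalone sketch - the <code>tokenizeDemo</code> entry point is not part
    of the example project:
</p>
<pre class="brush: scala, highlight: []">
import demo.nlp.token.parser.NCRuTokenParser
// Standalone sketch (not part of the example project).
@main def tokenizeDemo(): Unit =
    val toks = new NCRuTokenParser().tokenize("Включи свет в детской")
    // Prints each token with its index and character span, e.g. "0: 'Включи' [0..6]".
    for (t <- toks) println(s"${t.getIndex}: '${t.getText}' [${t.getStartCharIndex}..${t.getEndCharIndex}]")
</pre>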
<p>
Open <code>src/main/scala/demo/nlp/token/enricher/<b>NCRuLemmaPosTokenEnricher.scala</b></code> file and replace its content with the following code:
</p>
<pre class="brush: scala, highlight: [27, 28]">
package demo.nlp.token.enricher
import org.apache.nlpcraft.*
import org.languagetool.AnalyzedToken
import org.languagetool.tagging.ru.RussianTagger
import scala.jdk.CollectionConverters.*
class NCRuLemmaPosTokenEnricher extends NCTokenEnricher:
private def nvl(v: String, dflt : => String): String = if v != null then v else dflt
override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit =
val tags = RussianTagger.INSTANCE.tag(toks.map(_.getText).asJava).asScala
require(toks.size == tags.size)
toks.zip(tags).foreach { case (tok, tag) =>
val readings = tag.getReadings.asScala
val (lemma, pos) = readings.size match
// No data. Lemma is word as is, POS is undefined.
case 0 => (tok.getText, "")
// Takes first. Other variants ignored.
case _ =>
val aTok: AnalyzedToken = readings.head
(nvl(aTok.getLemma, tok.getText), nvl(aTok.getPOSTag, ""))
tok.put("pos", pos)
tok.put("lemma", lemma)
() // Otherwise NPE.
}
</pre>
<ul>
<li>
<code>NCRuLemmaPosTokenEnricher</code> is a lemma and part-of-speech token enricher based on
the open source <a href="https://languagetool.org">LanguageTool</a> library.
</li>
<li>
On <code>lines 27 and 28</code> the tokens are enriched with <code>pos</code> and <code>lemma</code> data; a combined usage sketch for both enrichers is given after the stop-word enricher below.
</li>
</ul>
<p>
Open <code>src/main/scala/demo/nlp/token/enricher/<b>NCRuStopWordsTokenEnricher.scala</b></code> file and replace its content with the following code:
</p>
<pre class="brush: scala, highlight: [17]">
package demo.nlp.token.enricher
import org.apache.lucene.analysis.ru.RussianAnalyzer
import org.apache.nlpcraft.*
class NCRuStopWordsTokenEnricher extends NCTokenEnricher:
private val stops = RussianAnalyzer.getDefaultStopSet
private def getPos(t: NCToken): String = t.get("pos").getOrElse(throw new NCException("POS not found in token."))
private def getLemma(t: NCToken): String = t.get("lemma").getOrElse(throw new NCException("Lemma not found in token."))
override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit =
for (t <- toks)
val lemma = getLemma(t)
lazy val pos = getPos(t)
t.put(
"stopword",
lemma.length == 1 && !Character.isLetter(lemma.head) && !Character.isDigit(lemma.head) ||
stops.contains(lemma.toLowerCase) ||
pos.startsWith("PARTICLE") ||
pos.startsWith("INTERJECTION") ||
pos.startsWith("PREP")
)
</pre>
<ul>
<li>
<code>NCRuStopWordsTokenEnricher</code> is a stop-word token enricher based on
the open source <a href="https://lucene.apache.org/">Apache Lucene</a> library.
</li>
<li>
On <code>line 17</code> the tokens are enriched with the <code>stopword</code> flag.
</li>
</ul>
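<p>
    Both enrichers shown above operate purely on the token list - their <code>enrich()</code> implementations ignore the
    request and model configuration arguments. A minimal sketch to inspect their combined output could therefore look like
    this; note that passing <code>null</code> for the unused parameters is an assumption that only holds for these two
    particular enrichers:
</p>
<pre class="brush: scala, highlight: []">
import demo.nlp.token.enricher.*
import demo.nlp.token.parser.NCRuTokenParser
// Standalone sketch (not part of the example project).
@main def enricherDemo(): Unit =
    val toks = new NCRuTokenParser().tokenize("Выключи свет на кухне")
    // These two enrichers ignore the request and config arguments (see their enrich() bodies above).
    new NCRuLemmaPosTokenEnricher().enrich(null, null, toks)
    new NCRuStopWordsTokenEnricher().enrich(null, null, toks)
    // Each token now carries 'lemma', 'pos' and 'stopword' properties.
    for (t <- toks) println(s"'${t.getText}': lemma=${t.get("lemma")}, pos=${t.get("pos")}, stopword=${t.get("stopword")}")
</pre>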
<p>
Open <code>src/main/scala/demo/nlp/entity/parser/<b>NCRuSemanticEntityParser.scala</b></code> file and replace its content with the following code:
</p>
<pre class="brush: scala, highlight: [8, 12]">
package demo.nlp.entity.parser
import opennlp.tools.stemmer.snowball.SnowballStemmer
import demo.nlp.token.parser.NCRuTokenParser
import org.apache.nlpcraft.nlp.parsers.*
import org.apache.nlpcraft.nlp.stemmer.NCStemmer
class NCRuSemanticEntityParser(src: String) extends NCSemanticEntityParser(
new NCStemmer:
private val stemmer = new SnowballStemmer(SnowballStemmer.ALGORITHM.RUSSIAN)
override def stem(txt: String): String = stemmer.synchronized { stemmer.stem(txt.toLowerCase).toString }
,
new NCRuTokenParser(),
src
)
</pre>
<ul>
<li>
<code>NCRuSemanticEntityParser</code> extends {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %}.
It uses a stemmer implementation from the <a href="https://opennlp.apache.org/">Apache OpenNLP</a> project; a short sketch of the stemmer's role follows below.
</li>
</ul>
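<p>
    The stemmer's job is to reduce different word forms to a common root so that synonyms declared in the YAML model
    (e.g. the infinitive "включить") also match inflected forms in user input (e.g. "включи"). A tiny sketch using the
    same OpenNLP Snowball stemmer as above:
</p>
<pre class="brush: scala, highlight: []">
import opennlp.tools.stemmer.snowball.SnowballStemmer
// Standalone sketch (not part of the example project).
@main def stemmerDemo(): Unit =
    val stemmer = new SnowballStemmer(SnowballStemmer.ALGORITHM.RUSSIAN)
    // Different forms of the same verb are expected to share a common stem.
    for (w <- Seq("включить", "включи", "включите")) println(s"$w -> ${stemmer.stem(w)}")
</pre>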
</section>
<section id="testing">
<h2 class="section-title">Testing <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
The test defined in <code>LightSwitchRuModelSpec</code> verifies that all test input sentences are
processed correctly and trigger the expected intent <code>ls</code>:
</p>
<pre class="brush: scala, highlight: [9, 11]">
package demo
import org.apache.nlpcraft.*
import org.scalatest.funsuite.AnyFunSuite
import scala.util.Using
class LightSwitchRuModelSpec extends AnyFunSuite:
test("test") {
Using.resource(new NCModelClient(new LightSwitchRuModel)) { client =>
def check(txt: String): Unit =
require(client.debugAsk(txt, "userId", true).getIntentId == "ls")
check("Выключи свет по всем доме")
check("Выруби электричество!")
check("Включи свет в детской")
check("Включай повсюду освещение")
check("Включайте лампы в детской комнате")
check("Свет на кухне, пожалуйста, приглуши")
check("Нельзя ли повсюду выключить свет?")
check("Пожалуйста без света")
check("Отключи электричество в ванной")
check("Выключи, пожалуйста, тут всюду свет")
check("Выключай все!")
check("Свет пожалуйста везде включи")
check("Зажги лампу на кухне")
}
}
</pre>
<ul>
<li>
<code>Line 9</code> creates the client for our model.
</li>
<li>
<code>Line 11</code> calls a special method
{% scaladoc NCModelClient#debugAsk-fffff96c debugAsk() %}.
It allows you to check the winning intent and its callback parameters without actually
invoking the callback; a small variation of the <code>check()</code> helper is sketched after this list.
</li>
<li>
<code>Lines 13-25</code> define the test input sentences, all of which should
trigger the <code>ls</code> intent.
</li>
</ul>
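<p>
    If you want the test to report which intent actually won instead of only asserting it, an assumed small variation of
    the <code>check()</code> helper (a drop-in replacement inside the <code>Using.resource</code> block) could look like this -
    it relies only on the <code>debugAsk()</code> call and <code>getIntentId</code> shown above:
</p>
<pre class="brush: scala, highlight: []">
// Assumed variation of check() - logs the matched intent before asserting it.
def check(txt: String): Unit =
    val cb = client.debugAsk(txt, "userId", true)
    println(s"'$txt' -> intent '${cb.getIntentId}'")
    require(cb.getIntentId == "ls")
</pre>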
<p>
You can run this test via the SBT task <code>executeTests</code> or from your IDE.
</p>
<pre class="brush: scala, highlight: []">
$ sbt executeTests
</pre>
</section>
<section>
<h2 class="section-title">Done! 👌 <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
You've created a light switch data model and tested it.
</p>
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#new_project">New Project</a></li>
<li><a href="#model">Data Model</a></li>
<li><a href="#code">Model Class</a></li>
<li><a href="#custom">Custom Components</a></li>
<li><a href="#testing">Testing</a></li>
{% include quick-links.html %}
</ul>
</div>