blob: 90eb96d10c37cc0a24a11c9b486e82b3b2e85349 [file] [log] [blame]
---
active_crumb: SQL Model <code><sub>ex</sub></code>
layout: documentation
id: sql_model
fa_icon: fa-cube
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-md-8 second-column example">
<section id="overview">
<h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
This example demonstrates a relatively complete implementation of natural language interface to a
SQL database. This is a non-trivial example, spanning over a thousand lines of code, and it can
act as a basis for production-ready implementation.
</p>
<p>
Note that a significant part of the implementation is not directly related to NLPCraft but rather deals
with SQL statement construction. We specifically decided not to use any 3rd party libraries for it to
show what can be done "from scratch".
</p>
<p>
<b>Complexity:</b> <span class="complexity-three-star"><i class="fas fa-gem"></i> <i class="fas fa-gem"></i> <i class="fas fa-gem"></i></span><br/>
<span class="ex-src">Source code: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples/sql">GitHub <i class="fab fa-fw fa-github"></i></a><br/></span>
<span class="ex-review-all">Review: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">All Examples at GitHub <i class="fab fa-fw fa-github"></i></a></span>
</p>
</section>
<section id="background">
<h2 class="section-title">Background <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Many of modern existing natural language-to-SQL implementations use variations of
deep learning approach where you first train the neural network on a pre-created training set and
then get to use the trained network to provide (infer) probabilistic answers for the new input sentences. Although the latest
natural language-to-SQL attempts to implement this approach are getting into 90% percentile of accuracy - they remain largely
unusable for the vast swath of enterprise applications where non-deterministic nature of such systems
renders them practically useless. The apparent problem with non-deterministic systems like that is that the user
never knows whether a given answer is correct or incorrect. In these use cases users can't tolerate the fact that
in 5-10% of the cases the answer will be invalid. Try that for your HR or revenue reporting system,
prescriptive analytics systems, and so on...
</p>
<p>
It's important to note, however, that for many non-mission-critical systems such non-determinism does
not pose a significant problem. We happily accept such imprecision when asking for direction on our
mobile devices, unlock our phones using face or fingerprint recognition, when performing sentiment
analysis or trying to detect faces of our friends across thousands of photographs. Cost of retries,
as well as the cost of initial errors, is insignificant in these cases. The same cost, however, in many business
applications can be too significant to tolerate.
</p>
<p>
As you may have learned by now, NLPCraft uses a <i>fully deterministic approach</i> in an attempt to
match the user input against a defined set of intents. If it finds the matching intent - it guarantees that
match for a given intent. If no matching intent can be found - it returns the negative
result without any ambiguity. In other words, if the answer is given, it is deterministically guaranteed to be correct.
Another positive side-effect of this approach is the fact that such matching logic is traceable, i.e.
the user can see why given user input was matched against a certain intent (and not any other). Such
traceability of the comprehension logic (or explainability vs "black box" approach from deep
learning techniques) is often critical for many real-life business applications.
</p>
<p>
On the flip side, one of the downsides of such an approach in general is the need to have a detailed,
domain-specific model
for each SQL database (when dealing with SQL databases). Building such a
model can be a non-trivial and time consuming experience. That is where NLPCraft brings a lot of built-in tooling
to simplify this task dramatically.
</p>
<div class="bq info">
<b>Source Code</b>
<p>
Due to size of this example the entire source code for it can be found on <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples/sql">GitHub</a>.
</p>
</div>
</section>
<section id="sql">
<h2 class="section-title">Sample Database <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
We are going to be building natural language interface against slightly modified "Northwind" sample database
from <a target="_" href="https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/linq/downloading-sample-databases#get-the-northwind-sample-database-for-sql-server">Microsoft SQL Server</a>:
</p>
<figure>
<img class="img-fluid" src="/images/sql_example_model.png" alt="">
<figcaption><b>Fig 1.</b> SQL Schema</figcaption>
</figure>
<p>
You can find SQL script creating this database and populating it with the sample data at
<a href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/sql/src/main/resources/northwind.sql" target="github"><code>db/northwind.sql</code></a>
file in the root of the example.
</p>
</section>
<section id="h2">
<h2 class="section-title">H2 Database <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
In our example we are using H2 database instance. For convenience, example provides a command line application
<a href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/sql/src/main/java/org/apache/nlpcraft/examples/sql/db/SqlServer.scala" target="github"><code>db/SqlServer.scala</code></a>
that automatically starts local H2 database instance with default configuration (localhost on port 9092) and initializes it
using <a href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/sql/src/main/resources/northwind.sql" target="github"><code>db/northwind.sql</code></a> script.
</p>
</section>
<section id="model">
<h2 class="section-title">Data Model <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Creating data model manually for the existing SQL schema can be a daunting task. NLPCraft provides the tool
that can scan SQL schema and create working stub of such model automatically. All you have to do then is to
add any necessary modifications to it.
</p>
<p>
<a href="/tools/sql_model_gen.html">SQL Model Generation</a> is a Java-based utility that takes
JDBC configuration, reads database schema using it and creates initial JSON or YAML stub for the
data model. This stub then can be used as is or be extended further.
</p>
<p>
Examples specifically comes with two pre-generated files:
</p>
<ul>
<li>
<a href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/sql/src/main/resources/sql_model_init.yaml" target="github"><code>sql_model_init.yaml</code></a> - the initial file that was generated straight out of
SQL Model Generator (see above for the instructions).
</li>
</ul>
</section>
<section id="impl">
<h2 class="section-title">Implementation <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Implementation mainly consists of these files:
</p>
<ul>
<li>
<a href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/sql/src/main/java/org/apache/nlpcraft/examples/sql/SqlModel.scala" target="github"><code>SqlModel.scala</code></a> - the code behind the data model that loads YAML-defined model and
defines all intents.
</li>
<li>
<a href="https://github.com/apache/incubator-nlpcraft/blob/master/nlpcraft-examples/sql/src/main/java/org/apache/nlpcraft/examples/sql/db/SqlBuilder.scala" target="github"><code>db/SqlBuilder.scala</code></a> - the main utility that takes object model provided by
<a href="/tools/sql_model_gen.html">SQL Model Generation</a>
and builds a SQL query to execute. Note that we elected to build this functionality
from scratch to illustrate how it can be done. You are free, of course, to use many
of the existing libraries to help achieve this goal.
</li>
</ul>
</section>
<section id="build_project">
<h2 class="section-title">Build Project <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Assuming that project is located in <code>~/SqlModel</code> folder - let's go to into this directory and
run the Maven build:
</p>
<pre class="brush: bash">
$ cd ~/SqlModel
$ mvn clean package
</pre>
<p>
At this stage we have our project built and we are ready to start testing.
</p>
</section>
<section id="start_server">
<h2 class="section-title">Start Server <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Run the following command to start local REST server, if it hasn't been started already, from the NLPCraft installation directory:
</p>
<nav>
<div class="nav nav-tabs" role="tablist">
<a class="nav-item nav-link active" data-toggle="tab" href="#nav-srv-cmd" role="tab">Command</a>
<a class="nav-item nav-link" data-toggle="tab" href="#nav-srv-out" role="tab">Output <i class="fa fa-desktop output"></i></a>
</div>
</nav>
<div class="tab-content">
<div class="tab-pane fade show active" id="nav-srv-cmd" role="tabpanel">
<pre class="brush: bash">
$ bin/nlpcraft.sh start-server
</pre>
</div>
<div class="tab-pane fade show" id="nav-srv-out" role="tabpanel">
<p></p>
<p>
<img class="img-fluid" alt="" src="/images/server-fig1.png">
</p>
</div>
</div>
<p>
<b>NOTES:</b>
</p>
<ul>
<li>
<i style="color: #F39C12" class="fa fa-exclamation-triangle"></i> REST server is a "fire-and-forget" component
that you generally need to start it only once for this and any other examples.
</li>
<li>
Run <code class="script">bin/nlpcraft.sh help --cmd=start-server</code> to get a full help on this command.
</li>
<li>
<a href="/tools/script.html">NLPCraft CLI</a> is available as <code>nlpcraft.sh</code> for
<i class="fab fa-fw fa-linux"></i> and <code>nlpcraft.cmd</code>
for <i class="fab fa-fw fa-windows"></i>.
</li>
</ul>
</section>
<section id="testing">
<h2 class="section-title">Testing <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Part of the <a href="/tools/test_framework.html">test framework</a>, the auto-validator class <a
target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/tools/test/NCTestAutoModelValidator.html">NCTestAutoModelValidator</a> takes one or more model IDs
(or class names) and performs validation. Validation consists of starting an <a href="/tools/embedded_probe.html">embedded probe</a> with a given model,
scanning for <a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSample.html">@NCIntentSample</a> and
<a target="javadoc" href="/apis/latest/org/apache/nlpcraft/model/NCIntentSampleRef.html">@NCIntentSampleRef</a> annotations
and their corresponding callback methods, submitting each sample input
sentences from these annotations and checking that resulting intent matches the intent the sample was attached to.
Note that auto-testing does not require any additional code to be written - the class gathers all required information from the model
itself.
</p>
<p>
As always, you can launch model auto-validator as any other Java class but we'll use NLPCraft CLI
to do it more conveniently:
</p>
<pre class="brush: bash">
$ bin/nlpcraft.sh test-model --cp=~/sql/target/classes --mdls=demo.SqlModel
</pre>
<p>
<b>NOTES:</b>
</p>
<ul>
<li>
Run <code class="script">bin/nlpcraft.sh help --cmd=test-model</code> to get a full help on this command.
</li>
<li>
Note that you can use <code>retest-model</code> command in REPL mode to re-run the last model test
avoiding the retyping of all required parameters.
</li>
<li>
<a href="/tools/script.html">NLPCraft CLI</a> is available as <code>nlpcraft.sh</code> for
<i class="fab fa-fw fa-linux"></i> and <code>nlpcraft.cmd</code>
for <i class="fab fa-fw fa-windows"></i>.
</li>
</ul>
</section>
<section id="rinse">
<h2 class="section-title">Rinse <span class="amp">&amp;</span> Repeat <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
Typical development cycle consists of:
</p>
<ul>
<li>
<a href="#model">Modifying the model</a>
</li>
<li>
<a href="#build_project">Re-building the project</a>
</li>
<li>
<a href="#testing">Re-running the test</a>
</li>
</ul>
<p>
All of these operations can be performed from <a href="/tools/script.html">NLPCraft CLI</a> in REPL mode or from any IDE.
</p>
<p>
NOTE: you don't need to restart REST server every time - it only needs to be started once.
</p>
</section>
<section>
<h2 class="section-title">Done! 👌 <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
You've created SQL model, started the REST server and tested this model using the built-in test framework.
</p>
</section>
</div>
<div class="col-md-2 third-column">
<ul class="side-nav">
<li class="side-nav-title">On This Page</li>
<li><a href="#overview">Overview</a></li>
<li><a href="#background">Background</a></li>
<li><a href="#sql">Sample Database</a></li>
<li><a href="#model">Data Model</a></li>
<li><a href="#impl">Implementation</a></li>
<li><a href="#build_project">Build Project</a></li>
<li><a href="#start_server">Start Server</a></li>
<li><a href="#testing">Testing</a></li>
<li><a href="#rinse">Rinse <span class="amp">&amp;</span> Repeat</a></li>
{% include quick-links.html %}
</ul>
</div>