<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_ai_functions">
<title>Advantages and use cases of Impala AI functions</title>
<titlealts audience="PDF">
<navtitle>AI Functions</navtitle>
</titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Impala Functions"/>
<data name="Category" value="UDF"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Querying"/>
</metadata>
</prolog>
<conbody>
<p> You can use Impala's <codeph>ai_generate_text</codeph> function to access Large Language Models
(LLMs) in SQL queries. This function lets you submit a prompt, retrieve the LLM response,
and include it in query results. You can also build custom UDFs on top of it for complex tasks such as sentiment
analysis and translation.
</p>
<section id="section_ufg_4tv_fcc">
<title>Use LLMs directly in SQL with Impala's <codeph>ai_generate_text</codeph>
function</title>
<p>Impala provides a built-in AI function, <codeph>ai_generate_text</codeph>, that
lets you call Large Language Models (LLMs) directly from SQL queries.
You pass the function a prompt, which may include data from your tables. The function
sends the prompt to a supported LLM endpoint, retrieves the response, and
returns it as part of the query result.</p>
<p>Alternatively, you can create
custom User Defined Functions (UDFs) on top of <codeph>ai_generate_text</codeph>. A UDF
embeds the prompt, so a concise SQL statement is enough to send it to an LLM and receive
the response. You can define UDFs for complex tasks like sentiment analysis, language
translation, and generative contextual analysis.</p>
</section>
<section id="ai_advantages">
<title>Advantages of using AI functions</title>
<p>
<ul id="ul_cdc_fm4_tbc">
<li><b>Simplified Workflow</b>: Eliminates the need to set up intricate data
pipelines.</li>
<li><b>No ML Expertise Required</b>: No specialized machine learning skills are
needed.</li>
<li><b>Swift Decision-Making</b>: Enables faster insights on the data, facilitating
critical business decisions by using in-database function calls.</li>
<li><b>Integrated Functionality</b>: Requires no external applications, as it is a
built-in feature in Data Warehouse.</li>
</ul>
</p>
</section>
<section id="ai_usecases">
<title>List of possible use cases</title>
<p>Here are some practical applications of using AI models with the function:<ul
id="ul_ond_pm4_tbc">
<li><b>Sentiment Analysis</b>: Use the AI model to examine customer reviews for a product
and identify their sentiment as positive, negative, or neutral.</li>
<li><b>Language Translation</b>: Translate product reviews written in different languages
to understand customer feedback from various regions.</li>
<li><b>Generative Contextual Analysis</b>: Generate detailed reports and insights on
various topics based on provided data.</li>
</ul></p>
</section>
<section id="syntax_ai_function">
<title>Syntax for AI built-in function arguments</title>
<p>The built-in AI functions call a large language model through the OpenAI API.
Currently, OpenAI's public endpoint and Azure OpenAI endpoints are
supported.</p>
<dl id="dl_w5l_nn4_tbc">
<dlentry>
<dt>AI_GENERATE_TEXT_DEFAULT</dt>
<dd>Syntax:<codeblock id="codeblock_ibj_qn4_tbc">ai_generate_text_default(prompt)</codeblock></dd>
</dlentry>
<dlentry>
<dt>AI_GENERATE_TEXT</dt>
<dd>Syntax:<codeblock id="codeblock_qdg_zn4_tbc">ai_generate_text(ai_endpoint, prompt, ai_model, ai_api_key_jceks_secret, additional_params)
</codeblock></dd>
</dlentry>
</dl>
<p>The <codeph>ai_generate_text</codeph> function uses the values you pass as arguments
for <codeph>ai_endpoint</codeph>, <codeph>ai_model</codeph>, and
<codeph>ai_api_key_jceks_secret</codeph>. If any of these arguments is empty or
NULL, the function uses the default value defined at the instance level. These default
values correspond to the flag settings configured in the Impala instance. For example, if
the <codeph>ai_endpoint</codeph> argument is NULL or empty, the function uses the value
specified by the <codeph>ai_endpoint</codeph> flag.</p>
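<p>As a minimal sketch of this fallback, the following call passes NULL for the endpoint and
the API key secret so that both come from the instance-level flags, and sets only the
model explicitly. The prompt, the model name, and the <codeph>additional_params</codeph>
value are illustrative, and the call assumes that the corresponding flags are already
configured.</p>
<codeblock>-- ai_endpoint and ai_api_key_jceks_secret are NULL: the instance-level flag values apply.
-- ai_model is set explicitly for this call only (illustrative value).
select ai_generate_text(NULL, 'hello', 'gpt-3.5-turbo', NULL, '{"temperature": 0.9}');
</codeblock>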
<p dir="ltr">The <codeph>ai_generate_text_default</codeph> function takes only a prompt, so
it relies entirely on these defaults. Before using it, set all three parameters
(<codeph>ai_endpoint</codeph>, <codeph>ai_model</codeph>, and
<codeph>ai_api_key_jceks_secret</codeph>) to appropriate values in the coordinator and
executor flagfiles.</p>
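<p>A flagfile fragment for this setup might look like the following sketch. The flag names
are the ones listed above. The values are placeholders borrowed from the examples in this
topic and must be replaced with your own.</p>
<codeblock># Placeholder values; assumed to be set in both the coordinator and executor flagfiles.
--ai_endpoint=https://api.openai.com/v1/chat/completions
--ai_model=gpt-3.5-turbo
--ai_api_key_jceks_secret=open-ai-key
</codeblock>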
</section>
<section id="key_parameters_ai_function">
<title>Key parameters for using the AI model</title>
<ul id="ul_pfm_2p4_tbc">
<li><b>ai_endpoint</b>: The endpoint of the model API to call, for example,
https://api.openai.com/v1/chat/completions. Supported services are OpenAI and
Azure OpenAI Service.</li>
<li><b>prompt</b>: The text you submit to the AI model to generate a response.</li>
<li><b>ai_model</b>: The specific model name you want to use within the desired API, for
example, gpt-3.5-turbo.</li>
<li><b>ai_api_key_jceks_secret</b>: The name of the JCEKS secret that contains your API
key for the AI API you are using. The secret must exist in a JCEKS keystore. To make that
keystore available to Impala, set the
<codeph>hadoop.security.credential.provider.path</codeph> property in the
<codeph>core-site</codeph> configuration for both the executor and coordinator.</li>
<li><b>additional_params</b>: Additional parameters that the AI API offers, provided
to the built-in function as a JSON object.</li>
</ul>
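<p>Putting the parameters together, an explicit call might look like the following sketch.
The endpoint, model, secret name, and <codeph>temperature</codeph> setting are the
illustrative values used in the examples in this topic, not required values.</p>
<codeblock>select ai_generate_text(
  'https://api.openai.com/v1/chat/completions',  -- ai_endpoint
  'hello',                                       -- prompt
  'gpt-3.5-turbo',                               -- ai_model
  'open-ai-key',                                 -- ai_api_key_jceks_secret
  '{"temperature": 0.9}'                         -- additional_params (JSON object)
);
</codeblock>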
</section>
<section id="impala-built-in-ai-function">
<title>Examples of using the built-in AI function</title>
<p>The following example calls the built-in function
<codeph>ai_generate_text_default</codeph> with a simple prompt and returns the LLM
response.<codeblock id="codeblock_ppy_dr4_tbc">> select ai_generate_text_default('hello');
Response:
Hello! How can I assist you today?
</codeblock></p>
<p>In the following example, a query runs against the Amazon book reviews table for the book
titled Artificial Superintelligence. The large language model (LLM) is prompted to classify
the sentiment of each review as positive, neutral, or
negative.<codeblock id="codeblock_hyp_4np_tbc">> select customer_id, star_rating, ai_generate_text_default(CONCAT('Classify the following review as positive, neutral, or negative, and only include the uncapitalized category in the response: ', review_body)) AS review_analysis, review_body from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id LIMIT 1;
Response:
+--+------------+------------+----------------+------------------+
| |customer_id |star_rating |review_analysis |review_body |
+--+------------+------------+----------------+------------------+
|1 |4343565 | 5 |positive |What is this book |
| | | | |all about ………… |
+--+------------+------------+----------------+------------------+
</codeblock></p>
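<p>The same pattern applies to the language translation use case. The following sketch
reuses the <codeph>amazon_book_reviews</codeph> table from the example above. The prompt
wording and the <codeph>review_translation</codeph> alias are illustrative assumptions,
and no output is shown because the response depends on the model.</p>
<codeblock>-- Prepend a translation instruction to each review body and send it to the LLM.
select customer_id,
  ai_generate_text_default(CONCAT('Translate the following review into English and return only the translation: ', review_body)) AS review_translation
from amazon_book_reviews
where product_title='Artificial Superintelligence'
order by customer_id LIMIT 1;
</codeblock>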
</section>
<section id="impala-custom-udf">
<title>Examples of creating and using custom UDFs along with the built-in AI function</title>
<p>Instead of writing the prompt in every SQL query, you can build a UDF that contains your
intended prompt. Inside the UDF, construct the prompt and pass it to the built-in AI
function, as the following example does with <codeph>ai_generate_text</codeph>.</p>
<p>Example: Classify input customer reviews</p>
<p>The following UDF takes a review from the Amazon book reviews table as input and asks the LLM
to classify its sentiment.</p>
<codeblock id="codeblock_h1z_yrp_tbc">
IMPALA_UDF_EXPORT
StringVal ClassifyReviews(FunctionContext* context, const StringVal&amp; input) {
  // Build the prompt: fixed instruction text followed by the review body.
  std::string request =
      std::string("Classify the following review as positive, neutral, or negative")
      + std::string(" and only include the uncapitalized category in the response: ")
      + std::string(reinterpret_cast&lt;const char*>(input.ptr), input.len);
  StringVal prompt(request.c_str());
  // Arguments for the built-in function, in the order of its syntax.
  const StringVal endpoint("https://api.openai.com/v1/chat/completions");
  const StringVal model("gpt-3.5-turbo");
  const StringVal api_key_jceks_secret("open-ai-key");
  // additional_params is passed as a JSON object.
  const StringVal params("{\"temperature\": 0.9, \"model\": \"gpt-4\"}");
  // Send the prompt to the LLM and return its response.
  return context->Functions()->ai_generate_text(
      context, endpoint, prompt, model, api_key_jceks_secret, params);
}
</codeblock>
<p>After you compile a prompt-building UDF like this one, register it in Impala and use it to
query your datasets.</p>
<p>Creating the <codeph>analyze_reviews</codeph> function:</p>
<codeblock id="codeblock_b4l_ctp_tbc">> CREATE FUNCTION analyze_reviews(STRING)
RETURNS STRING
LOCATION 's3a://dw-...............'
SYMBOL='ClassifyReviews';
</codeblock>
<p>Use a SELECT query with the UDF to classify the sentiment of Amazon book reviews:</p>
<codeblock id="codeblock_xdr_jtp_tbc">> SELECT customer_id, star_rating, analyze_reviews(review_body) AS review_analysis, review_body from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id;
Response:
+--+------------+------------+----------------+----------------------+
| |customer_id |star_rating |review_analysis |review_body |
+--+------------+------------+----------------+----------------------+
|1 |44254093 | 5 |positive |What is this book all |
| | | | |about? It is all about|
| | | | |a mind-blowing |
| | | | |universal law of |
| | | | |nature. Mind-blow… |
+--+------------+------------+----------------+----------------------+
|2 |50050072 | 5 |positive |The two tightly- |
| | | | |connected ideas strike|
| | | | |you as amazed. In the |
| | | | |first place, what has |
| | | | |never bef… |
+--+------------+------------+----------------+----------------------+
|3 |50050072 | 5 |positive |The two tightly- |
| | | | |connected ideas strike|
| | | | |you as amazed. In the |
| | | | |first place, what has |
| | | | |never bef… |
+--+------------+------------+----------------+----------------------+
|4 |52932308 | 1 |negative |This book is seriously|
| | | | |flawed. I could not |
| | | | |work out if the author|
| | | | |was a mathemetician |
| | | | |dabbi… |
+--+------------+------------+----------------+----------------------+
|5 |52971961 | 1 |negative |Abdoullaev's |
| | | | |exploration of |
| | | | |Al issues appears to |
| | | | |be very technological |
| | | | |and straightforward… |
+--+------------+------------+----------------+----------------------+
|6 |53008416 | 4 |positive |As Co Founder of |
| | | | |ArtilectWorld:ultra |
| | | | |intelligent machine, |
| | | | |I recommend reading |
| | | | |this book! |
+--+------------+------------+----------------+----------------------+
</codeblock>
</section>
</conbody>
</concept>