AI Function is a built-in function provided by Doris based on Artificial Intelligence (AI) capabilities. Users can directly call AI in SQL queries to perform various intelligent text tasks. AI Function connects to multiple mainstream AI providers (such as OpenAI, Anthropic, DeepSeek, Gemini, Ollama, MoonShot, etc.) through Doris's resource mechanism.
The AI used must be provided externally by Doris and support text analysis.
Before using AI Function, you need to create a Resource of type AI to centrally manage access information for the AI API.
CREATE RESOURCE "ai_resource_name" PROPERTIES ( 'type' = 'ai', 'ai.provider_type' = 'openai', 'ai.endpoint' = 'https://endpoint_example', 'ai.model_name' = 'model_example', 'ai.api_key' = 'sk-xxx', 'ai.temperature' = '0.7', 'ai.max_token' = '1024', 'ai.max_retries' = '3', 'ai.retry_delay_second' = '1', 'ai.dimensions' = '1024' );
type: Required, must be ai, used as the type identifier for ai.
ai.provider_type: Required, external AI provider type. Currently supported providers include: OpenAI, Anthropic, Gemini, DeepSeek, Local, MoonShot, MiniMax, Zhipu, QWen, Baichuan. If there are providers not listed above but their API format is the same as OpenAI/Anthropic/Gemini, you can directly fill in the corresponding provider.
ai.endpoint: Required, AI API endpoint.
ai.model_name: Required, model name.
ai_api_key: Required except when ai.provider_type = local, API key.
ai.temperature: Optional. Controls the randomness of generated content. Accepts a float value between 0 and 1. The default value is -1, which means this parameter is not set.
ai.max_tokens: Optional. Limits the maximum number of tokens for generated content. The default value is -1, which means this parameter is not set. The default value for Anthropic is 2048.
ai.max_retries: Optional. The maximum number of retries for a single request. The default value is 3.
ai.retry_delay_second: Optional. The delay time (in seconds) before retrying. The default value is 0.
ai.dimensions: Optional, control the vector dimension of EMBED output. Make sure to confirm that the model filled in ai.model_name supports custom vector dimension before setting, otherwise it may cause model call failure.
When users call AI-related functions, resources can be specified in the following two ways:
Set Session variable format:
SET default_ai_resource='resource_name';
Function call format:
SELECT AI_FUNCTION([<resource_name>], <args...>);
When calling an AI_Function, it determines which resource to use in the following order:
default_ai_resource)Example:
SET default_ai_resource='global_default_resource'; SELECT AI_SENTIMENT('this is a test'); -- Uses resource named 'global_default_resource' SELECT AI_SENTIMENT('invoke_resource', 'this is a test'); --Uses resource named 'invoke_resource'
Currently supported AI Functions in Doris include:
AI_CLASSIFY: Information classification
AI_EXTRACT: Information extraction
AI_FILTER:Filter information
AI_FIXGRAMMAR: Grammar correction
AI_GENERATE: Text generation
AI_MASK: Masking sensitive information
AI_SENTIMENT: Sentiment analysis
AI_SIMILARITY: Text semantic similarity comparison
AI_SUMMARIZE: Text summarization
AI_TRANSLATE: Translation
AI_AGG: Perform cross-line aggregation analysis on multiple texts
AI_TRANSLATESELECT AI_TRANSLATE('resource_name', 'this is a test', 'Chinese'); -- 这是一个测试
AI_SENTIMENTSET default_ai_resource = 'resource_name'; SELECT AI_SENTIMENT('Apache Doris is a great DBMS.');
For detailed function and usage, please refer to the documentation of each specific function.