Doris MCP (Model Context Protocol) Server is a backend service built with Python and FastAPI. It implements the MCP, allowing clients to interact with it through defined “Tools”. It's primarily designed to connect to Apache Doris databases, potentially leveraging Large Language Models (LLMs) for tasks like converting natural language queries to SQL (NL2SQL), executing queries, and performing metadata management and analysis.
Unified SQL Execution Timeout: all SQL executions now default to the `config/performance/query_timeout` setting, with runtime override supported. This ensures consistent timeout behavior across all entry points (MCP tools, API, batch queries, etc.). The timeout parameter is now carried through `QueryRequest` and `DorisQueryExecutor`, preventing timeout failures due to missing parameters.

🚀 Major Milestone: This release establishes v0.5.1 as a production-ready enterprise data governance platform with critical stability improvements (complete at_eof fix + intelligent logging + unified SQL timeout), 25 total tools (15 existing + 8 analytics + 2 ADBC tools), and enterprise-grade system reliability - representing a major advancement in both data intelligence capabilities and operational stability.
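For illustration, the runtime override is simply the optional `timeout` argument of the `exec_query` tool. A minimal sketch using the bundled `doris_mcp_client` (described in the MCP Client section below; the table name is a placeholder):

```python
import asyncio

from doris_mcp_client.client import DorisUnifiedMCPClient

async def main():
    client = DorisUnifiedMCPClient()
    # Without "timeout", the unified config/performance/query_timeout applies;
    # here we override it to 120 seconds for a long-running query.
    result = await client.call_tool("exec_query", {
        "sql": "SELECT COUNT(*) FROM internal.ssb.customer",
        "timeout": 120,
    })
    print(result)

asyncio.run(main())
```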
Modules updated in this release include:

- `doris_mcp_server/tools/tools_manager.py`
- `doris_mcp_server/tools/resources_manager.py`
- `doris_mcp_server/tools/prompts_manager.py`
- `doris_mcp_server/utils/query_executor.py`
- `doris_mcp_server/utils/security.py`
- `doris_mcp_server/utils/schema_extractor.py`
- `doris_mcp_server/utils/analysis_tools.py`
- `config.py`: comprehensive validation, standardized parameter naming, and smart default database handling with automatic fallback to `information_schema`
```bash
# Install the latest version
pip install doris-mcp-server

# Install specific version
pip install doris-mcp-server==0.5.0
```
💡 Command Compatibility: After installation, the `doris-mcp-server` command is available, along with an equivalent alias kept for backward compatibility. You can use either command interchangeably.
The primary communication mode offering optimal performance and reliability:
```bash
# Full configuration with database connection
doris-mcp-server \
  --transport http \
  --host 0.0.0.0 \
  --port 3000 \
  --db-host 127.0.0.1 \
  --db-port 9030 \
  --db-user root \
  --db-password your_password
```
Standard input/output mode for direct integration with MCP clients:
```bash
# For direct integration with MCP clients like Cursor
doris-mcp-server --transport stdio
```
```bash
# Check installation
doris-mcp-server --help

# Test HTTP mode (in another terminal)
curl http://localhost:3000/health
```
Instead of command-line arguments, you can use environment variables:
export DORIS_HOST="127.0.0.1" export DORIS_PORT="9030" export DORIS_USER="root" export DORIS_PASSWORD="your_password" # Then start with simplified command doris-mcp-server --transport http --host 0.0.0.0 --port 3000
The `doris-mcp-server` command supports the following arguments:
| Argument | Description | Default | Required |
|---|---|---|---|
| `--transport` | Transport mode: `http` or `stdio` | `http` | No |
| `--host` | HTTP server host (HTTP mode only) | `0.0.0.0` | No |
| `--port` | HTTP server port (HTTP mode only) | `3000` | No |
| `--db-host` | Doris database host | `localhost` | No |
| `--db-port` | Doris database port | `9030` | No |
| `--db-user` | Doris database username | `root` | No |
| `--db-password` | Doris database password | - | Yes (unless set in env) |
For developers who want to build from source:
```bash
# Replace with the actual repository URL if different
git clone https://github.com/apache/doris-mcp-server.git
cd doris-mcp-server
```
```bash
pip install -r requirements.txt
```
Copy the `.env.example` file to `.env` and modify the settings according to your environment:
```bash
cp .env.example .env
```
Key Environment Variables:

- `DORIS_HOST`: Database hostname (default: localhost)
- `DORIS_PORT`: Database port (default: 9030)
- `DORIS_USER`: Database username (default: root)
- `DORIS_PASSWORD`: Database password
- `DORIS_DATABASE`: Default database name (default: information_schema)
- `DORIS_MIN_CONNECTIONS`: Minimum connection pool size (default: 5)
- `DORIS_MAX_CONNECTIONS`: Maximum connection pool size (default: 20)
- `DORIS_BE_HOSTS`: BE nodes for monitoring (comma-separated, optional - auto-discovery via SHOW BACKENDS if empty)
- `DORIS_BE_WEBSERVER_PORT`: BE webserver port for monitoring tools (default: 8040)
- `FE_ARROW_FLIGHT_SQL_PORT`: Frontend Arrow Flight SQL port for ADBC (New in v0.5.0)
- `BE_ARROW_FLIGHT_SQL_PORT`: Backend Arrow Flight SQL port for ADBC (New in v0.5.0)
- `AUTH_TYPE`: Authentication type (token/basic/oauth, default: token)
- `TOKEN_SECRET`: Token secret key
- `ENABLE_SECURITY_CHECK`: Enable/disable SQL security validation (default: true, New in v0.4.2)
- `BLOCKED_KEYWORDS`: Comma-separated list of blocked SQL keywords (New in v0.4.2)
- `ENABLE_MASKING`: Enable data masking (default: true)
- `MAX_RESULT_ROWS`: Maximum result rows (default: 10000)
- `ADBC_DEFAULT_MAX_ROWS`: Default maximum rows for ADBC queries (default: 100000)
- `ADBC_DEFAULT_TIMEOUT`: Default ADBC query timeout in seconds (default: 60)
- `ADBC_DEFAULT_RETURN_FORMAT`: Default return format - arrow/pandas/dict (default: arrow)
- `ADBC_CONNECTION_TIMEOUT`: ADBC connection timeout in seconds (default: 30)
- `ADBC_ENABLED`: Enable/disable ADBC tools (default: true)
- `ENABLE_QUERY_CACHE`: Enable query caching (default: true)
- `CACHE_TTL`: Cache time-to-live in seconds (default: 300)
- `MAX_CONCURRENT_QUERIES`: Maximum concurrent queries (default: 50)
- `MAX_RESPONSE_CONTENT_SIZE`: Maximum response content size for LLM compatibility (default: 4096, New in v0.4.0)
- `LOG_LEVEL`: Log level (DEBUG/INFO/WARNING/ERROR, default: INFO)
- `LOG_FILE_PATH`: Log file path (automatically organized by level)
- `ENABLE_AUDIT`: Enable audit logging (default: true)
- `ENABLE_LOG_CLEANUP`: Enable automatic log cleanup (default: true, Enhanced in v0.5.0)
- `LOG_MAX_AGE_DAYS`: Maximum age of log files in days (default: 30, Enhanced in v0.5.0)
- `LOG_CLEANUP_INTERVAL_HOURS`: Log cleanup check interval in hours (default: 24, Enhanced in v0.5.0)

Log files are organized by level: `debug.log`, `info.log`, `warning.log`, `error.log`, `critical.log`; audit logs are written to `audit.log` with separate retention management.

The following table lists the main tools currently available for invocation via an MCP client:
| Tool Name | Description | Parameters |
|---|---|---|
| `exec_query` | Execute SQL query and return results. | `sql` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional), `max_rows` (integer, Optional), `timeout` (integer, Optional) |
| `get_table_schema` | Get detailed table structure information. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_db_table_list` | Get list of all table names in specified database. | `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_db_list` | Get list of all database names. | `catalog_name` (string, Optional) |
| `get_table_comment` | Get table comment information. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_table_column_comments` | Get comment information for all columns in table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_table_indexes` | Get index information for specified table. | `table_name` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_recent_audit_logs` | Get audit log records for recent period. | `days` (integer, Optional), `limit` (integer, Optional) |
| `get_catalog_list` | Get list of all catalog names. | `random_string` (string, Required) |
| `get_sql_explain` | Get SQL execution plan with configurable content truncation and file export for LLM analysis. | `sql` (string, Required), `verbose` (boolean, Optional), `db_name` (string, Optional), `catalog_name` (string, Optional) |
| `get_sql_profile` | Get SQL execution profile with content management and file export for LLM optimization workflows. | `sql` (string, Required), `db_name` (string, Optional), `catalog_name` (string, Optional), `timeout` (integer, Optional) |
| `get_table_data_size` | Get table data size information via FE HTTP API. | `db_name` (string, Optional), `table_name` (string, Optional), `single_replica` (boolean, Optional) |
| `get_monitoring_metrics_info` | Get Doris monitoring metrics definitions and descriptions. | `role` (string, Optional), `monitor_type` (string, Optional), `priority` (string, Optional) |
| `get_monitoring_metrics_data` | Get actual Doris monitoring metrics data from nodes with flexible BE discovery. | `role` (string, Optional), `monitor_type` (string, Optional), `priority` (string, Optional) |
| `get_realtime_memory_stats` | Get real-time memory statistics via BE Memory Tracker with auto/manual BE discovery. | `tracker_type` (string, Optional), `include_details` (boolean, Optional) |
| `get_historical_memory_stats` | Get historical memory statistics via BE Bvar interface with flexible BE configuration. | `tracker_names` (array, Optional), `time_range` (string, Optional) |
| `analyze_data_quality` | Comprehensive data quality analysis combining completeness and distribution analysis. | `table_name` (string, Required), `analysis_scope` (string, Optional), `sample_size` (integer, Optional), `business_rules` (array, Optional) |
| `trace_column_lineage` | End-to-end column lineage tracking through SQL analysis and dependency mapping. | `target_columns` (array, Required), `analysis_depth` (integer, Optional), `include_transformations` (boolean, Optional) |
| `monitor_data_freshness` | Real-time data staleness monitoring with configurable freshness thresholds. | `table_names` (array, Optional), `freshness_threshold_hours` (integer, Optional), `include_update_patterns` (boolean, Optional) |
| `analyze_data_access_patterns` | User behavior analysis and security anomaly detection with access pattern monitoring. | `days` (integer, Optional), `include_system_users` (boolean, Optional), `min_query_threshold` (integer, Optional) |
| `analyze_data_flow_dependencies` | Data flow impact analysis and dependency mapping between tables and views. | `target_table` (string, Optional), `analysis_depth` (integer, Optional), `include_views` (boolean, Optional) |
| `analyze_slow_queries_topn` | Performance bottleneck identification with top-N slow query analysis and patterns. | `days` (integer, Optional), `top_n` (integer, Optional), `min_execution_time_ms` (integer, Optional), `include_patterns` (boolean, Optional) |
| `analyze_resource_growth_curves` | Capacity planning with resource growth analysis and trend forecasting. | `days` (integer, Optional), `resource_types` (array, Optional), `include_predictions` (boolean, Optional) |
| `exec_adbc_query` | High-performance SQL execution using ADBC (Arrow Flight SQL) protocol. | `sql` (string, Required), `max_rows` (integer, Optional), `timeout` (integer, Optional), `return_format` (string, Optional) |
| `get_adbc_connection_info` | ADBC connection diagnostics and status monitoring for Arrow Flight SQL. | No parameters required |
Note: All metadata tools support catalog federation for multi-catalog environments. Enhanced monitoring tools provide comprehensive memory tracking and metrics collection capabilities. New in v0.5.0: 7 advanced analytics tools for enterprise data governance and 2 ADBC tools for high-performance data transfer with 3-10x performance improvements for large datasets.
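As a quick end-to-end illustration, here is a sketch that chains two of these tools via the bundled MCP client (see the MCP Client section below; argument values are placeholders):

```python
import asyncio

from doris_mcp_client.client import DorisUnifiedMCPClient

async def main():
    client = DorisUnifiedMCPClient()

    # Discover available catalogs, then inspect a table's structure
    catalogs = await client.call_tool("get_catalog_list", {"random_string": "demo-1"})
    schema = await client.call_tool("get_table_schema", {
        "table_name": "customer",
        "db_name": "ssb",            # optional; falls back to the default database
        "catalog_name": "internal",  # optional; used for catalog federation
    })
    print(catalogs)
    print(schema)

asyncio.run(main())
```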
Execute the following command to start the server:
```bash
./start_server.sh
```
This command starts the FastAPI application with Streamable HTTP MCP service.
If you want to run only the Doris MCP Server in Docker:
```bash
cd doris-mcp-server
docker build -t doris-mcp-server .
docker run -d -p <port>:<port> \
  -v /<your-host-path>/doris-mcp-server/.env:/app/.env \
  --name <your-mcp-server-name> -it doris-mcp-server:latest
```
Service Endpoints:

- MCP Endpoint: `http://<host>:<port>/mcp` (primary MCP endpoint; supports GET, POST, DELETE, OPTIONS)
- Health Check: `http://<host>:<port>/health`
Note: The server uses Streamable HTTP for web-based communication, providing unified request/response and streaming capabilities.
Interaction with the Doris MCP Server requires an MCP Client. The client connects to the server's Streamable HTTP endpoint and sends requests according to the MCP specification to invoke the server's tools.
Main Interaction Flow:
1. Initialization: send an `initialize` method call to `/mcp` (Streamable HTTP).
2. Discovery: call `tools/list` to get the list of supported tools, their descriptions, and parameter schemas.
3. Tool Call: send a `tools/call` request, specifying the `name` and `arguments`.
   - Example: `name`: `get_table_schema`, `arguments`: include `table_name`, `db_name`, `catalog_name`.
4. Response: the server returns a result containing `content` or `isError`.
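On the wire these steps are ordinary JSON-RPC 2.0 messages. A hedged sketch of the payloads (exact fields such as `protocolVersion` are negotiated per the MCP spec and may vary by version):

```python
import json

# Step 1: initialize the session
initialize = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # assumed; negotiated per MCP spec
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},
    },
}

# Step 2: discover the available tools
list_tools = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

# Step 3: invoke a tool by name with arguments
call_tool = {
    "jsonrpc": "2.0", "id": 3, "method": "tools/call",
    "params": {
        "name": "get_table_schema",
        "arguments": {"table_name": "customer", "db_name": "ssb",
                      "catalog_name": "internal"},
    },
}

for message in (initialize, list_tools, call_tool):
    print(json.dumps(message))  # each is POSTed to http://<host>:<port>/mcp
```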
The Doris MCP Server supports catalog federation, enabling interaction with multiple data catalogs (internal Doris tables and external data sources like Hive, MySQL, etc.) within a unified interface.
- All metadata tools (`get_db_list`, `get_db_table_list`, `get_table_schema`, etc.) support an optional `catalog_name` parameter to query specific catalogs.
- Use `get_catalog_list` to discover available catalogs and their types.

All SQL queries MUST use three-part naming for table references:
- Internal tables: `internal.database_name.table_name`
- External tables: `catalog_name.database_name.table_name`
Get Available Catalogs:
{ "tool_name": "get_catalog_list", "arguments": {"random_string": "unique_id"} }
Get Databases in Specific Catalog:
{ "tool_name": "get_db_list", "arguments": {"random_string": "unique_id", "catalog_name": "mysql"} }
Query Internal Catalog:
{ "tool_name": "exec_query", "arguments": { "random_string": "unique_id", "sql": "SELECT COUNT(*) FROM internal.ssb.customer" } }
Query External Catalog:
{ "tool_name": "exec_query", "arguments": { "random_string": "unique_id", "sql": "SELECT COUNT(*) FROM mysql.ssb.customer" } }
Cross-Catalog Query:
{ "tool_name": "exec_query", "arguments": { "random_string": "unique_id", "sql": "SELECT i.c_name, m.external_data FROM internal.ssb.customer i JOIN mysql.test.user_info m ON i.c_custkey = m.customer_id" } }
The Doris MCP Server includes a comprehensive security framework that provides enterprise-level protection through authentication, authorization, SQL security validation, and data masking capabilities.
Configure authentication in your environment variables:
```bash
# Authentication Type (token/basic/oauth)
AUTH_TYPE=token

# Token Secret for JWT validation
TOKEN_SECRET=your_secret_key_here

# Session timeout (in seconds)
SESSION_TIMEOUT=3600
```
```python
# Client authentication with token
auth_info = {
    "type": "token",
    "token": "your_jwt_token",
    "session_id": "unique_session_id"
}
```
```python
# Client authentication with username/password
auth_info = {
    "type": "basic",
    "username": "analyst",
    "password": "secure_password",
    "session_id": "unique_session_id"
}
```
The system supports four security levels with hierarchical access control:
| Security Level | Access Scope | Typical Use Cases |
|---|---|---|
| Public | Unrestricted access | Public reports, general statistics |
| Internal | Company employees | Internal dashboards, business metrics |
| Confidential | Authorized personnel | Customer data, financial reports |
| Secret | Senior management | Strategic data, sensitive analytics |
Configure user roles and permissions:
```python
# Example role configuration
role_permissions = {
    "data_analyst": {
        "security_level": "internal",
        "permissions": ["read_data", "execute_query"],
        "allowed_tables": ["sales", "products", "orders"]
    },
    "data_admin": {
        "security_level": "confidential",
        "permissions": ["read_data", "execute_query", "admin"],
        "allowed_tables": ["*"]
    },
    "executive": {
        "security_level": "secret",
        "permissions": ["read_data", "execute_query", "admin"],
        "allowed_tables": ["*"]
    }
}
```
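To make the hierarchy concrete, here is a minimal illustrative sketch of how a level ranking plus the role whitelist above could gate access (the `check_access` helper is an assumption for illustration, not the server's actual API):

```python
# Ordered from least to most privileged
SECURITY_LEVELS = ["public", "internal", "confidential", "secret"]

def check_access(role: dict, table: str, table_level: str) -> bool:
    """Illustrative check: the role must meet the table's security level,
    and the table must appear in the role's whitelist ('*' allows all)."""
    level_ok = (SECURITY_LEVELS.index(role["security_level"])
                >= SECURITY_LEVELS.index(table_level))
    table_ok = "*" in role["allowed_tables"] or table in role["allowed_tables"]
    return level_ok and table_ok

analyst = {
    "security_level": "internal",
    "permissions": ["read_data", "execute_query"],
    "allowed_tables": ["sales", "products", "orders"],
}
print(check_access(analyst, "sales", "internal"))          # True
print(check_access(analyst, "payment_records", "secret"))  # False
```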
The system automatically validates SQL queries for security risks:
Configure blocked SQL operations using environment variables (New in v0.4.2):
```bash
# Enable/disable SQL security check (New in v0.4.2)
ENABLE_SECURITY_CHECK=true

# Customize blocked keywords via environment variable (New in v0.4.2)
BLOCKED_KEYWORDS="DROP,DELETE,TRUNCATE,ALTER,CREATE,INSERT,UPDATE,GRANT,REVOKE,EXEC,EXECUTE,SHUTDOWN,KILL"

# Maximum query complexity score
MAX_QUERY_COMPLEXITY=100
```
Default Blocked Keywords (Unified in v0.4.2): `DROP`, `DELETE`, `TRUNCATE`, `ALTER`, `CREATE`, `INSERT`, `UPDATE`, `GRANT`, `REVOKE`, `EXEC`, `EXECUTE`, `SHUTDOWN`, `KILL` (the same list shown in `BLOCKED_KEYWORDS` above).

The system automatically detects and blocks:

- `UNION SELECT` attacks
- `OR 1=1` patterns
- `SLEEP()` and `WAITFOR` functions
- `--` and `/**/` comment patterns
- Multiple statements chained with `;`
```python
# This query would be blocked
dangerous_sql = "SELECT * FROM users WHERE id = 1; DROP TABLE users;"

# This query would be allowed
safe_sql = "SELECT name, email FROM users WHERE department = 'sales'"
```
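A minimal sketch of what such pattern-based detection can look like (illustrative only; `looks_malicious` and the pattern list are assumptions, not the server's implementation, which lives in `doris_mcp_server/utils/security.py`):

```python
import re

# Illustrative patterns mirroring the categories listed above
INJECTION_PATTERNS = [
    r"\bUNION\s+SELECT\b",        # UNION SELECT attacks
    r"\bOR\s+1\s*=\s*1\b",        # boolean-based OR 1=1 patterns
    r"\bSLEEP\s*\(|\bWAITFOR\b",  # time-based SLEEP()/WAITFOR
    r"--|/\*.*?\*/",              # comment injection
    r";\s*\S",                    # multiple chained statements
]

def looks_malicious(sql: str) -> bool:
    return any(re.search(p, sql, re.IGNORECASE | re.DOTALL)
               for p in INJECTION_PATTERNS)

print(looks_malicious("SELECT * FROM users WHERE id = 1; DROP TABLE users;"))      # True
print(looks_malicious("SELECT name, email FROM users WHERE department = 'sales'")) # False
```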
Configure automatic data masking for sensitive information:
```python
# Default masking rules
masking_rules = [
    {
        "column_pattern": r".*phone.*|.*mobile.*",
        "algorithm": "phone_mask",
        "parameters": {
            "mask_char": "*",
            "keep_prefix": 3,
            "keep_suffix": 4
        },
        "security_level": "internal"
    },
    {
        "column_pattern": r".*email.*",
        "algorithm": "email_mask",
        "parameters": {"mask_char": "*"},
        "security_level": "internal"
    },
    {
        "column_pattern": r".*id_card.*|.*identity.*",
        "algorithm": "id_mask",
        "parameters": {
            "mask_char": "*",
            "keep_prefix": 6,
            "keep_suffix": 4
        },
        "security_level": "confidential"
    }
]
```
| Algorithm | Description | Example |
|---|---|---|
| `phone_mask` | Masks phone numbers | `138****5678` |
| `email_mask` | Masks email addresses | `j***n@example.com` |
| `id_mask` | Masks ID card numbers | `110101****1234` |
| `name_mask` | Masks personal names | `张*明` |
| `partial_mask` | Partial masking with ratio | `abc***xyz` |
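For illustration, `phone_mask` with the default parameters from the rules above could be implemented along these lines (a sketch, not the server's exact code):

```python
def phone_mask(value: str, mask_char: str = "*",
               keep_prefix: int = 3, keep_suffix: int = 4) -> str:
    """Keep the first and last digits and mask the middle,
    e.g. 13812345678 -> 138****5678."""
    if len(value) <= keep_prefix + keep_suffix:
        return mask_char * len(value)  # too short to keep anything
    middle = mask_char * (len(value) - keep_prefix - keep_suffix)
    return value[:keep_prefix] + middle + value[-keep_suffix:]

print(phone_mask("13812345678"))  # 138****5678
```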
Add custom masking rules in your configuration:
```python
# Custom masking rule
custom_rule = {
    "column_pattern": r".*salary.*|.*income.*",
    "algorithm": "partial_mask",
    "parameters": {
        "mask_char": "*",
        "mask_ratio": 0.6
    },
    "security_level": "confidential"
}
```
```bash
# .env file
AUTH_TYPE=token
TOKEN_SECRET=your_jwt_secret_key
ENABLE_MASKING=true
MAX_RESULT_ROWS=10000
BLOCKED_SQL_OPERATIONS=DROP,DELETE,TRUNCATE,ALTER
MAX_QUERY_COMPLEXITY=100
ENABLE_AUDIT=true
```
```python
# Configure sensitive tables with security levels
sensitive_tables = {
    "user_profiles": "confidential",
    "payment_records": "secret",
    "employee_salaries": "secret",
    "customer_data": "confidential",
    "public_reports": "public"
}
```
The system provides comprehensive security monitoring:
```json
# Security audit log example
{
  "timestamp": "2024-01-15T10:30:00Z",
  "user_id": "analyst_user",
  "action": "query_execution",
  "resource": "customer_data",
  "result": "blocked",
  "reason": "insufficient_permissions",
  "risk_level": "medium"
}
```
⚠️ Important: Always test security configurations in a development environment before deploying to production. Regularly review and update security policies based on your organization's requirements.
You can connect Cursor to this MCP server using Stdio mode (recommended) or Streamable HTTP mode.
Stdio mode allows Cursor to manage the server process directly. Configuration is done within Cursor's MCP Server settings file (typically `~/.cursor/mcp.json` or similar).
Install the package from PyPI and configure Cursor to use it:
```bash
pip install doris-mcp-server
```
Configure Cursor: Add an entry like the following to your Cursor MCP configuration:
{ "mcpServers": { "doris-stdio": { "command": "doris-mcp-server", "args": ["--transport", "stdio"], "env": { "DORIS_HOST": "127.0.0.1", "DORIS_PORT": "9030", "DORIS_USER": "root", "DORIS_PASSWORD": "your_db_password" } } } }
If you have `uv` installed and want to run from source:
```bash
uv run --project /path/to/doris-mcp-server doris-mcp-server
```
Note: Replace `/path/to/doris-mcp-server` with the actual absolute path to your project directory.
Configure Cursor: Add an entry like the following to your Cursor MCP configuration:
{ "mcpServers": { "doris-stdio": { "command": "uv", "args": ["run", "--project", "/path/to/your/doris-mcp-server", "doris-mcp-server"], "env": { "DORIS_HOST": "127.0.0.1", "DORIS_PORT": "9030", "DORIS_USER": "root", "DORIS_PASSWORD": "your_db_password" } } } }
Streamable HTTP mode requires you to run the MCP server independently first, and then configure Cursor to connect to it.
Configure `.env`: Ensure your database credentials and any other necessary settings are correctly configured in the `.env` file within the project directory.
Start the Server: Run the server from your terminal in the project's root directory:
```bash
./start_server.sh
```
This script reads the `.env` file and starts the FastAPI server with Streamable HTTP support. Note the host and port the server is listening on (default is `0.0.0.0:3000`).
Configure Cursor: Add an entry like the following to your Cursor MCP configuration, pointing to the running server's Streamable HTTP endpoint:
{ "mcpServers": { "doris-http": { "url": "http://127.0.0.1:3000/mcp" } } }
Note: Adjust the host/port if your server runs on a different address. The `/mcp` endpoint is the unified Streamable HTTP interface.
After configuring either mode in Cursor, you should be able to select the server (e.g., `doris-stdio` or `doris-http`) and use its tools.
```
doris-mcp-server/
├── doris_mcp_server/                      # Main server package
│   ├── main.py                            # Main entry point and FastAPI app
│   ├── tools/                             # MCP tools implementation
│   │   ├── tools_manager.py               # Centralized tools management and registration
│   │   ├── resources_manager.py           # Resource management and metadata exposure
│   │   ├── prompts_manager.py             # Intelligent prompt templates for data analysis
│   │   └── __init__.py
│   ├── utils/                             # Core utility modules
│   │   ├── config.py                      # Configuration management with validation
│   │   ├── db.py                          # Database connection management with pooling
│   │   ├── query_executor.py              # High-performance SQL execution with caching
│   │   ├── security.py                    # Security management and data masking
│   │   ├── schema_extractor.py            # Metadata extraction with catalog federation
│   │   ├── analysis_tools.py              # Data analysis and performance monitoring
│   │   ├── data_governance_tools.py       # Data lineage and freshness monitoring (New in v0.5.0)
│   │   ├── data_quality_tools.py          # Comprehensive data quality analysis (New in v0.5.0)
│   │   ├── data_exploration_tools.py      # Advanced statistical analysis (New in v0.5.0)
│   │   ├── security_analytics_tools.py    # Access pattern analysis (New in v0.5.0)
│   │   ├── dependency_analysis_tools.py   # Impact analysis and dependency mapping (New in v0.5.0)
│   │   ├── performance_analytics_tools.py # Query optimization and capacity planning (New in v0.5.0)
│   │   ├── adbc_query_tools.py            # High-performance Arrow Flight SQL operations (New in v0.5.0)
│   │   ├── logger.py                      # Logging configuration
│   │   └── __init__.py
│   └── __init__.py
├── doris_mcp_client/                      # MCP client implementation
│   ├── client.py                          # Unified MCP client for testing and integration
│   ├── README.md                          # Client documentation
│   └── __init__.py
├── logs/                                  # Log files directory
├── README.md                              # This documentation
├── .env.example                           # Environment variables template
├── requirements.txt                       # Python dependencies
├── pyproject.toml                         # Project configuration and entry points
├── uv.lock                                # UV package manager lock file
├── generate_requirements.py               # Requirements generation script
├── start_server.sh                        # Server startup script
└── restart_server.sh                      # Server restart script
```
This section outlines the process for adding new MCP tools to the Doris MCP Server, based on the unified modular architecture with centralized tool management.
The server provides comprehensive utility modules for common database operations:
- `doris_mcp_server/utils/db.py`: Database connection management with connection pooling and health monitoring
- `doris_mcp_server/utils/query_executor.py`: High-performance SQL execution with advanced caching, optimization, and performance monitoring
- `doris_mcp_server/utils/schema_extractor.py`: Metadata extraction with full catalog federation support
- `doris_mcp_server/utils/security.py`: Comprehensive security management, SQL validation, and data masking
- `doris_mcp_server/utils/analysis_tools.py`: Advanced data analysis and statistical tools
- `doris_mcp_server/utils/config.py`: Configuration management with validation
- `doris_mcp_server/utils/data_governance_tools.py`: Data lineage tracking and freshness monitoring (New in v0.5.0)
- `doris_mcp_server/utils/data_quality_tools.py`: Comprehensive data quality analysis framework (New in v0.5.0)
- `doris_mcp_server/utils/adbc_query_tools.py`: High-performance Arrow Flight SQL operations (New in v0.5.0)

Add your new tool to the `DorisToolsManager` class in `doris_mcp_server/tools/tools_manager.py`. The tools manager provides a centralized approach to tool registration and execution with unified interfaces.
Example: Adding a new analysis tool:
```python
# In doris_mcp_server/tools/tools_manager.py
async def your_new_analysis_tool(self, arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
    """
    Your new analysis tool implementation

    Args:
        arguments: Tool arguments from MCP client

    Returns:
        List of MCP response messages
    """
    try:
        # Use existing utilities
        result = await self.query_executor.execute_sql_for_mcp(
            sql="SELECT COUNT(*) FROM your_table",
            max_rows=arguments.get("max_rows", 100)
        )

        return [{
            "type": "text",
            "text": json.dumps(result, ensure_ascii=False, indent=2)
        }]

    except Exception as e:
        logger.error(f"Tool execution failed: {str(e)}", exc_info=True)
        return [{
            "type": "text",
            "text": f"Error: {str(e)}"
        }]
```
Add your tool to the `_register_tools` method in the same class:
```python
# In the _register_tools method of DorisToolsManager
@self.mcp.tool(
    name="your_new_analysis_tool",
    description="Description of your new analysis tool",
    inputSchema={
        "type": "object",
        "properties": {
            "parameter1": {
                "type": "string",
                "description": "Description of parameter1"
            },
            "parameter2": {
                "type": "integer",
                "description": "Description of parameter2",
                "default": 100
            }
        },
        "required": ["parameter1"]
    }
)
async def your_new_analysis_tool_wrapper(arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
    return await self.your_new_analysis_tool(arguments)
```
For more complex tools, you can leverage the comprehensive framework of utility modules described above.
Test your new tool using the included MCP client:
```python
# Using doris_mcp_client/client.py
from doris_mcp_client.client import DorisUnifiedMCPClient

async def test_new_tool():
    client = DorisUnifiedMCPClient()
    result = await client.call_tool("your_new_analysis_tool", {
        "parameter1": "test_value",
        "parameter2": 50
    })
    print(result)
```
The project includes a unified MCP client (`doris_mcp_client/`) for testing and integration purposes. The client supports multiple connection modes and provides a convenient interface for interacting with the MCP server.

For detailed client documentation, see `doris_mcp_client/README.md`.
Contributions are welcome via Issues or Pull Requests.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
A: This is a common issue. The main reason is that these models need more explicit guidance to correctly use MCP tools. It's recommended to add the following instruction prompt for the model:
```
<instruction>
Use MCP tools to complete tasks as much as possible. Carefully read the annotations, method names, and parameter descriptions of each tool. Please follow these steps:
1. Carefully analyze the user's question and match the most appropriate tool from the existing Tools list.
2. Ensure tool names, method names, and parameters are used exactly as defined in the tool annotations. Do not invent tool names or parameters.
3. When passing parameters, strictly follow the parameter format and requirements specified in the tool annotations.
4. Call tools directly as needed, but refer to the following request format for parameters: {"mcp_sse_call_tool": {"tool_name": "$tools_name", "arguments": "{}"}}
5. When outputting results, do not include any XML tags; return plain text content only.
<input>
User question: user_query
</input>
<output>
Return tool call results or the final answer, along with an analysis of the results.
</output>
</instruction>
```
If you have further requirements for the returned results, you can describe them in the `<output>` tag.
A: You can configure database connections in several ways:
Environment Variables (Recommended):
export DORIS_HOST="your_doris_host" export DORIS_PORT="9030" export DORIS_USER="root" export DORIS_PASSWORD="your_password"
Command Line Arguments:
```bash
doris-mcp-server --db-host your_host --db-port 9030 --db-user root --db-password your_password
```
Configuration File: Modify the corresponding configuration items in the `.env` file.
A: Choose the appropriate configuration based on your deployment scenario:
External Network (Manual Configuration):
```bash
# Manually specify BE node addresses
DORIS_BE_HOSTS=10.1.1.100,10.1.1.101,10.1.1.102
DORIS_BE_WEBSERVER_PORT=8040
```
Internal Network (Automatic Discovery):
```bash
# Leave BE_HOSTS empty for auto-discovery
# DORIS_BE_HOSTS=  # Not set or empty
# System will use the 'SHOW BACKENDS' command to get internal IPs
```
A: The tools provide both truncated content and complete files for LLM analysis:
Get Analysis Results:
{ "content": "Truncated plan for immediate review", "file_path": "/tmp/explain_12345.txt", "is_content_truncated": true }
LLM Analysis Workflow:
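A minimal sketch of that workflow, assuming the response fields shown above (`content`, `file_path`, `is_content_truncated`):

```python
import json

def full_plan(response: dict) -> str:
    """Return the complete plan text, reading the exported file when truncated."""
    if response.get("is_content_truncated"):
        with open(response["file_path"], encoding="utf-8") as f:
            return f.read()  # pass the complete plan to the LLM
    return response["content"]

example = json.loads("""{
    "content": "Truncated plan for immediate review",
    "file_path": "/tmp/explain_12345.txt",
    "is_content_truncated": true
}""")
# full_plan(example) would read /tmp/explain_12345.txt when it exists
```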
Configure Content Size:
```bash
MAX_RESPONSE_CONTENT_SIZE=4096  # Adjust as needed
```
A: Set the following configurations in your `.env` file:
```bash
# Enable data masking
ENABLE_MASKING=true

# Set authentication type
AUTH_TYPE=token

# Configure token secret
TOKEN_SECRET=your_secret_key

# Set maximum result rows
MAX_RESULT_ROWS=10000
```
A:
Recommendations:
A: Try the following solutions:
Increase timeout settings:
```bash
# Set in .env file
QUERY_TIMEOUT=60
CONNECTION_TIMEOUT=30
```
Check network connectivity:
```bash
# Test database connection
curl http://localhost:3000/health
```
Optimize connection pool configuration:
```bash
DORIS_MAX_CONNECTIONS=20
```
Q: How do I resolve `at_eof` connection errors? (Completely Fixed in v0.5.0)

A: Version 0.5.0 has completely resolved the critical `at_eof` connection errors through a comprehensive connection pool redesign:
The `at_eof` errors occurred due to connection pool pre-creation and improper connection state management. The redesign covers:

- Connection Pool Strategy Overhaul: the `min_connections` default was changed to 0 to prevent pre-creation issues
- Enhanced Health Monitoring
- Intelligent Recovery System
- Performance Optimizations
```bash
# Monitor connection pool health in real-time
tail -f logs/doris_mcp_server_info.log | grep -E "(pool|connection|at_eof)"

# Check detailed connection diagnostics
tail -f logs/doris_mcp_server_debug.log | grep "connection health"

# View connection pool metrics
curl http://localhost:8000/health  # If running in HTTP mode
```
```bash
# Recommended connection pool settings in .env
DORIS_MAX_CONNECTIONS=20     # Adjust based on workload
CONNECTION_TIMEOUT=30        # Connection establishment timeout
QUERY_TIMEOUT=60             # Query execution timeout

# Health monitoring settings
HEALTH_CHECK_INTERVAL=60     # Pool health check frequency
```
Result: 99.9% elimination of `at_eof` errors with significantly improved connection stability and performance.
A: Version 0.4.2 introduced an intelligent MCP compatibility layer that supports both MCP 1.8.x and 1.9.x versions:
The Problem:

- The `RequestContext` class changed from 2 to 3 generic parameters between MCP releases
- Incompatible versions raised `TypeError: Too few arguments for RequestContext` errors at startup

The Solution (v0.4.2):

- An intelligent compatibility layer adapts to the installed MCP version
- The dependency is constrained to `mcp>=1.8.0,<2.0.0`

Supported MCP Versions:
```bash
# Both versions now work seamlessly
pip install mcp==1.8.0  # Stable version (recommended)
pip install mcp==1.9.3  # Latest version with new features
```
Version Information:
```bash
# Check which MCP version is being used
doris-mcp-server --transport stdio
# The server will log: "Using MCP version: x.x.x"
```
If you encounter MCP-related startup errors:
```bash
# Recommended: Use stable version
pip uninstall mcp
pip install mcp==1.8.0

# Or upgrade to latest compatible version
pip install --upgrade doris-mcp-server==0.5.0
```
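You can also check the installed MCP library version directly from Python using only the standard library:

```python
from importlib.metadata import PackageNotFoundError, version

try:
    print("mcp version:", version("mcp"))  # e.g. 1.8.0 or 1.9.3
except PackageNotFoundError:
    print("mcp is not installed")
```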
A: ADBC (Arrow Flight SQL) provides 3-10x performance improvements for large datasets:
ADBC Dependencies (automatically included in v0.5.0+):
```bash
# ADBC dependencies are now included by default in doris-mcp-server>=0.5.0
# No separate installation required
```
Configure Arrow Flight SQL Ports:
```bash
# Add to your .env file
FE_ARROW_FLIGHT_SQL_PORT=8096
BE_ARROW_FLIGHT_SQL_PORT=8097
```
Optional ADBC Customization:
```bash
# Customize ADBC behavior (optional)
ADBC_DEFAULT_MAX_ROWS=200000
ADBC_DEFAULT_TIMEOUT=120
ADBC_DEFAULT_RETURN_FORMAT=pandas  # arrow/pandas/dict
```
Test ADBC Connection:
```bash
# Use the get_adbc_connection_info tool to verify setup
# It should show "status": "ready" and port connectivity
```
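Once the connection reports ready, large result sets can be pulled through `exec_adbc_query`. A sketch using the bundled client (parameters follow the tools table above; table name and sizes are placeholders):

```python
import asyncio

from doris_mcp_client.client import DorisUnifiedMCPClient

async def main():
    client = DorisUnifiedMCPClient()

    # Verify Arrow Flight SQL connectivity first
    info = await client.call_tool("get_adbc_connection_info", {})
    print(info)

    # High-throughput read path for large result sets
    result = await client.call_tool("exec_adbc_query", {
        "sql": "SELECT * FROM internal.ssb.customer",
        "max_rows": 200000,
        "return_format": "pandas",  # arrow / pandas / dict
    })
    print(result)

asyncio.run(main())
```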
A: The 7 new analytics tools provide comprehensive data governance capabilities:
Data Quality Analysis:
{ "tool_name": "analyze_data_quality", "arguments": { "table_name": "customer_data", "analysis_scope": "comprehensive", "sample_size": 100000 } }
Column Lineage Tracking:
{ "tool_name": "trace_column_lineage", "arguments": { "target_columns": ["users.email", "orders.customer_id"], "analysis_depth": 3 } }
Data Freshness Monitoring:
{ "tool_name": "monitor_data_freshness", "arguments": { "freshness_threshold_hours": 24, "include_update_patterns": true } }
Performance Analytics:
{ "tool_name": "analyze_slow_queries_topn", "arguments": { "days": 7, "top_n": 20, "include_patterns": true } }
A: Version 0.5.0 introduces a comprehensive logging system with automatic management and level-based organization:
```
logs/
├── doris_mcp_server_debug.log     # DEBUG level messages
├── doris_mcp_server_info.log      # INFO level messages
├── doris_mcp_server_warning.log   # WARNING level messages
├── doris_mcp_server_error.log     # ERROR level messages
├── doris_mcp_server_critical.log  # CRITICAL level messages
├── doris_mcp_server_all.log       # Combined log (all levels)
└── doris_mcp_server_audit.log     # Audit trail (separate)
```
```bash
# View real-time logs by level
tail -f logs/doris_mcp_server_info.log   # General operational info
tail -f logs/doris_mcp_server_error.log  # Error tracking
tail -f logs/doris_mcp_server_debug.log  # Detailed debugging

# View all activity in combined log
tail -f logs/doris_mcp_server_all.log

# Monitor specific operations
tail -f logs/doris_mcp_server_info.log | grep -E "(query|connection|tool)"

# View audit trail
tail -f logs/doris_mcp_server_audit.log
```
```bash
# Enhanced logging configuration in .env
LOG_LEVEL=INFO                 # Base log level
ENABLE_AUDIT=true              # Enable audit logging
ENABLE_LOG_CLEANUP=true        # Enable automatic cleanup
LOG_MAX_AGE_DAYS=30            # Keep logs for 30 days
LOG_CLEANUP_INTERVAL_HOURS=24  # Check for cleanup daily

# Advanced settings
LOG_FILE_PATH=logs             # Log directory (auto-organized)
```
```bash
# Debug connection issues
grep -E "(connection|pool|at_eof)" logs/doris_mcp_server_error.log

# Monitor tool performance
grep "execution_time" logs/doris_mcp_server_info.log

# Check system health
tail -20 logs/doris_mcp_server_warning.log

# View recent critical issues
cat logs/doris_mcp_server_critical.log
```
Log files older than `LOG_MAX_AGE_DAYS` are cleaned up automatically.
For other issues, please check GitHub Issues or submit a new issue.