---
title: ai-proxy-multi
---
The ai-proxy-multi plugin simplifies access to LLM providers and models by defining a standard request format that allows key fields in the plugin configuration to be embedded into the request.
It extends the existing ai-proxy plugin with load balancing and retries.
Proxying requests to OpenAI and DeepSeek is currently supported, and other OpenAI-compatible services can be reached through the override.endpoint setting. More LLM services will be supported soon.
The plugin accepts requests in the following format:

| Name | Type | Required | Description |
|---|---|---|---|
| messages | Array | Yes | An array of message objects. |
| messages.role | String | Yes | Role of the message (`system`, `user`, or `assistant`). |
| messages.content | String | Yes | Content of the message. |
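For example, a minimal request body in this format (the message contents here are only illustrative):

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 1+1?" }
  ]
}
```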
The plugin supports the following attributes:

| Name | Required | Type | Description | Default |
|---|---|---|---|---|
| providers | Yes | array | List of AI providers, each following the provider schema. | |
| provider.name | Yes | string | Name of the AI service provider. Allowed values: openai, deepseek, openai-compatible. | |
| provider.model | Yes | string | Name of the AI model to execute. Example: gpt-4o. | |
| provider.priority | No | integer | Priority of the provider for load balancing. | 0 |
| provider.weight | No | integer | Load balancing weight. | |
| balancer.algorithm | No | string | Load balancing algorithm. Allowed values: chash, roundrobin. | roundrobin |
| balancer.hash_on | No | string | Defines what to hash on for consistent hashing (vars, header, cookie, consumer, vars_combinations). | vars |
| balancer.key | No | string | Key for consistent hashing in dynamic load balancing. | |
| provider.auth | Yes | object | Authentication details, including headers and query parameters. | |
| provider.auth.header | No | object | Authentication details sent via headers. Header name must match ^[a-zA-Z0-9._-]+$. | |
| provider.auth.query | No | object | Authentication details sent via query parameters. Keys must match ^[a-zA-Z0-9._-]+$. | |
| provider.override.endpoint | No | string | Custom host override for the AI provider. | |
| timeout | No | integer | Request timeout in milliseconds (1-60000). | 30000 |
| keepalive | No | boolean | Enables keepalive connections. | true |
| keepalive_timeout | No | integer | Timeout for keepalive connections (minimum 1000ms). | 60000 |
| keepalive_pool | No | integer | Maximum keepalive connections. | 30 |
| ssl_verify | No | boolean | Enables SSL certificate verification. | true |
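Beyond weighted round robin, the balancer attributes can be used for consistent hashing. As a sketch, the following route pins requests to a provider based on a client header, assuming the default ports and an illustrative header name X-Session-ID:

```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-proxy-multi-chash-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy-multi": {
        "balancer": {
          "algorithm": "chash",
          "hash_on": "header",
          "key": "X-Session-ID"
        },
        "providers": [
          {
            "name": "openai",
            "model": "gpt-4",
            "weight": 1,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            }
          },
          {
            "name": "deepseek",
            "model": "deepseek-chat",
            "weight": 1,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            }
          }
        ]
      }
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "httpbin.org": 1
      }
    }
  }'
```

With hash_on set to header and key set to X-Session-ID, requests carrying the same header value should consistently reach the same provider.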
To balance requests between two providers, create a route with the ai-proxy-multi plugin like so:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ -H "X-API-KEY: ${ADMIN_API_KEY}" \ -d '{ "id": "ai-proxy-multi-route", "uri": "/anything", "methods": ["POST"], "plugins": { "ai-proxy-multi": { "providers": [ { "name": "openai", "model": "gpt-4", "weight": 1, "priority": 1, "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } }, "options": { "max_tokens": 512, "temperature": 1.0 } }, { "name": "deepseek", "model": "deepseek-chat", "weight": 1, "auth": { "header": { "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'" } }, "options": { "max_tokens": 512, "temperature": 1.0 } } ] } }, "upstream": { "type": "roundrobin", "nodes": { "httpbin.org": 1 } } }'
In the above configuration, both providers carry the same weight and priority, so requests are balanced equally between the openai and deepseek providers.
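To verify, you could send repeated requests to the route, assuming APISIX is listening on the default proxy port 9080:

```shell
curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'
```

Over many requests, responses should come from the two providers in roughly equal proportion.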
The priority attribute can be adjusted to implement fallback and retry behavior:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ -H "X-API-KEY: ${ADMIN_API_KEY}" \ -d '{ "id": "ai-proxy-multi-route", "uri": "/anything", "methods": ["POST"], "plugins": { "ai-proxy-multi": { "providers": [ { "name": "openai", "model": "gpt-4", "weight": 1, "priority": 1, "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } }, "options": { "max_tokens": 512, "temperature": 1.0 } }, { "name": "deepseek", "model": "deepseek-chat", "weight": 1, "priority": 0, "auth": { "header": { "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'" } }, "options": { "max_tokens": 512, "temperature": 1.0 } } ] } }, "upstream": { "type": "roundrobin", "nodes": { "httpbin.org": 1 } } }'
In the above configuration, the priority of the deepseek provider is set to 0, lower than that of openai. If the openai provider is unavailable, the ai-proxy-multi plugin retries the request against deepseek on the second attempt.
To proxy requests to an OpenAI-compatible LLM service, create a route with provider.name set to openai-compatible and the endpoint of the model set in provider.override.endpoint, like so:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ -H "X-API-KEY: ${ADMIN_API_KEY}" \ -d '{ "id": "ai-proxy-multi-route", "uri": "/anything", "methods": ["POST"], "plugins": { "ai-proxy-multi": { "providers": [ { "name": "openai-compatible", "model": "qwen-plus", "weight": 1, "priority": 1, "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } }, "override": { "endpoint": "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions" } }, { "name": "deepseek", "model": "deepseek-chat", "weight": 1, "auth": { "header": { "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'" } }, "options": { "max_tokens": 512, "temperature": 1.0 } } ], "passthrough": false } }, "upstream": { "type": "roundrobin", "nodes": { "httpbin.org": 1 } } }'