| --- |
| title: ai-proxy-multi |
| keywords: |
| - Apache APISIX |
| - API Gateway |
| - Plugin |
| - ai-proxy-multi |
| description: This document contains information about the Apache APISIX ai-proxy-multi Plugin. |
| --- |
| |
| <!-- |
| # |
| # Licensed to the Apache Software Foundation (ASF) under one or more |
| # contributor license agreements. See the NOTICE file distributed with |
| # this work for additional information regarding copyright ownership. |
| # The ASF licenses this file to You under the Apache License, Version 2.0 |
| # (the "License"); you may not use this file except in compliance with |
| # the License. You may obtain a copy of the License at |
| # |
| # http://www.apache.org/licenses/LICENSE-2.0 |
| # |
| # Unless required by applicable law or agreed to in writing, software |
| # distributed under the License is distributed on an "AS IS" BASIS, |
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| # See the License for the specific language governing permissions and |
| # limitations under the License. |
| # |
| --> |
| |
| ## Description |
| |
| The `ai-proxy-multi` plugin simplifies access to LLM providers and models by defining a standard request format |
| that allows key fields in plugin configuration to be embedded into the request. |
| |
| This plugin adds additional features like `load balancing` and `retries` to the existing `ai-proxy` plugin. |
| |
| Proxying requests to OpenAI is supported now. Other LLM services will be supported soon. |
| |
| ## Request Format |
| |
| ### OpenAI |
| |
| - Chat API |
| |
| | Name | Type | Required | Description | |
| | ------------------ | ------ | -------- | --------------------------------------------------- | |
| | `messages` | Array | Yes | An array of message objects | |
| | `messages.role` | String | Yes | Role of the message (`system`, `user`, `assistant`) | |
| | `messages.content` | String | Yes | Content of the message | |
| |
| ## Plugin Attributes |
| |
| | **Name** | **Required** | **Type** | **Description** | **Default** | |
| | ---------------------------- | ------------ | -------- | ------------------------------------------------------------------------------------------------------------- | ----------- | |
| | providers | Yes | array | List of AI providers, each following the provider schema. | | |
| | provider.name | Yes | string | Name of the AI service provider. Allowed values: `openai`, `deepseek`. | | |
| | provider.model | Yes | string | Name of the AI model to execute. Example: `gpt-4o`. | | |
| | provider.priority | No | integer | Priority of the provider for load balancing. | 0 | |
| | provider.weight | No | integer | Load balancing weight. | | |
| | balancer.algorithm | No | string | Load balancing algorithm. Allowed values: `chash`, `roundrobin`. | roundrobin | |
| | balancer.hash_on | No | string | Defines what to hash on for consistent hashing (`vars`, `header`, `cookie`, `consumer`, `vars_combinations`). | vars | |
| | balancer.key | No | string | Key for consistent hashing in dynamic load balancing. | | |
| | provider.auth | Yes | object | Authentication details, including headers and query parameters. | | |
| | provider.auth.header | No | object | Authentication details sent via headers. Header name must match `^[a-zA-Z0-9._-]+$`. | | |
| | provider.auth.query | No | object | Authentication details sent via query parameters. Keys must match `^[a-zA-Z0-9._-]+$`. | | |
| | provider.override.endpoint | No | string | Custom host override for the AI provider. | | |
| | timeout | No | integer | Request timeout in milliseconds (1-60000). | 30000 | |
| | keepalive | No | boolean | Enables keepalive connections. | true | |
| | keepalive_timeout | No | integer | Timeout for keepalive connections (minimum 1000ms). | 60000 | |
| | keepalive_pool | No | integer | Maximum keepalive connections. | 30 | |
| | ssl_verify | No | boolean | Enables SSL certificate verification. | true | |
| |
| ## Example usage |
| |
| Create a route with the `ai-proxy-multi` plugin like so: |
| |
| ```shell |
| curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ |
| -H "X-API-KEY: ${ADMIN_API_KEY}" \ |
| -d '{ |
| "id": "ai-proxy-multi-route", |
| "uri": "/anything", |
| "methods": ["POST"], |
| "plugins": { |
| "ai-proxy-multi": { |
| "providers": [ |
| { |
| "name": "openai", |
| "model": "gpt-4", |
| "weight": 1, |
| "priority": 1, |
| "auth": { |
| "header": { |
| "Authorization": "Bearer '"$OPENAI_API_KEY"'" |
| } |
| }, |
| "options": { |
| "max_tokens": 512, |
| "temperature": 1.0 |
| } |
| }, |
| { |
| "name": "deepseek", |
| "model": "deepseek-chat", |
| "weight": 1, |
| "auth": { |
| "header": { |
| "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'" |
| } |
| }, |
| "options": { |
| "max_tokens": 512, |
| "temperature": 1.0 |
| } |
| } |
| ] |
| } |
| }, |
| "upstream": { |
| "type": "roundrobin", |
| "nodes": { |
| "httpbin.org": 1 |
| } |
| } |
| }' |
| ``` |
| |
| In the above configuration, requests will be equally balanced among the `openai` and `deepseek` providers. |
| |
| ### Retry and fallback: |
| |
| The `priority` attribute can be adjusted to implement the fallback and retry feature. |
| |
| ```shell |
| curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ |
| -H "X-API-KEY: ${ADMIN_API_KEY}" \ |
| -d '{ |
| "id": "ai-proxy-multi-route", |
| "uri": "/anything", |
| "methods": ["POST"], |
| "plugins": { |
| "ai-proxy-multi": { |
| "providers": [ |
| { |
| "name": "openai", |
| "model": "gpt-4", |
| "weight": 1, |
| "priority": 1, |
| "auth": { |
| "header": { |
| "Authorization": "Bearer '"$OPENAI_API_KEY"'" |
| } |
| }, |
| "options": { |
| "max_tokens": 512, |
| "temperature": 1.0 |
| } |
| }, |
| { |
| "name": "deepseek", |
| "model": "deepseek-chat", |
| "weight": 1, |
| "priority": 0, |
| "auth": { |
| "header": { |
| "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'" |
| } |
| }, |
| "options": { |
| "max_tokens": 512, |
| "temperature": 1.0 |
| } |
| } |
| ] |
| } |
| }, |
| "upstream": { |
| "type": "roundrobin", |
| "nodes": { |
| "httpbin.org": 1 |
| } |
| } |
| }' |
| ``` |
| |
| In the above configuration `priority` for the deepseek provider is set to `0`. Which means if `openai` provider is unavailable then `ai-proxy-multi` plugin will retry sending request to `deepseek` in the second attempt. |
| |
| ### Send request to an OpenAI compatible LLM |
| |
| Create a route with the `ai-proxy-multi` plugin with `provider.name` set to `openai-compatible` and the endpoint of the model set to `provider.override.endpoint` like so: |
| |
| ```shell |
| curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ |
| -H "X-API-KEY: ${ADMIN_API_KEY}" \ |
| -d '{ |
| "id": "ai-proxy-multi-route", |
| "uri": "/anything", |
| "methods": ["POST"], |
| "plugins": { |
| "ai-proxy-multi": { |
| "providers": [ |
| { |
| "name": "openai-compatible", |
| "model": "qwen-plus", |
| "weight": 1, |
| "priority": 1, |
| "auth": { |
| "header": { |
| "Authorization": "Bearer '"$OPENAI_API_KEY"'" |
| } |
| }, |
| "override": { |
| "endpoint": "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions" |
| } |
| }, |
| { |
| "name": "deepseek", |
| "model": "deepseek-chat", |
| "weight": 1, |
| "auth": { |
| "header": { |
| "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'" |
| } |
| }, |
| "options": { |
| "max_tokens": 512, |
| "temperature": 1.0 |
| } |
| } |
| ], |
| "passthrough": false |
| } |
| }, |
| "upstream": { |
| "type": "roundrobin", |
| "nodes": { |
| "httpbin.org": 1 |
| } |
| } |
| }' |
| ``` |