title: ai-proxy-multi
keywords:

  • Apache APISIX
  • API Gateway
  • Plugin
  • ai-proxy-multi

description: This document contains information about the Apache APISIX ai-proxy-multi Plugin.

Description

The ai-proxy-multi plugin simplifies access to LLM providers and models by defining a standard request format that allows key fields in the plugin configuration to be embedded into the request.

This plugin extends the existing ai-proxy plugin with additional features such as load balancing, retries, and fallback.

Proxying requests to OpenAI and DeepSeek is supported, as is any OpenAI-compatible service via the openai-compatible provider. More LLM services will be supported in the future.

Request Format

OpenAI

  • Chat API
| Name | Type | Required | Description |
|------|------|----------|-------------|
| messages | Array | Yes | An array of message objects |
| messages.role | String | Yes | Role of the message (system, user, assistant) |
| messages.content | String | Yes | Content of the message |
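
For example, a request body conforming to this format might look like:

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 1+1?" }
  ]
}

Note that the target model and options such as max_tokens are taken from the plugin configuration rather than from the request body.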

Plugin Attributes

| Name | Required | Type | Description | Default |
|------|----------|------|-------------|---------|
| providers | Yes | array | List of AI providers, each following the provider schema. | |
| provider.name | Yes | string | Name of the AI service provider. Allowed values: openai, deepseek, openai-compatible. | |
| provider.model | Yes | string | Name of the AI model to execute. Example: gpt-4o. | |
| provider.priority | No | integer | Priority of the provider for load balancing. | 0 |
| provider.weight | No | integer | Load balancing weight. | |
| balancer.algorithm | No | string | Load balancing algorithm. Allowed values: chash, roundrobin. | roundrobin |
| balancer.hash_on | No | string | What to hash on for consistent hashing (vars, header, cookie, consumer, vars_combinations). | vars |
| balancer.key | No | string | Key for consistent hashing in dynamic load balancing. | |
| provider.auth | Yes | object | Authentication details, including headers and query parameters. | |
| provider.auth.header | No | object | Authentication details sent via headers. Header names must match ^[a-zA-Z0-9._-]+$. | |
| provider.auth.query | No | object | Authentication details sent via query parameters. Keys must match ^[a-zA-Z0-9._-]+$. | |
| provider.override.endpoint | No | string | Custom endpoint override for the AI provider, such as a full chat completions URL. | |
| timeout | No | integer | Request timeout in milliseconds (1-60000). | 30000 |
| keepalive | No | boolean | Enables keepalive connections. | true |
| keepalive_timeout | No | integer | Timeout for keepalive connections in milliseconds (minimum 1000). | 60000 |
| keepalive_pool | No | integer | Maximum number of keepalive connections. | 30 |
| ssl_verify | No | boolean | Enables SSL certificate verification. | true |
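
The balancer fields are not exercised in the examples below. As a sketch, a consistent-hashing setup that pins each client session to one provider might look like the following, where the X-Session-Id header name and the <...> key placeholders are purely illustrative:

"ai-proxy-multi": {
  "balancer": {
    "algorithm": "chash",
    "hash_on": "header",
    "key": "X-Session-Id"
  },
  "providers": [
    {
      "name": "openai",
      "model": "gpt-4",
      "weight": 1,
      "auth": { "header": { "Authorization": "Bearer <OPENAI_API_KEY>" } }
    },
    {
      "name": "deepseek",
      "model": "deepseek-chat",
      "weight": 1,
      "auth": { "header": { "Authorization": "Bearer <DEEPSEEK_API_KEY>" } }
    }
  ]
}

With this configuration, all requests carrying the same X-Session-Id value would be routed to the same provider.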

Example usage

Create a route with the ai-proxy-multi plugin like so:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-proxy-multi-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy-multi": {
        "providers": [
          {
            "name": "openai",
            "model": "gpt-4",
            "weight": 1,
            "priority": 1,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
                "max_tokens": 512,
                "temperature": 1.0
            }
          },
          {
            "name": "deepseek",
            "model": "deepseek-chat",
            "weight": 1,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
                "max_tokens": 512,
                "temperature": 1.0
            }
          }
        ]
      }
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "httpbin.org": 1
      }
    }
  }'

In the above configuration, requests are equally balanced between the openai and deepseek providers, since both have a weight of 1 and the default roundrobin algorithm is used.
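
To verify, send a POST request to the route with a chat payload; this assumes APISIX is listening on its default gateway port 9080:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

With equal weights and roundrobin balancing, successive requests should alternate between the gpt-4 and deepseek-chat models.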

Retry and fallback

The priority attribute can be adjusted to implement fallback and retries: providers with a higher priority are tried first, and lower-priority providers are used only when the higher-priority ones are unavailable.

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-proxy-multi-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy-multi": {
        "providers": [
          {
            "name": "openai",
            "model": "gpt-4",
            "weight": 1,
            "priority": 1,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
                "max_tokens": 512,
                "temperature": 1.0
            }
          },
          {
            "name": "deepseek",
            "model": "deepseek-chat",
            "weight": 1,
            "priority": 0,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
                "max_tokens": 512,
                "temperature": 1.0
            }
          }
        ]
      }
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "httpbin.org": 1
      }
    }
  }'

In the above configuration, the priority of the deepseek provider is set to 0 while openai has a priority of 1. This means that if the openai provider is unavailable, the ai-proxy-multi plugin will retry sending the request to deepseek on the second attempt.

Send request to an OpenAI-compatible LLM

Create a route with the ai-proxy-multi plugin, setting provider.name to openai-compatible and provider.override.endpoint to the endpoint of the model, like so:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-proxy-multi-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy-multi": {
        "providers": [
          {
            "name": "openai-compatible",
            "model": "qwen-plus",
            "weight": 1,
            "priority": 1,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "override": {
              "endpoint": "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
            }
          },
          {
            "name": "deepseek",
            "model": "deepseek-chat",
            "weight": 1,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
              }
            },
            "options": {
                "max_tokens": 512,
                "temperature": 1.0
            }
          }
        ],
        "passthrough": false
      }
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "httpbin.org": 1
      }
    }
  }'
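
As with the earlier routes, the setup can be verified with a chat request, again assuming the gateway listens on the default 9080 port:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

Since both providers have a weight of 1, responses should be served alternately by qwen-plus (through the overridden endpoint) and deepseek-chat.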