Connect to your Burr agent using your favorite open-source tooling! This includes:
The OpenAI API allows you to send HTTP requests to large language models (LLMs) such as GPT-4. When interacting with ChatGPT, we're using the API endpoint v1/chat/completions, but there are many others:
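For concreteness, here is what a (non-streaming) `v1/chat/completions` request body looks like — a model name plus a list of role-tagged messages. This sketch only builds the payload; the URL is shown for illustration and nothing is sent:

```python
import json

# Request body for the chat-completions endpoint: a model name and a
# list of role-tagged messages. (URL shown for illustration only.)
url = "https://api.openai.com/v1/chat/completions"

payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

# The response comes back with the assistant's reply under `choices`:
# {"choices": [{"message": {"role": "assistant", "content": "..."}}], ...}
body = json.dumps(payload)
```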
- `v1/embeddings` gets an embedding, i.e., a vector representation of the input text
- `v1/audio/transcriptions` uses the Whisper model to convert audio to text
- `v1/audio/speech` converts text to audio
- `v1/images/generations` uses DALL-E to generate images from a text prompt

Other LLM providers (e.g., Cohere, HuggingFace) have their own sets of endpoints. But given OpenAI's influence, many open-source tools include an "OpenAI API-compatible" mode. By creating a server that implements these endpoints and respects their request and response formats, we can interface with those tools directly!
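"OpenAI API-compatible" means a client only needs its base URL swapped to talk to your server instead of api.openai.com. A minimal sketch using only the standard library (the host, port, and model name are assumptions — match them to your server):

```python
import json
import urllib.request

# Point an OpenAI-style request at a local compatible server instead of
# api.openai.com. Host/port are assumptions; use your server's address.
base_url = "http://localhost:8000/v1"

req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps({
        "model": "my-burr-agent",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hi!"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it once the server is running.
```

The official `openai` Python client works the same way: pass `base_url` when constructing the client and every call is routed to your server.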
This example contains a very simple Burr application (application.py) and a FastAPI server to deploy this agent behind the OpenAI v1/chat/completions endpoint. After starting the server with server.py, you should be able to interact with it from your other tools (Jan is easy and quick to install across platforms).
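The key job of such a server is to wrap the agent's reply in the response shape OpenAI-compatible clients expect. A minimal sketch of that shaping step (the function name and default model string are illustrative, not part of the example code):

```python
import time
import uuid

def make_chat_completion_response(content: str, model: str = "burr-agent") -> dict:
    """Wrap an agent's reply in the OpenAI non-streaming
    chat-completion response shape, so any OpenAI-compatible
    client can parse it."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": content},
                "finish_reason": "stop",
            }
        ],
    }
```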
To run, execute:
python server.py
If you're using Jan, untoggle the Stream parameter (we will add an example of a stream-compatible application later).
This is great because you can quickly integrate your Burr agent with high-quality UIs and tools while simultaneously gaining Burr's observability, logging, and persistence across your applications.
Most tools save state "in the frontend" because they don't have access to the official OpenAI backend. This means each of your LLM applications is isolated. By using Burr and persisting state (e.g., chat history) on the backend, you can share and resume a conversation between your LLM frontend, CLI, or coding assistant!
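The idea of backend-persisted conversations can be sketched in a few lines: any client that sends the same conversation id gets the same history back. Burr's own persisters play this role; the JSON-file store below is a hypothetical stand-in to show the shape of the idea:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical backend store for chat histories, keyed by conversation id.
STORE = Path(tempfile.gettempdir()) / "conversations"

def save_history(conversation_id: str, messages: list) -> None:
    STORE.mkdir(parents=True, exist_ok=True)
    (STORE / f"{conversation_id}.json").write_text(json.dumps(messages))

def load_history(conversation_id: str) -> list:
    # An unknown id yields an empty history, i.e., a fresh conversation.
    path = STORE / f"{conversation_id}.json"
    return json.loads(path.read_text()) if path.exists() else []
```

Because the history lives server-side, a chat started in one UI can be resumed from a CLI simply by reusing the conversation id.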
There are multiple implementations of OpenAI API-compatible servers. Here are some notable examples: