Connect to your Burr agent using your favorite open-source tooling! This includes:
The OpenAI API allows you to send HTTP requests to large language models (LLMs) such as GPT-4. When interacting with ChatGPT, we're using the API endpoint v1/chat/completions, but there are many others:
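For concreteness, here is what a (non-streaming) `v1/chat/completions` request body looks like — a model name plus a list of role-tagged messages. This sketch only builds the payload; the URL is shown for illustration and nothing is sent:

```python
import json

# Request body for the chat-completions endpoint: a model name and a
# list of role-tagged messages. (URL shown for illustration only.)
url = "https://api.openai.com/v1/chat/completions"

payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

# The response comes back with the assistant's reply under `choices`:
# {"choices": [{"message": {"role": "assistant", "content": "..."}}], ...}
body = json.dumps(payload)
```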
- `v1/embeddings` gets an embedding, i.e., a vector representation of the input text
- `v1/audio/transcriptions` uses the Whisper model to convert audio to text
- `v1/audio/speech` converts text to audio
- `v1/images/generations` uses DALL-E to generate images from a text prompt

Other LLM providers (e.g., Cohere, HuggingFace) have their own sets of endpoints. But given OpenAI's influence, many open-source tools include an "OpenAI API-compatible" mode. By creating a server that implements these endpoints and respects their request and response formats, we can interface with those tools directly!
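"OpenAI API-compatible" means a client only needs its base URL swapped to talk to your server instead of api.openai.com. A minimal sketch using only the standard library (the host, port, and model name are assumptions — match them to your server):

```python
import json
import urllib.request

# Point an OpenAI-style request at a local compatible server instead of
# api.openai.com. Host/port are assumptions; use your server's address.
base_url = "http://localhost:8000/v1"

req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps({
        "model": "my-burr-agent",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hi!"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it once the server is running.
```

The official `openai` Python client works the same way: pass `base_url` when constructing the client and every call is routed to your server.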
This example contains a very simple Burr application (application.py) and a FastAPI server to deploy this agent behind the OpenAI v1/chat/completions endpoint. After starting the server with server.py, you should be able to interact with it from your other tools (Jan is easy and quick to install across platforms).
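The key job of such a server is to wrap the agent's reply in the response shape OpenAI-compatible clients expect. A minimal sketch of that shaping step (the function name and default model string are illustrative, not part of the example code):

```python
import time
import uuid

def make_chat_completion_response(content: str, model: str = "burr-agent") -> dict:
    """Wrap an agent's reply in the OpenAI non-streaming
    chat-completion response shape, so any OpenAI-compatible
    client can parse it."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": content},
                "finish_reason": "stop",
            }
        ],
    }
```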
To run, execute:
python server.py
If you're using Jan, untoggle the Stream parameter (we will add an example of a stream-compatible application later).
This is great because you can quickly integrate your Burr agent with high-quality UIs and tools while simultaneously gaining Burr's observability, logging, and persistence across your applications.
Most tools save state "in the frontend" because they don't have access to the official OpenAI backend. This means each of your LLM applications is isolated. By using Burr and persisting state (e.g., chat history) on the backend, you can share and resume a conversation between your LLM frontend, CLI, or coding assistant!
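The idea of backend-persisted conversations can be sketched in a few lines: any client that sends the same conversation id gets the same history back. Burr's own persisters play this role; the JSON-file store below is a hypothetical stand-in to show the shape of the idea:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical backend store for chat histories, keyed by conversation id.
STORE = Path(tempfile.gettempdir()) / "conversations"

def save_history(conversation_id: str, messages: list) -> None:
    STORE.mkdir(parents=True, exist_ok=True)
    (STORE / f"{conversation_id}.json").write_text(json.dumps(messages))

def load_history(conversation_id: str) -> list:
    # An unknown id yields an empty history, i.e., a fresh conversation.
    path = STORE / f"{conversation_id}.json"
    return json.loads(path.read_text()) if path.exists() else []
```

Because the history lives server-side, a chat started in one UI can be resumed from a CLI simply by reusing the conversation id.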
There are multiple implementations of OpenAI API-compatible servers. Here are some notable examples: