AI Broker
A lightweight LLM request broker that exposes an OpenAI-compatible API and routes each request to one of several LLM providers using cost-aware strategies.
Features
- OpenAI-Compatible API - Drop-in replacement for OpenAI clients
- Multi-Provider Support - OpenAI, OpenRouter, Groq, SambaNova
- Smart Routing - Automatic model selection based on cost or quality
- Cost Tracking - Per-request cost calculation and tracking
- Streaming Support - Real-time streaming responses via SSE
- MCP Broker - Aggregate tools from multiple MCP (Model Context Protocol) servers
- Rate Limiting - Per-IP rate limiting with configurable limits
- Audio APIs - Text-to-speech and speech-to-text support
- Embeddings - Vector embedding generation
Project Structure
aibroker/
├── crates/
│   ├── llmbroker/       # Main server library and binary
│   ├── llmbroker-cli/   # Command-line interface
│   ├── mcp-common/      # Shared MCP utilities
│   ├── mcp-ping/        # MCP ping test server
│   ├── mcp-serpapi/     # SerpAPI search MCP server
│   ├── mcp-serper/      # Serper search MCP server
│   ├── mcp-exa/         # Exa search MCP server
│   ├── mcp-scraperapi/  # ScraperAPI MCP server
│   └── mcp-scrapfly/    # Scrapfly MCP server
├── modelsconfig.yml     # Model definitions and pricing
└── mcp_servers.json     # MCP server configuration
Quick Start
Prerequisites
- Rust 1.70 or later
- At least one LLM provider API key
Installation
# Clone the repository
git clone https://github.com/your-org/aibroker
cd aibroker
# Copy example configuration
cp .env.example .env
# Edit .env with your API keys
# Build and run
cargo run --release
The server will start on http://127.0.0.1:8080 by default.
Configuration
Configure the broker using environment variables or a .env file:
# Server settings
HOST=127.0.0.1
PORT=8080
# LLM Provider API Keys
OPENROUTER_API_KEY=sk-or-v1-...
GROQ_API_KEY=gsk_...
SAMBANOVA_API_KEY=...
# Routing strategy: "cheapest" or "best"
ROUTING_STRATEGY=cheapest
# MCP Configuration (optional)
MCP_CONFIG_PATH=mcp_servers.json
# MCP Search Tool API Keys (optional)
SERPER_API_KEY=...
SERPAPI_API_KEY=...
EXA_API_KEY=...
SCRAPERAPI_API_KEY=...
SCRAPFLY_API_KEY=...
API Reference
Chat Completions
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt4o",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
Text-to-Speech
curl http://localhost:8080/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Hello, world!",
"voice": "alloy"
}' \
--output speech.mp3
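The same call works through the OpenAI Python SDK. A minimal sketch, assuming the broker forwards the standard speech parameters shown above (the response body is the raw audio bytes):
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, world!",
)
# The SDK returns the binary audio payload; write it to disk
with open("speech.mp3", "wb") as f:
    f.write(speech.content)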
Speech-to-Text
curl http://localhost:8080/v1/audio/transcriptions \
-F "file=@audio.mp3" \
-F "model=whisper-1"
Embeddings
curl http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "Hello, world!"
}'
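The embeddings endpoint also works through the Python SDK; a minimal sketch:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello, world!",
)
vector = response.data[0].embedding  # list of floats
print(len(vector))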
List Models
curl http://localhost:8080/v1/models
MCP Tools
# List all available tools
curl http://localhost:8080/mcp/tools
# Call a specific tool
curl http://localhost:8080/mcp/tools/search \
-H "Content-Type: application/json" \
-d '{"query": "rust programming"}'
Client Examples
Python
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # Key is configured on the server
)

response = client.chat.completions.create(
    model="gpt4o",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
Streaming
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

stream = client.chat.completions.create(
    model="gpt4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
CLI Usage
# Interactive chat
cargo run --bin llmbroker-cli -- chat --model gpt4o
# List available models
cargo run --bin llmbroker-cli -- models
# List MCP tools
cargo run --bin llmbroker-cli -- tools
# Check server health
cargo run --bin llmbroker-cli -- health
Model Configuration
Models are configured in modelsconfig.yml:
models:
  gpt4o:
    display_name: "GPT-4o"
    tier: premium
    capabilities:
      - tool_calling
      - vision
    context_window: 128000
    backends:
      - provider: openai
        model_id: gpt-4o
        priority: 1
        input_cost: 2.5    # per million tokens
        output_cost: 10.0
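The input_cost and output_cost fields are USD per million tokens, so a request's cost follows directly from its usage counts. A minimal sketch in Python (the prices mirror the gpt4o backend above; the token counts are hypothetical):
# Hypothetical usage for one request against the gpt4o backend above
input_cost = 2.5    # USD per 1M input tokens
output_cost = 10.0  # USD per 1M output tokens
prompt_tokens = 1_200
completion_tokens = 350

cost = (prompt_tokens / 1_000_000) * input_cost \
     + (completion_tokens / 1_000_000) * output_cost
print(f"${cost:.6f}")  # -> $0.006500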
Auto Model Selection
Use special model names for automatic selection:
| Model Name | Description |
|---|---|
| `auto` | Use the configured routing strategy |
| `autocheapest` | Select the cheapest available model |
| `autobest` | Select the best premium model |
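These names work anywhere a regular model name does. A short sketch with the Python client from the Client Examples section (assuming, as is typical for OpenAI-compatible servers, that the response's model field reports the model the broker actually selected):
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# "autocheapest" defers model choice to the broker's cost-based routing
response = client.chat.completions.create(
    model="autocheapest",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.model)  # the model the broker routed to
print(response.choices[0].message.content)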
MCP Integration
The broker can aggregate tools from multiple MCP (Model Context Protocol) servers. Configure servers in mcp_servers.json:
{
  "mcpServers": [
    {
      "name": "search",
      "command": "cargo",
      "args": ["run", "--bin", "mcp-serper"],
      "transport": "stdio"
    },
    {
      "name": "scraper",
      "url": "http://localhost:3001/sse",
      "transport": "sse"
    }
  ]
}
MCP Endpoints
| Endpoint | Description |
|---|---|
| `GET /mcp/tools` | List all aggregated tools |
| `POST /mcp/tools/:name` | Call a specific tool |
| `GET /mcp/sse` | SSE endpoint for MCP clients |
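A sketch of the same calls from Python using the requests library (assuming both endpoints return JSON; the exact payload shape depends on the tool):
import requests

BASE = "http://localhost:8080"

# List every tool aggregated from the configured MCP servers
print(requests.get(f"{BASE}/mcp/tools").json())

# Call the search tool from the curl example above
result = requests.post(
    f"{BASE}/mcp/tools/search",
    json={"query": "rust programming"},
)
print(result.json())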
Included MCP Servers
- mcp-serper - Web search via Serper API
- mcp-serpapi - Web search via SerpAPI
- mcp-exa - Semantic search via Exa
- mcp-scraperapi - Web scraping via ScraperAPI
- mcp-scrapfly - Web scraping via Scrapfly
- mcp-ping - Simple ping server for testing
Architecture
┌─────────────────────────────────────────────────────┐
│                      API Layer                      │
│ (OpenAI-compatible endpoints: chat, tts, stt, etc)  │
└─────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────┐
│                    Service Layer                    │
│ (Routing logic, model selection, cost calculation)  │
└─────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────┐
│                   Registry Layer                    │
│    (Model catalog, backend resolution, pricing)     │
└─────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────┐
│                   Provider Layer                    │
│   (OpenAI, Groq, SambaNova, OpenRouter adapters)    │
└─────────────────────────────────────────────────────┘
Development
Building
# Debug build
cargo build
# Release build
cargo build --release
# Build specific crate
cargo build -p llmbroker
cargo build -p llmbroker-cli
Running Tests
# Run all tests
cargo test
# Run tests for specific crate
cargo test -p llmbroker
Running Individual MCP Servers
# Run the Serper search server
SERPER_API_KEY=your-key cargo run --bin mcp-serper
# Run the ping test server
cargo run --bin mcp-ping
Documentation
Comprehensive documentation is available in the docs/ directory:
| Document | Description |
|---|---|
| Architecture | System architecture and design principles |
| Technical Specs | Requirements and specifications |
| Component Design | Detailed component documentation |
| API Reference | Complete API documentation |
| MCP Integration | MCP tool integration guide |
| Data Flow | Request/response data flows |
| Deployment Guide | Production deployment guide |
License
MIT License