AI Broker
A lightweight LLM request broker with an OpenAI-compatible API that intelligently routes requests to multiple LLM providers with cost-aware strategies.
Features
- OpenAI-Compatible API - Drop-in replacement for OpenAI clients
- Multi-Provider Support - OpenAI, OpenRouter, Groq, SambaNova
- Smart Routing - Automatic model selection based on cost or quality
- Cost Tracking - Per-request cost calculation and tracking
- Request Tracking - Detailed per-IP request tracking with timestamps and durations
- Streaming Support - Real-time streaming responses via SSE
- MCP Broker - Aggregate tools from multiple MCP (Model Context Protocol) servers
- Rate Limiting - Per-IP rate limiting with configurable limits
- Audio APIs - Text-to-speech and speech-to-text support (Groq, SambaNova, OpenAI)
- Config-Based Audio Models - STT/TTS models defined in modelsconfig.yml with automatic fallback
- Embeddings - Vector embedding generation
- 37 Chat Models - Latest Claude 4.x, Gemini 3, GPT-5.2, o3-mini, Grok 4.1, Kimi K2.5
- 2 Audio Models - Whisper STT (Groq/SambaNova/OpenAI), OpenAI TTS
- Persistent Billing - SQLite-based request logging for billing and analytics
- API Key Support - Optional API key authentication system
Project Structure (HERO Architecture)
aibroker/
├── crates/
│ ├── hero_aibroker_sdk/ # SDK library - shared types, protocols, RPC client
│ ├── llmbroker/ # Server library & binary (hero_aibroker_server)
│ ├── llmbroker_cli/ # CLI client (hero_aibroker binary)
│ ├── hero_aibroker_ui/ # Admin UI - Axum web dashboard
│ ├── hero_aibroker_rhai/ # Rhai scripting bindings
│ ├── mcp-common/ # Shared MCP utilities
│ ├── mcp-ping/ # MCP ping test server
│ ├── mcp-serpapi/ # SerpAPI search MCP server
│ ├── mcp-serper/ # Serper search MCP server
│ ├── mcp-exa/ # Exa search MCP server
│ ├── mcp-scraperapi/ # ScraperAPI MCP server
│ └── mcp-scrapfly/ # Scrapfly MCP server
├── modelsconfig.yml # Model definitions and pricing
└── mcp_servers.json # MCP server configuration
Dependency Graph
hero_aibroker_sdk (no internal dependencies)
↑ ↑ ↑ ↑
| | | |
server CLI UI rhai
Architecture Follows HERO Crate Standards:
- SDK: pure library with types, protocols, and client
- Server: binary exposing JSON-RPC/OpenAI API on port 8080
- CLI: command-line client for interactive use
- UI: admin dashboard (separate binary)
- Rhai: scripting integration for automation
Quick Start
Prerequisites
- Rust 1.70 or later
- At least one LLM provider API key
Environment Variables
This project requires API keys to be set as environment variables. Source your env file before running:
source ~/.config/env.sh # or wherever you keep your secrets
Required variables (at least one provider):
Optional variables:
- SAMBANOVA_API_KEY — SambaNova API key
- OPENAI_API_KEY — OpenAI API key
See .env.example for the full list of supported variables.
Run
source ~/.config/env.sh
make run
The server will start on http://127.0.0.1:3385 by default.
Configuration
All configuration is via environment variables (no .env files loaded by the application):
| Variable | Default | Description |
|---|---|---|
| HOST | 127.0.0.1 | Server bind address |
| PORT | 3385 | Server port |
| ROUTING_STRATEGY | cheapest | cheapest or best |
| MCP_CONFIG_PATH | — | Path to MCP server config JSON |
| ADMIN_TOKEN | — | Simple admin auth token |
| HERO_SECRET | — | Hero Auth JWT secret |
Both singular (GROQ_API_KEY) and plural (GROQ_API_KEYS) env var names are accepted. Use comma-separated values for multiple keys per provider.
Multiple API Keys
The broker supports multiple API keys per provider for load distribution, higher rate limits, and automatic failover. When multiple keys are configured, the broker creates separate provider instances (e.g., openai-0, openai-1) and distributes requests across them.
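The key-splitting behavior described above can be sketched roughly like this (a minimal illustration; the function name is hypothetical, and the instance-naming scheme is inferred from the openai-0, openai-1 example):

```python
def expand_provider_keys(provider: str, env_value: str) -> dict[str, str]:
    """Split a comma-separated API key list into one provider instance per key.

    Mirrors the behavior described above: a plural env var such as
    OPENAI_API_KEYS="k1,k2" yields instances "openai-0" and "openai-1".
    """
    keys = [k.strip() for k in env_value.split(",") if k.strip()]
    return {f"{provider}-{i}": key for i, key in enumerate(keys)}


instances = expand_provider_keys("openai", "sk-first, sk-second")
print(instances)  # {'openai-0': 'sk-first', 'openai-1': 'sk-second'}
```

Requests can then be round-robined or failed over across the resulting instances.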
API Reference
Chat Completions
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt4o",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
Text-to-Speech
Generate speech from text using OpenAI TTS models:
curl http://localhost:8080/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Hello, world!",
"voice": "alloy"
}' \
--output speech.mp3
Available TTS Models:
- tts-1 - Standard quality (OpenAI only)
- tts-1-hd - High definition (OpenAI only)
Note: TTS requires OPENAI_API_KEY to be set in your environment.
Speech-to-Text
Transcribe audio using Whisper models with automatic provider fallback:
curl http://localhost:8080/v1/audio/transcriptions \
-F "file=@audio.mp3" \
-F "model=whisper-1"
Available STT Models:
- whisper-1 - Standard Whisper model with multi-provider support
  - Priority 1: Groq (whisper-large-v3) - $0.111/hr
  - Priority 2: SambaNova (whisper-large-v3) - FREE
  - Priority 3: OpenAI (whisper-1) - $0.006/min
- whisper-large-v3 - Direct access to Whisper Large v3
  - Priority 1: Groq - $0.111/hr
  - Priority 2: SambaNova - FREE
The system automatically tries providers in priority order (cheapest first) with fallback support.
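The priority-ordered fallback can be illustrated with a small sketch (the backend tuples and transcribe callables here are hypothetical stand-ins; the real broker resolves backends from modelsconfig.yml):

```python
from typing import Callable

Backend = tuple[int, str, Callable[[bytes], str]]  # (priority, name, transcribe fn)

def transcribe_with_fallback(backends: list[Backend], audio: bytes) -> str:
    """Try each backend in ascending priority order; fall back on failure."""
    errors = []
    for _priority, name, call in sorted(backends, key=lambda b: b[0]):
        try:
            return call(audio)
        except Exception as exc:  # a real implementation would narrow this
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The cheapest healthy provider wins; a failing one simply passes the request down the list.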
Embeddings
curl http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "Hello, world!"
}'
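Once you have embedding vectors back, a common next step is comparing them. A minimal cosine-similarity helper (pure stdlib, not part of the broker itself):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Identical directions score 1.0, orthogonal vectors 0.0.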
List Models
curl http://localhost:8080/v1/models
Metrics
# Basic metrics
curl http://localhost:8080/metrics
# Detailed metrics with per-IP tracking
curl http://localhost:8080/metrics/detailed
The detailed metrics endpoint provides:
- Total requests and errors per IP
- First and last request timestamps
- Currently active (in-flight) requests
- Recent request history (last 10 per IP) including:
- Request start and finish timestamps
- Model used
- Request duration
- Success/error status
Billing & Usage
# View all IP usage and costs
curl http://localhost:8080/billing/usage
# View specific IP usage
curl http://localhost:8080/billing/usage/127.0.0.1
All requests are persisted to SQLite (requests.db) with:
- IP address and model used
- Token usage (input/output)
- Costs in USD (calculated per-request)
- Timestamps and duration
- Success/error status
Export billing data:
# Export to CSV
sqlite3 -header -csv requests.db "SELECT * FROM request_logs;" > billing.csv
# Query specific IP
sqlite3 requests.db "SELECT * FROM request_logs WHERE ip='X.X.X.X';"
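The same data can be aggregated programmatically. A sketch using Python's sqlite3 module (the ip column and request_logs table appear above, but the cost_usd column name is an assumption; check the actual schema first):

```python
import sqlite3

def cost_per_ip(conn: sqlite3.Connection) -> dict[str, float]:
    """Total USD cost per client IP from the request_logs table."""
    rows = conn.execute("SELECT ip, SUM(cost_usd) FROM request_logs GROUP BY ip")
    return {ip: total for ip, total in rows}

# Demo against an in-memory database shaped like the (assumed) schema:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_logs (ip TEXT, model TEXT, cost_usd REAL)")
conn.executemany(
    "INSERT INTO request_logs VALUES (?, ?, ?)",
    [("127.0.0.1", "gpt4o", 0.01), ("127.0.0.1", "gpt4o", 0.02), ("10.0.0.5", "gpt4o", 0.05)],
)
print(cost_per_ip(conn))
```

Against the live database, pass sqlite3.connect("requests.db") instead of the in-memory demo connection.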
MCP Tools
# List all available tools
curl http://localhost:8080/mcp/tools
# Call a specific tool
curl http://localhost:8080/mcp/tools/search \
-H "Content-Type: application/json" \
-d '{"query": "rust programming"}'
Client Examples
Python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed" # Key is configured on the server
)
response = client.chat.completions.create(
model="gpt4o",
messages=[
{"role": "user", "content": "What is the capital of France?"}
]
)
print(response.choices[0].message.content)
Streaming
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed"
)
stream = client.chat.completions.create(
model="gpt4o",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Running Services
Start the Server
# Release build (production) - TCP mode (default)
make run
# Debug build with logging
make rundev
# Or manually on TCP
cargo run --release --bin hero_aibroker_server -- --port 8080
# Or with Unix socket (creates ~/hero/var/sockets/hero_aibroker_server.sock)
SOCKET_PATH=~/hero/var/sockets/hero_aibroker_server.sock \
cargo run --release --bin hero_aibroker_server
Server Binding Modes:
- TCP (default): http://127.0.0.1:8080 - use when you want HTTP access
- Unix Socket: ~/hero/var/sockets/hero_aibroker_server.sock - use for local-only access via socket
Set SOCKET_PATH environment variable to enable Unix socket mode.
Exposed Endpoints:
- OpenAI-compatible API endpoints (/v1/*)
- JSON-RPC admin interface (/rpc)
- Health check (/health)
- Metrics (/metrics)
- OpenRPC specification (/openrpc.json)
Start the Admin UI
# Terminal 1: Start the server
cargo run --release --bin hero_aibroker_server
# Terminal 2: Start the UI (connects to server via HTTP)
BROKER_URL=http://localhost:8080 cargo run --release --bin hero_aibroker_ui
The admin UI will be available at http://127.0.0.1:3000 and provides:
- Chat interface
- Model management
- MCP tool integration
- Request/usage metrics
- Real-time logs
CLI Usage
# Interactive chat
cargo run --bin hero_aibroker -- chat --model gpt4o
# With global model option
cargo run --bin hero_aibroker -- --model deepseek-chat chat
# With custom retry attempts
cargo run --bin hero_aibroker -- --max-retries 5 chat
# List available models
cargo run --bin hero_aibroker -- models
# List MCP tools
cargo run --bin hero_aibroker -- tools
# Check server health
cargo run --bin hero_aibroker -- health
# Or use the make target
make cli
CLI Options
Global Options:
- -u, --url <URL> - LLM Broker server URL (default: http://localhost:8080)
- -m, --model <MODEL> - Model to use (can be overridden per command)
- --max-retries <MAX_RETRIES> - Maximum number of retries on failure (default: 3)
Chat Command:
- -m, --model <MODEL> - Model to use (overrides global --model)
- -s, --stream - Enable streaming (default: true)
The CLI automatically retries failed requests with exponential backoff, making it resilient to temporary network issues or server errors.
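The retry behavior can be sketched as follows (the delays are illustrative; the CLI's actual base delay and any jitter may differ):

```python
import time

def retry_with_backoff(fn, max_retries: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Each failed attempt doubles the wait, so transient network blips or brief server errors are absorbed without hammering the broker.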
Model Configuration
Models are configured in modelsconfig.yml:
models:
gpt4o:
display_name: "GPT-4o"
tier: premium
capabilities:
- tool_calling
- vision
context_window: 128000
backends:
- provider: openai
model_id: gpt-4o
priority: 1
input_cost: 2.5 # per million tokens
output_cost: 10.0
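Given the per-million-token pricing above, the per-request cost works out as in this small sketch (illustrative arithmetic, not the broker's actual code):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_cost: float, output_cost: float) -> float:
    """USD cost for one request, with prices quoted per million tokens."""
    return (input_tokens * input_cost + output_tokens * output_cost) / 1_000_000

# With gpt4o's pricing above (2.5 in / 10.0 out per million tokens),
# a request with 1,000 input and 500 output tokens costs:
print(request_cost(1_000, 500, 2.5, 10.0))  # 0.0075
```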
Auto Model Selection
Use special model names for automatic selection:
| Model Name | Description |
|---|---|
| auto | Use the configured routing strategy |
| autocheapest | Select the cheapest available model |
| autobest | Select the best premium model |
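A rough sketch of what autocheapest selection amounts to (the catalog dict mirrors the modelsconfig.yml backend shape shown above, but the real resolution logic lives in the registry layer):

```python
def pick_cheapest(catalog: dict[str, dict]) -> str:
    """Pick the model whose cheapest backend has the lowest combined price."""
    def best_backend_price(model: dict) -> float:
        return min(b["input_cost"] + b["output_cost"] for b in model["backends"])
    return min(catalog, key=lambda name: best_backend_price(catalog[name]))

catalog = {
    "gpt4o": {"backends": [{"input_cost": 2.5, "output_cost": 10.0}]},
    "small": {"backends": [{"input_cost": 0.1, "output_cost": 0.4}]},
}
print(pick_cheapest(catalog))  # small
```

autobest would instead rank by tier/quality rather than price.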
MCP Integration
The broker can aggregate tools from multiple MCP (Model Context Protocol) servers. Configure servers in mcp_servers.json:
{
"mcpServers": [
{
"name": "search",
"command": "cargo",
"args": ["run", "--bin", "mcp-serper"],
"transport": "stdio"
},
{
"name": "scraper",
"url": "http://localhost:3001/sse",
"transport": "sse"
}
]
}
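A small sketch of parsing and sanity-checking this config (the validation rules are inferred from the example above: stdio entries need a command, sse entries a url; the function name is hypothetical):

```python
import json

def load_mcp_servers(text: str) -> list[dict]:
    """Parse mcp_servers.json-style config and sanity-check each entry."""
    servers = json.loads(text)["mcpServers"]
    for s in servers:
        if s.get("transport") == "stdio" and "command" not in s:
            raise ValueError(f"stdio server {s.get('name')!r} needs a command")
        if s.get("transport") == "sse" and "url" not in s:
            raise ValueError(f"sse server {s.get('name')!r} needs a url")
    return servers

example = """
{ "mcpServers": [
    {"name": "search", "command": "cargo",
     "args": ["run", "--bin", "mcp-serper"], "transport": "stdio"},
    {"name": "scraper", "url": "http://localhost:3001/sse", "transport": "sse"}
]}
"""
print([s["name"] for s in load_mcp_servers(example)])  # ['search', 'scraper']
```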
MCP Endpoints
| Endpoint | Description |
|---|---|
| GET /mcp/tools | List all aggregated tools |
| POST /mcp/tools/:name | Call a specific tool |
| GET /mcp/sse | SSE endpoint for MCP clients |
Included MCP Servers
- mcp-serper - Web search via Serper API
- mcp-serpapi - Web search via SerpAPI
- mcp-exa - Semantic search via Exa
- mcp-scraperapi - Web scraping via ScraperAPI
- mcp-scrapfly - Web scraping via Scrapfly
- mcp-ping - Simple ping server for testing
Architecture
┌─────────────────────────────────────────────────────┐
│ API Layer │
│ (OpenAI-compatible endpoints: chat, tts, stt, etc) │
└─────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────┐
│ Service Layer │
│ (Routing logic, model selection, cost calculation) │
└─────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────┐
│ Registry Layer │
│ (Model catalog, backend resolution, pricing) │
└─────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────┐
│ Provider Layer │
│ (OpenAI, Groq, SambaNova, OpenRouter adapters) │
└─────────────────────────────────────────────────────┘
Development
Building
# Debug build
cargo build
# Release build
cargo build --release
# Build specific crate
cargo build -p llmbroker
cargo build -p llmbroker_cli
Running Tests
# Run all tests
cargo test
# Run tests for specific crate
cargo test -p llmbroker
Running Individual MCP Servers
# Run the Serper search server
SERPER_API_KEY=your-key cargo run --bin mcp-serper
# Run the ping test server
cargo run --bin mcp-ping
Documentation
Comprehensive documentation is available in the docs/ directory:
| Document | Description |
|---|---|
| Architecture | System architecture and design principles |
| Technical Specs | Requirements and specifications |
| Component Design | Detailed component documentation |
| API Reference | Complete API documentation |
| MCP Integration | MCP tool integration guide |
| Data Flow | Request/response data flows |
| Deployment Guide | Production deployment guide |
License
MIT License