HeroEmbedder

A fast, local embedding server for RAG applications. Provides dense vector embeddings, similarity search, and reranking via a JSON-RPC 2.0 API with namespace support for isolated document collections.

Architecture

hero_embedder/
├── crates/
│   ├── hero_embedder_lib/         # Library: server internals (ML, storage, retrieval)
│   ├── hero_embedder_server/      # Binary: JSON-RPC daemon (Unix socket)
│   ├── hero_embedder_sdk/         # Library: JSON-RPC client and types
│   ├── hero_embedder/             # Binary: CLI using the SDK
│   ├── hero_embedder_ui/          # Binary: Axum web dashboard using the SDK
│   └── hero_embedder_examples/    # Examples: SDK usage demonstrations
├── scripts/                       # Build and deployment scripts
├── Cargo.toml                     # Workspace root
├── Makefile                       # Build orchestration
└── buildenv.sh                    # Environment configuration

Dependency Graph

Arrows point from a crate to the crate it depends on.

hero_embedder_lib (server internals)
        ↑
hero_embedder_server

hero_embedder_sdk (JSON-RPC client)
    ↑                  ↑                    ↑
hero_embedder    hero_embedder_ui    hero_embedder_examples

Deployment Modes

hero_embedder can be deployed in three modes controlled by flags on the service_embedder Nu script. Choose the mode that fits your setup.

All-in-one (default)

All three processes (hero_embedderd, hero_embedder_server, and hero_embedder_ui) run under the same user in a single hero_proc service (hero_embedder). This suits a standalone development machine or a single-user node where memory is not a concern.

service_embedder start           # register + start all three
service_embedder status          # query "hero_embedder" service
service_embedder stop            # stop + unregister

--embedderd — ONNX daemon only

Starts only hero_embedderd, the TCP daemon that loads all ONNX models, under the service hero_embedderd. Run it as root to share the loaded models across every tenant process on the host and to minimise total RAM usage.

service_embedder start --embedderd --root    # daemon only, root's hero_proc
service_embedder status --embedderd --root
service_embedder stop  --embedderd --root

After the daemon comes up the script:

  1. Polls http://127.0.0.1:<port>/health (up to 60 s) to confirm the models finished loading.
  2. Reads the node's mycelium IPv6 address via mycelium address --root and prints the external URL other mycelium peers can use through hero_router.

Note: hero_embedderd currently binds 127.0.0.1 only. For cross-machine mycelium access, configure hero_router to proxy the TCP port and use the printed URL as --embedderd-url on the client node.
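
The readiness poll from step 1 can be approximated from any client that needs to wait for the daemon. The following is a minimal sketch using only the standard library, not the script's actual implementation. The helper names (`poll_until`, `health_ok`, `is_http_200`) are hypothetical, and it assumes that /health answers with HTTP status 200 once the models are loaded.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `check` repeatedly until it returns true or `timeout` elapses.
/// Returns true if the check succeeded before the deadline.
fn poll_until<F: FnMut() -> bool>(mut check: F, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}

/// Returns true if the first line of an HTTP response carries status 200.
fn is_http_200(response: &str) -> bool {
    response
        .lines()
        .next()
        .map_or(false, |line| line.split_whitespace().nth(1) == Some("200"))
}

/// Sends a bare `GET /health` to 127.0.0.1:<port>.
/// Assumption: a 200 response means the daemon is ready.
fn health_ok(port: u16) -> bool {
    let Ok(mut stream) = TcpStream::connect(("127.0.0.1", port)) else {
        return false; // daemon not listening yet
    };
    let _ = stream.set_read_timeout(Some(Duration::from_secs(2)));
    let request = "GET /health HTTP/1.1\r\nHost: 127.0.0.1\r\nConnection: close\r\n\r\n";
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    let mut response = String::new();
    let _ = stream.read_to_string(&mut response);
    is_http_200(&response)
}
```

To mirror the 60 s limit described above, a caller could run `poll_until(|| health_ok(8092), Duration::from_secs(60), Duration::from_secs(1))` and treat a `false` result as a failed start.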

Environment variable set by the action:

Variable             Value
HERO_EMBEDDERD_PORT  TCP port (default 8092)

--userspace — server + UI only

Starts hero_embedder_server + hero_embedder_ui under service hero_embedder_userspace, delegating all embed/rerank work to an already-running hero_embedderd. No ONNX models are loaded in this process, so its memory footprint is a fraction of the full stack's.

# Same machine — delegates to root's embedderd on 127.0.0.1:8092
service_embedder start --userspace

# Cross-machine — embedderd reachable via mycelium through hero_router
service_embedder start --userspace \
    --embedderd-url http://[<mycelium_ipv6>]:8092

service_embedder status --userspace
service_embedder stop   --userspace

Environment variable set by the action:

Variable            Value
HERO_EMBEDDERD_URL  URL of the running hero_embedderd (default http://127.0.0.1:8092)
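
An IPv6 address must be wrapped in square brackets inside a URL, which is easy to get wrong when building the --embedderd-url value by hand. The sketch below shows one way to format it in Rust. `embedderd_url` is a hypothetical helper, not part of the SDK, and the address in the usage note is from the IPv6 documentation range rather than a real mycelium address.

```rust
use std::net::{IpAddr, SocketAddr};

/// Builds an `http://host:port` URL for the daemon.
/// `SocketAddr`'s `Display` impl wraps IPv6 addresses in square brackets
/// and leaves IPv4 addresses bare, which is what a URL requires.
fn embedderd_url(ip: IpAddr, port: u16) -> String {
    format!("http://{}", SocketAddr::new(ip, port))
}
```

For example, `embedderd_url("2001:db8::1".parse().unwrap(), 8092)` yields `http://[2001:db8::1]:8092`, while the loopback address `127.0.0.1` yields the documented default `http://127.0.0.1:8092`.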

Typical split-mode deployment on one host

Root layer (heavy, shared):
  service_embedder start --embedderd --root
    └─ hero_embedderd  binds 127.0.0.1:8092, loads all ONNX models

Userspace layer (lightweight, per-tenant):
  service_embedder start --userspace
    ├─ hero_embedder_server  Unix socket, delegates embed/rerank to root
    └─ hero_embedder_ui      Unix socket, admin dashboard

This pattern lets you run many tenant instances while paying the model-load cost only once.


Sockets

Service  Socket Path                                Type
Server   $HERO_SOCKET_DIR/hero_embedder/rpc.sock    Unix socket (OpenRPC / JSON-RPC 2.0)
UI       $HERO_SOCKET_DIR/hero_embedder/ui.sock     Unix socket (HTTP admin dashboard)
Daemon   TCP 127.0.0.1:8092 (configurable)          HTTP JSON-RPC + /health

All server/UI sockets are Unix sockets only. External access is provided by hero_proxy. The daemon TCP port is intended for loopback use; cross-node access goes through hero_router.

Features

  • Embedding Generation: BGE models (small/base) with INT8/FP32 options
  • Semantic Search: Fast cosine similarity search
  • Reranking: Cross-encoder model for improved accuracy
  • Namespaces: Isolated document collections for multi-tenant use
  • Persistence: Documents stored in redb databases
  • Web UI: Bootstrap-based admin dashboard with live updates
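
For readers unfamiliar with the semantic search feature, cosine similarity scores two embeddings by the cosine of the angle between them: dot(a, b) / (|a| * |b|). The sketch below is a brute-force illustration of that formula and of top-k ranking. It is not taken from hero_embedder_lib, whose internal search strategy is not described here, and the function names are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns None on length mismatch, empty input, or a zero-norm vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Scores every document against the query and returns the `k` best
/// (index, score) pairs, highest score first. Documents that cannot be
/// scored (wrong length, zero norm) are skipped.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .filter_map(|(i, doc)| cosine_similarity(query, doc).map(|s| (i, s)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Identical directions score 1, orthogonal vectors score 0, and opposite directions score -1, so a higher score means a closer match.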

Quick Start

# Full setup: install deps, download models, build, install
make setup

# Run server + UI
make run

# CLI health check
hero_embedder health

Quality Levels

Quality is set per namespace when creating it. All four model variants (bge-small and bge-base, each with INT8 and FP32 weights) are loaded at startup.

Level  Name      Model      Weights  Embeddings  Dimensions  Use Case
1      Fast      bge-small  INT8     INT8        384         Real-time, low latency
2      Balanced  bge-small  FP32     FP16        384         Default, good balance
3      Quality   bge-base   INT8     INT8        768         Better accuracy
4      Best      bge-base   FP32     FP16        768         Maximum quality
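
The table above can be encoded as a small lookup type, which also makes the storage cost per vector easy to estimate. This is an illustrative sketch: `QualityLevel` and its methods are hypothetical names, not the SDK's types, and `embedding_bytes` counts only the raw vector payload (1 byte per dimension for INT8, 2 for FP16), ignoring any database overhead.

```rust
/// The four quality levels from the table, keyed by their numeric level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QualityLevel {
    Fast = 1,
    Balanced = 2,
    Quality = 3,
    Best = 4,
}

impl QualityLevel {
    /// Maps the numeric level (1..=4) to a variant.
    fn from_level(level: u8) -> Option<Self> {
        match level {
            1 => Some(Self::Fast),
            2 => Some(Self::Balanced),
            3 => Some(Self::Quality),
            4 => Some(Self::Best),
            _ => None,
        }
    }

    /// Model family used at this level.
    fn model(self) -> &'static str {
        match self {
            Self::Fast | Self::Balanced => "bge-small",
            Self::Quality | Self::Best => "bge-base",
        }
    }

    /// Embedding dimensionality at this level.
    fn dimensions(self) -> usize {
        match self {
            Self::Fast | Self::Balanced => 384,
            Self::Quality | Self::Best => 768,
        }
    }

    /// Raw payload size of one stored embedding, in bytes.
    fn embedding_bytes(self) -> usize {
        let bytes_per_dim = match self {
            Self::Fast | Self::Quality => 1, // INT8 embeddings
            Self::Balanced | Self::Best => 2, // FP16 embeddings
        };
        self.dimensions() * bytes_per_dim
    }
}
```

By this estimate levels 2 and 3 both store 768 bytes per vector, level 1 stores 384, and level 4 stores 1536, so choosing between levels 2 and 3 trades model size against embedding precision rather than storage.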

API

JSON-RPC 2.0 endpoint at POST /rpc

Server Info

{"jsonrpc": "2.0", "id": 1, "method": "info", "params": []}
{"jsonrpc": "2.0", "id": 1, "method": "health", "params": []}

Embedding

{"jsonrpc": "2.0", "id": 1, "method": "embed", "params": [["hello world", "another text"]]}

Index Management

{"jsonrpc": "2.0", "id": 1, "method": "index.add", "params": [[{"id": "doc1", "text": "hello"}], "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.get", "params": ["doc1", "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.delete", "params": ["doc1", "namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.count", "params": ["namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "index.clear", "params": ["namespace"]}
{"jsonrpc": "2.0", "id": 1, "method": "search", "params": ["query text", 10, "namespace", true]}

Rerank

{"jsonrpc": "2.0", "id": 1, "method": "rerank", "params": ["query", [{"id": "1", "text": "..."}], 5]}

Namespaces

{"jsonrpc": "2.0", "id": 1, "method": "namespace.list", "params": []}
{"jsonrpc": "2.0", "id": 1, "method": "namespace.create", "params": ["my-docs", 2]}
{"jsonrpc": "2.0", "id": 1, "method": "namespace.delete", "params": ["my-docs"]}
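
The requests above can be sent without the SDK. The sketch below builds a request body in the same shape and posts it over the server's Unix socket using only the standard library. It assumes that rpc.sock speaks plain HTTP/1.1 and accepts POST /rpc, which is inferred from the endpoint description rather than confirmed. The helper names are hypothetical, and `params_json` must already be valid JSON because no escaping is performed.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

/// Builds a JSON-RPC 2.0 request body.
/// `params_json` is inserted verbatim and must already be valid JSON.
fn rpc_body(id: u64, method: &str, params_json: &str) -> String {
    format!(r#"{{"jsonrpc": "2.0", "id": {id}, "method": "{method}", "params": {params_json}}}"#)
}

/// Wraps a body in a minimal HTTP/1.1 `POST /rpc` request.
fn http_post(body: &str) -> String {
    format!(
        "POST /rpc HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
}

/// Sends the request over the Unix socket at `socket` and returns the raw
/// HTTP response (status line, headers, and body) as a string.
/// Assumption: the server closes the connection after responding.
fn call(socket: &str, body: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(http_post(body).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

A health check would then be `call(&socket_path, &rpc_body(1, "health", "[]"))`, where `socket_path` points at rpc.sock. For anything beyond a quick probe, the SDK client is the better choice because it handles serialization and response parsing.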

CLI Client

hero_embedder health
hero_embedder stats
hero_embedder embed "hello world"
hero_embedder search "query" -k 10
hero_embedder add doc1 "document text"
hero_embedder ns-list
hero_embedder ns-create my-docs

SDK Usage (Rust)

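use hero_embedder_sdk::HeroEmbedderClient;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Honor HERO_SOCKET_DIR if set; otherwise fall back to the documented default.
    let socket_dir = match std::env::var("HERO_SOCKET_DIR") {
        Ok(dir) => dir,
        Err(_) => format!("{}/hero/var/sockets", std::env::var("HOME")?),
    };
    let socket = format!("{socket_dir}/hero_embedder/rpc.sock");
    let client = HeroEmbedderClient::new(format!("unix://{socket}"));

    // Top-10 search for "hello"; both optional arguments are left unset.
    let _results = client.search("hello", 10, None, None).await?;
    Ok(())
}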

Environment Variables

Variable             Default                     Description
EMBEDDER_MODELS      ~/hero/var/embedder/models  Models directory
EMBEDDER_DATA        ~/hero/var/embedder/data    Data directory
HERO_EMBEDDERD_PORT  8092                        TCP port hero_embedderd listens on
HERO_EMBEDDERD_URL   http://127.0.0.1:8092       URL hero_embedder_server uses to reach the daemon
HERO_SOCKET_DIR      ~/hero/var/sockets          Base directory for Unix sockets

Data Storage

~/hero/var/embedder/
├── models/
│   ├── bge-small/
│   ├── bge-base/
│   └── bge-reranker-base/
└── data/
    ├── default/
    │   └── q2/
    │       └── rag.redb
    └── corpus.redb

Building

make build          # Release build
make check          # Fast code check
make test           # Unit tests
make lint           # Clippy linter
make run            # Full stack (server + UI)
make run-server     # Server only
make run-ui         # UI only
make stop           # Stop all services

License

MIT