hero_runner_py

A pre-fork Python script execution service for the Hero stack. It manages isolated Python virtual environments (via uv) and executes scripts with live log streaming, persistent session tracking, and timeout enforcement — all with zero interpreter startup overhead thanks to a pre-forked worker pool.


What it does

hero_runner_py lets callers submit a Python script (inline code or a file path) and get back:

  • Live stdout/stderr streamed as Server-Sent Events while the script runs
  • A final result (exit code, stdout, stderr, success flag) stored persistently
  • Session history browsable via the JSON-RPC API or the admin UI
  • Timeout enforcement — scripts are hard-killed after timeout_ms milliseconds, even across a POSIX fork boundary

All of this is delivered over a Unix Domain Socket using JSON-RPC 2.0.


Architecture

                ┌──────────────────────────────────────┐
                │            parent process            │
                │  (multi-threaded tokio/axum server)  │
                │                                      │
                │  JSON-RPC 2.0 over UDS               │
                │  SessionManager   RuntimeManager     │
                │  SSE fan-out      hero_log persist   │
                └──────────────────┬───────────────────┘
                                   │  UnixStream socket pair
                    ┌──────────────▼───────────────┐
                    │        worker process        │  ← pre-forked, single-threaded
                    │  (× HERO_RUNNER_PY_WORKERS)  │
                    └──────────────┬───────────────┘
                                   │  fork per job
                    ┌──────────────▼───────────────┐
                    │      grandchild process      │  ← isolated per script
                    │  runner::execute()           │
                    │  select(2) I/O loop          │
                    │  SIGALRM timeout handler     │
                    └──────────────┬───────────────┘
                                   │  spawn
                    ┌──────────────▼───────────────┐
                    │        Python process        │  ← own process group (setpgid)
                    │  uv venv interpreter         │
                    └──────────────────────────────┘

Three-level fork hierarchy

| Level | Process | Responsibility |
|---|---|---|
| Parent | tokio/axum | Accept connections, route RPC, manage sessions, fan out SSE |
| Worker | single-threaded | Hold a socket pair; fork a grandchild per job; forward frames |
| Grandchild | single-threaded | Run runner::execute; stream WorkerFrame records back |

IPC protocol

All pipes and UnixStream channels use length-prefixed JSON:

[4 bytes: body length as u32 little-endian][JSON body bytes]

The grandchild emits a sequence of WorkerFrame records per job:

Started  { pid: i32 }             ← first frame
Log      { line: LogLine }        ← zero or more, as stdout/stderr arrives
Result   { result: ScriptResult } ← final, terminal frame
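
For reference, a minimal sketch of the framing in Rust. The canonical implementation is write_msg / read_msg in ipc.rs; the signatures below are illustrative, not the actual API:

use std::io::{Read, Result, Write};

// Write one message: 4-byte little-endian length prefix, then the JSON body.
fn write_msg<W: Write>(w: &mut W, body: &[u8]) -> Result<()> {
    w.write_all(&(body.len() as u32).to_le_bytes())?;
    w.write_all(body)?;
    w.flush()
}

// Read one message: the 4-byte length header first, then exactly that many body bytes.
fn read_msg<R: Read>(r: &mut R) -> Result<Vec<u8>> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut body = vec![0u8; u32::from_le_bytes(len) as usize];
    r.read_exact(&mut body)?;
    Ok(body)
}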

Runtime layout

$HERO_PYTHON_RUNTIMES_DIR/
  <name>/
    runtime.toml   ← metadata (name, python_version, modules, paths, timestamps,
                               installing, operational, smoke_session_id)
    bin/python     ← uv-created venv interpreter
    lib/           ← installed packages
    ...

Each runtime is a complete uv virtual environment created in its own named directory.

The runtime.toml tracks three lifecycle flags:

| Field | Meaning |
|---|---|
| installing | true while uv is downloading Python / creating the venv / installing packages |
| operational | true after a successful smoke test (print('hello world')) |
| smoke_session_id | Session ID of a failed smoke test, kept for log inspection |
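
As a concrete illustration, the metadata could be modelled with a serde struct like the one below. This is a hypothetical mirror of the fields listed above and in RuntimeInfo; the authoritative definition lives in runtime.rs:

use serde::{Deserialize, Serialize};

// Hypothetical shape of runtime.toml; timestamp formats are assumed, not verified.
#[derive(Debug, Serialize, Deserialize)]
struct RuntimeToml {
    name: String,
    python_version: String,
    modules: Vec<String>,
    venv_path: String,
    created_at: String,
    updated_at: String,
    installing: bool,              // true while uv is still setting the venv up
    operational: bool,             // true after a passing smoke test
    smoke_session_id: Option<u64>, // set only when a smoke test failed
}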

Socket layout

$HERO_SOCKET_DIR/hero_runner_py/
  rpc.sock       ← JSON-RPC 2.0 endpoint
  discovery.sock ← service discovery (name, version)

Crates

| Crate | Binary | Purpose |
|---|---|---|
| hero_runner_py_server | hero_runner_py_server | Core service: worker pool, JSON-RPC, SSE, runtime management |
| hero_runner_py_admin | hero_runner_py_admin | Web UI proxy that serves a dashboard over HTTP |
| hero_runner_py_tests | (none) | Integration test suite |

Source map

hero_runner_py_server/src/

| File | Purpose |
|---|---|
| main.rs | Entry point: uv check → pool fork → tokio runtime → default runtime init → smoke test → axum serve |
| lib.rs | Crate root, module declarations, public re-exports |
| types.rs | ScriptRequest, ScriptResult, WorkerFrame, LogLine, LogKind, ScriptKind |
| runner.rs | Fork-safe Python subprocess execution (select(2) loop, SIGALRM, process-group kill) |
| worker.rs | Worker process loop: receive requests, fork grandchildren, forward frames |
| pool.rs | WorkerPool: manages worker sockets, dispatches jobs, collects results |
| session.rs | SessionManager: session lifecycle, hero_db IDs, tokio broadcast for SSE, delete support |
| ipc.rs | write_msg / read_msg: length-prefixed JSON framing |
| runtime.rs | RuntimeManager: stateless reader/writer of HERO_PYTHON_RUNTIMES_DIR; tracks installing/operational state |
| uv.rs | uv wrappers: create_venv, pip_install, ensure_python, python_interpreter |
| openrpc.rs | AppState, rpc_router, all JSON-RPC method handlers |
| sockets.rs | UDS helpers: socket_dir, service_socket_dir, socket_path, bind_unix_socket |
| sse.rs | GET /sse/session/{id}: SSE fan-out from the SessionManager broadcast channel |
| proc_log.rs | Forwards LogLine records to hero_proc / hero_log |
| discovery.rs | GET /discovery: service metadata endpoint |
| assets.rs | rust-embed wrapper for compiled-in assets |

Environment variables

| Variable | Default | Effect |
|---|---|---|
| HERO_RUNNER_PY_WORKERS | 4 | Number of worker processes to pre-fork |
| HERO_RUNNER_PY_FORWARD_LOGS | false | Forward each log line to hero_proc (set 1/true/yes/on) |
| HERO_SOCKET_DIR | ~/hero/var/sockets | Root directory for all Hero UDS sockets |
| HERO_PYTHON_RUNTIMES_DIR | ~/hero/python_runtimes | Where Python venvs are stored |
| UV_PATH | (searched in PATH) | Override path to the uv binary |

Building and running

# Build everything
cargo build --release

# Run the server (uv must be in PATH or UV_PATH set)
./target/release/hero_runner_py_server

# Optional: run the admin web UI (connects to the server via UDS)
./target/release/hero_runner_py_admin

On first startup the server automatically creates the default runtime (Python 3.12 with a standard set of packages) if it does not already exist, then runs a smoke test to confirm the runtime is operational. The smoke test session is cleaned up automatically on success; on failure it is kept so you can inspect its logs.


Quick start

Check health

echo '{"jsonrpc":"2.0","id":1,"method":"health","params":{}}' \
  | nc -U ~/hero/var/sockets/hero_runner_py/rpc.sock
{"jsonrpc":"2.0","id":1,"result":{"status":"ok","service":"hero_runner_py","version":"0.1.0"}}

Create a Python runtime

echo '{"jsonrpc":"2.0","id":1,"method":"runtime_init","params":{"name":"default","python_version":"3.12","modules":["requests","numpy"]}}' \
  | nc -U ~/hero/var/sockets/hero_runner_py/rpc.sock

This installs Python 3.12 (via uv python install) and creates a venv with requests and numpy. The runtime.toml is written immediately with installing: true so callers can see progress; installing is cleared to false once all packages are installed.

Run a script

echo '{"jsonrpc":"2.0","id":2,"method":"session_start","params":{"script":"print(\"hello from Python!\")","runtime_name":"default","timeout_ms":5000}}' \
  | nc -U ~/hero/var/sockets/hero_runner_py/rpc.sock
{"jsonrpc":"2.0","id":2,"result":{"session_id":1}}

If the runtime is still installing, session_start fails immediately with:

Runtime 'default' is still being installed. Please wait for it to finish.

Stream live output

curl --unix-socket ~/hero/var/sockets/hero_runner_py/rpc.sock \
  http://localhost/sse/session/1
data: {"kind":"log","line":{"session_id":1,"timestamp_ms":1715000000000,"seq":1,"kind":"stdout","line":"hello from Python!"}}

data: {"kind":"result","result":{"success":true,"exit_code":0,"stdout":"hello from Python!","stderr":"","error":null}}

Get the final result

echo '{"jsonrpc":"2.0","id":3,"method":"session_result","params":{"session_id":1}}' \
  | nc -U ~/hero/var/sockets/hero_runner_py/rpc.sock
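
The same calls can be made from code. A minimal Rust client sketch, assuming (as the nc examples above suggest) that the socket accepts one newline-terminated JSON-RPC request and answers with a single response line; rpc_call is an illustrative helper, not part of the service:

use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

// Hypothetical helper: one request line out, one response line back.
fn rpc_call(sock: &str, request: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(sock)?;
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    Ok(line)
}

fn main() -> std::io::Result<()> {
    let sock = format!(
        "{}/hero/var/sockets/hero_runner_py/rpc.sock",
        std::env::var("HOME").expect("HOME not set")
    );
    let req = r#"{"jsonrpc":"2.0","id":1,"method":"session_result","params":{"session_id":1}}"#;
    println!("{}", rpc_call(&sock, req)?);
    Ok(())
}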

JSON-RPC API

All requests follow JSON-RPC 2.0 over the UDS at $HERO_SOCKET_DIR/hero_runner_py/rpc.sock.

health

Liveness probe.

{"jsonrpc":"2.0","id":1,"method":"health","params":{}}

Response: {"status":"ok","service":"hero_runner_py","version":"0.1.0"}


session_start

Start a Python script execution. Returns a session_id immediately; execution runs asynchronously.

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| script | string | yes | | Python source code or a file path |
| script_kind | "code" or "file" | no | "code" | Inline code or a path to a .py file |
| runtime_name | string | no | "default" | Which runtime venv to use |
| working_dir | string | no | "." | Working directory for the Python process |
| env_vars | object | no | {} | Extra environment variables |
| timeout_ms | integer | no | 0 (no limit) | Hard-kill timeout in milliseconds |
| forward_logs | boolean | no | server default | Override per-session log forwarding |

{
  "jsonrpc":"2.0","id":1,"method":"session_start",
  "params":{
    "script": "import time\nfor i in range(5):\n    print(i)\n    time.sleep(0.1)",
    "runtime_name": "default",
    "timeout_ms": 10000
  }
}

Response: {"session_id": 42}


session_stop

Request cancellation of a running session. The Python process group is killed immediately.

{"jsonrpc":"2.0","id":2,"method":"session_stop","params":{"session_id":42}}

Response: {"ok": true}


session_delete

Stop a session if still running, then remove it from the session list entirely.

{"jsonrpc":"2.0","id":2,"method":"session_delete","params":{"session_id":42}}

Response: {"ok": true}


session_list

List all known sessions (running and finished).

{"jsonrpc":"2.0","id":3,"method":"session_list","params":{}}

Response: {"sessions": [{"id":1,"status":"succeeded",...}, ...]}


session_get

Get metadata for a single session.

{"jsonrpc":"2.0","id":4,"method":"session_get","params":{"session_id":42}}

Response: SessionInfo object with id, status, runtime_name, started_at_ms, ended_at_ms.


session_logs

Page through the captured log history for a session. Supports cursor-based pagination.

| Param | Type | Required | Description |
|---|---|---|---|
| session_id | integer | yes | Session to query |
| after_ts_ms | integer | no | Return only lines after this timestamp |
| after_seq | integer | no | Return only lines after this sequence number |
| limit | integer | no | Maximum lines to return (default: 5000) |

{
  "jsonrpc":"2.0","id":5,"method":"session_logs",
  "params":{"session_id":42,"limit":50}
}

Response:

{
  "lines": [
    {"session_id":42,"timestamp_ms":1715000001000,"seq":1,"kind":"stdout","line":"0"},
    {"session_id":42,"timestamp_ms":1715000001100,"seq":2,"kind":"stdout","line":"1"}
  ],
  "next_ts_ms": 1715000001100,
  "next_seq": 2
}
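
When a session has more lines than limit, feed the returned cursors back as after_ts_ms / after_seq and repeat. A sketch of that cursor bookkeeping, with a transport-agnostic call closure (hypothetical, not part of the API):

use serde_json::{json, Value};

// `call` performs one session_logs request and returns the parsed "result"
// object; how it talks to the socket is out of scope for this sketch.
fn fetch_all_logs(call: impl Fn(Value) -> Value, session_id: u64) -> Vec<Value> {
    let limit: usize = 5000;
    let mut params = json!({ "session_id": session_id, "limit": limit });
    let mut all = Vec::new();
    loop {
        let result = call(params.clone());
        let lines = result["lines"].as_array().cloned().unwrap_or_default();
        let done = lines.len() < limit;
        all.extend(lines);
        if done {
            return all;
        }
        // Resume exactly after the last line we received.
        params["after_ts_ms"] = result["next_ts_ms"].clone();
        params["after_seq"] = result["next_seq"].clone();
    }
}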

session_result

Get the final ScriptResult for a completed session. Returns null if the session is still running.

{"jsonrpc":"2.0","id":6,"method":"session_result","params":{"session_id":42}}

Response:

{
  "success": true,
  "exit_code": 0,
  "stdout": "0\n1\n2\n3\n4",
  "stderr": "",
  "error": null
}

runtime_init

Create a new Python runtime (venv + packages). Installs the requested Python version if not already present. Writes runtime.toml immediately with installing: true; clears it once all steps complete.

| Param | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Unique runtime name |
| python_version | string | yes | Python version, e.g. "3.12" or "3.12.3" |
| modules | string[] | no | pip packages to install immediately |

{
  "jsonrpc":"2.0","id":7,"method":"runtime_init",
  "params":{"name":"ml","python_version":"3.11","modules":["torch","numpy"]}
}

Response: RuntimeInfo object.


runtime_list

List all registered runtimes.

{"jsonrpc":"2.0","id":8,"method":"runtime_list","params":{}}

Response: {"runtimes": [{"name":"default","python_version":"3.12","installing":false,"operational":true,...}, ...]}


runtime_get

Get metadata for a single runtime.

{"jsonrpc":"2.0","id":9,"method":"runtime_get","params":{"name":"default"}}

Response: RuntimeInfo with name, python_version, modules, venv_path, created_at, updated_at, installing, operational, smoke_session_id.


runtime_install

Install additional pip packages into an existing runtime. Sets installing: true during the install, clears it on completion.

{
  "jsonrpc":"2.0","id":10,"method":"runtime_install",
  "params":{"name":"default","modules":["pandas","matplotlib"]}
}

Response: {"modules": ["requests","pandas","matplotlib"]} (full updated module list).


runtime_delete

Delete a runtime and its virtual environment directory.

{"jsonrpc":"2.0","id":11,"method":"runtime_delete","params":{"name":"old-runtime"}}

Response: {"ok": true}


runtime_test

Run a smoke test against a runtime (print('hello world') + sys.version). Waits for completion, cleans up the internal session, and marks the runtime operational on success.

{"jsonrpc":"2.0","id":12,"method":"runtime_test","params":{"name":"default"}}

Response:

{
  "success": true,
  "exit_code": 0,
  "stdout": "hello world\npython: 3.12.3 ...",
  "stderr": ""
}

SSE streaming

Subscribe to live output from a session:

GET /sse/session/{session_id}

The response is a standard text/event-stream. Each event carries a JSON-encoded WorkerFrame:

data: {"kind":"log","line":{"session_id":1,"timestamp_ms":...,"seq":3,"kind":"stdout","line":"hello"}}

data: {"kind":"result","result":{"success":true,"exit_code":0,...}}

The stream closes after the Result frame is delivered. If the session is already complete when you subscribe, all stored log lines are replayed first, then the result is sent, and the stream closes.


Runtime management

Runtimes are isolated Python virtual environments managed by uv. Each runtime:

  1. Has a unique name used in session_start as runtime_name
  2. Uses a pinned Python version installed by uv python install
  3. Has its own pip-installed packages that do not affect other runtimes
  4. Is stored on disk and survives server restarts (metadata in runtime.toml)

Lifecycle states

runtime_init called
      │
      ▼
installing: true   ← visible immediately via runtime_list
      │  uv downloads Python, creates venv, installs packages
      ▼
installing: false, operational: false
      │
      ▼  smoke test runs automatically (or manually via runtime_test)
      │
   ┌──┴──────────────────┐
   │ passed              │ failed
   ▼                     ▼
operational: true    smoke_session_id set  ← session kept for log inspection

Default runtime

On first startup, the server automatically initialises a runtime named "default" (Python 3.12, standard packages) if it doesn't already exist, then runs the smoke test. Subsequent restarts skip init if the runtime already exists.

The default module set:

ipython, requests, httpx, pydantic, rich, pandas, numpy, python-dotenv, toml

Admin UI

hero_runner_py_admin serves a dashboard on its configured HTTP port (by default reached through the hero_router proxy).

Sessions tab

  • Live table of all sessions with status badges, runtime name, duration
  • Per-row: view detail, view logs, stop (running only)
  • Bulk selection → Stop selected or Delete selected (stops if running, removes from list)
  • Search by ID or status

Runtimes tab

  • Table with Name, Python Version, Modules, Status, Created
  • Status badges: installing… (animated spinner), operational, smoke failed (with link to session logs), pending
  • Right-click context menu on any row:
    • Test it works — opens a modal that runs runtime_test and shows exact script + output
    • Install modules — opens the install modal
    • Delete runtime — confirms then deletes
  • Action buttons in each row (test, install modules, delete)
  • New Runtime button opens a creation modal

Admin → Maintenance tab

  • Stop all running sessions
  • Clear local UI state
  • Performance benchmark — configurable N sessions, concurrency level, and target runtime:
    • Runs print('hello world') N times
    • Live progress bar + elapsed timer
    • Results: total time, avg/session (ms), sessions/sec
    • Stop button to abort mid-run

Timeout behavior

Timeout is enforced at the POSIX level with zero threading:

  1. The grandchild process calls alarm(secs) before starting I/O
  2. A custom SIGALRM handler runs in the grandchild:
    • Calls kill(-pgid, SIGKILL) to kill the entire Python process group
    • Calls kill(pid, SIGKILL) to ensure the direct child is killed
    • Resets SIGALRM to SIG_DFL so the grandchild itself can die on the next alarm
  3. The Python interpreter is placed in its own process group via setpgid(0, 0) in pre_exec, so the kill signal reaches all child processes the script may have spawned

This design is fork-safe on macOS: no threads are spawned inside the grandchild, avoiding the pthread_create / mutex deadlock that can occur when a multithreaded parent is forked.
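
A condensed sketch of that mechanism using raw libc calls. The real implementation lives in runner.rs; the helper below is illustrative and assumes the libc crate:

use std::os::unix::process::CommandExt;
use std::process::{Child, Command};
use std::sync::atomic::{AtomicI32, Ordering};

// Pgid of the Python process group; written after spawn, read by the handler.
static CHILD_PGID: AtomicI32 = AtomicI32::new(0);

extern "C" fn on_alarm(_sig: libc::c_int) {
    let pgid = CHILD_PGID.load(Ordering::SeqCst);
    unsafe {
        if pgid > 0 {
            libc::kill(-pgid, libc::SIGKILL); // the whole Python process group
            libc::kill(pgid, libc::SIGKILL); // the direct child, for good measure
        }
        // Back to the default action so a later alarm kills this process too.
        libc::signal(libc::SIGALRM, libc::SIG_DFL);
        libc::alarm(1);
    }
}

fn spawn_with_timeout(python: &str, script: &str, timeout_secs: u32) -> std::io::Result<Child> {
    let mut cmd = Command::new(python);
    cmd.arg("-c").arg(script);
    unsafe {
        // New process group for the interpreter, so kill(-pgid, ...) also
        // reaches any children the script itself spawns.
        cmd.pre_exec(|| {
            libc::setpgid(0, 0);
            Ok(())
        });
        let handler = on_alarm as extern "C" fn(libc::c_int);
        libc::signal(libc::SIGALRM, handler as libc::sighandler_t);
    }
    let child = cmd.spawn()?;
    CHILD_PGID.store(child.id() as i32, Ordering::SeqCst);
    let _ = unsafe { libc::alarm(timeout_secs) }; // arm the hard-kill timer
    Ok(child)
}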


Logging

Live stream

While a script runs, each stdout/stderr line is emitted as a WorkerFrame::Log frame on the worker → parent pipe and then broadcast to all SSE subscribers via a tokio::broadcast channel.

Persistent storage

Each log line is also written to hero_log using the source path:

runner.session.<session_id>

Content format per line:

<kind>:<timestamp_ms>:<seq>|<text>

For example:

stdout:1715000001000:3|processing item 7
stderr:1715000001050:4|warning: low memory

This format allows exact cursor-based pagination in session_logs using after_ts_ms + after_seq.
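
A hypothetical parser for this line format, mainly to make the field boundaries concrete (the header never contains |, so the text after it may safely contain colons or pipes):

// Parse "<kind>:<timestamp_ms>:<seq>|<text>" into its four fields.
fn parse_log_line(raw: &str) -> Option<(&str, u64, u64, &str)> {
    let (header, text) = raw.split_once('|')?;
    let mut parts = header.splitn(3, ':');
    let kind = parts.next()?;
    let ts: u64 = parts.next()?.parse().ok()?;
    let seq: u64 = parts.next()?.parse().ok()?;
    Some((kind, ts, seq, text))
}

fn main() {
    assert_eq!(
        parse_log_line("stdout:1715000001000:3|processing item 7"),
        Some(("stdout", 1715000001000, 3, "processing item 7"))
    );
}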

Process logging

Service-level events (startup, socket bind, runtime init, smoke test results) are emitted to herolib_core::Logger with source "hero_runner_py" and forwarded to hero_log in the usual Hero stack way.


Testing

The hero_runner_py_tests crate contains integration tests split into two groups:

| Group | Requires | Run condition |
|---|---|---|
| Pure-Rust (IPC, types, runner basics) | Nothing | Always |
| uv-dependent (pool, sessions, runtimes) | uv + Python | --include-ignored |

Running tests

# Pure-Rust tests only
cargo test -p hero_runner_py_tests

# All tests including uv-dependent ones
cargo test -p hero_runner_py_tests -- --include-ignored --test-threads=1

--test-threads=1 is required because tests set HERO_PYTHON_RUNTIMES_DIR via std::env::set_var and fork a WorkerPool per test — concurrent tests would race on the env var.


Hero stack integration

hero_runner_py integrates with the standard Hero stack services:

| Service | Integration |
|---|---|
| hero_log | Stores session log lines at runner.session.<id>, service events at hero_runner_py |
| hero_db | INCR for persistent, monotonically increasing session IDs that survive restarts |
| hero_proc | Optional live log forwarding per line when HERO_RUNNER_PY_FORWARD_LOGS=true |
| hero_sockets | UDS conventions: $HERO_SOCKET_DIR/hero_runner_py/rpc.sock, mode 0o660 |
| hero_runner_py_admin | Dashboard UI that proxies API calls and renders session/runtime/benchmark views |