initial setup & implementation #1

Closed
opened 2026-04-26 10:19:35 +00:00 by despiegk · 3 comments
Owner

purpose

  • make an OpenRPC-compatible server which is a layer on top of sglang.io, allowing people to talk to a GPU-hosted LLM

the crates

  • hero_gpu_api (layer which implements known standards and maps to openrpc backend)
    • openai endpoint
    • anthropic endpoint
  • hero_gpu_server
    • openrpc server uses /hero_sockets format
    • accepts all LLM requests and forwards to python backend
    • allows configuration of the backend (which models to load, which params, ...)
  • hero_gpu_installer
    • install all required components (python) using uv
    • check status; if components are missing, install them
    • check health
    • update feature
    • 100% sandboxed
  • hero_gpu_sdk
    • thin layer using the OpenRPC proc macro (see the /hero_proc_openrpc skill)
    • convenience features if needed
  • hero_gpu_ui
    • implement a UI dashboard (see skill /hero_ui_dashboard) to demonstrate all functionality

implementation

  • use https://docs.sglang.io/ to talk to GPU
  • make a proper crate & Makefile (see skill /makefile_helper)
  • make sure we use /hero_proc_service_selfstart to start/stop the services

This story is to implement all the basics so we can develop and test further. It does not need to be tested or complete; the goal is to have all the basics in place: the APIs, the complete OpenRPC interface, and the UI.

Author
Owner

Implementation Spec for Issue #1

Objective

Scaffold the hero_gpu repository as a Hero-conventions-compliant Cargo workspace that exposes an OpenRPC-based gateway in front of an SGLang (https://docs.sglang.io) Python backend, providing OpenAI- and Anthropic-compatible HTTP endpoints, an installer that uses uv to set up the Python runtime, an SDK, an admin UI dashboard, and a CLI binary that self-registers with hero_proc. This is a basics-only scaffold: every file, route, RPC method, UI tab, and Makefile target must exist and compile, but handlers may return placeholder/stub data. No tests or runtime correctness required.

Requirements

  • Multi-crate workspace at /Volumes/T7/code0/hero_gpu with the five crates from the issue, plus the hero_gpu CLI crate and the mandatory hero_gpu_examples crate per hero_crates_best_practices_check.
  • hero_gpu_server exposes JSON-RPC 2.0 over ~/hero/var/sockets/hero_gpu/rpc.sock (no TCP) and ships a complete openrpc.json covering: backend lifecycle, model management, chat/completions, embeddings, health/status, and configuration.
  • hero_gpu_api mounts OpenAI-compatible (/v1/chat/completions, /v1/completions, /v1/models, /v1/embeddings) and Anthropic-compatible (/v1/messages, /v1/models) HTTP routes, mapped onto the OpenRPC backend, served on ~/hero/var/sockets/hero_gpu/openapi.sock.
  • hero_gpu_installer provides install, status, health, update operations using uv to provision the Python runtime + SGLang dependencies, in a sandboxed directory under ~/hero/var/hero_gpu/.
  • hero_gpu_sdk is a thin client generated from crates/hero_gpu_server/openrpc.json using the openrpc_client! macro.
  • hero_gpu_ui is an Askama + Bootstrap 5.3.3 admin dashboard following hero_ui_dashboard, served on ~/hero/var/sockets/hero_gpu/ui.sock, with all standard tabs (Models, Backend, Playground, Logs, Stats, Admin, Docs) wired to stub data.
  • hero_gpu CLI binary handles --start/--stop and registers all components (server, UI, installer-managed SGLang process) with hero_proc via hero_proc_sdk, per hero_proc_service_selfstart.
  • Workspace conforms to naming_convention (snake_case throughout), cargo_deps (git URLs with branch = "development", no [patch] committed), rust_toolchain (edition 2024, no rust-version pin), and herolib_import (version 0.5.0 pinned for hero crates).
  • Root-level Makefile + buildenv.sh + scripts/build_lib.sh + scripts/install.sh + scripts/download-assets.sh, all binaries installed to ~/hero/bin/.

Files to Modify/Create

Workspace root

  • Cargo.toml — workspace manifest, members, shared dependencies
  • Makefile — build, install, run, stop, status, logs targets
  • buildenv.sh — INSTALL_DIR, BINARIES, RELEASE_DIR
  • scripts/build_lib.sh, scripts/install.sh, scripts/download-assets.sh, scripts/sglang_runner.sh
  • .gitignore, README.md, .forgejo/workflows/build.yaml

crates/hero_gpu_server/ — server binary + OpenRPC spec
crates/hero_gpu_api/ — library crate for OpenAI + Anthropic adapters
crates/hero_gpu_sdk/ — generated client via openrpc_client!
crates/hero_gpu_installer/ — uv-based sandbox installer (lib + thin CLI)
crates/hero_gpu_ui/ — Askama admin dashboard binary with all required tabs
crates/hero_gpu/ — CLI binary handling --start/--stop + hero_proc registration
crates/hero_gpu_examples/ — examples + integration test skeleton

Implementation Plan

Step 1: Workspace skeleton, root configuration, and build automation

Files: workspace Cargo.toml, Makefile, buildenv.sh, scripts/*.sh, .gitignore, README.md, .forgejo/workflows/build.yaml.

  • Workspace Cargo.toml: resolver = "3", edition = "2024", version = "0.5.0", members for all workspace crates, full [workspace.dependencies] block (hero_rpc_derive, hero_rpc_openrpc, hero_proc_sdk, tokio, axum, hyper, tower, tower-http, serde, serde_json, anyhow, thiserror, tracing, reqwest, askama, askama_axum, rust-embed, dirs, clap, etc.). Internal path entries for the three library crates. NO [patch] sections.
  • buildenv.sh: BINARIES="hero_gpu hero_gpu_server hero_gpu_ui hero_gpu_installer", INSTALL_DIR=$HOME/hero/bin.
  • Makefile: standard targets (help, version, status, build, builddev, check, fmt, lint, test, download-assets, install, installdev, run, stop, restart, logs, clean, all).
  • scripts/sglang_runner.sh: bash entrypoint launched by hero_proc — exec uv run --project ~/hero/var/hero_gpu/python python -m sglang.launch_server --host 127.0.0.1 --port 30000 "$@".
  • Forgejo workflow: cargo check --workspace, clippy, build.
    Dependencies: none
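
The manifest described above might start like this — a sketch only (member list taken from the Files section; dependency block abbreviated), not the final manifest:

```toml
[workspace]
resolver = "3"
members = [
    "crates/hero_gpu",
    "crates/hero_gpu_server",
    "crates/hero_gpu_api",
    "crates/hero_gpu_sdk",
    "crates/hero_gpu_installer",
    "crates/hero_gpu_ui",
    "crates/hero_gpu_examples",
]

[workspace.package]
version = "0.5.0"
edition = "2024"

[workspace.dependencies]
# abbreviated — the full block lists all shared deps from the step above
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
# internal library crates referenced by path
hero_gpu_api = { path = "crates/hero_gpu_api" }
hero_gpu_sdk = { path = "crates/hero_gpu_sdk" }
hero_gpu_installer = { path = "crates/hero_gpu_installer" }
```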

Step 2: hero_gpu_sdk — generated client crate

Files: crates/hero_gpu_sdk/Cargo.toml, src/lib.rs.

  • Deps on hero_rpc_derive, hero_rpc_openrpc (transport feature).
  • lib.rs: openrpc_client!("../hero_gpu_server/openrpc.json"); + pub fn default_socket_path() resolving $HERO_SOCKET_DIR/hero_gpu/rpc.sock.
    Dependencies: Step 1, Step 3 (needs openrpc.json to exist for macro expansion).
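
The socket-path helper can be sketched in plain std (the generated client itself comes from the openrpc_client! macro and is not shown):

```rust
use std::env;
use std::path::PathBuf;

/// Resolve the server's Unix socket path, honoring the
/// HERO_SOCKET_DIR override described in the spec.
/// Sketch only — the real SDK wraps this around the generated client.
pub fn default_socket_path() -> PathBuf {
    let base = env::var("HERO_SOCKET_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| {
            let home = env::var("HOME").expect("HOME not set");
            PathBuf::from(home).join("hero/var/sockets")
        });
    base.join("hero_gpu").join("rpc.sock")
}

fn main() {
    println!("{}", default_socket_path().display());
}
```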

Step 3: hero_gpu_server — JSON-RPC core, OpenRPC spec, sglang client, sockets

Files: Cargo.toml, openrpc.json, src/main.rs, src/lib.rs, src/state.rs, src/socket.rs, src/discovery.rs, src/sglang/{mod,types}.rs, src/rpc/mod.rs, src/rpc/methods/{health,models,chat,completions,embeddings,backend}.rs, src/openapi/mod.rs.

  • Full OpenRPC 1.3.0 spec with methods: rpc.discover, system.health, system.stats, models.list, models.load, models.unload, models.info, chat.completions, chat.completions_stream, text.completions, embeddings.create, backend.start/stop/status/config_get/config_set. All result schemas wrapped in objects (never bare arrays — herolib_openrpc requirement).
  • Bind two UDS: rpc.sock and openapi.sock under ~/hero/var/sockets/hero_gpu/.
  • SGLang HTTP client targeting http://127.0.0.1:30000, with typed request/response shapes mirroring SGLang docs.
  • Stub handlers may return placeholder responses matching the schemas. SIGTERM-aware shutdown.
    Dependencies: Step 1
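
The typed request shapes might look like the following — field names are illustrative (OpenAI-style, which SGLang's HTTP server mimics), not copied from the SGLang docs:

```rust
/// Chat request shape for the SGLang HTTP client (illustrative fields).
#[derive(Debug, Clone)]
pub struct ChatMessage {
    pub role: String, // "system" | "user" | "assistant"
    pub content: String,
}

#[derive(Debug, Clone)]
pub struct ChatCompletionRequest {
    pub model: String,
    pub messages: Vec<ChatMessage>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stream: bool,
}

impl ChatCompletionRequest {
    /// Minimal constructor of the kind a stub handler would use.
    pub fn new(model: &str, user_text: &str) -> Self {
        Self {
            model: model.to_string(),
            messages: vec![ChatMessage {
                role: "user".to_string(),
                content: user_text.to_string(),
            }],
            temperature: None,
            max_tokens: None,
            stream: false,
        }
    }
}

fn main() {
    let req = ChatCompletionRequest::new("some-model", "hello");
    println!("{} message(s) for {}", req.messages.len(), req.model);
}
```

In the real crate these structs would carry serde derives so they serialize straight into the SGLang request body.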

Step 4: hero_gpu_api — OpenAI + Anthropic compatible HTTP routers

Files: crates/hero_gpu_api/Cargo.toml, src/lib.rs, src/openai/{mod,types,handlers}.rs, src/anthropic/{mod,types,handlers}.rs, src/mapping.rs.

  • Library only. Defines a Backend trait that hero_gpu_server::AppState implements → keeps dependency arrow one-directional (no cycle).
  • OpenAI routes: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models, /v1/models/:id.
  • Anthropic routes: /v1/messages, /v1/models.
  • Streaming endpoints return SSE skeleton (single "not yet implemented" event).
    Dependencies: Step 3 (Backend trait must exist before the server can implement it)
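
The trait inversion above can be sketched as follows — method set is an assumption, and the real trait would be async; a sync version is shown to keep the snippet self-contained:

```rust
/// hero_gpu_api depends only on this trait; hero_gpu_server
/// implements it for AppState, keeping the dependency one-directional.
pub trait Backend: Send + Sync {
    fn list_models(&self) -> Vec<String>;
    fn chat(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// Stub of the kind the scaffold's handlers would return against.
pub struct StubBackend;

impl Backend for StubBackend {
    fn list_models(&self) -> Vec<String> {
        vec!["stub-model".to_string()]
    }
    fn chat(&self, _model: &str, _prompt: &str) -> Result<String, String> {
        Err("not yet implemented".to_string())
    }
}

fn main() {
    let backend = StubBackend;
    println!("{:?}", backend.list_models());
}
```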

Step 5: hero_gpu_installer — uv-based sandbox installer

Files: Cargo.toml, src/lib.rs, src/sandbox.rs, src/uv.rs, src/components.rs, src/health.rs, src/bin/hero_gpu_installer.rs.

  • Sandbox rooted at ~/hero/var/hero_gpu/python/ (env override HERO_GPU_SANDBOX_ROOT).
  • Uv wrapper shelling out via tokio::process::Command: uv venv, uv pip install "sglang[all]", uv run.
  • Components: Uv, Python, Sglang, Cuda — each with is_installed() and install().
  • Health check: uv present, venv exists, sglang importable, port reachable.
  • CLI subcommands: install, status, health, update.
    Dependencies: Step 1
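
The uv wrapper's command construction might look like this (shown with std::process::Command rather than the tokio variant named in the spec, so the sketch stays dependency-free; the command is built but not spawned):

```rust
use std::process::Command;

/// Build (but do not run) the `uv pip install` invocation the
/// installer would shell out to. Sandbox path and package name
/// follow the spec; the VIRTUAL_ENV convention is an assumption.
fn uv_pip_install(sandbox: &str, package: &str) -> Command {
    let mut cmd = Command::new("uv");
    cmd.arg("pip")
        .arg("install")
        .arg(package)
        .env("VIRTUAL_ENV", format!("{sandbox}/.venv"));
    cmd
}

fn main() {
    let cmd = uv_pip_install("/tmp/sandbox", "sglang[all]");
    println!("{:?}", cmd);
}
```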

Step 6: hero_gpu_ui — admin dashboard binary

Files: Cargo.toml, src/{main,state,assets,routes}.rs, src/handlers/{index,fragments}.rs, templates/{base,index}.html, templates/partials/sidebar.html, templates/partials/tabs/{models,backend,playground,logs,stats,admin,docs}.html, static/css/dashboard.css, static/js/dashboard.js, static/favicon.svg, docs/{overview,cli,sdk,api}.md.

  • Binds ~/hero/var/sockets/hero_gpu/ui.sock.
  • Server-side aggregation via hero_gpu_sdk client; /fragments/* Unpoly partial endpoints.
  • All standard Hero dashboard chrome (status dot, theme toggle, sidebar widgets, tabs).
  • BASE_PATH resolved from X-Forwarded-Prefix for hero_router embedding.
    Dependencies: Step 2 (SDK), Step 3 (openrpc.json)

Step 7: hero_gpu CLI — service selfstart + hero_proc registration

Files: crates/hero_gpu/Cargo.toml, src/main.rs, src/service.rs.

  • clap with --start / --stop flags only (per hero_proc_service_selfstart).
  • build_service_definition() registers three actions:
    • hero_gpu_server — exec sibling binary, kill_other cleans rpc.sock + openapi.sock, openrpc_socket health check.
    • hero_gpu_ui — exec sibling binary, kill_other cleans ui.sock, openrpc_socket health check.
    • hero_gpu_sglang — bash interpreter running ~/hero/var/hero_gpu/scripts/sglang_runner.sh, env vars (HERO_GPU_MODEL, HERO_GPU_PORT=30000), kill_other.port = vec![30000], http health check at http://127.0.0.1:30000/health.
  • --start calls hero_proc_factory().restart_service(...), --stop calls stop_service(...).
    Dependencies: Step 1, Step 2, Step 3
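
The three registered actions can be modeled as plain data — field names here are illustrative stand-ins, since the real types come from hero_proc_sdk:

```rust
/// Plain-data model of the three hero_proc actions the CLI registers.
#[derive(Debug)]
struct ServiceAction {
    name: &'static str,
    kill_ports: Vec<u16>,       // kill_other.port equivalent
    health_check: &'static str, // socket check or HTTP URL
}

fn build_service_actions() -> Vec<ServiceAction> {
    vec![
        ServiceAction {
            name: "hero_gpu_server",
            kill_ports: vec![],
            health_check: "openrpc_socket",
        },
        ServiceAction {
            name: "hero_gpu_ui",
            kill_ports: vec![],
            health_check: "openrpc_socket",
        },
        ServiceAction {
            name: "hero_gpu_sglang",
            kill_ports: vec![30000],
            health_check: "http://127.0.0.1:30000/health",
        },
    ]
}

fn main() {
    for action in build_service_actions() {
        println!("{} -> {}", action.name, action.health_check);
    }
}
```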

Step 8: hero_gpu_examples — examples + integration test skeleton

Files: Cargo.toml, examples/{health,list_models,openai_chat}.rs, tests/integration.rs.

  • Examples connect via SDK (default_socket_path()) and exercise the basic methods.
  • openai_chat.rs posts to the OpenAI-compat route via hero_router URL.
  • tests/integration.rs per hero_crates_best_practices_check Check 13 — setup/teardown with ~/hero/bin/hero_gpu --start/--stop, three #[ignore] tests (test_server_health, test_rpc_discover, test_basic_api_operations).
    Dependencies: Step 2, Step 3, Step 7

Acceptance Criteria

  • cargo check --workspace succeeds.
  • cargo build --release --workspace produces four binaries: hero_gpu, hero_gpu_server, hero_gpu_ui, hero_gpu_installer.
  • cargo clippy --workspace -- -D warnings passes (allow dead_code / unused_variables on stub handlers if necessary).
  • make install copies all four binaries to ~/hero/bin/.
  • ~/hero/bin/hero_gpu --help lists --start and --stop.
  • crates/hero_gpu_server/openrpc.json is valid OpenRPC 1.3.0 (jq parses) and lists every implemented method.
  • hero_gpu_sdk::HeroGpuClient compiles (proves macro expansion).
  • hero_gpu_ui serves HTML on ~/hero/var/sockets/hero_gpu/ui.sock with all standard tabs in the rendered DOM.
  • hero_gpu_server serves POST /rpc, GET /openrpc.json, GET /health, GET /.well-known/heroservice.json on rpc.sock; OpenAI/Anthropic routes on openapi.sock.
  • hero_gpu_installer status/health exit cleanly when components missing (report "not installed", no panic).
  • No [patch] sections committed; no TCP listeners in any Rust binary (only the Python SGLang process binds TCP).
  • crates/hero_gpu_examples/tests/integration.rs exists with the three required #[ignore] test stubs.

Notes

  • Issue scope discipline. The issue explicitly states "should not be tested nor complete, its to have all basics there." Stubs may return placeholder data or RpcError::not_implemented(...). The OpenRPC spec, route table, UI tab structure, Makefile targets, and CLI lifecycle wiring are the load-bearing artifacts.
  • hero_gpu_api as a separate crate. Honored per the issue. A Backend trait inside hero_gpu_api, implemented by hero_gpu_server::AppState, keeps the dependency arrow one-directional and avoids cycles.
  • SGLang process model. SGLang is Python; launched via uv run as a third hero_proc action managed by the hero_gpu CLI. Rust talks to SGLang over 127.0.0.1:30000 (SGLang doesn't speak UDS). This is the documented exception — only the external Python process binds TCP.
  • OpenRPC array result wrapping. Every list result MUST be wrapped in an object ({"models": [...]}) — bare arrays break openrpc_client! macro expansion.
  • Naming. Service directory under ~/hero/var/sockets/ is hero_gpu/ (the service name). All binaries share the directory with sockets rpc.sock, ui.sock, openapi.sock.
  • Self-start pattern. Only the hero_gpu CLI handles --start/--stop. Server, UI, and installer binaries are plain foreground processes (the installer CLI's install/status/health/update subcommands are unrelated to lifecycle).
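
The array-wrapping rule from the notes can be illustrated as follows (hand-rolled JSON keeps the snippet dependency-free; the real server would use serde derives):

```rust
/// models.list result: the array is wrapped in an object, never bare,
/// so openrpc_client! can expand a named result type for it.
struct ModelsListResult {
    models: Vec<String>,
}

impl ModelsListResult {
    /// Hand-rolled serialization for illustration only.
    fn to_json(&self) -> String {
        let items: Vec<String> =
            self.models.iter().map(|m| format!("\"{m}\"")).collect();
        format!("{{\"models\":[{}]}}", items.join(","))
    }
}

fn main() {
    let result = ModelsListResult {
        models: vec!["some-model".to_string()],
    };
    println!("{}", result.to_json()); // {"models":["some-model"]}
}
```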
Author
Owner

Build & Check Results

Per the issue scope ("basics only — should not be tested nor complete"), no functional tests were run. Verification was limited to type-checking and release build of the full workspace.

cargo check --workspace

  • Result: pass
  • Time: 29.8s (cold check with a warm sccache cache; 0.07s on subsequent re-check)

cargo build --release --workspace

  • Result: pass
  • Time: 3m 22s (wall clock)
  • Binaries produced:
    • target/release/hero_gpu (4.4 MB)
    • target/release/hero_gpu_server (4.8 MB)
    • target/release/hero_gpu_ui (4.0 MB)
    • target/release/hero_gpu_installer (1.5 MB)

Crates checked

  • hero_gpu_api (library)
  • hero_gpu_sdk (library, generated from openrpc.json)
  • hero_gpu_installer (lib + bin)
  • hero_gpu_server (lib + bin)
  • hero_gpu_ui (bin)
  • hero_gpu (bin — CLI)
  • hero_gpu_examples (examples + ignored integration tests)

Notes

  • Clean build: zero warnings, zero errors across the entire workspace under both cargo check and cargo build --release.
  • All four expected release binaries were produced and are executable.
  • No functional / integration tests were executed, in line with the issue scope. The hero_gpu_examples integration tests remain #[ignore]-gated and were not run.
Author
Owner

Implementation Summary

Scaffolded the hero_gpu workspace per the spec — every artifact required for further development is in place, all crates type-check, and all four release binaries build cleanly. Stub data is returned where SGLang would normally answer; the surface area (RPC methods, HTTP routes, UI tabs, lifecycle hooks) is complete.

Crates created (7 workspace members)

| Crate | Kind | Purpose |
|---|---|---|
| hero_gpu | bin | CLI handling --start/--stop, registers all actions with hero_proc |
| hero_gpu_server | lib + bin | JSON-RPC 2.0 server, OpenRPC spec, SGLang HTTP client |
| hero_gpu_api | lib | OpenAI + Anthropic compatible HTTP routers via Backend trait |
| hero_gpu_sdk | lib | Generated client (HeroGPUClient) via openrpc_client! macro |
| hero_gpu_installer | lib + bin | uv-based sandboxed Python/SGLang installer (install/status/health/update) |
| hero_gpu_ui | bin | Askama + Bootstrap admin dashboard (Models/Backend/Playground/Logs/Stats/Admin/Docs tabs) |
| hero_gpu_examples | bin | Examples + #[ignore] integration test skeleton |

OpenRPC interface (16 methods)

All method results are wrapped objects (no bare arrays — required for openrpc_client! macro expansion).

  • rpc.discover, system.health, system.stats
  • models.list, models.info, models.load, models.unload
  • chat.completions, chat.completions_stream, text.completions, embeddings.create
  • backend.start, backend.stop, backend.status, backend.config_get, backend.config_set

HTTP API surface (openapi.sock)

OpenAI-compatible:

  • POST /v1/chat/completions, POST /v1/completions, POST /v1/embeddings
  • GET /v1/models, GET /v1/models/{id}

Anthropic-compatible:

  • POST /v1/messages, POST /anthropic/v1/messages
  • GET /anthropic/v1/models

Discovery:

  • GET /health, GET /.well-known/heroservice.json

Sockets (under ~/hero/var/sockets/hero_gpu/)

  • rpc.sock — JSON-RPC 2.0 (consumed by hero_gpu_sdk)
  • openapi.sock — OpenAI/Anthropic HTTP
  • ui.sock — admin dashboard

No Rust binary opens a TCP listener. The Python SGLang process is the sole exception (binds 127.0.0.1:30000).

Service lifecycle (hero_gpu --start/--stop)

Three hero_proc actions registered by the CLI:

  1. hero_gpu_server — exec sibling binary, kill_other cleans rpc.sock + openapi.sock, openrpc_socket health check.
  2. hero_gpu_ui — exec sibling binary, kill_other cleans ui.sock, openrpc_socket health check.
  3. hero_gpu_sglang — bash interpreter running ~/hero/var/hero_gpu/scripts/sglang_runner.sh, env HERO_GPU_MODEL + HERO_GPU_PORT=30000, kill_other.port = vec![30000], http health check on :30000/health.

Per the hero_proc_service_selfstart pattern: only the hero_gpu CLI handles --start/--stop. The server, UI, and installer binaries are plain foreground processes (the installer CLI's own subcommands are unrelated to lifecycle).

Build automation

  • Makefile — help, version, status, build, builddev, check, fmt, lint, test, download-assets, install, installdev, run, stop, restart, logs, logs-ui, clean, all
  • buildenv.sh — BINARIES, INSTALL_DIR=$HOME/hero/bin
  • scripts/install.sh, scripts/build_lib.sh, scripts/download-assets.sh (Bootstrap 5.3.3, Bootstrap Icons 1.13.1, Unpoly 3.12.1), scripts/sglang_runner.sh
  • .forgejo/workflows/build.yaml — workspace check + clippy + release build

Verification

  Command                              Result   Time
  cargo check --workspace              pass     29.8s
  cargo build --release --workspace    pass     3m 22s

Four release binaries produced (hero_gpu 4.4 MB, hero_gpu_server 4.8 MB, hero_gpu_ui 4.0 MB, hero_gpu_installer 1.5 MB).

Notes / out-of-scope

  • No functional tests run — issue scope states "should not be tested nor complete, its to have all basics there". Integration tests in crates/hero_gpu_examples/tests/integration.rs are present but #[ignore]d.
  • Stub data on the happy path — when SGLang isn't reachable, RPC handlers and HTTP endpoints return placeholder responses that match the schema. Production wiring for streaming, real model load/unload, and live config-restart cycle is left for follow-ups.
  • Backend/installer integration — backend.start / backend.stop RPCs currently return guidance to use hero_proc directly; tying them into the hero_proc_sdk factory from inside the server is a follow-up.
  • OpenAPI streaming — OpenAI/Anthropic stream: true paths compile but emit a single "streaming not yet implemented" SSE event. Real token streaming will plug in once the SGLang-side wiring is done.
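For reference, a client consuming those SSE paths today only ever sees the placeholder chunk. A small parser sketch for OpenAI-style `data:` lines (the placeholder's exact JSON body below is an assumption, not the server's literal output):

```python
import json

def parse_sse_events(raw: str) -> list:
    """Parse an OpenAI-style SSE body into JSON chunks.

    Events are `data: {...}` lines separated by blank lines; the stream
    ends with the literal `data: [DONE]` sentinel.
    """
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload))
    return chunks

# A placeholder stream like today's would yield a single event:
raw = 'data: {"note": "streaming not yet implemented"}\n\ndata: [DONE]\n'
assert parse_sse_events(raw) == [{"note": "streaming not yet implemented"}]
```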

Ready for further development.
