[hero_team_box] Hero Team Box — the team's shared dev environment (story + v1 spec) #232
Hero Team Box — the team's shared dev environment
Executive Summary
A single shared Hero box where every developer gets a Linux user slot with a fully-provisioned `~/hero`, develops on the box itself via mosh + tmux, and binds each service to a specific `(repo, branch)` they care about. Pushes trigger automatic rebuilds in the matching slot(s). Every successful build publishes a content-hashed archive as a Forgejo release on the source repo, so any other slot, or any external machine, can run that exact binary without compiling. Team membership lives in a new GitOps repo (`lhumina_code/hero_team`); per-user secrets stay in each dev's existing private secrets repo. Claude credentials follow a two-layer model: personal `claude login` per user (optional) plus an org pool of N Claude Max keys assigned to users declaratively via `hero_team`.

The point: today five people each fight their own laptop. Tomorrow five people share one canonical Hero, see each other's running services through per-user URLs, and the integration slot continuously builds `development` so the team always knows whether the trunk works.

This is v1. Single host (DO droplet, validated end-to-end). Multi-machine farms are explicitly v2.
Why this matters — the "5 > 0" framing
Today: every dev fights their own machine. Setup drift, version skew, "works on my laptop." Five Heros, zero shared ground.
Tomorrow: one box, five user slots, one Hero. SSH in, you have a working environment. See someone else's running service in your browser. Merge to `development`, see it picked up by the integration slot. Lose the box, spin a new one in 30 minutes from a forge-tracked manifest. The shared box converts the team from N parallel single-player setups into one cooperative environment.

The build automation is plumbing. The shared environment is the prize.
What this supersedes
This story is the v1 elaboration of home#121 (HeroOS Development Environment Setup), Mahmoud's v0 runbook for the current shared box (manual user creation, `init sync`, `service_complete --update`, `a 2`, manual secrets setup). #121 stays open as the operational runbook until v1 lands; the bugs Sameh filed at hero_skills#106–112 are explicitly part of v1's acceptance.

The v1 upgrade:
- `useradd` → GitOps via `lhumina_code/hero_team`
- `service_complete` (broken per hero_skills#106) → `hero_dev start` (subsumes + fixes)
- `(service, repo, branch)`
- `hero_dev peek` and the fleet view at `/admin`

The complete vision — UX by role
Joining the team (new hire flow)
Mik opens a PR against `lhumina_code/hero_team` adding `users/sara.toml`:

After merge, the `hero_team_sync` daemon on the box reconciles: Sara's Linux account is created, SSH keys installed, groups set, `~/hero` is provisioned from the `developer` template (pinning core services to `development from: release`), her private secrets repo `lhumina_code-private/secrets_sara` is created on forge via API.

Sara installs mosh on her laptop. Runs `mosh sara@team.hero.box`. First-login script prompts for her Forge token, runs `secrets_sync` to clone her secrets repo, opens `secrets.toml` for her to fill in API keys, runs `secrets push`. Then either:
- `ORG_CLAUDE_KEY_2`: `a 2` is ready to use."
- `claude login` so she can use her personal account.
- `claude login` AND has an org key; she can toggle later via `hero_dev claude use`.

Then `hero_dev status` shows her stack running, with reachable URLs:

Total elapsed: ~10 minutes. She's productive.
Daily flow (existing dev)
Alice mosh-ins, lands in her tmux session (preserved across mosh disconnects). Runs `hero_dev status`:

She works on hero_slides in `~/hero/code/hero_slides/worktrees/development_alice/` (worktree created earlier by `hero_dev switch`). Edits, commits. Runs `hero_dev rebuild hero_slides`: hero_builder runs, binary swaps, hero_proc restarts. Refresh `https://team.hero.box/alice/slides`; change is live.

When happy, pushes `development_alice` to forge, opens a PR. Once merged to `development`, the integration slot rebuilds. Everyone with `hero_slides @ development from: release` automatically gets the new binary on next reconcile (or `hero_dev pull`).

Testing someone else's branch
Pulls Mik's already-published archive from forge (no compile), swaps Alice's hero_router binary, restarts. To revert: `hero_dev switch hero_router development --from release`.

If Alice wants to compile it instead: `--from build`. A worktree is created at `~/hero/code/hero_router/worktrees/development_mik/`, hero_builder compiles, hero_proc restarts.

Peeking at the team
Shows Thabet's manifest + state. `https://team.hero.box/thabet/slides` shows what he's running. No screen-shares, no "what version" Slack threads.

Running an agent
`a 2` spawns a `hero_claude_rust` session in Alice's slot at effort level 2 (alias for `hero_dev agent --effort 2`). It uses whichever Claude credential the resolution order picks; see §Specifications/AI API key model. Plan mode, Ralph loop, Forgejo-issue auto-clone-and-comment per existing hero_claude_rust surface, scoped to her user.

Integrator's view (Mik)
`https://team.hero.box/admin`: augmented `hero_codescalers_admin`. Grid of users × services. Integration row at the top: green = `development` is shippable. Each cell shows branch, hash, build status, run status, last-rebuild-at. Plus a per-user "claude credential" column showing `org-key-N`, `personal`, or `not configured`. Click integration row → latest forge release tags + archive download links. One screen, full-stack truth.

Founder / visitor
Same dashboard. Sees who's working on what, whether the merged stack works, what's been released, who's using which Claude key.
When the box dies
Spin a new DO droplet, run `bootstrap_droplet_source.sh` (s84-validated per home#230), point at `lhumina_code/hero_team`. Reconciler recreates every user account, SSH keys included; each user's `~/hero` provisions from template; secrets repos pull fresh on first login. Org Claude keys are restored from the driver's `hero_aibroker` secret store (separate backup discipline; see Open Questions). Team back online in ~30 minutes. The only thing not preserved is unpushed work in worktrees, which is exactly what nobody should be holding hostage on a single machine.

Architecture
Data flow — push to forge, integration slot rebuilds
Data flow — push to hero_team
Data flow — Claude session spawn
Data flow — local rebuild in a slot
The binary / release layer
This is the structural piece that makes "5 > 0" durable.
Every `hero_builder` run does two things:
- `~/hero/bin/` for their running services.
- `--publish`: packages a content-hashed archive and publishes it as a Forgejo release on the source repo. Tag format: `<branch>-<short-hash>` (e.g. `development_mik-a1b2c3d`). Archive: `<binary>-<platform>.tar.gz` (e.g. `hero_router-linux-musl-x86_64.tar.gz`).

This single mechanism delivers:
- `--from release`).
- `development` produces a release.
- `cross`: per hero_builder SPEC §6.5/6.6. One binary on any glibc/musl Linux of the matching arch.
- `linux-musl-x86_64`, `linux-musl-arm64`, `macos-arm64`, etc.

The build IS the release. No separate CI/CD.
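The tag and archive naming described above can be sketched in a few lines. This is an illustration only, not `hero_builder`'s code: the function names are hypothetical, the Windows check is an assumption, and the short-hash length is a parameter because the example tag above shows 7 characters while the `--publish` semantics section says 8.

```python
def release_tag(branch: str, commit_hash: str, short_len: int) -> str:
    """Build a tag of the form <branch>-<short-hash>."""
    if len(commit_hash) < short_len:
        raise ValueError("commit hash is shorter than the requested short length")
    return f"{branch}-{commit_hash[:short_len]}"


def archive_name(binary: str, platform: str) -> str:
    """Build <binary>-<platform>.tar.gz, or .zip for Windows targets."""
    # Assumption: Windows platform strings start with "windows".
    ext = "zip" if platform.startswith("windows") else "tar.gz"
    return f"{binary}-{platform}.{ext}"


def split_tag(tag: str) -> tuple[str, str]:
    """Split a tag back into (branch, short_hash) at the last hyphen."""
    branch, sep, short_hash = tag.rpartition("-")
    if not sep or not branch or not short_hash:
        raise ValueError(f"not a <branch>-<short-hash> tag: {tag!r}")
    return branch, short_hash


def tags_for_branch(tags: list[str], branch: str) -> list[str]:
    """Keep only the tags whose branch part equals `branch` exactly."""
    return [t for t in tags if split_tag(t)[0] == branch]
```

Splitting at the last hyphen keeps branch names such as `development_mik` intact and stops `development` from accidentally matching `development_mik-...` tags.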
Specifications
Dev manifest — `~/hero/cfg/dev_manifest.toml`

Source of truth for what a single slot is running. Owned by the user (the `developer` template seeds it on `user.create`; never overwritten by reconciler).

Worktree convention: each binding implies `~/hero/code/<repo-basename>/worktrees/<branch>/`. The reconciler ensures the worktree exists; `hero_dev` operates inside it.

`lhumina_code/hero_team` repo schema

`users/<name>.toml`:

`manifest_templates/<role>.toml` is a `[[bind]]` list, same schema as `dev_manifest.toml` minus `[state.*]` blocks.

`hero_dev` CLI surface

Implementation notes:
- `hero_codescalers.build.submit` for cross-slot operations; `hero_proc` directly for local restart.
- `--from release` resolves to latest release matching `<branch>-*` (lexical sort by hash); `--from hash <h>` pins exactly.
- `hero_dev fork` requires `FORGEJO_TOKEN` (read from secrets repo).
- `hero_codescalers`, never direct.
- `a` is a shell alias forwarding to `hero_dev agent`.

`hero_codescalers.build.submit` RPC

Implementation: dispatches a `nu_exec` job into the target user's hero_proc with tag `build-<repo>-<branch>`. Job runs `hero_dev sync` then `hero_builder --release [--publish]` in the worktree, then `hero_proc service restart <name>` if requested.

`hero_team_sync` daemon contract

URL routing convention
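As a rough illustration of the prefix rule this section specifies (`/<user>/<service>/...` resolves to a per-user socket under `hero/var/sockets/hero_<service>/`), here is a minimal sketch. It is not `hero_router` code: the function name is hypothetical, and it assumes home directories live under `/home/<user>` and that the socket is `web.sock` unless the caller says otherwise.

```python
from urllib.parse import urlsplit


def socket_for_url(url: str, home_root: str = "/home", sock_name: str = "web.sock") -> str:
    """Map https://<host>/<user>/<service>/... to that user's service socket path."""
    # Drop empty segments so leading and trailing slashes do not matter.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected /<user>/<service>/... in {url!r}")
    user, service = segments[0], segments[1]
    # The URL carries the short service name; the socket directory uses the hero_ prefix.
    return f"{home_root}/{user}/hero/var/sockets/hero_{service}/{sock_name}"
```

Single-segment paths such as `/admin` are rejected here; how the real router treats them is outside this sketch.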
`hero_router` routes `https://<host>/<user>/<service>/...` to `~<user>/hero/var/sockets/hero_<service>/web.sock` (or `ui.sock` per service convention). Already supported via the `hero_web_prefix` skill; this story makes it the default for every user-provisioned service.

The `<service>` prefix in the URL is the short name (e.g. `slides`, not `hero_slides`).

Worktree layout convention
`hero_dev switch <service> <branch>` ensures the worktree exists, then operates inside it. Branch-switching never destroys uncommitted work because each branch gets its own directory. `forge worktree` (per home#121) is the underlying mechanism; `hero_dev` is the higher-level UX on top.

`hero_builder --publish` semantics

After a successful `--release` build:
- `<branch>-<short-hash>` (first 8 chars).
- `<binary>-<platform>.tar.gz` (or `.zip` for windows).

Auth: `FORGEJO_TOKEN` from environment (secrets repo).

hero_claude_rust integration (`a` command)

The v0 `a 2` becomes `hero_dev agent --effort 2` (with `a` aliased). Internally:

Each user has their own hero_claude_rust instance per their manifest. Sessions don't cross slots.
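The credential choice made when a session spawns follows the resolution order given under AI API key model: explicit preference, then org assignment, then personal login, then a clear error. A minimal sketch of that order follows. The function and exception names are hypothetical, and falling back to the other layer when the preferred one is unavailable is an assumption the source does not spell out.

```python
from typing import Optional, Tuple


class NoCredentialError(Exception):
    """Raised when neither an org key nor a personal login is available."""


def resolve_claude_credential(
    preference: Optional[str],      # "personal", "org", or None ([claude] preference)
    org_key_slot: Optional[str],    # e.g. "ORG_CLAUDE_KEY_2" if assigned in hero_team
    has_personal_login: bool,       # True if `claude login` state exists for the user
) -> Tuple[str, Optional[str]]:
    """Return ("org", slot) or ("personal", None) following the documented order."""
    # 1. An explicit user preference wins when that layer is actually available.
    if preference == "personal" and has_personal_login:
        return ("personal", None)
    if preference == "org" and org_key_slot:
        return ("org", org_key_slot)
    # 2. Otherwise an org key assigned in hero_team is used if present.
    if org_key_slot:
        return ("org", org_key_slot)
    # 3. Otherwise fall back to the personal login state.
    if has_personal_login:
        return ("personal", None)
    # 4. Nothing configured: surface the error text from the spec.
    raise NoCredentialError("run `claude login` or ask Mik to assign you an org key")
```

The returned pair is only a label. In the design above the org key itself is fetched from `hero_aibroker` at spawn time and never stored in the user's slot.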
AI API key model
Two coexisting auth flows on the team box.
Layer 1: Personal (optional, per user). Each dev runs `claude login` if they have their own Claude Code-capable account (Claude Max or equivalent). State persists in `~<user>/.config/claude/`. Org doesn't manage it. Goes with them when they leave.

Layer 2: Org pool (org-owned, N keys). The org provisions N Claude Max accounts (e.g. 4 to start). The keys live in `hero_aibroker`'s hero_proc secret store as named slots: `ORG_CLAUDE_KEY_1`, `ORG_CLAUDE_KEY_2`, ... Driver account owns them. The org can audit, rotate, or revoke without touching anyone's personal accounts.

Assignment is declarative in `hero_team` via the optional `[claude_org_key]` block in `users/<name>.toml`. A PR is how org keys get assigned or reassigned. Auditable, GitOps, no shell required.

User-level toggle stored in their `dev_manifest.toml` at `[claude] preference`:

Default resolution order when `a` / `hero_dev agent` spawns a session:
- `personal` or `org`.
- `hero_team` if present (auto-resolves here when assignment exists).
- `~/.config/claude/` if `claude login` was done.
- `claude login` or ask Mik to assign you an org key."

Org-key access via `hero_aibroker`, new RPC:

Authorization: broker reads the freshest reconciled `hero_team`'s `users/<user>.toml` and returns the key only if `[claude_org_key].enabled = true` AND the requested `slot` matches `key_secret_name` (or is in `key_secret_names`). hero_claude_rust calls this on session spawn; the key is never persisted in the user's slot. Security property: an unprivileged user on the box cannot read another user's org key, because the broker enforces assignment and the slot's hero_claude_rust runs as that user's UID.

Multi-key-per-slot (deferred to v1.x). The schema reserves `key_secret_names = [...]` as a list. v1.x will teach hero_claude_rust to round-robin keys across parent + subagent processes to dodge per-account rate limits. v1 implementation honours only the first element if a list is given; `key_secret_name` (singular) is sugar for a one-element list and is the canonical form for v1.

Other AI calls (non-Claude-Code) keep going through `hero_aibroker` directly: `hero_agent`, `hero_embedder`, anything reading `AIBROKER_API_ENDPOINT`. Org-shared OpenAI / OpenRouter / Anthropic-direct keys live in the same broker secret store. Per-user `secrets.toml` may carry personal keys for non-Claude-Code providers if a dev wants them for their own experiments, but never org-shared keys.

Repo-by-repo work breakdown
Each item should become a child issue once this story is approved.
NEW repo: `lhumina_code/hero_team`
- `users/<founder>.toml` for each existing dev (port from current `/etc/passwd` on the v0 box).
- `manifest_templates/developer.toml` and `manifest_templates/integration.toml`.
- `users/integration.toml`.

`lhumina_code/hero_codescalers`
- `hero_team_sync` (or sibling repo per open question 1). Daemon per the contract above.
- `build.submit` per the contract above. Wires into existing `nu_exec` dispatch.
- `user.create` enhancements: provision `~/hero/cfg/dev_manifest.toml` from template; create per-user secrets repo via Forgejo API.
- `service team_sync` lifecycle to `hero_skills`.

`lhumina_code/hero_aibroker`
- `aibroker.claude_key.get` per the contract above.
- `hero_team` (cached in memory, refreshed on `hero_team_sync` events) for authorization.
- `ORG_CLAUDE_KEY_*` values in hero_proc secret store; admin RPCs to `set` / `rotate` / `revoke`.

`lhumina_code/hero_code` (workspace home of `hero_builder` and `hero_dev`)
- `hero_dev` per the CLI surface above.
- `hero_builder`: implement `--publish` flag per spec (uses Forgejo release API).
- `hero_builder`: implement `--from release` / `--from hash` modes (download release asset, install to `~/hero/bin/`, no-compile path).

`lhumina_code/hero_claude_rust`
- `hero_dev` can inject the resolved Claude credential.
- `claude login` flow in the README as Layer-1 fallback.

`lhumina_code/hero_router`
- `/<user>/<service>/...` prefix routing is wired and discovered automatically as users are provisioned. May need a small enhancement to read socket paths from hero_codescalers' user list rather than static config.
- `/hooks/forge` for inbound forge webhooks (HMAC-validated) → forwards to `hero_codescalers.webhook.handle`.

`lhumina_code/hero_skills`
- `service_complete` to delegate to `hero_dev start` or be removed (closes hero_skills#106).
- `init sync` similarly (closes hero_skills#107 if related; assess against #108–112).
- `nutools/modules/services/service_*.nu` modules to delegate to `hero_builder` per D-08 (currently still uses plain `cargo build`).

`lhumina_code/hero_demo`
- `bootstrap_droplet_source.sh` to (a) clone `hero_team`, (b) start `hero_team_sync`, (c) provision the integration slot first, (d) verify webhooks reach the box, (e) prompt operator to seed `ORG_CLAUDE_KEY_*` into hero_aibroker.
- `docs/ops/DEPLOYMENT.md` with the new flow.

Total: ~20 dev-days. Two-and-a-bit calendar weeks if focused.
Migration from #121 — command mapping
| v0 (#121) | v1 | Notes |
| --- | --- | --- |
| `init sync` | `hero_dev sync` | |
| `service_complete` | `hero_dev start` | |
| `service_complete --update` | `hero_dev sync && hero_dev start` | |
| `secrets_sync` | `hero_dev secrets sync` | |
| `secrets_edit` | `hero_dev secrets edit` | |
| `secrets push` | `hero_dev secrets push` | |
| `claude login` | | |
| `a 2` | `hero_dev agent --effort 2` | `a` aliased; resolves credential per AI key model |
| | `lhumina_code/hero_team` | |
| `forge worktree` | | `hero_dev switch` calls under the hood |

All v0 commands keep working as aliases for at least one quarter post-v1 to avoid breaking muscle memory.
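The one-to-one rows of the mapping can be expressed as a small lookup, which is roughly what a compatibility alias layer would need. This is a sketch under assumptions: `translate_v0` is a hypothetical helper, only the rows whose v1 form is stated above are included, and unknown commands pass through unchanged.

```python
# v0 command -> v1 equivalent, taken from the command mapping above.
V0_TO_V1 = {
    "init sync": "hero_dev sync",
    "service_complete": "hero_dev start",
    "service_complete --update": "hero_dev sync && hero_dev start",
    "secrets_sync": "hero_dev secrets sync",
    "secrets_edit": "hero_dev secrets edit",
    "secrets push": "hero_dev secrets push",
}


def translate_v0(command: str) -> str:
    """Return the v1 form of a v0 command, or the command unchanged if unmapped."""
    parts = command.split()
    normalized = " ".join(parts)  # collapse repeated whitespace
    # `a <N>` forwards to `hero_dev agent --effort <N>`.
    if len(parts) == 2 and parts[0] == "a" and parts[1].isdigit():
        return f"hero_dev agent --effort {parts[1]}"
    return V0_TO_V1.get(normalized, normalized)
```

For example, `translate_v0("a 2")` yields `hero_dev agent --effort 2`, while `claude login` is returned as-is because it remains a command in its own right.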
Acceptance criteria
A fresh DO droplet, bootstrapped via `bootstrap_droplet_source.sh`, with a freshly-seeded `lhumina_code/hero_team` containing 2-3 user files and `ORG_CLAUDE_KEY_1` seeded into `hero_aibroker`, must satisfy:
- `mosh alice@<box>` works after `hero_team_sync` reconciles; `~alice/hero` exists per `developer` template.
- `hero_dev status` shows alice's manifest with the template defaults; services running and reachable at `/<alice>/...`; claude column shows the resolved credential.
- `hero_dev switch hero_router development_mik --from release` swaps to a previously-published artifact without compiling.
- `git commit` in alice's worktree + `hero_dev rebuild hero_slides` rebuilds in <30s.
- `development` of any tracked repo triggers integration slot rebuild AND publishes a Forgejo release archive.
- `/admin` shows the correct grid for all slots; integration row green; claude column populated.
- `users/sara.toml` to `hero_team` (with `[claude_org_key].key_secret_name = "ORG_CLAUDE_KEY_2"`) → after merge, sara's slot exists, mosh-able, with starter manifest, `lhumina_code-private/secrets_sara` exists on forge, and `a 2` works for her without local `claude login`.
- `claude login` gets a clear error from `a 2` ("run `claude login` or ask Mik to assign you an org key").
- `hero_dev claude use personal` then `a 2` uses Layer-1 (verifiable via `hero_dev claude status`).
- `~/hero` archived to `~/hero/var/team_sync/archive/`.
- `from: release` artifacts immediately, org keys restored from operator-provided seed; total elapsed ≤45 minutes.

Validation plan
After the work breakdown lands on `development` across the relevant repos:
- `bootstrap_droplet_source.sh`.
- `ORG_CLAUDE_KEY_1` and `ORG_CLAUDE_KEY_2` into hero_aibroker.

Out of scope for v1 (explicit non-goals)
- `iroh-docs` P2P state replication. Stay on `sled` (single-host KV) until cross-machine becomes a hard requirement. The hero_codescalers README's iroh-docs claim is aspirational; we do not deliver it here.
- `key_secret_names = [...]`); implementation v1.x.
- `hero_claude_rust` extensions. Hooks, in-process MCP servers, bidirectional control protocol, permission callbacks: separate story. v1 keeps the existing one-shot session model plus credential injection.
- `claude` CLI today.
- `development`.

Open questions / decisions to make during implementation
These are deliberately not locked in this story; PR authors decide as they go.
- `hero_team_sync` home: dedicated crate inside `lhumina_code/hero_codescalers/crates/`, or sibling repo `lhumina_code/hero_team_sync`? Recommendation: co-located with hero_codescalers (tightly coupled to its RPCs, shares the `hero` Unix group prerequisite).
- `linux-musl-x86_64` only for v1, or also `linux-musl-arm64` from day 1? Recommendation: x86_64 only; matches D-07; arm64 added when first arm64 dev/CI machine appears.
- `hero_team` repo, or per-source-repo? Recommendation: per-source-repo HMAC for build webhooks; shared secret for the `hero_team` reconciler webhook.
- `auto_rebuild = true`) vs always-on. Recommendation: opt-in; predictable, and avoids accidental builds during interactive rebases.
- `ORG_CLAUDE_KEY_*` seeded into `hero_aibroker` on box rebuild? Manual operator step (paste the keys back in), or encrypted-at-rest backup? Recommendation: manual operator step for v1; keys are highest-sensitivity org assets, no automated backup; documented in the runbook.
- `[claude_org_key].key_secret_name` be free-form, or must match a value the org has explicitly created in `hero_aibroker`? Recommendation: must match; `hero_team_sync` validates on PR merge and refuses unknown slot names.

Standing rules in force
- `development` (except `hero_demo` doc-only). Per-PR squash-merge gate.
- `development_<user>` (no topic suffix). The `branch_suffix` in `users/<name>.toml` matches.
- `mik-tf`.
- `hero_builder` is the canonical build tool. v1 leans on this hard.
- `cargo fmt --check && cargo clippy --workspace --all-targets -- -D warnings && cargo build --workspace --release` before merging non-trivial PRs.

Linked issues
Predecessor:
Bug list explicitly in v1 acceptance:
Related:
Decisions:
Signed-off-by: mik-tf
Execution plan — 4 devs, one working week
Assuming this gets green-lit, ships in ~5 working days plus 1 buffer day. Each dev owns a vertical (one or two related crates, end-to-end), not a horizontal slice.
Four ownership lanes
Lane 1: Foundation + integration + validation. `lhumina_code/hero_team` (new repo seed), `hero_team_sync` daemon, `bootstrap_droplet_source.sh` extension, end-to-end integration debugging, DO droplet validation. Front-loads the work everyone else depends on; tail-loads into the integration role.

Lane 2: Build pipeline. `hero_builder --publish` and `--from release|hash`, `hero_codescalers.build.submit` RPC, `hero_router /hooks/forge` + manifest-aware webhook fan-out. The "publish → pull → run" path needs to compose seamlessly; one owner across the whole pipeline.

Lane 3: `hero_dev` CLI. The `hero_dev` crate end-to-end: every verb (status / switch / bind / unbind / rebuild / pull / peek / fork / logs / agent / claude use|status / doctor), every error message, the first-login script, v0-command aliases. The surface every dev touches every day: one owner, one voice.

Lane 4: AI keys + UI polish. `hero_aibroker.claude_key.get` (plus admin set/rotate/revoke), `hero_claude_rust` credential injection, fleet view in `hero_codescalers_admin`, `hero_skills` migration that closes hero_skills#106–#112. Lighter raw load, broader surface.

The week
`hero_dev` MVP, `hero_dev` verbs, `hero_dev` polish

Day 1 exit gate: PR adding `users/test_user.toml` produces a Linux account with `~/hero` provisioned.

Day 4 exit gates per lane:
- Lane 2: `hero_dev switch X Y --from release` swaps a binary without compiling.
- Lane 3: every verb works on a single-user slot.
- Lane 4: a user with `[claude_org_key]` runs `a 2` without local `claude login`.

Day 5: fresh DO droplet, acceptance criteria walked top to bottom.

Day 6: buffer for fix-forward; home#230/s84 precedent says first DO from-nothing run typically surfaces 2–3 unrelated bootstrap bugs.

Two forced coordination touchpoints

Day 0 kickoff (~2h). Lock the seven open questions in the issue body. Walk the manifest schema, `users/<name>.toml`, and the three RPC contracts line by line. Catches "wait, what does `build.submit` return?" before it costs a day.

End of Day 3 sync (~1h). Each lane reports against its exit gate. Specifically verify cross-lane shapes: `aibroker.claude_key.get` matches `hero_dev`'s call site; `build.submit` response shape matches webhook fan-out consumer.

Beyond those two, the spec above is the contract: devs read the issue, not each other.
Watchpoints
- `claude` CLI auth surface is opaque from outside; Lane 4's credential injection should be the first thing that lane verifies, not the last, since unknown unknowns can cost half a day there.
- `hero_dev fork` (Lane 3) needs Forgejo write access to create `development_<user>` branches programmatically; token-scope rules should be confirmed before designing the verb.
- `forge_release`, `hero_web_prefix`, `hero_proc_secrets_and_meta`) are mature: cite them, don't reinvent.

Bottom line
End of day 5 ships, day 6 buffers, week 2 opens with this issue closed. Biggest leverage on the timeline is whether the Day 0 kickoff is sharp — if everyone leaves it with the same picture in their head, the rest is execution.
Signed-off-by: mik-tf
needs to be redone