[infra] build perf quick wins — default -j auto, wire sccache, conditional nice/ionice #188

Closed
opened 2026-05-01 19:40:12 +00:00 by mik-tf · 1 comment

## Summary

Three small changes in `tools/modules/services/lib.nu` that significantly speed up `service_install_all` for everyone. None of them are structural — just defaults that haven't kept up with the box's actual capacity. Together they cut typical build cycles by **2-3×** before any work on the bigger CI-artifacts story ([hero_demo#54](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/54)).

Verified live during the 2026-05-01 herodemo deploy: setting `HERO_CARGO_JOBS=0` mid-deploy went from load avg 1.14 (1 active rustc) to 15.22 (10+ active rustc), with iowait staying at 0-4%. The box is CPU-bound, and we were leaving ~75% of CPU on the floor.

## A. Default `HERO_CARGO_JOBS` to 0 (auto = nproc)

**File**: `tools/modules/services/lib.nu` (`svc_install` helper, around line ~620 where `HERO_CARGO_JOBS` is read).

**Current**:

```nu
let jobs = ($env | get -o HERO_CARGO_JOBS | default "4" | into int)
```

**Proposed**:

```nu
# Default to 0 = let cargo auto-detect (nproc). Override via HERO_CARGO_JOBS=N
# to clamp on smaller machines or to leave headroom for other workloads.
let jobs = ($env | get -o HERO_CARGO_JOBS | default "0" | into int)
```

**Rationale**: the existing `4` was set when hero_proc had a chatty SQLite log that fought cargo for I/O bandwidth. With the SQLite log backend replaced by `hero_log` (a2eff7c), that contention is gone. Modern Hero deploy targets (herodemo: 16 CPU; CI runners: similar) underutilise at `-j 4`. Defaulting to 0 (cargo auto = nproc) lets every box use all its cores. Operators on smaller VMs can still cap with `HERO_CARGO_JOBS=4` explicitly.

**Win**: 2-4× build speedup on 16-core boxes. Free.
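Note that cargo's own default with no `-j` flag is already one job per logical CPU, so one way to honour the 0 sentinel is simply to omit the flag. A hypothetical helper (name and call site are illustrative, not the actual lib.nu code):

```nu
# Hypothetical helper: translate the HERO_CARGO_JOBS sentinel into cargo args.
# 0 means "omit -j and let cargo default to one job per logical CPU";
# any positive N is passed through as an explicit cap.
def jobs-args [jobs: int] {
    if $jobs > 0 { ["-j", ($jobs | into string)] } else { [] }
}

# e.g. later: cargo build --release ...(jobs-args $jobs)
```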

## B. Wire `sccache` into `cargo build` env

**Goal**: shared compile cache across all 22+ services. Today every repo recompiles its own copy of axum, tokio, serde, hyper, rustls, etc. — 50+ shared deps × 22 services = enormous redundant work.

**Mechanism**: set `RUSTC_WRAPPER=sccache` in the env passed to `^nice` in `svc_install`. sccache transparently caches each rustc invocation by input hash; identical compiles short-circuit to a copy.

**Prerequisites** (mostly already done in this org):

- `sccache` binary installed on deploy targets — already present per `sccache.nu` skill in hero_skills
- A cache backend — local disk default (`~/.cache/sccache`) is fine for single-host; can move to redis/s3 later
- `~/.config/sccache/config` with reasonable size cap (e.g. 50G — much smaller than the 87G cargo target dir we've seen)
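For reference, a minimal config along these lines (dir path and cap are illustrative; per sccache's docs the disk `size` is given in bytes, and the `SCCACHE_CACHE_SIZE` env var alternatively accepts suffixed values like `50G`):

```toml
# ~/.config/sccache/config — local disk cache with a 50 GiB cap
[cache.disk]
dir = "/home/hero/.cache/sccache"   # illustrative path
size = 53687091200                  # 50 GiB, in bytes
```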

**File**: `tools/modules/services/lib.nu`, in `svc_install` where the build is invoked. Add `RUSTC_WRAPPER=sccache` to the env record passed to `^nice`. Also condition on `which sccache` so the build still works on hosts without sccache.

```nu
# Add sccache wrapper if available — shared compile cache across services
let sccache_env = if (which sccache | is-not-empty) {
    {RUSTC_WRAPPER: "sccache"}
} else {
    {}
}
```

**Win on first deploy after population**: ~60-80% reduction on shared deps. The first build that populates sccache pays the same cost as today; every subsequent build benefits. Win compounds across services within a single `service_install_all` run since each service shares ~80% of its dep tree with neighbours.
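At the invocation site, the optional record can be merged into the build's environment — a sketch, assuming the build is launched roughly like this (the exact call in `svc_install` may differ):

```nu
# Hypothetical build launch: with-env takes a record, and an empty
# record is a harmless no-op, so hosts without sccache are unaffected.
with-env $sccache_env {
    ^nice -n 19 ionice -c 3 cargo build --release
}
```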

**Caveats**:

- `sccache` and `RUSTC_WRAPPER` interact with build.rs / proc-macros in subtle ways for some crates. Test on a few services before defaulting on.
- Cache disk usage grows. Hero already has `~/.cache/sccache` size capping per `sccache.nu`; verify the cap is reasonable.

## D. Make `nice` / `ionice` flags conditional

**Today**: every cargo build is wrapped in `nice -n 19 ionice -c 3 cargo build ...` unconditionally. This is correct for "deploying onto a live production box without disrupting users" but pointless for "fresh deploy where nothing else is running on the box."

Verified live tonight: with no other heavy workloads, `ionice -c 3` is essentially a no-op (no I/O contention to yield to), and `nice 19` slightly slows the build (yields to even minor processes). Neither hurts, but they're cosmetic in the fresh-deploy case.

**Proposed**: a `--low-priority` flag on `service_install_all` (and propagated to `svc_install`) that wraps with nice/ionice. Default OFF for `service_install_all` (fresh deploy assumption); ON for `service_complete --update` if the operator passes it explicitly.

**File**: `tools/modules/services/lib.nu` and `tools/modules/services/packages.nu`.

```nu
# in svc_install signature:
def svc_install [..., --low-priority] {
    ...
    let nice_prefix = if $low_priority {
        ["nice", "-n", $nice_n, "ionice", "-c", $ionice_c]
    } else {
        []
    }
    # use $nice_prefix as the prefix to ^cargo; empty list = direct invocation
}
```
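One way to consume such a prefix in nushell (a sketch; the real call site may differ) is to build the full argv and dispatch it as a single external invocation:

```nu
# Hypothetical dispatch: prepend the (possibly empty) priority prefix
# to the cargo command and run the result as one external command.
let argv = ($nice_prefix ++ ["cargo", "build", "--release"])
run-external ($argv | first) ...($argv | skip 1)
```

With an empty `$nice_prefix` this degenerates to a direct `cargo build`, which is the intent of the default-OFF behaviour.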

**Win**: small. Mostly a clarity-of-intent change — explicit about when we're being polite to running services vs when we're going as fast as possible.

## Combined ROI

For a fresh `service_install_all` on a 16-core box with cargo cache cold:

```
Today:                  ~30-60 min  (j=4, no sccache, nice 19, ionice 3)
After A:                ~10-20 min  (j=auto, no sccache)
After A+B (1st run):    ~10-20 min  (sccache populating, no win yet)
After A+B (2nd+ run):    ~3-8 min   (sccache hits 60-80%)
After A+B+D:             ~3-8 min   (same as A+B; D is for live deploys)
```

For typical "redeploy single service" cycles, after sccache is warm: **~30 sec to 2 min per service** vs today's 30-60 sec to 5-10 min. An order-of-magnitude better dev-iteration cycle.

## Out of scope

- **Relaxing `lto = true` + `codegen-units = 1`** in release profile — covered by a separate proposed `--debug` install path issue (the next one I'm filing). Those flags are correct for production binaries; relaxing them is for dev-iteration only.
- **CI-built artifacts** — covered by [hero_demo#54](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/54). That's the structural fix; this issue is intermediate wins on the way there.

## Cross-refs

- [hero_demo#54](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/54) — full CI-artifacts story (structural)
- [hero_demo#55](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/55) — post-deploy verify scripts
- `tools/modules/sccache.nu` skill — already exists in hero_skills, just not wired into builds yet

## Validation

Live demonstration of A from tonight's herodemo deploy:

```
Before (HERO_CARGO_JOBS=4):
  load avg: 1.14, 2.05, 2.77
  active rustc workers: 1
  user CPU: 6%

After (HERO_CARGO_JOBS=0):
  load avg: 15.22, 8.20, 5.02
  active rustc workers: 10+
  user CPU: 74-79%
  iowait: 0-4% (confirms not I/O bound)
```

Same change as a default would benefit every Hero deploy.
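The figures above come from stock tooling and can be re-checked during any build; a nushell sketch (Linux-only, since it reads `/proc`; `sys cpu` assumes a reasonably recent nushell):

```nu
# 1/5/15-minute load averages — should climb toward nproc while cargo saturates cores
open /proc/loadavg

# Count of active rustc workers
ps | where name =~ "rustc" | length

# Logical CPU count, for comparison against the load average
sys cpu | length
```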


Merged. Closing — defaults are now `HERO_CARGO_JOBS=0` (auto = nproc), `HERO_CARGO_NICE=0`, `HERO_CARGO_IONICE_C=""`, plus `HERO_CARGO_SCCACHE=auto`. Operators on live boxes opt back into politeness via `HERO_CARGO_NICE=19 HERO_CARGO_IONICE_C=3`.

Next deploy after the current herodemo run will be the first to use the new defaults — should reproduce the 13× load-avg gain we observed live during this session.

Reference
lhumina_code/hero_skills#188