feat/shim-and-cgroups #13

Closed
thabeta wants to merge 3 commits from feat/shim-and-cgroups into development
Owner
No description provided.
Port from the mos_init codebase to my_init.

Introduce zinit-shim, a long-lived per-service process monitor that
holds stdout/stderr pipes across zinit-server restarts. When the server
restarts, it reconnects to running shims via Unix sockets and resumes
log streaming.
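
The reconnect flow described above can be sketched roughly as follows. This is a minimal illustration, not the zinit_shim implementation: `Backlog`, `serve_backlog`, and `reconnect_and_read` are hypothetical names, and the sketch replays a buffered backlog and then closes the connection, whereas a real shim would presumably keep streaming live output.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::sync::{Arc, Mutex};
use std::thread;

/// Lines captured from the service's stdout/stderr (hypothetical type).
type Backlog = Arc<Mutex<Vec<String>>>;

/// Shim side: accept server connections and replay the backlog to each one.
/// The shim outlives the server, so the buffer survives server restarts.
fn serve_backlog(listener: UnixListener, backlog: Backlog) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for conn in listener.incoming() {
            let Ok(mut stream) = conn else { break };
            // Snapshot the buffer so the lock is not held while writing.
            let lines = backlog.lock().unwrap().clone();
            for line in lines {
                if writeln!(stream, "{line}").is_err() {
                    break; // the server went away again; wait for the next connect
                }
            }
            // Dropping `stream` closes the connection, so the reader sees EOF.
        }
    })
}

/// Server side: (re)connect to a shim socket and read everything it replays.
fn reconnect_and_read(socket: &Path) -> std::io::Result<Vec<String>> {
    let stream = UnixStream::connect(socket)?;
    BufReader::new(stream).lines().collect()
}
```

Because the listening socket and the buffer both live in the shim process, a restarted server only needs the socket path to pick the logs up again.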

Add cgroup v2 process tracking for ALL service types. The server creates
a per-service cgroup before spawning, assigns the child to it, and uses
cgroup.kill for reliable stop escalation. This closes the double-fork
escape (descendants inherit the cgroup, so a daemonized process stays
tracked) and makes the kill atomic.
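
As a rough sketch of the create / assign / kill sequence above: the function names and the `root` parameter are assumptions made for illustration, not the contents of cgroup.rs. The file names `cgroup.procs` and `cgroup.kill` are the standard cgroup v2 interface files; `cgroup.kill` requires Linux 5.14 or newer.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Create `<root>/<service>` and return its path (hypothetical helper).
/// On a real host `root` would be a writable subtree of /sys/fs/cgroup.
fn create_service_cgroup(root: &Path, service: &str) -> io::Result<PathBuf> {
    let dir = root.join(service);
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Move a process into the cgroup by writing its PID to cgroup.procs.
/// Children forked afterwards inherit the membership, including
/// double-forked daemons.
fn assign_pid(cgroup: &Path, pid: u32) -> io::Result<()> {
    fs::write(cgroup.join("cgroup.procs"), pid.to_string())
}

/// Ask the kernel to SIGKILL every process in the cgroup at once.
fn kill_all(cgroup: &Path) -> io::Result<()> {
    fs::write(cgroup.join("cgroup.kill"), "1")
}
```

Taking the root as a parameter keeps the helpers easy to exercise against a temporary directory, without touching the real cgroup filesystem.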

New crate: zinit_shim (crates/zinit_shim/)
New module: cgroup (crates/zinit_server/src/cgroup.rs)
New module: shim protocol (crates/zinit_sdk/src/shim.rs)

Config additions:
- service.shell (bool, default true): set to false to skip the /bin/sh -c wrapper
- [resources] section: memory_max, cpu_weight (cgroup v2 limits)
- Port ~40 dependency graph edge-case tests from mos_init covering after,
  wants, conflicts, requires, mixed deps, cycles, removal, deep/wide
  graphs, diamond patterns, state transitions, and missing soft deps
- Add mkroot.sh and run.sh scripts for rootfs building and Docker/virtiofs
  launching
- Update Dockerfile: Alpine 3.23, zinit_shim binary, shim socket dir
- Rename zinit-shim binary to zinit_shim for naming consistency across
  all workspace binaries
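
For the [resources] options listed above, one plausible mapping onto cgroup v2 interface files looks like this. It is a sketch under assumptions: the `Resources` struct and `cgroup_writes` are hypothetical, and `memory_max` is assumed to be a byte count (the actual config may accept other forms). `memory.max` and `cpu.weight` are the standard cgroup v2 files, and `cpu.weight` accepts values from 1 to 10000.

```rust
/// Hypothetical mirror of the `[resources]` config section.
#[derive(Debug, Default)]
struct Resources {
    memory_max: Option<u64>, // assumed to be bytes
    cpu_weight: Option<u32>, // cgroup v2 accepts 1..=10000
}

/// Translate configured limits into (cgroup file name, value) pairs.
/// The caller would write each value into the service's cgroup directory.
fn cgroup_writes(res: &Resources) -> Result<Vec<(&'static str, String)>, String> {
    let mut writes = Vec::new();
    if let Some(bytes) = res.memory_max {
        writes.push(("memory.max", bytes.to_string()));
    }
    if let Some(weight) = res.cpu_weight {
        // Reject out-of-range weights up front instead of letting the
        // kernel refuse the write later.
        if !(1..=10_000).contains(&weight) {
            return Err(format!("cpu_weight {weight} is outside 1..=10000"));
        }
        writes.push(("cpu.weight", weight.to_string()));
    }
    Ok(writes)
}
```

Unset options produce no writes, which leaves the kernel defaults in place.
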
feat: integrate youki libcgroups stats for detailed process observability
Some checks failed
Build and Test / build (pull_request) Failing after 1m21s
Tests / test (pull_request) Failing after 1m24s
780283e283
Vendor cgroup v2 stats-reading code from youki/libcgroups (Apache-2.0)
to provide comprehensive per-service observability: CPU usage/throttling/PSI,
memory current/peak/swap/cache/OOM events/PSI, block IO per-device bytes/ops/PSI,
and PID counts — all without pulling in youki's heavy dependency chain.

- Add cgroup_stats.rs: ~450 lines of pure parsing, 14 unit tests
- Add service.stats_detailed RPC method and GET /api/services/{name}/stats endpoint
- Add DetailedServiceStats SDK response types with PSI support
- Add stats_detailed() to both sync and async client libraries
- Refactor read_usage() to delegate to cgroup_stats::read_all_stats()
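
To illustrate the kind of pure parsing involved, here is a minimal sketch for the PSI files (cpu.pressure, memory.pressure, io.pressure), which contain lines of the form `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`. It is not the vendored libcgroups code; `Psi`, `PsiLine`, `parse_psi`, and `num` are hypothetical names.

```rust
use std::str::FromStr;

#[derive(Debug, Default, PartialEq)]
struct PsiLine {
    avg10: f64,
    avg60: f64,
    avg300: f64,
    total: u64, // cumulative stall time in microseconds
}

#[derive(Debug, Default, PartialEq)]
struct Psi {
    some: PsiLine,
    full: PsiLine,
}

/// Parse one numeric value, reporting which key it belonged to on failure.
fn num<T: FromStr>(key: &str, value: &str) -> Result<T, String> {
    value.parse().map_err(|_| format!("bad value for {key}: {value}"))
}

/// Parse the contents of a `*.pressure` file. Lines that are absent
/// simply leave the corresponding fields at zero.
fn parse_psi(content: &str) -> Result<Psi, String> {
    let mut psi = Psi::default();
    for line in content.lines() {
        let mut fields = line.split_whitespace();
        let target = match fields.next() {
            Some("some") => &mut psi.some,
            Some("full") => &mut psi.full,
            _ => continue, // ignore blank or unknown lines
        };
        for field in fields {
            let (key, value) = field
                .split_once('=')
                .ok_or_else(|| format!("malformed field: {field}"))?;
            match key {
                "avg10" => target.avg10 = num(key, value)?,
                "avg60" => target.avg60 = num(key, value)?,
                "avg300" => target.avg300 = num(key, value)?,
                "total" => target.total = num(key, value)?,
                _ => {} // tolerate keys added by newer kernels
            }
        }
    }
    Ok(psi)
}
```

Working on a `&str` rather than a file path keeps the parser testable without a cgroup filesystem; a caller would read the file and pass its contents in.
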
Author
Owner

done in https://forge.ourworld.tf/geomind_code/my_init/commit/f30b1c3d94e0f359bd20a1d7717cc3aa8c6474dd
thabeta closed this pull request 2026-03-19 21:36:27 +00:00
Reference
geomind_code/my_init!13