rework logging #89
Reference
lhumina_code/hero_proc#89
see /hero_log
use this log library
keep the server interface, but use this logging library at the back end
no more sqlite
Implementation Spec for Issue #89
Objective
Replace the SQLite-backed log subsystem (partitioned per-day shards under
`~/hero/var/hero_proc/logs/<ctx>/<year>/<day>/logs.sqlite`) with the file-based `herolib_core::logger` (TSV records under `~/hero/var/logs/<src>/...log`). Keep every existing RPC method signature stable so SDK callers, the CLI, the Dioxus admin app, and downstream services keep working unchanged.

Requirements
- `log.query`, `log.count`, `logs.get`, `logs.tail`, `logs.filter`, `logs.count`, `logs.sources`
- `logs.insert`, `logs.insert_batch`
- `logs.delete_by_src`, `logs.delete_older_than`
- `job.logs`, `job.logs_attempt`, `job.log_archive`
- `herolib_core::logger::Logger` (`~/hero/var/logs`, default `LoggerConfig`, `DropDebugFirst` overflow).
- `LogEntry` / `LogFilter` types used by the server and SDK keep the same field names so generated SDK structs are unchanged.
- `rusqlite`, `partition_*`, `PartitionedLogStore`, and the `flate2` import inside `db/logs/` must go away. `rusqlite` stays in `hero_proc_lib` for the other tables.
- `log_batcher.rs` becomes a thin `Arc<Logger>` wrapper exposing `.send(LogEntry)` for compatibility (writes via `logger.log(entry.into())`).
- `src` routing: encode `job_id` as the third segment when present (e.g. `core.deploy.42`); when absent use `<context>.<action>` capped at 3 segments. Service-emitted `src` values from `logs.insert` are passed through after sanitisation.
- `tags` carry the `attempt:N`, `stream:stderr`, and any user-supplied tags so `job.log_archive` can still partition by attempt.
- `main.rs` calls `logger.cleanup_before("", cutoff_epoch)` instead of `db.logging.delete_older_than`.

Files to Modify/Create
New
- `crates/hero_proc_lib/src/db/logs/adapter.rs`: translation between hero_log and the existing `LogEntry` / `LogFilter` shapes.
- `crates/hero_proc_lib/src/log_store.rs`: replacement `LoggingApi` backend wrapping `Arc<Logger>`.

Modify
- `Cargo.toml` (workspace): add `herolib_core` workspace dep.
- `crates/hero_proc_lib/Cargo.toml`: wire `herolib_core`, drop `flate2`.
- `crates/hero_proc_lib/src/db/logs/mod.rs`: drop `partition`, `pool`, `store` modules and SQLite tests; re-export new adapter module.
- `crates/hero_proc_lib/src/db/logs/model.rs`: delete `open_log_db` / `init_schema`; keep `LogEntry`, `LogFilter`, `name_fix`.
- `crates/hero_proc_lib/src/db/factory.rs`: replace `PartitionedLogStore` with `Logger`-backed store. Add `pub fn logger()` accessor.
- `crates/hero_proc_server/src/log_batcher.rs`: replace SQLite drain loop with `Arc<Logger>` wrapper.
- `crates/hero_proc_server/src/rpc/log.rs`: RPC dispatch logic stays identical.
- `crates/hero_proc_server/src/rpc/job.rs`: `job.logs` filtering switches to `job:<id>` tag.
- `crates/hero_proc_server/src/supervisor/executor.rs`: rewrite log emission to use dotted src + tags.
- `crates/hero_proc_server/src/main.rs`: open `Arc<Logger>` once; switch cleanup loop to `logger.cleanup_before`.
- `crates/hero_proc_server/openrpc.json`: update `LogFilter.src` doc, document `logid: 0`.
- `crates/hero_proc_lib/src/db/logs/pool.rs`, `partition.rs`, `store.rs`: DELETE.
- `hero_proc_integration_test` and `db/integration_tests.rs`: drop SQLite-specific assertions.

Implementation Plan
Step 1: Add herolib_core dependency
Files: `Cargo.toml` (workspace), `crates/hero_proc_lib/Cargo.toml`
- `herolib_core = { git = "https://forge.ourworld.tf/lhumina_code/hero_lib.git", branch = "development", default-features = false }`.
- `hero_proc_lib`. Drop `flate2`.
- `cargo check -p hero_proc_lib`.
Dependencies: none
Step 2: Build the hero_log adapter
Files: `crates/hero_proc_lib/src/db/logs/adapter.rs` (new)
- `to_hero_entry(LogEntry) -> herolib_core::logger::LogEntry`: maps loglevel, builds dotted src `<ctx>.<src>[.<job_id>]`, copies tags + adds `error:1` / `attempt:N` / `stream:...`.
- `from_hero_entry(...)`: split dotted src back to `context_name` / `job_id`, read `error` from tags, `logid = 0`.
- `LogFilter -> LogQuery`: prefix from `src` + `context_name`, between(`epoch_from`, `epoch_to`), with_tags, limit. `loglevel_max` and `error_only` post-filtered. `offset` applied client-side. `job_id` / `job_ids` -> tag `job:<id>`.
Dependencies: Step 1
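The read-side half of the adapter can be sketched roughly as follows. This is not the actual `adapter.rs`: the `Entry` struct, the `split_src` and `post_filter` names, the numeric `u8` level, the `u64` job id, and the choice to apply `limit` after post-filtering are all assumptions made for illustration. Only the behaviour (split dotted src, read `error` from the `error:1` tag, post-filter `<= max`, apply offset on the caller side) comes from the spec above.

```rust
/// Local stand-in for a stored log record (the real `LogEntry` has more fields).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    src: String,
    loglevel: u8, // assumption: levels are small integers
    tags: Vec<String>,
}

/// Split a dotted src into (context_name, job_id).
/// Assumes the first segment is the context and a numeric third segment is the job id.
fn split_src(src: &str) -> (String, Option<u64>) {
    let mut parts = src.split('.');
    let context = parts.next().unwrap_or("").to_string();
    // `nth(1)` skips the second segment and yields the third, if any.
    let job_id = parts.nth(1).and_then(|s| s.parse().ok());
    (context, job_id)
}

/// Apply the filters the backend query cannot express, then offset and limit.
fn post_filter(
    entries: Vec<Entry>,
    loglevel_max: Option<u8>,
    error_only: bool,
    offset: usize,
    limit: usize,
) -> Vec<Entry> {
    entries
        .into_iter()
        // Keep entries whose level is <= max when a maximum is given.
        .filter(|e| loglevel_max.map_or(true, |max| e.loglevel <= max))
        // `error_only` is derived from the `error:1` tag written at insert time.
        .filter(|e| !error_only || e.tags.iter().any(|t| t == "error:1"))
        .skip(offset)
        .take(limit)
        .collect()
}
```

One consequence worth noting: if `limit` is also pushed down into the backend query, a post-filtered page can come back shorter than requested, so a real implementation may need to over-fetch.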
Step 3: Rewrite the LoggingApi to use hero_log
Files: `crates/hero_proc_lib/src/log_store.rs` (new), `crates/hero_proc_lib/src/db/factory.rs`
- `LogStore { logger: Arc<Logger> }` exposing same methods (`insert` / `insert_batch` / `query` / `count` / `list_sources` / `delete_by_src` / `delete_older_than`).
- `HeroProcDb::new` builds `Arc<Logger>` once. Add `pub fn logger(&self) -> Arc<Logger>` accessor.
- `LoggingApi` method signatures byte-identical.
Dependencies: Step 2
Step 4: Delete SQLite log code
Files: delete `pool.rs`, `partition.rs`, `store.rs` under `db/logs/`. Trim `db/logs/mod.rs` and `db/logs/model.rs` to keep only `LogEntry`, `LogFilter`, `name_fix`.
Dependencies: Step 3
Step 5: Replace log_batcher with hero_log shim
Files: `crates/hero_proc_server/src/log_batcher.rs`
- `LogBatcherSender` wraps `Arc<Logger>`. `.send(LogEntry)` -> adapter -> `logger.log(...)`.
Dependencies: Step 3
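The shape of such a shim might look like the sketch below. Because the `herolib_core::logger::Logger` API is not shown in this issue, the sketch substitutes a hypothetical `LogSink` trait and an in-memory `MemorySink` for it; `HeroEntry` and the two-field `LogEntry` are likewise simplified stand-ins. Only `LogBatcherSender`, `.send(LogEntry)`, and the `entry.into()` conversion are taken from the spec.

```rust
use std::sync::{Arc, Mutex};

/// Simplified local entry (the real `LogEntry` carries more fields).
struct LogEntry {
    src: String,
    message: String,
}

/// Stand-in for the backend's entry type.
struct HeroEntry {
    src: String,
    message: String,
}

/// The adapter step: local entry -> backend entry.
impl From<LogEntry> for HeroEntry {
    fn from(e: LogEntry) -> Self {
        HeroEntry { src: e.src, message: e.message }
    }
}

/// Hypothetical trait standing in for the real logger's `log` method.
trait LogSink: Send + Sync {
    fn log(&self, entry: HeroEntry);
}

/// Thin wrapper that keeps the old `.send(..)` call shape for existing callers.
#[derive(Clone)]
struct LogBatcherSender {
    logger: Arc<dyn LogSink>,
}

impl LogBatcherSender {
    fn new(logger: Arc<dyn LogSink>) -> Self {
        Self { logger }
    }

    /// No channel and no batching: convert and hand straight to the logger.
    fn send(&self, entry: LogEntry) {
        self.logger.log(entry.into());
    }
}

/// In-memory sink used only to demonstrate the call path.
#[derive(Default)]
struct MemorySink {
    lines: Mutex<Vec<String>>,
}

impl LogSink for MemorySink {
    fn log(&self, entry: HeroEntry) {
        // Tab-separated, loosely mirroring a TSV record.
        self.lines
            .lock()
            .unwrap()
            .push(format!("{}\t{}", entry.src, entry.message));
    }
}
```

Keeping the sender `Clone` means call sites that previously cloned a channel sender can keep doing so without changes.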
Step 6: Wire executor and main.rs to dotted-src + tags
Files: `crates/hero_proc_server/src/supervisor/executor.rs`, `crates/hero_proc_server/src/main.rs`
- `job_log_src(ctx, action, job_id)` and `job_log_tags(job_id, attempt, stream)`.
- `error: bool` stays.
- `db.logger().cleanup_before("", cutoff)`.
Dependencies: Step 5
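A minimal sketch of the two helpers named in this step, assuming a `u64` job id, an optional `u32` attempt, and a sanitisation rule (ASCII alphanumerics, `_`, and `-` are kept, everything else becomes `_`) that is invented here for illustration. The parameter types and the `sanitise_segment` helper are not taken from the real `executor.rs`.

```rust
/// Replace characters that would break dotted-src routing.
/// Assumption: only ASCII alphanumerics, `_` and `-` are safe inside a segment.
fn sanitise_segment(raw: &str) -> String {
    raw.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == '-' { c } else { '_' })
        .collect()
}

/// Build `<context>.<action>[.<job_id>]`, never more than three segments.
fn job_log_src(ctx: &str, action: &str, job_id: Option<u64>) -> String {
    // Dots inside ctx/action are sanitised away, so the segment count stays bounded.
    let mut src = format!("{}.{}", sanitise_segment(ctx), sanitise_segment(action));
    if let Some(id) = job_id {
        src.push('.');
        src.push_str(&id.to_string());
    }
    src
}

/// Build the tag set: `job:<id>`, `stream:<name>`, plus `attempt:<n>` when retrying.
fn job_log_tags(job_id: u64, attempt: Option<u32>, stream: &str) -> Vec<String> {
    let mut tags = vec![format!("job:{job_id}"), format!("stream:{stream}")];
    if let Some(n) = attempt {
        tags.push(format!("attempt:{n}"));
    }
    tags
}
```

For example, `job_log_src("core", "deploy", Some(42))` yields `core.deploy.42`, matching the example in the requirements.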
Step 7: Adjust job-scoped log RPC to use tags
Files: `crates/hero_proc_server/src/rpc/job.rs`, `crates/hero_proc_server/src/rpc/log.rs`
- `job.logs`: builds `LogFilter { job_id: Some(id), .. }`. Adapter translates to `with_tags(["job:<id>"])`.
- `job.log_archive`: keys per-attempt off `attempt:N` tag.
- `build_filter` accepts the same RPC params.
Dependencies: Step 6
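Partitioning by the `attempt:N` tag could be done along these lines. The `Entry` struct, the function names, and in particular `DEFAULT_ATTEMPT = 1` (the attempt assigned to entries that carry no `attempt:` tag) are assumptions for this sketch; the issue does not say how untagged entries are numbered.

```rust
use std::collections::BTreeMap;

/// Simplified record: only the fields needed for grouping.
struct Entry {
    tags: Vec<String>,
    message: String,
}

/// Assumption: entries without an `attempt:` tag belong to the first attempt.
const DEFAULT_ATTEMPT: u32 = 1;

/// Read the attempt number from an `attempt:<n>` tag, if present and numeric.
fn attempt_of(tags: &[String]) -> u32 {
    tags.iter()
        .find_map(|t| t.strip_prefix("attempt:"))
        .and_then(|n| n.parse::<u32>().ok())
        .unwrap_or(DEFAULT_ATTEMPT)
}

/// Group entries by attempt, preserving their original order within each group.
fn partition_by_attempt(entries: Vec<Entry>) -> BTreeMap<u32, Vec<Entry>> {
    let mut out: BTreeMap<u32, Vec<Entry>> = BTreeMap::new();
    for e in entries {
        out.entry(attempt_of(&e.tags)).or_default().push(e);
    }
    out
}
```

A `BTreeMap` keeps attempts in ascending order, which is convenient when the archive is rendered attempt by attempt.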
Step 8: Update OpenRPC, SDK regen, and CLI
Files: `crates/hero_proc_server/openrpc.json`, `crates/hero_proc_sdk/src/factory.rs`, `crates/hero_proc/src/cli/commands.rs`
- `LogFilter.src` is dotted-prefix; `logid` always 0.
Dependencies: Step 7
Step 9: Tests
Files: `crates/hero_proc_integration_test/src/tests/logs.rs`, `parallel_jobs_logging.rs`, `db/factory.rs` tests
- `logid > 0` checks. Convert glob `*` mid-string to dotted prefix. Add adapter unit tests.
Dependencies: Step 8
Step 10: Cleanup, docs, full build
Files: `crates/hero_proc_server/instructions.md`, openrpc.json doc strings, `hero_proc_sdk/src/logger.rs` doc tweaks
- `cargo build --workspace`, `cargo test --workspace` pass.
Dependencies: Step 9
Acceptance Criteria
- No `logs.sqlite` files.
- `herolib_core::logger` is the single backend; logs land under `~/hero/var/logs/`.
- `cargo build --workspace` passes; `cargo test --workspace` passes (excluding `--ignored`).
- `hero_proc log query|filter|prune|export` and `hero_proc job logs` produce same output shape.
- `logs.insert` returns `{"logid": 0}`.
- `logs.sources` returns non-empty after writes; `logs.delete_older_than` cleans stale sources.
- `job.log_archive` partitions by attempt via `attempt:N` tags.

Notes: semantics that move
- `*` -> prefix match. Mid-string `*` not supported; adapter post-filters in Rust. CLI/UI only produce trailing-`*` today.
- `min_level`; we post-filter `<= max`.
- `error:1` on writes; post-filter on read (with_tags requires ALL match).
- `job:<id>` tag. Tag drives filtering.
- `cleanup_before` is whole-file. Boundary-day partial deletes are coarser. Non-issue for daily 7-day loop.
- `cleanup_before(prefix, u32::MAX)`: wipes the whole subtree.
- `logs.insert` over RPC. No client-side change.
- `~/hero/var/hero_proc/logs/*` orphaned; document manual cleanup. (3) No downstream services depend on partition-day directories.

Test Results
Crates with significant test counts
Implementation Complete
The logging subsystem has been reworked end-to-end. SQLite is no longer the log backend; everything now flows through `herolib_core::logger` (file-based TSV records under `~/hero/var/logs/`). All RPC method names, parameters, result envelopes, and SDK type names are byte-identical, so existing callers (CLI, Dioxus admin app, downstream services) keep working unchanged.

What changed
New files
- `crates/hero_proc_lib/src/db/logs/adapter.rs`: bidirectional translation between the local `LogEntry` / `LogFilter` and `herolib_core::logger`'s entry/query types, plus post-filter helpers for semantics that hero_log does not natively support (`error_only`, `loglevel_max`, offset).
- `crates/hero_proc_lib/src/db/logs/log_store.rs`: replacement `LoggingApi` wrapping `Arc<Logger>`. Same public method signatures as the old SQLite-backed one.

Deleted
- `crates/hero_proc_lib/src/db/logs/store.rs`, `partition.rs`, `pool.rs`: the old `PartitionedLogStore`, day-partitioning, and per-shard SQLite pool.
- `flate2` dep from `hero_proc_lib/Cargo.toml` (only used by the deleted gzip export/import paths).
- `rusqlite` stays; it is still the backend for jobs, runs, services, secrets, and actions.

Modified
- `Cargo.toml` (workspace) and `hero_proc_lib/Cargo.toml`: wired `herolib_core` as a workspace dep.
- `hero_proc_lib/src/db/factory.rs`: `HeroProcDb::new` constructs `Arc<Logger>` once; new `pub fn logger(&self) -> Arc<Logger>` accessor; `db.logging` now points at the hero_log-backed `LoggingApi`.
- `hero_proc_lib/src/db/logs/mod.rs` and `model.rs`: dropped deleted submodules; stripped `open_log_db` / `init_schema` / rusqlite imports/tests; kept `LogEntry`, `LogFilter`, `name_fix`. `LogFilter.src` doc updated to dotted-prefix semantics.
- `hero_proc_server/src/log_batcher.rs`: full rewrite. `LogBatcherSender` is now a thin shim around `Arc<Logger>`. The internal MPSC channel, drain task, periodic flush, and dropped-line warning are all gone (hero_log already has its own non-blocking writer). `start()` signature unchanged; the JoinHandle is now a no-op task.
- `hero_proc_server/src/supervisor/executor.rs`: every `LogEntry { ... }` construction site now writes a dotted `<context>.<action>.<job_id>` src and attaches `["job:<id>", "stream:<stdout|stderr>"]` tags (plus `"attempt:<n>"` when retrying). Two private helpers added: `job_log_src` and `job_log_tags`.
- `hero_proc_server/Cargo.toml`: added `herolib_core` dep so the batcher can import `Logger`.
- `hero_proc_server/openrpc.json`: doc-string updates only: `LogFilter.src` semantics; `logs.insert` / `logs.insert_batch` note `logid` is always 0; `logs.sources` re-scans the hero_log backend. No wire-format changes.
- `hero_proc_server/openrpc.client.generated.rs`: three doc-comment lines mirrored from the schema. No code/type/signature changes.
- `hero_proc_server/instructions.md`: backend description updated for logs (other tables still SQLite).
- `hero_proc_integration_test/src/tests/logs.rs`, `stress.rs`: relaxed `delete_by_source` assertion (hero_log returns files-removed, not entries-removed) and fixed a units-mismatch in `delete_older_than_epoch` (was using milliseconds; hero_log uses seconds). Added a `logs_count == 0` follow-up to verify entries are actually gone.
- `hero_proc_lib/src/db/integration_tests.rs`: replaced SQLite-bound `logging_*` tests with a single end-to-end smoke test that exercises insert, context filter, `job_id` filter, and `delete_older_than` against the real hero_log backend.

Acceptance criteria
- No `logs.sqlite` files, no `partitioned*` modules in `hero_proc_lib`.
- `herolib_core::logger` is the single backend; one `Arc<Logger>` per server process; logs land under `~/hero/var/logs/...`.
- `cargo build --workspace` passes; `cargo test --workspace` passes (399 passed, 0 failed, 51 pre-existing ignored).
- `hero_proc log query|filter|prune|export` and `hero_proc job logs` produce the same console output shape (no flag/output changes were required).
- `logs.insert` returns `{"logid": 0}` (documented in the schema).
- `job.log_archive` partitions by attempt: the executor now emits `attempt:N` tags and the existing post-filter in `rpc/job.rs` keys off them.

Test results
- `cargo build --workspace`: PASS
- `cargo test --workspace`: 399 passed, 0 failed, 51 ignored (51 ignored are pre-existing doctests + environment-gated shutdown tests, not related to logs).
- `cargo test -p hero_proc_lib --lib`: 140 passed, 0 failed.
- `cargo test -p hero_proc_server --lib`: 70 passed, 0 failed.
- `cargo test -p hero_proc_sdk --lib`: 20 passed, 0 failed.

Operator notes
- The old SQLite shards under `~/hero/var/hero_proc/logs/*` are orphaned by this change. They are not migrated and not deleted automatically; operators should remove them manually when convenient.
- `LogFilter.src` now requires a dotted prefix, segment-aligned. A trailing `*` is accepted as a wildcard suffix and stripped. Wildcards anywhere else in the pattern degrade to a full scan with post-filter (the CLI and admin UI only ever produce trailing-`*` patterns today, so this is transparent in practice).
- `logs.delete_by_src(prefix)` removes the whole dotted subtree; the returned count is files-removed (not entries-removed), since hero_log cleans whole files.
- `logs.delete_older_than(epoch)` operates at file granularity: boundary-day partial deletes are coarser than before, but the daily 7-day retention loop is unaffected.
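To make "dotted prefix, segment-aligned" concrete, here is a small sketch of how a caller could reason about which sources a `LogFilter.src` pattern selects. The function names are hypothetical, the treatment of an empty pattern as "match everything" is an assumption, and mid-string wildcards are deliberately not handled; the real matching lives in hero_log and the adapter.

```rust
/// Strip one trailing `*` and any trailing dot, leaving a plain dotted prefix.
fn normalise_prefix(pattern: &str) -> &str {
    pattern
        .strip_suffix('*')
        .unwrap_or(pattern)
        .trim_end_matches('.')
}

/// Segment-aligned prefix match: `core.deploy` matches `core.deploy` and
/// `core.deploy.42`, but not `core.deployment.1`.
fn src_matches(src: &str, pattern: &str) -> bool {
    let prefix = normalise_prefix(pattern);
    if prefix.is_empty() {
        // Assumption: an empty prefix selects every source.
        return true;
    }
    match src.strip_prefix(prefix) {
        // Either an exact match or the prefix ends on a segment boundary.
        Some(rest) => rest.is_empty() || rest.starts_with('.'),
        None => false,
    }
}
```

The segment-boundary check is what separates this from a plain string prefix test, and it is the reason a pattern such as `core.deploy*` no longer picks up `core.deployment`.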