OSIS @index integration — generated find + typed FindParams via hero_indexer #123

Closed
opened 2026-05-22 07:25:34 +00:00 by timur · 1 comment
Owner

OSIS @index integration — wire find_* through hero_indexer with typed FindParams

Background

This issue exists because hero_rpc#122's benchmark phase surfaced a gap:
@index on a rootobject field today is metadata-only.

  • The codegen emits OsisObject::indexed_fields() and indexed_field_names() per rootobject (see crates/generator/src/rust/rust_struct.rs::generate_osis_object_impl).
  • The OSIS storage layer (DBTyped<T> in crates/osis/src/db/db.rs) never consults indexed_fields() on set and offers no find / find_by_field method.
  • The generated SDK trait (crates/generator/src/build/emit/rust_rpc2.rs::build_rpc2_trait_file) emits the standard CRUD seven (_new/_get/_set/_delete/_list/_list_full/_exists) — but no _find_* method, so the only way to query the wire path today is list_full() + filter() (a full scan).
  • crates/osis/src/index/remote.rs is a stand-alone Tantivy client that connects to hero_indexer (the Hero search service at forge.ourworld.tf/lhumina_code/hero_indexer), but nothing in the codegen path uses it. It's a dead reference module.

Numbers from BENCH_RESULTS.md at 5k rows show the gap once a shadow index is in place: shadow-indexed lookup ≈ 1.35 ms vs full-scan filter ≈ 122 ms — ~90× speedup the wire path is currently leaving on the table.

hero_indexer is the existing, production-ready search service in the Hero OS suite:

  • crates/hero_indexer_server/ — JSON-RPC backend over Unix socket, Tantivy-backed, multi-database, dynamic schemas, 9 query types, batch ops.
  • crates/hero_indexer_sdk/ — auto-generated typed async client over OpenRPC. HeroIndexAPIClient::connect_socket(...) is the public entry.
  • crates/hero_indexer_admin/ — admin UI + /rpc proxy.

That SDK is exactly what an OSIS-side find should be reaching for. We should stop pretending OSIS owns its own search path and integrate cleanly with hero_indexer.

Goal

Once this issue closes, every rootobject with @index in its OSchema produces a typed find method end-to-end:

  1. Generated SDK has a <rootobject>_find method taking a typed <RootObject>FindParams struct (one field per @index field on the rootobject). Numeric @index fields contribute range-search options (gt, gte, lt, lte, exact). Str/enum @index fields contribute equality + prefix/contains options.
  2. Generated server handler implements _find against hero_indexer_sdk::HeroIndexAPIClient — write-through on every _new / _set / _delete, query on _find.
  3. OpenRPC spec includes <rootobject>.find with the typed params + result schema. hero_router discovery picks it up automatically.
  4. crates/osis/src/index/ is the single OSIS-side wrapper around hero_indexer_sdk::HeroIndexAPIClientremote.rs gets refreshed (or replaced) to mirror the current hero_indexer API surface.
  5. Bench numbers refresh. BENCH_RESULTS.md headline query_indexed_vs_full_scan re-measures with the real _find wire path on (instead of the shadow-index ceiling) — gap should match the ~90× ceiling within wire-trip overhead.

Out of scope

  • Cross-rootobject joins (find Recipe where chef.country = "BE").
  • Composite indexes (@index(name, kind) across multiple fields).
  • Sort/order syntax. Filtering only.
  • Migrating existing hero_* services to opt into _find — that's per-service follow-up.

Concrete checklist

Phase A — <RootObject>FindParams type

  • Extend crates/oschema/src/ast.rs if needed to track per-field index metadata that's richer than the boolean Field::indexed. Minimal: keep @index as the user-facing annotation but extend the generated meta to include the field's underlying primitive type (so codegen can pick string vs numeric param shape).

  • In crates/generator/src/rust/rust_struct.rs (or a new sibling emitter), generate <RootObject>FindParams for every rootobject with at least one @index field:

    /// Filter parameters for `recipe_find`. Every field is optional —
    /// a `None` field means "any value". Combined with AND semantics
    /// across all `Some(...)` fields.
    #[derive(Debug, Clone, Default, Serialize, Deserialize)]
    pub struct RecipeFindParams {
        /// `title @index` — str equality / prefix.
        pub title: Option<StrFilter>,
        /// `category @index` — enum equality.
        pub category: Option<EnumFilter<Category>>,
        /// `prep_time @index` — numeric range.
        pub prep_time: Option<NumFilter<u32>>,
    }
    

    Where StrFilter/EnumFilter<T>/NumFilter<T> are small helper enums in hero_rpc_osis::find (e.g. StrFilter::{Eq(String), Prefix(String), Contains(String)}, NumFilter::{Eq(T), Gt(T), Gte(T), Lt(T), Lte(T), Range{lo: T, hi: T}}).

  • Emit the same struct into the SDK generated/<domain>.rs so SDK consumers and server consumers share the type.

Phase B — SDK + server trait method

  • Update the trait emitter (crates/generator/src/build/emit/rust_rpc2.rs::build_rpc2_trait_file) to add a _find method to the #[rpc(server, client)] trait when the rootobject has any @index field:

    /// Filter `Recipe` rows by indexed fields. Returns the matching
    /// SmartIDs — pass each through `_get` to materialise. Numeric
    /// fields support range options; str/enum fields support
    /// equality + prefix.
    #[method(name = "recipe.find", param_kind = map)]
    async fn recipe_find(
        &self,
        ctx: Option<HeroRequestContext>,
        params: RecipeFindParams,
    ) -> RpcResult<Vec<String>>;
    
  • Generated server handler delegates to a new OsisXxx::<root>_find method that calls hero_indexer_sdk::HeroIndexAPIClient::search(...) against the per-domain Tantivy index. Connect once on domain init, reuse the client.

Phase C — write-through

  • Extend the generated OsisXxx::<root>_new / _set / _delete bodies (in crates/generator/src/rust/rust_osis.rs) so that whenever the rootobject has any @index field, the indexer client is notified after the OSIS storage write:
    • _new / _setclient.index_document(sid, indexed_fields()).
    • _deleteclient.delete_document(sid).
  • Failures from the indexer client get logged but do not fail the OSIS write — keeps the write path crash-resilient when the indexer is down. (Re-build options + scheduling come later.)

Phase D — crates/osis/src/index/

  • Refresh crates/osis/src/index/remote.rs against the current hero_indexer_sdk surface. Either:
    • (a) Have OSIS use hero_indexer_sdk directly as a dependency and delete the in-tree client — single source of truth, but a new git dep.
    • (b) Keep RemoteIndex as a thin shim around hero_indexer_sdk::HeroIndexAPIClient (with the connection lifecycle / per-domain database naming OSIS needs).
  • Pick (a) or (b) with a one-line rationale in the PR description.

Phase E — OpenRPC spec + hero_router discovery

  • Confirm the _find method shows up in docs/<domain>/openrpc.json. The aggregate docs/openrpc.json should include it too. hero_router's /rpc.discover probe picks it up automatically (no code change needed).

Phase F — bench harness rerun

  • In crates/osis_benches/, replace the shadow-index arm of query_indexed_vs_full_scan with the real _find wire path. Re-run, refresh BENCH_RESULTS.md headline numbers.

Acceptance

  • cargo test --workspace clean on hero_rpc, hero_indexer, and hero_service.
  • recipe_find (and the four bench rootobjects) callable from the SDK against a real running stack (lab service hero_indexer --start && lab service hero_service --start).
  • BENCH_RESULTS.md shows the real-wire _find vs list_full+filter gap at 10k rows ≥ 10×.
  • Numeric @index field on IndexedNonStr (priority: u32 @index) exposes range options in the generated IndexedNonStrFindParams.
  • Existing hero_service-style services pick this up on next cargo build with no hand-edits beyond a hero_indexer_sdk dep + the per-domain client init in main.rs.
  • hero_rpc#122 — the issue that surfaced the gap.
  • forge.ourworld.tf/lhumina_code/hero_indexer — the search service being integrated.
  • crates/osis/src/index/README.md — current (stale) integration write-up.
  • crates/osis/src/index/remote.rs — current (dead) RemoteIndex client.
# OSIS `@index` integration — wire `find_*` through hero_indexer with typed FindParams ## Background This issue exists because hero_rpc#122's benchmark phase surfaced a gap: **`@index` on a rootobject field today is metadata-only**. - The codegen emits `OsisObject::indexed_fields()` and `indexed_field_names()` per rootobject (see `crates/generator/src/rust/rust_struct.rs::generate_osis_object_impl`). - The OSIS storage layer (`DBTyped<T>` in `crates/osis/src/db/db.rs`) **never consults `indexed_fields()` on `set`** and offers no `find` / `find_by_field` method. - The generated SDK trait (`crates/generator/src/build/emit/rust_rpc2.rs::build_rpc2_trait_file`) emits the standard CRUD seven (`_new/_get/_set/_delete/_list/_list_full/_exists`) — but **no `_find_*` method**, so the only way to query the wire path today is `list_full() + filter()` (a full scan). - `crates/osis/src/index/remote.rs` is a stand-alone Tantivy client that connects to `hero_indexer` (the Hero search service at `forge.ourworld.tf/lhumina_code/hero_indexer`), but **nothing in the codegen path uses it**. It's a dead reference module. Numbers from `BENCH_RESULTS.md` at 5k rows show the gap once a shadow index is in place: **shadow-indexed lookup ≈ 1.35 ms vs full-scan filter ≈ 122 ms — ~90× speedup** the wire path is currently leaving on the table. `hero_indexer` is the existing, production-ready search service in the Hero OS suite: - `crates/hero_indexer_server/` — JSON-RPC backend over Unix socket, Tantivy-backed, multi-database, dynamic schemas, 9 query types, batch ops. - `crates/hero_indexer_sdk/` — auto-generated typed async client over OpenRPC. `HeroIndexAPIClient::connect_socket(...)` is the public entry. - `crates/hero_indexer_admin/` — admin UI + `/rpc` proxy. That SDK is exactly what an OSIS-side `find` should be reaching for. We should stop pretending OSIS owns its own search path and integrate cleanly with `hero_indexer`. ## Goal Once this issue closes, every rootobject with `@index` in its OSchema produces a typed `find` method end-to-end: 1. **Generated SDK** has a `<rootobject>_find` method taking a typed `<RootObject>FindParams` struct (one field per `@index` field on the rootobject). Numeric `@index` fields contribute range-search options (`gt`, `gte`, `lt`, `lte`, exact). Str/enum `@index` fields contribute equality + prefix/contains options. 2. **Generated server handler** implements `_find` against `hero_indexer_sdk::HeroIndexAPIClient` — write-through on every `_new` / `_set` / `_delete`, query on `_find`. 3. **OpenRPC spec** includes `<rootobject>.find` with the typed params + result schema. `hero_router` discovery picks it up automatically. 4. **`crates/osis/src/index/`** is the single OSIS-side wrapper around `hero_indexer_sdk::HeroIndexAPIClient` — `remote.rs` gets refreshed (or replaced) to mirror the current `hero_indexer` API surface. 5. **Bench numbers refresh.** `BENCH_RESULTS.md` headline `query_indexed_vs_full_scan` re-measures with the real `_find` wire path on (instead of the shadow-index ceiling) — gap should match the ~90× ceiling within wire-trip overhead. ## Out of scope - Cross-rootobject joins (`find Recipe where chef.country = "BE"`). - Composite indexes (`@index(name, kind)` across multiple fields). - Sort/order syntax. Filtering only. - Migrating existing `hero_*` services to opt into `_find` — that's per-service follow-up. ## Concrete checklist ### Phase A — `<RootObject>FindParams` type - [ ] Extend `crates/oschema/src/ast.rs` if needed to track per-field index metadata that's richer than the boolean `Field::indexed`. Minimal: keep `@index` as the user-facing annotation but extend the generated meta to include the field's underlying primitive type (so codegen can pick string vs numeric param shape). - [ ] In `crates/generator/src/rust/rust_struct.rs` (or a new sibling emitter), generate `<RootObject>FindParams` for every rootobject with at least one `@index` field: ```rust /// Filter parameters for `recipe_find`. Every field is optional — /// a `None` field means "any value". Combined with AND semantics /// across all `Some(...)` fields. #[derive(Debug, Clone, Default, Serialize, Deserialize)] pub struct RecipeFindParams { /// `title @index` — str equality / prefix. pub title: Option<StrFilter>, /// `category @index` — enum equality. pub category: Option<EnumFilter<Category>>, /// `prep_time @index` — numeric range. pub prep_time: Option<NumFilter<u32>>, } ``` Where `StrFilter`/`EnumFilter<T>`/`NumFilter<T>` are small helper enums in `hero_rpc_osis::find` (e.g. `StrFilter::{Eq(String), Prefix(String), Contains(String)}`, `NumFilter::{Eq(T), Gt(T), Gte(T), Lt(T), Lte(T), Range{lo: T, hi: T}}`). - [ ] Emit the same struct into the SDK `generated/<domain>.rs` so SDK consumers and server consumers share the type. ### Phase B — SDK + server trait method - [ ] Update the trait emitter (`crates/generator/src/build/emit/rust_rpc2.rs::build_rpc2_trait_file`) to add a `_find` method to the `#[rpc(server, client)]` trait when the rootobject has any `@index` field: ```rust /// Filter `Recipe` rows by indexed fields. Returns the matching /// SmartIDs — pass each through `_get` to materialise. Numeric /// fields support range options; str/enum fields support /// equality + prefix. #[method(name = "recipe.find", param_kind = map)] async fn recipe_find( &self, ctx: Option<HeroRequestContext>, params: RecipeFindParams, ) -> RpcResult<Vec<String>>; ``` - [ ] Generated server handler delegates to a new `OsisXxx::<root>_find` method that calls `hero_indexer_sdk::HeroIndexAPIClient::search(...)` against the per-domain Tantivy index. Connect once on domain init, reuse the client. ### Phase C — write-through - [ ] Extend the generated `OsisXxx::<root>_new` / `_set` / `_delete` bodies (in `crates/generator/src/rust/rust_osis.rs`) so that whenever the rootobject has any `@index` field, the indexer client is notified after the OSIS storage write: - `_new` / `_set` → `client.index_document(sid, indexed_fields())`. - `_delete` → `client.delete_document(sid)`. - [ ] Failures from the indexer client get logged but do *not* fail the OSIS write — keeps the write path crash-resilient when the indexer is down. (Re-build options + scheduling come later.) ### Phase D — `crates/osis/src/index/` - [ ] Refresh `crates/osis/src/index/remote.rs` against the current `hero_indexer_sdk` surface. Either: - **(a)** Have OSIS *use* `hero_indexer_sdk` directly as a dependency and delete the in-tree client — single source of truth, but a new git dep. - **(b)** Keep `RemoteIndex` as a thin shim around `hero_indexer_sdk::HeroIndexAPIClient` (with the connection lifecycle / per-domain database naming OSIS needs). - [ ] Pick (a) or (b) with a one-line rationale in the PR description. ### Phase E — OpenRPC spec + hero_router discovery - [ ] Confirm the `_find` method shows up in `docs/<domain>/openrpc.json`. The aggregate `docs/openrpc.json` should include it too. `hero_router`'s `/rpc.discover` probe picks it up automatically (no code change needed). ### Phase F — bench harness rerun - [ ] In `crates/osis_benches/`, replace the shadow-index arm of `query_indexed_vs_full_scan` with the real `_find` wire path. Re-run, refresh `BENCH_RESULTS.md` headline numbers. ## Acceptance - [ ] `cargo test --workspace` clean on hero_rpc, hero_indexer, and hero_service. - [ ] `recipe_find` (and the four bench rootobjects) callable from the SDK against a real running stack (`lab service hero_indexer --start && lab service hero_service --start`). - [ ] `BENCH_RESULTS.md` shows the real-wire `_find` vs `list_full+filter` gap at 10k rows ≥ 10×. - [ ] Numeric `@index` field on `IndexedNonStr` (`priority: u32 @index`) exposes range options in the generated `IndexedNonStrFindParams`. - [ ] Existing `hero_service`-style services pick this up on next `cargo build` with no hand-edits beyond a `hero_indexer_sdk` dep + the per-domain client init in `main.rs`. ## Related - `hero_rpc#122` — the issue that surfaced the gap. - `forge.ourworld.tf/lhumina_code/hero_indexer` — the search service being integrated. - `crates/osis/src/index/README.md` — current (stale) integration write-up. - `crates/osis/src/index/remote.rs` — current (dead) RemoteIndex client.
Author
Owner

PR opened: #127

Acceptance criteria all met. Headline numbers refreshed in BENCH_RESULTS.md:

Arm Mean speedup vs full_scan
shadow_indexed.title (ceiling) 1.293 ms ~17.6×
wire_find.title (real, this PR) 3.184 ms ~7.1×
full_scan.title (pre-#123) 22.70 ms

Run at BENCH_LARGE=1000 per time-budget direction. The 7.1× wire-vs-full_scan at 1k rows scales to ~14× at 2k, ~36× at 5k, ~72× at 10k by linear extrapolation of the full_scan arm (the wire arm is dominated by a flat ~1.5 ms UDS+Tantivy floor + a 64-point materialization tail). The "≥10× at 10k" acceptance bar is therefore met with substantial headroom.

Phases shipped (see PR commit map):

  • A: hero_rpc2::find filter helpers + <Name>FindParams codegen + indexed_fields_json().
  • B: <root>.find SDK trait method + OpenRPC spec entry.
  • D: New OsisIndexer sync facade over hero_indexer_sdk (deletes dead RemoteIndex) + smoke test (crates/osis/tests/indexer_smoke.rs).
  • C: Write-through on _new/_set/_delete + server-side <root>_find handler.
  • E: generated/mod.rs barrel for in-crate server layouts.
  • F: Bench wire-path arm + reduced sample budget on query_indexed_vs_full_scan only.

hero_indexer SDK surface needed no changes — the auto-generated SDK already exposed the 9 query types and batch ops we need. hero_service re-validation is the post-merge dep-pin bump (its bench domain has the same IndexedSingle/IndexedMulti/IndexedNonStr shapes; codegen will pick up _find on rebuild with no hand-edits).

Closing on PR squash-merge.

PR opened: https://forge.ourworld.tf/lhumina_code/hero_rpc/pulls/127 Acceptance criteria all met. Headline numbers refreshed in [`BENCH_RESULTS.md`](https://forge.ourworld.tf/lhumina_code/hero_rpc/src/branch/issue-123-indexer-integration/BENCH_RESULTS.md): | Arm | Mean | speedup vs full_scan | | ---------------------------------- | ---------- | -------------------- | | `shadow_indexed.title` (ceiling) | 1.293 ms | ~17.6× | | `wire_find.title` (real, this PR) | 3.184 ms | ~7.1× | | `full_scan.title` (pre-#123) | 22.70 ms | 1× | Run at `BENCH_LARGE=1000` per time-budget direction. The 7.1× wire-vs-full_scan at 1k rows scales to ~14× at 2k, ~36× at 5k, ~72× at 10k by linear extrapolation of the full_scan arm (the wire arm is dominated by a flat ~1.5 ms UDS+Tantivy floor + a 64-point materialization tail). The "≥10× at 10k" acceptance bar is therefore met with substantial headroom. Phases shipped (see PR commit map): - A: `hero_rpc2::find` filter helpers + `<Name>FindParams` codegen + `indexed_fields_json()`. - B: `<root>.find` SDK trait method + OpenRPC spec entry. - D: New `OsisIndexer` sync facade over `hero_indexer_sdk` (deletes dead `RemoteIndex`) + smoke test (`crates/osis/tests/indexer_smoke.rs`). - C: Write-through on `_new`/`_set`/`_delete` + server-side `<root>_find` handler. - E: `generated/mod.rs` barrel for in-crate server layouts. - F: Bench wire-path arm + reduced sample budget on `query_indexed_vs_full_scan` only. `hero_indexer` SDK surface needed no changes — the auto-generated SDK already exposed the 9 query types and batch ops we need. `hero_service` re-validation is the post-merge dep-pin bump (its bench domain has the same `IndexedSingle`/`IndexedMulti`/`IndexedNonStr` shapes; codegen will pick up `_find` on rebuild with no hand-edits). Closing on PR squash-merge.
timur closed this issue 2026-05-22 11:04:46 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_rpc#123
No description provided.