perf(read+rate_limit): batched read.unread_counts + tiered rate limits #25

Merged
sameh-farouk merged 1 commit from feat/batched-unread-tiered-rate-limits into development 2026-04-26 17:46:36 +00:00

Closes #24.

## Summary

Two reinforcing fixes for the page-load RPC burst that self-throttles in proxy mode:

  1. B1 (data layer): New batched read.unread_counts(workspace_id, user_id) endpoint replaces the client's Promise.all(N) of read.get + message.list per channel — 2N RPCs collapse to 1.
  2. C2 (rate limits): Tier the per-caller rate limit by method class. Reads, writes, sends, and typing each charge their own bucket. New default: (600, 60, 60, 60) — reads 10× the old 60/min flat budget; write/send/typing unchanged.

These are reinforcing — B1 alone leaves a tight budget waiting to bite the next per-channel feature; C2 alone leaves a wasteful pattern hidden behind a higher budget. Together: fan-out is gone AND the remaining traffic is correctly tiered.

## Changes

| File | Change |
|---|---|
| `crates/hero_collab_server/src/handlers/read.rs` | + `unread_counts(workspace_id, user_id)` handler with a single SQL `LEFT JOIN ... GROUP BY` over `channel_members` / `channels` / `read_cursors` / `messages`. Returns `{channel_id: count}`. Filters via `json_extract` for `deleted_at` / `archived_at` (matches the existing `message::list` / `channel::list_workspace` patterns). |
| `crates/hero_collab_server/src/rpc.rs` | + `"read.unread_counts" => handlers::read::unread_counts` dispatch line |
| `crates/hero_collab_server/src/rate_limit.rs` | + `MethodClass` enum, + `classify_method(method)` function, + `read` bucket on `PerCallerLimits`, + `read_per_min` field on `RateLimiter`, + `RateLimiter::new_tiered(read, global, send, typing)` constructor, + `charge_read` / `charge_typing` helpers. `check_method` refactored to classify-then-charge. Backward-compat `new(global, send)` and `new_with_typing(global, send, typing)` preserved as thin wrappers (read=global). |
| `crates/hero_collab_server/src/main.rs` | Switch `RateLimiter::new(60, 10)` → `new_tiered(600, 60, 60, 60)`. |
| `crates/hero_collab_ui/static/js/chat-app.js` | `updateUnreadCounts` body replaced with a single `await rpc('read.unread_counts', {workspace_id, user_id})` call. Coerces stringified `channel_id` keys back to numbers (so the existing badge-render `state.unreadCounts[ch.id]` lookup keeps working). Failure path preserves the last good badges (degraded display > clearing). |

Net: +228 lines, -38 lines.
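
For reviewers skimming without the diff, the shape of the batched query can be sketched as below. This is an illustrative reconstruction from the description above, not the SQL in `handlers/read.rs`: the column names (`last_read_at`, `data`, `created_at`), the `COALESCE` default, and the parameter order are all assumptions.

```rust
// Hypothetical reconstruction of the unread_counts query; column names and
// parameter positions are assumptions, not copied from the PR diff.
const UNREAD_COUNTS_SQL: &str = r#"
SELECT cm.channel_id, COUNT(m.id) AS unread
FROM channel_members cm
JOIN channels c           ON c.id = cm.channel_id
LEFT JOIN read_cursors rc ON rc.channel_id = cm.channel_id
                         AND rc.user_id    = cm.user_id
LEFT JOIN messages m      ON m.channel_id  = cm.channel_id
                         AND m.created_at  > COALESCE(rc.last_read_at, 0)
                         AND json_extract(m.data, '$.deleted_at') IS NULL
WHERE cm.user_id = ?1
  AND c.workspace_id = ?2
  AND json_extract(c.data, '$.archived_at') IS NULL
GROUP BY cm.channel_id
"#;

fn main() {
    // The LEFT JOINs keep one row per member channel, so channels with zero
    // unread messages still come back with a count of 0.
    assert!(UNREAD_COUNTS_SQL.contains("GROUP BY cm.channel_id"));
}
```

The point of the single `GROUP BY` is that the server does one index scan per table instead of the client doing `2N` round trips.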

## Method classification (suffix heuristic)

```rust
fn classify_method(method: &str) -> MethodClass {
    if method == "typing.relay"  { return MethodClass::Typing; }
    if method == "message.send"  { return MethodClass::MessageSend; }
    let verb = method.rsplit('.').next().unwrap_or("");
    match verb {
        "list" | "get" | "info" | "available" | "available_users" | "resolve"
        | "members" | "member_list" | "search" | "find" | "metrics" | "discover"
        | "health" | "unread_counts" | "play" | "logs" | "stats"
        | "openrpc" | "schema" => MethodClass::Read,
        _ => MethodClass::Write,
    }
}
```

Read and Typing are isolated (each charges only its own bucket, so depleting one doesn't lock out the others). Write charges `global`. MessageSend charges both `global` and `send` atomically. Unknown verbs default to `Write` — conservative under-classification (better to over-throttle than to silently bypass write limits for a new method).
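
To make the heuristic concrete, here is a self-contained copy of the function with a few expected classifications. The `derive`s on `MethodClass` are added here only so the assertions compile; they are not claimed to be in the PR.

```rust
#[derive(Debug, PartialEq, Eq)]
enum MethodClass { Read, Write, MessageSend, Typing }

fn classify_method(method: &str) -> MethodClass {
    // Exact-match overrides win before the suffix check.
    if method == "typing.relay"  { return MethodClass::Typing; }
    if method == "message.send"  { return MethodClass::MessageSend; }
    let verb = method.rsplit('.').next().unwrap_or("");
    match verb {
        "list" | "get" | "info" | "available" | "available_users" | "resolve"
        | "members" | "member_list" | "search" | "find" | "metrics" | "discover"
        | "health" | "unread_counts" | "play" | "logs" | "stats"
        | "openrpc" | "schema" => MethodClass::Read,
        _ => MethodClass::Write,
    }
}

fn main() {
    // Suffix verbs route to Read regardless of namespace.
    assert_eq!(classify_method("read.unread_counts"), MethodClass::Read);
    assert_eq!(classify_method("workspace.list"), MethodClass::Read);
    // The two overrides.
    assert_eq!(classify_method("message.send"), MethodClass::MessageSend);
    assert_eq!(classify_method("typing.relay"), MethodClass::Typing);
    // Unknown verbs fall through to Write — the conservative default
    // (and why thread.replies is over-throttled until follow-up 1 lands).
    assert_eq!(classify_method("thread.replies"), MethodClass::Write);
}
```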

## Bucket sizing rationale

| Bucket | Rate | Reasoning |
|---|---|---|
| `read` | 600/min (10/s sustained, 600 burst) | Covers page-load fan-out (15-30 RPCs per burst) even after several reloads per minute |
| `global` | 60/min (1/s sustained, 60 burst) | Write budget for non-message mutations; 1/s matches typical UI mutation cadence |
| `send` | 60/min | `message.send` only; charged in addition to `global` so a chat storm doesn't exhaust the write budget |
| `typing` | 60/min | Debounced typing-relay events; the client emits ~2 per typing burst |
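
The "600 burst, 10/s sustained" phrasing implies each bucket behaves like an independent token bucket whose capacity equals its per-minute budget. The `RateLimiter` internals are not shown in this PR, so the sketch below is an assumption about the model, with illustrative names, not the `rate_limit.rs` API:

```rust
// Minimal token-bucket sketch: capacity = per-minute budget, continuous refill
// at capacity/60 tokens per second. Names are hypothetical.
struct Bucket {
    capacity: f64,        // burst size, e.g. 600.0 for the read bucket
    tokens: f64,          // currently available tokens
    refill_per_sec: f64,  // capacity / 60 for a per-minute budget
}

impl Bucket {
    fn new(per_min: u32) -> Self {
        let capacity = per_min as f64;
        Bucket { capacity, tokens: capacity, refill_per_sec: capacity / 60.0 }
    }

    /// Advance time by `elapsed_secs`, then try to spend one token.
    /// Returns false when the caller is rate-limited.
    fn charge(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec)
            .min(self.capacity);
        if self.tokens >= 1.0 { self.tokens -= 1.0; true } else { false }
    }
}

fn main() {
    // A 30-RPC page-load burst fits easily in the 600-token read budget.
    let mut read = Bucket::new(600);
    assert!((0..30).all(|_| read.charge(0.0)));

    // The pre-PR failure mode: 100 back-to-back calls against a flat 60/min
    // bucket — the first 60 succeed, the remaining 40 are limited.
    let mut flat = Bucket::new(60);
    let ok = (0..100).filter(|_| flat.charge(0.0)).count();
    assert_eq!(ok, 60);
}
```

Under this model the "would have been 60 ok / 40 limited" line in the test plan falls out directly.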

## Behaviour matrix

| Scenario | Today | After this PR |
|---|---|---|
| Page load, 30-channel workspace | ~60 RPCs from `updateUnreadCounts` alone | 1 batched `read.unread_counts` |
| 5 hard-refreshes in a minute (proxy mode) | Bucket exhausted; "Rate limited" warnings | Clean — all RPCs succeed |
| Sustained channel-switching (many `read.mark` / `message.list`) | Charges global bucket; competes with writes | Charges `read` bucket; independent of writes |
| Burst of 100 `message.send` | Charges global+send (both enforced) | Unchanged |
| Typing during heavy reads | Could deplete global | Typing in its own bucket; unaffected |
| Dev mode (unauthenticated, no caller_id) | Rate limiting bypassed | Unchanged |
| Old client against new server | n/a | Unchanged fan-out (old client doesn't call the new endpoint) |
| New client against old server | n/a | `read.unread_counts` returns method-not-found; the client catch logs a warning and preserves the last good badges (degraded, not broken) |

## Test plan

- [x] `cargo build --release -p hero_collab_server` clean
- [x] `cargo test --release -p hero_collab_server --bin hero_collab_server rate_limit::` — all 11 existing tests pass under the new tiered design
- [x] Smoke test post-deploy: `read.unread_counts` returns the expected `{channel_id: count}` shape; counts match a manual SQL cross-check
- [x] 100 sequential `workspace.list` calls (Read class) all succeed under the new 600/min read bucket (would have been 60 ok / 40 limited under the old config)
- [x] Browser smoke: 5 hard-refreshes in 60s → no rate-limit warnings in the console
- [ ] Reviewer to validate: read-bucket sizing — is 600/min the right number? The suggested 600 covers expected page-load fan-out with comfortable headroom; happy to adjust.

## Notes

- **Schema unchanged.** The new endpoint reads from existing tables (`channel_members`, `channels`, `read_cursors`, `messages`) using indexes that already exist (`idx_messages_channel(channel_id, created_at)`, the `read_cursors` PK).
- **Backward compatibility preserved.** Old clients that still call `read.get` + `message.list` per channel keep working; they just don't benefit from the fan-out reduction. Old code paths that use `RateLimiter::new(global, send)` work via the wrapper that equates read=global — no test changes needed.
- **Doesn't change** auth modes, identity wiring, dev-mode behavior, or any RPC method signatures other than adding the one new endpoint. Pure data-layer + rate-limit policy change.

## Follow-ups (low priority, separate)

1. Add `replies` and `ping` to the Read verb whitelist in `classify_method` — a minor over-throttle today (e.g. `thread.replies` is a read but is classified as Write).
2. WebSocket-pushed unread updates (`UserEvent::UnreadUpdate` emitted on `message.create` / `read.mark`) would eliminate polling entirely. The scope is much bigger; this PR is the minimal correct first step.
Two reinforcing changes for the page-load RPC burst.

(B1) Batched unread counts:
`chat-app.js::updateUnreadCounts` previously fired `read.get` plus
`message.list` per channel in parallel — `2N` RPCs that scaled
linearly with channel count and routinely blew the global rate-limit
budget on any user with more than ~15 channels. New
`read.unread_counts(workspace_id, user_id) -> {channel_id: count}`
handler returns all counts for the caller in one call.

  * Server: single `LEFT JOIN ... GROUP BY` over channel_members,
    channels, read_cursors, messages. Filters via json_extract for
    `deleted_at`/`archived_at` (both live inside the `data` JSON,
    not as columns — matches `message::list` and
    `channel::list_workspace`). Excludes thread replies. Hits
    existing indexes (idx_messages_channel + read_cursors PK).
  * Client: replaces the `Promise.all(N)` with one `await rpc`.
    Coerces stringified channel_id keys back to numbers so the
    badge-render lookup keeps working. Failure path preserves the
    last good badges instead of clearing — degraded display is
    better than blank.

(C2) Tiered rate limits:
The previous default `RateLimiter::new(60, 10)` lumped every
non-typing method into a single 60/min global bucket — UI reads
(`*.list`, `*.get`, mention.list, huddle.list, ...) competing with
state mutations for the same budget. A normal page load fires 15-30
reads in a 2s burst; after 2-3 hard-refreshes the bucket was empty.
Dev mode hid this by skipping rate-limiting for unauthenticated
calls.

  * New `MethodClass` enum (`Read`/`Write`/`MessageSend`/`Typing`)
    + `classify_method(method)` — suffix heuristic on the trailing
    verb (`list`/`get`/`info`/etc. → Read, default → Write) plus
    two specific overrides (`typing.relay`, `message.send`).
  * `PerCallerLimits` gains a `read` bucket. New `read_per_min`
    field on `RateLimiter`.
  * New `RateLimiter::new_tiered(read, global, send, typing)`
    constructor. Backward-compat `new(global, send)` preserved
    (equates read=global) so existing 11 unit tests pass unchanged.
  * `check_method` dispatches via `classify_method` to charge the
    right bucket. Read and Typing isolated (own bucket only); Write
    charges global; MessageSend charges global+send.
  * Production: `new_tiered(600, 60, 60, 60)` — reads 10x the old
    budget, writes/sends/typing unchanged.

Together these eliminate the 2N fan-out AND make the remaining
traffic correctly tiered. With B1 alone, a future per-channel call
would hit the same wall. With C2 alone, the wasteful pattern stays
masked. Both are needed.

All 11 rate_limit tests still pass.
sameh-farouk force-pushed feat/batched-unread-tiered-rate-limits from b37871f8fe to 0549a56457 2026-04-26 17:46:19 +00:00
sameh-farouk merged commit c4f7e03da9 into development 2026-04-26 17:46:36 +00:00