perf(read+rate_limit): batched read.unread_counts + tiered rate limits #25

Merged
sameh-farouk merged 1 commit from feat/batched-unread-tiered-rate-limits into development 2026-04-26 17:46:36 +00:00

Closes #24.

## Summary

Two reinforcing fixes for the page-load RPC burst that self-throttles in proxy mode:

  1. B1 (data layer): New batched read.unread_counts(workspace_id, user_id) endpoint replaces the client's Promise.all(N) of read.get + message.list per channel — 2N RPCs collapse to 1.
  2. C2 (rate limits): Tier the per-caller rate limit by method class. Reads, writes, sends, and typing each charge their own bucket. New default: (600, 60, 60, 60) — reads 10× the old 60/min flat budget; write/send/typing unchanged.

These are reinforcing — B1 alone leaves a tight budget waiting to bite the next per-channel feature; C2 alone leaves a wasteful pattern hidden behind a higher budget. Together: fan-out is gone AND the remaining traffic is correctly tiered.

## Changes

| File | Change |
|---|---|
| `crates/hero_collab_server/src/handlers/read.rs` | + `unread_counts(workspace_id, user_id)` handler with a single SQL `LEFT JOIN ... GROUP BY` over `channel_members` / `channels` / `read_cursors` / `messages`. Returns `{channel_id: count}`. Filters via `json_extract` for `deleted_at` / `archived_at` (matches the existing `message::list` / `channel::list_workspace` patterns). |
| `crates/hero_collab_server/src/rpc.rs` | + `"read.unread_counts" => handlers::read::unread_counts` dispatch line |
| `crates/hero_collab_server/src/rate_limit.rs` | + `MethodClass` enum, + `classify_method(method)` function, + `read` bucket on `PerCallerLimits`, + `read_per_min` field on `RateLimiter`, + `RateLimiter::new_tiered(read, global, send, typing)` constructor, + `charge_read` / `charge_typing` helpers. `check_method` refactored to classify-then-charge. Backward-compat `new(global, send)` and `new_with_typing(global, send, typing)` preserved as thin wrappers (read=global). |
| `crates/hero_collab_server/src/main.rs` | Switch `RateLimiter::new(60, 10)` → `new_tiered(600, 60, 60, 60)`. |
| `crates/hero_collab_ui/static/js/chat-app.js` | `updateUnreadCounts` body replaced with a single `await rpc('read.unread_counts', {workspace_id, user_id})` call. Coerces stringified `channel_id` keys back to numbers (so the existing badge-render `state.unreadCounts[ch.id]` lookup keeps working). Failure path preserves the last good badges (degraded display > clearing). |

Net: +228 lines, -38 lines.
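
For reviewers skimming without the diff, the shape of the batched query can be sketched as below. This is an illustrative reconstruction from the description above, not the SQL in `handlers/read.rs`: the column names (`last_read_at`, `data`, `created_at`), the `COALESCE` default, and the parameter order are all assumptions.

```rust
// Hypothetical reconstruction of the unread_counts query; column names and
// parameter positions are assumptions, not copied from the PR diff.
const UNREAD_COUNTS_SQL: &str = r#"
SELECT cm.channel_id, COUNT(m.id) AS unread
FROM channel_members cm
JOIN channels c           ON c.id = cm.channel_id
LEFT JOIN read_cursors rc ON rc.channel_id = cm.channel_id
                         AND rc.user_id    = cm.user_id
LEFT JOIN messages m      ON m.channel_id  = cm.channel_id
                         AND m.created_at  > COALESCE(rc.last_read_at, 0)
                         AND json_extract(m.data, '$.deleted_at') IS NULL
WHERE cm.user_id = ?1
  AND c.workspace_id = ?2
  AND json_extract(c.data, '$.archived_at') IS NULL
GROUP BY cm.channel_id
"#;

fn main() {
    // The LEFT JOINs keep one row per member channel, so channels with zero
    // unread messages still come back with a count of 0.
    assert!(UNREAD_COUNTS_SQL.contains("GROUP BY cm.channel_id"));
}
```

The point of the single `GROUP BY` is that the server does one index scan per table instead of the client doing `2N` round trips.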

## Method classification (suffix heuristic)

```rust
fn classify_method(method: &str) -> MethodClass {
    if method == "typing.relay"  { return MethodClass::Typing; }
    if method == "message.send"  { return MethodClass::MessageSend; }
    let verb = method.rsplit('.').next().unwrap_or("");
    match verb {
        "list" | "get" | "info" | "available" | "available_users" | "resolve"
        | "members" | "member_list" | "search" | "find" | "metrics" | "discover"
        | "health" | "unread_counts" | "play" | "logs" | "stats"
        | "openrpc" | "schema" => MethodClass::Read,
        _ => MethodClass::Write,
    }
}
```

Read and Typing are isolated (each charges only its own bucket, so depleting one doesn't lock out the others). Write charges `global`. MessageSend charges both `global` and `send` atomically. Unknown verbs default to `Write` — conservative under-classification (better to over-throttle than to silently bypass write limits for a new method).
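
To make the heuristic concrete, here is a self-contained copy of the function with a few expected classifications. The `derive`s on `MethodClass` are added here only so the assertions compile; they are not claimed to be in the PR.

```rust
#[derive(Debug, PartialEq, Eq)]
enum MethodClass { Read, Write, MessageSend, Typing }

fn classify_method(method: &str) -> MethodClass {
    // Exact-match overrides win before the suffix check.
    if method == "typing.relay"  { return MethodClass::Typing; }
    if method == "message.send"  { return MethodClass::MessageSend; }
    let verb = method.rsplit('.').next().unwrap_or("");
    match verb {
        "list" | "get" | "info" | "available" | "available_users" | "resolve"
        | "members" | "member_list" | "search" | "find" | "metrics" | "discover"
        | "health" | "unread_counts" | "play" | "logs" | "stats"
        | "openrpc" | "schema" => MethodClass::Read,
        _ => MethodClass::Write,
    }
}

fn main() {
    // Suffix verbs route to Read regardless of namespace.
    assert_eq!(classify_method("read.unread_counts"), MethodClass::Read);
    assert_eq!(classify_method("workspace.list"), MethodClass::Read);
    // The two overrides.
    assert_eq!(classify_method("message.send"), MethodClass::MessageSend);
    assert_eq!(classify_method("typing.relay"), MethodClass::Typing);
    // Unknown verbs fall through to Write — the conservative default
    // (and why thread.replies is over-throttled until follow-up 1 lands).
    assert_eq!(classify_method("thread.replies"), MethodClass::Write);
}
```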

## Bucket sizing rationale

| Bucket | Rate | Reasoning |
|---|---|---|
| `read` | 600/min (10/s sustained, 600 burst) | Covers page-load fan-out (15-30 RPCs per burst) even after several reloads per minute |
| `global` | 60/min (1/s sustained, 60 burst) | Write budget for non-message mutations; 1/s matches typical UI mutation cadence |
| `send` | 60/min | `message.send` only; charged in addition to `global` so a chat storm doesn't exhaust the write budget |
| `typing` | 60/min | Debounced typing-relay events; the client emits ~2 per typing burst |
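
The "600 burst, 10/s sustained" phrasing implies each bucket behaves like an independent token bucket whose capacity equals its per-minute budget. The `RateLimiter` internals are not shown in this PR, so the sketch below is an assumption about the model, with illustrative names, not the `rate_limit.rs` API:

```rust
// Minimal token-bucket sketch: capacity = per-minute budget, continuous refill
// at capacity/60 tokens per second. Names are hypothetical.
struct Bucket {
    capacity: f64,        // burst size, e.g. 600.0 for the read bucket
    tokens: f64,          // currently available tokens
    refill_per_sec: f64,  // capacity / 60 for a per-minute budget
}

impl Bucket {
    fn new(per_min: u32) -> Self {
        let capacity = per_min as f64;
        Bucket { capacity, tokens: capacity, refill_per_sec: capacity / 60.0 }
    }

    /// Advance time by `elapsed_secs`, then try to spend one token.
    /// Returns false when the caller is rate-limited.
    fn charge(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec)
            .min(self.capacity);
        if self.tokens >= 1.0 { self.tokens -= 1.0; true } else { false }
    }
}

fn main() {
    // A 30-RPC page-load burst fits easily in the 600-token read budget.
    let mut read = Bucket::new(600);
    assert!((0..30).all(|_| read.charge(0.0)));

    // The pre-PR failure mode: 100 back-to-back calls against a flat 60/min
    // bucket — the first 60 succeed, the remaining 40 are limited.
    let mut flat = Bucket::new(60);
    let ok = (0..100).filter(|_| flat.charge(0.0)).count();
    assert_eq!(ok, 60);
}
```

Under this model the "would have been 60 ok / 40 limited" line in the test plan falls out directly.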

## Behaviour matrix

| Scenario | Today | After this PR |
|---|---|---|
| Page load, 30-channel workspace | ~60 RPCs from `updateUnreadCounts` alone | 1 batched `read.unread_counts` |
| 5 hard-refreshes in a minute (proxy mode) | Bucket exhausted; "Rate limited" warnings | Clean — all RPCs succeed |
| Sustained channel-switching (many `read.mark` / `message.list`) | Charges global bucket; competes with writes | Charges `read` bucket; independent of writes |
| Burst of 100 `message.send` | Charges global+send (both enforced) | Unchanged |
| Typing during heavy reads | Could deplete global | Typing in its own bucket; unaffected |
| Dev mode (unauthenticated, no caller_id) | Rate limiting bypassed | Unchanged |
| Old client against new server | n/a | Unchanged fan-out (old client doesn't call the new endpoint) |
| New client against old server | n/a | `read.unread_counts` returns method-not-found; the client catch logs a warning and preserves the last good badges (degraded, not broken) |

## Test plan

- [x] `cargo build --release -p hero_collab_server` clean
- [x] `cargo test --release -p hero_collab_server --bin hero_collab_server rate_limit::` — all 11 existing tests pass under the new tiered design
- [x] Smoke test post-deploy: `read.unread_counts` returns the expected `{channel_id: count}` shape; counts match a manual SQL cross-check
- [x] 100 sequential `workspace.list` calls (Read class) all succeed under the new 600/min read bucket (would have been 60 ok / 40 limited under the old config)
- [x] Browser smoke: 5 hard-refreshes in 60s → no rate-limit warnings in the console
- [ ] Reviewer to validate: read-bucket sizing — is 600/min the right number? The suggested 600 covers expected page-load fan-out with comfortable headroom; happy to adjust.

## Notes

- **Schema unchanged.** The new endpoint reads from existing tables (`channel_members`, `channels`, `read_cursors`, `messages`) using indexes that already exist (`idx_messages_channel(channel_id, created_at)`, the `read_cursors` PK).
- **Backward compatibility preserved.** Old clients that still call `read.get` + `message.list` per channel keep working; they just don't benefit from the fan-out reduction. Old code paths that use `RateLimiter::new(global, send)` work via the wrapper that equates read=global — no test changes needed.
- **Doesn't change** auth modes, identity wiring, dev-mode behavior, or any RPC method signatures other than adding the one new endpoint. Pure data-layer + rate-limit policy change.

## Follow-ups (low priority, separate)

1. Add `replies` and `ping` to the Read verb whitelist in `classify_method` — a minor over-throttle today (e.g. `thread.replies` is a read but is classified as Write).
2. WebSocket-pushed unread updates (`UserEvent::UnreadUpdate` emitted on `message.create` / `read.mark`) would eliminate polling entirely. The scope is much bigger; this PR is the minimal correct first step.
Two reinforcing changes for the page-load RPC burst.

(B1) Batched unread counts:
`chat-app.js::updateUnreadCounts` previously fired `read.get` plus
`message.list` per channel in parallel — `2N` RPCs that scaled
linearly with channel count and routinely blew the global rate-limit
budget on any user with more than ~15 channels. New
`read.unread_counts(workspace_id, user_id) -> {channel_id: count}`
handler returns all counts for the caller in one call.

  * Server: single `LEFT JOIN ... GROUP BY` over channel_members,
    channels, read_cursors, messages. Filters via json_extract for
    `deleted_at`/`archived_at` (both live inside the `data` JSON,
    not as columns — matches `message::list` and
    `channel::list_workspace`). Excludes thread replies. Hits
    existing indexes (idx_messages_channel + read_cursors PK).
  * Client: replaces the `Promise.all(N)` with one `await rpc`.
    Coerces stringified channel_id keys back to numbers so the
    badge-render lookup keeps working. Failure path preserves the
    last good badges instead of clearing — degraded display is
    better than blank.

(C2) Tiered rate limits:
The previous default `RateLimiter::new(60, 10)` lumped every
non-typing method into a single 60/min global bucket — UI reads
(`*.list`, `*.get`, mention.list, huddle.list, ...) competing with
state mutations for the same budget. A normal page load fires 15-30
reads in a 2s burst; after 2-3 hard-refreshes the bucket was empty.
Dev mode hid this by skipping rate-limiting for unauthenticated
calls.

  * New `MethodClass` enum (`Read`/`Write`/`MessageSend`/`Typing`)
    + `classify_method(method)` — suffix heuristic on the trailing
    verb (`list`/`get`/`info`/etc. → Read, default → Write) plus
    two specific overrides (`typing.relay`, `message.send`).
  * `PerCallerLimits` gains a `read` bucket. New `read_per_min`
    field on `RateLimiter`.
  * New `RateLimiter::new_tiered(read, global, send, typing)`
    constructor. Backward-compat `new(global, send)` preserved
    (equates read=global) so existing 11 unit tests pass unchanged.
  * `check_method` dispatches via `classify_method` to charge the
    right bucket. Read and Typing isolated (own bucket only); Write
    charges global; MessageSend charges global+send.
  * Production: `new_tiered(600, 60, 60, 60)` — reads 10x the old
    budget, writes/sends/typing unchanged.

Together these eliminate the 2N fan-out AND make the remaining
traffic correctly tiered. With B1 alone, a future per-channel call
would hit the same wall. With C2 alone, the wasteful pattern stays
masked. Both are needed.

All 11 rate_limit tests still pass.
sameh-farouk force-pushed feat/batched-unread-tiered-rate-limits from b37871f8fe to 0549a56457 2026-04-26 17:46:19 +00:00
sameh-farouk merged commit c4f7e03da9 into development 2026-04-26 17:46:36 +00:00