perf(read+rate_limit): batched read.unread_counts + tiered rate limits #25
Reference: `lhumina_code/hero_collab!25`
Closes #24.
Summary
Two reinforcing fixes for the page-load RPC burst that self-throttles in proxy mode:
- **B1 — batched unread counts:** a new `read.unread_counts(workspace_id, user_id)` endpoint replaces the client's `Promise.all(N)` of `read.get` + `message.list` per channel — `2N` RPCs collapse to `1`.
- **C2 — tiered rate limits:** production limits move to `(600, 60, 60, 60)` — reads get 10× the old `60/min` flat budget; write/send/typing unchanged.

These are reinforcing — B1 alone leaves a tight budget waiting to bite the next per-channel feature; C2 alone leaves a wasteful pattern hidden behind a higher budget. Together: the fan-out is gone AND the remaining traffic is correctly tiered.
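The arithmetic behind both claims can be shown with a toy model (function names are illustrative, not from the codebase; the 100-call burst figure is an assumption chosen to match the 60-ok/40-limited split quoted in the test plan):

```rust
// Toy model of the two claims above. Not real code from the PR.

// B1: the old pattern fired read.get + message.list per channel;
// the new endpoint is a single call regardless of channel count.
fn old_unread_rpcs(channels: u32) -> u32 { 2 * channels }
fn new_unread_rpcs(_channels: u32) -> u32 { 1 }

// C2: how many calls a flat per-minute cap admits vs. rate-limits.
fn admit(calls: u32, cap: u32) -> (u32, u32) {
    let ok = calls.min(cap);
    (ok, calls - ok) // (admitted, rate-limited)
}

fn main() {
    assert_eq!(old_unread_rpcs(20), 40); // 2N RPCs for N = 20 channels
    assert_eq!(new_unread_rpcs(20), 1);  // collapses to one
    assert_eq!(admit(100, 60), (60, 40));  // old flat budget: 60 ok / 40 limited
    assert_eq!(admit(100, 600), (100, 0)); // new read bucket absorbs the burst
    println!("model ok");
}
```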
Changes
- `crates/hero_collab_server/src/handlers/read.rs` — new `unread_counts(workspace_id, user_id)` handler with a single SQL `LEFT JOIN ... GROUP BY` over `channel_members`/`channels`/`read_cursors`/`messages`. Returns `{channel_id: count}`. Filters via `json_extract` for `deleted_at`/`archived_at` (matches existing `message::list`/`channel::list_workspace` patterns).
- `crates/hero_collab_server/src/rpc.rs` — new `"read.unread_counts" => handlers::read::unread_counts` dispatch line.
- `crates/hero_collab_server/src/rate_limit.rs` — new `MethodClass` enum, new `classify_method(method)` function, new `read` bucket on `PerCallerLimits`, new `read_per_min` field on `RateLimiter`, new `RateLimiter::new_tiered(read, global, send, typing)` constructor, new `charge_read`/`charge_typing` helpers. `check_method` refactored to classify-then-charge. Backward-compat `new(global, send)` and `new_with_typing(global, send, typing)` preserved as thin wrappers (read=global).
- `crates/hero_collab_server/src/main.rs` — `RateLimiter::new(60, 10)` → `new_tiered(600, 60, 60, 60)`.
- `crates/hero_collab_ui/static/js/chat-app.js` — `updateUnreadCounts` body replaced with a single `await rpc('read.unread_counts', {workspace_id, user_id})` call. Coerces stringified `channel_id` keys back to numbers (so the existing badge-render `state.unreadCounts[ch.id]` lookup keeps working). Failure path preserves the last good badges (degraded display > clearing).

Net: +228 lines, -38 lines.
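The shape of that query can be sketched as one aggregate statement. This is a hedged sketch only: column names (`last_read_at`, the `data` JSON column) and the exact JSON paths are assumptions inferred from the PR text, not the real schema — only the `LEFT JOIN ... GROUP BY` shape, the `json_extract` filters, and the thread-reply exclusion are stated in the description.

```rust
// Hypothetical sketch of the batched unread-count query.
// Column names and JSON paths are assumptions; only the overall join/group
// shape and the json_extract soft-delete filters come from the PR text.
const UNREAD_COUNTS_SQL: &str = "
    SELECT cm.channel_id, COUNT(m.id) AS unread
    FROM channel_members cm
    JOIN channels c           ON c.id = cm.channel_id
    LEFT JOIN read_cursors rc ON rc.channel_id = cm.channel_id
                             AND rc.user_id    = cm.user_id
    LEFT JOIN messages m      ON m.channel_id  = cm.channel_id
                             AND m.created_at  > COALESCE(rc.last_read_at, 0)
                             AND json_extract(m.data, '$.deleted_at') IS NULL
                             -- exclude thread replies (field location assumed):
                             AND json_extract(m.data, '$.thread_id') IS NULL
    WHERE cm.user_id = ?1
      AND c.workspace_id = ?2
      AND json_extract(c.data, '$.archived_at') IS NULL
    GROUP BY cm.channel_id";

fn main() {
    // Sanity checks on the sketch's shape only.
    assert!(UNREAD_COUNTS_SQL.contains("GROUP BY cm.channel_id"));
    assert!(UNREAD_COUNTS_SQL.contains("json_extract"));
    println!("sql sketch ok");
}
```

The `LEFT JOIN` on `messages` keeps zero-unread channels in the result set (the `COUNT` over the NULL-extended side yields 0), so every membership gets a key in the returned map.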
Method classification (suffix heuristic)
Read and Typing are isolated (each charges its own bucket only; depleting one doesn't lock out the others). Write charges `global`. MessageSend charges both `global` and `send` atomically. Unknown verbs default to Write — conservative under-classification (better to over-throttle than to silently bypass write limits for a new method).

Bucket sizing rationale

- `read` — 600/min: UI reads (`*.list`, `*.get`, ...); 10× the old flat budget, sized for page-load bursts.
- `global` — 60/min: writes and other mutations; unchanged.
- `send` — 60/min: `message.send` only; charged in addition to `global` so a chat storm doesn't blow write limits.
- `typing` — 60/min: `typing.relay` only; unchanged.

Behaviour matrix
- `updateUnreadCounts` alone: one `read.unread_counts` call (no more per-channel `read.mark`/`message.list` traffic), charged to the `read` bucket; independent of writes.
- `message.send` storm: charges `global` + `send`; the `read` bucket is untouched.
- Old server without the endpoint: `read.unread_counts` returns method-not-found; the client catch logs a warning + preserves the last good badges (degraded, not broken).

Test plan
- `cargo build --release -p hero_collab_server` — clean.
- `cargo test --release -p hero_collab_server --bin hero_collab_server rate_limit::` — all 11 existing tests pass under the new tiered design.
- Manual: `read.unread_counts` returns the expected `{channel_id: count}` shape; counts match a manual SQL cross-check.
- Burst: repeated `workspace.list` calls (Read class) all succeed under the new 600/min read bucket (the same burst would have been 60 ok / 40 limited under the old config).

Notes
- The query hits existing indexes (`idx_messages_channel(channel_id, created_at)`, the `read_cursors` PK).
- Old clients doing `read.get` + `message.list` per channel still work; they just don't benefit from the fan-out reduction.
- Old code paths that use `RateLimiter::new(global, send)` work via the read=global wrapper — no test changes needed.

Follow-ups (low priority, separate)
- Add `replies` and `ping` to the Read verb whitelist in `classify_method` — minor over-throttle today (e.g. `thread.replies` is a read but classified as Write).
- Server-push unread updates (a `UserEvent::UnreadUpdate` emitted on `message.create`/`read.mark`) would eliminate the polling entirely. Scope is much bigger; this PR is the minimal correct first step.

---

Two reinforcing changes for the page-load RPC burst.

(B1) Batched unread counts: `chat-app.js::updateUnreadCounts` previously fired `read.get` plus `message.list` per channel in parallel — `2N` RPCs that scaled linearly with channel count and routinely blew the global rate-limit budget on any user with more than ~15 channels. New `read.unread_counts(workspace_id, user_id) -> {channel_id: count}` handler returns all counts for the caller in one call.

* Server: single `LEFT JOIN ... GROUP BY` over channel_members, channels, read_cursors, messages. Filters via `json_extract` for `deleted_at`/`archived_at` (both live inside the `data` JSON, not as columns — matches `message::list` and `channel::list_workspace`). Excludes thread replies. Hits existing indexes (`idx_messages_channel` + `read_cursors` PK).
* Client: replaces the `Promise.all(N)` with one `await rpc`. Coerces stringified channel_id keys back to numbers so the badge-render lookup keeps working. Failure path preserves the last good badges instead of clearing — degraded display is better than blank.

(C2) Tiered rate limits: the previous default `RateLimiter::new(60, 10)` lumped every non-typing method into a single 60/min global bucket — UI reads (`*.list`, `*.get`, `mention.list`, `huddle.list`, ...) competing with state mutations for the same budget. A normal page load fires 15-30 reads in a 2 s burst; after 2-3 hard refreshes the bucket was empty. Dev mode hid this by skipping rate-limiting for unauthenticated calls.

* New `MethodClass` enum (`Read`/`Write`/`MessageSend`/`Typing`) + `classify_method(method)` — suffix heuristic on the trailing verb (`list`/`get`/`info`/etc. → Read, default → Write) plus two specific overrides (`typing.relay`, `message.send`).
* `PerCallerLimits` gains a `read` bucket; new `read_per_min` field on `RateLimiter`.
* New `RateLimiter::new_tiered(read, global, send, typing)` constructor. Backward-compat `new(global, send)` preserved (equates read=global) so the existing 11 unit tests pass unchanged.
* `check_method` dispatches via `classify_method` to charge the right bucket: Read and Typing are isolated (own bucket only); Write charges global; MessageSend charges global+send.
* Production: `new_tiered(600, 60, 60, 60)` — reads get 10× the old budget; writes/sends/typing unchanged.

Together these eliminate the `2N` fan-out AND make the remaining traffic correctly tiered. With B1 alone, a future per-channel call would hit the same wall. With C2 alone, the wasteful pattern stays masked. Both are needed. All 11 rate_limit tests still pass.

Force-pushed from `b37871f8fe` to `0549a56457`.