First-class presence (eliminate two-producer race) (B2 — WS refactor follow-up) #17

Closed
opened 2026-04-23 19:40:56 +00:00 by sameh-farouk · 1 comment
Member

Follow-up to #13. Closes gap B2 from the post-refactor architectural assessment.

Problem: the refactor's spec documented an accepted race in presence handling — two concurrent producers (handle_user_ws's first-tab/last-tab hooks AND client-invoked presence.update RPC) can emit events out of order. Recipients see offline after online flicker for up to ~60s until the next client heartbeat reconciles.

Solution: server is the sole producer of the online/offline bit. Split into two narrow RPCs:

  • presence.set_status_text — client sets status text ("In a meeting"); cannot flip online/offline
  • presence.mark_connection — server-authoritative, called only by handle_user_ws's lifecycle hooks

Keep presence.update for one release as a deprecated alias routing to set_status_text (ignores any status field). Client-side presence-flipping RPCs removed.

Docs:

Size: ~120 lines production + ~150 lines tests. 3 tasks.

Branch: lands on feat/ws-refactor. Must run BEFORE #A2 heartbeat and #B1 session resume — both extend handle_user_ws and reference the new presence.mark_connection call site.

Follow-up to #13. Closes gap B2 from the post-refactor architectural assessment. **Problem:** the refactor's spec documented an accepted race in presence handling — two concurrent producers (`handle_user_ws`'s first-tab/last-tab hooks AND client-invoked `presence.update` RPC) can emit events out of order. Recipients see `offline` after `online` flicker for up to ~60s until the next client heartbeat reconciles. **Solution:** server is the sole producer of the `online`/`offline` bit. Split into two narrow RPCs: - `presence.set_status_text` — client sets status text ("In a meeting"); cannot flip online/offline - `presence.mark_connection` — server-authoritative, called only by `handle_user_ws`'s lifecycle hooks Keep `presence.update` for one release as a deprecated alias routing to `set_status_text` (ignores any `status` field). Client-side presence-flipping RPCs removed. **Docs:** - Spec: [`plan/feature-first-class-presence.md`](../src/branch/feat/ws-refactor/plan/feature-first-class-presence.md) - Impl plan: [`plan/impl-first-class-presence.md`](../src/branch/feat/ws-refactor/plan/impl-first-class-presence.md) **Size:** ~120 lines production + ~150 lines tests. 3 tasks. **Branch:** lands on `feat/ws-refactor`. Must run BEFORE #A2 heartbeat and #B1 session resume — both extend `handle_user_ws` and reference the new `presence.mark_connection` call site.
Author
Member

Implementation landed

Three commits on feat/ws-refactor:

  • 505d7ef feat(server): first-class presence — server-authoritative online/offline
    Splits handlers::presence::update into three handlers:

    • set_status_text(caller_id, user_id, status_text) — client-facing. Persists users.data.status_text + last_seen. Never writes users.data.status (verified by grep: obj["status"] = ... only appears in mark_connection).
    • mark_connection(caller_id, user_id, online: bool) — server-authoritative WS-lifecycle path. Only way to flip the online bit.
    • update(...) — deprecated alias routing to set_status_text. Emits tracing::warn!; explicitly ignores any status field. Kept for one release to tolerate stray callers during cutover.

    list unchanged. 4 new integration tests + P7.12 test migrated to use mark_connection.

  • ab29528 docs(openrpc): register presence.set_status_text + presence.mark_connection
    Schema entries for both new methods; presence.update description updated with DEPRECATED note. SDK proc-macro regenerates typed PresenceSetStatusTextInput / PresenceMarkConnectionInput structs at build time.

  • 15bd2e7 refactor(ui): presence lifecycle via presence.mark_connection (server-authoritative)
    Both handle_user_ws first-tab/last-tab RPC call sites swapped from presence.update to presence.mark_connection. Deleted client-side 60s presence heartbeat (setInterval firing presence.update with status: 'online') and beforeunload offline beacon (sendBeacon firing status: 'offline'). Both were client-side workarounds for the race that B2 eliminates structurally.

Architectural notes

  • The beforeunload beacon removal is a net improvement, not just simplification. beforeunload is unreliable on mobile Safari and background-killed tabs — offline signal could be lost. The server's last-tab drop(_tx) + receiver_count == 0 check operates on TCP close, which IS reliable. B2 trades an unreliable signal for a reliable one.
  • Session_text length cap = 100 chars matches Slack's status text limit. Validation returns RpcError::Validation (code -32602).
  • Cross-user auth: mark_connection requires caller_id == user_id in proxy-auth mode via resolve_self_user_id. Dev mode allows mismatch (matches existing dev-mode semantics elsewhere). Malicious self-mark is harmless — flipping your own presence is self-DoS at worst.
  • Deprecated alias strategy: kept for one release so any cached client JS bundle or third-party script calling presence.update with status:"online" degrades gracefully (status_text still writes; status bit untouched). Remove in a future chore commit once log-grep shows no callers.

Tests impact

cargo test -p hero_collab_server: integration 68 (+4 new presence tests, P7.12 migrated).

Minor debt noted

Two orphaned state fields on chat-app.js (state._heartbeatInterval, state._unloadRegistered) are now unreferenced after the client-side workaround deletion. Noise-level; not worth a dedicated commit.

## Implementation landed Three commits on `feat/ws-refactor`: - [`505d7ef`](../commit/505d7ef) `feat(server): first-class presence — server-authoritative online/offline` Splits `handlers::presence::update` into three handlers: - `set_status_text(caller_id, user_id, status_text)` — client-facing. Persists `users.data.status_text` + `last_seen`. **Never writes `users.data.status`** (verified by grep: `obj["status"] = ...` only appears in `mark_connection`). - `mark_connection(caller_id, user_id, online: bool)` — server-authoritative WS-lifecycle path. Only way to flip the online bit. - `update(...)` — deprecated alias routing to `set_status_text`. Emits `tracing::warn!`; explicitly ignores any `status` field. Kept for one release to tolerate stray callers during cutover. `list` unchanged. 4 new integration tests + P7.12 test migrated to use `mark_connection`. - [`ab29528`](../commit/ab29528) `docs(openrpc): register presence.set_status_text + presence.mark_connection` Schema entries for both new methods; `presence.update` description updated with DEPRECATED note. SDK proc-macro regenerates typed `PresenceSetStatusTextInput` / `PresenceMarkConnectionInput` structs at build time. - [`15bd2e7`](../commit/15bd2e7) `refactor(ui): presence lifecycle via presence.mark_connection (server-authoritative)` Both `handle_user_ws` first-tab/last-tab RPC call sites swapped from `presence.update` to `presence.mark_connection`. **Deleted client-side 60s presence heartbeat** (`setInterval` firing `presence.update` with `status: 'online'`) and **`beforeunload` offline beacon** (`sendBeacon` firing `status: 'offline'`). Both were client-side workarounds for the race that B2 eliminates structurally. ## Architectural notes - **The `beforeunload` beacon removal is a net improvement,** not just simplification. `beforeunload` is unreliable on mobile Safari and background-killed tabs — offline signal could be lost. The server's last-tab `drop(_tx)` + `receiver_count == 0` check operates on TCP close, which IS reliable. B2 trades an unreliable signal for a reliable one. - **Session_text length cap = 100 chars** matches Slack's status text limit. Validation returns `RpcError::Validation` (code `-32602`). - **Cross-user auth:** `mark_connection` requires `caller_id == user_id` in proxy-auth mode via `resolve_self_user_id`. Dev mode allows mismatch (matches existing dev-mode semantics elsewhere). Malicious self-mark is harmless — flipping your own presence is self-DoS at worst. - **Deprecated alias strategy:** kept for one release so any cached client JS bundle or third-party script calling `presence.update` with `status:"online"` degrades gracefully (status_text still writes; status bit untouched). Remove in a future chore commit once log-grep shows no callers. ## Tests impact `cargo test -p hero_collab_server`: integration 68 (+4 new presence tests, P7.12 migrated). ## Minor debt noted Two orphaned state fields on chat-app.js (`state._heartbeatInterval`, `state._unloadRegistered`) are now unreferenced after the client-side workaround deletion. Noise-level; not worth a dedicated commit.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_collab#17
No description provided.