AI Assistant: Progressive SSE Streaming (word-by-word response rendering) #32

Closed
opened 2026-03-18 01:50:28 +00:00 by mik-tf · 3 comments
Owner

Current State

The AI assistant (Shrimp) returns responses via Server-Sent Events (SSE). The current implementation reads the stream incrementally but only displays the response once the terminating event: done arrives.

What works now:

  • Stream is read chunk-by-chunk (not blocked on full body)
  • Response appears as soon as the LLM finishes (when done event arrives)
  • Abort/cancel works via AbortSignal
  • No more infinite spin on slow responses

What's missing:

  • Responses appear all at once after the LLM finishes thinking
  • No visual feedback during generation (just a spinner)
  • Multi-step agent tasks show nothing until all steps complete

The Enhancement

Show the AI response progressively as it's generated, word by word — like ChatGPT, Claude web, etc.

Shrimp already sends intermediate SSE events during generation:

  • event: token — partial content as the LLM generates tokens
  • event: tool_call — when the agent uses a tool
  • event: tool_result — tool execution result
  • event: done — final complete response

We currently ignore token events and only process done. Progressive streaming would render token events in real-time.
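The event framing above (an `event:` line naming the type, a `data:` line carrying the payload) can be parsed with plain string handling. A minimal sketch of the idea; the helper name is hypothetical, not the actual ai_service.rs code:

```rust
// Parse one SSE frame into (event name, data payload).
// A frame is a group of lines like "event: token" / "data: {...}".
// Hypothetical helper for illustration, not the real implementation.
fn parse_sse_frame(frame: &str) -> Option<(String, String)> {
    let mut event = None;
    let mut data = None;
    for line in frame.lines() {
        if let Some(rest) = line.strip_prefix("event:") {
            event = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("data:") {
            data = Some(rest.trim().to_string());
        }
    }
    // Only a frame with both fields is useful to us.
    Some((event?, data?))
}
```

A real SSE parser also has to handle comment lines (starting with `:`) and multi-line `data:` fields, which this sketch skips.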

Implementation

1. Service layer (ai_service.rs)

Change send_message to accept a callback for streaming updates:

```rust
pub async fn send_message_streaming(
    shrimp_url: &str,
    user_message: &str,
    conversation_id: Option<String>,
    on_token: impl Fn(&str),  // Called for each token chunk
    abort_signal: Option<web_sys::AbortSignal>,
) -> Result<String, String>
```

Or return a Stream that yields partial updates.
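For the callback variant, the call site would pass a closure that appends each chunk to shared state. In the real island.rs that state would be a Dioxus signal; the sketch below uses `Rc<RefCell<String>>` so it stays self-contained, and all names are hypothetical:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Simulates the UI side of the callback API: each token event
// appends to a shared partial-content buffer that the chat bubble
// would render. Hypothetical sketch, not the real call site.
fn demo_on_token() -> String {
    let partial = Rc::new(RefCell::new(String::new()));
    let sink = Rc::clone(&partial);
    let on_token = move |chunk: &str| sink.borrow_mut().push_str(chunk);

    // Stand-in for the service invoking the callback per token event.
    for tok in ["Hello", ", ", "world"] {
        on_token(tok);
    }
    let result = partial.borrow().to_string();
    result
}
```

The `Stream` alternative would yield the same chunks as items instead of pushing them through a closure; either way the UI ends up appending partial content as it arrives.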

2. UI component (island.rs / chat view)

Update the message state model:

  • Current: Pending → Complete
  • New: Pending → Streaming(partial_content) → Complete(full_content)

The chat bubble renders Streaming state with a blinking cursor and growing text.
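The state model above can be sketched as a Rust enum with the two transitions it needs (token arrival and the done event); names are assumed from the description, not the actual island.rs types:

```rust
// Hypothetical message state mirroring
// Pending -> Streaming(partial) -> Complete(full).
#[derive(Debug, Clone, PartialEq)]
enum MessageState {
    Pending,
    Streaming(String), // partial content grows as tokens arrive
    Complete(String),  // final full response
}

impl MessageState {
    // Append a token chunk; Pending promotes to Streaming on the
    // first token, and late tokens after done are ignored.
    fn push_token(&mut self, token: &str) {
        match self {
            MessageState::Pending => *self = MessageState::Streaming(token.to_string()),
            MessageState::Streaming(partial) => partial.push_str(token),
            MessageState::Complete(_) => {}
        }
    }

    // The done event replaces any partial content with the final response.
    fn finish(&mut self, full: String) {
        *self = MessageState::Complete(full);
    }
}
```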

3. Token event parsing

Process event: token in the SSE reader loop:

```rust
if collected_since_last.contains("event: token\n") {
    // Extract data, call on_token callback
    // UI updates the streaming message bubble
}
```

4. Agent tool use visualization (stretch goal)

When Shrimp uses tools (web search, file operations, etc.), show status:

  • "Searching the web..."
  • "Reading documentation..."
  • "Executing code..."

This requires parsing event: tool_call and event: tool_result events.
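The status lines above could be derived from the tool name carried in the tool_call event. A minimal sketch; the tool names here are illustrative assumptions, not Shrimp's actual tool identifiers:

```rust
// Map a tool_call event's tool name to a user-facing status line.
// Tool names are hypothetical examples, not Shrimp's real ones.
fn tool_status_label(tool_name: &str) -> String {
    match tool_name {
        "web_search" => "Searching the web...".to_string(),
        "read_docs" => "Reading documentation...".to_string(),
        "execute_code" => "Executing code...".to_string(),
        other => format!("Running {other}..."),
    }
}
```

The matching tool_result event would then clear or replace the status line in the chat view.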

Files

| File | Change |
|------|--------|
| `hero_archipelagos/archipelagos/intelligence/ai/src/services/ai_service.rs` | Streaming API + token event parsing |
| `hero_archipelagos/archipelagos/intelligence/ai/src/island.rs` | Streaming message state + UI rendering |
| `hero_archipelagos/archipelagos/intelligence/ai/src/views/` | Chat bubble streaming animation |

Priority

Medium — the current fix prevents infinite spin. This enhancement improves UX but is not blocking.

Author
Owner

Implemented and deployed to herodev

Changes (4 files in hero_archipelagos/archipelagos/intelligence/ai/src/):

  1. ai_service.rs: Added process_sse_chunk() to parse event: token SSE events incrementally. Added on_token callback parameter to send_message() (WASM only).

  2. island.rs: Creates a placeholder AI message before the request. Token callback updates the message content in real-time as tokens arrive. Error case removes the placeholder. Added @keyframes blink CSS animation.

  3. message_list.rs: Typing dots (bounce animation) only show when no streaming content has arrived yet. Once first token comes in, the growing message bubble replaces the dots. Passes is_streaming to last AI message bubble.

  4. message_bubble.rs: Added is_streaming prop. Shows a blinking cursor after content during streaming.
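Because network chunks can split an SSE frame mid-line, a chunk processor has to buffer until a frame terminator (blank line) arrives. A sketch of the buffering idea behind a function like process_sse_chunk(); this is illustrative, not the committed code:

```rust
// Accumulate raw network chunks and emit only complete SSE frames
// (terminated by "\n\n"); leftover bytes wait for the next chunk.
// Hypothetical sketch, not the actual process_sse_chunk().
struct SseBuffer {
    buf: String,
}

impl SseBuffer {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    // Feed one chunk; returns every complete frame it closed off.
    fn feed(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut frames = Vec::new();
        while let Some(pos) = self.buf.find("\n\n") {
            let frame = self.buf[..pos].to_string();
            self.buf.drain(..pos + 2);
            if !frame.trim().is_empty() {
                frames.push(frame);
            }
        }
        frames
    }
}
```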

Architecture:

```
Shrimp SSE: event:token → data:{"token":"word"}
    ↓
WASM ReadableStream reader (chunk by chunk)
    ↓
process_sse_chunk() → on_token callback
    ↓
messages signal write → Dioxus re-render
    ↓
MessageBubble grows word-by-word + blinking cursor
    ↓
event:done → replace with final complete response
```

Verify: open the herodev AI chat and send a message; the response should appear word-by-word.

Commit: hero_archipelagos@7571357 on development

Author
Owner

Reopening — backend streaming not yet implemented

Investigated the actual SSE events from Shrimp:

```
event: partial
data: {"text":"Execution contract: ..."}

event: done
data: {"response":"Hello, my friend!"}
```

Shrimp sends only one partial event (execution plan) then one done event (full response). There are no per-token events. The LLM call in Shrimp is non-streaming — it waits for the complete response.

Frontend work done (hero_archipelagos@bbd1f75):

  • Handles event: partial and event: token formats
  • Shows streaming content with blinking cursor
  • Placeholder message pattern ready

Backend work needed (hero_shrimp):

  • Shrimp's llm_client.ts must use streaming LLM API calls (stream: true)
  • Forward each token as event: token / data: {"token": "word"} SSE events
  • Keep event: done as final response

This is a Shrimp-side change in src/core/llm_client.ts and src/core/agent.ts.

mik-tf reopened this issue 2026-03-18 23:28:28 +00:00
Author
Owner

SSE streaming working with hero_agent

hero_agent (which replaced hero_shrimp in #72) has native SSE streaming:

  • POST /api/chat returns SSE stream with events: token, tool_call, tool_result, done, error
  • Tokens are streamed word-by-word as they arrive from the LLM
  • The AI island in hero_archipelagos consumes the SSE stream and renders progressively
  • Confirmed working on herodev with Claude Sonnet

This is functionally complete — the AI Assistant tab shows responses word-by-word.

Signed-off-by: mik-tf
