AI Assistant: Progressive SSE Streaming (word-by-word response rendering) #32

Closed
opened 2026-03-18 01:50:28 +00:00 by mik-tf · 3 comments
Owner

Current State

The AI assistant (Shrimp) returns responses via Server-Sent Events (SSE). The current implementation reads the stream incrementally but only displays the response once the terminating event: done arrives.

What works now:

  • Stream is read chunk-by-chunk (not blocked on full body)
  • Response appears as soon as the LLM finishes (when done event arrives)
  • Abort/cancel works via AbortSignal
  • No more infinite spin on slow responses

What's missing:

  • Responses appear all at once after the LLM finishes thinking
  • No visual feedback during generation (just a spinner)
  • Multi-step agent tasks show nothing until all steps complete

The Enhancement

Show the AI response progressively as it's generated, word by word — like ChatGPT, Claude web, etc.

Shrimp already sends intermediate SSE events during generation:

  • event: token — partial content as the LLM generates tokens
  • event: tool_call — when the agent uses a tool
  • event: tool_result — tool execution result
  • event: done — final complete response

We currently ignore token events and only process done. Progressive streaming would render token events in real-time.
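The event framing above (an `event:` line naming the type, a `data:` line carrying the payload) can be parsed with plain string handling. A minimal sketch of the idea; the helper name is hypothetical, not the actual ai_service.rs code:

```rust
// Parse one SSE frame into (event name, data payload).
// A frame is a group of lines like "event: token" / "data: {...}".
// Hypothetical helper for illustration, not the real implementation.
fn parse_sse_frame(frame: &str) -> Option<(String, String)> {
    let mut event = None;
    let mut data = None;
    for line in frame.lines() {
        if let Some(rest) = line.strip_prefix("event:") {
            event = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("data:") {
            data = Some(rest.trim().to_string());
        }
    }
    // Only a frame with both fields is useful to us.
    Some((event?, data?))
}
```

A real SSE parser also has to handle comment lines (starting with `:`) and multi-line `data:` fields, which this sketch skips.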

Implementation

1. Service layer (ai_service.rs)

Change send_message to accept a callback for streaming updates:

```rust
pub async fn send_message_streaming(
    shrimp_url: &str,
    user_message: &str,
    conversation_id: Option<String>,
    on_token: impl Fn(&str),  // Called for each token chunk
    abort_signal: Option<web_sys::AbortSignal>,
) -> Result<String, String>
```

Or return a Stream that yields partial updates.
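For the callback variant, the call site would pass a closure that appends each chunk to shared state. In the real island.rs that state would be a Dioxus signal; the sketch below uses `Rc<RefCell<String>>` so it stays self-contained, and all names are hypothetical:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Simulates the UI side of the callback API: each token event
// appends to a shared partial-content buffer that the chat bubble
// would render. Hypothetical sketch, not the real call site.
fn demo_on_token() -> String {
    let partial = Rc::new(RefCell::new(String::new()));
    let sink = Rc::clone(&partial);
    let on_token = move |chunk: &str| sink.borrow_mut().push_str(chunk);

    // Stand-in for the service invoking the callback per token event.
    for tok in ["Hello", ", ", "world"] {
        on_token(tok);
    }
    let result = partial.borrow().to_string();
    result
}
```

The `Stream` alternative would yield the same chunks as items instead of pushing them through a closure; either way the UI ends up appending partial content as it arrives.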

2. UI component (island.rs / chat view)

Update the message state model:

  • Current: Pending → Complete
  • New: Pending → Streaming(partial_content) → Complete(full_content)

The chat bubble renders Streaming state with a blinking cursor and growing text.
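The state model above can be sketched as a Rust enum with the two transitions it needs (token arrival and the done event); names are assumed from the description, not the actual island.rs types:

```rust
// Hypothetical message state mirroring
// Pending -> Streaming(partial) -> Complete(full).
#[derive(Debug, Clone, PartialEq)]
enum MessageState {
    Pending,
    Streaming(String), // partial content grows as tokens arrive
    Complete(String),  // final full response
}

impl MessageState {
    // Append a token chunk; Pending promotes to Streaming on the
    // first token, and late tokens after done are ignored.
    fn push_token(&mut self, token: &str) {
        match self {
            MessageState::Pending => *self = MessageState::Streaming(token.to_string()),
            MessageState::Streaming(partial) => partial.push_str(token),
            MessageState::Complete(_) => {}
        }
    }

    // The done event replaces any partial content with the final response.
    fn finish(&mut self, full: String) {
        *self = MessageState::Complete(full);
    }
}
```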

3. Token event parsing

Process event: token in the SSE reader loop:

```rust
if collected_since_last.contains("event: token\n") {
    // Extract data, call on_token callback
    // UI updates the streaming message bubble
}
```

4. Agent tool use visualization (stretch goal)

When Shrimp uses tools (web search, file operations, etc.), show status:

  • "Searching the web..."
  • "Reading documentation..."
  • "Executing code..."

This requires parsing event: tool_call and event: tool_result events.
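The status lines above could be derived from the tool name carried in the tool_call event. A minimal sketch; the tool names here are illustrative assumptions, not Shrimp's actual tool identifiers:

```rust
// Map a tool_call event's tool name to a user-facing status line.
// Tool names are hypothetical examples, not Shrimp's real ones.
fn tool_status_label(tool_name: &str) -> String {
    match tool_name {
        "web_search" => "Searching the web...".to_string(),
        "read_docs" => "Reading documentation...".to_string(),
        "execute_code" => "Executing code...".to_string(),
        other => format!("Running {other}..."),
    }
}
```

The matching tool_result event would then clear or replace the status line in the chat view.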

Files

| File | Change |
|------|--------|
| `hero_archipelagos/archipelagos/intelligence/ai/src/services/ai_service.rs` | Streaming API + token event parsing |
| `hero_archipelagos/archipelagos/intelligence/ai/src/island.rs` | Streaming message state + UI rendering |
| `hero_archipelagos/archipelagos/intelligence/ai/src/views/` | Chat bubble streaming animation |

Priority

Medium — the current fix prevents infinite spin. This enhancement improves UX but is not blocking.

Author
Owner

Implemented and deployed to herodev

Changes (4 files in hero_archipelagos/archipelagos/intelligence/ai/src/):

  1. ai_service.rs: Added process_sse_chunk() to parse event: token SSE events incrementally. Added on_token callback parameter to send_message() (WASM only).

  2. island.rs: Creates a placeholder AI message before the request. Token callback updates the message content in real-time as tokens arrive. Error case removes the placeholder. Added @keyframes blink CSS animation.

  3. message_list.rs: Typing dots (bounce animation) only show when no streaming content has arrived yet. Once first token comes in, the growing message bubble replaces the dots. Passes is_streaming to last AI message bubble.

  4. message_bubble.rs: Added is_streaming prop. Shows a blinking cursor after content during streaming.
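Because network chunks can split an SSE frame mid-line, a chunk processor has to buffer until a frame terminator (blank line) arrives. A sketch of the buffering idea behind a function like process_sse_chunk(); this is illustrative, not the committed code:

```rust
// Accumulate raw network chunks and emit only complete SSE frames
// (terminated by "\n\n"); leftover bytes wait for the next chunk.
// Hypothetical sketch, not the actual process_sse_chunk().
struct SseBuffer {
    buf: String,
}

impl SseBuffer {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    // Feed one chunk; returns every complete frame it closed off.
    fn feed(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut frames = Vec::new();
        while let Some(pos) = self.buf.find("\n\n") {
            let frame = self.buf[..pos].to_string();
            self.buf.drain(..pos + 2);
            if !frame.trim().is_empty() {
                frames.push(frame);
            }
        }
        frames
    }
}
```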

Architecture:

```
Shrimp SSE: event:token → data:{"token":"word"}
    ↓
WASM ReadableStream reader (chunk by chunk)
    ↓
process_sse_chunk() → on_token callback
    ↓
messages signal write → Dioxus re-render
    ↓
MessageBubble grows word-by-word + blinking cursor
    ↓
event:done → replace with final complete response
```

Verify: open the herodev AI chat and send a message; the response should appear word-by-word.

Commit: hero_archipelagos@7571357 on development

Author
Owner

Reopening — backend streaming not yet implemented

Investigated the actual SSE events from Shrimp:

```
event: partial
data: {"text":"Execution contract: ..."}

event: done
data: {"response":"Hello, my friend!"}
```

Shrimp sends only one partial event (execution plan) then one done event (full response). There are no per-token events. The LLM call in Shrimp is non-streaming — it waits for the complete response.

Frontend work done (hero_archipelagos@bbd1f75):

  • Handles event: partial and event: token formats
  • Shows streaming content with blinking cursor
  • Placeholder message pattern ready

Backend work needed (hero_shrimp):

  • Shrimp's llm_client.ts must use streaming LLM API calls (stream: true)
  • Forward each token as event: token / data: {"token": "word"} SSE events
  • Keep event: done as final response

This is a Shrimp-side change in src/core/llm_client.ts and src/core/agent.ts.

mik-tf reopened this issue 2026-03-18 23:28:28 +00:00
Author
Owner

SSE streaming working with hero_agent

hero_agent (which replaced hero_shrimp in #72) has native SSE streaming:

  • POST /api/chat returns SSE stream with events: token, tool_call, tool_result, done, error
  • Tokens are streamed word-by-word as they arrive from the LLM
  • The AI island in hero_archipelagos consumes the SSE stream and renders progressively
  • Confirmed working on herodev with Claude Sonnet

This is functionally complete — the AI Assistant tab shows responses word-by-word.

Signed-off-by: mik-tf
