thinking is getting shown as normal text #104
Labels
No labels
prio_critical
prio_low
type_bug
type_contact
type_issue
type_lead
type_question
type_story
type_task
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
lhumina_code/hero_shrimp#104
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Implementation Spec for Issue #104
Objective
Inline
<think>...</think>reasoning segments emitted by models inside the assistantcontentfield must be routed into the existing reasoning lane (reasoning events,LlmResponse.reasoning,messages.reasoning_json, the UI ThinkingPane) instead of being rendered as plain text in the chat. The fix must work for streaming (tags may be split across SSE chunks), non-streaming responses, and already-persisted history that still contains raw tags.Requirements
<think>...</think>segments from assistant content at the runtime completion layer so every downstream consumer (events, persistence,turn:endreply, job narration) receives clean content.llm:deltawithkind: "reasoning"during streaming, andReasoningBlockentries on the finalLlmResponse(sopersist_message_with_reasoningstores it inreasoning_jsonand multi-turn rehydration keeps working).<thi, the next starts withnk>): no tag fragment may leak into the content lane, and held-back text must be flushed when it turns out not to be a tag.<think>at end of stream (treat the remainder as reasoning, not content).<think>blocks interleaved with prose in one response (as shown in the issue screenshot).<think>for distillation JSONL).<think>tags (and replies from older daemons) must render with the thinking content moved into the collapsible ThinkingPane, never as body text.Files to Modify/Create
crates/hero_shrimp_runtime/src/llm/completion/think_tags.rs- New module: incremental stream filter plus whole-string splitter for<think>tags, with unit tests.crates/hero_shrimp_runtime/src/llm/completion/mod.rs- Wire the filter into the streaming SSE loop and the non-streaming OpenAI-compatible path.crates/hero_shrimp_web/ui/src/store.ts- Client-side fallback: extract<think>segments influshStreamBuffer, theturn:endhandler, andloadConversationMessageshistory mapping.crates/hero_shrimp_web/ui/src/components/ChatThread.tsx- Render a collapsed ThinkingPane on settled assistant messages that carryreasoningText.crates/hero_shrimp_web/static/- Regenerated Vite bundle (output ofnpm run buildincrates/hero_shrimp_web/ui/; embedded via rust-embed).Implementation Plan
Step 1: Add a think-tag parser module to the runtime
Files:
crates/hero_shrimp_runtime/src/llm/completion/think_tags.rs(new),crates/hero_shrimp_runtime/src/llm/completion/mod.rs(module declaration only)think_tags.rscontaining a stateful incremental parserThinkTagFilterwithpush(&mut self, chunk: &str) -> Vec<DeltaPiece>andfinish(&mut self) -> Vec<DeltaPiece>, where eachDeltaPieceis either Content or Reasoning text. State machine toggled by exact<think>/</think>markers; when a chunk ends with a proper prefix of a potential marker, hold it back and resolve on the next push;finish()flushes held-back text as content and treats any still-open think section as reasoning.split_think_tags(content: &str) -> (Option<String>, Vec<String>)for non-streaming responses: returns content with all<think>...</think>spans removed (collapsing excess blank lines, trimming;Noneif nothing remains) plus extracted reasoning texts. An unclosed trailing<think>consumes to end-of-string as reasoning.<that is not a tag; unclosed<think>at stream end; input with no tags passes through byte-identical;split_think_tagson the issue's example transcript.mod think_tags;incompletion/mod.rs.Dependencies: none
Step 2: Filter inline think tags in the streaming dispatch path
Files:
crates/hero_shrimp_runtime/src/llm/completion/mod.rsThinkTagFilterper stream next toacc_content/acc_reasoning.delta.contentchunk throughfilter.push(c); append returned pieces toacc_contentoracc_reasoningand, whenemit_streamis true, emitllm:deltawithkind: "content"orkind: "reasoning"respectively, reusing the existing JSON shapes. Structureddelta.reasoning/delta.reasoning_contenthandling stays unchanged.filter.finish()and append the resulting pieces to the accumulators.emit_streamis false (background phases); only event emission stays gated, matching current behavior.Dependencies: Step 1
Step 3: Filter inline think tags in the non-streaming OpenAI-compatible path
Files:
crates/hero_shrimp_runtime/src/llm/completion/mod.rscontentis extracted andreasoningis built viareasoning_from_assistant_message, runsplit_think_tagson the content; replacecontentwith the cleaned remainder and push each extracted segment as aReasoningBlockonto thereasoningvec.phase_emits_to_chat_streamblock so the existingagent:reasoningemission and thellm:responsepreview pick up the changes.Dependencies: Step 1
Step 4: Client-side fallback extraction in the web UI store
Files:
crates/hero_shrimp_web/ui/src/store.tsstripPseudoToolCalls:extractThinkBlocks(text)removes all complete<think>...</think>spans plus a trailing unclosed<think>..., returning the cleaned text and the concatenated reasoning.flushStreamBuffer(run on the accumulated text beforestripPseudoToolCalls, appending extracted reasoning to the message'sreasoningText/reasoningChars); theturn:endhandler (cleanreply, merge reasoning); andloadConversationMessages(clean persisted history so old messages with raw tags render correctly).Dependencies: none (independent of Steps 1-3)
Step 5: Show a collapsed reasoning pane on settled messages
Files:
crates/hero_shrimp_web/ui/src/components/ChatThread.tsxThinkingPanecurrently renders only during streaming fallback and job-narration think segments; a finalized message withreasoningTextshows nothing. Render a collapsedThinkingPaneabove the message body whenever the message is an assistant message withreasoningTextand is not in the streaming-fallback state, so extracted thinking from history reload andturn:endis reachable via the existing "show reasoning" toggle.Dependencies: Step 4
Step 6: Rebuild the embedded UI bundle and run verification
Files:
crates/hero_shrimp_web/ui/(build),crates/hero_shrimp_web/static/assets/(generated output)npm run buildincrates/hero_shrimp_web/ui/and commit the regeneratedstatic/assets(the hashed bundle is committed and embedded via rust-embed at compile time).cargo test -p hero_shrimp_runtime,cargo clippy -p hero_shrimp_runtime -p hero_shrimp_web -- -D warnings, and a workspacecargo check.Dependencies: Steps 1-5
Acceptance Criteria
<think>...</think>renders the thinking text only in the collapsible ThinkingPane; the message body contains no<think>or</think>markers.<think>or</think>marker split across SSE chunk boundaries never leaks fragments (e.g.<thi) into the visible body, and non-tag text starting with<is not swallowed.<think>at end of stream is treated as reasoning, not content.<think>blocks interleaved with prose all land in the reasoning lane; the prose between them remains intact and in order.LlmResponse.contentis clean and extracted segments appear inLlmResponse.reasoning(and hencemessages.reasoning_json).think_tags.rspass;cargo test -p hero_shrimp_runtimeand workspace clippy are green; the regeneratedstatic/assetsbundle is committed.Notes
ReasoningBlock,reasoning_jsonpersistence, ThinkingPane); this issue is purely that inline-tag models bypass it. No new event type is needed; reusellm:delta kind:"reasoning".<think>/</think>. Some DeepSeek-R1 deployments omit the opening tag and emit only a closing</think>; handling a leading orphan</think>is a worthwhile hardening but should be a deliberate, tested choice in Step 1.trajectory.rsintentionally wraps reasoning in<think>for ShareGPT export; no change there and the filter must not run on that output.llm:deltaevents, so the backend fix automatically cleans background-job narration too.crates/hero_shrimp_web/ui/(Vite build intostatic/assets);v2-src/must not be modified.Test Results
cargo test --workspace(all 14 member crates, unit + integration + doc tests):Clippy: clean (
cargo clippy --workspace -- -D warningsexits 0, no warnings)New think_tags unit tests: 13 passed (
hero_shrimp_runtime::llm::completion::think_tags)All 13 think_tags tests verified individually:
Implementation Summary
Inline
<think>...</think>reasoning emitted by models inside the assistant content is now routed into the existing reasoning lane instead of being rendered as plain text in the chat. The thinking content still appears, but in the collapsible ThinkingPane where reasoning normally shows, and it is persisted as reasoning rather than message text.Changes
crates/hero_shrimp_runtime/src/llm/completion/think_tags.rs(new): incrementalThinkTagFilterfor streaming (handles markers split across SSE chunks, held-back partial markers, unclosed trailing<think>) andsplit_think_tagsfor whole-string responses; 13 unit tests.crates/hero_shrimp_runtime/src/llm/completion/mod.rs: streaming path routes everydelta.contentchunk through the filter, emittingllm:deltawithkind: "content"orkind: "reasoning"and accumulating into the existingacc_content/acc_reasoning; filter is flushed after the stream ends. Non-streaming OpenAI-compatible path runssplit_think_tagson the extracted content and pushes extracted segments asReasoningBlockentries before event emission. Anthropic-compatible branch and trajectory export untouched.crates/hero_shrimp_web/ui/src/store.ts: client-side fallbackextractThinkBlocksapplied influshStreamBuffer, theturn:endhandler, andloadConversationMessages, so history persisted with raw tags (and replies from older daemons) renders clean with the reasoning attached to the message.crates/hero_shrimp_web/ui/src/components/ChatThread.tsx: settled assistant messages with reasoning now render a collapsed ThinkingPane above the body (default closed, existing toggle), so extracted thinking remains reachable after the message finalizes.crates/hero_shrimp_web/static/: regenerated Vite bundle (app.CKaPpq6t.js, index.DypXtPlZ.css, index.html references).Test Results
Notes
<think>/</think>markers are matched. The DeepSeek variant that omits the opening tag (orphan leading</think>) is deliberately not handled; this is documented in the parser module.reasoning_jsonlike structured reasoning, so it survives conversation reloads.Pull request opened: #109
This PR implements the changes discussed in this issue.