Steering an autonomy job in its final/summary turn is silently dropped #93
Reference
lhumina_code/hero_shrimp#93
Symptom
Steered a running autonomy job ("add docs and tests") while it was wrapping up. The guidance was acknowledged ("will be picked up at the next checkpoint") but never applied — the job finished without docs/tests.
Root cause
There are two steering paths, and message-send steering uses the one with a timing gap:
- Live inbox (`session.steer` -> `global_steering_inbox`): drained every iteration at `iteration_shell.rs` L193-211 and folded in as a `[steering #N]` user message. Immediate.
- Message-send steering goes through `steer_existing_job_from_message` (`session_autonomy.rs` L139-147), which does NOT touch the live inbox; it writes `pending_operator_guidance`. That is only surfaced by `OperatorGuidanceProvider` (`job_context.rs` L171-201), which runs inside `build_history`, i.e. only when the orchestrator builds the prompt for the next turn/checkpoint.
If the job is already in its finalization/summary turn ("deliverable is complete, let me just report the summary and be done"), there is no next checkpoint, so the queued guidance is never rendered into a prompt and is dropped.
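The timing gap can be reproduced in a toy model. The sketch below is not the engine code: `ToyJob`, `run_turn`, and the message prefixes are hypothetical names, and it only assumes what is described above (the live inbox is drained every iteration, checkpoint guidance is read once at turn setup).

```rust
use std::collections::VecDeque;

/// Toy model of the two steering paths (all names are hypothetical).
#[derive(Default)]
struct ToyJob {
    /// Live inbox: drained at the top of every iteration.
    live_inbox: VecDeque<String>,
    /// Checkpoint guidance: only read when a turn's prompt is built.
    pending_guidance: Option<String>,
}

/// Runs one turn and returns the steering text the model actually saw.
/// `steer_at` injects (iteration index, use_live_inbox, text) mid-turn.
fn run_turn(
    job: &mut ToyJob,
    iterations: usize,
    steer_at: Option<(usize, bool, &str)>,
) -> Vec<String> {
    let mut seen = Vec::new();

    // Turn setup: the only point where checkpoint guidance is rendered.
    if let Some(g) = job.pending_guidance.clone() {
        seen.push(format!("[guidance] {g}"));
    }

    for i in 0..iterations {
        // Simulate the operator steering while the turn is running.
        if let Some((at, live, text)) = steer_at {
            if at == i {
                if live {
                    job.live_inbox.push_back(text.to_string());
                } else {
                    job.pending_guidance = Some(text.to_string());
                }
            }
        }
        // Per-iteration drain of the live inbox.
        while let Some(m) = job.live_inbox.pop_front() {
            seen.push(format!("[steering] {m}"));
        }
    }
    seen
}
```

If the turn modelled here is the final one, guidance written through the checkpoint path stays in `pending_guidance` and is never seen, while the same text pushed to the live inbox is picked up in the iteration it arrives.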
Secondary: `pending_operator_guidance` is read but never cleared after consumption (`job_context.rs`), so on a job that does continue it can be re-injected every turn.
Proposed fix
- Before finalizing, check `pending_operator_guidance`; if present, force one more turn/replan instead of ending. (Fixes the exact case.)
- Deliver message-send steering through the same live inbox as `session.steer`.
- Clear `pending_operator_guidance` once consumed.
Update: fails from BOTH UI entry points
Confirmed steering does nothing from either UI control, and they call different RPCs, so the failure spans both mechanisms, not just the checkpoint-timing gap above:

| Call site | RPC |
| --- | --- |
| `MessageInput.tsx:290` -> `steerActiveTurn` (`store.ts:1126-1135`) | `session.steer` (drained at `iteration_shell.rs:193-211`) |
| `LiveJobPanel.tsx:271-275` | `job.steer` |
| `SteeringInput.tsx:37` | `job.steer` |
| `ChatActivity.tsx:96` | `session.steer` |

The original issue only explained the `job.steer` / checkpoint path. But the chat Steer button uses `session.steer` (the live inbox that’s drained every iteration) and that also doesn’t land, so the live-inbox path is broken too.
Lead to verify: the live-inbox drain is gated on `options.session_id` being `Some` (`iteration_shell.rs:193`: `if let Some(sid) = state.options.session_id ...`). If an autonomy job’s agent loop runs with `session_id: None`, `session.steer` pushes to a key nothing ever drains, so it silently no-ops. Need to confirm whether autonomy turns thread the session id into `AgentOptions`.
Net: all four steering entry points fail for autonomy jobs: the `job.steer` ones because guidance is only read at a next checkpoint that never comes, and the `session.steer` ones likely because the loop has no `session_id` to drain against (to confirm).
Implementation Spec for Issue #93
Findings (confirmed against code)
- `iteration_shell.rs:193-211`: drain is gated on `if let Some(sid) = state.options.session_id...`, calls `global_steering_inbox().drain_with_ids(sid, mode)`, folds each in as a `[steering #N]` user message. Confirmed.
- `steering.rs:77-78`: inbox is `by_session: HashMap<i64, VecDeque<QueuedMessage>>`, keyed by i64 session id. `session.steer` pushes via `global_steering_inbox().push(sid, text)` (`session.rs:172-174`). Confirmed.
- `steer_existing_job_from_message` (`session_autonomy.rs:455-519`): writes `pending_operator_guidance` into the job row `details_json` via `queue_pending_operator_guidance` (L571-611). Does NOT touch the live inbox. Reply says "picked up at the next checkpoint." Confirmed.
- `OperatorGuidanceProvider` (`job_context.rs:163-229`): runs inside `build_history`, reads guidance from workspace `task_state.json` and/or DB `details_json`. Confirmed.
- `build_history` runs once per turn (`loop_setup.rs:224`): providers run once at turn setup, not per iteration. If the job is in its final turn, no new prompt is built and queued guidance is dropped. This is the root cause. Confirmed.
- `pending_operator_guidance` never cleared: written at `session_autonomy.rs:588`, read at `job_context.rs:187,228`, no DB write removes it. A continuing job re-injects the same guidance every turn. Confirmed.
- `proof_run.rs:658-696` -> `in_process.rs:204-207` -> `pipeline.rs:385,405`: the autonomy loop runs with `options.session_id = Some(real_session_id)`. The chat and its autonomy job share the session id, so the `iteration_shell` drain DOES run for autonomy jobs. The `session.steer` path is not no-op'd by a missing session id. Confirmed.
Objective
Ensure operator guidance steered into a running autonomy job is applied even when the job is in its final/summary turn, and stop re-injecting already-consumed guidance on jobs that continue.
Requirements
Files to Modify
- `crates/hero_shrimp_engine/src/agent_core/agent/job_context.rs`: add read + clear helpers for `pending_operator_guidance`.
- `crates/hero_shrimp_engine/src/agent_core/agent/no_tool_calls.rs`: add the gate before `record_final_answer`.
- `crates/hero_shrimp_engine/src/agent_core/agent/loop_support.rs`: add a bounded per-turn fold counter to `AgentLoopState`.
- `crates/hero_shrimp_server/src/rpc/methods/session_autonomy.rs`: (optional hardening) mirror message.send steer into the live inbox.
Implementation Plan
Step 1: Read + clear helpers in `job_context.rs`
Add `pending_operator_guidance(database, job_id)` exposing the existing DB read (L215-229), and `clear_pending_operator_guidance(database, job_id)` that re-resolves the row (i64 -> `get_autonomy_job`, fallback `get_autonomy_job_by_artifact_job_id`), removes `pending_operator_guidance` and `pending_operator_guidance_at` from `details_json`, retains `pending_operator_guidance_events` as audit, and writes via `upsert_autonomy_job`.
Dependencies: none.
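A minimal sketch of the read + clear pair, assuming the row's `details_json` can be treated as a flat key/value map. The `Details` alias is a stand-in: the real helpers take `(database, job_id)`, resolve the row, and operate on JSON, none of which is reproduced here.

```rust
use std::collections::BTreeMap;

/// Stand-in for the job row's `details_json` (the real row holds JSON).
type Details = BTreeMap<String, String>;

/// Returns the pending guidance if it is present and non-blank.
fn pending_operator_guidance(details: &Details) -> Option<&str> {
    details
        .get("pending_operator_guidance")
        .map(|s| s.trim())
        .filter(|s| !s.is_empty())
}

/// Removes the guidance and its timestamp but keeps the events audit key.
/// Returns true if anything was removed, so a caller could skip the DB
/// write when there was nothing to clear.
fn clear_pending_operator_guidance(details: &mut Details) -> bool {
    let removed_text = details.remove("pending_operator_guidance").is_some();
    let removed_at = details.remove("pending_operator_guidance_at").is_some();
    removed_text || removed_at
}
```

Returning a flag from the clear helper is a design choice of this sketch, not something the spec requires; it makes the "nothing to clear" case cheap and easy to test.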
Step 2: Bounded fold counter in `AgentLoopState`
Add `operator_guidance_folds: u32` (init 0) and a `MAX_OPERATOR_GUIDANCE_FOLDS` cap (e.g. 3). Initialize at the construction site in `loop_setup.rs`.
Dependencies: none (Step 3 reads it).
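The counter and cap could look roughly like this. Only the field and constant names come from the spec; the struct is reduced to the one relevant field, and `try_claim_guidance_fold` is a hypothetical helper added for illustration.

```rust
/// Cap on forced extra turns per turn; the spec suggests a value such as 3.
const MAX_OPERATOR_GUIDANCE_FOLDS: u32 = 3;

/// Only the field relevant here; the real `AgentLoopState` has many more.
#[derive(Default)]
struct AgentLoopState {
    operator_guidance_folds: u32,
}

impl AgentLoopState {
    /// Hypothetical helper: claims one fold if the cap allows it.
    fn try_claim_guidance_fold(&mut self) -> bool {
        if self.operator_guidance_folds < MAX_OPERATOR_GUIDANCE_FOLDS {
            self.operator_guidance_folds += 1;
            true
        } else {
            false
        }
    }
}
```

The cap keeps a stream of repeated steering from holding a job in its final turn indefinitely.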
Step 3: Gate finalization in `no_tool_calls.rs`
Immediately before `record_final_answer`, add a gate: if `options.job_kind().is_autonomy()`, the counter is below the cap, and pending guidance exists, then push the guidance as a high-priority user message, increment the counter, clear the DB guidance, and return `Continue`. Refund one iteration if needed so the forced turn actually runs.
Dependencies: Steps 1, 2.
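The gate's control flow can be sketched in isolation. `GateState`, `LoopAction`, `finalize_or_continue`, and the message prefix are hypothetical stand-ins; in the engine the guidance comes from the DB helpers in Step 1 and the decision lives in the no-tool-calls handler.

```rust
/// What the handler tells the agent loop to do next (hypothetical type).
#[derive(Debug, PartialEq)]
enum LoopAction {
    Continue,
    Break,
}

const MAX_OPERATOR_GUIDANCE_FOLDS: u32 = 3;

/// Reduced stand-in for the loop state plus the DB-backed guidance.
struct GateState {
    is_autonomy: bool,
    operator_guidance_folds: u32,
    iterations_left: u32,
    pending_guidance: Option<String>,
    messages: Vec<String>,
}

/// Decides whether the turn may finalize or must run once more.
fn finalize_or_continue(state: &mut GateState) -> LoopAction {
    if state.is_autonomy && state.operator_guidance_folds < MAX_OPERATOR_GUIDANCE_FOLDS {
        // `take()` both reads and clears, mirroring read + clear on the row.
        if let Some(guidance) = state.pending_guidance.take() {
            state.messages.push(format!("[operator guidance] {guidance}"));
            state.operator_guidance_folds += 1;
            // Refund: make sure at least one iteration remains to act on it.
            state.iterations_left = state.iterations_left.max(1);
            return LoopAction::Continue;
        }
    }
    // In the engine, `record_final_answer` would run at this point.
    LoopAction::Break
}
```

Chat turns fall straight through to `Break`, and their guidance (if any) is left untouched, which matches the requirement that non-autonomy behaviour stays unchanged.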
Step 4 (optional): Mirror message.send steer into live inbox
In `session_autonomy.rs::steer_existing_job_from_message`, also call `global_steering_inbox().push(session.as_i64(), text)` so the running loop drains it per-iteration.
Dependencies: independent.
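A rough model of the mirroring, assuming an inbox keyed by i64 session id as described in the findings. `SteeringInbox`, `JobRow`, and `steer_from_message` are simplified stand-ins, not the real `global_steering_inbox()` API.

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified stand-in for the live steering inbox, keyed by session id.
#[derive(Default)]
struct SteeringInbox {
    by_session: HashMap<i64, VecDeque<String>>,
}

impl SteeringInbox {
    fn push(&mut self, sid: i64, text: &str) {
        self.by_session
            .entry(sid)
            .or_default()
            .push_back(text.to_string());
    }

    /// Removes and returns everything queued for one session.
    fn drain(&mut self, sid: i64) -> Vec<String> {
        self.by_session
            .remove(&sid)
            .map(|queue| queue.into_iter().collect())
            .unwrap_or_default()
    }
}

/// Stand-in for the job row; only the guidance field is modelled.
#[derive(Default)]
struct JobRow {
    pending_operator_guidance: Option<String>,
}

/// Writes the checkpoint guidance AND mirrors the text into the live inbox.
fn steer_from_message(inbox: &mut SteeringInbox, row: &mut JobRow, sid: i64, text: &str) {
    row.pending_operator_guidance = Some(text.to_string());
    inbox.push(sid, text);
}
```

Because the same text now travels through both paths, a real implementation would need to decide how to avoid applying it twice; this sketch does not address that.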
Step 5: Tests
- `clear_pending_operator_guidance` round-trip (write, clear, assert key gone, events retained), co-located with the existing `job_context.rs` tests.
- Autonomy job with pending guidance: `handle_no_tool_calls` returns `Continue` and clears the row; a non-autonomy chat turn returns `Break`.
Acceptance Criteria
- `pending_operator_guidance` is removed once folded in; `pending_operator_guidance_events` is retained as audit.
- Non-autonomy chat turns are unchanged (they still return `Break`).
Notes
- `OperatorGuidanceProvider` reads guidance from both the workspace `task_state.json` and the DB row; Step 1 clears only the DB key. Whether the on-disk `operator_guidance` field is auto-cleared after a replan was not fully confirmed.
- The forced `Continue` may need an iteration refund to produce a real extra turn (Step 3 refunds one if needed).
Test Results
Ran `cargo test -p hero_shrimp_engine -p hero_shrimp_server` after implementing the fix. All suites green.
- `hero_shrimp_engine` lib: 1667 passed, 0 failed, 1 ignored
- `hero_shrimp_server` lib + integration suites: 315 + 49 + 13 + 7 + 3 + 2 passed, 0 failed
New tests added in `job_context.rs`:
- `clear_pending_operator_guidance_round_trip`: writes a job row with `pending_operator_guidance` + `pending_operator_guidance_at` + `pending_operator_guidance_events`, asserts the read helper returns the guidance, clears it, then asserts the guidance keys are gone, the read returns `None`, and `pending_operator_guidance_events` is retained as audit.
- `clear_pending_operator_guidance_missing_row_is_noop`: clearing with an unresolvable numeric id, a non-numeric id, a whitespace id, and `None` does not panic.
The full job_context test group: `8 passed; 0 failed`.
Implementation Summary
Fixed the dropped-steering bug for autonomy jobs in their final/summary turn, fixed the re-injection of never-cleared guidance, and added per-iteration delivery for message.send steering.
Root cause (confirmed)
The `job.steer` / message.send path only writes `pending_operator_guidance` into the job row. That field is read by `OperatorGuidanceProvider` inside `build_history`, which runs once per turn at setup. If the job is already in its finalization turn, no new prompt is built, so the queued guidance is never rendered and is silently dropped. It was also never cleared, so on a job that did continue it re-injected every turn.
Note on the secondary lead: the comment's hypothesis that `session.steer` no-ops because the autonomy loop runs with `session_id: None` does not hold for the in-process path: the loop runs with `session_id: Some(real_session_id)` (`pipeline.rs`), so the live-inbox drain does fire for autonomy jobs.
Changes
- `crates/hero_shrimp_engine/src/agent_core/agent/job_context.rs`: added `pending_operator_guidance(...)` (read helper) and `clear_pending_operator_guidance(...)` (removes the `pending_operator_guidance` and `pending_operator_guidance_at` keys from the job row `details_json`, retaining `pending_operator_guidance_events` as audit).
- `crates/hero_shrimp_engine/src/agent_core/agent/loop_support.rs`: added a bounded per-turn counter `operator_guidance_folds` on `AgentLoopState` and `MAX_OPERATOR_GUIDANCE_FOLDS = 3`.
- `crates/hero_shrimp_engine/src/agent_core/agent/no_tool_calls.rs`: added a finalization gate just before `record_final_answer`: for autonomy jobs, if pending guidance exists and the fold cap is not reached, fold the guidance in as a high-priority user message, clear it, and force one more turn instead of finishing. Scoped strictly to autonomy jobs; chat turns are unchanged.
- `crates/hero_shrimp_engine/src/agent_core/agent/loop_setup.rs`: initialize the new counter.
- `crates/hero_shrimp_server/src/rpc/methods/session_autonomy.rs`: `steer_existing_job_from_message` now also pushes the steer text into the live steering inbox so the running loop drains it per-iteration (matches the `session.steer` path), in addition to the checkpoint guidance.
Tests
- `clear_pending_operator_guidance_round_trip` and `clear_pending_operator_guidance_missing_row_is_noop` in `job_context.rs`.
- `cargo test -p hero_shrimp_engine -p hero_shrimp_server`: all green, 0 failures (engine 1667 passed, server suites 315+ passed).
Caveats
- The fix clears only the DB `pending_operator_guidance` key. `OperatorGuidanceProvider` also reads an on-disk `operator_guidance` field in the workspace `task_state.json`; this change does not alter whether that on-disk field is auto-cleared after a replan.
Changes are committed locally on the working branch; no push.