[nu-demo] hero_agent should support tool_choice="required" for grounded modes in llm_client.rs #150

Closed
opened 2026-04-24 01:20:01 +00:00 by mik-tf (Owner) · 1 comment

Why

Even with the right system prompt (see #149) and the right tool in the always_include list, well-behaved LLMs still don't always call the tool — "use this tool first" reads as a hint when it's phrased as a guideline, and models weigh it against their own confidence.

For the Hero OS demo, when a user asks about Hero-stack topics, we want deterministic grounding: the model MUST call search_hero_docs, period. The OpenAI tool-use API (and its OpenAI-compatible cousins) supports this directly via tool_choice:

| Value | Behavior |
|---|---|
| `"none"` | Tools not offered |
| `"auto"` (default) | Model decides |
| `"required"` | Model MUST call some tool (any offered) |
| `{"type":"function","function":{"name":"search_hero_docs"}}` | Model MUST call this specific tool |

Claude's tool-use extension (via Anthropic's OpenAI-compat endpoint — which is what aibroker is doing) honors all four.
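As a minimal sketch of what goes over the wire (hand-rolled JSON strings for illustration only; real code would use serde_json, and `tool_choice_value` is a hypothetical helper, not anything in llm_client.rs):

```rust
// Sketch: the four tool_choice wire values an OpenAI-compatible API
// accepts. "none"/"auto"/"required" are plain JSON strings; forcing a
// specific tool takes a small JSON object naming the function.
fn tool_choice_value(mode: &str, tool_name: Option<&str>) -> String {
    match (mode, tool_name) {
        ("specific", Some(name)) => format!(
            "{{\"type\":\"function\",\"function\":{{\"name\":\"{}\"}}}}",
            name
        ),
        // "none", "auto", "required" serialize as bare JSON strings
        (m, _) => format!("\"{}\"", m),
    }
}

fn main() {
    assert_eq!(tool_choice_value("required", None), "\"required\"");
    println!("{}", tool_choice_value("specific", Some("search_hero_docs")));
}
```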

Current state

hero_agent/crates/hero_agent/src/llm_client.rs::try_completion (line ~343) builds the request body as:

let mut body = serde_json::json!({
    "model": target.model,
    "messages": messages,
    "max_tokens": options.max_tokens.unwrap_or(self.max_tokens),
    "temperature": options.temperature.unwrap_or(self.temperature),
});

if let Some(tools) = tools
    && !tools.is_empty()
{
    body["tools"] = serde_json::to_value(tools)?;
}

tool_choice is never emitted — the field is entirely missing from the request body.

Same gap in try_stream around line 505.

Proposed change

  1. Add a tool_choice: Option<ToolChoice> field to LlmOptions:

pub enum ToolChoice {
    Auto,
    Required,
    Specific(String), // tool name
}

  2. In try_completion + try_stream, after setting tools, emit:

if let Some(choice) = &options.tool_choice {
    body["tool_choice"] = match choice {
        ToolChoice::Auto => json!("auto"),
        ToolChoice::Required => json!("required"),
        ToolChoice::Specific(name) => json!({
            "type": "function",
            "function": { "name": name }
        }),
    };
}

  3. In the agent's routing layer (agent.rs or routes.rs), heuristically set tool_choice = Required on the first turn of any conversation whose message contains hero-stack keywords (same list as the prompt.rs MANDATORY block — hero_*, hero os, etc.).

  4. Expose a force_tools: bool param in agent.chat so callers (test harnesses, evaluation scripts) can override.
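The step-3 heuristic can be sketched as follows (hypothetical function name and a reduced trigger list for illustration; the real keyword set lives alongside the prompt.rs MANDATORY block):

```rust
// Sketch: case-insensitive substring scan of a user message against
// hero-stack trigger keywords. Reduced list for illustration only.
const HERO_KEYWORDS: &[&str] = &["hero os", "hero_"];

fn contains_hero_keyword(message: &str) -> bool {
    let lower = message.to_lowercase();
    HERO_KEYWORDS.iter().any(|kw| lower.contains(kw))
}

fn main() {
    assert!(contains_hero_keyword("What is Hero OS?"));
    assert!(contains_hero_keyword("tell me about hero_books"));
    // No false trigger on the bare word "hero":
    assert!(!contains_hero_keyword("Who is your hero?"));
}
```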

Rollback / safety

  • tool_choice = "required" forces a tool call but the model still picks which tool — if the model picks the wrong one (e.g. list_services instead of search_hero_docs), we get junk grounding. Use Specific("search_hero_docs") for the strongest guarantee.
  • On follow-up turns (after a tool_result), set tool_choice = "auto" so the model can compose the final answer without being forced back into another tool call.
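The per-turn policy in the two bullets above can be sketched as (hypothetical function name; `ToolChoice` mirrors the enum proposed in this issue):

```rust
// Sketch: force the grounding tool on the first turn only, then relax
// so the model can compose the final answer from tool_results.
#[derive(Debug, PartialEq)]
enum ToolChoice {
    Auto,
    Required,
    Specific(String),
}

fn choice_for_turn(turn: usize, grounding_needed: bool) -> Option<ToolChoice> {
    if turn == 0 && grounding_needed {
        // Strongest guarantee: pin the specific grounding tool.
        Some(ToolChoice::Specific("search_hero_docs".to_string()))
    } else {
        // Follow-up turns: let the model answer freely.
        Some(ToolChoice::Auto)
    }
}

fn main() {
    assert_eq!(
        choice_for_turn(0, true),
        Some(ToolChoice::Specific("search_hero_docs".to_string()))
    );
    assert_eq!(choice_for_turn(1, true), Some(ToolChoice::Auto));
    assert_eq!(choice_for_turn(0, false), Some(ToolChoice::Auto));
}
```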

Verification

Before (confirmed 2026-04-24): plain "What is Hero OS?" with model=claude-haiku-4.5 returns generic OS description.

After: same query must return content citing hero_os_guide and describing Hero OS as a "sovereign digital workspace" matching the docs_hero corpus.

Related

  • #148 — flow-documentation index
  • #149 (or next available) — prompt.rs directive strengthening

Signed-off-by: mik-tf

mik-tf (Author, Owner)

Fixed in hero_agent commit 54ba5d5 on development. Implements steps 1–3 from the issue body; step 4 is deferred (see below).

Step 1+2 — LlmOptions.tool_choice + body emission (llm_client.rs):

pub enum ToolChoice {
    Auto,
    Required,
    Specific(String),
}

pub struct LlmOptions {
    // …
    pub tool_choice: Option<ToolChoice>,
}

In build_request_body, after setting tools:

if let Some(choice) = &options.tool_choice {
    body["tool_choice"] = match choice {
        ToolChoice::Auto => json!("auto"),
        ToolChoice::Required => json!("required"),
        ToolChoice::Specific(name) => json!({
            "type": "function",
            "function": { "name": name }
        }),
    };
}

The emission is gated inside the existing if let Some(tools) = tools && !tools.is_empty() block — tool_choice without tools is meaningless and would draw a 400 from the provider, so the field is omitted.
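The gating rule can be sketched as a small predicate (hypothetical helper name, not the actual llm_client.rs code):

```rust
// Sketch: tool_choice is only worth emitting when a non-empty tools
// array is also present; otherwise the provider would reject the body.
fn should_emit_tool_choice(tools: Option<&[&str]>, choice_set: bool) -> bool {
    choice_set && tools.map_or(false, |t| !t.is_empty())
}

fn main() {
    assert!(should_emit_tool_choice(Some(&["search_hero_docs"]), true));
    assert!(!should_emit_tool_choice(None, true)); // no tools => omit
    assert!(!should_emit_tool_choice(Some(&[]), true)); // empty => omit
    assert!(!should_emit_tool_choice(Some(&["x"]), false)); // nothing set
}
```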

Step 3 — agent.rs heuristic for forced grounding:

New helper message_contains_hero_keyword(messages) scans the LATEST user message only, case-insensitive, against the same hero_* trigger set used in prompt.rs (home#149). Earlier turns are explicitly ignored — otherwise we'd pin grounding on every iteration after the first match.
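The "latest user message only" behavior can be sketched like so (hypothetical types and a reduced keyword list for illustration; the real helper and trigger set are in agent.rs / prompt.rs):

```rust
// Sketch: walk the transcript backwards to the most recent user
// message and scan only that one, so an early hero-stack match does
// not pin grounding on every later iteration.
struct Msg {
    role: &'static str,
    content: &'static str,
}

const KEYWORDS: &[&str] = &["hero os", "hero_"];

fn latest_user_matches(messages: &[Msg]) -> bool {
    messages
        .iter()
        .rev()
        .find(|m| m.role == "user") // earlier turns are ignored
        .map_or(false, |m| {
            let lower = m.content.to_lowercase();
            KEYWORDS.iter().any(|kw| lower.contains(kw))
        })
}

fn main() {
    let msgs = [
        Msg { role: "user", content: "What is Hero OS?" }, // earlier match...
        Msg { role: "assistant", content: "..." },
        Msg { role: "user", content: "thanks — what about the weather?" },
    ];
    assert!(!latest_user_matches(&msgs)); // ...does not pin grounding
}
```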

In agent_loop(), on iteration 0 only:

let tool_choice = if iteration == 0
    && tools.iter().any(|t| t.function.name == "search_hero_docs")
    && message_contains_hero_keyword(messages)
{
    Some(ToolChoice::Specific("search_hero_docs".to_string()))
} else {
    None
};

Matches the issue body's "Rollback / safety": iterations >= 1 carry tool_choice = None so the model can compose the final answer from tool_results without being forced into another tool call.

Step 4 — force_tools agent.chat param: deferred. The tool_choice field on LlmOptions is already public and can be threaded through agent.chat if a test harness needs to force a specific tool — adding the explicit user-facing param can land in a follow-up if/when an integration test demands it.

Tests added (9 new):

llm_client::tests::tool_choice (4):

  • body_omits_tool_choice_when_none — preserves provider default ("auto")
  • body_emits_required_when_required
  • body_emits_specific_when_specific
  • body_omits_tool_choice_without_tools_even_if_set — guard against 400

agent::tests (5):

  • basic match (Hero OS, hero_books, hero_aibroker)
  • case-insensitive (HERO OS, Hero_Books)
  • unrelated content rejected ("Who is your hero?", weather)
  • only the latest user message is scanned
  • empty messages handled

Verification: cargo fmt, cargo check -p hero_agent clean, all 98 hero_agent lib tests pass (89 pre-existing + 9 new).

Pairing: This is the belt to home#149's prompt-directive suspenders. The prompt directive guides the model; tool_choice = Specific(name) is a hard constraint at the API level. Together they ensure deterministic grounding for hero-stack questions without breaking the agent's freedom on non-hero queries.

Meta-tracker: home#193.

Signed-off-by: mik-tf

Reference
lhumina_code/home#150