Add JSON-schema-constrained chat completion (json_schema / response_format) #59

Closed
opened 2026-05-05 13:05:05 +00:00 by timur · 2 comments
Owner

Need: JSON-schema-constrained chat completion

Per hero_memory ADR-0006, hero_memory routes all LLM calls through hero_aibroker. The Q&A extractor (Phase 4) and the ontology extractor (Phase 5) both require provider-agnostic JSON-schema-constrained output — the model returns JSON guaranteed to match a given schema.

This is missing today. Audit summary:

  • ai.chat JSON-RPC method (openrpc.json:579–618): no json_schema parameter
  • POST /v1/chat/completions REST endpoint (crates/hero_aibroker_server/src/api/chat.rs:17–42): no response_format / json_schema field
  • ChatRequest (crates/hero_aibroker_lib/src/providers/types.rs:120–145): no schema field
  • OpenAIProvider does not forward any structured-output constraint

Proposal

Add json_schema: Option<serde_json::Value> (or response_format: Option<ResponseFormat> matching OpenAI's shape) to ChatRequest. Pass it through the OpenAI-compatible providers (OpenAI, OpenRouter, Groq, SambaNova), all of which accept response_format = { "type": "json_schema", "json_schema": {...} }. Surface it in:

  • ai.chat RPC method (new optional param)
  • POST /v1/chat/completions REST request struct (new optional field)
  • Provider trait pass-through (already a wide JSON-pass-through for most providers, so this is a wire-through, not a per-provider implementation)

The audit estimated this as a small change (< 1 day) with no breaking changes, since all new fields are optional.
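As an illustration only, a POST /v1/chat/completions request under this proposal might carry the OpenAI-style wrapper like so. The model name and schema body are placeholders, and the exact field spelling (response_format vs. json_schema) is whichever the proposal settles on:

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Answer with a single JSON object." }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": { "answer": { "type": "string" } },
        "required": ["answer"],
        "additionalProperties": false
      }
    }
  }
}
```

Because the broker would forward the response_format object verbatim, the per-provider work is limited to serializing one extra optional field.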

What hero_memory needs from the broker, exact shape

```rust
chat.complete(
    messages: Vec<Message>,
    model_hint: Option<String>,    // existing `model`, just optional/auto
    json_schema: Option<Value>,    // NEW
    max_tokens: Option<u32>,       // existing
    timeout_ms: Option<u64>,       // NEW (per-request timeout; currently global)
)
```

timeout_ms is currently fixed at 300 s in the OpenAI client builder; making it per-request would be useful, but it is lower priority than json_schema.

Why this is blocking

  • hero_memory Phase 4 (Q&A extraction): each LLM call must return {"pairs": [{"question": "...", "answer": "...", "anchor": "..."}]} — without schema constraint, free-form responses break parsing.
  • hero_memory Phase 5 (ontology extraction): each LLM call must return nodes/edges matching the loaded ontology shape — without schema constraint, validation churn becomes prohibitive.
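A sketch of the json_schema hero_memory would pass for Phase 4, matching the {"pairs": [...]} shape above. This is illustrative; the strict-mode constraints (every property listed in required, additionalProperties: false at every level) follow OpenAI's structured-outputs rules:

```json
{
  "name": "qa_pairs",
  "strict": true,
  "schema": {
    "type": "object",
    "properties": {
      "pairs": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "question": { "type": "string" },
            "answer":   { "type": "string" },
            "anchor":   { "type": "string" }
          },
          "required": ["question", "answer", "anchor"],
          "additionalProperties": false
        }
      }
    },
    "required": ["pairs"],
    "additionalProperties": false
  }
}
```

The Phase 5 ontology schema would be built the same way, generated from the loaded ontology shape rather than hand-written.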

Acceptance

  • ChatRequest accepts json_schema (or equivalent structured-output field).
  • OpenAI / OpenRouter / Groq / SambaNova providers pass it through to upstream response_format.
  • ai.chat RPC accepts the new field.
  • POST /v1/chat/completions accepts the new field.
  • An integration test asserts that a request with a json_schema returns JSON parseable against that schema.

References hero_memory issue: lhumina_code/hero_memory#1

Author
Owner

Branch + PR up: #60 (commit ebfd358).

Small change as the audit predicted — response_format: Option<Value> on ChatRequest and OpenAIChatRequest, forwarded unchanged to upstream. ai.chat RPC documents the new optional param. Pass-through only — callers (hero_memory) build the schema.
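For callers, the resulting ai.chat call would look roughly like this. The param layout is assumed from the description above, not copied from openrpc.json, and the schema body is a placeholder:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "ai.chat",
  "params": {
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "..." }],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "answer",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": { "answer": { "type": "string" } },
          "required": ["answer"],
          "additionalProperties": false
        }
      }
    }
  }
}
```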

Author
Owner

Resolved upstream. response_format is already on ChatRequest (typed ResponseFormat { kind, json_schema }) and exposed via ai.chat in openrpc.json on development, courtesy of the cascade multi-broker change. hero_memory's Q&A and ontology extractors are now pinned to development directly. PR #60 closed as redundant.

timur closed this issue 2026-05-06 08:39:09 +00:00