ai client ready for image generation #124
Labels
No labels
prio_critical
prio_low
type_bug
type_contact
type_issue
type_lead
type_question
type_story
type_task
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
lhumina_code/hero_lib#124
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
see crates/ai/src
we need to improve the client for
https://openrouter.ai/google/gemini-3.1-flash-image-preview/api
model = Nano Banana 2
we need to see how to also add images to the context to generate based on one or more images
for images see https://openrouter.ai/docs/guides/overview/multimodal/image-generation#image-aspect-ratio-configuration
we need a clean API extension to allow to generate images also starting from image(s)
make an example as well
Below is a practical spec for using OpenRouter with Nano Banana 2 for image generation and image-conditioned generation.
OpenRouter image generation spec for Nano Banana 2
Model
Use:
This is the OpenRouter model page for Nano Banana 2. OpenRouter lists it as released on February 26, 2026, with 65,536 context, and states that aspect ratios are controlled through the
image_configparameter. (OpenRouter)Endpoint
Use the normal OpenRouter chat endpoint:
OpenRouter’s API is designed to be close to the OpenAI Chat Completions format, and the full schema is published in OpenAPI form. (OpenRouter)
1. Core rules
For image generation with Gemini image-capable models on OpenRouter, send:
OpenRouter’s image generation guide explicitly says Gemini-style models that output both text and images should use
["image", "text"]. (OpenRouter)To pass one or more input images, use message content arrays with items of type:
OpenRouter supports both public image URLs and base64 data URLs for image inputs, and multiple images can be sent as separate content items in the same message. OpenRouter also recommends putting the text prompt first, then the images. (OpenRouter)
2. Supported input formats
For input images, OpenRouter supports:
https://.../image.jpgdata:image/jpeg;base64,...Supported image MIME types in the image input guide are:
image/pngimage/jpegimage/webpimage/gif(OpenRouter)3. Image output format
Generated images come back in the assistant message under an
imagesfield. The docs show this structure:OpenRouter says the generated images are returned as base64-encoded data URLs, typically PNG, and some models can return multiple images in one response. (OpenRouter)
4. Image configuration for Nano Banana 2
OpenRouter supports
image_configfor image-capable models. For this model you can control:Aspect ratio
Standard supported aspect ratios include:
1:12:33:23:44:34:55:49:1616:921:9For google/gemini-3.1-flash-image-preview specifically, OpenRouter also lists extended ratios:
1:44:11:88:1(OpenRouter)Image size
Supported values:
0.5K— listed as supported by this Gemini model only1K2K4K(OpenRouter)Example
OpenRouter’s image generation guide shows
aspect_ratioandimage_sizebeing used together in the same request. (OpenRouter)5. Minimal spec: text-to-image
Request
The
HTTP-RefererandX-OpenRouter-Titleheaders are optional OpenRouter-specific headers. (OpenRouter)Response handling
Read:
That value will be a base64 data URL. (OpenRouter)
6. Spec: image-to-image or image-conditioned generation with one image
Use a content array in the user message. Put the instruction first, then the image.
Request
The OpenRouter docs say images are sent in the
messagesarray asimage_urlcontent parts, and multiple image parts may be included in one request. (OpenRouter)7. Spec: image-conditioned generation with multiple images
You can provide multiple images as separate
image_urlparts in the same content array.Request
OpenRouter does not give a single universal maximum image count in the general docs; it says the number of images allowed varies by provider and model. So treat multi-image support as supported in structure, but keep your app ready for provider/model-specific limits. (OpenRouter)
8. Base64 local file variant
For private or local images, send a data URL.
Example content part
OpenRouter explicitly documents base64 image inputs for local/private images. (OpenRouter)
9. Streaming behavior
You can stream image generation responses. In streaming mode, image chunks appear under:
The image generation guide shows streamed image events arriving in the
delta.imagesfield. (OpenRouter)Streaming note
For most apps, non-streaming is simpler for image generation unless you specifically want incremental progress handling.
10. Recommended prompt structure for reference-image workflows
For better control, use this instruction pattern:
This part is my recommendation based on the input format OpenRouter supports and how multimodal prompts are parsed.
11. Important implementation detail: raw JSON vs SDK field names
OpenRouter’s general API schema shows raw request content parts using snake_case:
But the SDK examples on the docs page show
imageUrlin code examples. That means:image_urlimageUrldepending on the SDK language/binding (OpenRouter)For a direct REST client, I would standardize on this raw JSON shape:
12. Production guidance
Use these defaults:
Then expose these app-level controls:
promptreference_images[]aspect_ratioimage_sizestreamreturn_texttoggle if you want to keep or ignoremessage.contentAlso validate that your parser checks for
message.imagesbefore assuming output exists, which OpenRouter recommends in its best practices. (OpenRouter)13. Full reusable spec block
If you want, I can turn this into a clean developer markdown spec with curl + TypeScript + Python examples.
REMARK: make sure images which are given from local pathgs can be reformatted to approprate format e.g. from jpeg to png or other way around and in right size (nr of bytes)
Implementation Spec for Issue #124
Objective
Extend the
herolib_aicrate so the existing Gemini 3.1 Flash Image Preview integration supports image-conditioned generation (one or more reference images, including from local files), keeps the current text-to-image path working, and exposes a clean, idiomatic Rust builder API. Add a local-image loader that detects MIME, optionally re-encodes (PNG/JPEG/WebP) and resizes to fit a max-byte budget. Ship a runnable example covering both flows.Requirements
crates/ai/src/client/mod.rs(send_image_request,parse_image_response) — no newProvidervariant.ImageGenerationRequestwith:.prompt(..),.add_image_url(..),.add_image_data(mime, bytes),.add_image_path(path),.add_image_path_with(path, ImageLoadOptions),.aspect_ratio(..),.image_size(..),.model(..),.execute(&AiClient) -> AiResult<ImageGenerationResult>.image_iosubmodule) supporting PNG / JPEG / WebP / GIF; functions to: detect MIME from extension + magic bytes, re-encode between formats, resize to fit a max-bytes budget by iteratively shrinking dimensions and/or lowering JPEG/WebP quality.Vec<GeneratedImage> { bytes: Vec<u8>, mime: String, data_url: String }plus optional accompanying text and the model used. Keep the legacyImageGenerationResponsetype as a re-export/alias for back-compat with existing examples/tests.usermessage using existingContentPart::ImageUrl/ImageUrlInput.["image","text"]already wired; reuse.imagecrate (0.25, default-features off, features png/jpeg/webp/gif) behind a default-onimage-iocargo feature so consumers who don't need local-file support keep their build slim.crates/ai/examples/doing both pure text-to-image and image-conditioned editing from a local PNG/JPEG path.thiserrorfor new errors, doc-comments on every public item. Sync (no tokio) — match existingureqpattern.Files to Modify/Create
crates/ai/Cargo.toml— add optionalimagedep +image-iofeature (default-on).crates/ai/src/image_generation/mod.rs— declare new submodules; re-exports.crates/ai/src/image_generation/image_io.rs— NEW.ImageLoadOptions,LoadedImage,load_image_from_path,reencode,fit_to_byte_budget, MIME helpers.crates/ai/src/image_generation/request.rs— NEW.ImageGenerationRequestbuilder,ImageInput,GeneratedImage,ImageGenerationResult,.execute(&AiClient).crates/ai/src/error.rs— addAiError::ImageIo(String)withFrom<image::ImageError>under feature.crates/ai/src/client/mod.rs— exposeimage_request()entry point; refactor response parsing toVec<GeneratedImage>; keep legacygenerate_image*methods.crates/ai/src/lib.rs— re-export new public types.crates/ai/examples/image_generation_builder.rs— NEW. Text-to-image + image-conditioned from local file.crates/ai/examples/README.md— one-line entry.Implementation Plan
Step 1: Add
imagecrate dependency behind featureFiles:
crates/ai/Cargo.tomlimage = { version = "0.25", default-features = false, features = ["png","jpeg","webp","gif"], optional = true }image-iofeature; include indefault.Dependencies: none
Step 2: Implement image I/O helpers
Files:
crates/ai/src/image_generation/image_io.rsImageFormatHint { Png, Jpeg, Webp, Gif }withas_mime()/from_mime().ImageLoadOptions { target_format, max_bytes, max_dimension, jpeg_quality }builder. Defaults: keep format, 4 MiB, no dim cap, quality 85.LoadedImage { bytes, mime }withto_data_url().detect_mime_from_path(&Path)(extension) +detect_mime_from_bytes(magic bytes /image::guess_format).load_image_from_path(&Path, &ImageLoadOptions) -> AiResult<LoadedImage>— decodes, optionally converts, shrinks dimensions / drops quality until undermax_bytes(max 6 iterations).#[cfg(feature = "image-io")]; stub errors otherwise.Dependencies: Step 1
Step 3: Add
ImageIovariant toAiErrorFiles:
crates/ai/src/error.rs#[error("image io error: {0}")] ImageIo(String)+From<image::ImageError>under feature.Dependencies: Step 1
Step 4: Build
ImageGenerationRequestbuilder & result typesFiles:
crates/ai/src/image_generation/request.rsImageInput { Url(String), DataUrl(String), Bytes { mime, bytes }, Path(PathBuf, ImageLoadOptions) }withinto_data_url().ImageGenerationRequest { model, prompt, images, config }with builder methods.GeneratedImage { bytes, mime, data_url }/ImageGenerationResult { images, text, model }.execute(self, client: &AiClient) -> AiResult<ImageGenerationResult>— resolves inputs, builds multipart user message (text first, images after), reuses client plumbing.Dependencies: Steps 2, 3
Step 5: Wire builder into the client; multi-image response parsing
Files:
crates/ai/src/client/mod.rs,crates/ai/src/image_generation/mod.rs,crates/ai/src/lib.rsparse_image_response→parse_all_images(Vec<GeneratedImage>+ text). Legacygenerate_image_with_optionskeeps wrapping first image.AiClient::image_request() -> ImageGenerationRequest.Dependencies: Step 4
Step 6: Example program
Files:
crates/ai/examples/image_generation_builder.rs,crates/ai/examples/README.mdAiClient::from_env().image_request().prompt(..).aspect_ratio(..).execute()?→ save.ImageLoadOptions::new().with_target_format(Png).with_max_bytes(4*1024*1024)→ save result.Dependencies: Step 5
Step 7: Tests
Files:
crates/ai/src/image_generation/image_io.rs,crates/ai/src/image_generation/request.rsmodalities, prompt first, images after, validdata:URLs.cargo test -p herolib_ai+cargo build -p herolib_ai --examples.Dependencies: Steps 2, 4, 5
Acceptance Criteria
cargo build -p herolib_aiandcargo build -p herolib_ai --examplessucceed on Rust 1.92 / edition 2024.cargo test -p herolib_aipasses; new tests cover MIME detect, byte-budget shrink, builder JSON shape.ImageGenerationRequestexposes the full builder surface and is documented.max_bytesbudget enforced.modalities=["image","text"], text first, images after, validdata:URLs.Vec<GeneratedImage>.generate_image*andimage_generation_test.rsstill compile and behave as before.///doc-comments.Notes
Message::user_with_images(text, &[(mime, b64), ...])already intypes.rs.image0.25 with minimal features keeps binary size reasonable.ureq.Providerenum unchanged; OpenRouter remains the only mapping forModel::Gemini3_1FlashImagePreview.Test Results
Total: 72 passed, 0 failed.
New tests in
image_generation::image_io:test_detect_mime_from_pathtest_format_roundtriptest_load_small_image_no_resizetest_byte_budget_shrinks_large_image(2000x2000 PNG shrunk into a 50 KiB JPEG budget)test_format_conversion_png_to_jpegtest_max_dimensiontest_data_urlNew tests in
image_generation::request:test_builder_basictest_builder_with_urltest_image_input_url_passthroughtest_image_input_data_url_passthroughtest_image_input_bytestest_decode_data_urlImplementation Summary
Files Added
crates/ai/src/image_generation/image_io.rs— image loader with MIME detection, format conversion (PNG/JPEG/WebP/GIF), and byte-budget resizing (quality stepping for lossy formats, dimension halving for lossless, max 6 iterations).crates/ai/src/image_generation/request.rs—ImageGenerationRequestbuilder;ImageInputenum (Url/DataUrl/Bytes/Path);GeneratedImageandImageGenerationResulttypes; multi-image response parsing viaparse_all_images.crates/ai/examples/image_generation_builder.rs— runnable example covering text-to-image and image-conditioned generation from a local path.Files Modified
crates/ai/Cargo.toml— added optionalimage = "0.25"(png/jpeg/webp/gif features only) andbase64 = "0.22"behind a default-onimage-iofeature.crates/ai/src/error.rs— addedAiError::ImageIo(String)variant.crates/ai/src/image_generation/mod.rs— declaredpub mod image_io;(feature-gated) andpub mod request;.crates/ai/src/client/mod.rs— addedAiClient::image_request()builder entry point; exposedprovider_config,send_image_request_for,record_usage_pubaspub(crate)helpers for the builder; promotedbase64_decodetopub(crate).crates/ai/src/lib.rs— re-exportsImageGenerationRequest,ImageGenerationResult,GeneratedImage,ImageInput, and (behindimage-io)ImageFormat,ImageLoadOptions,LoadedImage.crates/ai/examples/README.md— entry for the new builder example.Design Notes
Providerenum untouched; OpenRouter remains the only mapping forModel::Gemini3_1FlashImagePreview.generate_image*methods are unchanged and keep working for back-compat — the exampleimage_generation_test.rsstill compiles and runs.modalities=["image","text"]; reference images sent asdata:<mime>;base64,<data>URLs.ureq), no tokio.Test Results
Pull request opened: #125
This PR implements the changes discussed in this issue.