Default 10 second upstream timeout cuts off substrate-awaiting compute calls #111


Closed

opened 2026-05-24 16:58:52 +00:00 by mik-tf · 0 comments

mik-tf commented

2026-05-24 16:58:52 +00:00

Owner

hero_router's default upstream timeout (10 seconds) is shorter than the legitimate response time of substrate-awaiting RPCs. Concretely, `deployer.provision_vm` calls `ComputeService.deploy_vm`, which now (per the D-27 fix at hero_compute 39d9b8a) waits for the on-chain ack before returning, so a successful response can take 30 to 300 seconds.

Through hero_router, the call fails after exactly 10 seconds with `compute.deploy_vm: invalid response shape: json parse: expected value at line 1 column 1; raw: upstream timeout`, while the upstream daemon is still legitimately working. The `raw:` field shows that the body being parsed is the router's timeout message, not a daemon response. Bypassing the router and hitting the daemon UDS directly, with no client-side timeout, returns the real result in 70+ seconds.

Two reasonable fix options:

- raise the default to something like 600 seconds for all routes that point to a substrate-backed daemon, or
- accept a per-route timeout config, so the compute path gets 600 seconds while everything else keeps the existing default.

The current behaviour silently truncates the deployer-mediated flow even when the underlying call eventually succeeds.

Signed-by: mik-tf <mik-tf@noreply.invalid>
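
As an illustration of the second option only, a per-route timeout lookup could look roughly like the sketch below. It is written in Python for brevity and is not hero_router's actual code or config schema: `TimeoutConfig`, `per_route_s`, `timeout_for`, and the `compute.` method prefix are all hypothetical names, and longest-prefix matching is just one possible way to select a route.

```python
from dataclasses import dataclass, field
from typing import Dict

# The current default described in this issue.
DEFAULT_UPSTREAM_TIMEOUT_S = 10.0


@dataclass
class TimeoutConfig:
    """Hypothetical per-route timeout table (not hero_router's real schema)."""

    default_s: float = DEFAULT_UPSTREAM_TIMEOUT_S
    # Maps a method-name prefix to its upstream timeout in seconds.
    per_route_s: Dict[str, float] = field(default_factory=dict)

    def timeout_for(self, method: str) -> float:
        """Return the timeout of the longest matching prefix, else the default."""
        matches = [p for p in self.per_route_s if method.startswith(p)]
        if not matches:
            return self.default_s
        return self.per_route_s[max(matches, key=len)]


# Assumed override: only the substrate-backed compute path gets 600 seconds.
config = TimeoutConfig(per_route_s={"compute.": 600.0})
```

With this shape, `config.timeout_for("compute.deploy_vm")` resolves to the 600-second override, while any method without a matching prefix keeps the 10-second default, so unrelated routes are unaffected.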
