[onlyoffice][hotfixes-applied] document-server unbreakage 2026-05-01 — three-layer postmortem + prod-level follow-ups #57
Summary
OnlyOffice document editor on the herodemo VM was completely broken across the 2026-04-30 redeploy. After ~3 layers of investigation we have it working. This issue documents the symptom chain, the layered hotfixes that landed, and the prod-level follow-ups needed before the next deploy.
Symptom chain (as observed)
1. "ONLYOFFICE_JWT_SECRET is not set on the server" (red banner in editor).
2. "The file cannot be accessed right now" (modal in editor; spreadsheet/presentation never loads).
3. "Download failed" (red error dialog in editor).

Each fix unblocked the next layer underneath.
Root causes & hotfixes
Layer 1 — JWT secret missing
Cause: ONLYOFFICE_JWT_SECRET was not in ~/hero/cfg/env/env.sh at all. Even when added, env.sh exports don't propagate to nu deploy scripts (see hero_skills#191: load_init_sh doesn't follow source directives). Result: service_onlyoffice.nu registered the action spec with the hardcoded placeholder OO_DEFAULT_SECRET = "hero-demo-jwt-secret-change-in-prod", baked into the docker run command. hero_office_server validated incoming JWTs against the (correct) env.sh secret while the OnlyOffice container signed them with the placeholder, so the JWTs never matched and every callback was rejected.

Hotfix applied:
1. Generated a new secret (openssl rand -hex 32).
2. Added it to ~/hero/cfg/env/env.sh and to the hero_proc secret store (hero_proc secret set ONLYOFFICE_JWT_SECRET ...).
3. Ran service_onlyoffice start --reset to re-register the action spec; only --reset regenerates the docker run command embedded in the action.

Prod-level fix needed:
- Remove OO_DEFAULT_SECRET = "hero-demo-jwt-secret-change-in-prod" from service_onlyoffice.nu. Fail closed if ONLYOFFICE_JWT_SECRET is unset, with a clear error message pointing to init.sh and env.sh.
- service_onlyoffice should read the secret via proc secret get instead of env.

Layer 2 — host.docker.internal doesn't resolve on Linux Docker

Cause: hero_office_server gives OnlyOffice a callback/download URL of the form http://host.docker.internal:9988/hero_office/ui/.... On Docker Desktop (macOS/Windows), host.docker.internal is auto-provided. On Linux Docker (this VM, prod), it is not, and the Docker host-gateway magic value resolves to the docker0 bridge IP (172.17.0.1), which has no listener on :9988 (nginx is bound to 10.1.2.2:9988, hero_router to 127.0.0.1:9988).

Hotfix applied: patched service_onlyoffice.nu to detect the host IP at install time (prefers a private LAN IP from non-loopback, non-bridge interfaces; here 10.1.2.2) and inject --add-host=host.docker.internal:<detected-ip> into the docker run command. Operator override: HERO_HOST_IP env var.

Prod-level fix needed:
- Detect which host IP actually serves :9988 from a docker container's perspective at install time, and use that.
- Alternatively, set host-gateway-ip in /etc/docker/daemon.json so --add-host=host.docker.internal:host-gateway works portably without per-host detection. This trades one daemon config edit at provisioning time for cross-host portability of the launcher script.

Layer 3 — nginx basic auth blocks the OnlyOffice callback/download paths
Cause: /etc/nginx/sites-enabled/hero_demo enforces auth_basic "Hero OS Demo" on every path except ^/hero_[a-z_]+/rpc(/|$). The OnlyOffice container hits GET /hero_office/ui/files/<ctx>/<file> and POST /hero_office/ui/callback/<ctx>. Both are UI paths and both are basic-auth gated, so they return 401, which surfaces as "Download failed" in the editor. The container has no way to send basic-auth credentials (operator-side).

Hotfix applied: patched tools/modules/installers/auth.nu to add an additional location ~ ^/hero_office/ui/(files|callback)(/|$) { auth_basic off; ... } block. Both endpoints are JWT-signed (HMAC-SHA256 with ONLYOFFICE_JWT_SECRET) and validated by hero_office_server, so the JWT is the actual auth. Live nginx config patched manually + reloaded; a full installer re-run will reproduce the patch on the next deploy.

Prod-level fix needed:
- Review the exposure of /hero_office/ui/files/* to unauthenticated access. /files/<ctx>/<file> should only be readable by an OnlyOffice container request (presence of Authorization: Bearer <jwt> from the OO container), not by random clients.
- Document this in docs_hero so the next operator who adds an OnlyOffice-like component knows the JWT-vs-basic-auth boundary.

Where the patches landed
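To make the two patches listed in this section concrete, here is a minimal Python sketch of their logic. It is an illustration under stated assumptions, not the landed code: the real implementation is Nushell in service_onlyoffice.nu and auth.nu. The function names (is_auth_exempt, pick_host_alias_ip, add_host_flag), the candidate-address list, and the 172.17.0.0/16 bridge subnet are hypothetical; only the regex, the HERO_HOST_IP override, and the --add-host flag come from the text above.

```python
import ipaddress
import re

# Regex copied from the hotfix location block described under Layer 3.
AUTH_EXEMPT = re.compile(r"^/hero_office/ui/(files|callback)(/|$)")

# Assumption: docker0 uses Docker's default bridge subnet.
DOCKER_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def is_auth_exempt(path: str) -> bool:
    """Return True if `path` matches the basic-auth exemption regex."""
    return AUTH_EXEMPT.search(path) is not None


def pick_host_alias_ip(env: dict, candidates: list) -> str:
    """Pick the IP to map host.docker.internal to.

    HERO_HOST_IP wins if set; otherwise take the first private address
    that is neither loopback nor on the docker bridge.
    """
    override = env.get("HERO_HOST_IP", "").strip()
    if override:
        return override
    for raw in candidates:
        addr = ipaddress.ip_address(raw)
        # Loopback also counts as "private" in Python, so exclude it first.
        if addr.is_loopback or addr in DOCKER_BRIDGE:
            continue
        if addr.is_private:
            return raw
    raise RuntimeError("no usable host IP found; set HERO_HOST_IP")


def add_host_flag(ip: str) -> str:
    """Build the docker run flag that the launcher script embeds."""
    return f"--add-host=host.docker.internal:{ip}"
```

With the addresses from this VM, pick_host_alias_ip({}, ["127.0.0.1", "172.17.0.1", "10.1.2.2"]) skips loopback and the bridge and settles on 10.1.2.2. The trailing (/|$) in the regex keeps the exemption from leaking to sibling paths that merely share the files or callback prefix.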
- lhumina_code/hero_skills/tools/modules/services/service_onlyoffice.nu
  - oo_host_alias_ip helper (HERO_HOST_IP env var or auto-detect).
  - oo_launcher_script accepts host_alias and embeds --add-host=host.docker.internal:<ip>.
  - oo_install_launcher calls the helper and prints the resolved IP.
- lhumina_code/hero_skills/tools/modules/installers/auth.nu
  - location block exempting /hero_office/ui/(files|callback)/* from basic auth.

Local-only on the VM at time of writing (under /home/driver/hero/code/hero_skills/); upstream PR pending.

Cross-references
- hero_skills#191: load_init_sh doesn't follow source directives. This compounds the JWT-secret problem at deploy time; fixing it is the broader change that would have prevented Layer 1 entirely.

Demo posture
For tomorrow's demo: working as of 2026-05-01 22:38 UTC. Don't trigger another service_onlyoffice start --reset without ONLYOFFICE_JWT_SECRET in nu env (it would re-bake the placeholder secret into the action spec). The current launcher script + docker container are correctly paired.

Signed-off-by: mik-tf
mik-tf referenced this issue 2026-05-02 03:28:52 +00:00