service_collab restart does not re-read livekit.secret — stale JWT signing key after rotation #36
Reference
lhumina_code/hero_collab#36
Observed
On dev box (138.201.206.39), after rotating the livekit shared secret in `~/hero/cfg/livekit.secret`:
- `livekit-server` restarted and picked up the new secret cleanly
- `hero_collab_server` kept signing JWTs with the old secret, which produced `401 Unauthorized` against livekit
- `proc service restart hero_collab_server` did not fix it
- Only `pkill -9` plus a manual respawn got the new secret loaded
Hypothesis
`hero_collab_server` reads `--livekit-api-secret-file` once at startup into in-memory state. A graceful restart via `proc service restart` should re-read it (new process = fresh memory), but it didn't here.
Likely interaction with hero_proc orphan-procs bug
This may be a downstream effect of the orphan-procs issue filed against hero_proc. If `proc service restart` doesn't fully terminate the old `hero_collab_server` process, the new one might fail silently to bind the UDS (the old one still owns it), and traffic continues to hit the old PID with the stale secret.
Diagnostic to confirm: after a restart that fails to refresh the secret, run `pgrep -af hero_collab_server`. If there are two PIDs for one user, the orphan is the one serving stale traffic.
Suggested fixes (independent of root cause)
Even if the orphan-procs root cause is fixed in hero_proc, defense-in-depth here is cheap and matches typical Unix daemon convention:
The SIGHUP approach is the most idiomatic and lets ops scripts trigger a refresh without going through the full proc lifecycle.
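As a rough illustration of the SIGHUP approach, here is a minimal sketch of a reload-on-signal pattern. It is written in Python only for brevity and is not taken from `hero_collab_server`; `SecretHolder` and `install_sighup_reload` are hypothetical names, and the demo uses a throwaway temp file in place of the real secret file.

```python
import signal
import tempfile
from pathlib import Path


class SecretHolder:
    """Keeps the current signing secret in memory and can re-read it from disk."""

    def __init__(self, path):
        self._path = Path(path)
        self._secret = b""
        self.reload()

    def reload(self):
        data = self._path.read_bytes().strip()
        if not data:
            raise ValueError(f"secret file {self._path} is empty")
        # Single attribute rebind: readers see either the old or the new value.
        # No lock is taken here, so the signal handler cannot deadlock against
        # code that is in the middle of reading the secret.
        self._secret = data

    def current(self):
        return self._secret


def install_sighup_reload(holder):
    """Re-read the secret file whenever the process receives SIGHUP."""

    def _on_sighup(signum, frame):
        try:
            holder.reload()
        except (OSError, ValueError):
            # Keep serving with the previous secret if the file is missing
            # or empty; a real daemon would also log this.
            pass

    signal.signal(signal.SIGHUP, _on_sighup)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the file passed via --livekit-api-secret-file.
        secret_file = Path(tmp) / "livekit.secret"
        secret_file.write_text("old-secret\n")

        holder = SecretHolder(secret_file)
        install_sighup_reload(holder)

        secret_file.write_text("new-secret\n")  # rotate the secret on disk
        signal.raise_signal(signal.SIGHUP)       # stands in for `kill -HUP <pid>`
        print(holder.current())
```

With something along these lines, an ops script could rotate the file and send `SIGHUP` to the running process, so a refresh would not depend on the supervisor fully terminating and respawning it.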
Repro
1. `proc service restart hero_collab_server`
2. `pgrep -af hero_collab_server`: check if the old PID is still alive (it usually is on this box)
Why this matters
Secret rotation is exactly the kind of operation where you expect a graceful restart to suffice. The current behavior forces operators to use `pkill -9`, which masks the underlying supervisor bug and risks data loss if the process is mid-write.
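For anyone reproducing this, the `pgrep` check from the diagnostic above can be scripted so it is easy to run after every restart. This is a small sketch, not part of hero_proc or hero_collab: `parse_pgrep`, `find_instances`, and `orphan_suspected` are hypothetical helpers, and the `-u` filter (restricting matches to the current effective user) is an assumption about how "two PIDs for one user" should be checked.

```python
import os
import subprocess


def parse_pgrep(output):
    """Parse `pgrep -af` output, one '<pid> <command line>' entry per line."""
    procs = []
    for line in output.splitlines():
        pid, _, cmdline = line.strip().partition(" ")
        if pid.isdigit():
            procs.append((int(pid), cmdline))
    return procs


def find_instances(name="hero_collab_server"):
    """List the current user's processes whose command line matches `name`."""
    result = subprocess.run(
        ["pgrep", "-u", str(os.geteuid()), "-af", name],
        capture_output=True,
        text=True,
        check=False,  # pgrep exits non-zero when nothing matches
    )
    # Skip this script itself in case its own command line contains `name`.
    # Note: -f matches the full command line, so unrelated processes that
    # merely mention `name` (an editor, a tail on a log) will also show up.
    return [(pid, cmd) for pid, cmd in parse_pgrep(result.stdout)
            if pid != os.getpid()]


def orphan_suspected(procs):
    """More than one matching process suggests the old one was not terminated."""
    return len(procs) > 1
```

Calling `orphan_suspected(find_instances())` right after `proc service restart hero_collab_server` gives a yes/no answer. It does not say which PID is the orphan; comparing process start times would be needed for that.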