multi_user_add: nu set as login shell → SSH disconnect leaves orphan nushell REPLs at 100% CPU #199
Summary
multi_user_add sets each user's login shell directly to ~/hero/bin/nu. This puts a bare nushell REPL between sshd and the user's pty with no multiplexer in between. When the SSH connection dies (clean disconnect, broken pipe, network drop, laptop sleep) the pty is yanked out from under the running nu, and nushell's REPL hits an upstream bug class where the read loop returns EIO repeatedly and spins at 100 % CPU forever: orphaned to PID 1, in an abandoned systemd-logind scope, unreachable by Ctrl+C or sshd's SIGHUP.

Each interrupted SSH session leaves another core permanently consumed. Live evidence and full investigation: lhumina_code/home#205.
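To check whether a host already carries such orphans, a rough shell sketch along these lines may help. It assumes Linux procps ps, that the orphans show up with PPID 1 and no controlling tty (printed as ?), and that the process name is nu. filter_orphan_nu and list_orphan_nu are hypothetical helper names, not part of the repository.

```shell
# Hypothetical helpers, not from the repository.
# filter_orphan_nu reads lines of "PID PPID TTY %CPU COMMAND" on stdin and
# prints "PID %CPU" for nu processes reparented to PID 1 with no tty.
filter_orphan_nu() {
    awk '$2 == 1 && $3 == "?" && $5 == "nu" { print $1, $4 }'
}

# list_orphan_nu feeds live process data into the filter.
# The trailing "=" on each column name suppresses the ps header line.
list_orphan_nu() {
    ps -eo pid=,ppid=,tty=,pcpu=,comm= | filter_orphan_nu
}
```

Running list_orphan_nu prints one line per suspected orphan. Treat any hit as a candidate to inspect rather than proof, since a legitimately daemonized nu script would match the same filter.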
Where in the code
The login-shell assignment is in tools/modules/installers/multiuser.nu.

The project already treats nu-as-login-shell as undesirable for the install user: tools/install.sh:747-781 explicitly reverts the calling user's shell back to bash (Linux) / zsh (macOS) if it finds it set to nu. multi_user_add is asymmetric with that policy.

Why nushell spins forever after SSH disconnect
sshd execs the login shell directly. With nu as login shell, there is no tmux / zellij / mosh between sshd and nu to hold the pty. When SSH dies, the pty is closed under nu's feet. Nushell's REPL loop doesn't handle that case cleanly: the read returns EIO (or short reads), nu prints an error, retries the read, and repeats at 100 % CPU. Upstream issues confirming the bug class:

- nushell/nushell#6455: exact symptom, an infinite "Input/output error" loop with high CPU that Ctrl+C does not break (originally reported for flatpak enter; same pty-vanishes trigger).
- nushell/nushell#17964: recent (2025) one-core-pegged report when nu is the system shell.
- #9876, #10219, #9497, #5029, #7938.

Ctrl+C cannot break out (no controlling tty), and SIGHUP from the dead sshd never arrives (its sshd process exited before sending it). The only ways out are kill -9 from another session or loginctl kill-session <id>. Two such orphans on kristof5 right now have burned ~290 hours of CPU each.

Note: multi_user_del already calls ^sudo killall -u $username (line 1056), which sends SIGTERM, so the deletion path probably reaps these orphans. The leak is in the steady-state running path: every disconnected SSH session that didn't terminate cleanly leaves an orphan that survives until the user is deleted or the box is rebooted.

Recommended engineered fix
The proper fix is to put a multiplexer between sshd and nu. This both prevents the bug (nu never sees its tty disappear) and gives users session reattach-after-disconnect, which they want anyway.
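As a minimal sketch of what such a wrapper might look like in the user's ~/.bashrc, the block below hands the login to tmux only when every gate passes. It is not the project's actual block: hero_should_attach is a hypothetical function name, using SSH_CONNECTION to detect an SSH login is an assumption, and how nu gets started inside the tmux session is left out. The session name hero and the HERO_NO_TMUX variable follow the steps described in this issue.

```shell
# Sketch only; hero_should_attach is a hypothetical helper name.
# $1 is the shell's option flags (normally "$-"); the rest is read from
# the environment.
hero_should_attach() {
    case "$1" in
        *i*) ;;            # interactive shell: keep checking
        *) return 1 ;;     # scp, rsync, "ssh host cmd": leave untouched
    esac
    # Assumption: sshd sets SSH_CONNECTION, so its presence marks an SSH login.
    [ -n "${SSH_CONNECTION:-}" ] || return 1
    # Already inside tmux: do not nest.
    [ -z "${TMUX:-}" ] || return 1
    # Operator escape hatch for debugging.
    [ -z "${HERO_NO_TMUX:-}" ] || return 1
    return 0
}

# Fall through to plain bash if tmux is missing, so a broken install
# cannot lock the user out.
if hero_should_attach "$-" && command -v tmux >/dev/null 2>&1; then
    # -A attaches to the "hero" session if it exists, else creates it.
    exec tmux new-session -A -s hero
fi
```

Keeping the decision in a function separate from the exec makes each gate checkable in isolation, for example by calling hero_should_attach with different flag strings and environment settings.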
1. Set the login shell to /bin/bash, not ~/hero/bin/nu

In multi_user_add, drop the nushell assignment and use the system shell.

2. Auto-attach to a multiplexer on interactive SSH
The template's ~/.bashrc (which multi_user_template_create already provisions) gets an exec-into-tmux block, gated so it only fires for interactive SSH and not nested invocations.

Behavior:

- Interactive SSH login → hero session, attaching if it already exists.
- Non-interactive SSH (ssh user@host nu -c …, scp, rsync, ForceCommand-style admin tooling) → $- == *i* is false → block is skipped, command runs normally under bash.
- Nested shells (tmux split, manual bash from inside nu) → $TMUX is set → block is skipped, no recursion.

3. Provide a HERO_NO_TMUX escape hatch

Lets operators bypass the wrapper for debugging without editing the user's home (ssh user@host -o SetEnv='HERO_NO_TMUX=1').

4. Ensure tmux is installed by the installer
tools/install.sh should install tmux as a hard dependency (it's already standard on every distro the project targets). Same line of reasoning as the recent install_nushell || die change at install.sh:744.

What this does NOT fix
The underlying REPL spin itself, which belongs upstream in nushell/nushell and is out of scope here. Worth filing a minimal SSH-disconnect repro upstream against #6455 once the workaround is in place.

Cross-reference
Umbrella: lhumina_code/home#205
Sibling: lhumina_code/hero_browser#18 (BrowserPool lifecycle)