mos_sysvol
Persistent storage initialization for MOS nodes. Called during early boot by zinit as a oneshot service to detect, initialize, or mount system storage.
What it does
mos_sysvol is an idempotent storage initializer:
- Existing storage found — assembles raid arrays (if any), mounts everything
- Empty disk(s) available — partitions, creates raid (dual-disk), formats, creates subvolumes, mounts
- No suitable disk — returns cleanly (node runs diskless)
It never touches disks that have existing data (partition tables, filesystems, mount state, or device-mapper holders).
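The three outcomes above amount to a small decision function. The following is a minimal sketch of that flow, not the crate's actual code: the `Action` enum and `decide` function are hypothetical names introduced for illustration, and the single/dual split anticipates the dual-disk behaviour described later in this README.

```rust
/// Hypothetical summary of what the initializer would do on a given boot.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Existing storage found: assemble arrays if needed, then mount.
    MountExisting,
    /// Exactly one empty disk: partition, format, mount (no raid).
    InitializeSingle,
    /// Two or more empty disks: partition two and create raid1 arrays.
    InitializeDual,
    /// Nothing usable: return cleanly and run diskless.
    RunDiskless,
}

/// Decide what to do from two observations. Existing storage always wins,
/// which is what makes repeated runs idempotent in this sketch.
pub fn decide(storage_exists: bool, empty_disks: usize) -> Action {
    if storage_exists {
        return Action::MountExisting;
    }
    match empty_disks {
        0 => Action::RunDiskless,
        1 => Action::InitializeSingle,
        _ => Action::InitializeDual,
    }
}
```

Calling `decide(true, 2)` yields `Action::MountExisting`: empty disks are ignored once provisioned storage is present.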
Partition layout
A fixed 5-partition GPT layout, identical on every disk, identical on BIOS and UEFI:
| # | Name | Size | GPT Type | Filesystem | Mount point |
|---|---|---|---|---|---|
| 1 | mosbios | 1 MB | EF02 (BIOS Boot) | none | none |
| 2 | mosefi | 100 MB | EF00 (ESP) | FAT32 | /boot/efi |
| 3 | mosboot | 1 GB | 8300 (Linux) | ext4 | /boot |
| 4 | mosswap | 2 GB (default) | 8200 (Linux swap) | swap | none |
| 5 | mosdata | configurable (default 4 GB) | 8300 (Linux) | btrfs | subvolumes |
GPT partition tables are created using the gptman crate (pure Rust, no sgdisk).
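To make the layout concrete, here is a sketch that models the table as data and sums the space it needs. `PartSpec`, `layout`, and `total_mib` are hypothetical names, not part of the crate. The sketch also assumes the sizes in the table are binary units (1 GB = 1024 MiB) and ignores GPT header and alignment overhead.

```rust
/// One row of the partition table (illustrative type, not the crate's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartSpec {
    pub name: &'static str,
    pub size_mib: u64,
    /// Short type code as written in the table above.
    pub gpt_type: &'static str,
    pub filesystem: Option<&'static str>,
}

/// Build the fixed five-partition layout for the given swap and data sizes.
/// Assumption: "GB" in the table means GiB, i.e. 1024 MiB.
pub fn layout(swap_gb: u64, data_gb: u64) -> [PartSpec; 5] {
    [
        PartSpec { name: "mosbios", size_mib: 1, gpt_type: "EF02", filesystem: None },
        PartSpec { name: "mosefi", size_mib: 100, gpt_type: "EF00", filesystem: Some("FAT32") },
        PartSpec { name: "mosboot", size_mib: 1024, gpt_type: "8300", filesystem: Some("ext4") },
        PartSpec { name: "mosswap", size_mib: swap_gb * 1024, gpt_type: "8200", filesystem: Some("swap") },
        PartSpec { name: "mosdata", size_mib: data_gb * 1024, gpt_type: "8300", filesystem: Some("btrfs") },
    ]
}

/// Sum of all partition sizes; a lower bound on the disk size required.
pub fn total_mib(parts: &[PartSpec]) -> u64 {
    parts.iter().map(|p| p.size_mib).sum()
}
```

With the default sizes, `total_mib(&layout(2, 4))` comes to 7269 MiB under these assumptions.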
Dual-disk support
When two or more empty disks of the same type tier are found, mos_sysvol partitions exactly two with the identical layout and creates mdadm raid1 arrays:
| Array | Members | mdadm metadata | Reason |
|---|---|---|---|
| /dev/md/mosefi | disk1p2 + disk2p2 | v0.90 | Superblock at end, so firmware sees a clean FAT32 filesystem at sector 0. Each disk is independently bootable. |
| /dev/md/mosboot | disk1p3 + disk2p3 | v1.2 | Standard metadata. GRUB reads v1.2 natively. More robust. |
Swap and data are formatted only on the first disk. Btrfs raid and swap activation are left to the future volume manager.
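As a rough illustration of how the array members map to device nodes, the sketch below derives partition paths and builds an `mdadm --create` argument list. The function names are hypothetical, and the crate may pass additional flags. The `p` separator for disks whose name ends in a digit follows the usual Linux naming convention.

```rust
/// Path of partition `n` on `disk`. Linux inserts a `p` separator when the
/// disk name ends in a digit (e.g. nvme0n1 -> nvme0n1p2, sda -> sda2).
pub fn partition_path(disk: &str, n: u32) -> String {
    if disk.ends_with(|c: char| c.is_ascii_digit()) {
        format!("{disk}p{n}")
    } else {
        format!("{disk}{n}")
    }
}

/// Arguments for creating a two-member raid1 array with the given metadata
/// version. Only standard mdadm options are used; the exact set of options
/// mos_sysvol passes is an assumption of this sketch.
pub fn mdadm_create_args(array: &str, metadata: &str, members: [&str; 2]) -> Vec<String> {
    vec![
        "--create".to_string(),
        array.to_string(),
        "--level=1".to_string(),
        "--raid-devices=2".to_string(),
        format!("--metadata={metadata}"),
        members[0].to_string(),
        members[1].to_string(),
    ]
}
```

For example, the EFI array on two SATA disks would combine `partition_path("/dev/sda", 2)` and `partition_path("/dev/sdb", 2)` with metadata `0.90`, while the boot array would use partition 3 and metadata `1.2`.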
Disk pairing rules
- Group by type tier: NVMe > SSD > HDD (never mix tiers)
- Pick the first two disks from the best tier, in alphabetical order of device name
- Size may differ — smaller disk constrains partition sizes
- Extra disks are left untouched for the volume manager
- Single disk works fine — no raid, direct partitions
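The pairing rules can be sketched as a sort followed by a filter. This is an illustration only: `Tier`, `Disk`, `select_disks`, and `usable_size` are hypothetical names. The sketch also reads "best tier" literally, so a lone NVMe disk is preferred over a pair of SSDs; the crate may resolve that case differently.

```rust
/// Disk type tiers, declared best-first so the derived `Ord` sorts
/// NVMe before SSD before HDD.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Nvme,
    Ssd,
    Hdd,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Disk {
    pub name: String,
    pub tier: Tier,
    pub size_bytes: u64,
}

/// From a list of already-verified empty disks, pick at most two disks,
/// all from the best tier present, in alphabetical order of name.
pub fn select_disks(mut empty: Vec<Disk>) -> Vec<Disk> {
    empty.sort_by(|a, b| a.tier.cmp(&b.tier).then_with(|| a.name.cmp(&b.name)));
    let Some(best) = empty.first().map(|d| d.tier) else {
        return Vec::new(); // no empty disks at all
    };
    // Never mix tiers: keep only disks from the best tier, then take two.
    empty.into_iter().filter(|d| d.tier == best).take(2).collect()
}

/// The smaller disk constrains partition sizes for a mirrored pair.
pub fn usable_size(selected: &[Disk]) -> Option<u64> {
    selected.iter().map(|d| d.size_bytes).min()
}
```

Any disks not returned by `select_disks` are simply never touched, which matches the rule that extra disks are left for the volume manager.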
Subsequent boots
On reboot, mos_sysvol detects existing storage by label, runs mdadm --assemble for any inactive arrays, then mounts. It does not depend on initramfs auto-assembly.
Btrfs subvolumes
The mosdata partition contains four subvolumes:
| Subvolume | Mount point | Purpose |
|---|---|---|
| system | /var/cache/system | System state |
| etc | /var/cache/etc | Configuration |
| modules | /var/cache/modules | Kernel modules |
| vm-meta | /var/cache/vm-meta | VM metadata |
All subvolumes are mounted with noatime,space_cache=v2.
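For illustration, the sketch below shows how a subvolume name, its mount point, and the shared options could be turned into arguments for `mount`. The `SUBVOLUMES` table mirrors the rows above, and `mount_args` is a hypothetical helper; the crate's real invocation may differ.

```rust
/// (subvolume name, mount point) pairs, copied from the table above.
pub const SUBVOLUMES: [(&str, &str); 4] = [
    ("system", "/var/cache/system"),
    ("etc", "/var/cache/etc"),
    ("modules", "/var/cache/modules"),
    ("vm-meta", "/var/cache/vm-meta"),
];

/// Mount options shared by every subvolume.
pub const MOUNT_OPTS: &str = "noatime,space_cache=v2";

/// Arguments for `mount` that attach one btrfs subvolume of `device` at
/// `mount_point`. `subvol=` is the standard btrfs mount option for
/// selecting a subvolume by name.
pub fn mount_args(device: &str, subvol: &str, mount_point: &str) -> Vec<String> {
    vec![
        "-t".to_string(),
        "btrfs".to_string(),
        "-o".to_string(),
        format!("subvol={},{}", subvol, MOUNT_OPTS),
        device.to_string(),
        mount_point.to_string(),
    ]
}
```

Iterating over `SUBVOLUMES` and passing each pair to `mount_args` gives one `mount` invocation per subvolume, all against the same mosdata device.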
Kernel command line
| Parameter | Default | Description |
|---|---|---|
| mossize=N | 4 | Data partition size in GB |
| mosswap=N | 2 | Swap partition size in GB |
Example: mossize=32 mosswap=4 creates a 32 GB data partition and 4 GB swap.
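A minimal sketch of how these parameters could be parsed from the contents of `/proc/cmdline` follows. `SizeConfig` and `parse_cmdline` are hypothetical names. Silently falling back to the default on an unparsable value is an assumption of this sketch, not documented behaviour.

```rust
/// Partition sizes in GB, as selected on the kernel command line.
#[derive(Debug, PartialEq, Eq)]
pub struct SizeConfig {
    pub data_gb: u64,
    pub swap_gb: u64,
}

impl Default for SizeConfig {
    fn default() -> Self {
        // Defaults from the parameter table above.
        SizeConfig { data_gb: 4, swap_gb: 2 }
    }
}

/// Parse `mossize=N` and `mosswap=N` from a kernel command line string.
/// Unrelated tokens are ignored; values that are not valid integers leave
/// the default in place (assumption).
pub fn parse_cmdline(cmdline: &str) -> SizeConfig {
    let mut cfg = SizeConfig::default();
    for token in cmdline.split_whitespace() {
        if let Some(value) = token.strip_prefix("mossize=") {
            if let Ok(n) = value.parse::<u64>() {
                cfg.data_gb = n;
            }
        } else if let Some(value) = token.strip_prefix("mosswap=") {
            if let Ok(n) = value.parse::<u64>() {
                cfg.swap_gb = n;
            }
        }
    }
    cfg
}
```

In practice the input string would come from reading `/proc/cmdline`; keeping the parser a pure function of `&str` makes it easy to unit test.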
Disk selection
Candidate disks are enumerated from /sys/block and sorted by type priority:
- NVMe (highest priority)
- SSD (SATA/SAS, non-rotational)
- HDD (rotational)
Excluded: loop, ram, dm-, sr, fd, zram devices.
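The filtering and ordering can be sketched as follows. This is illustrative only: the function names are hypothetical. Detecting NVMe by the `nvme` name prefix and telling SSD from HDD by a rotational flag (as exposed in `/sys/block/<dev>/queue/rotational`) are assumptions about how the crate classifies devices.

```rust
/// Name prefixes of block devices that are never candidates (list above).
const EXCLUDED_PREFIXES: [&str; 6] = ["loop", "ram", "dm-", "sr", "fd", "zram"];

/// True if a `/sys/block` entry name is not on the exclusion list.
pub fn is_candidate_name(name: &str) -> bool {
    !EXCLUDED_PREFIXES.iter().any(|prefix| name.starts_with(prefix))
}

/// Lower value means higher priority: NVMe, then SSD, then HDD.
pub fn type_priority(name: &str, rotational: bool) -> u8 {
    if name.starts_with("nvme") {
        0
    } else if !rotational {
        1
    } else {
        2
    }
}

/// Filter and order `(name, rotational)` pairs by type priority, breaking
/// ties alphabetically by name.
pub fn sort_candidates(mut devs: Vec<(String, bool)>) -> Vec<String> {
    devs.retain(|(name, _)| is_candidate_name(name));
    devs.sort_by_key(|(name, rot)| (type_priority(name, *rot), name.clone()));
    devs.into_iter().map(|(name, _)| name).collect()
}
```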
Empty-disk verification
A disk must pass five checks before mos_sysvol will touch it:
- No partition entries in sysfs (/sys/block/<dev>/<dev>*)
- Not currently mounted (/proc/mounts)
- No device-mapper holders (/sys/block/<dev>/holders/)
- No filesystem signatures (blkid -p)
- No GPT partition entries (read via gptman)
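Two of these checks need nothing beyond the standard library, so they make a convenient illustration. The sketch below covers the mount check and the holders check; `is_mounted` and `has_holders` are hypothetical names, and the prefix match is deliberately conservative (it may report a false positive for a similarly named disk, which only causes the disk to be skipped).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// True if any line of `/proc/mounts`-style text has a source that is
/// `/dev/<dev>` or starts with it (which covers its partitions).
pub fn is_mounted(mounts: &str, dev: &str) -> bool {
    let prefix = format!("/dev/{dev}");
    mounts
        .lines()
        // The first whitespace-separated field of each line is the source.
        .filter_map(|line| line.split_whitespace().next())
        .any(|source| source.starts_with(&prefix))
}

/// True if `<sys_block>/<dev>/holders/` contains at least one entry.
/// `sys_block` is a parameter (normally `/sys/block`) so the function can
/// be exercised against a test directory.
pub fn has_holders(sys_block: &Path, dev: &str) -> io::Result<bool> {
    let holders = sys_block.join(dev).join("holders");
    Ok(fs::read_dir(holders)?.next().is_some())
}
```

A caller would treat an `Err` from `has_holders` as a reason to skip the disk rather than as proof that it is free.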
Library usage
mos_sysvol exposes its functionality as a library crate. The binary is a thin wrapper around mos_sysvol::init().
Initialization (full flow)
use mos_sysvol::{StorageState, StorageError};

fn main() -> Result<(), StorageError> {
    env_logger::init();
    match mos_sysvol::init()? {
        StorageState::Mounted { device, boot_device, efi_device } => {
            println!("Mounted: data={} boot={} efi={}",
                device.display(), boot_device.display(), efi_device.display());
        }
        StorageState::Initialized { device, boot_device, efi_device, dual_disk } => {
            println!("Initialized: data={} boot={} efi={} dual={}",
                device.display(), boot_device.display(), efi_device.display(), dual_disk);
        }
        StorageState::NoDisk => {
            println!("No suitable disk found, running diskless");
        }
    }
    Ok(())
}
Querying status (read-only)
let status = mos_sysvol::status();
println!("Data device: {:?}", status.mosdata_device);
println!("Boot device: {:?}", status.mosboot_device);
println!("EFI device: {:?}", status.mosefi_device);
println!("Mounted: {}", status.is_mounted);
for sv in &status.subvolumes {
    println!(" {} mounted={} at {:?}", sv.name, sv.is_mounted, sv.mount_point);
}
Checking if storage exists
if mos_sysvol::storage_exists() {
    println!("MOS storage already provisioned");
}
Mounting existing storage
let state = mos_sysvol::mount_existing()?;
Assembles raid arrays if present, mounts all subvolumes, boot, and EFI. Also creates any missing subvolumes if a previous init was interrupted.
Unmounting
mos_sysvol::unmount()?;
Unmounts subvolumes, then /boot/efi, then /boot.
Public types
StorageState
pub enum StorageState {
    Mounted {
        device: PathBuf,      // mosdata partition
        boot_device: PathBuf, // mosboot partition or /dev/md/mosboot
        efi_device: PathBuf,  // mosefi partition or /dev/md/mosefi
    },
    Initialized {
        device: PathBuf,
        boot_device: PathBuf,
        efi_device: PathBuf,
        dual_disk: bool,      // true if raid1 arrays were created
    },
    NoDisk,
}
StorageStatus
pub struct StorageStatus {
    pub mosdata_device: Option<PathBuf>,
    pub mosboot_device: Option<PathBuf>,
    pub mosefi_device: Option<PathBuf>,
    pub is_mounted: bool,
    pub subvolumes: Vec<SubvolumeInfo>,
}
SubvolumeInfo
pub struct SubvolumeInfo {
    pub name: String,
    pub mount_point: Option<PathBuf>,
    pub is_mounted: bool,
}
StorageError
| Variant | When |
|---|---|
| EnumerationFailed | Cannot read /sys/block |
| PartitioningFailed | Partition creation failed |
| GptError | GPT read/write error (gptman) |
| RaidError | mdadm array create/assemble failed |
| FormatFailed | mkfs.* or mkswap failed |
| MountFailed | mount command failed |
| SubvolumeFailed | btrfs subvolume command failed |
| CommandFailed | Any other external command failed |
| DeviceInUse | Device is mounted or has holders |
| DeviceNotEmpty | Device has existing data |
| Io | Underlying I/O error |
What mos_sysvol does NOT do
- Does not activate swap — deferred to volume manager
- Does not install kernels or GRUB — only creates and mounts partitions
- Does not assemble btrfs raid — volume manager's responsibility
- Does not treat BIOS and UEFI differently for partition layout
Runtime requirements
Root privileges and these tools:
| Tool | Used for |
|---|---|
| modprobe | Loading btrfs kernel module |
| blkid | Filesystem label detection |
| mkfs.vfat | Formatting ESP |
| mkfs.ext4 | Formatting /boot |
| mkswap | Formatting swap |
| mkfs.btrfs | Formatting data partition |
| btrfs | Subvolume creation and listing |
| mount / umount | Mounting filesystems |
| mdadm | RAID array creation and assembly (dual-disk) |
| udevadm or partprobe | Kernel partition table reload |
GPT operations are handled in pure Rust via gptman — no sgdisk or gdisk required.
Building
cargo build --release
Requires Rust 1.85+ (edition 2024). Linux only.
Testing
cargo test
Unit tests cover command line parsing, partition path generation, boot mode logic, GPT type GUID encoding, and disk pairing logic. Integration testing requires a VM with empty virtual disks.
License
Apache-2.0