Jan De Landtsheer 51daccf917
Add mos-bin subvolume mounted at /var/cache/bin
Decouple subvolume names from mount paths by changing SUBVOLUMES
from a flat name list to (name, mount_point) tuples. This allows
the new mos-bin subvolume to mount at /var/cache/bin rather than
/var/cache/mos-bin.
2026-03-25 17:45:24 +01:00
docs/adr Unified 5-partition layout with dual-disk mdadm raid1 2026-03-24 15:19:46 +01:00
src Add mos-bin subvolume mounted at /var/cache/bin 2026-03-25 17:45:24 +01:00
.gitignore Initial commit: MOS system volume initialization 2026-03-23 17:13:46 +01:00
Cargo.lock Initial commit: MOS system volume initialization 2026-03-23 17:13:46 +01:00
Cargo.toml Initial commit: MOS system volume initialization 2026-03-23 17:13:46 +01:00
README.md Unified 5-partition layout with dual-disk mdadm raid1 2026-03-24 15:19:46 +01:00

mos_sysvol

Persistent storage initialization for MOS nodes. Called during early boot by zinit as a oneshot service to detect, initialize, or mount system storage.

What it does

mos_sysvol is an idempotent storage initializer:

  1. Existing storage found — assembles raid arrays (if any), mounts everything
  2. Empty disk(s) available — partitions, creates raid (dual-disk), formats, creates subvolumes, mounts
  3. No suitable disk — returns cleanly (node runs diskless)

It never touches a disk that shows any sign of use: a partition table, a filesystem signature, an active mount, or device-mapper holders.

Partition layout

A fixed 5-partition GPT layout, identical on every disk, identical on BIOS and UEFI:

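| # | Name    | Size                        | GPT type          | Filesystem | Mount point |
|---|---------|-----------------------------|-------------------|------------|-------------|
| 1 | mosbios | 1 MB                        | EF02 (BIOS Boot)  | none       | none        |
| 2 | mosefi  | 100 MB                      | EF00 (ESP)        | FAT32      | /boot/efi   |
| 3 | mosboot | 1 GB                        | 8300 (Linux)      | ext4       | /boot       |
| 4 | mosswap | 2 GB (default)              | 8200 (Linux swap) | swap       | none        |
| 5 | mosdata | configurable (default 4 GB) | 8300 (Linux)      | btrfs      | subvolumes  |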

GPT partition tables are created using the gptman crate (pure Rust, no sgdisk).
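
As an illustration only, the table above can be expressed as data. This sketch does not use gptman and is not the crate's actual code: `PartSpec`, `layout`, and `required_bytes` are hypothetical names, and it assumes that "MB" and "GB" mean binary units (MiB and GiB). It also ignores GPT headers and alignment padding.

```rust
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// One row of the fixed layout (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartSpec {
    pub number: u32,
    pub name: &'static str,
    /// Short GPT type code as written in the table (not the full GUID).
    pub type_code: &'static str,
    pub size_bytes: u64,
}

/// Build the 5-partition layout for the given data and swap sizes in GB.
/// Assumption: sizes are interpreted as binary units.
pub fn layout(data_gb: u64, swap_gb: u64) -> [PartSpec; 5] {
    [
        PartSpec { number: 1, name: "mosbios", type_code: "EF02", size_bytes: MIB },
        PartSpec { number: 2, name: "mosefi", type_code: "EF00", size_bytes: 100 * MIB },
        PartSpec { number: 3, name: "mosboot", type_code: "8300", size_bytes: GIB },
        PartSpec { number: 4, name: "mosswap", type_code: "8200", size_bytes: swap_gb * GIB },
        PartSpec { number: 5, name: "mosdata", type_code: "8300", size_bytes: data_gb * GIB },
    ]
}

/// Sum of all partition sizes; a disk smaller than this cannot hold the layout.
pub fn required_bytes(parts: &[PartSpec]) -> u64 {
    parts.iter().map(|p| p.size_bytes).sum()
}
```

With the default sizes, `required_bytes(&layout(4, 2))` comes to 101 MiB plus 7 GiB of partition payload.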

Dual-disk support

When two or more empty disks of the same type tier are found, mos_sysvol partitions exactly two with the identical layout and creates mdadm raid1 arrays:

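| Array           | Members           | mdadm metadata | Reason                                                                                          |
|-----------------|-------------------|----------------|-------------------------------------------------------------------------------------------------|
| /dev/md/mosefi  | disk1p2 + disk2p2 | v0.9           | Superblock at end: firmware sees clean FAT32 at sector 0. Each disk independently bootable.      |
| /dev/md/mosboot | disk1p3 + disk2p3 | v1.2           | Standard metadata. GRUB reads v1.2 natively. More robust.                                        |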

Swap and data are only formatted on the first disk. The future volume manager handles btrfs raid and swap activation.

Disk pairing rules

  • Group by type tier: NVMe > SSD > HDD (never mix tiers)
  • Pick the first two from the best tier (alphabetically)
  • Size may differ — smaller disk constrains partition sizes
  • Extra disks are left untouched for the volume manager
  • Single disk works fine — no raid, direct partitions
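
The pairing rules can be sketched as a small pure function. This is a minimal illustration, not the crate's implementation: `Tier`, `Disk`, `Selection`, and `select_disks` are hypothetical names, and the sketch assumes that a lone disk in the best tier is used on its own even when a lower tier holds two or more disks (one reading of "never mix tiers").

```rust
/// Type tiers, declared best-first so the derived ordering sorts NVMe before SSD before HDD.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Nvme,
    Ssd,
    Hdd,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Disk {
    pub name: String,
    pub tier: Tier,
    pub size_bytes: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Selection {
    /// No empty disk: the node runs diskless.
    Diskless,
    /// One disk in the best tier: direct partitions, no raid.
    Single(Disk),
    /// Two disks of the same tier; the smaller one bounds the usable size.
    Pair { first: Disk, second: Disk, usable_bytes: u64 },
}

/// Pick at most two empty disks from the best available tier.
pub fn select_disks(mut empty: Vec<Disk>) -> Selection {
    // Best tier first, then alphabetical by device name within a tier.
    empty.sort_by(|a, b| a.tier.cmp(&b.tier).then_with(|| a.name.cmp(&b.name)));
    let Some(best) = empty.first().map(|d| d.tier) else {
        return Selection::Diskless;
    };
    // Disks outside the best tier, and any third disk, are simply not selected.
    let mut same_tier = empty.into_iter().filter(|d| d.tier == best);
    match (same_tier.next(), same_tier.next()) {
        (Some(first), Some(second)) => {
            let usable_bytes = first.size_bytes.min(second.size_bytes);
            Selection::Pair { first, second, usable_bytes }
        }
        (Some(only), None) => Selection::Single(only),
        _ => Selection::Diskless,
    }
}
```

Keeping the selection free of I/O makes the rules easy to unit test with synthetic disk lists.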

Subsequent boots

On reboot, mos_sysvol detects existing storage by label, runs mdadm --assemble for any inactive arrays, then mounts. It does not depend on initramfs auto-assembly.

Btrfs subvolumes

The mosdata partition contains four subvolumes:

| Subvolume | Mount point        | Purpose        |
|-----------|--------------------|----------------|
| system    | /var/cache/system  | System state   |
| etc       | /var/cache/etc     | Configuration  |
| modules   | /var/cache/modules | Kernel modules |
| vm-meta   | /var/cache/vm-meta | VM metadata    |

All subvolumes are mounted with noatime,space_cache=v2.
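
A minimal sketch of how the table could be held as (name, mount point) pairs, with the btrfs mount option string built per subvolume. The constant and function names here are illustrative assumptions rather than the crate's actual definitions, and only the four subvolumes listed in the table are included.

```rust
/// (subvolume name, mount point) pairs, taken from the table above.
pub const SUBVOLUMES: [(&str, &str); 4] = [
    ("system", "/var/cache/system"),
    ("etc", "/var/cache/etc"),
    ("modules", "/var/cache/modules"),
    ("vm-meta", "/var/cache/vm-meta"),
];

/// Options shared by every subvolume mount.
pub const MOUNT_OPTS: &str = "noatime,space_cache=v2";

/// Build the `-o` option string for one subvolume, using btrfs's `subvol=` option.
pub fn mount_options(name: &str) -> String {
    format!("subvol={},{}", name, MOUNT_OPTS)
}

/// Look up where a subvolume is mounted; `None` for unknown names.
pub fn mount_point_for(name: &str) -> Option<&'static str> {
    SUBVOLUMES
        .iter()
        .find(|(sv, _)| *sv == name)
        .map(|(_, mount_point)| *mount_point)
}
```

Storing the mount point next to the name means the two do not have to match, which is useful when a subvolume should mount somewhere other than /var/cache/<name>.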

Kernel command line

| Parameter | Default | Description               |
|-----------|---------|---------------------------|
| mossize=N | 4       | Data partition size in GB |
| mosswap=N | 2       | Swap partition size in GB |

Example: mossize=32 mosswap=4 creates a 32 GB data partition and 4 GB swap.
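
For illustration, parsing these two parameters out of a command-line string (such as the contents of /proc/cmdline) might look like the sketch below. `Sizes` and `parse_sizes` are hypothetical names, and the handling of malformed values (keep the default) is an assumption, not documented behavior.

```rust
/// Partition sizes in GB, as configured on the kernel command line.
#[derive(Debug, PartialEq, Eq)]
pub struct Sizes {
    pub data_gb: u64,
    pub swap_gb: u64,
}

/// Parse `mossize=N` and `mosswap=N`, falling back to the documented
/// defaults (4 and 2) when a parameter is absent.
pub fn parse_sizes(cmdline: &str) -> Sizes {
    let mut sizes = Sizes { data_gb: 4, swap_gb: 2 };
    for token in cmdline.split_whitespace() {
        if let Some((key, value)) = token.split_once('=') {
            // Assumption: a value that is not an unsigned integer is ignored
            // and the default is kept.
            match (key, value.parse::<u64>()) {
                ("mossize", Ok(n)) => sizes.data_gb = n,
                ("mosswap", Ok(n)) => sizes.swap_gb = n,
                _ => {}
            }
        }
    }
    sizes
}
```

Under these assumptions, `parse_sizes("quiet mossize=32 mosswap=4")` yields a 32 GB data size and a 4 GB swap size, and unrelated parameters are skipped.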

Disk selection

Candidate disks are enumerated from /sys/block and sorted by type priority:

  1. NVMe (highest priority)
  2. SSD (SATA/SAS, non-rotational)
  3. HDD (rotational)

Excluded: loop, ram, dm-, sr, fd, zram devices.
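
The filtering and ordering above can be sketched as follows. This is an illustrative assumption about the mechanics, not the crate's code: the names are hypothetical, NVMe is recognised by the `nvme` name prefix, and the `rotational` flag is assumed to come from `/sys/block/<dev>/queue/rotational`.

```rust
/// Type tiers, declared best-first so the derived ordering matches the priority list.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DiskTier {
    Nvme,
    Ssd,
    Hdd,
}

/// Device-name prefixes that are never candidates.
const EXCLUDED_PREFIXES: [&str; 6] = ["loop", "ram", "dm-", "sr", "fd", "zram"];

pub fn is_excluded(name: &str) -> bool {
    EXCLUDED_PREFIXES.iter().any(|prefix| name.starts_with(prefix))
}

/// Classify one block device; `None` means it is excluded.
pub fn classify(name: &str, rotational: bool) -> Option<DiskTier> {
    if is_excluded(name) {
        return None;
    }
    Some(if name.starts_with("nvme") {
        DiskTier::Nvme
    } else if rotational {
        DiskTier::Hdd
    } else {
        DiskTier::Ssd
    })
}

/// Filter and sort `(name, rotational)` pairs: best tier first, then by name.
pub fn candidates(devices: &[(&str, bool)]) -> Vec<(String, DiskTier)> {
    let mut out: Vec<(String, DiskTier)> = devices
        .iter()
        .filter_map(|&(name, rotational)| {
            classify(name, rotational).map(|tier| (name.to_string(), tier))
        })
        .collect();
    out.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```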

Empty-disk verification

A disk must pass five checks before mos_sysvol will touch it:

  1. No partition entries in sysfs (/sys/block/<dev>/<dev>*)
  2. Not currently mounted (/proc/mounts)
  3. No device-mapper holders (/sys/block/<dev>/holders/)
  4. No filesystem signatures (blkid -p)
  5. No GPT partition entries (read via gptman)
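
Checks 1 to 3 need only the standard library, so they can be sketched without blkid or gptman. The function names are hypothetical, and passing the sysfs root and the mounts text as arguments is a choice made here so the logic can be exercised against a temporary directory; the real implementation may differ.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Check 1: any entry under <sys_block>/<dev>/ whose name starts with <dev>
/// (for example sda1 under sda) counts as a partition.
pub fn has_partitions(sys_block: &Path, dev: &str) -> io::Result<bool> {
    for entry in fs::read_dir(sys_block.join(dev))? {
        let name = entry?.file_name();
        if name.to_string_lossy().starts_with(dev) {
            return Ok(true);
        }
    }
    Ok(false)
}

/// Check 2: look for /dev/<dev> (or one of its partitions) as a mount source.
/// This is a coarse prefix match, so it errs on the side of "in use":
/// "sda" also matches a mounted "/dev/sdaa".
pub fn is_mounted(proc_mounts: &str, dev: &str) -> bool {
    let prefix = format!("/dev/{}", dev);
    proc_mounts
        .lines()
        .filter_map(|line| line.split_whitespace().next())
        .any(|source| source.starts_with(&prefix))
}

/// Check 3: a non-empty <sys_block>/<dev>/holders/ directory means another
/// device (device-mapper, md) sits on top of this one.
pub fn has_holders(sys_block: &Path, dev: &str) -> io::Result<bool> {
    match fs::read_dir(sys_block.join(dev).join("holders")) {
        Ok(mut entries) => Ok(entries.next().is_some()),
        // A missing holders directory is treated as "no holders".
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

On a live system the callers would pass `Path::new("/sys/block")` and the contents of /proc/mounts; a disk is a candidate only if all three functions report false and the remaining two checks also pass.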

Library usage

mos_sysvol exposes its functionality as a library crate. The binary is a thin wrapper around mos_sysvol::init().

Initialization (full flow)

use mos_sysvol::{StorageState, StorageError};

fn main() -> Result<(), StorageError> {
    env_logger::init();

    match mos_sysvol::init()? {
        StorageState::Mounted { device, boot_device, efi_device } => {
            println!("Mounted: data={} boot={} efi={}",
                device.display(), boot_device.display(), efi_device.display());
        }
        StorageState::Initialized { device, boot_device, efi_device, dual_disk } => {
            println!("Initialized: data={} boot={} efi={} dual={}",
                device.display(), boot_device.display(), efi_device.display(), dual_disk);
        }
        StorageState::NoDisk => {
            println!("No suitable disk found, running diskless");
        }
    }

    Ok(())
}

Querying status (read-only)

let status = mos_sysvol::status();

println!("Data device: {:?}", status.mosdata_device);
println!("Boot device: {:?}", status.mosboot_device);
println!("EFI device: {:?}", status.mosefi_device);
println!("Mounted: {}", status.is_mounted);

for sv in &status.subvolumes {
    println!("  {} mounted={} at {:?}", sv.name, sv.is_mounted, sv.mount_point);
}

Checking if storage exists

if mos_sysvol::storage_exists() {
    println!("MOS storage already provisioned");
}

Mounting existing storage

let state = mos_sysvol::mount_existing()?;

Assembles raid arrays if present, mounts all subvolumes, boot, and EFI. Also creates any missing subvolumes if a previous init was interrupted.

Unmounting

mos_sysvol::unmount()?;

Unmounts subvolumes, then /boot/efi, then /boot.

Public types

StorageState

pub enum StorageState {
    Mounted {
        device: PathBuf,       // mosdata partition
        boot_device: PathBuf,  // mosboot partition or /dev/md/mosboot
        efi_device: PathBuf,   // mosefi partition or /dev/md/mosefi
    },
    Initialized {
        device: PathBuf,
        boot_device: PathBuf,
        efi_device: PathBuf,
        dual_disk: bool,       // true if raid1 arrays were created
    },
    NoDisk,
}

StorageStatus

pub struct StorageStatus {
    pub mosdata_device: Option<PathBuf>,
    pub mosboot_device: Option<PathBuf>,
    pub mosefi_device: Option<PathBuf>,
    pub is_mounted: bool,
    pub subvolumes: Vec<SubvolumeInfo>,
}

SubvolumeInfo

pub struct SubvolumeInfo {
    pub name: String,
    pub mount_point: Option<PathBuf>,
    pub is_mounted: bool,
}

StorageError

| Variant            | When                               |
|--------------------|------------------------------------|
| EnumerationFailed  | Cannot read /sys/block             |
| PartitioningFailed | Partition creation failed          |
| GptError           | GPT read/write error (gptman)      |
| RaidError          | mdadm array create/assemble failed |
| FormatFailed       | mkfs.* or mkswap failed            |
| MountFailed        | mount command failed               |
| SubvolumeFailed    | btrfs subvolume command failed     |
| CommandFailed      | Any other external command failed  |
| DeviceInUse        | Device is mounted or has holders   |
| DeviceNotEmpty     | Device has existing data           |
| Io                 | Underlying I/O error               |

What mos_sysvol does NOT do

  • Does not activate swap — deferred to volume manager
  • Does not install kernels or GRUB — only creates and mounts partitions
  • Does not assemble btrfs raid — volume manager's responsibility
  • Does not treat BIOS and UEFI differently for partition layout

Runtime requirements

Root privileges and these tools:

| Tool                 | Used for                                     |
|----------------------|----------------------------------------------|
| modprobe             | Loading btrfs kernel module                  |
| blkid                | Filesystem label detection                   |
| mkfs.vfat            | Formatting ESP                               |
| mkfs.ext4            | Formatting /boot                             |
| mkswap               | Formatting swap                              |
| mkfs.btrfs           | Formatting data partition                    |
| btrfs                | Subvolume creation and listing               |
| mount / umount       | Mounting filesystems                         |
| mdadm                | RAID array creation and assembly (dual-disk) |
| udevadm or partprobe | Kernel partition table reload                |

GPT operations are handled in pure Rust via gptman — no sgdisk or gdisk required.

Building

cargo build --release

Requires Rust 1.85+ (edition 2024). Linux only.

Testing

cargo test

Unit tests cover command line parsing, partition path generation, boot mode logic, GPT type GUID encoding, and disk pairing logic. Integration testing requires a VM with empty virtual disks.

License

Apache-2.0