Notes After Migrating to BTRFS

Last year I wrote that I wanted to migrate to BTRFS on my old, trusty laptop. I did it this weekend. Although I have bootstrap scripts, I didn’t want to reinstall the OS, so I simply copied my entire file system to an external hard drive and back. Here are some notes which I made along the way and which may make someone’s life easier.

Data Migration

  • I have migrated from Ext4 on top of LVM on top of LUKS to BTRFS on top of LUKS.
  • Just in case, I made an ordinary backup to devices which were not involved in the migration process.
  • To migrate I used an external HDD connected via USB 3 and formatted as Ext4. The procedure was dead simple:
    • clone the whole file system to the external drive
    • re-partition my disks from a live Debian image (I used the Debian Bookworm KDE live image on a USB stick)
      • KDE has a wonderful partitioning GUI which can create encrypted partitions; I find it much better than gparted
    • clone data back
  • For Debian Live systems the username is user and the password is live.
  • I used rsync to clone data.
    • I ran it from the live image.
    • I first mounted my old FS at /mnt/r and the USB drive at /mnt/usb.
    • The exact command was: rsync -axHAWXS --numeric-ids --info=progress2 /mnt/r/ /mnt/usb/. Note the slashes at the end of rsync’s paths, which change the semantics of the command. I used the same command, with the paths reversed, to clone the data back.
  • Copying data from Ext4 to BTRFS often froze or slowed down. It didn’t matter which command was used (rsync, cp).
    • After the migration I ran some tests, and copying ~30 GB of data from Ext4 to BTRFS froze the whole system. It eventually unfroze.
    • I don’t know the reason for the freezes.
    • It’s best not to plan on doing anything in parallel while copying is in progress.
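
The remark about trailing slashes can be illustrated with a small helper. This is only a sketch: clone_cmd is a hypothetical name, it merely prints the rsync command rather than running it, and it assumes paths without spaces.

```shell
# Print the rsync invocation for cloning SRC into DST.
# The trailing slash forced on SRC makes rsync copy the *contents* of SRC
# instead of creating DST/<basename of SRC>; the one on DST is cosmetic.
clone_cmd() {
    src="${1%/}/"
    dst="${2%/}/"
    printf 'rsync -axHAWXS --numeric-ids --info=progress2 %s %s\n' "$src" "$dst"
}

clone_cmd /mnt/r /mnt/usb      # old file system -> USB drive
clone_cmd /mnt/usb/ /mnt/r/    # USB drive -> new file system
```

Printing the command first makes it easy to double-check the direction of the copy before pasting it into a root shell.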

Partitioning

  • I didn’t touch the partition table of my main SSD because it contained separate /boot and /boot/efi partitions which I wanted to keep as they were.
  • I created a separate partition for each of my SSDs (I have 2). One is the root file system, the other is storage used for internet downloads, photos etc.
    • I didn’t want to join these disks in RAID. They’re both 256 GB, which is a little small these days. I know I should buy a new, bigger disk and I have it on my list.
    • I didn’t want to join these disks in a single or RAID0 array, because when one disk fails in such an array, the whole array is dead.
    • Instead of RAID I’m taking regular backups to my NAS and to the cloud.
  • I created a swap partition on my secondary SSD. Previously I used a swap file, but I’ve read that BTRFS has problems with these in multi-disk setups. I made it a little bigger than the previous swap file, because I plan to enable hibernation one day.
  • I reformatted both drives to BTRFS on LUKS. KDE Partition Manager works fine for this purpose. Alternatively, cryptsetup luksFormat /dev/nvme0n1p3 worked for my NVMe SSD.
    • I think it’s possible to reuse the old LUKS partition by opening it and running wipefs on the mapped device under /dev/mapper/. I didn’t see a reason to do that and simply created new ones.
  • New partitions mean new UUIDs, which must be changed in /etc/crypttab and /etc/fstab. Alternatively, the btrfstune program, which is part of btrfs-progs, may be used to change the UUIDs of offline filesystems.
  • At some point I migrated to a bigger drive. (sidenote: With Clonezilla, disk-to-disk.) I used KDE Partition Manager to resize the BTRFS partition on the new drive to use it fully. KDE Partition Manager has good support for BTRFS and runs btrfs commands underneath to resize the file system. Even so, it “ate” ~200 GB of space: the partition had 930 GB and btrfs filesystem usage showed only 700 GB available. I had to resize it manually: btrfs filesystem resize max /.
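
Swapping old UUIDs for new ones in /etc/fstab and /etc/crypttab can be scripted. The helper below is a minimal sketch, not the procedure used during this migration: replace_uuid is a hypothetical name, and it assumes UUIDs are written in the usual UUID=... form.

```shell
# replace_uuid OLD NEW FILE...
# Rewrites every "UUID=OLD" to "UUID=NEW" in the given files and keeps
# a copy of each original next to it with a .bak suffix.
replace_uuid() {
    old="$1"; new="$2"; shift 2
    for f in "$@"; do
        cp -p -- "$f" "$f.bak" &&
        sed "s/UUID=$old/UUID=$new/g" "$f.bak" > "$f"
    done
}
```

The new value can be read with blkid -s UUID -o value followed by the device path. Keep in mind that /etc/crypttab refers to the UUID of the LUKS container (the raw partition), while /etc/fstab refers to the UUID of the file system inside it (the device under /dev/mapper).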

Chroot

  • I did the initial setup of BTRFS subvolumes in a chroot environment. I ran the following commands to get my FS into it:

    # cryptsetup luksOpen /dev/nvme0n1p3 crypt_nvme0n1p3
    # mkdir /mnt/r
    # mount /dev/mapper/crypt_nvme0n1p3 /mnt/r
    # mount /dev/nvme0n1p2 /mnt/r/boot
    # mount /dev/nvme0n1p1 /mnt/r/boot/efi
    # mkdir -p /mnt/r/{dev,media,mnt,proc,run,sys,tmp}
    # for d in dev media mnt proc run sys tmp; do mount --rbind /$d /mnt/r/$d; done
    # chroot /mnt/r
    
  • In the above it was important to mount both /boot and /boot/efi, so that update-initramfs and update-grub write to the real partitions.

  • I modified /etc/crypttab and /etc/fstab to reflect changes in UUIDs of partitions.
  • At the end of chroot session I updated initramfs and regenerated grub’s configuration:
    # update-initramfs -u -k all
    # update-grub
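
Before leaving the chroot it may be worth checking that every UUID mentioned in /etc/fstab and /etc/crypttab actually exists. The function below is a sketch under that assumption; missing_uuids is a hypothetical name and the file with known UUIDs has to be prepared separately.

```shell
# missing_uuids KNOWN_FILE CONFIG...
# KNOWN_FILE holds one existing UUID per line. Prints every UUID that is
# referenced as a UUID=... field on a non-comment line of the CONFIG files
# but is absent from KNOWN_FILE. PARTUUID=... fields are ignored.
missing_uuids() {
    known="$1"; shift
    awk '!/^[[:space:]]*#/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^UUID=/) print substr($i, 6)
         }' "$@" |
        sort -u |
        grep -vxFf "$known" || true
}
```

A possible use inside the chroot, as root: run blkid -s UUID -o value > /tmp/known-uuids, then missing_uuids /tmp/known-uuids /etc/fstab /etc/crypttab. Empty output means that every referenced UUID was found.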
    

Mounting File System

  • In /etc/crypttab I gave up the trick of unlocking several encrypted devices with a keyfile which resides on the first unlocked device. This trick is unsuitable for some RAIDs and multi-disk BTRFS setups, which need all disks to be present to mount a file system. (sidenote: Maybe it works with noearly or some other options which I didn’t investigate.) Instead I chose the keyscript approach. Debian (and its derivatives, I guess) ships with the decrypt_keyctl script, which caches a password for 60 seconds and passes it to all encrypted volumes in a configured group (which I called pw1), so if they share the same password, they’ll be unlocked automatically.

    # rootfs
    crypt_nvme0n1p3 UUID=... pw1 luks,discard,keyscript=decrypt_keyctl
    # storage
    crypt_sda1 UUID=... pw1 luks,discard,keyscript=decrypt_keyctl
    # swap
    crypt_sda2 UUID=... pw1 luks,discard,keyscript=decrypt_keyctl
    
    • You must run update-initramfs after adding a keyscript to crypttab.
  • I used the following options in /etc/fstab for BTRFS: UUID=... / btrfs subvol=@rootfs,noatime,compress=zstd,discard=async 0 1

    • discard=async has been the default since kernel 6.2. Apparently it helps to reduce read latencies.
    • During the chroot phase it’s easy to forget to mount the BTRFS file system with compression enabled, and compression applies only to newly written files. To re-compress the whole system in this case, just defragment it. Example for zstd:

      btrfs filesystem defragment -r -v -czstd /
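
To avoid forgetting about compression in the first place, the mount options can be checked before any data is copied. A minimal sketch follows; has_compression is a hypothetical helper that only inspects an option string.

```shell
# has_compression OPTIONS
# Succeeds when a comma-separated mount option string enables btrfs
# compression (compress or compress-force, with or without an algorithm).
has_compression() {
    case ",$1," in
        *,compress,* | *,compress=* | *,compress-force,* | *,compress-force=*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example, has_compression "$(findmnt -n -o OPTIONS /mnt/r)" || echo "compression is off" reports a file system mounted without it, which can then be fixed with mount -o remount,compress=zstd /mnt/r.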

BTRFS Subvolumes

  • I followed openSUSE recommendations regarding subvolume layout: @rootfs for the root and nested subvolumes for /home, /opt, /root, /srv, /tmp, /usr/local and /var (which also has copy-on-write disabled: chattr +C /var).
    • BTRFS snapshots don’t descend into nested subvolumes. It’s a feature, not a bug: with the above layout, rolling back the root doesn’t roll back data in, for example, /home or /var, which prevents data loss.
    • 2024-04-24: I usually don’t create a subvolume for /root, because I don’t use root account directly and don’t store any files there
  • The default subvolume is named @rootfs because I have read somewhere that this is what Debian calls its own default subvolume. I figured that this would make it more compatible with features which Debian cooks up for BTRFS.
  • I configured snapper to make auto snapshots of @rootfs (which excludes nested subvolumes). It makes them on several occasions: on boot, before and after apt upgrade and once every hour. It performs auto-cleanup of old snapshots once a day.

    • snapper is a wonderful program which you set up once and then forget about while it does its work.
    • snapper lists snapshots very slowly when BTRFS quota is enabled.
    • snapper creates snapshots in the /.snapshots directory. It’s a good idea to create a top-level @snapshots subvolume and mount it at /.snapshots via fstab, because it should ease recovery from GRUB (I haven’t had an opportunity to test this mechanism yet):

      mkdir /mnt/foo
      mount -o subvolid=5 /dev/... /mnt/foo
      cd /mnt/foo
      btrfs subvolume create @snapshots
      
      # and in /etc/fstab:
      UUID=... /.snapshots btrfs subvol=@snapshots,compress=zstd 0 0
      
    • grub-btrfs adds snapshots to the GRUB menu, so you can boot into them

    • btrfs-assistant is a well-done GUI for btrfs management
  • I disabled the quota feature, which apparently is responsible for a lot of slowdowns: btrfs quota disable <path>. I did it for all of my subvolumes.
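
The nested subvolume layout from the first bullet of this section can be sketched as a short script. It is an illustration only: run and create_layout are hypothetical helpers, /mnt/r is assumed to be where @rootfs is mounted, and with DRY_RUN set the commands are printed instead of executed.

```shell
# Paths that become nested subvolumes, relative to the mounted @rootfs.
NESTED="home opt root srv tmp usr/local var"

# Echo the command when DRY_RUN is set, run it otherwise.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# create_layout ROOT
# The target directories must not exist yet, but their parents
# (such as ROOT/usr) must.
create_layout() {
    for p in $NESTED; do
        run btrfs subvolume create "$1/$p" || return 1
    done
    # Disable copy-on-write for files created in /var from now on.
    run chattr +C "$1/var"
}

DRY_RUN=1 create_layout /mnt/r
```

Dropping DRY_RUN=1 runs the commands for real, which requires root and an empty tree; existing directories would have to be moved aside and their contents copied into the new subvolumes afterwards.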

Conversion from ext4

  • The following procedure worked for converting a simple ext4 filesystem on my other server:

    fsck.ext4 -f /dev/sda1
    btrfs-convert /dev/sda1
    
  • To create a subvolume structure mentioned above I used a trick which doesn’t use any additional disk space:

    mount /dev/sda1 /mnt/r && cd /mnt/r
    btrfs subvolume snapshot . @rootfs
    cd @rootfs
    rm -rf ./home ./opt ./root ./srv ./tmp ./usr/local ./var
    

    Now for each removed directory we can create a snapshot and remove+move all unwanted contents. For example (beware of Bashism in first line!):

    shopt -s extglob
    btrfs subvolume snapshot .. ./home
    cd home
    rm -rf -- !(home)
    mv home/* . ; mv home/.* . ; rmdir home
    
  • If /boot is on btrfs after the conversion, remember to reinstall GRUB:

    grub-install /dev/sda
    update-grub
    
  • I screwed this up initially, but rolling back to ext4 with btrfs-convert -r /dev/sda1 worked flawlessly.

Daily maintenance

  • btrfsmaintenance is a “setup once and forget” package for periodic maintenance of BTRFS
  • Its configuration is in /etc/default/btrfsmaintenance. One should edit it and then activate it with systemctl restart btrfsmaintenance-refresh.service, which sets up several systemd timers for tasks like scrubbing and balancing of the filesystem.
  • I set up monthly scrubbing and balancing on my personal laptop. I disabled defragmentation and trimming.
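
A configuration matching the description above might look roughly like the excerpt below. It is a guess at such a file, not a copy of the real one: the variable names are the ones I believe the package uses, and limiting the tasks to "/" is an assumption.

```shell
# /etc/default/btrfsmaintenance (excerpt)
# Periods accept: none, daily, weekly, monthly.
BTRFS_SCRUB_PERIOD="monthly"
BTRFS_SCRUB_MOUNTPOINTS="/"      # assumption: only the root file system
BTRFS_BALANCE_PERIOD="monthly"
BTRFS_BALANCE_MOUNTPOINTS="/"    # assumption: only the root file system
BTRFS_DEFRAG_PERIOD="none"       # defragmentation disabled
BTRFS_TRIM_PERIOD="none"         # trimming disabled
```

After restarting btrfsmaintenance-refresh.service, systemctl list-timers should show which btrfs timers are active.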

Fedora

  • Fedora (as of version 40) supports BTRFS at the installer level, but the interface for disk partitioning could be improved; it wasn’t crystal clear to me what I was doing. (sidenote: Terminology issues, maybe translation - I used the Polish installer.)
  • I found it easiest to choose the “semi-automatic” approach. (sidenote: Sorry, I don’t remember the exact name - not automatic, not gparted-like partitioning - the middle one.) I let Fedora use the whole disk with the “BTRFS scheme” (sidenote: I had to manually remove all existing partitions before I started the installer. IIRC this wasn’t the case in Fedora 39.) and then I added the new mountpoints. Fedora automatically creates subvolumes for them (at the root level) and assigns their names.
  • Using one of the immutable variants of Fedora may remove my need for snapshots. I’ll evaluate them when I decide to move away from Debian on my main machine (probably when I have to change hardware).
    • Fedora Atomic projects are immutable but, unlike NixOS, not reproducible, and reproducibility is more important to me. On the other hand, Fedora Atomic doesn’t invent a language with horrible syntax.
    • rpm-ostree - the one component to bind them all.