btrfs for home usage

Posted on 2021-May-16 in storage

Some of my currently unused 2.5-inch drives


Over the years I have accumulated so many hard drives at home that it is getting ridiculous. The last time I went through a decommissioning frenzy I must have dumped at least twenty of them into the tech recycle bin. Sizes ranged from several hundred megabytes to several hundred gigabytes. The drives were too small to be of any use, too power-hungry, noisy, or slow (looking at you, old 4GB Bigfoot), or just plain defective (why did I even keep them?), with that dreadful tick-tick-tick music you never want to hear from a live machine.

Before throwing disks away I take care to erase their contents while they can still be reached over a USB adapter. No need for complicated methods; a simple zeroing does the job:

dd if=/dev/zero of=/dev/sdX bs=4M status=progress

I will sometimes recycle the powerful magnets inside to stick pictures to my fridge, but unscrewing those little boxes can take some time.
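A quick sanity check after the wipe is to read the start of the disk back and make sure it really is all zeros (sdX is a placeholder, as above):

```shell
# Compare the first 16 MiB of the drive against /dev/zero;
# GNU cmp stays silent and exits 0 when the ranges are identical.
cmp -n 16M /dev/zero /dev/sdX && echo "start of disk is zeroed"
```

It is only a spot check of the beginning of the disk, but if dd had failed early or targeted the wrong device, this catches it.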

Anyway: I was still left with a couple of perfectly functional terabyte-sized drives and decided to have some fun with filesystems, because why not? I got a 4-bay USB3 enclosure for cheap on Amazon, installed the drives, and connected it to a Linux box. ZFS seemed like an interesting thing to learn, but that box runs Debian: I would have needed to jump through hoops to install ZFS support and didn't feel that adventurous. I opted instead for the lesser-known btrfs, which has apparently been in the Linux kernel for over a decade now.

https://en.wikipedia.org/wiki/Btrfs

Judging from Wikipedia, btrfs comes with most of the ZFS goodness, and sometimes more: snapshots, live checksumming, subvolumes. A perfect candidate for experimenting with a simple array built from just two (identical) terabyte-sized drives. Long story short: after a couple of months or so this thing ate all the data I copied to it. I had only copied test files, so nothing was lost, and I learned quite a few things in the process.
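For reference, setting up a mirrored two-drive btrfs array is a one-liner; this is roughly what I ran (device names and mount point are placeholders):

```shell
# Create a btrfs filesystem across both drives, mirroring
# data (-d) and metadata (-m) with the raid1 profile.
mkfs.btrfs -d raid1 -m raid1 /dev/sdX /dev/sdY

# Mounting either member device brings up the whole array.
mount /dev/sdX /mnt/array
```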

In more detail:

Whenever a btrfs filesystem is mounted you immediately get a dozen or so processes running full time and eating away at your CPU. I suppose they do some housekeeping, like computing and validating checksums in the background. Nothing wrong with that, but I do object to a filesystem imposing a constant CPU load. It would have been great to control when housekeeping takes place, but either that option does not exist or I overlooked it.
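For what it's worth, at least the checksum-verification part can be triggered on demand with a scrub, which I could have scheduled for quiet hours (the mount point is a placeholder):

```shell
# Start a checksum scrub of the whole filesystem;
# it runs in the background by default.
btrfs scrub start /mnt/array

# Check progress and any checksum errors found so far.
btrfs scrub status /mnt/array

# Stop it if the machine is needed for something else.
btrfs scrub cancel /mnt/array
```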

At some point the disks started failing, and I only noticed because the filesystem had been remounted read-only. Sure, 'dmesg' gave me a long list of access and read errors all over the place, but as a user I got no particular warning until it became impossible to write a new file to that directory. A better interface would have been a cron job that checks for errors and sends me an email when failures are detected. I managed to get that working for weekly SMART reports, but could not find (or overlooked) an equivalent in the btrfs docs.
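Something along these lines might have done the job; a sketch, assuming a working local 'mail' setup, with the mount point and address as placeholders:

```shell
#!/bin/sh
# Weekly cron job: btrfs keeps per-device error counters, and
# 'btrfs device stats --check' exits non-zero when any counter
# is above zero.
if ! btrfs device stats --check /mnt/array > /tmp/btrfs-stats.txt 2>&1
then
    mail -s "btrfs errors detected on $(hostname)" admin@example.com \
        < /tmp/btrfs-stats.txt
fi
```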

Once the drives had failed I could still read about half of the files; the other half was gone. The best I could do was 'ls' in a directory and get a screenful of question marks, indicating that the filesystem was corrupted. I tried copying some of the readable files, but a quick MD5 comparison against the originals showed that they were corrupted too. Tough! Turning the machine off and on again didn't help: the filesystem would go read-only again within a couple of minutes.
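For the record, there is a last-resort tool I didn't try at the time: 'btrfs restore' pulls whatever files it can off a broken filesystem without writing to it (device and destination paths are placeholders):

```shell
# Preview what could be salvaged, without writing anything.
btrfs restore -D /dev/sdX /mnt/rescue

# Copy everything recoverable to a healthy destination;
# the source device is only ever read from.
btrfs restore /dev/sdX /mnt/rescue
```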

Ok, so those drives are old, maybe 7 or 8 years, but their SMART reports show absolutely no errors. I ran daily short and weekly long offline tests all along, and all lights always came up green. I have now reformatted them to ext4 and they are happily humming in the background as I type this, checking for bad blocks. We'll see whether a different filesystem produces more errors.
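The SMART routine itself is nothing fancy, just smartmontools driven from cron (the device name is a placeholder):

```shell
# Daily short self-test: takes a couple of minutes per drive.
smartctl -t short /dev/sdX

# Weekly long self-test: can take hours on a terabyte drive.
smartctl -t long /dev/sdX

# Read back health status, attributes, and self-test results.
smartctl -a /dev/sdX
```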

Now to be fair: this is just a data point, and not a very scientific one at that. I could certainly have spent more time in the documentation to find out how to get regular reports and what to do when corruption is detected, or even run some post-mortem analysis tools to understand what went wrong. This is certainly something I would have done in a professional context, just not at home when all I am trying to do is get a feel for a fun filesystem.

Let's be clear: btrfs sounds like a lot of fun. The notion of subvolumes within a volume is great: think of them as partitions that share the same pool of space, with no fixed sizes. This can be useful when you only want to back up a subset of the drive. btrfs also has send and receive (like ZFS), making it easy to take backups of live systems without stopping anything. You can also detach a drive from a group, or add a new drive, while the filesystem is in use. I haven't tried any of that, but I could imagine using it on production servers.
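The subvolume and send/receive workflow looks something like this; a sketch with placeholder paths, assuming a second btrfs filesystem mounted as the backup target:

```shell
# Create a subvolume for the data worth backing up.
btrfs subvolume create /mnt/array/photos

# Take a read-only snapshot of it; send requires read-only.
btrfs subvolume snapshot -r /mnt/array/photos /mnt/array/photos-snap

# Stream the snapshot to the backup filesystem, while the
# original subvolume stays fully usable.
btrfs send /mnt/array/photos-snap | btrfs receive /mnt/backup
```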

The main reason I wanted a checksumming filesystem was to get warnings before bitrot starts eating through my archives. Unfortunately I got nothing of the sort, and I still don't know whether the corruption was due to me messing up the btrfs settings (I didn't change much) or to defective drives. Time will tell, but the point stands: a btrfs filesystem can go wild without any warning.

My conclusion: this filesystem still seems too rough for your average casual user. Like all professional tools, it requires some investment if you want to understand what is going on under the hood and how best to use it to serve your needs. To be fair, I would expect to get bitten just as badly by a ZFS install on the same hardware. Life is short; I guess ext4 will have to do the job for now.