Book review: Practical File System Design with the Be File System

I’ve been dragging the PDF version of Dominic Giampaolo’s book around for some time but never bothered to read it until recently, when I went fishing for PDFs in my archive (~/Downloads) to load onto my new toy, a reMarkable 2 tablet. It has been a while since I read a book on the technical details of computers, so a book on a file system from the past ticked all the right boxes and was a good test of how the tablet fares as a reading device.

I’ve not read many books specifically on file systems. The only one that comes to mind is the Solaris 10 ZFS Essentials book from Prentice Hall, which is more of an administrator’s guide than a dive into the implementation and the thinking behind it. Across twelve chapters, the Practical File System Design book starts by introducing how the BFS project came about and works up from the concept of what a file system is, establishing terminology and building up a picture from blocks on a disk to mounting a file system, reading and writing a file, features which enhance a file system, and the hurdles of developing and testing one. The book dedicates a chapter to other file systems in use at the time: FFS (described as the grandfather of modern file systems), XFS (the burly nephew), HFS (the odd-ball cousin), NTFS (the blue-suited distant relative), and ext2 (the fast and unsafe grandchild).

Memory usage was also a big concern. We did not have the luxury of assuming large amounts of memory for buffers because the primary development system for BFS was a BeBox with 8 MB of memory.

Dominic Giampaolo

The initial target for the file system project was six months, to fit with the operating system’s release cycle, but it took nine months to reach the first beta release, and the final version shipped a month after that. The book was written around sixteen months after the initial development of the file system.

After the first three months of development it became necessary to enable others to use the BFS, so BFS graduated to become a full-time member of kernel space. At this stage, although it was not feature complete (by far!), BFS had enough functionality for use as a traditional-style file system. As expected, the file system went from a level of apparent stability in my own testing to a devastating number of bugs the minute other people were allowed to use it. With immediate feedback from the testers, the file system often saw three or four fixes per day. After several weeks of continual refinements and close work with the testing group, the file system reached a milestone: it was now possible for other engineers to use it to work on their own part of the operating system without immediate fear of corruption.

The book was written at a time when HFS+ was a recent revision, the block size of most modern hard disks was 512 bytes, a disk greater than 5 GB was considered very large, and companies like AltaVista were trying to bring search to the desktop (with Yahoo! following many years later). The search part (attributes, indexing, and queries) is, as the book states, “the essence of why BFS is interesting”. Dominic Giampaolo would later join Apple and bring desktop search to OS X in the form of Spotlight.

A file system designer must make many choices when implementing a file system.
Not all features are appropriate or even necessary for all systems.
System constraints may dictate some choices, while available time and resources may dictate others.

I really liked the writing style of the book. It is very self-contained in that it explains everything it introduces clearly, covering minute details which would cause problems, options for solving a particular problem, and the routes taken. For example, the data structures chapter covers the impact of disk block size on the virtual memory subsystem and the avenues it would close when they come to unify the buffer cache and the VM system, as well as accommodating the user’s expectations instead of using elegant data-structure search algorithms (read The Inmates Are Running the Asylum by Alan Cooper).

The short amount of time to complete the project and the lack of engineering resources meant that there was little time to explore different designs and to experiment with completely untested ideas.

The journaling and disk block cache chapters were my favourite to read. The journaling chapter made me realise how little I understood about journaling. I had assumed that, just because the term journaling was used, the feature behaved the same across different implementations (metadata journaling and the meaning of consistency vs storage leaks). Regarding caching, I still struggle with the concept of write-back vs write-through in the abstract, so I’m always interested to read more about the subject.
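To make the write-back vs write-through distinction a little more concrete, here is a minimal Python sketch. It is not taken from the book and is not how BFS implements its cache: the `BlockCache` class, the dict standing in for a disk, and the `disk_writes` counter are all assumptions made purely for illustration.

```python
class BlockCache:
    """Toy block cache in front of a dict that stands in for the disk."""

    def __init__(self, disk, write_back):
        self.disk = disk              # block number -> bytes (the pretend disk)
        self.write_back = write_back  # True: defer writes; False: write through
        self.cache = {}               # block number -> bytes held in memory
        self.dirty = set()            # blocks changed in memory but not on disk
        self.disk_writes = 0          # how many writes actually reached the disk

    def _write_to_disk(self, block, data):
        self.disk[block] = data
        self.disk_writes += 1

    def read(self, block):
        # Fill the cache from disk on a miss, then serve from memory.
        if block not in self.cache:
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def write(self, block, data):
        self.cache[block] = data
        if self.write_back:
            # Write-back: only remember that the block is dirty.
            self.dirty.add(block)
        else:
            # Write-through: every write goes to disk immediately.
            self._write_to_disk(block, data)

    def flush(self):
        # Push all dirty blocks to disk (a no-op for write-through).
        for block in sorted(self.dirty):
            self._write_to_disk(block, self.cache[block])
        self.dirty.clear()


def demo(write_back):
    """Overwrite one block three times; report disk state before/after flush."""
    disk = {0: b"old"}
    cache = BlockCache(disk, write_back)
    for i in range(3):
        cache.write(0, b"v%d" % i)
    before_flush = (disk[0], cache.disk_writes)
    cache.flush()
    after_flush = (disk[0], cache.disk_writes)
    return before_flush, after_flush
```

With write-through, the three overwrites cost three disk writes and the disk is always current. With write-back, the disk still holds the old data until `flush()` runs, at which point a single write carries the final contents. That is the trade: fewer disk writes in exchange for a window in which a crash loses data that only existed in memory.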

The chapter on the vnode layer explained how the file system hooks into the kernel. It describes what it means, in terms of process, to mount a file system, working from an i-node up to a vnode and back down again: how the kernel interacts with the file system via the vnode layer using the functions the file system provides, and how live queries are supported. The following chapter then covers the API the operating system offers for manipulating files.

A vnode layer connects the user-level abstraction of a file descriptor with specific file system implementations. In general, a vnode layer allows many different file systems to hook into the file system name space and appear as one seamless unit.
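As a rough illustration of the idea in that quote, here is a heavily simplified Python sketch of a layer that lets several file systems appear as one name space. It is an assumption-laden toy, not the BeOS vnode interface: `VnodeLayer`, `MemoryFS`, `UpperCaseFS` and their methods are hypothetical names, and a real vnode layer deals in file descriptors, vnodes and a much larger table of operations.

```python
class MemoryFS:
    """Hypothetical file system: file contents live in a dict."""

    def __init__(self, files):
        self.files = files  # path inside this file system -> contents

    def read(self, path):
        return self.files[path]


class UpperCaseFS(MemoryFS):
    """A second hypothetical file system with different read behaviour."""

    def read(self, path):
        return super().read(path).upper()


class VnodeLayer:
    """Routes each path to whichever mounted file system owns it."""

    def __init__(self):
        self.mounts = {}  # mount point -> file system object

    def mount(self, mount_point, fs):
        # Normalise "/mnt/x/" to "/mnt/x" but keep the root as "/".
        self.mounts[mount_point.rstrip("/") or "/"] = fs

    def _resolve(self, path):
        # The longest matching mount point wins, so "/mnt/loud" beats "/".
        best = None
        for mount_point in self.mounts:
            prefix = mount_point if mount_point == "/" else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise FileNotFoundError(path)
        inner = path if best == "/" else path[len(best):]
        return self.mounts[best], inner or "/"

    def read(self, path):
        # The caller never sees which implementation served the request.
        fs, inner = self._resolve(path)
        return fs.read(inner)
```

Mounting a `MemoryFS` at `/` and an `UpperCaseFS` at `/mnt/loud` means a caller can read `/etc/motd` and `/mnt/loud/readme` through the same `read` call without knowing that two different implementations sit behind it.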

The API chapter was amusing to read because of the human aspect of the problem: trying to come to an agreement on approach, here fought out between those in favour of Macintosh-style file handling and those in favour of POSIX style.

The BeOS C++ API for manipulating files and performing I/O suffered a traumatic birthing process. Many forces drove the design back and forth between the extremes of POSIX-dom and Macintosh-like file handling. The API changed many times, the class hierarchy mutated just as many times, and with only two weeks to go before shipping, the API went through one more spasmodic change. This tumultuous process resulted from trying to appeal to too many different desires. In the end it seemed that no one was particularly pleased. Although the API is functional and not overly burdensome to use, each of the people involved in the design would have done it slightly differently, and some parts of the API still seem quirky at times. The difficulties that arose were never in the implementation but rather in the design: how to structure the classes and what features to provide in each.

The book wraps up with a chapter on testing and various approaches to shaking out bugs. One suggestion for stressing the file system, in early 1998, was to support a full USENET feed, resulting in at least 2 GB of data per day being written to disk. When collecting more PDFs after reading the journaling chapter, I found a paper from USENIX in 2000 which states “anecdotal evidence suggests that a full news feed today is 15-20 GB per day”.
ISC’s InterNet News (INN) and netnews were useful tools for testing the robustness of a file system.

Of these tests, the most stressful by far is handling an Internet news feed. The volume of traffic of a full Internet news feed is on the order of 2 GB per day spread over several hundred thousand messages (in early 1998). The INN software package stores each message in a separate file and uses the file system hierarchy to manage the news hierarchy. In addition to the large number of files, the news system also uses several large databases stored in files that contain overview and history information about all the active articles in the news system. The amount of activity, the sizes of the files, and the sheer number of files involved make running INN perhaps the most brutal test any file system can endure. Running the INN software and accepting a full news feed is a significant task. Unfortunately the INN software does not yet run on BeOS, and so this test was not possible (hence the reason for creating the synthetic news test program). A file system able to support the real INN software and to do so without corrupting the disk is a truly mature file system.
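The quote mentions a synthetic news test program without showing it. The Python sketch below is not that program; it is only a guess at the general shape of such a test, based on the description of INN storing each message in its own file under a directory tree that mirrors the newsgroup hierarchy. The function names, the use of random bytes as article bodies, and the numbering scheme are all assumptions.

```python
import os


def store_article(spool, newsgroup, number, body):
    """Write one article to its own file, e.g. comp.sys.be -> comp/sys/be/1."""
    directory = os.path.join(spool, *newsgroup.split("."))
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, str(number))
    with open(path, "wb") as f:
        f.write(body)
    return path


def synthetic_feed(spool, newsgroups, articles_per_group, size):
    """Create many small files across the hierarchy; return how many."""
    count = 0
    for group in newsgroups:
        for number in range(1, articles_per_group + 1):
            # Random bytes stand in for message text.
            store_article(spool, group, number, os.urandom(size))
            count += 1
    return count


def count_files(spool):
    """Walk the spool and count the files actually present on disk."""
    return sum(len(files) for _, _, files in os.walk(spool))
```

Pointing `synthetic_feed` at a scratch directory on the file system under test, with a few hundred thousand articles, approximates the many-small-files part of the workload. It does not reproduce the large overview and history databases the quote also describes.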

The book was a great read, and provides lots of historical context and grounding of concepts for an autodidact (just don’t come away thinking a 5 GB HDD is a large disk). From a nostalgia perspective it was interesting because of the desktop search efforts happening around that time and, more recently, the Systems We Love talk on the search capabilities of BFS.

At the time I never had the full BeOS experience since I didn’t have a system with enough RAM. I could boot BeOS but the system degraded to neither sound nor colour!
I recall a disappointing experience trying to boot the demo copy of BeOS v4.5? Personal Edition from a PC Plus cover disk.
It would’ve been nice to use the colour display capabilities of my CRT at the very least. 🙂

I have amassed more PDFs and am currently reading a paper from 1996 on Scalability in the XFS File System, which closes with:

Adam Sweeney, Doug Doucette, Wei Hu, Curtis Anderson, Michael Nishimoto, and Geoff Peck are all members of the Server Technology group at Silicon Graphics. Adam went to Stanford, Doug to NYU and Berkeley, Wei to MIT, Curtis to Cal Poly, Michael to Berkeley and Stanford, and Geoff to Harvard and Berkeley. None of them holds a Ph.D. All together they have worked at somewhere around 27 companies, on projects including secure operating systems, distributed operating systems, fault tolerant systems, and plain old Unix systems. None of them intends to make a career out of building file systems, but they all enjoyed building one.

Scalability in the XFS File System

There’s an article on BFS at Ars Technica if you want to read more about the file system. The article features an interview with BFS developers at Be and Haiku, and a comment by Jean-Louis Gassée.

As an aside, the reMarkable 2 is physically a really nice device to hold and the display is great, but the ability to extract my highlighted items from a PDF could be a lot better. I could export a copy of this book as a PDF, but there’s no way to get a view of just the highlighted items and it’s not possible to copy text from a PDF on the device, which meant I had to manually scan the exported PDF and save the highlights in my notes.