HFS “Incorrect number of thread records” error

Note: this post doesn’t provide a solution to address the “Incorrect number of thread records” file system issue but documents what I went through to see if I could when I faced it. I think your best bet would probably be to try ALSOFT’s Disk Warrior if you need to fix this issue.

I was sitting in front of a Mac Pro which had previously suffered from boot issues due to the expansion boards holding the memory not making contact with the logic board properly. The boot issue manifested itself as the system not POSTing and the fans spinning loudly. Luckily the issue was resolved by reseating things. The system also had a failing Cinema Display which would work fine after a “cold” power cycle having removed the power connector from PSU and reattached, but would fail to wake from sleep if it had been left on. The user was not aware of this issue and assumed whenever the system wouldn’t display anything on boot, it was the POST issue and would power cycle the system, hoping to strike it lucky. Through doing this, the HFS+ journaled file system ended up with problems where fsck_hfs(8) was unable to fix the issue.

Since the filesystem had problems, the system would try to boot, the grey screen with the Apple logo would appear along with a progress bar which would move from left to right and then after some time the system would switch off.

Booting the system in verbose mode by holding command + v showed that the system was running fsck on boot which is what the progress bar was an indicator for normally. It would attempt to repair the file system 3 times, fail, and switch off.

Connecting to another Mac via TDM, it was possible to mount the filesystem as read only and take a closer look. It would have been possible to also boot the system in rescue mode but the Mac I connected to via TDM was running a newer version of macOS and I was hoping that the tooling would be better able to deal with such an issue though fsck_hfs(8) bugs section states “fsck_hfs is not able to fix some inconsistencies that it detects“, regardless.

% sudo fsck_hfs -ypd /dev/disk2s2 
/dev/rdisk2s2: starting
journal_replay(/dev/disk2s2) returned 0
	Using cacheBlockSize=32K cacheTotalBlock=32768 cacheSize=1048576K.
   Executing fsck_hfs (version hfs-522.100.5).
** Checking Journaled HFS Plus volume.
   The volume name is Macintosh HD
** Checking extents overflow file.
** Checking catalog file.
   Incorrect number of thread records
(4, 21962)
	CheckCatalogBTree: fileCount = 421327, fileThread = 421326
** Checking multi-linked files.
   Orphaned open unlinked file temp2639983
   Orphaned open unlinked file temp2651454
   Whole bunch of these temp files trimmed from listing
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
   Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000
                  CBTStat = 0x0800 CatStat = 0x00000000
** Repairing volume.
 	FixOrphanedFiles: nodeName for id=2671681 do not match
	FixOrphanedFiles: Created thread record for id=2671681 (err=0)
	FixOrphanedFiles: nodeName for id=2671681 do not match
	FixOrphanedFiles: Created thread record for id=2671681 (err=0)
	FixOrphanedFiles: nodeName for id=2671681 do not match
	FixOrphanedFiles: Created thread record for id=2671681 (err=0)
	FixOrphanedFiles: nodeName for id=2671681 do not match
	FixOrphanedFiles: Created thread record for id=2671681 (err=0)
** Rechecking volume.

Repeat again a second time

** Checking Journaled HFS Plus volume.
   The volume name is Macintosh HD
** Checking extents overflow file.
** Checking catalog file.
   Incorrect number of thread records
(4, 21962)
	CheckCatalogBTree: fileCount = 421327, fileThread = 421326
** Checking multi-linked files.
   Orphaned open unlinked file temp2639983
   Orphaned open unlinked file temp2651454
   Whole bunch of these temp files trimmed from listing
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
   Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000
                  CBTStat = 0x0800 CatStat = 0x00000000
** The volume Macintosh HD could not be repaired after 3 attempts.
	volume type is pure HFS+ 
	primary MDB is at block 0 0x00 
	alternate MDB is at block 0 0x00 
	primary VHB is at block 2 0x02 
	alternate VHB is at block 975093950 0x3a1ec0be 
	sector size = 512 0x200 
	VolumeObject flags = 0x07 
	total sectors for volume = 975093952 0x3a1ec0c0 
	total sectors for embedded volume = 0 0x00 
	CheckHFS returned 8, fsmodified = 1

It was possible to find out the offending files by looking up the inodes listed in the fsck_hfs output. Note the id reported in “FixOrphanedFiles: nodeName for id=2671681 do not match” and use find(1) to look it up

% find /tmp/p -inum 2671681

Ironically, the files causing the issue are auditd(8) logs from crash reports?

I thought perhaps turning off journaling would help sidestep the issue by causing fsck to remove the offending files, rather than trying to make use of journal data to replay. hfs.util(8) which is tucked away in /System/Library/Filesystems/hfs.fs/Contents/Resources let’s you do that.

% sudo /System/Library/Filesystems/hfs.fs/Contents/Resources/hfs.util -N /dev/disk2s2 
Turned off the journaling bit for /dev/disk2s2

It didn’t help.

hfs.util‘s supposedly supports a -M (Force Mount) but I was unable to get this to work. I was hoping to force mount the file system read/write & delete the 2 files.

I ended up wiping the disk and reinstalling macOS.

As an aside, the history section of the hfs.util(8) claims it was “Derived from the Openstep Workspace Manager file system utility programs“. The sources for hfs v106 package on Apple’s site shed some more light. Oldest entry in “change history” section of hfsutil_main.c states

13-Jan-1998 jwc 		first cut (derived from old NextStep macfs.util code and cdrom.util code).

Note the description of what main() does in a comment block inside hfsutil_main.c.

Purpose -
This our main entry point to this utility.  We get called by the WorkSpace.  See ParseArgs for detail info on input arguments.
Input -
argc - the number of arguments in argv.
argv - array of arguments.
Output -
returns FSUR_IO_SUCCESS if OK else one of the other FSUR_xyz errors in loadable_fs.h.

There are icons for HFS formatted disks from Rhapsody in the directory: hfs_RHD.fs.tiff, hfs_RHD.openfs.tiff. These live on in hfs v556.60.1 which is the most recent version available on the site as I write this.

LFS, round #5

Up to this point I’ve been working with a chroot to build OS images from loop back mounted flat file which is then converted to the vmdk format for testing with virtualbox. I created packages for bpftrace and BCC, BCC was fairly trivial and the availability of a single archive which includes submodules as bcc-src-with-submodule.tar.gz helped avoiding the need to package libbpf. bpftrace doesn’t offer such an archive and tries to clone the googletest repo which I sidestepped addressing just to obtain the package. Both packages worked ok though I only tested the Python side of BCC and not LuaJit.

Execsnoop via BCC
Execsnoop from BCC

With that I wanted to see if what I had would boot on actual hardware, so dd’d the flat file to a usb flash drive and booted it on a Dell Optiplex. Things worked as far as making it to grub but then hit a couple of glitches. First issue was that because of the delay probing the USB bus the kernel needs to be passed the rootwait keyword that I was missing so it would just panic as no root file system could be found otherwise. After that I hit the issue that I’d nailed things to a specific device node (sda) and with the other disks in the system the flash drive was now another device node (sdb). Addressing that got me to the login prompt and I was able to repartition the SSD installed in the system with cfdisk, make a new file system, copy the contents of flash drive to SSD, install grub and reboot to boot the system off the new Linux install.

As the grub-mkconfig had included references to the GUID of the file system on the flash drive the system landed in the GRUB rescue mode. Since it wasn’t able to load the config nothing is loaded, most importantly, its prefix variable is set incorrectly. This results in a strange behaviour where nothing which would normally work in the GRUB prompt works. Setting the prefix variable to the correct path allows you to load the “normal module” and switch from rescue mode to normal mode.

grub rescue> set prefix=(hd0,msdos1)/boot/grub
grub rescue> insmod normal
grub rescue> normal

Once in normal mode it was possible to boot the system by loading the ext2 module and pointing the linux command to the path of the kernel to boot. Re-running grub-mkconfig once the system was up generated a working config.

With a faster build machine, the next step is to produce a fresh image, address these nits, and start putting things together to share.

Execsnoop via bpftrace
Execsnoop from bpftrace

Forcing an Xcode command line tools reinstall in order to update

Lost some time debugging a build issue which I was unable to reproduce. Turns out I was on an older version of clang despite both of us running the same version of macOS Catalina. Though you install the command line tools using xcode-select --install, there’s no way to force a reinstall with the tool as rerunning the command will tell you xcode-select: error: command line tools are already installed, use "Software Update" to install updates.
So updates are managed via the Software Update section in System Preferences and macOS reckons I’m up to date.

You can remove /Library/Developer/CommandLineTools and rerun xcode-select --install at which point you’ll obtain the latest version of command line tools. As a bonus while the install is in progress, macOS will serve a notice that an update is available and pop up the Software Update section in System Preferences.

When the initial install process invoked by running xcode-select complete, the update offered via Software Update disappears and it goes back to reporting everything is up to date.

While these two events were happening I wondered why the initial download was clocking up hours to download a 451MB file so I fired up tcpdump to see if there was any traffic coming through, turns out actually my machine was very busy downloading from an IP address of my ISP via plain HTTP. I initially wrote this post thinking that command line tools are not updated across major OS version upgrades, but I’m now wondering if the cache at the ISP is stale which is why I do not have the update. I also was not served the iOS 14.3 update notice until sometime last week though it was relased over a month ago!

LFS, round #4

Haven’t made any progress for a couple of weeks but things came together and instrumenting libc works as expected. One example demonstrated in section 12.2.2, chapter 12 of the BPF Performance book is attempting to instrument bash compiled without frame pointers where you only see a call to the read function.

Instrumenting bash which has been built with -fomit-frame-pointer.

Compiling with -fno-omit-frame-pointer produces a stack trace, within bash but the first frame is unresolved if libc isn’t also built with it, resulting in a hexadecimal address rather than showing a call to _start.

stack trace from bash on system with bash and libc built with -fno-omit-frame-pointer.

I’m going to look at integrating ZFS support next but I’m thinking of sharing a build as-is for the benefit of anyone wanting to work through the BPF Performance book.

LFS, round #3

With the OS image that I wrote about in the previous post I was able to build a new distro with the various substitution I’d made. There were 3 things that I wanted to mention in this post. First, turns out Linux has a bc(1) build dependency which I found when I omitted building it and came to compile the kernel. Second, you really need to run make defconfig before visiting the kernel menuconfig, otherwise you end up with a “default kernel config” lacking any on disk file system support (memory and network file systems are still supported). This was the reason why the kernel was unable to find init in the previous post.

My workflow is that I have flat file which I mount via loop back and I use that as the file system to build the OS in a chroot. When it comes to installing a boot loader using grub-install, if sysfs is not mounted to a location from within the chroot, grub-install will fail, complaining Unknown device "/dev/loop10p1": No such device, p1 being a child node for the first partition on the file backed by /dev/loop10 which is odd as grub-install was pointed at /dev/loop10, the parent node. This is because it is trying to enumerate the start of partitions so it can work out the start sector via sysfs (see grub-core/osdep/linux/hostdisk.c, starting at sysfs_partition_path() . Elsewhere it might have been achieved via ioctls. Either way it is asking the kernel but the dependency for a pseudo filesystem to be mounted in place threw me but looking around /sys/devices/virtual/block/loop10 I see files which expose various characteristics, including offset and start points for each partition.

My focus is now on building the dependencies for bpftrace which means getting BCC built and installed first.

LFS, round #2, 3rd try

In my previous post I ended with the binutils test suite not being happy after steering off the guide and making some changes to which components were installed. I decided to start again but cut back on the changes and see just how much I could omit from installing to get to the point of completing chapter 8. Ideally, I would like to shed everything that’s only a build dependency. It was doable but the sticking point was Python which is needed by Meson / ninja in order to build systemd and though you build Python earlier in chapter 7, at that stage it is built without the ctypes module as it requires libffi and the ctypes module is needed by Meson.

I thought I’d cheat by using pkgsrc to satisfy the build dependencies but the infrastructure for detecting termcap support is unable to detect support via ncursesw which LFS uses. Opting to prefer satisfying all dependencies from pkgsrc which is now the default setting for pkgsrc on Linux created a new problem that the components you compile manually outside of pkgsrc which call pkg-config would link to the pkgsrc versions of dependencies. To side step this issue I moved the pkg-config binary from pkgsrc out of the way where upon I hit an issue with linking systemd. After a night’s sleep I found that ninja in pkgsrc is patched to not adjust the rpath in binaries and this is need for systemd’s binaries because the libraries they depend on are tucked away in a subdirectory.

Upon completing chapter 8, I went back and started afresh once more, this time with the intent to make changes and substitutions once again. I installed bash as /bin/bash but did not create a link to /bin/sh and was surprised to find most things were happy with that, the autoconfed infrastructure could cope, until I reached binutils in chapter 8 where it called /bin/sh explicitly in tests. At this point I installed mksh and pointed /bin/sh to it. This revealed various failures from scripts and tests on other packages which were built after binutils in chapter 8, most significantly by GCC’s build infrastructure. Setting the CONIFG_SHELL variable to /bin/bash when envoking configure ensured that bash was called instead of sh during the build and when invoking the test stage, as the SHELL variable inherits this setting down the line and things move on smoothly. I need to look at getting binutils handle the override as well, rather than hardcoding /bin/sh.

All build dependencies were installed in a separate prefix so that they could be removed after the build. m4, make, bison, Perl, Python, gawk, pkg-config (built with pc location set to /usr/lib/pkgconfig), autoconf, automake, libffi, check, expect, flex, TCL were installed in this location.

Python’s build infrastructure assumes system provides libffi and if the system doesn’t, it struggles with linking. There’s a bug report to teach the build to make use of the information from pkg-config for libffi but the proposed patch in my case did not work as the location under the new prefix where libraries are installed were not in the search path for the dynamic linker. Since I was installing Python in the same prefix as libffi was already installed in, I adjusted the rpath by setting LDFLAGS.

Besides the bash to mksh swap for /bin/sh, I replaced gawk with nawk once more, there was no fallout though I did also install gawk under the new prefix as glibc requires it. tar was swapped with bsdtar from libarchive, Man-DB for mandoc.

I skipped on texinfo as it has a Perl runtime dependency and I don’t want to include Perl in the base OS. Groff was out as it had a texinfo dependency. I omitted libelf as I thought it was only used by tc from IPRoute2 which is for setting QoS policies, turns out it’s a build dependency for the kernel so that went back in.

With everything in place, I managed to build a kernel which for some reason couldn’t go multiuser because it couldn’t find init or sh! Comparing the config with my image from round #1 showed lots of differences which was baffling as I thought I’d only made the changes which the LFS guide suggests. Starting with a new config resolved the issue. I can only suspect that I must’ve pressed space bar by accident when navigating the kernel config menus which switched a bunch of stuff off. 🙁

I now have an OS image which appears to work. For the next round to put it to the test, I am going to try and use it to build a new distro. I will need to address the binutils build infrastructure issue so that I can point it to bash, otherwise I suspect I will run into the issues again when running the test suite (see previous post). I would also like to try and swap binutils out for elftoolchain. I have also been thinking about subsequent OS upgrades and using mtree for that.

LFS, round #1

Following on from the previous blog post, I started on the path of build a Linux From Scratch distribution. The project offers two paths, one using traditional Sys V init and systemd for the other. I opted for systemd route and followed the guide, it was all very straight forward. Essentially you fetch a bunch of source archives off the internet, run tar, configure, make, make test, make install a bunch of times with some system setup in between, before compiling your kernel, putting it into place in /boot and getting grub installed and configured.

The book assumes that you have a system running Linux already and you have spare space which you use for a new partition to install your linux distro that you built. I opted to create a 10GB file which I mounted via loopback instead, with the intention of using it as the boot disk of a virtual machine.

The guide has 11 chapters which takes you through building software on the host, as a new user (for a clean environment), then in a chroot with three iterations of GCC and binutils builds. With each software component that you’ll build, the guide instructs you on how to run their test suite, what failures should be expected and why, before performing an install.

For each component that you build, the guide documents why a patch is applied and what the configure options specified mean. At the end of each section all the installed components are documented, followed by a short description of each item. Unfortunately the dependencies are not documented but sort of implied by the order which things appear in the guide.

Chapter 8 is the most laborious with a hefty 71 packages to install/reinstall. End result is a fully fledged environment with Autotools, Perl, Python, TCL, GCC, Vim, various libraries, compression tools. If you follow the guide every step of the way, it should work a-ok in the end, providing the test suites passed as expected at each stage.

After I finished all 11 step, I had to convert my flat file which I created with dd(1) to a format which VirtualBox would recognise. I wasn’t sure if any of the supported formats was a flat file with a new file extension and it was quicker to convert it to a vmdk file than work through the list to see. First try it made it to the GRUB menu which was nice.

Grub menu

Followed by a panic as I guessed the wrong device node to specify as root in my grub.cfg.

Panic on first boot

A re-edit of the config to specify the device node hinted in the kernel panic and I made it to the login prompt.

First successful boot

At this point I began thinking how much trouble I could get into by substituting or omitting components and started a fresh new build.

Inverse vandalism: the making of things because you can

Alan Kay, “The Early History of Smalltalk,” ACM SIGPLAN Notices Volume 28, March 1, 1993

LFS round #2 started life with nawk instead of gawk, no bash installed but mksh is /bin/sh, BSD make instead of GNU make. No Perl, Python, TCL, gettext, bunzip, xz. BSD make got swapped for GNU make on the first step as it wasn’t happy about its sys.mk that was installed but will be revisited. I made it back to chapter 8 quickly (look ma, less dependencies) and things began to fall apart with rebuilding Glibc, turns out it really wants bison, Python, gawk not nawk. Glibc also really wants bash as well but its configure test is happy with mksh and it passes. It became apparent it wanted bash when running the test suite as some tests call /bin/bash specifically and stop when it is not found. At this point my environment began behaving strangely so I exited the chroot and I couldn’t get back in. Running strace on chroot showed that calls to execve() were returning ENOENT. Rebuilding glibc from the host environment allowed me to get back into the chroot once again, at which point I installed bash. For glibc’s Python dependency, I decided to treat it as part of the bootstrap kit as it seems to be a build dependency. Python got built without shared components (--disable-shared) and installed it in a separate prefix with the plan to remove it after the system is built. From glibc I jumped to building binutils in my chroot and again things came tumbling down during the test suite run. It was not happy about finding libgcc_s, despite the system being aware of it in its ld cache but I haven’t had a chance to investigate any further. I feel very much lost in the bazaar but I’m having fun. 🙂

How to open source: going from NetBSD to Linux

TL;DR: some BSD user tries something other and wonder why things are different.

This post has sat in draft form for quite some time. At first it was written with highlighting the NetBSD project in mind and I started thinking about revisiting it recently due to frustration with running a mainstream Linux distribution when investigating

  • how some critical libraries I was running were built
  • what, if any changes were made to them
  • wondering why the source repositories for components were buried away if at all available.

The recent article on LWN titled Toward a “modern” Emacs mentioned the frustration with distributions provided sufficient confirmation bias to get this together and posted. Note: this is not intended as a bragging contest about NetBSD or pkgsrc or a put down for Linux, but perhaps things I’m not grasping and expecting one to be like the other.

Each technology community has a set of norms around how they interact with their technology, with regard to obtaining software for example mobile users obtain theirs from an “app store”. Mac OS/Windows users traditionally would install packaged software but now mostly obtain their software from a store again. It would be odd to be given a source archive and asked to compile the software for yourself as a user on these platforms (if the source code was even available to users). Unix was the opposite, it was common to receive software in source form and have to compile it yourself. By association and nature (Open Source software) so do GNU/Linux distributions, however binary packages are provided and encouraged for use. The packages save a great deal of compilation time and lower the barrier for users which again is a good thing. I get the impression the details regarding source code and changes do not get the same spotlight especially in a security context, for example as I edit this post, among the most recent advisories on the Debian security page is an advisory for ModSecurity, fairly short, lists the CVEs and states “We recommend that you upgrade your modsecurity packages.”. If I’m interested in the actual changes to the package it’s buried five pages away from the advisory. The GUI update manager on my distro goes as far as collapsing the description panel for the updates which I find amusing.

Software updater in Ubuntu

I agree hiding technical detail from a user is a valid case. Actually, while trying to take this screenshot I visited the bug report of the GCC update and with a bit of clicking around, I found a link to a diff of changes. Why can’t the advisories document both paths (build your own or obtain the packages) and allow the user to choose.
I was hoping for something a bit more flexible which would allow me to use what’s in place and also allow me to rebuild the system or parts with ease should I wish/need to.
Relying on a distribution as a means of obtaining gratis binaries to use, at best, isn’t very appealing.
Use of Open Source software in such a way while completely acceptable overlooks the opportunity to mould software to your requirements should you be inclined.
Given a piece of software, to consume provided binaries, avoiding any customisation is akin to bending around an implementation and is actually heading in the opposite direction of what Open Source software is able to allow you to do.
Let me clarify, I’m not saying just because a piece of software is Open Source it must be compiled by every user by them self for maximum benefit (a talk I gave in 2019 was torpedoed by the objection that one should build their own version of Chrome or Firefox 🙂 ).
I’m suggesting that if you are relying on tools of an Open Source nature, you are best off owning your stack.
That is, you take active participation in projects, for you are able to help shape the evolution of your tools through participating and get insight into upcoming changes.
This makes upgrades and maintenance smoother because you are not reliant on a 3rd party and their release cycle for updates, potentially resulting in long gaps between upgrades which could also mean big jumps between major versions when you do upgrade, bringing about many changes since the previous version you were running.
You become familiarised with the process to assemble your tools which helps when you are reasoning about your stack during debugging.
Questions like “are there local changes from your distribution?” are off the table. e.g Linux From Scratch
Bad tools harbour bad habits.
The shortcomings of a bad tool are pushed on to the user/operator who is then forced to tolerate them and work around accordingly, leading to a clumsy workflow. See Poka-yoke
With a system composed of many such components, it becomes harder and harder to think about new ideas or existing problems in a new way because of the mental burden of coping with what is currently in place and adapting, leading to paralysis and surviving in maintenance mode where the system remains static and is kept running because no one dares make a change.

The enjoyment of one’s tools is an essential ingredient of successful work.

Vol. II, Seminumerical Algorithms, Section 4.2.2 part A, final paragraph

Enter NetBSD and pkgsrc which is where I was coming from as a user.
NetBSD is an open source operating system with a focus on portability.
It has been around since the early 1990s and is the oldest Open Source operating system which is still actively developed as well as one of the oldest active source code repositories on the internet today. The lineage of the code base is easily traceable back to the early days of UNIX thanks to the CSRG archive repository. This is not so important as a first port of call for a new user or for day to day operation but provides useful insight during debugging/troubleshooting.
Having the source code alone is not as useful as having access to the source repo and the history of the code base with commit messages (not that all commit messages are useful).
As with the other BSDs, the current source repository plays a prominent role on the front page of the website and very easy to find.
pkgsrc is NetBSD’s sibling packaging system project with a similar focus on portability.
pkgsrc provides a framework to package tens of thousands of pieces of software consistently across many different operating systems.
In combination of the two there is a complete stack to compose a system with, from operating system to a suite of 3rd party software (including Chrome and Mozilla based browsers, FYI! 🙂 ) or to take selected components and extend other systems with.
As an example, a feature of NetBSD is a tool called Rump Kernel. Rump allows you to instantiate an instance of the NetBSD kernel in the user space of another operating system instance. A common use of this in NetBSD is for testing, it is possible to perform tests on vital components of a system, safely, and on failure the result at worse is a failed system process, rather than a system crash. This saves valuable time between iterations when debugging, especially on larger systems where boot processes run into minutes (think about a server with a large number of disks attached, easily ~ 10 minutes or more to POST and probe a full shelf of disks before even getting to booting the operating system). Rump can also be used to supplement functionality on operating systems, saving development time of device drivers or subsystems. An example of this is the use of Rump in the GNU/Hurd operating system to provide a sound system and drivers for sound cards.
pkgsrc with its support for a range of operating systems means that it is possible to unify your workflow across a range of systems with relation to deploying software. This makes it possible to run the same variety of software with identical changes regardless of operating system. pkgsrc also provides the flexibility to select where dependencies are satisfied from, where possible. That is, if the host operating system provides a component as standard, pkgsrc could make use of it rather than building yet another copy of it, or as time goes on, with legacy systems it may be preferred not to use any such components provided by the host operating system but to only make use of components from pkgsrc, this is also possible. Like pkgsrc, NetBSD has its own build framework which makes it easy to build a release or cross build from another operating system which has a compiler installed. It feels very much like NetBSD comes to you and you work on it from your environment of choice rather that you having to change your environment to it in order to work on it, and the tools you become comfortable with you get to take with you to other platforms. You end up with a toolbox with for solving problems.
The GNU eco system itself is a vast toolbox to pick from also but I’m missing the integration and struggling with the fragmentation and the differences in project management if any. Source code up on a project hosting site alone is no good, neither is just a project site without access to the source code repository, you need both to engage with a project, to be able to track changes and to participate in the community. One doesn’t replace the other.

How I ended up here is that I installed Ubuntu because it provided ZFS support out of the box and I didn’t need to worry about things like pinning kernel versions to prevent kernel updates from rendering my machine un-bootable until I built new modules somehow and I thought it would be the easiest way to work through the BPF performance book. My experience with Linux has been with traditional distros, started on Slackware, then onto RedHat 5.x, Suse 6.x, Debian (Woody) and now Ubuntu 20.04. Tried Gentoo once about 15 years ago but never got past building an unbootable system from their live cd env I recall. I have not tried more recent distributions like Arch, void and such. I’m currently playing with Linux From Scratch.

May the source be with you

Marshall Kirk McKusick

Trying to operate macOS in single user mode

Wednesday lunch time, I opened up my laptop and in the middle of writing an email my machine froze and after a few seconds rebooted. Uh oh, the system sat at the grey screen for a few seconds and then the dreaded folder with a question mark began flashing which means there was no bootable disk found.

Turned the machine off, turned it back on again, ah, a message that the machine had crashed and restarting once more it booted as usual, making it back into macOS before it did the same freeze and reset again. I ended up spending the rest of the day trying various things to get my data off the disk before the SSD stopped responding all together Wednesday night.

I thought to rule out file system issues first.
Booting single user mode and running fsck_apfs(8) didn’t get very far when I first tried. SIGINFO reported that bash was waiting meaning that it never got to executing fsck_apfs.
Restarting and trying to boot in recovery mode to run the file system check using Disk Utility didn’t work out too well. Upon reaching the GUI, recovery mode began to spin, if things worked ok I would have been greeted with a file vault encrypted disk to unlock but it didn’t and the spinning spiral would go around endlessly, so it was back to single user mode.

In single user mode all data is accessible but the file systems are mounted read only with the exception of /private/var/vm which is writable. At the end of booting into single user mode, the system reports:
To mount the root device as read-write:
$ /sbin/mount -X /

But the mount command on Catalina 10.15.7 has no such option, the old advised method of using -uw instead of -X still worked however.

While I was experimenting I noticed that I had spent a considerable amount of time in single user mode and the system never hard reset like when I booted normally.
I intended to copy the data to another machine, however ifconfig reported no interfaces.
I mistakenly thought that I could load the relevant kernel extensions and could slowly bring things up bit by bit that way. Except can’t do any of that because SIP prevents you.
localhost:/ root# kextload /System/Library/Extensions/some.kext
/System/Library/Extensions/some.kext has invalid signature: Trust code is disabled.
Untrusted kexts are not allowed
Kext rejected due to invalid signature: <OSKext 0xSOMEHEX [0xSOMEHEX]> { URL = "file:///System/Library/Extensions/some.kext", ID = "com.apple.foo.bar" }
/System/Library/Extensions/some.kext failed security checks; failing.

I tried several times again to get back into recovery mode environment in order to disable SIP using csrutil disable from the terminal there, out of sheer luck the disk behaved long enough one time that I managed to get the disk unlocked and make it in. I disabled SIP and while I was there I checked the disk with Disk Utility. Things started off ok and while it was spending some time checking the Data volume it hard reset. This definitely wasn’t a file system issue and an indicator that the hardware is misbehaving, which meant it probably wouldn’t be long before I lost access to the data on there.

Back in single user mode I confirmed SIP was disabled
localhost:/ root# csrutil status
System Integrity Protection status: disabled.

After working my way through loading what I thought were relevant extensions I gave up and started looking up how to bring up the system, I was trying to get either an Apple USB Ethernet adapter, Thunderbolt Gigabit adapter or the builtin Airport card to work.

To get the baseline system going you need to start kextd(8), notifyd(8), configd(8). Once diskarbitrationd(8) is loaded, it pulls the relevant dependencies to get networking running.

launchctl load /System/Library/LaunchDaemons/com.apple.kextd.plist
launchctl load /System/Library/LaunchDaemons/com.apple.notifyd.plist
launchctl load /System/Library/LaunchDaemons/com.apple.configd.plist
launchctl load /System/Library/LaunchDaemons/com.apple.diskarbitrationd.plist

To configure your wireless card use airport(8) which can be found at /System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport

macOS airport(8) manual

I opted for the Thunderbolt Gigabit adapter as that was my fastest option, the interface would autoconf with DHCP/RS, and could be configured with just ifconfig(8) but I couldn’t get an NFS share mounted which I now suspect was because I did not specify the use of a reserved port when mounting the share on the Mac.
localhost:/ root# mount_nfs -o resvport my-nfs-server:/share /net
As I was racing against time I ended up cobbling together a USB disk which was HFS+ formatted and used rsync to clone my home directory. Since the system was in single user mode, new disks would not be auto mounted (that’s what diskarbitrationd does normally) and issuing diskutil list would not work. Without diskarbitrationd loaded it complains
Unable to run because unable to use the DiskManagement framework.
Common reasons include, but are not limited to, the DiskArbitration framework being unavailable due to being booted in single-user mode.

and with diskarbitrationd loaded it complains
Could not start up a DiskManagement session
You can instead use fstyp(8) by pointing it at device nodes to find out the file system type on the other side of the device node.
Before connecting a disk, run ls /dev/disk* to see what’s there already, attach the disk, repeat ls /dev/disk* to see which new nodes have been created. Point fstyp at those device nodes to find the correct one with the filesystem, in this case it was HFS.
localhost:/ root# fstyp /dev/disk2s2

localhost:/ root# mount_hfs /dev/disk2s2 /net
I then began to rsync my data to the external disk with rsync -av /Users/sme /net/ and after a while the disk I/O stopped and the kernel reported
IOAHCIBlockStorageDrive: could not recover SATA HDD after 5 attempts. terminating
completionRead: 1598: Failed read request b88146000, 4096: e00002c0
disk1s1: no such device.

Well, there’s the hardware misbehaving.

apfs_vnop_read: 7261: ### obj-id longnumber/anotherlongnumber retval 6 filesize 16388 offset 0 resid 16388 ###
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/rsync/rsync54.120.1/rsync/rsync/rsync.c(244) [sender=2.6.9]
rsync: writefd_unbuffered failed to write 185 bytes [generator]: Broken pipe (32)
disk1s5: media not present.
nx_buf_bread:592: buf_biowait() failed, error = 6, b_error = 6, buf_flags_after_io = 0x101, crypto = [encrypted composite]
_vnode_dev_read:811: *** got err 6 reading blknum 54480 (num read errs: 1)
localhost:/ root# apfs_vfsop_sync:3357: /dev/disk1: failed to finish all transactions to sync() - Device not configured(6

At this point there was nothing else possible to do but power cycle.
Over several iterations I managed to get most of my home directory copied across to the external disk with rsync before the SSD stopped responding all together.

Book review: BPF Performance Tools: Linux System and Application Observability

It’s more than 11 years since the shouting in the data centre video landed and I still manage to surprise folks in 2020 who have never seen it with what is possible.
The idea that such transparency is a reality in some circles comes as a shock.

Without the facility to be able to dynamically instrument a system the operator is severely limited of insight into what is happening on a system using conventional tools, solely. Having to resort to debugging tools to gain insight is a non option usually for several reasons
1) disruptive (may need for application to be re-invoked via tooling).
2) considerable performance impact.
3) unable to provide a holistic view (may provides insight into one component leaving it operator to correlate information from other sources).
If you do have the luxury, the problem is how do you instrument the system?
The mechanism offers the ability to ask questions about the system, but can you formulate the right question?? This book hopefully helps with that.

Observation of an application, you need both resource analysis and application-level analysis. With BPF tracing, this allows you to study the flow from the application and its code and context, through libraries and syscalls, kernel services, and device drivers. Imagine taking the various ways disk I/O was instrumented and adding query string as another dimension for breakdowns.

The BPF performance tools book centres around bpftrace but covers BCC as well. bpftrace gives a DTrace like tool for one liners and writing scripts similarly to D, so if you are comfortable with DTrace, syntax should be familiar though it is slightly different.
BCC provides a more powerful and complex interface for writing scripts which leverage other languages to compose a desired tool. I believe the majority of the BCC tools use Python though Luajit is supported too.
Either way, in the background everything end up as LLVM IR and goes through libLLVM to compile to BPF.

The first part of the book covers the technology, starting with introducing eBPF and moving down to cover the history, interfaces, how things work, and the tooling which compliment eBPF such as PMCs, flamegraphs, perf_events and more.
A quick introduction to performance analysis followed by a BCC and bpftrace introduction rounds off the first part of the book in preparation for applying them to different parts of a system, broken down by chapter, starting with CPU.

The methodology is clear cut. Use the traditional tools commonly available to gauge the state of the system and then use bpftrace or BCC to hone in on the problem, iterating through the layers of the system to find the root cause. As opposed to trying to solve thing purely with eBPF.

I did not read the third and fourth sections of the book which covered additional topics and appendixes but I suspect I will be returning to read the “tips, tricks and common problems” chapter.
From the first sixteen chapters which I read, the CPU chapter really helped me understand the way CPU usage is measured on Linux. I enjoyed the chapter dedicated to languages, especially the Bash Shell section.
Given a binary (in this case bash):
how you go about extracting information from it, whether it has been compiled with or without frame pointers preserved.
How you could expand the shell to add USDT probes.
I did not finish the Java section, too painful to read about what’s needed to be done due to the nature of Java being a C++ code base and the JIT runtime (the book states it is a complex target to trace) and couldn’t contain myself to read the containers *yawn* chapter.
All the scripts covered in the book have their history covered in the footnotes of the page which was nice to see (I like history)

I created the first execsnoop using DTrace on 24-Mar-2004, to solve a common performance problem I was seeing with short-lived processes in Solaris environments. My prior analysis technique was to enable process accounting or BSM auditing and pick the exec events out of the logs, but both of these came with caveats: Process accounting truncated the process name and arguments to only eight characters. By comparison, my execsnoop tool could be run on a system immediately, without needing special audit modes, and could show much more of the command string. execsnoop is installed by default on OS X, and some Solaris and BSD versions. I also developed the BCC version on 7-Feb-2016, and the bpftrace version on 15-Nov-2017, and for that I added the join() built-in to bpftrace.

and a heads up is given on the impact of running the script is likely to have, because some will have a noticeable impact.

The performance overhead of offcputime(8) can be significant, exceeding 5%, depending on the rate of context switches. This is at least manageable: it could be run for short periods in production as needed. Prior to BPF, performing off-CPU analysis involved dumping all stacks to user-space for post processing, and the overhead was usually prohibitive for production use.

I followed the book with a copy of Ubuntu 20.04 installed on my ThinkPad x230 and it mostly went smoothly, the only annoying thing was that user space stack traces were usually broken due to things such as libc not being built with frame pointers preserved (-fno-omit-frame-pointer).
Section 13.2.9 discusses the issue with libc and libpthread rebuild requirement as well as pointing to the Debian bug tracking the issue.
I’m comfortable compiling and installing software but didn’t want to go down the rabbit hole of trying to rebuild my OS as I worked through the book just yet, the thought of maintaining such a system alongside binary updates from vendor seemed like a hassle in this space. My next step is to address that so I have working stack traces. 🙂

Besides that, I enjoyed reading the book especially the background/history parts and look forward to Systems Performance: Enterprise and the Cloud, 2nd Edition, which is out in a couple of months.

Heads up for RSS subscribers

I’m going to be experimenting with the migrating from WordPress to Hugo this week, if you subscribe to the RSS feeds on this site and wish to continue to do so, you might want to check everything is ok at your end after Monday the 25th. One of the key factors of migrating to Hugo is to preserve URLs for existing posts so hopefully there should be minimal fallout.

Full name of root account in BSD

NetBSD now has a users(7) and groups(7) manual. Looking into what entries existed in the passwdand groupfiles I wondered about root’s full name who we now know as Charlie Root in the BSDs.

Root was called Ernie Co-vax in 3BSD. An Ernie Kovacs also shows up in adduser(8). The 4.4BSD System Manager’s Manual mentions “at Berkeley we have machines named Ernie (Co-VAX), Kim (No-VAX)”.

Charlie Root in 4.2BSD.

actually 4.1c, but the entries differ:

root::0:10:Charlie,458E,7750:/:/bin/csh in 4.1c
root::0:10:Charlie &:/:/bin/csh in 4.2

The ampersand is replace by a capitalised login name, hence Charlie Root

The Man in 2.10BSD.

Something blogged (on pkgsrcCon 2019)

pkgsrcCon 2019 was held in Cambridge this year. The routine as usual was social on the Friday evening, day of talks on the Saturday, hacking on things on the Sunday.
Armed with a bag of bits to record the talks, I headed up to Cambridge on Friday evening to meet up with sborrill before heading to the Castle Inn. Not much to report from Friday night, drinks were drunk, food was ate, PowerPC hardware was passed on.

Saturday morning started off with a brief intro by sborrill and pr1w1, followed by the first talk of the day by nia about audio(9) in NetBSD and her work to improve support in 3rd party software. [slides]

bsiegert was next, speaking about spellcheckers and how without careful consideration when developing an API, one can prevent downstream consumers from moving forward with a project. [slides]

This was preceeded by agc who gave the first of two talks on the work and methodoligies the team working on the OpenConnect appliances at Netflix use – life consuming FreeBSD-CURRENT and their development model (regular releases from the bleeding edge). The release they produce caters for several generation of hardware which is continuosly evolving, ranging from appliances based on conventional hard disk drives to all flash SSD appliances, and now, nvme based appliances.

We breaked for lunch at this point and went for a wonder to find the canteen.
First talk after lunch was by Natasa Milic-Frayling on Software preservation and digital continuity and the challenges with keeping legacy software running. Having successfully virtualised Windows based systems running all the way back to Windows 95, there were other challenges such as archiving distributed systems where proprietory software to be archived ran on one workstation, but data was sourced from other systems such as via ODBC or Active Directory (LDAP), increasing the scope of systems required to be archived in order for an application to function. As we move forward in time, legacy hardware becomes more fragile and harder to source parts for, unmaintained software which is no longer supported becomes a growing security risk, so, system get decommissioned but the data that lived on such systems may still be useful for reference.

After Natasa’s talk it was my turn to speak, I gave an update on the state pkgsrc support on OS X Tiger, followed by various books I’d read over the last 8 months and what I’d been working on in $dayjob, linking things back to Cambridge one way or another, turns out I had been reading about folks in Cambridge doing incredible work, it just happened that they were in Cambridge, Massachusetts along with the Mac Mini I was working form 🙂 [slides]

wiz followed on after me and spoke about the various groups in The NetBSD Foundation and the role they play in the day to day running of the project. From the board to the pkg-bug-handler teams and many others. [slides]

schmonz was our remote speaker, he gave a status update on the qmail distribution in pkgsrc and a new project called notqmail which provides a home as the upstream of this initiative. [demo and further info]

khorben revisited the topic of package signing with the work that happened in the past such as in EdgeBSD, improvements in NetBSD which could ease it such as developments in ptrace(2), and then opened up to a discussion around what’s required to provide signed packages.

agc wrapped up the day with the final talk, “large scale packaging” about how the packages which are shipped to the OpenConnect appliances are put together, using a cut back version of FreeBSD ports containing just the packages required.

Saturday night I headed back to London as I had a club night to attend before heading back on the Sunday. Somewhat drained, I made it home with the kit I’d carried up, showered, changed, and was back in Farringdon before midnight.

Come 2:30am, there was a powercut and that was the end of the night. I was a bit miffed as I missed the same event the previous year due to a clash of events so I was really looking forward to this.

1/08/19 – The recording of the set is now available

I got some rest and made it back to Cambridge for lunch on Sunday. We then headed back to the Centre for Computing History for the last part of the con. I spent some time looking at the machines on display, played Sonic the Hedgehog and Street Fighter 2 both of which I hadn’t done in a long while and then helped debug an issue with booting NetBSD on QEMU as a guest when QEMU is invoked with --cpu=host. The host system in this case was flashed with libreboot which lacks microcode for the CPU and NetBSD crashes on probing the CPU. Explicitly specifying a CPU as a workaround in this case allowed the system to boot e.g --cpu=coreduo.

IBM ThinkPad X61s without microcode update

cpu0 at mainbus0 apid 0
cpu0: Genuine Intel(R) CPU 1400 @ 1.66GHz, id 0x6e8
cpu1 at mainbus0 apid 1
cpu1: Genuine Intel(R) CPU 1400 @ 1.66GHz, id 0x6e8

IBM ThinkPad X61s with microcode update

cpu0 at mainbus0 apid 0
cpu0: Genuine Intel(R) CPU L2400 @ 1.66GHz, id 0x6e8
cpu1 at mainbus0 apid 1
cpu1: Genuine Intel(R) CPU L2400 @ 1.66GHz, id 0x6e8

Post con, I spent a couple of days gallivanting around London with wiedi, weather was really hot but we managed to get a lot of milage down, from Brick Lane to Notting Hill Gate via the South Bank and Maida Vale, followed by a visit to Wembley for the London Hack Space. As always, it was a fun few days.

pkgsrcCon 2019 group photo

A week of pkgsrc #13

With the lead up to the release of pkgsrc-2019Q2 I picked up the ball with the testing on OS X Tiger again. It takes about a month for a G4 Mac Mini to attempt a bulk build of the entire pkgsrc tree with compilers usually taking up most days without success. To reduce the turnaround time, I switched to attempting a small subset of packages for a quicker turnaround using meta-pkgs/bulk-large. After a couple of days of compiling I received a report in my mailbox showing breakages in key packages such as OpenSSL and libxml2.

security/openssl issue was a regression upstream which was resolved by bringing the packages up to date.

textproc/libxml2 breakage was due to -Wno-array-bounds being passed to the compiler and the ancient version of GCC in Tiger not supporting it, resulting in a hard cc1: error: unrecognized command line option "-Wno-array-bounds" error. The use of BUILDLINK_TRANSFORM.Darwin here allowed the option to be removed dynamically, on the fly, confined to only being applied during builds on Darwin.

security/libgpg-error needed the definition of __DARWIN_UNIX03 to make use of unsetenv(3) which returns an integer, unsetenv is called inside an if statement and is unable to test the result as Tiger still used the old implementation by default, which returns void. This results in the error sysutils.c:178: error: void value not ignored as it ought to be when building. The fix for this came from macports.

There was many breakages due to the lack of support of strnlen(3) in Tiger, Apple introduced support for this function in Lion, as a workaround, pkgtools/libnbcompat now includes an implementation which will be used in place for packages which specify strnlen as a requirement using USE_FEATURES and the OS is marked as missing the features using _OPSYS_MISSING_FEATURES.

databases/sqlite3 has issues with readline included with Tiger, as a workaround locally, I switched to using gnu readline.

As find(1) in Tiger lacks support for the + parameter for -exec, the ability to install Python egg modules is currently broken, I worked around this locally to progress with the bulk build. Next step possibly is to make pkgsrc aware of find as a tool and substitute legacy versions with a version from sysutils/coreutils perhaps.

devel/re2c was broken due to the ageing version of GNU Bison being called, which again lacked support for a feature.

  GEN      src/ast/parser.cc
/usr/bin/bison: option `--defines' doesn't allow an argument
Usage: /usr/bin/bison [-dhklntvyV] [-b file-prefix] [-o outfile] [-p name-prefix]
       [--debug] [--defines] [--fixed-output-files] [--no-lines]
       [--verbose] [--version] [--help] [--yacc]
       [--no-parser] [--token-table]
       [--file-prefix=prefix] [--name-prefix=prefix]
       [--output=outfile] grammar-file

Making use of the version in pkgsrc instead resolved the issue by specifying bison in USE_TOOLS.

With the advancement of language development and new standards being defined, pkgsrc grew support for specifying which versions of C & C++ language standards a package may require e.g USE_LANGUAGES=c++03. This in turn passes the relevant standard to the compiler using --std= option. If the compiler being used doesn’t support the standard specified all tests in GNU configure to determine the compiler and language support will fail, resulting in a cryptic configure: error: C++ preprocessor "/lib/cpp" fails sanity check (cpp is in /usr/bin on Tiger). As a local workaround I have commented out the block in pkgsrc/mk/compiler.mk so that the standard is not set. Not sure whether a knob to ignore setting the stand standard is worth it or to move forward by enforcing the use of a new (external) toolchain.

With these change, the bulk build result went from 673 packages out of 1878 to 1067 out of 1878. The resulting packages and bootstrap kit are now up on sevan.mit.edu.

Next step is to address support for find in the pkgsrc tools infrastructure, remove setting 64bit mode on G5 macs as a bootstrap mode as Tiger really doesn’t support it.

Thanks to viva PowerPC for the plug.

A glimpse of sevan.mit.edu as it runs currently:

EuroBSDcon 2018

Last year for EuroBSDcon 2017 in Paris, I caught a horrendous cold on the first day and spent the week in a fragile state, this year I was well prepared and caught the terrible cold in advance. Things kicked off on Monday when I had to go and see a man about a goat at the train station as I was the chosen mule for this leg of the conference. No curried goat for dinner that night.

This was to be my first time in Bucharest and also the first time flying from London Luton Airport. I packed the goat and cough medicine and headed for the train station again on Wednesday morning. By lunch time I had met up with sborrill from the NetBSD release engineering team, we discussed the changes happening in NetBSD-HEAD and the tooling built on top of NetBSD at Precedence before setting off.

Thursday was about the FreeBSD devsummit for me or at least it was meant to be. For the flexibility of being able to use any computer without having to spread keys everywhere and the safety of travelling lightly without important keys, I use a Yubikey which contains a key that provides access to a few non-critical systems. At the devsummit, when it came time for our first break, I unloaded my key from the system and ejected the Yubikey, I don’t know what happened next in my head but when I came back I tried to load my key again from the Yubikey and entered an incorrect pin consecutively until the device was locked. This was a total disaster as I was not carrying the admin pin to unlock the Yubikey and I wasn’t sure what I had done at the time. I spent most of the devsummit trying to understand why I was unable to load my key and how to tell if the device was actually locked or not, documentation is pretty sparse and somewhat poor. It doesn’t help that there are multiple components required to use and manage a security token, all of which are independently developed with separate sets of documentation.

My favourite piece of documentation was from GnuPG.

PIN retry counter

This field saves how many tries still are left to enter the right PIN. They are decremented whenever a wrong PIN is entered. They are reset whenever a correct AdminPIN is entered. The first and second PIN are for the standard PIN. gpg makes sure that the two numbers are synchronized. The second PIN is only required due to peculiarities of the ISO-7816 standard; gpg tries to keep this PIN in sync with the first PIN. The third PIN represents the retry counter for the AdminPIN.

Hopefully, it might be edited at some point in the future.

I eventually gave up, conceding that I am locked out until I get home.

On the Friday, it was time for the NetBSD devsummit. We first covered hypervisors and support of different types of virtualisation in Xen, joerg gave a status update on his ongoing repository conversion work and what avenues it could potentially enable as well as a core and TNF board status update which segued into more technical details such as the DRM update. It was a fun day as we had the opportunity to ask questions and cover the answer in detail, something which is not possible in the conference talk setting. The discussion continued over dinner and late into the night.

For the first day of the conference, I spent the day in room 2. I heard the first of a series of talks on sanitisers, David Carlier and kamil co-presented on the state of sanitiser support. Sanitisers featured heavily at the conference this year which shows the importance of such tooling. Kristaps Dzonsons gave a talk about trying to utilise an open source stack for use with diving, from photo and video management to GPS and the rough edges with some options. Andrew von Dollen gave a talk about utilising the NPF Lua binding to provide a simple high-level interface to the firewall and in the spirit of the Scriptable Operating Systems with Lua paper, using the interface to explore different filtering scenarios with ease. NPF scripting with Lua was previously presented at EuroBSDcon 2014.

I was the last speaker of the day in room 2, I had hoped to present off NetBSD/macppc 8.0 on my 12″ G4 iBook but was unable to get it working with the projector due to genfb(4) not recognising it as connected. Instead I presented using maya‘s Dell XPS 15″ using the latest DRM update in NetBSD-HEAD and everything just worked. In hindsight I should have booted a kernel from HEAD to get radeonfb(4) and try again, but at the time I was actually thinking about recompiling my kernel! 🙂

The day wrapped up with the second keynote by Ron Broersma, he came equipped with a lot of historical memorabilia, It was cool to see a first edition copy of Computer Lib (you can order a copy from Ted Nelson here).

Ron spoke about the evolution of ARPANET & The Internet with an emphasis on the use of Berkeley UNIX. It was amusing to hear how  UNIX tape images had been provisioned out to sites from government agencies, Bob Morris also mentioned the subject in An Oral History of UNIX and getting the initial image to the agencies.

The actually, pieces of the government, peddled the idea of using UNIX to
national security agencies. I kind of laughed at the people there. Because, are
they aware of the fact that the UNIX that they are now running actually got to
NSA in the trunk of my car.

As with Ted Nelson, Ron mentioned priesthoods which I believe is still prevalent in tech communities especially networking. Me and khorben discussed Patterns in Network Architecture: A Return to Fundamentals by John Day which provides insights about how things came to be and an alternative approach to address technical issues in modern day networking.

My favourite historical tidbit from the keynote was that in the pre DNS era when a hosts file was circulated, unofficial revisions were a thing and Berkeley hosts also appeared as Berserkeley on some.  🙂

Day two of the conference began with a heavy dose of sanitisers, kamil this time speaking about finding kernel bugs through the use of sanitisers followed by Yang Zheng on integrating libfuzzer with the NetBSD userland for Google Summer of Code and some of the bug he’d found in NetBSD as a result.

After lunch, khorben spoke about how he got into operating system development, his DeforaOS project and the approach in simplicity it takes with reference to John Day’s Pattern in Network Architecture book.

With a quick change over, maya spoke of the various bugs she ran into on NetBSD and her approach to debug them, covering everything from bugs on DEC Alpha to MIPS to drivers.

For the last stretch of the conference, I headed down to Being a BSD user by Roller Angel. It was a talk about growing within a community and the personal challenges one goes through as they develop. Roller also provided support for users in a technical role and covered the tooling he used to help users learn, such as screen-casting. Afterwards, I headed back up to room 2 for agc‘s talk on source code tracking and the experience with various tools used over the years at Netflix. Things starting life as a bunch of scripts which were added to subversion, migrated to mercurial and now stored in git. Of the dingbats in the talk, my favourite was Jose which the intro for the section on OCA.  When the conference finished, we headed out for dinner. Having heard Scott Long’s talk about Netflix at NYCBSDCon 2014, I asked agc about his experiences in the early days of the appliance project, over dinner we heard about suffering disk firmware issues and building a strategy to re-flash appliances, extending tooling like camcontrol(8) and improving build performance.

Snapped in London, when I returned on Monday:

maya & kamil also wrote up about their experiences here and here.

Slides for my talk “What’s in store for NetBSD 9.0” are available here.

A blast from the past – Amit Singh

I spent most of today tied up with working on getting an upspin instance operational again. Instead of running things as a standalone daemon, I proxy through a web server so that the upspinserver can run as a unprivileged user, on a unprivileged port and everything should be good and happy. Except it’s not, on the client side is macOS with osxfuse and the upspin space expose to the OS via upspinfs, browsing works great, manipulation through finder, not so much. You can copy a file in to place and when the transfer is finished, the file disappears from finder however the upspin tool is able to show things correctly as they should appear. I spent some time looking in to the osxfuse source repositories and installer package as there is a separate repo for the kext which I wondered where it fits in, in a world with SIP.

While going through the osxfuse source, Amit Singh’s name popped up in the licenses. I was a big fan of his blog kernelthread.com when it was active and would look forward to new posts. I fondly remember trying out the test app he wrote to demonstrate the sudden motion sensor working on Mac OS X with my 17″ PowerBook G4 and the posts to complement the Mac OS X Internals book.

Turns out he gave a talk at Google back in the day on MacFuse which is the predecessor to osxfuse.

It’s a nice talk which provides some history on how the project came about and why he decided to work on it, there were moments which made me smile especially when discussing conditioning and deeply held (mis?)beliefs.

I also found that Apple still runs their mailman instance with a small number of list being served from it, most importantly Darwin related ones. 🙂

Back to more wrestling tomorrow, gnite!

Something blogged (on pkgsrcCon 2018)

pkgsrcCon 2018 badge

For this years pkgsrcCon, the baton was passed on to Pierre Pronchery & Thomas Merkel, location Berlin. It wasn’t clear whether I would be able to attend this year until the very last minute, booking plane tickets and accommodation a couple of days before. The day before I flew out was really hectic and I did not get any sleep. I left home around 1:30am and made my way to St Pancras for the train to the  airport as It was an early morning flight. Once we’d boarded the flight, I passed out without any recollection and came to just before we were going to descend.  With less than two hours sleep and plenty of time to spare until the evening social I slowly made my way from the airport to the city centre. I roamed the city centre looking at the street art and jumping on and off metro stations between Jannowitzbruke and Westkreuz.

In the evening I headed to the social event and met up with others.  I heard about the state of The Unleashed Operating System, the latest buzzword soup to add to current projects for instant success (block chain, AI and something else I forgot the to note down), debug work flow and lots of other things over food and drinks. I had taken my ThinkPad X60s alongside a 12″ iBook G4 to Berlin, with the plan to see if jdolecek with the machine in person could shed any light on a deadlock issue I experience when hammering the CPU with a kernel build while 2 or more CVS operations are in progress and to also get setup with Yubikey on NetBSD so I could commit from any machine with a USB port without having to copy my keys around to different machines.

On the first day of talks, between preparing slides, spz pointed out what I needed to get the Yubikey working with SSH on NetBSD. The current Yubikeys support multiple features (U2F, OTP, CCID). In order to use a Yubikey as a CCID device with pcscd, pcscd expects a ugen(4) device which doesn’t work on NetBSD because the USB keyboard driver ukbd(4) attaches to the Yubikey and prevents other modes of access. The workaround is to roll a new kernel with the device explicitly hard coded to use ugen(4). With that in place, I was able to checkout the developers source repo on the iBook. For the X60s, jdolecek rolled a fresh kernel with with the DIAGNOSTIC and LOCKDEBUG which we booted and reproduced the deadlock, unfortunately, there was no new information provided by these option, possibly indicating that it’s not a locking issue. A new kernel was rolled with an extra printf() which I tested with, however I couldn’t reproduce the issue with this change. Instead the CVS update just got slower and slower while the system operated normally (without deadlock). Unsure if it was the lack of bandwidth or a new behaviour, I gave up after several hours of the cvs update running. To rule out connectivity, my next plan is to test locally using a mirror of the repository using rsync and perform the CVS operations from that source instead.

I gave a talk titled “Something Old, Something New, Something Borrowed”. Something Old was about NetBSD/macppc, a homage to my first pkgsrcCon back in 2014 where I gave a short talk about pkgsrc on Tiger PowerPC with my 12″ PowerBook, Something New was about Upspin, Something Borrowed was about Minix3. Slides.

For coverage of the different talks and leot‘s experience in Berlin, see this blog post.

After a day of presentations we headed over to a restaurant where we chatted some more over dinner. While we waited for our food to arrive, we were given a demo of  the J programming language by Martin, he described J as a pocket calculator on drugs. I’d previously heard of the K programming language and its cryptic syntax but had not seen anything with such a minimal and cryptic syntax in action.

For the following morning, uwe gave a talk about the Forth programming language and experiences with it, especially with OpenFirmware which originates from Sun Microsystems and was also used by Apple on the PowerPC based Macs. The cool thing about OpenFirmware is that hardware can itself contain a driver for OpenFirmware which gets loaded when the device is initialised, making OpenFirmware aware of how to interact with the device without any manual loading of software by the operator.uwe mentioned NetBSD/ofppc which builds on that by having a kernel without drivers for on-board hardware, instead relying on OpenFirmware to communicate with devices instead. He also described how it is possible to learn about how a system works by using the ccommand in OpenFirmware to disassemble modules, though he warned that unfortunately this didn’t work so well on Macs as symbols had been stripped from OpenFirmware environment. There was lots of references to follow up material to read up on, from The Evolution of Forth paper to uwe‘s notes.

After lunch time, we packed up and headed to a computer museum. On the way we spoke about emacs and workflows like using M-x make-frame-on-display.

On the Monday, I met up with sborrillfor breakfast before heading to the spy museum and doing some sightseeing. I headed back to c-base where I met Pierre, Youri & Sebastian. As it approached the evening, It was time to make way to the airport again. I was back home by around 1am, Tuesday morning, so glad I went 🙂

21/07/18 10:20BST – There has been backlash from residents to stop Google from setting up a new campus in Kreuzberg, Berlin.

A week of NetBSD #1

I wanted to resume writing up notes about what I’d been working on as with the “A week of pkgsrc” series, this actually spans the last few weeks. 🙂

Things basically came together around the event of a late 2009 13″ MacBook entering my life. First port of call was to see state of NetBSD support since I last visited it.
I’m still running FreeBSD on my 11″ Air and so was keen to see if there had been any progress in NetBSD with regards to supporting the Intel based Macs. Unfortunately the same issue is still present, the system panics very early in the boot process kern/52229 when using the UEFI image and fails when enumerating CPUs present. I was able to get the system to boot using the conventional (non-UEFI) image by opting to boot without SMP & ACPI enabled. I used the MacBook install wiki article as a rough guide on gpt partitioning and got a daily image installed. The wiki article needed some attention as the syntax for commands did not apply but through this process I discovered a crash in gpt where a missing check in the source means that a null pointer is passed to one of the gpt commands.
e.g gpt show -a

Running the system without ACPI was not a good idea, turns out the thermal management does not get initialised and the system eventually switches off as a failsafe measure. I found this out through leaving the machine bulk building some of my packages, only to return to a machine that’s powered off.

GCC 6 has landed in NetBSD-HEAD and work is in progress to bring things into shape as the next revision of GCC shipped in NetBSD. Unfortunately cross compilation from macOS does not work at the moment (toolchain/53013) which limited my ability to experiment with different NetBSD kernel configuration while I was trying to look into the MacBook issue.

The one frustration with crash was that when the system panicked, the system would drop to DDB (the kernel debugger) but the keyboard would not be functioning. I realised that when running into panics on NetBSD/macppc in the past I’d always have a backtrace to include in my bug report. This was because the macppc kernel was built with the option to execute a backtrace command when entering the debugger and the amd64 one wasn’t. Asking on the tech-kern list to enable this option across the board I learnt of the shortfall of this approach and received suggestion on extending ddb, thus the dumpstack option was born and enabled by default. With this, setting ddb.onpanic sysctl to 2 for backtraces went away as well as setting the DDB_COMMANDONENTER option to run backtrace explicitly in kernel configuration files. The next step now is to extend DDB to show the panic message after the backtrace so that the panic message and the tail end of potentially lengthy trace is visible.

Up until very recently there was only support for PCIe based G5 PowerMacs using the POWERMAC_G5_11_2 kernel configuration in NetBSD, I am lacking such a system but do have a first gen G5 iMac which is PCI-X based. The initial work to bring up NetBSD on the G5 was actually done on a PCI-X based system long ago so I was curious what had diverged since then. Previous GSoC project participants used a repo in the NetBSD-gsoc sourecforge project to share their work, the two G5 related GSoC projects are there in a single repo. Some further work also took place on the port-macppc mailing list in 2013. It was a fun weekend albeit no further progress to seeing even a copyright notice on my part. However I learnt lots about the kernel build process, poked at lowcore.S and the boot process. I also learnt of an emulator called Mambo. Mambo was a full PowerPC system simulator, produced by IBM research. There are kernel configuration files to support Mambo (theoretically) in NetBSD & FreeBSD but unfortunately, I couldn’t find any binaries to try Mambo, along with that I also found the links for the PowerPC 970FX (G5 CPU) documentation all dead and the documentation removed from the IBM site. 🙁
To rule out the kernel working on a G5 but console being an issue, I tried experimenting with different frame buffers and found macofcons(4) broke the build (port-macppc/53004). After discussing with macallan@, macofcons(4) and the OFB_ENABLE_CACHE option have now been removed [1] [2], on the basis that though they may have worked at one point they cause more problems that solve.

macallan@ has been working on extending support for the G5 based systems  and it is now possible to boot NetBSD on both the early PCI-X based systems as well as the PCIe models thanks to his work. I was finally able to netboot NetBSD on my first gen G5 iMac, this should come in very handy for pkgsrc work.

The NetBSD/macppc website has a supported system page which was due a review. After a call for feedback on any discrepancies , the awacs(4) soundcard driver is now enabled and I’m looking for a source to find out which systems shipped with which chipsets. Along the way NetBSD gained modem drivers for some of these systems but the page still states they are not supported.

In NetBSD, there is support for different buffer queue strategies for disk I/O, on the tier 1 ports such as i386 & amd64, the per-priority cyclical scan strategy is enabled by default, to bring macppc on par, it is now also enabled there by default too. Now to document the option so that it’s somewhat like the description of another strategy for read priority. See BUFQ_READPRIO and BUFQ_PRIOCSCAN in options(4).

This weekend I’ve been testing the HEAD-llvm builds on i386 & macppc as well as ATF testing, but I’ll write about that in another time.

Thanks to jmcneil, martin, mrg, pgoyette, uwe for the help and suggestions.