Book review: BPF Performance Tools: Linux System and Application Observability

It’s been more than 11 years since the “Shouting in the Datacenter” video landed, and in 2020 I still manage to surprise folks who have never seen it with what is possible.
The idea that such transparency is a reality in some circles comes as a shock.

Without the ability to dynamically instrument a system, an operator relying solely on conventional tools has severely limited insight into what is happening on it. Resorting to debugging tools to gain insight is usually not an option, for several reasons:
1) they are disruptive (the application may need to be re-invoked under the tool).
2) they have a considerable performance impact.
3) they are unable to provide a holistic view (a debugger may provide insight into one component, leaving the operator to correlate information from other sources).
If you do have the luxury of dynamic instrumentation, the problem becomes: how do you instrument the system?
The mechanism offers the ability to ask questions about the system, but can you formulate the right question? This book hopefully helps with that.

To observe an application, you need both resource analysis and application-level analysis. BPF tracing lets you study the flow from the application, with its code and context, through libraries and syscalls, kernel services, and device drivers. Imagine taking the various ways disk I/O was instrumented and adding the query string as another dimension for breakdowns.

The BPF Performance Tools book centres around bpftrace but covers BCC as well. bpftrace provides a DTrace-like tool for one-liners and for writing scripts in a language similar to D, so if you are comfortable with DTrace the syntax should feel familiar, though it differs slightly.
BCC provides a more powerful and complex interface for writing tools. The BPF program itself is written in C and is driven from a front end written in another language. I believe the majority of the BCC tools use Python, though LuaJIT is supported too.
Either way, everything ends up as LLVM IR behind the scenes and goes through libLLVM to be compiled to BPF.

The first part of the book covers the technology. It starts with an introduction to eBPF and moves on to the history, interfaces, how things work, and the tooling which complements eBPF, such as PMCs, flame graphs, perf_events and more.
A quick introduction to performance analysis, followed by introductions to BCC and bpftrace, rounds off the first part of the book. This prepares you for applying the tools to the different parts of a system, which are broken down by chapter, starting with CPU.

The methodology is clear-cut. Use the traditional tools commonly available to gauge the state of the system, then use bpftrace or BCC to home in on the problem, iterating through the layers of the system to find the root cause. This is preferred over trying to solve things purely with eBPF.

I did not read the third and fourth sections of the book, which cover additional topics and the appendixes, but I suspect I will be returning to read the “Tips, Tricks, and Common Problems” chapter.
Of the first sixteen chapters, which I did read, the CPU chapter really helped me understand the way CPU usage is measured on Linux. I enjoyed the chapter dedicated to languages, especially the Bash shell section. Given a binary (in this case bash), it shows how you go about extracting information from it, whether or not it has been compiled with frame pointers preserved, and how you could extend the shell to add USDT probes.
I did not finish the Java section. It was too painful to read about what needs to be done, given that the JVM is a C++ code base with a JIT runtime (the book states it is a complex target to trace). I also couldn’t bring myself to read the containers *yawn* chapter.
The history of every tool covered in the book is given in the footnotes, which was nice to see (I like history). For example, execsnoop:

I created the first execsnoop using DTrace on 24-Mar-2004, to solve a common performance problem I was seeing with short-lived processes in Solaris environments. My prior analysis technique was to enable process accounting or BSM auditing and pick the exec events out of the logs, but both of these came with caveats: Process accounting truncated the process name and arguments to only eight characters. By comparison, my execsnoop tool could be run on a system immediately, without needing special audit modes, and could show much more of the command string. execsnoop is installed by default on OS X, and some Solaris and BSD versions. I also developed the BCC version on 7-Feb-2016, and the bpftrace version on 15-Nov-2017, and for that I added the join() built-in to bpftrace.

A heads-up is also given on the impact running each tool is likely to have, because some will have a noticeable overhead, for example offcputime(8):

The performance overhead of offcputime(8) can be significant, exceeding 5%, depending on the rate of context switches. This is at least manageable: it could be run for short periods in production as needed. Prior to BPF, performing off-CPU analysis involved dumping all stacks to user-space for post processing, and the overhead was usually prohibitive for production use.

I followed the book with a copy of Ubuntu 20.04 installed on my ThinkPad X230, and it mostly went smoothly. The only annoying thing was that user-space stack traces were usually broken, because libraries such as libc are not built with frame pointers preserved (-fno-omit-frame-pointer).
Section 13.2.9 discusses the need to rebuild libc and libpthread, and points to the Debian bug tracking the issue.
I’m comfortable compiling and installing software, but I didn’t want to go down the rabbit hole of rebuilding my OS while working through the book just yet. Maintaining such a system alongside binary updates from the vendor seemed like a hassle. My next step is to address that so I have working stack traces. 🙂

Besides that, I enjoyed reading the book, especially the background/history parts, and I look forward to Systems Performance: Enterprise and the Cloud, 2nd Edition, which is out in a couple of months.

Lessons learnt from adding OpenBSD/x86_64 support to pkgsrc

Before even getting into the internals of operating systems to learn how they differ, it’s fairly evident that something as simple as naming differs between them.

For example, the trusty 32-bit x86 PC is commonly named i386 in most operating systems. FreeBSD may also refer to it as just pc, Solaris & derivatives refer to it as i86pc, and Mac OS X refers to it as i486. NeXTSTEP never ran on a 386 and needed a 486 at minimum, so up until Sierra, machine(1) would report i486 even on a Core i7 system. This is one of the many architectures which need to be handled within pkgsrc. To simplify things and avoid lengthy statements, all variants of an arch are translated to a common name, which is then used for reference in pkgsrc. This means that all the examples above are grouped together under the MACHINE_ARCH i386.

In the case of 64-bit x86, commonly referred to as amd64, we group under x86_64, or at least we tried to. The exception to this grouping was OpenBSD/amd64, and this broke many packages, because any special attention required was generally handled under the context of MACHINE_ARCH=x86_64. In some packages, developers had added a new exception for MACHINE_ARCH=amd64 when OPSYS=OpenBSD. This was not a sustainable strategy, because to be effective the entire tree would need to be handled. I covered the issue at the time in A week of pkgsrc #11, but to summarise: $machine_arch may be set at the start in the bootstrap script, but as the process works through the list of tasks, the value of this variable is overridden despite being passed down the chain at the beginning of a step. After some experimentation and with the help of Jonathan Perkin, the hurdles were removed, and thus OpenBSD/x86_64 was born in pkgsrc 😉
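The “translate every variant to one common name” idea can be sketched as a small shell function. This is a hypothetical illustration, not the actual pkgsrc bootstrap code. The function name normalise_machine_arch is made up, and i586/i686 (what Linux reports) are included as an assumption beyond the names mentioned in this post.

```shell
#!/bin/sh
# Hypothetical sketch (not the real pkgsrc bootstrap code): map the
# per-OS names for a CPU family onto one common MACHINE_ARCH value.
normalise_machine_arch() {
    case "$1" in
        # 32-bit x86: i386 on most systems, pc on FreeBSD, i86pc on
        # Solaris & derivatives, i486 on Mac OS X; i586/i686 are an
        # assumption added for Linux hosts
        i386|i486|i586|i686|pc|i86pc)
            echo i386 ;;
        # 64-bit x86: OpenBSD, among others, reports amd64
        amd64|x86_64)
            echo x86_64 ;;
        # anything else is passed through unchanged
        *)
            echo "$1" ;;
    esac
}

# Usage: normalise whatever the running system reports
normalise_machine_arch "$(uname -m)"
```

Because the translation happens once, everything downstream only ever has to test for i386 or x86_64. That was the property the OpenBSD/amd64 exception broke.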

The value of this exercise for me was learning how many places within the internals of pkgsrc I could set something, by the nature of coupling components which share the same conventions (pkgtools, bsd make). Really, the only place I should set something is at the start of the process, and then let it carry through rather than short-circuiting the process and repeating myself.

Thanks to John Klos, I was given control of an IBM Power 8+ S822LC running Ubuntu, which I started setting up for pkgsrc bulk builds.
The first issue I hit was pkgsrc not being able to find, which turned out to be due to the lack of handling for the multilib paths found on Debian & derivatives for PowerPC-based systems.
This system is a little-endian 64-bit PowerPC machine, which is a new speciality in itself, and so I set out to make my first mistake: adding a new check for the wrong MACHINE_ARCH. Having long forgotten about the previous battle with OpenBSD/x86_64, I added a new statement to resolve the relevant paths for ppc64le systems. Bootstrap was happy with that & things moved forward. At this point Maya Rashish pointed me to lang/python27 most likely being broken. John had previously reported the issue, and we started to poke at things. As we rummaged through the internals of pkgsrc (pkgsrc/mk), I started to realise we were heading down the wrong path of marking things up in multiple places again, rather than setting things once & propagating them through.

It turned out that I only needed to make 3 changes to add support for Linux running on little-endian 64-bit PowerPC to pkgsrc (2 additions & 1 correction 😉 ):
First, add a case in the pkgsrc/bootstrap/bootstrap script to set $machine_arch to the name we want to group under when the relevant machine type is detected. In this case, when Linux is running on a ppc64le host, set $machine_arch to powerpc64le. As this is a new machine arch, also ensure it’s listed in the correct endianness category of pkgsrc/mk/, in this case by adding powerpc64le to _LITTLEENDIANCPUS.
Then correct my earlier change, replacing the reference to ppc64le with powerpc64le when handling the multilib paths in pkgsrc/mk/platform/
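The three changes can be sketched in shell as a hypothetical illustration. This is not the real contents of pkgsrc/bootstrap/bootstrap or pkgsrc/mk. The function names and the LITTLE_ENDIAN_CPUS list (a stand-in for _LITTLEENDIANCPUS) are made up, and the Debian multiarch directory names (/usr/lib/<triplet>) are an assumption about where the multilib paths live. The point it tries to show is that the path lookup keys off the normalised name powerpc64le, never the raw ppc64le that uname reports.

```shell
#!/bin/sh
# Hypothetical sketch of the three changes; names are illustrative,
# not the real pkgsrc variables or scripts.

# 1) Bootstrap: map what the host reports onto the name pkgsrc groups under.
bootstrap_machine_arch() {
    opsys=$1 raw_arch=$2
    case "$opsys:$raw_arch" in
        Linux:ppc64le) echo powerpc64le ;;
        *)             echo "$raw_arch" ;;
    esac
}

# 2) Endianness category: a stand-in for _LITTLEENDIANCPUS.
LITTLE_ENDIAN_CPUS="i386 x86_64 powerpc64le"

is_little_endian() {
    case " $LITTLE_ENDIAN_CPUS " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# 3) Multilib paths: key off the normalised name (powerpc64le),
#    not the raw uname value (ppc64le) -- the original mistake.
multiarch_libdir() {
    case "$1" in
        powerpc64le) echo /usr/lib/powerpc64le-linux-gnu ;;
        x86_64)      echo /usr/lib/x86_64-linux-gnu ;;
        *)           echo /usr/lib ;;
    esac
}

# Usage: set the arch once, then let every later step reuse it.
arch=$(bootstrap_machine_arch "$(uname -s)" "$(uname -m)")
echo "MACHINE_ARCH=$arch"
if is_little_endian "$arch"; then echo "endian=little"; fi
echo "libdir=$(multiarch_libdir "$arch")"
```

Note that a raw ppc64le passed to multiarch_libdir falls through to the default. That is how checking for the wrong name quietly goes wrong once the value has been normalised upstream.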

The bulk build is still in progress as I write this post. 5708 of 18148 packages in, the only fallout so far appears to be the Ruby interpreters.

Goodbye Alphastation

This was my second cool legacy UNIX workstation, and the one which got me started on FreeBSD & OpenBSD. I obtained it back in the summer of 2002. I first tried Red Hat Linux 7.2, which was available as a free download in a promotion demonstrating the optimisation ability of the Compaq compiler suite for the Alpha. It was a terrible experience, consistent with my previous attempts at running Linux up to that point (I’d started off on Slackware in ’96 and moved on to Red Hat 5.2, followed by SuSE 6.2). I soon dropped it & moved on to Debian 3.0 (Woody), which was OK, but the 7 CD set was a bit too much hassle for doing package installs. Its performance wasn’t much better than the “optimised” Red Hat, so I moved on to NT 4.0 Workstation & FX!32 and ran that for a bit before getting bored.
In the new year, the FreeBSD 5.0 release was announced with Alpha as a supported platform, so I gave it a try on this machine. Armed with a copy of the handbook & the help of IRC, I made a lot of progress. The first step was dropping 5.0 & going back to version 4.7 after being told too many times that x was broken in 5 or y was a bug in 5. I was blown away by how much faster it was compared to the so-called “optimised” edition of Red Hat.
Towards the end of 2003, I started thinking about trying OpenBSD as a firewall after hearing about PF, & deployed it when 3.4 was released. The Alphastation served as my gateway, connected to a 512k/128k cable modem. I ended up dropping it & moving to i386 when 3.5 was released, because the PHP MySQL extension was broken on Alpha & I wanted to launch this blog.
After that, the Alphastation was used less & less over the years, so I passed it on to a fellow techie who would appreciate it.

iPodLinux on iPod Classic

I’ve kept an eye on the iPodLinux project since I got my 120GB iPod Classic back in 2007, but I was never able to try out the fruits of the project. The last supported model was the one prior to the Classic, and according to the site, the Classic & newer models used encrypted firmware.
I was bored tonight & decided to revisit the project to see if any progress had been made, only to find the site no longer loaded. Reading up on the Wikipedia page revealed freemyipod, which lists the device as supported, so I gave it a go.

Why would you want to do this?

  • Support for file formats not offered by Apple, e.g. FLAC & Ogg
  • Not being tied to an instance of iTunes on a specific computer

Installation is only supported via Linux or Windows & is fairly straightforward. I went with the “no iTunes installed” path on Windows and was done in a few minutes. The only slightly annoying thing is that the device needs to be formatted as part of the install process.

[Image: Flashing iPod Classic]

Why would you not want to do this?

  • The Rockbox interface is clunkier than the Apple one
  • Losing the ability to use iTunes to sync music (the device presents itself to the computer as just another drive, so you need to manage getting music onto it yourself)

I think it was worth the effort for the flexibility gained. If the interface is really an issue, it is an open source project, so just roll up your sleeves and get involved!

My 1st Patch!

I’ve just created my 1st patch, which adds support for Slackware to the iSCSI Enterprise Target software.

Read this guide if you’re interested in rolling your own patches.

    --- Makefile.orig 2004-11-22 10:30:57.000000000 +0000
    +++ Makefile 2004-11-22 10:35:16.000000000 +0000
    @@ -28,6 +28,8 @@
             install -v -m 755 scripts/initd.debian /etc/init.d/iscsi-target; \
         elif [ -f /etc/redhat-release ]; then \
             install -v -m 755 scripts/initd.redhat /etc/init.d/iscsi-target; \
    +    elif [ -f /etc/slackware-version ]; then \
    +        install -v -m 755 scripts/initd /etc/rc.d/iscsi-target; \
         else \
             install -v -m 755 scripts/initd /etc/init.d/iscsi-target; \
         fi

iSCSI On a budget!

Following the Quick Guide to iSCSI on Linux, I managed to set up an iSCSI Target host on Slackware 10 running in a virtual machine on VMware. I then connected to it from the Windows 2000 box which was the VMware host! 🙂

I used the iSCSI Enterprise Target rather than the Ardis Target which the guide covers. As the Enterprise Target is a fork of the Ardis Target, the steps carried out are the same.

The Windows Initiator can be downloaded from here