Book review: BPF Performance Tools: Linux System and Application Observability

It’s been more than 11 years since the “shouting in the data centre” video landed, and in 2020 I still manage to surprise folks who have never seen it with what is possible.
The idea that such transparency is a reality in some circles comes as a shock.

Without the ability to dynamically instrument a system, an operator relying solely on conventional tools has severely limited insight into what is happening on it. Resorting to debugging tools to gain insight is usually a non-option, for several reasons:
1) They are disruptive (the application may need to be re-invoked via the tooling).
2) They have a considerable performance impact.
3) They cannot provide a holistic view (a tool may provide insight into one component, leaving it to the operator to correlate information from other sources).
If you do have that luxury, the problem becomes how to instrument the system.
The mechanism offers the ability to ask questions about the system, but can you formulate the right question? This book hopefully helps with that.

To observe an application, you need both resource analysis and application-level analysis. BPF tracing allows you to study the flow from the application, with its code and context, through libraries and syscalls, kernel services, and device drivers. Imagine taking the various ways disk I/O can be instrumented and adding the query string as another dimension for breakdowns.

The BPF Performance Tools book centres around bpftrace but covers BCC as well. bpftrace provides a DTrace-like tool for one-liners and for writing scripts in a language similar to D, so if you are comfortable with DTrace the syntax should be familiar, though it is slightly different.
BCC provides a more powerful and more complex interface for writing scripts, which leverage other languages to compose a desired tool. I believe the majority of the BCC tools use Python, though LuaJIT is supported too.
Either way, in the background everything ends up as LLVM IR and goes through libLLVM to compile to BPF.

The first part of the book covers the technology, starting by introducing eBPF and moving on to cover the history, the interfaces, how things work, and the tooling which complements eBPF, such as PMCs, flame graphs, perf_events and more.
A quick introduction to performance analysis, followed by introductions to BCC and bpftrace, rounds off the first part of the book in preparation for applying them to different parts of a system, broken down by chapter, starting with CPU.

The methodology is clear-cut. Use the commonly available traditional tools to gauge the state of the system, and then use bpftrace or BCC to home in on the problem, iterating through the layers of the system to find the root cause, as opposed to trying to solve things purely with eBPF.
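As a rough illustration of that first step (gauging system state from conventional counters before reaching for BPF), here is a minimal Python sketch that derives a CPU busy fraction from two samples of the aggregate "cpu" line in /proc/stat. The function names are hypothetical and it is not taken from the book; it only shows the kind of arithmetic that traditional tools perform on these counters.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def busy_fraction(before: List[int], after: List[int]) -> float:
    """Fraction of CPU time that was not idle or iowait between two samples.

    Field order is: user nice system idle iowait irq softirq steal
    guest guest_nice. Only the first eight are summed, because guest time
    is already included in user and nice.
    """
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # idle is the 4th field, iowait the 5th (if the kernel reports it)
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return (total - idle) / total
```

On a Linux machine you could read the first line of /proc/stat, sleep for a second, read it again, and pass both parsed samples to busy_fraction(). A high value is a hint to start digging into the CPU layer with bpftrace or BCC.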

I did not read the third and fourth sections of the book, which cover additional topics and the appendixes, but I suspect I will be returning to read the “tips, tricks and common problems” chapter.
Of the first sixteen chapters, which I did read, the CPU chapter really helped me understand the way CPU usage is measured on Linux. I enjoyed the chapter dedicated to languages, especially the Bash Shell section.
Given a binary (in this case bash), it shows how to go about extracting information from it, whether or not it has been compiled with frame pointers preserved, and how you could extend the shell to add USDT probes.
I did not finish the Java section: it was too painful to read about what needs to be done, owing to the JVM being a C++ code base with a JIT runtime (the book states it is a complex target to trace), and I couldn’t bring myself to read the containers *yawn* chapter.
All the scripts covered in the book have their history covered in footnotes on the page, which was nice to see (I like history):

I created the first execsnoop using DTrace on 24-Mar-2004, to solve a common performance problem I was seeing with short-lived processes in Solaris environments. My prior analysis technique was to enable process accounting or BSM auditing and pick the exec events out of the logs, but both of these came with caveats: Process accounting truncated the process name and arguments to only eight characters. By comparison, my execsnoop tool could be run on a system immediately, without needing special audit modes, and could show much more of the command string. execsnoop is installed by default on OS X, and some Solaris and BSD versions. I also developed the BCC version on 7-Feb-2016, and the bpftrace version on 15-Nov-2017, and for that I added the join() built-in to bpftrace.

A heads-up is also given on the impact that running a script is likely to have, because some will have a noticeable impact:

The performance overhead of offcputime(8) can be significant, exceeding 5%, depending on the rate of context switches. This is at least manageable: it could be run for short periods in production as needed. Prior to BPF, performing off-CPU analysis involved dumping all stacks to user-space for post processing, and the overhead was usually prohibitive for production use.
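To get a feel for why that overhead depends on the rate of context switches, here is a back-of-the-envelope sketch in Python. It is only an illustration: the function name is hypothetical, and the per-event cost is an assumed input that you would need to measure on your own system, not a figure from the book.

```python
def estimated_overhead(events_per_sec: float, cost_us_per_event: float, cpus: int) -> float:
    """Estimate the fraction of total CPU capacity spent on instrumentation.

    events_per_sec    -- rate of the traced event (e.g. context switches)
    cost_us_per_event -- ASSUMED extra cost per event, in microseconds
    cpus              -- number of CPUs sharing that load
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    # CPU-seconds consumed by instrumentation in each wall-clock second
    cpu_seconds_per_sec = events_per_sec * cost_us_per_event / 1_000_000
    return cpu_seconds_per_sec / cpus
```

With hypothetical inputs of 100,000 context switches per second, 1 microsecond of added cost per event, and 4 CPUs, this returns 0.025, i.e. roughly 2.5% of total CPU capacity. Doubling the event rate doubles the estimate, which is why busy systems feel the cost the most.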

I followed the book with a copy of Ubuntu 20.04 installed on my ThinkPad x230 and it mostly went smoothly. The only annoyance was that user-space stack traces were usually broken, because things such as libc are not built with frame pointers preserved (-fno-omit-frame-pointer).
Section 13.2.9 discusses the need to rebuild libc and libpthread, and points to the Debian bug tracking the issue.
I’m comfortable compiling and installing software, but I didn’t want to go down the rabbit hole of rebuilding my OS as I worked through the book just yet; the thought of maintaining such a system alongside binary updates from the vendor seemed like a hassle. My next step is to address that so I have working stack traces. 🙂

Besides that, I enjoyed reading the book, especially the background and history parts, and I look forward to Systems Performance: Enterprise and the Cloud, 2nd Edition, which is out in a couple of months.
