…and a little about the impact of the D garbage collector.
Today I learned that macOS ships with an excellent profiler by the name of Instruments. Yes, technically Instruments does not ship with macOS, since it requires installing Xcode. A quick trip to the App Store, though, and you’ve got yourself a capable sidekick for performance optimization work.
I’ve been working on a project (written in D) as a sandbox for learning what goes into a performant web framework. Until recently I had been relying solely on ab to test whether optimizations were working, but I knew I’d need a profiler to dig deeper. I’d been putting off setting one up because I didn’t want to deal with another CLI tool and getting Graphviz running. Thankfully I stumbled across a System Trace option in the Services menu of an app I was using. Clicking it took me to Instruments, which I hadn’t even realized was installed.
Using Instruments is easy. Open it, select a template, select a target, then hit the record button. For my work I’ve been using the Time Profiler template. For my target I compile an app using my framework, run it, and find its PID using Activity Monitor. Then I select the right process by PID using the System Process section of Instruments’ top-mounted breadcrumb navigation bar. After hitting record I use ab to exercise the app a few times, then stop recording. Instruments then presents a suite of data to pore over.
In the topmost section Instruments displays a timeline of CPU usage, broken down by thread within your app or in aggregate. In the section below, Instruments displays typical profile data: call trees with symbols alongside absolute and percentage time spent in each symbol. Clicking on any symbol updates a Heaviest Stack Trace display that helps with zeroing in on the heaviest resource users in the call tree downstream of that symbol. Clicking on a thread in the top section filters symbols to just those called on that thread. Once you’re done, Instruments lets you save the profile data for later analysis or sharing with others.
So how did Instruments help me? So far it’s allowed me to see the impact of a free list I implemented on HttpContext, a type that I allocate and discard for each request. Free lists are a good fit for this sort of type because they let you reuse objects rather than allocating and deallocating a new one each time after the initial objects are created. They also work really well with D’s garbage collector. In D, the GC runs only during allocations, so reducing allocations with a free list directly reduces GC collections in places where you don’t want them. You pay for the first allocation of an object, and after that there’s no need to worry about GC interruption.
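To make the idea concrete, here’s a minimal sketch of the kind of free list I mean. It isn’t the framework’s actual code: HttpContext below is a hypothetical, pared-down stand-in, and the intrusive next link is just one simple way to chain recycled objects.

```d
import std.stdio;

/// Simplified stand-in for the framework's per-request context type.
/// The real HttpContext carries request/response state; this is only for illustration.
class HttpContext
{
    HttpContext next; // intrusive link, used only while the object sits in the free list

    void reset()
    {
        // Clear per-request state so a recycled context starts fresh.
        next = null;
    }
}

/// A minimal intrusive free list: acquire() reuses a previously released
/// context when one is available and only falls back to `new` (a GC
/// allocation) when the list is empty.
struct HttpContextFreeList
{
    private HttpContext head;

    HttpContext acquire()
    {
        if (head is null)
            return new HttpContext; // first use: pay the GC allocation once
        auto ctx = head;
        head = ctx.next;
        ctx.reset();
        return ctx;
    }

    void release(HttpContext ctx)
    {
        ctx.next = head;
        head = ctx;
    }
}

void main()
{
    HttpContextFreeList pool;

    // Simulate handling a few requests: after the first allocation,
    // every acquire() is satisfied without touching the GC.
    foreach (i; 0 .. 3)
    {
        auto ctx = pool.acquire();
        // ... handle the request using ctx ...
        pool.release(ctx);
    }

    writeln("done");
}
```

Because my processing model is single-threaded, the list needs no synchronization; a multi-threaded setup would want per-thread lists or locking.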
As a percentage, the free list cut the time spent allocating HttpContext to just 25% of what the framework spent before. In absolute terms, however, allocation time with or without the free list was a negligible portion of the framework’s overall request handling time. This drove home:
- The need to measure before optimization
- An extension of the previous point: garbage collectors don’t need to be avoided by default. Measure to figure out if and where they affect you before trying to limit their impact in your code.
Update 2024-10-23: The test profiled above was for a sub-1ms endpoint in a single-threaded processing model with no concurrency. I decided to also test an endpoint with 50-150ms latency, again in a single-threaded model but with cooperative concurrency. The hope was to have more requests in flight simultaneously, with a greater chance of them being impacted by GC collections. In this test, using my free list resulted in 1.89x more time spent in allocation/deallocation than leaving the GC to its own devices. What did not change was that the absolute time spent in both approaches was negligible compared to the run time of the test.