Usually, your journey will start by detecting that an allocation test regressed. So let's first start by looking at how to run the allocation counter tests.
## Running allocation counter tests
The allocation counter tests are usually run by NIO's integration test framework which lives in `IntegrationTests/`. In there, you first want to identify the test suite and test case that runs all the allocation counter tests. For the `swift-nio` repository that is
Next to the test case (`test_01_allocation_counts.sh`) that runs the allocation counter tests should be a resource directory which contains the individual allocation counter tests. For `swift-nio` that is
From here on, this document expects that you are within this directory, so you might want to start with
cd IntegrationTests/tests_04_performance/test_01_resources/
To run all the allocation tests, simply invoke the runner script in there
./run-nio-alloc-counter-tests.sh
The invocation of this script will take a while because it has to build all of SwiftNIO in release mode first, then it compiles the integration tests and runs them all, multiple times...
### Understanding the output of the allocation counter tests
The output of this script will look something like
with this kind of block repeated for each allocation counter test. Let's go through and understand each line. The first line is the name of the specific allocation test. The most relevant part is the file name (`test_future_lots_of_callbacks.swift`). For this test, we seem to be testing how many allocations futures with many callbacks are doing.
which are the aggregate values. The first line says `remaining_allocations: 0`, which means that this allocation test didn't leak, which is good! Then, we see
which means that by the end of the test, we saw 75001 allocations in total. As a rule, we usually run the workload of every allocation test 1000 times. That means you want to divide 75001 by the 1000 runs to get the number of allocations per run. In other words: each iteration of this allocation test allocated 75 times. The extra one allocation is just some noise that we have to ignore: these are usually inserted by operations in the Swift runtime that need to initialize some state. In many (especially multi-threaded test cases) there is some noise, which is why we run the workload repeatedly, which makes it easy to tell the signal from the noise.
which is literally just the exact numbers of each of the 10 runs we considered. Note, the lines are prefixed with `DEBUG:` to make analysis of the output easier, it does _not_ mean we're running in debug mode. If you see a lot of fluctuation between the runs, then the allocation test doesn't allocate deterministically which would be something you want to fix. In the example above, we're totally stable to the last byte of allocated memory, so we have nothing to worry about with this test.
## Important notes
The exact allocation counts are _not_ directly comparable between different operating systems and even different versions of the same operating system. Every new Swift version will also introduce differences (hopefully decreasing over time).
This means that if you get hit by a CI failure, you will have to first reproduce the regression locally by running the allocation counter tests once on `main` and then again on your failing branch.
Because it takes quite a while to run the whole allocation counter test suite, there are some shortcuts you definitely want to know about. First, let's have a look at how we can run just single test.
### Running a single test
Running a single test is very straightforward, pass the test's `.swift` file to the runner script. For example:
You will notice that it now runs faster but also that it still compiles SwiftNIO for every single time you run the program. So that's still rather slow. If you look closely at the output you will notice that it prints messages such as
If you change your working directory to common directory above (`/private/tmp/.nio_alloc_counter_tests_5jMMhk`), then you will find a regular Swift package there and you can just re-run the allocation counter tests using
```
cd /tmp/.nio_alloc_counter_tests_5jMMhk
swift run -c release
```
which should run very quickly. But _do not_ forget to pass `-c release`, otherwise the results will be meaningless.
Often, you will want to use external tools such as Instruments or `dtrace` to analyse why a performance test regressed. Unfortunately, that is a little harder than you might expect because to implement the allocation tests, the allocation counter test framework actually [hijacks](https://github.com/apple/swift-nio/blob/main/IntegrationTests/tests_04_performance/test_01_resources/README.md) functions like `malloc` and `free`. The counting of allocations happens in the hijacked or 'hooked' functions.
See all the zeroes? Those are all zero because the hooking is disabled. In this mode the output itself is not very useful but you can now switch to using other tools like Intruments or `dtrace` to debug your allocations.
As explained before, you can change your working directory in the temporary directory (for example `cd /private/tmp/.nio_alloc_counter_tests_0UDNXS`) and invoke `swift run -c release` from there to run the allocation counter benchmark. Note that if you disabled hooking (by passing `-- -n`) it will still print only zeroes.
### Debugging with Instruments
1. Start Instruments
2. Choose the "Allocations" instrument
3. Tap the executable selector in the toolbar
4. Click "Choose Target..."
5. Tick "Show Hidden Files"
6. Press Cmd+Shift+G
7. Enter the temporary directory path of a non-hooked version of the allocation counter tests (for example `/private/tmp/.nio_alloc_counter_tests_0UDNXS`)
8. Select `.build/release/`
9. Find the binary that is named like your allocation counter test (eg. `test_lots_of_future_callbacks`)
10. Click "Choose"
11. Press the record button
Now you should have a full Instruments allocations trace of the test run. To make sense of what's going on, switch the allocation lifespan from "Created & Persisted" to "Created & Destroyed". After that you should be able to see the type of the allocation and how many times it got allocated. Clicking on the little arrow next to the type name will reveal each individual allocation of this type and the responsible stack trace.
### Debugging with `malloc-aggregation.d` (macOS only)
SwiftNIO ships a `dtrace` script called `dev/malloc-aggregation.d` which can give you an aggregation of all allocations by stack trace. This means that you will see the stack trace of each allocation that happened and how many times this particular stack trace allocated.
test_1_reqs_1000_conn`closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:selectorFactory:initializer:)+0x106
test_1_reqs_1000_conn`partial apply for closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:selectorFactory:initializer:)+0x2d
test_1_reqs_1000_conn`thunk for @escaping@callee_guaranteed (@guaranteed NIOThread) -> ()+0xf
test_1_reqs_1000_conn`partial apply for thunk for @escaping@callee_guaranteed (@guaranteed NIOThread) -> ()+0x11
test_1_reqs_1000_conn`closure #1 in static ThreadOpsPosix.run(handle:args:detachThread:)+0x1e4
libsystem_pthread.dylib`_pthread_start+0x94
libsystem_pthread.dylib`thread_start+0xf
231000
```
repeated many times. Each block means that the specific stack trace is responsible for some number of allocations: in the example above, `231000` allocations. Remember that the allocation counter tests run each test 11 times (1 warm up run and 10 measured runs), so this means that in each test run, the above stack trace allocated 21000 times. Also remember that we usually run the workload we want to test 1000 times. That means that in the real world, the above stack trace accounts for 21 out of the total number of allocations (231000 allocations / 11 runs / 1000 executions per run = 21).
When you are looking at allocations in the setup of the test the numbers may be split into one allocation set of 10000 and another of 1000 - for measured code vs warm up run.
The output from `malloc-aggreation.d` can also be diffed using the `stackdiff-dtrace.py` script. This can be helpful to track down where additional allocations were made. The `stdout` from `malloc-aggregation.d` for the two runs to compare should be written to files, and then passsed to `stackdiff-dtrace.py`:
Where `stack_aggregation.old` and `stack_aggregation.new` are the two outputs from `malloc-aggregation.d` to compare. The script will output the aggregated stacks which appear:
- only in `stack_aggregation.old`,
- only in `stack_aggregation.new`, and
- any differences in numbers for stacks that appear in both.
Similar stack traces will be aggregated into a headline number for matching but where different the full stack traces which made up the collection will be printed indented.
Allocations of less than 1000 in number are ignored - as the tests all run multiples of 1000 runs any values below this are almost certainly from setup in libraries we are using rather than as a consequence of our measured run.
These stacks are only *slighly* different; differing only in the name similar to `$s3NIO7Channel_pxq_q0_r1_lyypypAA15EventLoopFutureCyytGIsegnnr_Iegnr_AaB_pypypAEIegnno_Ieggo_TR`. They're otherwise the same so we can ignore them and look at more output.
Now we see there's another stacktrace in the `AFTER` section which has no corresponding stacktrace in `BEFORE`. From the stack we can see it's originating from `doRequests(group:number:)`. In this instance we were working on options applied in this function so it appears we have added allocations. We have also increased the number of allocations which are reported in the different numbers section where stack traces have been paired but with different numbers of allocations.
Unfortunately we don't have dtrace or Instruments on Linux, but there is the [heaptrack](https://github.com/KDE/heaptrack) tool which supports diff analysis of two versions.
Tested on Ubuntu 20.04 (and likely working on other distributions) Install it with:
```
sudo apt-get install heaptrack
```
Then using the instructions previously, you need to build the test without hooks such that heaptrack can hook instead, e.g.:
```
cd IntegrationTests/tests_04_performance/test_01_resources/