# Motivation
We landed the async bridge types a while back but never added allocation and performance tests. Since we expect these types to be used performance critical paths we really should cover those with tests.
# Modification
Extends the allocation counter scaffolding to support async tests. Furthermore, add allocations tests for both the writer and producer. Lastly, I a also added a performance test for the producer.
# Result
We now have baseline tests for the `NIOAsyncWriter` and `NIOAsyncSequenceProducer`
Co-authored-by: Cory Benfield <lukasa@apple.com>
* Implement a `NIOAsyncWriter`
# Motivation
We previously added the `NIOAsyncProducer` to bridge between the NIO channel pipeline and the asynchronous world. However, we still need something to bridge writes from the asynchronous world back to the NIO channel pipeline.
# Modification
This PR adds a new `NIOAsyncWriter` type that allows us to asynchronously `yield` elements to it. On the other side, we can register a `NIOAsyncWriterDelegate` which will get informed about any written elements. Furthermore, the synchronous side can toggle the writability of the `AsyncWriter` which allows it to implement flow control.
A main goal of this type is to be as performant as possible. To achieve this I did the following things:
- Make everything generic and inlinable
- Use a class with a lock instead of an actor
- Provide methods to yield a sequence of things which allows users to reduce the amount of times the lock gets acquired.
# Result
We now have the means to bridge writes from the asynchronous world to the synchronous
* Remove the completion struct and incorporate code review comments
* Fixup some refactoring leftovers
* More code review comments
* Move to holding the lock around the delegate and moved the delegate into the state machine
* Comment fixups
* More doc fixes
* Call finish when the sink deinits
* Refactor the writer to only yield Deques and rename the delegate to NIOAsyncWriterSinkDelegate
* Review
* Fix some warnings
* Fix benchmark sendability
* Remove Failure generic parameter and allow sending of an error through the Sink
* NIOPerformanceTester: Add DeadlineNowBenchmark for NIODeadline.now()
Signed-off-by: Si Beaumont <beaumont@apple.com>
* fixup: Return counter from benchmark
Signed-off-by: Si Beaumont <beaumont@apple.com>
* Use unbuffered IO for stdout in NIOPerformanceTester
Signed-off-by: Si Beaumont <beaumont@apple.com>
* CRASH PERF TESTS: Up task count to 1M to test logging change
* Revert "CRASH PERF TESTS: Up task count to 1M to test logging change"
This reverts commit d8fa50eb15.
Co-authored-by: Cory Benfield <lukasa@apple.com>
Motivation:
Protocols like Sequence have "private" hooks that can be implemented to
provide fast-paths for some operations. We missed a few on BBV, and I'd
like to add them. This is one use-case where they can help.
Modifications:
- Add an allocation-counter benchmark and a runtime benchmark for
copying BBV to Array.
Result:
We have some benchmarks.
### Motivation:
In issue https://github.com/apple/swift-nio/issues/1316, we see a large number of allocations to happen when scheduling tasks. This can definitely be optimized. This PR adds a number of baseline allocation and performance tests for both `scheduleTask` and `execute`. In the next PRs, I am going to try a few optimizations to reduce the number of allocations.
### Modifications:
Added baseline performance and allocation tests for `scheduleTask` and `execute`
Motivation:
To judge the cost of `PTHREAD_MUTEX_ERRORCHECK` as well as
`os_unfair_lock` vs `pthread_mutex_t` it's useful to have a few simple
benchmarks.
Modification:
Add lock benchmarks.
Result:
Better data.
Motivation:
Many network protocols (especially for example NFS) have quite a number
of integer values next to each other. In NIO, you'd normally parse/write
them with multiple read/writeInteger calls.
Unfortunately, that's a bit wasteful because we're checking the bounds
as well as the CoW state every time.
Modifications:
- Provide read/writeMultipleIntegers for up to 15 FixedWidthIntegers.
- Benchmarks
Result:
Faster code. For 10 UInt32s, this is a 5x performance win on my machine,
see benchmarks.
Motivation:
To justify performance changes we need to measure the code being
changed. We believe that `HTTPHeaders.subscript(canonicalForm:)` is a
little slow.
Modifications:
- Add allocation and performance tests for fetching header values in
their canonical form
Results:
More benchmarks!
Motivation:
The remaining NIO code really conceptually belongs in a module called
NIOPosix, and NIOCore should really be called NIO. We can't really do
that last step, but we can prepare by pushing the bulk of the remaining
code into a module called NIOPosix.
Modifications:
- Move NIO to NIOPosix
- Make NIO an umbrella module.
Result:
NIOPosix exists.
Motivation:
As we've largely completed our move to split out our core abstractions,
we now have an opportunity to clean up our dependencies and imports. We
should arrange for everything to only import NIO if it actually needs
it, and to correctly express dependencies on NIOCore and NIOEmbedded
where they exist.
We aren't yet splitting out tests that only test functionality in
NIOCore, that will follow in a separate patch.
Modifications:
- Fixed up imports
- Made sure our protocols only require NIOCore.
Result:
Better expression of dependencies.
Co-authored-by: George Barnett <gbarnett@apple.com>
* Add performance test for web socket client random request key
* use `reduce` function to minimise memory consumption of the test
Co-authored-by: Cory Benfield <lukasa@apple.com>
Motivation:
In #1733 I accidentally commented out a perf benchmark (who knows why).
Modification:
Bring it back.
Result:
Benchmark is back.
Co-authored-by: Cory Benfield <lukasa@apple.com>
Motivation:
B2MD called out to the decoder's `shouldReclaimBytes` after every
parsing attempt, even if the parser said `.continue`.
That's quite pointless because we won't add any bytes into the buffer
before we're trying the parser again.
Modifications:
Only ask the decoder if we should reclaim bytes if the decoder actually
says `.needMoreData`.
Result:
Faster, better, more sensible.
Motivation:
I'm sick of typing `.init(major: 1, minor: 1)`.
Modifications:
- Added static vars for common HTTP versions.
Result:
Maybe I'll never type `.init(major: 1, minor: 1)` ever again.
Motivation:
Many other systems don't like non ascii characters.
Modifications:
Change a letter 'a' to a letter 'a' but with a more normal encoding.
Result:
Test names should be in ascii
Motivation:
In all contemporary Swift versions, the `class` and the `AnyObject`
protocol restriction is the same. And `class` is deprecated which warns
on newer Swift compilers.
Modifications:
Replace `class` with `AnyObject`.
Result:
No warnings on newer Swift compilers.
Motivation:
The names of most of the BSD socket option values were brought over into
their NIO helper properties, with their namespace removed. This vastly
increases the risk of collision, particularly for things like TCP_INFO,
which just became .info.
Modifications:
- Added the prefixes back, e.g. `.info` is now `.tcp_info`.
- Renamed socket type `.dgram` to `.datagram`, as this is the nice clean
API and we should use nice clean names.
Result:
Better APIs
* NIO: Add new `BSDSocket` namespace
This starts the split of the POSIX/Linux/Darwin/BSD interfaces and the
BSD Socket interfaces in order to support Windows. The constants on
Windows are not part of the C standard library and need to be explicitly
prefixed. Use the import by name to avoid the `#if` conditions on the
wrapped versions. This will allow us to cleanly cleave the dependency
on the underlying C library.
This change adds a `BSDSocket.OptionLevel` and `BSDSocket.Option` types
which allow us to get proper enumerations for these values and pipe them
throughout the NIO codebase.
* Apply suggestions from code review
Co-Authored-By: Cory Benfield <lukasa@apple.com>
Motivation:
getInteger used the normal, generic implementation for UInt8s, however
they can be made considerably faster by using a specialised method.
ByteBufferIterator was also affected by this.
Modifications:
Use a specialised version for UInt8s for getInteger.
Result:
- ByteBufferView can be iterated about 20% faster.
Motivation:
In Swift, writing
var something: T?
init() {
self.something = someValue
}
means that the compiler will first set `self.something` to `nil` and
then in the init override it with `self.someValue`
(https://bugs.swift.org/browse/SR-11777). Unfortunately, because of
https://bugs.swift.org/browse/SR-11768 , stored property initialisation
cannot be made `@inlinable` (short of using `@frozen` which isn't
available in Swift 5.0).
The combination of SR-11768 and SR-11777 leads to `var something: T?`
having much worse code than `var something: Optional<T>` iff the `init`
is `public` and `@inlinable`.
Modifications:
Change all `var something: T?` to `var something: Optional<T>`
Result:
Faster code, sad NIO developers.
Motivation:
It's faster and contiguous storage is trivially available
Modifications:
- Implement WebSocketMaskingKey.withContiguousStorageIfAvailable
- Make it @inlinable, requiring promoting WebSocketMaskingKey.key to internal
so renamed to _key.
Result:
Writing WebSocketMaskingKey into ByteBuffer is about 2x faster.
Motivation:
We want to have some extra WebSocket decoder benchmarks, for
a WebSocketFrame with a maskingKey, a WebSocketFrame with a 16-bit
length and a WebSocketFrame with a 64-bit length
Modifications:
Changed WebSocketFrameDecoderBenchmark to have ability for dynamic
WebSocketFrame length through the initializer parameter dataSize.
Added ability for WebSocketFrameDecoderBenchmark to benchmark
a WebSocketFrame with a maskingKey by passing withMaskKey parameter to
the initializer.
Added new benchmarks in the main for WebSocketFrame decoding with a
16-bit length, WebSocketFrame decoding with a 64-bit length and with
a maskingKey.
Result:
Better WebSocket decoder benchmarks
Motivation:
When calling ByteBuffer setBytes/writeBytes with CircularBuffer<Uint8> we take the slow path. Before we make any changes we want a baseline for current performance.
Modifications:
- This gives a very significant performance increase for CircularBuffer
and likely other types that use the slow path.
- Adds ByteBuferBenchmark.swift and tests circular_buffer_into_byte_buffer_1kb, circular_buffer_into_byte_buffer_1mb.
- Make _setSlowPath @inlinable
- Implement == for CircularBuffer.Index to make it @inlinable
Result:
- Extra performance tests
- Improve performance
Motivation:
Previously, NIOPerformanceTester would use Foundation.Date twice to
calculate the time spend in a given function. This is less precise as it
can be but more importantly, it's not resilient against the user
changing the clock or the computer going to sleep.
Modifications:
Use DispatchTime.now().uptimeNanoseconds
Result:
More precise and resilient benchmark measurements.
Motivation:
It's important to know how websocket frame decoder perform.
Modification:
- Wrote simple benchmark for WebSocket frame decoding.
Result:
Better insight
Motivation:
A dark force walks abroad our lands. In a place where well-behaved
processes close their blinds, lock their doors, and try not to make
noise late at night, processes that shine just a little too brightly,
consume just a little too much RAM, attract the attention of a dark
force whose name is spoken only in whispers. Some call this force a
myth, but others say that the deniers of this force have whispered dark
incantations, invoking the power word "oom_adj".
It is time we intervened to keep the law-abiding processes in our CI
systems safe.
Modifications:
- Prevent the websocket performance tests comnsuming 4GB of RAM.
Result:
Peace will reign.
Motivation:
To improve the performance of a given component it is very helpful to
know how it performs! This patch adds benchmarks and allocation counter
tests to NIO for websocket frame encoding.
Modifications:
- Wrote some benchmarks for WebSocket frame encoding.
- Wrote some allocation counter tests for WebSocket frame encoding.
- Extend the allocation counter tests to properly account for sub-tests
Result:
Better insight
Motivation:
@ravikandhadai suggested some minor changes in #1117 which we should
apply.
Modifications:
Apply all suggested changes all over the code base.
Result:
- fixes#1117
Motivation:
As #1054 shows, we have some low-hanging performance fruit regarding
off-EventLoop future reduces. We should quantify those.
Modifications:
Add benchmarks to quantify the potential wins.
Result:
More benchmarks.
Motivation:
Networking software like SwiftNIO that always has explicit flushes
usually does not benefit from having TCP_NODELAY switched off. The
benefits of having it turned on are usually quite substantial and yet we
forced our users for the longest time to enable it manually.
Quite a bit of engineering time has been lost finding performance
problems and it turns out switching TCP_NODELAY on solves them
magically.
Netty has made the switch to TCP_NODELAY on by default, SwiftNIO should
follow.
Modifications:
Enable TCP_NODELAY by default.
Result:
If the user forgot to enable TCP_NODELAY, their software should now be
faster.
* Significantly improve the performance of `EventLoopFuture.{and,when}AllSucceed`.
* PR improvements.
* Add the fast track for `AllComplete` as well.
* Use `future.inEventLoop` instead of pointer comparison.
Motivation:
`ctx` was always an abbreviation was 'context` and in Swift we don't
really use abbreviations, so let's fix it.
Modifications:
- rename all instances of `ctx` to `context`
Result:
- fixes#483
Motivation:
- `ChannelPipeline.add(name:handler:...)` had a strange order of arguments
- `remove(handler:)` and `remove(ctx:)` both remove `ChannelHandler`s
but they read like they remove different things
So let's just fix the argument order and name them `addHandler` and
`removeHandler` making clear what they do.
Modifications:
- rename all `ChannelPipeline.add(name:handler:...)`s to `ChannelPipeline.addHandler(_:name:...)`
- rename all `ChannelPipeline.remove(...)`s to `ChannelPipeline.removeHandler(...)`
Result:
more readable and consistent code
Motivation:
ByteBuffer methods like `set(string:)` never felt very Swift-like and
also didn't look the same as their counterparts like `getString(...)`.
Modifications:
- rename all `ByteBuffer.set/write(<type>:,...)` methods to
`ByteBuffer.set/write<Type>(...)`
- polyfill the old spellings in `_NIO1APIShims`
Result:
code more Swift-like
Motivation:
The `ELF.cascade` methods have a parameter label `promise` that does not match Swift API Guidelines, and a way to cascade just successes is not available - while for failures there is.
Modifications:
`ELF.cascade*` methods that already exist have had their `promise` label renamed to `to`, and a new `ELF.cascadeSuccess` method has been added.
Result:
EventLoopFuture now has the cascade methods `ELF.cascade(to:)`, `ELF.cascadeFailure(to:)`, and `ELF.cascadeSuccess(to:)`
Motivation:
EventLoopPromise.succeed/fail has extraneous labels that don't add
anything but noise.
Modifications:
remove extraneous labels
Result:
less noise in the code
Motivation:
ELF's API should be as close as possible to the new Result's API.
Therefore, we should rename `then` to `flatMap`
Modifications:
- renamed `then` to `flatMap`
- renamed `thenIfError` to `flatMapError`
- renamed ELF's generic parameter from `T` to `Value`
Result:
- more like Result
- fixes#688
Motivation:
Swift naming guidelines mandate that factory methods start with `make`,
like `makeSomething`. We had a few that were `newSomething`.
Modifications:
make all factories start with make
Result:
more compliant to Swift naming rules
Motivation:
Writing Strings to ByteBuffers is important, let's have some more
benchmarks.
Modifications:
add String to ByteBuffer benchmarks
Result:
more benchmarks
Motivation:
We should be consistent with naming and also choose descriptive names.
Modifications:
- Deprecate old init method that uses numThreads
- Add new init with numberOfThreads param name
- Everywhere use the new init
Result:
More consistent and descriptive naming. Fixes https://github.com/apple/swift-nio/issues/432.