Evidently I struck a nerve with some people on this one, while others simply missed the entire point of the blog post. Let's revisit Swift Resistance and dig deeper into the problem.
I'm going to put my claim right up here so that it will not be missed:
Claim: The performance of DEBUG (that is, -Onone optimization) builds of Swift can vary drastically depending on which Swift intrinsics and foundational types you use. Some of these choices can lead down a path where your DEBUG builds are all but useless during development.
I'll talk about the implications of this towards the end.
Goals
Problem Statement: Design an algorithm that fills in a buffer of pixel data with a gradient starting with green in the bottom right corner and turning into blue in the top left corner. RGB colors should be used with values in the range [0, 255].
This algorithm must be written in Swift. In addition to that, it is meant to be used in a game loop with a desired framerate of 30 FPS for the software rendered algorithm at a resolution of 960×540 (this is 1/8th the desired rate of the hardware accelerated algorithm at 1920×1080@60Hz).
Additional Information: A reference implementation already exists in an ObjC program; we will take the algorithm from there and use it as a baseline for both performance and functionality. When rendered to screen, your image should look something like the image below:

In addition, the data must be processed sequentially; parallelization of this algorithm is not allowed.
Strategy
Given the existing ObjC implementation, the first attempt should be a straight port that ignores Swift-specific language features. Once that algorithm is working, it is worth exploring approaches that may be more natural to the language.
So the two approaches that will be used are:
1. Use an `UnsafeMutablePointer` to create a buffer of memory that will be manipulated.
2. Use an `Array` to act as the buffer so that we are not using "unsafe" Swift code.
ObjC Baseline
Here is the algorithm for the ObjC version of the code:
#import <Foundation/Foundation.h>
#import <mach/mach_time.h>

typedef struct {
    uint8_t red;
    uint8_t green;
    uint8_t blue;
    uint8_t alpha;
} Pixel;

typedef struct {
    Pixel *pixels;
    int width;
    int height;
} RenderBuffer, *RenderBufferRef;

RenderBufferRef RenderBufferCreate(int width, int height)
{
    assert(width > 0);
    assert(height > 0);

    RenderBufferRef buffer = malloc(sizeof(RenderBuffer));
    assert(buffer);

    buffer->pixels = malloc(width * height * sizeof(Pixel));
    assert(buffer->pixels);

    buffer->width = width;
    buffer->height = height;

    return buffer;
}

void RenderBufferRelease(RenderBufferRef buffer)
{
    if (buffer->pixels) {
        free(buffer->pixels);
    }
    buffer->width = 0;
    buffer->height = 0;
    free(buffer);
}

void RenderGradient(RenderBufferRef buffer, int offsetX, int offsetY)
{
    int offset = 0;
    for (int y = 0, height = buffer->height; y < height; ++y) {
        for (int x = 0, width = buffer->width; x < width; ++x) {
            Pixel pixel = { 0, y + offsetY, x + offsetX, 0xFF };
            buffer->pixels[offset] = pixel;
            ++offset;
        }
    }
}

int main(int argc, const char * argv[]) {
    uint64_t start = mach_absolute_time();

    RenderBufferRef buffer = RenderBufferCreate(960, 540);

    const int NUMBER_OF_ITERATIONS = 30;
    for (int i = 0; i < NUMBER_OF_ITERATIONS; ++i) {
        RenderGradient(buffer, i, i * 2);
    }

    RenderBufferRelease(buffer);

    uint64_t elapsed = mach_absolute_time() - start;
    printf("elapsed time: %fs\n", (float)elapsed / NSEC_PER_SEC);

    return 0;
}
This code was compiled as a command-line tool under two different optimization flags: -O0 and -Os. These are the default "debug" and "release" configs.
- The timing output for -O0 (debug) was: 0.099769s
- The timing output for -Os (release) was: 0.020427s
Both of these timings fall well within the target goal of 30Hz.
Swift Implementations
We already know that we are going to need multiple algorithms for the Swift version, so it's important to set up our test harness in a reusable way. One thing to note about Swift is that even types and functions that are private to a file still share a namespace, so their names can collide. This means each test implementation we write will need to be wrapped in a function.
The first thing we'll do is create a command-line tool in Swift and set up two schemes (one for debug, one for release) to make timing easier.
Ok, let's start with what our test rig will look like:
import Foundation
let NUMBER_OF_ITERATIONS = 30
#if DEBUG
let BASELINE: Float = 0.099769
#else
let BASELINE: Float = 0.020427
#endif
func timing(samples: Int, iterations: Int, fn: (Int) -> Float) -> (avg: Float, stddev: Float, diff: Int) {
    var timings = [Float](count: samples, repeatedValue: 0.0)
    for s in 0..<samples {
        timings[s] = fn(iterations)
    }

    let avg = reduce(timings, 0.0, +) / Float(samples)
    let sums = reduce(timings, 0.0) { sum, x in ((x - avg) * (x - avg)) + sum }
    let stddev = sqrt(sums / Float(timings.count - 1))
    let diff = Int(((BASELINE - avg) / BASELINE * 100.0) + 0.5)

    return (avg, stddev, diff)
}
println("Swift Rendering Tests: \(NUMBER_OF_ITERATIONS) iterations per test")
println("---------------------")
There is a simple `timing` function that captures the average time across some number of samples, each running a given number of iterations of the rendering function.

NOTE: You'll need to add a custom build flag (`-D DEBUG`) to your Swift compiler options for the debug configuration so that the `#if DEBUG` branch matches.
The Unsafe Swift Approach
The naïve approach is to simply copy and paste the ObjC code into `main.swift`, then make the necessary updates to get it to compile. The result is this (remember, we need to wrap everything in a function so that the names do not collide as we add more tests that may want to use the same name for a struct but lay it out a little differently):
import Foundation
func unsafeMutablePointerTest(iterations: Int) -> Float {
    struct Pixel {
        var red: Byte
        var green: Byte
        var blue: Byte
        var alpha: Byte
    }

    struct RenderBuffer {
        var pixels: UnsafeMutablePointer<Pixel>
        var width: Int
        var height: Int

        init(width: Int, height: Int) {
            assert(width > 0)
            assert(height > 0)

            // alloc takes an element count, not a byte count
            pixels = UnsafeMutablePointer.alloc(width * height)
            self.width = width
            self.height = height
        }

        mutating func release() {
            pixels.dealloc(width * height)
            width = 0
            height = 0
        }
    }

    func RenderGradient(var buffer: RenderBuffer, offsetX: Int, offsetY: Int)
    {
        var offset = 0
        for (var y = 0, height = buffer.height; y < height; ++y) {
            for (var x = 0, width = buffer.width; x < width; ++x) {
                let pixel = Pixel(
                    red: 0,
                    green: Byte((y + offsetY) & 0xFF),
                    blue: Byte((x + offsetX) & 0xFF),
                    alpha: 0xFF)
                buffer.pixels[offset] = pixel
                ++offset
            }
        }
    }

    let start = mach_absolute_time()

    var buffer = RenderBuffer(width: 960, height: 540)
    for (var i = 0; i < iterations; ++i) {
        RenderGradient(buffer, i, i * 2)
    }
    buffer.release()

    return Float(mach_absolute_time() - start) / Float(NSEC_PER_SEC)
}
Here are the timings:
- DEBUG: avg time: 0.186799s, stddev: 0.0146862s, diff: -86%
- RELEASE: avg time: 0.0223397s, stddev: 0.00101094s, diff: -8%
The timing code is here (add this to `main.swift`):
let timing1 = timing(10, NUMBER_OF_ITERATIONS) { n in unsafeMutablePointerTest(n) }
println("UnsafeMutablePointer<Pixel> avg time: \(timing1.avg)s, stddev: \(timing1.stddev)s, diff: \(timing1.diff)%")
This is not looking too bad; both configurations are well within our target rate. However, we can see that both are slower than the ObjC version.
Takeaway: While this implementation is slower than the ObjC version, there is nothing blocking us at this time from being able to maintain a solid 30Hz in both debug and release builds. This is great news.
The "Safe" Swift Approach
UPDATE: I made a pretty obvious (well, easy to overlook, but it still should have been obvious) and significant error in this section… of course that would happen in the one post where I try to better show the issues. I used `var` instead of `inout` on the buffer parameter… which, of course, creates a copy of the buffer array on each call… yeah, it was that bad. Ironically, it didn't affect the debug performance, but fixing it did help the release build.

The function `func RenderGradient(var buffer: RenderBuffer, offsetX: Int, offsetY: Int)` should have been defined as: `func RenderGradient(inout buffer: RenderBuffer, offsetX: Int, offsetY: Int)`.

The original implementation created a copy of the buffer each time, which left us with an empty buffer outside of the function call. Not what we wanted. Again, this mistake had only two repercussions: incorrect functionality and a 2x regression on the release build. The debug build is still just as painfully slow.

Let me know if you spot any other mistakes.
Now, there is another way that Swift allows us to access a region of contiguous memory: arrays. After all, that is really what the semantics are. So let's give it a try:
import Foundation
func pixelArrayTest(iterations: Int) -> Float {
    struct Pixel {
        var red: Byte
        var green: Byte
        var blue: Byte
        var alpha: Byte
    }

    struct RenderBuffer {
        var pixels: [Pixel]
        var width: Int
        var height: Int

        init(width: Int, height: Int) {
            assert(width > 0)
            assert(height > 0)

            let pixel = Pixel(red: 0, green: 0, blue: 0, alpha: 0xFF)
            pixels = [Pixel](count: width * height, repeatedValue: pixel)
            self.width = width
            self.height = height
        }
    }

    func RenderGradient(inout buffer: RenderBuffer, offsetX: Int, offsetY: Int)
    {
        var offset = 0
        for (var y = 0, height = buffer.height; y < height; ++y) {
            for (var x = 0, width = buffer.width; x < width; ++x) {
                let pixel = Pixel(
                    red: 0,
                    green: Byte((y + offsetY) & 0xFF),
                    blue: Byte((x + offsetX) & 0xFF),
                    alpha: 0xFF)
                buffer.pixels[offset] = pixel
                ++offset
            }
        }
    }

    let start = mach_absolute_time()

    var buffer = RenderBuffer(width: 960, height: 540)
    for (var i = 0; i < iterations; ++i) {
        RenderGradient(&buffer, i, i * 2)
    }

    return Float(mach_absolute_time() - start) / Float(NSEC_PER_SEC)
}
The nice thing about this change is that it was super easy to make; only a handful of lines changed. This method also has the benefit that I can never leak the buffer by forgetting to call `dealloc` on an `UnsafeMutablePointer` value.
Let's check out the timings:
- DEBUG: avg time: 27.9754s, stddev: 0.0333994s, diff: -27939%
- RELEASE: avg time: 0.0287606s, stddev: 0.00180078s, diff: -40%
What on earth just happened… 27.9 seconds to compute that loop above 30 times… this loop, right here:
int offset = 0;
for (int y = 0, height = buffer->height; y < height; ++y) {
    for (int x = 0, width = buffer->width; x < width; ++x) {
        Pixel pixel = { 0, y + offsetY, x + offsetX, 0xFF };
        buffer->pixels[offset] = pixel;
        ++offset;
    }
}
That's 960 × 540 (518,400) iterations per frame. This is unacceptable. This is what my previous blog post was entirely about. There is not a SINGLE argument you can make that justifies this performance characteristic. Not one.
Now, had this been, say, 300% slower, I might have been able to accept that along with the safety justifications… maybe. At 300% slower I'd at least still be in a spot where I could run my game at 30Hz with reasonable headroom left over for other logic.
But no… we are talking about this loop taking nearly 1 entire SECOND per pass to compute. It was nearly 28,000% slower…
Here's a screenshot of the profile with `NUMBER_OF_ITERATIONS` dropped down to 2 (there's no way I was going to sit through another full 10 samples of 30 iterations).

Ok… now I'm left with a few choices:

1. Say screw it and leave Swift on the table, convulsing from the seizure this basic loop, setting a value in an array, just caused it.
2. Go back to the `UnsafeMutablePointer` method, which was back in the land of all things sane, but then I get to risk all of my consumers forgetting to call `release()`. But you know… we're all adults here (in spirit at least); we should be able to handle our own memory. And really, if I'm going to need to resort to naked pointers, I might as well stick with C, yeah?
3. Create another wrapper around the array so that I can use a backing array to keep track of the memory for me, but expose an unsafe pointer to that array.
4. Say screw it with the non-optimized builds and live in the land of crap(pier) debugging, which completely breaks the logic flow of your algorithms. Yes, there is a time for that land, but that time is not the beginning of your project, when you are still prototyping, scaffolding, and shaping your program into what it will one day be.
Of these options, #3 is the worst choice, in my opinion. It is a choice where you explicitly said you want a "safe" array but, in this part of the code, you're just going to go hog wild. `withUnsafeMutableBufferPointer` exists for a reason; I'm guessing that one of those reasons is because performance sucks otherwise.
Here's the code update for that option:
buffer.pixels.withUnsafeMutableBufferPointer { (inout p: UnsafeMutableBufferPointer<Pixel>) -> () in
    var offset = 0
    for (var y = 0, height = buffer.height; y < height; ++y) {
        for (var x = 0, width = buffer.width; x < width; ++x) {
            let pixel = Pixel(
                red: 0,
                green: Byte((y + offsetY) & 0xFF),
                blue: Byte((x + offsetX) & 0xFF),
                alpha: 0xFF)
            p[offset] = pixel
            ++offset
        }
    }
}
Let's check out the timings:
- DEBUG: avg time: 1.18535s, stddev: 0.00684964s, diff: -1087%
- RELEASE: avg time: 0.0402743s, stddev: 0.0018447s, diff: -96%
Ok… at least now I can't literally go to the bathroom, get a drink from the kitchen, come back to my computer, and still have minutes left to wait while the iterations finish. However, it's still too slow in DEBUG mode, and it's still twice as slow as the ObjC version in RELEASE mode.
Conclusion
OK, so let's be explicitly clear here: this post is not about how Swift is horrendously slow in the builds you'll be giving your customers. No, it's about how terribly slow and painful your life as a developer will be while trying to write any amount of Swift code that works on any reasonable amount of data in arrays. And you can see that none of the Swift options are faster, or even as fast, as the C version. Frankly, none of them are really that much clearer either… but that's a different topic.
And don't tell me to open another Swift bug; I have. I have opened many Swift bugs since last WWDC. This post is a way for me to better communicate the current state of Swift to others. It's a way to let the people at Apple see the real impact developers like myself feel when trying to do even the most basic of things in Swift, and a way to make sure that when people do run into these issues, they won't have to bang their heads against the wall trying to figure out how to solve them.
Here is the source for both the Swift and the ObjC versions: SwiftResistance.zip. I release it all in the public domain; do whatever you want with it.